Internet DRAFT - draft-deng-ppsp-bfbitmap
draft-deng-ppsp-bfbitmap
PPSP L. Deng
Internet-Draft J. Peng
Intended status: Standards Track China Mobile
Expires: August 16, 2014 Y. Zhang
CoolPad
Y. Huang
Huawei
February 12, 2014
Efficient Chunk Availability Compression for PPSP
draft-deng-ppsp-bfbitmap-05.txt
Abstract
Bloom filters are proposed to be used in compressing chunk
availability information, periodically exchanged between peers and
the tracker through PPSP-TP and PPSPP protocols, to reduce relevant
cost and enhance scalability.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 16, 2014.
Copyright Notice
Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
Deng, et al. Expires August 16, 2014 [Page 1]
Internet-DraftEfficient Chunk Availability Compression for February 2014
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Background on Bloom Filter . . . . . . . . . . . . . . . . . 3
3. BF-based Chunk Availability Exchange . . . . . . . . . . . . 5
3.1. A non-BF PPSP session . . . . . . . . . . . . . . . . . . 5
3.2. A PPSP Session with BF-bitmaps . . . . . . . . . . . . . 7
3.3. Summary . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.3.1. Base TP protocol . . . . . . . . . . . . . . . . . . 9
3.3.2. Extended TP protocol . . . . . . . . . . . . . . . . 9
3.4. Peer protocol . . . . . . . . . . . . . . . . . . . . . . 10
4. Open Issues . . . . . . . . . . . . . . . . . . . . . . . . . 12
5. Security Considerations . . . . . . . . . . . . . . . . . . . 12
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 12
7. References . . . . . . . . . . . . . . . . . . . . . . . . . 13
7.1. Normative References . . . . . . . . . . . . . . . . . . 13
7.2. Informative References . . . . . . . . . . . . . . . . . 13
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 14
1. Introduction
As it is pointed out by [I-D.ietf-ppsp-problem-statement], current
P2P streaming practices often use a "bitmap" message to exchange
chunk availability. The message is of kilobytes in size and
exchanged frequently, e.g., an interval of several seconds or less.
To begin with, in a mobile environment with scarce bandwidth, the
message size may need to be shortened or it may require more
efficient methods for expressing and distributing chunk availability
information, which is different from wire-line P2P streaming.
Even in a wire-line P2P streaming application, frequent exchange of
large volume of bitmap information, is among the key factors that set
a limit to the system's efficiency and scalability [P2P-limit].
Therefore, the following requirements for PPSP protocols in terms of
chunk availability exchange are stated in
[I-D.ietf-ppsp-problem-statement] :
PPSP.TP.REQ-3: The tracker protocol MUST take the frequency of
messages and efficient use of bandwidth into consideration, when
communicating chunk availability information.
Deng, et al. Expires August 16, 2014 [Page 2]
Internet-DraftEfficient Chunk Availability Compression for February 2014
PPSP.PP.REQ-7: The peer protocol MUST take the frequency of
messages and efficient use of bandwidth into consideration, when
communicating chunk information.
In this draft, we propose an efficient bitmap compression scheme for
chunk availability information in PPSP protocols. Given the Bloom
Filters' wide applications in Internet and demonstrated efficiency
with highly compacted data structure and low complexity and cost in
terms of information storage, transportation and computation, it is
expected to relieve a PPSP implementer from the dilemma between "the
frequency of messages" (i.e. the timely exchange of information that
contributes to better user experience) and "efficient use of
bandwidth" (i.e. the limit of a single node/peer that holds the
system's overall scalability by throat).
2. Background on Bloom Filter
Bloom Filter (or BF for short) was first introduced in 1970s
[BF-bloom], which makes use of multiple hashing functions to build a
mapping from a set of elements to a compact binary array, to realize
highly efficient member queries with a tolerably low error rate of
wrongly reported hits. Despite their extraordinary efficiency in
terms of storage reduction and query acceleration, BFs suffer from
the fact that there is possibility that a non-member of the set be
wrongly taken as a member after the query. However, research
[BF-analysis] shows that the odds that a BF-based membership query
makes an erroneous hit can be suppressed to near zero, by a tactful
configuration of various system parameters, including the hash
functions used, the number of hash functions to be used, and the
length of the bit array.
Deng, et al. Expires August 16, 2014 [Page 3]
Internet-DraftEfficient Chunk Availability Compression for February 2014
------------------------------------------------
BF(set S, integer m, hash set H)
1 filter=allocate m bits initialized to 0;
2 for each element xi in S do
3 for each hash functions hi in H do
4 filter[hi(xi)]=1;
5 return filter;
-------------------------------------------------
MT(element elm, BF filter, integer m, hash set H)
1 for each hash functions hi in H do
2 if (filter[hi(elm)]!=1)
3 return false;
4 return true;
-------------------------------------------------
ST(BF query, BF filter)
1 temp=query OR filter;
2 if (temp!=filter)
3 return false;
4 return true;
-------------------------------------------------
Figure 1: Basic algorithms for BF-bitmap
As shown in Figure 1, the BF(S,m) algorithm takes a n-membered sub-
set S={x1,x2,...,xn} from a universal set U as input, and outputs a
m-bit binary array B as a compacted representation of S. In order to
do that, it makes use of k independent random hash functions, each of
which maps a member to a marked bit in B (i.e hj: U-> [1,m],
j=1...k). The BF algorithm is highly efficient in the following
aspects:
o It is quite simple and straightforward to generate the BF
representation of a set S, B=BF(S): initially, all the bits in B
is set to 0; then, for each member x of the set S, mark each bit
in B, to which a hash function maps x (as shown in Figure 1 as the
BF algorithm).
o It is highly efficient to check whether or not a given element x
is in any BF-represented set B=BF(S): for each hash function hj,
check the value of B[hj(x)] against 1. It is always safe to
exclude the element x out of set S, once there is a zero-valued
hash bit. Otherwise it is assumed that x is a member of S (the MT
algorithm in Figure 1).
o It is also highly efficient to check whether or not a given
element set S is contained by another set S' if they are both
represented as BF-bitmaps, say query(=BF(S)) and filter(=BF(S'))
for instance. It is always safe to return "false" to the
requestor if there's any marked bits in BF(S') and not marked in
Deng, et al. Expires August 16, 2014 [Page 4]
Internet-DraftEfficient Chunk Availability Compression for February 2014
BF(S), which can be realized by two simple bitwise operations(as
shown by the ST algorithm in Figure 1).
For instance, given a 2GB movie file, the original bitmap for a
sharing peer would be 1024-bit (if the system is using 2MB-sized
segments). By simply using 4 uniform random hash functions and a
128-bit BF-bitmap, the possibility of erroneous hits by MT algorithm
would be lower than 3%.
As for a simple illustration, the 4 hash functions may be established
through the MD5 message-digest algorithm [RFC1321], a widely used
cryptographic hash function that produces a 128-bit (16-byte) hash
value from an arbitrary binary input. MD5 has been utilized in a
wide variety of security applications, and is also commonly used to
check data integrity.
Specifically, the processing of 4 hash functions is as follows: use
the MD5 algorithm to turn a given chunk_ID into a 128-bit binary
array, further separate the 128 bits into 4 arrays (32-bit each), and
finally divide each of them using 128 to yield 4 integers in the
range of [1,m].
3. BF-based Chunk Availability Exchange
We first construct a general message flow (shown in Figure 2) from
PPSP protocols, and then discuss how to integrate BF-bitmap algorithm
with it.
3.1. A non-BF PPSP session
Deng, et al. Expires August 16, 2014 [Page 5]
Internet-DraftEfficient Chunk Availability Compression for February 2014
+--------+ +--------+ +--------+ +--------+ +-------+
| Player | | Peer 1 | | Portal | | Tracker| | Peer 2|
+---+----+ +----+---+ +-----+--+ +----+---+ +----+--+
| | | | |
|--Page request----------------->| | |
|<--------------Page with links--| | |
|--Select stream (MPD Request)-->| | |
|<--------------------OK+MPD(x)--| | |
|--Start/Resume->|--CONNECT(join x)------------>| |
|<-----------OK--|<----------------OK+Peerlist--| |
: : : : :
| |<-------------------- HANDSHAKE,HAVE(S2)--|
|-Get(Chunk s1)->| | | |
| |-- REQUEST(s1)--------------------------->|
|<-----Chunk s1--|<-------------------------DATA(Chunk s1)--|
| |-- ACK(s1),HAVE(S1)---------------------->|
: : : : :
| |--STAT_REPORT---------------->| |
| |<-------------------------Ok--| |
: : : : :
| |--FIND(Chunk subset)--------->| |
| |<-------------OK+PeerList-----| |
: : : : :
Figure 2: A typical PPSP session for watching a streaming content.
When a peer wants to receive streaming of a selected content (Leech
mode):
1. Peer connects to a connection tracker (which may be located
through a web portal) and joins a swarm.
2. Peer acquires a list of other peers in the swarm from the
connection tracker (via the tracker protocol) through the CONNECT
message.
3. Peer exchanges its content availability with the peers on the
obtained peer list (via peer protocol) through the HAVE message.
4. Peer requests content from the identified peers (via peer
protocol) through the REQUEST-DATA messages.
5. Peer periodically reports its status and chunk availability with
the tracker (via the tracker protocol) through the STAT_REPORT
message.
6. Peer acquires a list of other peers for a specific subset of
media chunks in the swarm from the connection tracer (via the
tracker protocol) through the FIND message.
Deng, et al. Expires August 16, 2014 [Page 6]
Internet-DraftEfficient Chunk Availability Compression for February 2014
3.2. A PPSP Session with BF-bitmaps
This document proposes to employ bloom filter algorithms in
compressing chunk availability information exchanged and stored
between peers and the tracker through the PPSP-TP protocols and PPSPP
protocol. Relevant extensions to the current protocols are
summarized as follows: (as shown in Figure 3)
+--------+ +--------+ +--------+ +--------+ +-------+
| Player | | Peer 1 | | Portal | | Tracker| | Peer 2|
+--------+ +--------+ +--------+ +--------+ +-------+
| | | | |
(a1) |--Page request----------------->| | |
|<----Page with links(+BF conf)--| | |
|--Select stream (MPD Request)-->| | |
|<----------OK+MPD(x)(+BF conf)--| | |
|--Start/Resume->|--CONNECT(join x)------------>| |
(b1) |<-----------OK--|<--OK(Peerlist(BF)------------| |
| | | | |
: : : : :
(c1) | |<----------- HANDSHAKE(BF conf)---------->|
(c2) | |<------------ HAVE(BF(S2))----------------|
|-Get(Chunk s1)->| | | |
(c3) | |------------- REQUEST(BF(s1))------------>|
|<-----Chunk s1--|<-------------------------DATA(Chunk s1)--|
: : : : :
(b2) | |-STAT_REPORT(BF(ContentMap))->| |
| |<-------------------------Ok--| |
: : : : :
| |--FIND(Chunk subset S')------>| |
(b1) | |<---------OK+PeerList(BF)-----| |
: : : : :
Figure 3: A typical PPSP session with BF-bitmaps.
a. Configuration Setup:
* (a1)Configuration Setup: m, The length of the output bit array
and H, the hash functions in use, are system level parameters
that should be configured globally: (a1) the BF configurations
(or BF conf for short) be stored at the web portal and
published to a requesting peer through the web page or MPD
file transaction, which could be incorporated to the
"Installation and Initial Setup" in PPSP TP protocol
[I-D.ietf-ppsp-base-tracker-protocol].
Deng, et al. Expires August 16, 2014 [Page 7]
Internet-DraftEfficient Chunk Availability Compression for February 2014
b. Integration to the extended TP protocol
[I-D.ietf-huang-extended-tracker-protocol]: In the extended TP
protocol, a new "ContentGroup" element is added in requests,
i.e., CONNECT and FIND, if the request includes a content scope.
In addition, this new element is also added to the
"StatisticsGroup" element for containing chunkmap information in
STAT_REPORT messages. To enable the BF-based schemes to be used,
the relevant interactions are described as follows:
* (b1)In response to a CONNECT(join)/FIND request from a peer,
the tracker may accompany the returned peer list with each
recommended peer's BF-formed chunk availability bitmap, as the
initial guidance for the requestor to start looking for
neighbors in the same swarm. The additional cost for bearing
the chunk-level availability information is constant (O(m))
for each peer in the returned peer list.
* (b2)STAT_REPORT: Peers use the BF(S,m,H) algorithm for
compressing the subset of locally stored and integrity
verified chunks (set S) in terms of a given swarm-ID, whenever
reporting or updating its chunk availability information with
the tracker. As the length of each BF-bitmap is constant
(O(m)), this will greatly reduce the tracker's resource
expenditure in communicating and storing such information for
a large peer population.
c. Integration to the peer protocol
[I-D.ietf-ietf-ppsp-peer-protocol]:
* (c1)HANDSHAKE: Peers exchange their local settings for the
chunk compression schemes via HANDSHAKE messages.
* (c2)HAVE: Peers use the BF(S,m,H) algorithm for compressing
the subset of locally stored and integrity verified chunks
(e.g. set S2 for Peer 2 in Figure 3) in terms of a given
swarm-ID, whenever sharing its chunk availability information
with another peer. The length of each BF-bitmap is constant
(O(m)).
* (c3)REQUEST: For a downloading peer to decide which neighbor
to request for a given chunk_ID s, it uses the member query
algorithm MT(s,bf,m,H) on each neighbor's BF-bitmap bf. The
computation cost for this member check is constant (O(m)). It
is also optional for a requesting peer to use BF-bitmap to
indicate its data request to another peer, if needed.
3.3. Summary
Deng, et al. Expires August 16, 2014 [Page 8]
Internet-DraftEfficient Chunk Availability Compression for February 2014
3.3.1. Base TP protocol
In view of the wish that the "configuration processes for the PPSP
Tracking facility, the service Portal and content sources are not
standardized, enabling all the flexibility for implementers. But as
there could have been different options to realize the chunk
availability information (i.e. the chunk bitmap), it should be
systematically configured so as to be mutually understandable to both
parties. Therefore, we propose to add the following sentence to
[I-D.ietf-ppsp-base-tracker-protocol] Section 5.1.1 "Installation and
Initial Setup": "In case of a peer or the tracker wishes to exchange
further information about the available peers in a flexible way, e.g.
the chunk availability information of a specific peer in the same
swarm could be represented in a various ways, there should be a way
of negotiation/indication about the specific method/parameters in
use, e.g. in the MPD file downloaded by the requesting peer from the
web portal."
3.3.2. Extended TP protocol
The "ContentGroup" element in STAT_REPORT message has been extended
to contain contentmap information, as shown in Figure 4).
Deng, et al. Expires August 16, 2014 [Page 9]
Internet-DraftEfficient Chunk Availability Compression for February 2014
+----------------------+---------+----------------------------------+
| Element Name or | Use | Description |
| Attribute Name | | |
+----------------------+---------+----------------------------------+
| ContentGroup | 0...1 | Provides information on content. |
| CAM | 1 | Describes the chunk addressing |
| | | method of this content. The value|
| | | is identical with the value of |
| | | Table 6 of [I-D.ietf-ppsp-peer |
| | | -protocol] |
| Representation | 1...N | Describes a component of content.|
| @id | M | Unique identifier for this |
| | | Representation. |
| SegmentInfo | 1 | Provides segment information. |
| @startIndex | M | The index of the first media |
| | | segment in the request scope for |
| | | this Representation. |
| @endIndex | OP | The index of the last media |
| | | segment in the request scope for |
| | | this Representation. |
+----------------------+---------+----------------------------------+
| Legend: |
| Use for attributes: M=Mandatory, OP=Optional, |
| CM=Conditionally Mandatory |
| Use for elements: minOccurs...maxOccurs (N=unbounded) |
| Elements are represented by their name (case-sensitive) |
| Attribute names (case-sensitive) are preceded with an @ |
+-------------------------------------------------------------------+
Figure 4: Semantics of ContentGroup.
In order to incoporate BF-based representation, we propose to
o use the "SegmantInfo" field for the BF-formatted bitmap; and
o the three attributes of "startIndex", "endIndex" and
"chunkmapSize" are meaningless in this case and are allowed to be
absent.
3.4. Peer protocol
In peer protocols, the bitmap representation schemes are called
"chunk addressing schemes", whose choice and configuration is
included in the protocol options information exchanged between peers
in communication initiation via HANDSHAKE messages. As shown by
Figure 5), there are currently five options defined for bitmap
representations.
Deng, et al. Expires August 16, 2014 [Page 10]
Internet-DraftEfficient Chunk Availability Compression for February 2014
+--------+---------------------+
| Method | Description |
+--------+---------------------+
| 0 | 32-bit bins |
| 1 | 64-bit byte ranges |
| 2 | 32-bit chunk ranges |
| 3 | 64-bit bins |
| 4 | 64-bit chunk ranges |
| 5-255 | Unassigned |
+--------+---------------------+
Figure 5: PPSP Peer Chunk Addressing Methods.
After downloading and verifying a chunk, a peer updates its local
chunk availability information to each neighboring peer in the same
swarm using HAVE message, which consists of a single chunk
specification that states that the sending peer has those chunks and
successfully checked their integrity.In order to incorporate BF-based
representation, we propose to
o add a defined value (e.g. 5) from "unassigned" value range of the
PPSP peer chunk addressing methods for the BF-formatted bitmap;
and
o use HAVE/REQUEST message to convey the BF-format array for the
overall local chunk bitmap in the way as shown in Figure 6.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| BF Length (16) | ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~ BF array (variable) ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 6: BF formatted bitmap.
The BF Length field contains the length of the BF-formatted bitmap
that follows in bytes. The Length field is 16 bits wide to allow for
flexible configuration to allow for less erroneous BF compression for
large chunk groups by using longer binary arrays.
Deng, et al. Expires August 16, 2014 [Page 11]
Internet-DraftEfficient Chunk Availability Compression for February 2014
4. Open Issues
As the BF-based scheme is highly controllable and stable in terms of
resource efficiency, it is viable to incorporate the contentmap
information about each returned peer in the Peerlist without
introducing considerable extra overhead. The current definition of
PeerGroup element in the extended TP protocol needs to be extended to
include "SegmentInfo" field as defined in ContentGroup element and
used accordingly. However, the necessacity of introducing other peer
specific information into the TP protocol via Peerlist is still
subject to discussion. In particular, these info exchange is far
less frequent and accurate as compared to the peer protocols's
HANDSHAKE and HAVE messages.
For the sake of simplicity, the above integration with tracker
protocol does not consider the case of employing more than one bitmap
compression algorithms for a single swarm. Meanwhile, it is noted
that there are considerable content in the peer protocol dealing with
chunk addressing schemes inter-working between peers employing
different schemes. However, in BF-based scheme, as for the
irreversible nature of hash functions, it is not feasible to
translate a BF-formatted array back into an original bitmap.
Therefore, in order to keep compatible and open to peers using other
addressing schemes, it is required that there be an original bitmap
maintained by the local peer to allow the ability of communicating
with peers using another addressing scheme.
Even if the chunk addressing scheme can be uniquely specified in a
MPD file for a give swarm, there is still possibility that a peer who
cannot support the relevant addressing scheme comes along and contact
the tracker for joining intention. Therefore, there is need for such
configuration exchange channel via TP protocol to allow mismatch
detection and error handling. A possible solution could be to add an
algorithm identifier attribute in the "SwarmId" field, and keep the
value definition for different schemes consistent with the peer
protocol's "Chunking Addressing Method" field.
5. Security Considerations
TBA
6. IANA Considerations
None.
Deng, et al. Expires August 16, 2014 [Page 12]
Internet-DraftEfficient Chunk Availability Compression for February 2014
7. References
7.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2234] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
Specifications: ABNF", RFC 2234, November 1997.
7.2. Informative References
[BF-analysis]
Broder, A. and M. Mitzenmacher, ""Network applications of
Bloom Filters: a survey"", Internet Mathematics Vol. 1,
No. 4, pp. 485-509, 2004.
[BF-bloom]
Bloom, B., "Space/time trade-offs in hash coding with
allowable errors.", Communications of ACM Vol. 13, No. 7,
pp. 422-426, 1970.
[I-D.ietf-huang-extended-tracker-protocol]
Huang, R., Zong, N., Cruz, R., Nunes, , and J. Taveira,
"PPSP Tracker Protocol-Extended Protocol", draft-ietf-
huang-extended-tracker-protocol-02 (work in progress),
February 2013.
[I-D.ietf-ietf-ppsp-peer-protocol]
Bakker, A., Petrocco, R., and V. Grishchenko, "Peer-to-
Peer Streaming Peer Protocol (PPSPP)", draft-ietf-ppsp-
peer-protocol-06 (work in progress), February 2013.
[I-D.ietf-p2psip-base]
Jennings, C., Lowekamp, B., Rescorla, E., Baset, S., and
H. Schulzrinne, "REsource LOcation And Discovery (RELOAD)
Base Protocol", draft-ietf-p2psip-base-26 (work in
progress), February 2013.
[I-D.ietf-ppsp-base-tracker-protocol]
Cruz, R., Nunes, M., Gu, Y., Xia, J., and J. Taveira,
"PPSP Tracker Protocol-Base Protocol (PPSP-TP/1.0)",
draft-ietf-ppsp-base-tracker-protocol-00 (work in
progress), February 2013.
Deng, et al. Expires August 16, 2014 [Page 13]
Internet-DraftEfficient Chunk Availability Compression for February 2014
[I-D.ietf-ppsp-problem-statement]
Zhang, Y. and N. Zong, "Problem Statement and Requirements
of Peer-to-Peer Streaming Protocol (PPSP)", draft-ietf-
ppsp-problem-statement-12 (work in progress), January
2013.
[P2P-limit]
Feng, C., Li, B., and B. Li, "Understanding the
performance gap between pull-based mesh streaming
protocols and fundamental limits", in Proc. of IEEE
INFOCOM , 2009.
[RFC1321] Rivest, and Newport, "RFC 1321: The MD5 message-digest
algorithm", April 1992.
Authors' Addresses
Lingli Deng
China Mobile
Email: denglingli@chinamobile.com
Jin Peng
China Mobile
Email: pengjin@chinamobile.com
Yunfei Zhang
CoolPad
Email: hishigh@gmail.com
Yihong Huang
Huawei
Email: rachel.huang@huawei.com
Deng, et al. Expires August 16, 2014 [Page 14]