Internet DRAFT - draft-jin-content-deduplication-optimization
draft-jin-content-deduplication-optimization
CDNI W. Jin
Internet-Draft W. Wang
Intended status: Informational Z. Hao
Expires: July 21, 2012 Y. Meng
ZTE Corporation
January 18, 2012
CDNi Content De-duplication Optimization
draft-jin-content-deduplication-optimization-00
Abstract
Some CDNi deployments are likely to lead to content repetition in the
same dCDN. This document gives the cases and then discusses the
optimization approach to de-duplicate of the repeated content in CDNi
network. To implement the optimization, the enhancement to CDNi
metadata model and interface is required.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on July 21, 2012.
Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
Jin, et al. Expires July 21, 2012 [Page 1]
Internet-Draft CDNi Content De-duplication Optimization January 2012
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 3
2. Deployment Scenarios . . . . . . . . . . . . . . . . . . . . . 4
2.1. Scenarios 1 . . . . . . . . . . . . . . . . . . . . . . . 4
2.2. Scenarios 2 . . . . . . . . . . . . . . . . . . . . . . . 4
2.3. Scenarios 3 . . . . . . . . . . . . . . . . . . . . . . . 5
3. Content Naming for CDNi . . . . . . . . . . . . . . . . . . . 6
3.1. Uniqueness . . . . . . . . . . . . . . . . . . . . . . . . 6
3.2. Ownership . . . . . . . . . . . . . . . . . . . . . . . . 7
4. CDNi Content De-duplication Optimization Implementation . . . 7
4.1. Constant URL . . . . . . . . . . . . . . . . . . . . . . . 7
4.2. Content Naming Mechanism . . . . . . . . . . . . . . . . . 8
4.3. Details for Content De-duplication . . . . . . . . . . . . 9
4.3.1. Pre-Positioned Content Acquisition . . . . . . . . . . 9
4.3.2. Dynamic Content Acquisition . . . . . . . . . . . . . 11
4.3.3. Content Removal . . . . . . . . . . . . . . . . . . . 13
5. Security Considerations . . . . . . . . . . . . . . . . . . . 14
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14
7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 15
8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 15
8.1. Normative References . . . . . . . . . . . . . . . . . . . 15
8.2. Informative References . . . . . . . . . . . . . . . . . . 15
Appendix A. Additional Stuff . . . . . . . . . . . . . . . . . . 15
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 15
Jin, et al. Expires July 21, 2012 [Page 2]
Internet-Draft CDNi Content De-duplication Optimization January 2012
1. Introduction
In some CDNi deployments, the dCDN may be required to cache the same
content copy from the same Content Service Provider (CSP). For
example, the CSP has the agreement with two Authoritative CDNs, and
both are the upstream CDN of the same dCDN. Another example appears
in the cascaded mode. The top-layer uCDN establishes connection with
two intermediate-layer uCDNs, and both connect to the same bottom-
layer dCDN. In such scenarios, the dCDN may be requested to download
the content from one uCDN, then be requested to deliver the same
content from another uCDN. Content repetition wastes the dCDN's the
memory or storage resource, and wastes bandwidth to deliver the
repeated content. So it is necessary to avoid delivering the
repeated content from separated uCDNs to dCDN. In this draft, we
list scenarios where content repetition may happen. And a solution
named content de-duplication is discussed.
To address the content repetition problem, several issues may need to
be considered.
* How to detect content repetition by dCDN, and a content naming
mechanism is required for CDNi network.
* How to avoid content repetition, when one or more uCDNs selects one
dCDN to deliver the same content to multiple User Agents.
This document provides detailed analysis for the issues of content
de-duplication.
1.1. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
This document reuses the terminology defined in [I-D.jenkins-cdni-
problem-statement].
Resource Id: a metadata object (e.g. URL) which identifies the
resource on a specific CDNi interface associated with a particular
Authoritative CDN.
Content Id: a metadata object (e.g. URN) which uniquely identifies
the content in the scope of CDNi.
Jin, et al. Expires July 21, 2012 [Page 3]
Internet-Draft CDNi Content De-duplication Optimization January 2012
2. Deployment Scenarios
This section shows several CDNi deployments that typically lead to
content repetition.
2.1. Scenarios 1
As depicted in Figure 1, two interconnected CDN-A(uCDN) and CDN-
B(dCDN) both have contracts with CSP. CDN-B play two roles at the
same time: down streaming CDN of CDN-A and Authoritative CDN of CSP.
When an end-user of CDN-A initiate content acquisition from the CSP,
CDN-A decides CDN-B as the serving CDN. Then CDN-A redirects the
request to CDN-B, and indicate CDN-B to retrieve the content from it.
Normally CDN-B as Authoritative CDN is very likely to have cached the
content from original server. If CDN-B cannot identify the said
contents are the same content, the same content will be repeatedly
cached.
As the location of the content in a CDN is normally planned by CDN
itself, the URLs of the same content are likely different between
CDNs. So it is not enough to decide whether the content to be cached
is the same only by URL.
+-------+
| CSP |
+-------+
/ \
,--,--,--./ \,--,--,--.
,-' `-. ,-' `-.
( CDN Provider A )=====( CDN Provider B )
`-. (CDN-A) ,-' `-. (CDN-B) ,-'
`--'--'--' `--'--'--'
|
+------------+
| User Agent |
+------------+
=== CDN Interconnect
Figure 1
2.2. Scenarios 2
As depicted in Figure 2, both CDN-A and CDN-B establish
interconnections with CDN-C which acts as a dCDN. Thus, CDN-C will
cache the content for CDN-A and CDN-B. When both CDN provider A and
CDN provider B have agreements with the same CSP for content delivery
at the same time, CDN-C may be required by CDN-A and CDN-B separately
Jin, et al. Expires July 21, 2012 [Page 4]
Internet-Draft CDNi Content De-duplication Optimization January 2012
to cache the same content of CSP. Similar to Scenario 1, CDN-C is
also likely to suffer from content repetition troubles.
+-------+
| CSP |
+-------+
/ \
,--,--,--./ \,--,--,--.
,-' `-. ,-' `-.
( CDN Provider A ) ( CDN Provider B )
`-. (CDN-A) ,-' `-. (CDN-B) ,-'
`--'--'--' `--'--'--'
\\ //
\\,--,--,--.//
,-' `-.
( CDN Provider C )
`-. (CDN-C) ,-'
`--'--'--'
|
+------------+
| User Agent |
+------------+
=== CDN Interconnect
Figure 2
2.3. Scenarios 3
Now we consider the case of cascaded CDNs, as depicted in Figure 3,
wherein the top-layer Upstream CDN-A directly relate to CSP and
interconnects with two middle-layer CDNs (CDN-B and CDN-C) which have
the same bottom-layer Downstream CDN-D interconnected. So there are
two possible delivery paths for CDN-D to cache the contents of CSP,
CDN-A -> CDN-B -> CDN-D or CDN-A -> CDN-C -> CDN-D. CDN-D may be
required to cache the same content by upstream CDNs(CDN-B and CDN-C)
on different paths. If the URL of the content is changed by CDN-B or
CDN-C, CDN-D cannot be aware of the contents to be cached and
therefore this likely leads to content repetition.
Jin, et al. Expires July 21, 2012 [Page 5]
Internet-Draft CDNi Content De-duplication Optimization January 2012
,--,--,--.
,-' `-. +-------+
( CDN Provider A )----| CSP |
`-. (CDN-A) ,-' +-------+
//-'--'-\\
,--,--,--.// \\,--,--,--.
,-' `-. ,-' `-.
( CDN Provider B ) ( CDN Provider C )
`-. (CDN-B) ,-' `-. (CDN-C) ,-'
`--'--'--' `--'--'--'
\\ //
\\,--,--,--.//
,-' `-.
( CDN Provider D )
`-. (CDN-D) ,-'
`--'--'--'
|
+------------+
| User Agent |
+------------+
=== CDN Interconnect
Figure 3
3. Content Naming for CDNi
It is well known that CDNs have their own content naming mechanisms
most of which are independent and separated from each other due to
the use of different algorithms such as Hash algorithms. It implies
that for the same content distributed by two CDNs, the corresponding
content identifiers are likely to be quite different.
[I-D.lefaucheur-cdni-requirements] treats the information regarding
CDN content naming as intra-CDN information and the CDNI solution
MUST not require intra-CDN information to be exposed to other CDNs
for effective and efficient delivery of the content. Therefore,
establishing a uniform content naming mechanism is urgently needed
for CDNi network. This mechanism which can be implemented by CDNI
Metadata Distribution Protocol may possess the following properties
below.
3.1. Uniqueness
CDNi content naming mechanism must guarantee the uniqueness of
content identification. URL is widely used for identifying network
resource, but it is not quite suitable for content identification in
CDNi network where content de-duplication is needed. Although the
method of URL match is usually used by many cache systems to detect
Jin, et al. Expires July 21, 2012 [Page 6]
Internet-Draft CDNi Content De-duplication Optimization January 2012
the repetitive files with same name in order to avoid content
repetition, it is probably failed for CDNi due to different
forwarding mechanisms that is the user-originated requests are always
snooped by devices like DPI before transmitted to the original
server, whereas the requests received by dCDN are always redirected
by one or more uCDNs. Since there is no guarantee of unchanged URLs
through the redirected process, some other object need to be defined
to represent the uniqueness of content identification.
For the CSP's content distributed into different interconnected CDNs,
the related metadata objects may be somewhat different in many cases.
Taking the example in Figure 2, when both CDN-A and CDN-B delegate
the delivery of the same CSP's content, the content metadata such as
content description, access policy and security policy may be not
quite similar to each other. Obviously, such metadata information is
not suitable for content identification. Thus, we need to define
some other metadata object that uniquely identifies the same content.
3.2. Ownership
CDNi content naming mechanism should embody the ownership of content
identification. Typically, a CDN provider has contracts with many
CSPs for delivering their content, as well as operates its own
content. However, lots of contents published by these CSPs are
usually highly resembled, and part of them are even exactly the same.
So the problem is whether these identical copies originated from the
same CSP or how the interconnected CDNs know they are identical.
(Note: Such copies are pointed by the same content identification
only if they are from the same content source.) For a traditional
(non-interconnected) CDN, there is no problem to distinguish them via
its intra content naming mechanism. While a CDN interconnects with
other CDNs, the condition becomes more complicated due to the lack of
awareness of CSP's content when the CDN acts as a dCDN.
4. CDNi Content De-duplication Optimization Implementation
4.1. Constant URL
Of course, URL can be used to implement content de-duplication in
CDNi network. As refered in section 2, the URL is different in uCDN
by uCDN and is likely changed in the redirection process. There is
an usage of agreement of configuration URL between a pair of
interconnected CDN in [I-D.davie-cdni-framework], however it is
difficult to expand to the complex CDNi network. So a feasible
proposal is a mechanism for CDNi network to guarantee CSP's contents
cached in different uCDNs are identified by the same URL and the URL
is unchanged or at least the path part is remained unchanged in the
Jin, et al. Expires July 21, 2012 [Page 7]
Internet-Draft CDNi Content De-duplication Optimization January 2012
redirection process. The main problem of this mechanism is lack of
resilience and we prefer another mechanism to be introduced in
section 4.2.
4.2. Content Naming Mechanism
This section provides detailed implementation of CDNi content naming
mechanism using CDNi Metadata Protocol.
CSP's content as well as its copies cached in interconnected CDNs are
delivered to numerous End-Users. [I-D.davie-cdni-framework] assigns
each copy identifiers in terms of URLs embedded with "CDN-Domain"
used to distinguish a download request from an end-user or an uCDN.
We use the term Resource Identifier to represent the URLs pointing to
the content copies in interconnected CDNs. However, unlike to the
usage in [I-D.davie-cdni-framework], in this document CDNi content
naming mechanism makes Resource Identifiers are only related to
contents in uCDNs and cannot be unchanged during the forwarding
process. Taking the example in section 2.3, we use Resource
Identifier A point to content originated through uCDN A. When two
users acquire the same content from uCDN D, with two different
request forwarding paths, CDN A->CDN B->CDN D and CDN A->CDNC->CDN
D,respectively, uCDN D is able to find out that the visited content
and the related metadata are originally distributed from uCDN A using
the Resource Identifier A.
Although the Resource Identifier is able to indentify content,
uniqueness of content identification can't be guaranteed, as CSP may
have contracts with different uCDNs. In order to resolve it, we
introduce the term Content Identifier which is assigned to associate
with Resource Identifier to uniquely identifies the content and is
similar to the URN usage. (Note: The Content Identifier MUST be
globally unique.) A metadata model depicted in Figure 4 shows the
relationship between the two kinds of Identifier. Depending upon
this model, dCDN is able to be aware of such requests towards the
same targeted content.
Jin, et al. Expires July 21, 2012 [Page 8]
Internet-Draft CDNi Content De-duplication Optimization January 2012
+----------------------+
| Content Identifier |
+----------------------+
/ | \
/ | \
+--------------+ +--------------+ +--------------+
| Resouce | | Resouce | | Resouce |
| Indentifier 1| | Indentifier 2| ... | Indentifier n|
+--------------+ +--------------+ +--------------+
Figure 4
Note: Who is responsible for creating and maintaining the Content
Identifier needs to further study.
4.3. Details for Content De-duplication
This section details the solution making use of CDNi Content Naming
mechanism for content de-duplication.
Using the content identification model included in content metadata,
an interconnected CDN is able to detect content repetition. The
content status must be synchronously updated by the interconnected
CDN. According to content status, the interconnected CDN can judge
the resource copy is cached or not.
We present several procedures below to illustrate how to implement
optimization for CDNi content de-duplication.
4.3.1. Pre-Positioned Content Acquisition
The following flow illustrates how the two uCDNs successively pre-
position the same content in the dCDN. In this flow, the content to
be re-positioned in the dCDN is identified by different Resource
Identifiers corresponding to the uCDN A and uCDN B.
Jin, et al. Expires July 21, 2012 [Page 9]
Internet-Draft CDNi Content De-duplication Optimization January 2012
+--------+ +--------+ +--------+
| dCDN | | uCDN A | | uCDN B |
+--------+ +--------+ +--------+
| Pre-position Request | |
|<-------------------------| |
+--------------+ | |
| Detection of | | |
| content rep- | | |(1)
| etition | | |
+--------------+ | |
| OK | |
|------------------------->| |
| | |
| Acquisition Request | |
|------------------------->| |
| | |
| Content Data | |(2)
|<-------------------------| |
+--------------+ | |
| Update cont- | | |
| ent status | | |
+--------------+ | |
| Pre-position Request |
|<-----------------------------------------------------|
+--------------+ | |
| Detection of | | |
| content rep- | | |(3)
| etition | | |
+--------------+ | |
| OK |
|----------------------------------------------------->|
| | |
Figure 5
The steps illustrated in the figure are as follows:
1. The uCDN A requests that the dCDN pre-positions a particular
content item identified by its Resource Identifier and Content
Identifier. Receiving the message, the dCDN uses the Content
Identifier to look up target metadata to see whether the same content
item is cached. With the result that the metadata does not exist,
the dCDN then replies to uCDN A with a OK message to notify that no
such copy has been cached and content pre-position is required.
2. The dCDN acquires the item of content from uCDN A. Once the
content is pre-positioned, dCDN updates the content status and
Jin, et al. Expires July 21, 2012 [Page 10]
Internet-Draft CDNi Content De-duplication Optimization January 2012
maintains the content identification metadata.
3. The uCDN B requests that dCDN pre-positions the same content item
identified by its Resource Identifier and Content Identifier.
Receiving the message, the dCDN uses the Content Identifier to look
up target metadata. Because such metadata exists, dCDN determines
that the content is already cached. Then, dCDN replies to uCDN A
with a OK message to notify that the same copy has been cached and
pre-position should be canceled.
4.3.2. Dynamic Content Acquisition
The following flows illustrate how the dCDN performs content de-
duplication in cases of a cache miss and a cache hit without content
pre-positioning.
Jin, et al. Expires July 21, 2012 [Page 11]
Internet-Draft CDNi Content De-duplication Optimization January 2012
+----------+ +------+ +------+
| end-user | | dCDN | | uCDN |
+----------+ +------+ +------+
| Content Request | |
|------------------------------------------------------->|
| Content Redirection | |
|<-------------------------------------------------------|(1)
| Content Request | |
|-------------------------->| |
| | Content id Acquisition |
| |<-------------------------->|
| +--------------+ |
| | Detection of | |
| | content rep- | |(2)
| | etition | |
| +--------------+ |
| | Acquisition Request |
| |--------------------------->|
| | Content Data |
| |<---------------------------|
| +--------------+ |(3)
| | Update cont- | |
| | ent status | |
| +--------------+ |
| Content Data | |
|<--------------------------| |
| | |
Figure 6
The steps illustrated in the figure are as follows:
1. A content request originated by a end-user arrives at uCDN. The
uCDN processes the request and recognizes that the end-user is best
served by dCDN. So uCDN redirects the request to dCDN by sending
redirection response to the end-user who then requests the content
from dCDN.
2. Receiving the request, the dCDN uses the Resource Identifier
pointing to the requested resource to fetch the corresponding Content
Identifier from the uCDN. The dCDN then uses the Content Identifier
to look up target metadata indicates whether the content item is
cached. With the result that such metadata does not exist, the case
of a cache miss is decided by the dCDN so that the content needs to
be downloaded from the uCDN before delivered to the end-user.
3. The dCDN acquires the requested content from uCDN A. Once the
Jin, et al. Expires July 21, 2012 [Page 12]
Internet-Draft CDNi Content De-duplication Optimization January 2012
content cached, dCDN updates the content status and maintains the
content identification metadata. The dCDN then delivers content data
to the end-user.
+----------+ +------+ +------+
| end-user | | dCDN | | uCDN |
+----------+ +------+ +------+
| Content Request | |
|------------------------------------------------------->|
| Content Redirection | |
|<-------------------------------------------------------|(1)
| Content Request | |
|-------------------------->| |
| | Content id Acquisition |
| |<-------------------------->|
| +--------------+ |
| | Detection of | |
| | content rep- | |(2)
| | etition | |
| +--------------+ |
| Content Data | |
|<--------------------------| |
| | |
Figure 7
The steps illustrated in the figure are as follows:
Steps 1 and 2 are exactly the same as steps 1 and 2 of Figure 5, only
in this time dCDN deciding the case of a cache hit according to the
existence of such a content identification metadata.
This flow differs from the one in Figure 5 only without triggering
dynamic content acquisition (step 3), since the content has been
cached by dCDN.
4.3.3. Content Removal
The following flow illustrates how the dCDN removes the content
resource under the control of the uCDN.
Jin, et al. Expires July 21, 2012 [Page 13]
Internet-Draft CDNi Content De-duplication Optimization January 2012
+----------+ +------+ +------+
| end-user | | dCDN | | uCDN |
+----------+ +------+ +------+
| | Content Removal |
| |<---------------------------|
| +--------------+ |
| | Removal of | |
| | content | |(1)
| +--------------+ |
| | |
| +--------------+ |
| | Removal of | |
| | conjunction | |
| | metadata | |(2)
| +--------------+ |
| | OK |
| |--------------------------->|
| | |
Figure 8
A premise is that the content resource to be removed is cached in the
dCDN from the uCDN. The steps illustrated in the figure are as
follows:
1. The uCDN requests that the dCDN removes some content resource
identified by the Content Identifier due to the deployment policy or
expiration of life-time. The dCDN then removes the resource.
2. Once removal of the content resource, the dCDN updates the
content status as well as removes the content identification
metadata, and then replies a OK response to the uCDN.
5. Security Considerations
TBD
6. IANA Considerations
This document has no IANA Considerations.
Jin, et al. Expires July 21, 2012 [Page 14]
Internet-Draft CDNi Content De-duplication Optimization January 2012
7. Acknowledgments
TBD
8. References
8.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
8.2. Informative References
[I-D.davie-cdni-framework]
Davie, B. and L. Peterson, "Framework for CDN
Interconnection", July 2011.
[I-D.jenkins-cdni-problem-statement]
Niven-Jenkins, B., Faucheur, F., and N. Bitar, "Content
Distribution Network Interconnection (CDNI) Problem
Statement", March 2011.
[I-D.lefaucheur-cdni-requirements]
Leung, K., Lee, F., Faucheur, F., Viveganandhan, M., and
G. Watson, "Content Distribution Network Interconnection
(CDNI) Requirements", July 2011.
[I-D.narten-iana-considerations-rfc2434bis]
Narten, T. and H. Alvestrand, "Guidelines for Writing an
IANA Considerations Section in RFCs",
draft-narten-iana-considerations-rfc2434bis-09 (work in
progress), March 2008.
[RFC2629] Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629,
June 1999.
[RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing RFC
Text on Security Considerations", BCP 72, RFC 3552,
July 2003.
Appendix A. Additional Stuff
Jin, et al. Expires July 21, 2012 [Page 15]
Internet-Draft CDNi Content De-duplication Optimization January 2012
Authors' Addresses
WeiYi Jin
ZTE Corporation
Nanjing, 210012
China
Phone: +86 025-52871364
Email: jin.weiyi@zte.com.cn
Wei Wang
ZTE Corporation
Nanjing, 210012
China
Phone: +86 025-88014631
Email: wang.wei108@zte.com.cn
ZhenWu Hao
ZTE Corporation
Nanjing, 210012
China
Phone: +86 025-52871304
Email: hao.zhenwu@zte.com.cn
Yu Meng
ZTE Corporation
Nanjing, 210012
China
Phone: +86 025-88014632
Email: meng.yu@zte.com.cn
Jin, et al. Expires July 21, 2012 [Page 16]