Internet DRAFT - draft-peng-p2psip-one-hop-plugin
draft-peng-p2psip-one-hop-plugin
P2PSIP Working Group J. Peng
Internet Draft Q. Yu
Intended status: Standards Track China Mobile
Expires: August 20 2013 Y. Li
Beijing University of Posts and
Telecommunications
Feburary 16, 2013
One Hop Lookups Algorithm Plugin for RELOAD
draft-peng-p2psip-one-hop-plugin-03
Abstract
This document defines a specific Topology Plugin using a ramification
of the basic One Hop Lookups based DHT algorithm which is called ONE-
HOP-RELOAD. In the One Hop Lookups algorithm, each peer maintains a
full routing table containing information about every node on the
overlay in order to route RELOAD message in one hop. Compared with
CHORD-RELOAD algorithm, ONE-HOP-RELOAD improves the routing
efficiency, and can maintain complete membership information with
reasonable bandwidth requirements. This algorithm is able to handle
frequent membership changes by superimposing a well-defined hierarchy
on the system that guarantees topology disturbance events
notification reach every peer in the overlay within a specified
amount of time. Currently some typical peer-to-peer storages systems
have stringent latency requirements, such as Amazon's Dynamo which is
built for latency sensitive applications uses One-Hop algorithm, so
that each node maintains enough routing information locally to route
a request to the appropriate node directly.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
This Internet-Draft will expire on August 20, 2013.
Peng, et al. Expires August 20, 2013 [Page 1]
Internet-Draft One Hop Lookups Plugin Feb 2013
Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents carefully,
as they describe your rights and restrictions with respect to this
document. Code Components extracted from this document must include
Simplified BSD License text as described in Section 4.e of the Trust
Legal Provisions and are provided without warranty as described in
the Simplified BSD License.
Table of Contents
1. Introduction ................................................ 3
2. Terminology ................................................. 4
3. Hash function ............................................... 5
4. Network Architecture......................................... 5
5. Peer data structure ......................................... 6
5.1. Routing Table .......................................... 6
5.2. Peer Type .............................................. 6
5.2.1. Peer Type Structure ............................... 7
5.3. Region ID .............................................. 8
5.4. Special Identification Information ..................... 8
6. Routing ..................................................... 9
6.1. Next-Hop Selection Mechanism ........................... 9
6.2. Fault Tolerance........................................ 10
6.2.1. Resource-ID Routing Failure ...................... 11
6.2.1.1. Next-Hop Peer Leaving ....................... 11
6.2.1.2. New Peer Joining ............................ 12
6.2.2. Node-ID Routing Failure .......................... 13
6.2.2.1. Next-Hop Peer Leaving ....................... 13
6.2.2.2. Next-Hop Client Leaving ..................... 14
7. Replica Placement Policy ................................... 14
8. Joining .................................................... 15
8.1. Joining Process........................................ 15
8.2. Joinreq Message Structure ............................. 18
9. Leaving .................................................... 18
9.1. Handling Neighbor Failures ............................ 18
9.2. Leavereq Message Structure ............................ 20
10. Updates ................................................... 22
10.1. Topology Anomaly Detection ........................... 22
10.2. Topology Disturbance Propagation Procedure ........... 22
Peng, et al. Expires August 20, 2013 [Page 2]
Internet-Draft One Hop Lookups Plugin Feb 2013
10.3. Updatereq Message Structure .......................... 23
10.3.1. Update Types .................................... 23
10.3.2. Routing Information ............................. 24
10.3.3. Event notification .............................. 25
10.3.4. ONE-HOP-RELOAD Update Data ...................... 27
11. Security Considerations ................................... 28
12. IANA Considerations........................................ 28
13. References ................................................ 28
13.1. Normative References ................................. 28
13.2. Informative References ............................... 28
14. Acknowledgments ........................................... 29
Authors' Addresses ............................................ 30
1. Introduction
RELOAD [I-D.ietf-p2psip-base] is a peer-to-peer (P2P) signaling
protocol for use on the Internet. RELOAD is explicitly designed to
work with a variety of overlay algorithms, each implementation of
that is provided by a Topology Plugin so that each overlay can select
an appropriate overlay algorithm that relies on the common RELOAD
core protocols and code. In the RELOAD base protocol, the Topology
Plugin is defined by a DHT based on Chord, and this specification
defines a new DHT based on One Hop Lookups which provides a high
routing efficiency and can handle frequent membership changes in a
way that has reasonable bandwidth consumption.
Structured peer-to-peer overlay network like Chord, Pastry, CAN
provide a substrate for building large-scale distributed applications.
These overlays allow applications to locate objects stored in the
system in a limited number of overlay hops. These peer-to-peer lookup
algorithms strive to maintain a small amount of per-node routing
state, typically O (log N). All these O (log N) algorithms incur less
maintenance traffic on large or high-churn networks but need nearly O
(log N) steps to find one peer or resource, if there are a large
number of peers in the overlay, it costs a lot time to locate peer or
resource which may have a bad influence on the application level of
RELOAD.
The experiment results [One-Hop-Lookups] show that the peer-to-peer
system which uses the One Hop Lookups algorithm can route very
efficiently even though the system is large and membership is
changing rapidly. They show analytic results to prove that the whole
routing table maintenance bandwidth requirements including the
topology anomaly detection and topology disturbance propagation are
small enough to make most participants in a 10^5 system, and the load
on the peer in the overlay increases linearly with the size of the
system. For example, in a system with 10^5 nodes, the load on an
ordinary peer is 3.84 kbps and the load on a slice leader is 35 kbps
Peng, et al. Expires August 20, 2013 [Page 3]
Internet-Draft One Hop Lookups Plugin Feb 2013
upstream. In the simulation experiments described in the paper, they
assume dynamic membership behavior (i.e., node joining and leaving)
of the peer-to-peer system as in Gnutella, which is representative of
an open Internet environment. From the study of Gnutella [Gnutella],
we can draw the conclusion that there are about 20 and 200 membership
change events per second, in a system with 10^5 and 10^6 peers
respectively.
The real time communication has a high demand on the routing
efficiency, for example, the VoIP usage on the application level of
RELOAD, it depends on the looking up efficiency of the Overlay
Topology. So, in a relative small and stable network like a low-churn
telecom core net with VoIP applications, the O (1) algorithms as a
Topology Plugin of RELOAD is beneficial.
Currently some typical peer-to-peer storages systems have stringent
latency requirements, such as Amazon's Dynamo which is built for
latency sensitive applications uses One-Hop algorithm, so that each
node on the overlay maintains enough routing information locally to
route a request to the appropriate node directly.
The algorithm described in this document is assigned the name ONE-
HOP-RELOAD to indicate it is an adaptation of the basic One Hop
algorithm based DHT. This algorithm uses the core techniques of the
original One Hop Lookups, and has been adapted to RELOAD protocol. In
the One Hop Lookups algorithm, each peer maintains a complete
description of system memberships, and it uses dissemination trees to
propagate event information to update all the peers' routing table
within several seconds so that peers can maintain their own
membership information accurately with low communications costs.
In this draft, the network architecture and the routing information
which peers maintain in the ONE-HOP-RELOAD are first described. Then
the replication and fault tolerance strategies are stated. At last,
the overlay specific data structures into the RELOAD frame are filled,
and some procedures with topology messages defined in the RELOAD
Topology Plugin are implemented. All of the definitions and
implementations based on the requirements and methods provided by
RELOAD.
2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC-2119 [RFC2119].
Peng, et al. Expires August 20, 2013 [Page 4]
Internet-Draft One Hop Lookups Plugin Feb 2013
We use the terminology and definitions from the RELOAD Base Protocol
[I-D.ietf-p2psip-base] extensively in this document.
3. Hash function
In this One Hop Lookups based topology plugin, the size of the Node-
ID and Resource-ID is 128 bits. The hash of a Resource-ID MUST be
computed by SHA-1 [RFC3174].
4. Network Architecture
Like in Chord, ONE-HOP-RELOAD uses a ring topology, and the successor
and predecessor of a peer with Node-ID i refer to the first peer
clockwise and counterclockwise respectively from peer i in the ring.
A peer, n, is responsible for a particular Resource-ID k if k is less
than or equal to n and k is greater than p, where p is the Node-ID of
the predecessor of peer n. Care must be taken when computing to note
that all math is modulo 2^128.
On this basis, we construct a three layered DHT to form dissemination
trees which are used to propagate topology disturbance event
information, i.e., joins and leaves. It is suggested that the scale
of peer-to-peer system is pre-configured manually, and we impose this
hierarchy on the system with dynamic membership by dividing the 128-
bit circular identifier space into k equal contiguous intervals
called slices, the ith slice contains all nodes currently in the
overlay whose node identifiers lie in the range [i*2^128/k,
(i+1)*2^128/k) [OneHopLookups].
Each slice has a slice leader, which is chosen dynamically as the
node that is the successor of the mid-point of the slice identifier
space in the One Hop Lookups Algorithm, i.e., the slice leader of the
ith slice is the successor node of the key (i+1/2)*2^128 /k. But
frequent slice leader changing due to joining or leaving of peers
will cause higher network traffic expenses, because all the peers in
the slice and other slice leaders need to modify the configuration
and to establish a new connection with the new slice leader.
Therefore, it is better to choose the peer with high reliability and
availability peer to become the slice leader, and assign the peer a
fixed Node-ID which is the successor of the mid-point of the slice
identifier space in order to reduce the dynamic slice leader
replacement.
Similarly, each slice is divided into equal-sized intervals called
units. Each unit has a unit leader, which is dynamically chosen as
the successor of the mid-point of the unit identifier space.
Peng, et al. Expires August 20, 2013 [Page 5]
Internet-Draft One Hop Lookups Plugin Feb 2013
The choice of the number of levels in the hierarchy involves a
tradeoff. A large number of levels imply a larger delay in
propagating the information, whereas a small number of levels
generate a large load at the nodes in the upper levels
[OneHopLookups]. Recommended to choose the three level DHT, because
it leads to reasonable bandwidth consumption and message propagation
delay.
5. Peer data structure
Each peer keeps track of the Whole Routing Table and a neighbor table.
The Whole Routing Table of each node keeps all of nodes?routing
information, so that after receiving the RELOAD request message the
peer can find the destination peer by just one hop if the items of
routing table are correct and the peer is working properly. The
neighbor table contains at least the three peers before and after
this peer in the DHT ring. It is used for topology maintenance and
node anomaly detection. There may not be three entries in any cases,
e.g. in the case of small rings or during changing of the ring
topology.
5.1. Routing Table
The routing table is the union of the neighbor table and the whole
routing table.
Fundamentally, the neighbor table entry contains the Node-IDs of the
predecessors and successors. The Whole Routing Table (WRT) maintains
the routing information of all the nodes in the overlay. The WRT
entry should at least contain the Node-ID, the address information
that may be IPv4 or IPv6 addresses and should include IP addresses,
port numbers and Resource-ID range which the peer is responsible for.
5.2. Peer Type
In the One Hop Lookups algorithm, whenever one peer detects a change
in membership (its successor failed or it has a new successor), it
will notify its local slice leader by sending an event notification
message as soon as possible. The local slice leader collects all
event notifications it receives from its own slice and disseminates
these notifications to other slice leaders. At the same time, the
slice leaders aggregate messages they receive for a short time period
and then dispatch these messages to all unit leaders of their
respective slices. Unit leaders spread these messages to themselves?
successor and predecessor in internal unit.
Peng, et al. Expires August 20, 2013 [Page 6]
Internet-Draft One Hop Lookups Plugin Feb 2013
From above, we can conclude that there are four kinds of peer type in
the One Hop Lookups as follows:
Unit Boundary:
The peer at unit boundaries do not send topology disturbance event
notification message (defined in Update message) to their
neighboring peers outside their unit. This ensures that there is
no redundancy in the communications: a peer will get Update
message only from its neighbor that is one step closer to its unit
leader, so that the Update message just being transferred within
unit. In a unit, message is always flowing from the unit leader to
the both ends of the unit.
Unit Leader:
Unit Leader receives Update Messages from slice leader, and spread
these messages to its successor and predecessor in internal unit.
Slice Leader:
Slice leader is a special peer which is responsible for topology
maintenance and management. It can collect Update Messages from
slice internal and other slices and do other management tasks. Any
peer in the slice can send topology disturbance event notification
message (defined in Update message) to its slice leader to inform
events. Different from SPM mechanism in the SandStone [SandStone],
the slice leader in ONE-HOP-RELOAD participate in the resource
location and discovery procedures, and it is best to be pre-
configured.
Ordinary Peer:
Ordinary peer do not belong to the above three roles, it receives
Update message from its successor or predecessor, and spread
forward the message along the direction. Ordinary peer in the
overlay know its unit leader and slice leader, and when it detects
a change in membership (its predecessor failed or it has a new
predecessor), it will notify its local slice leader.
Each peer in the overlay all need to record their peer type. Note
that unit leader may also be unit boundary while the ring topology is
small.
5.2.1. Peer Type Structure
enum { reserved (0),
Peng, et al. Expires August 20, 2013 [Page 7]
Internet-Draft One Hop Lookups Plugin Feb 2013
ordinary_node (1), unit_boundary(2), unit_leader(3),
slice_leader(4), (255)}
OneHopPeerType;
The OneHopPeerType gives an enumeration of peer type which identifies
the different peer role. When a peer is both a unit boundary and a
unit leader, we can use array of the OneHopPeerType which contains
unit_boundary and unit_leader two elements to identify its peer type.
5.3. Region ID
The peer belonging to a specific unit of a specific slice should
record its own location information. In the One Hop Lookups algorithm,
we use Slice-ID and Unit-ID to mark the peer location. The scale of
peer-to-peer system, i.e. the number of hierarchy levels, the number
of slices and units, the Node-ID range of each slice and unit are all
pre-configured manually. When a new peer prepares to join in the
overlay, it can obtain configuration information from the
configuration server which is responsible for assigning Node-IDs and
providing Node-ID range forms for each slice and each unit. Then, the
joining peer can compute its own Region ID by using the routing
information obtained from its neighbors in the procedure of joining
the overlay.
typedef opaque SliceId[SliceIdLength];
typedef opaque UnitId[UnitIdLength];
struct {
SliceId slice_id;
UnitId unit_id;
} RegionId;
The struct of RegionId is used to identify the peer location. Both
SliceId and UnitId is fixed-length structure represented as a series
of bytes, with the most significant byte first. The length is set on
a per-overlay basis within the range of 16-20 bytes (128 to 160 bits).
5.4. Special Identification Information
In the One Hop Lookups algorithm, each peer has its own peer type,
such as ordinary node or unit leader. In order to maintain the
accuracy and stability of the overlay layer network architecture,
Peng, et al. Expires August 20, 2013 [Page 8]
Internet-Draft One Hop Lookups Plugin Feb 2013
each peer also needs to maintain additional identification
information which relates to the peer type closely, named as the
special identification information.
Ordinary Node / Unit Boundary:
The ordinary node and unit boundary in the overlay should record
its unit leader and slice leader identification. In addition, it
can also maintain the Node-ID range of each slice and each unit
obtained from the configuration server when joining in the overlay.
Unit Leader:
The identification information maintained by the unit leader and
the ordinary node are basically the same, except that the unit
leader does not need to maintain the Node-ID of unit leader.
Slice Leader:
Slice Leader collects Update messages from its own slice and other
slices and spread these messages to the network according to the
given rules. Thus, each slice leader should record the
identification information about all the slice leaders and unit
leaders within the slice it is responsible for in order to
dispatch the Update messages.
Each peer needs to maintain appropriate identification information
according to their peer type.
6. Routing
6.1. Next-Hop Selection Mechanism
Next-Hop Selection mechanism defines that a peer who receives a
RELOAD message carrying the destination ID, i.e., Node-ID and
Resource-ID, according to its own routing information, routes the
message to the next hop peer on the overlay. Two scenarios, i.e. the
Resource-ID routing and the Node-ID routing, are analyzed here.
Resource-ID routing:
If the peer is not responsible for the purpose Resource-ID k, but
is directly connected to a node with Node-ID k, then it MUST route
the message to that node. Otherwise, it finds the smallest Node-ID
that is greater than k to indicate it should be responsible for
the Resource-ID k; and then route the message to that destination
node.
Peng, et al. Expires August 20, 2013 [Page 9]
Internet-Draft One Hop Lookups Plugin Feb 2013
Node-ID routing:
If the peer is not responsible for the purpose Node-ID k, but is
directly connected to a node with Node-ID k, then it MUST route
the message to that node. Otherwise, it finds the peer in the
Whole Routing Table whose Node-ID equals to k to indicate it is
the destination node k; and then route the request to the peer.
If no such node is found, it finds the smallest Node-ID that is
greater than k to indicate it should be the overlay access node of
the client k; and then route the message to that destination peer.
Note that in the CHORD-RELOAD algorithm [I-D.ietf-p2psip-base], each
peer maintains a finger table which contains only a small fraction of
routing information in the overlay, and peer has established TLS
connections with all the peers in the routing table which is the
union of the neighbor table and the finger table during the process
of joining in the overlay. Thus, for the peer-to-peer system that
uses the CHORD-RELOAD algorithm, the connection table that Link
Management Module in the RELOAD protocol maintains is the set of
nodes to which a node is directly connected, and the peers in the
routing table will all be on the connection table but not vice versa.
The ONE-HOP-RELOAD algorithm differs from the CHORD-RELOAD algorithm.
It gets each peer to maintain the Whole Routing Table, so when the
system scales to million peers, the routing table will be very large.
If the peer sets up TLS connections with all the other peers in the
overlay, it will consume a large amount of network resources.
Therefore, establishing full connections is not recommended. In the
ONE-HOP-RELOAD algorithm, a peer only needs to select some related
peers to establish TLS connections according to the needs of the
network architecture, thus the connection table is a subset of the
Whole Routing Table.
The consequence of changing the relationship between routing table
and connection table is that the peer chosen by the Next-Hop Node
Selection mechanism may not have an active TLS connection with the
source node. Therefore, the peer may need to set up a connection
before routing message to the destination node.
6.2. Fault Tolerance
Topology disturbance which is triggered by membership change events,
i.e., joining and leaving, may lead to a one hop routing failure. If
the source peer that prepares to send message to the next hop peer
does not update its routing information in time when the topology
Peng, et al. Expires August 20, 2013 [Page 10]
Internet-Draft One Hop Lookups Plugin Feb 2013
disturbance occurs, it may route messages to an incorrect peer or a
peer which has withdrawn from the overlay.
Two scenarios, i.e. the Resource-ID routing and the Node-ID routing,
are analyzed here. In Resource-ID routing scenario, the disturbances
caused by peer joining and leaving are separately analyzed. In
Resource-ID routing scenario, the disturbances caused by peer leaving
and client leaving are separately analyzed. Note that Figure 1 is a
peer-to-peer system schematic.
+--------+ +--------+ +--------+
| Peer 10|--------------| Peer 20|--------------| Peer 30|
+--------+ +--------+ +--------+
| |
| |
+--------+ +--------+
| Peer 90| | Peer 50|
+--------+ +--------+
| |
| |
+--------+ +--------+ +--------+
| Peer 80|--------------| Peer 70|--------------| Peer 60|
+--------+ +--------+ +--------+
|
|
+-----------+
|Resource 65|
+-----------+
Figure 1 peer-to-peer system schematic
6.2.1. Resource-ID Routing Failure
6.2.1.1. Next-Hop Peer Leaving
When a peer leaves the peer-to-peer system, the Resource-ID k which
the peer is responsible for will be taken over by its successor. At
the same time, a source peer in the overlay wants to route the RELOAD
message to the peer who manages Resource-ID k and has just left the
network. The message will then be routed to a non-existent peer so
that the source peer will receive an error RELOAD response message,
and the message Error Code name is Error_Request_Timeout which means
that the request time is out or the target peer is unreachable.
For instance, Peer 70 has left the overlay, and its Resource 65 has
been taken over by Peer 80. At this time, Peer 20 want to send a
RELOAD message to the peer who is responsible for the Resource 65,
but its routing table has not been updated due to this topology
Peng, et al. Expires August 20, 2013 [Page 11]
Internet-Draft One Hop Lookups Plugin Feb 2013
changes, then Peer 20 would route the RELOAD message to the old Peer
70, and this one hop routing to the Peer 70 will fail due to Peer 70
leaving. After that, Peer 20 will receive an error RELOAD response
message whose Error Code name is Error_Request_Timeout.
Peer 20 who receives this error message can take two response
measures. If using reactive fault tolerance mechanism, it then sends
an immediate RELOAD message to the successor Peer 80 of the failed
Peer 70. Otherwise, it should wait for receiving a topology
disturbance message to update its own routing information, and then
resend the RELOAD message.
+--------+ +--------+ +--------+
| Peer 10|--------------| Peer 20|--------------| Peer 30|
+--------+ +--------+ +--------+
| |
| |
+--------+ +--------+
| Peer 90| | Peer 50|
+--------+ +--------+
| |
| |
+--------+ +--------+ +--------+
| Peer 80|--------------------------------------| Peer 60|
+--------+ +--------+ +--------+
|
|
+-----------+
|Resource 65|
+-----------+
6.2.1.2. New Peer Joining
When a peer joins the peer-to-peer system, it will take over the
Resource-ID k which its successor node is responsible for. At the
same time, a source peer in the overlay wants to route the RELOAD
message to the peer who manages Resource-ID k during the past and has
just transferred Resource-ID k to the new online peer, then the
message will be routed to an old and incorrect peer so that the
source peer may receive an error RELOAD response message, and the
message Error Code Name is Error_Not_Found which means that target
resource is not part of the destination peer management. Another way
to deal with the new peer joining is that the old peer forwards the
RELOAD message to the new joining peer directly, and this mechanism
will increase the numbers of routing hops.
For instance, Peer 68 has just joined the overlay network and taken
over the Resource 65 from successor Peer 70. At this time, Peer 20
want to send a RELOAD message to the peer who is responsible for the
Peng, et al. Expires August 20, 2013 [Page 12]
Internet-Draft One Hop Lookups Plugin Feb 2013
Resource 65, but its routing table has not been updated due to this
topology changes, then Peer 20 would route the RELOAD message to the
old Peer 70.
Peer 70 who receives this error message can take two response
measures. If using routing forwarding schema, Peer 70 would directly
forward the message to the new joining Peer 68 who is responsible for
the Resource 65. Otherwise, Peer 70 would return an error response to
Peer 20 whose Error Code name is Error_Not_Found.
+--------+ +--------+ +--------+
| Peer 10|--------------| Peer 20|--------------| Peer 30|
+--------+ +--------+ +--------+
| |
| |
+--------+ +--------+
| Peer 90| | Peer 50|
+--------+ +--------+
| |
| |
+--------+ +--------+ +--------+ +--------+
| Peer 80|------| Peer 70|-----| Peer 68|-------| Peer 60|
+--------+ +--------+ +--------+ +--------+
|
|
+-----------+
|Resource 65|
+-----------+
6.2.2. Node-ID Routing Failure
6.2.2.1. Next-Hop Peer Leaving
The source peer in the overlay route the RELOAD message to the Node-
ID m of the peer who has left the network, then message will be
routed to a non-existent peer so that the source peer will receive an
error RELOAD response message, and the message Error Code name is
Error_Request_Timeout which means that the request time is out or the
target peer is unreachable.
For instance, Peer 70 has left the overlay network. At this time,
Peer 20 want to send a RELOAD message to the Peer 70, but its routing
table has not been updated due to this topology changes, then Peer 20
would route the RELOAD message to the old Peer 70, and this one hop
routing to the Peer 70 will fail due to Peer 70 leaving. After that,
Peer 20 will receive an error RELOAD response message whose Error
Code name is Error_Request_Timeout.
Peng, et al. Expires August 20, 2013 [Page 13]
Internet-Draft One Hop Lookups Plugin Feb 2013
+--------+ +--------+ +--------+
| Peer 10|--------------| Peer 20|--------------| Peer 30|
+--------+ +--------+ +--------+
| |
| |
+--------+ +--------+
| Peer 90| | Peer 50|
+--------+ +--------+
| |
| |
+--------+ +--------+ +--------+
| Peer 80|--------------------------------------| Peer 60|
+--------+ +--------+ +--------+
|
|
+----------+
| Client 65|
+----------+
6.2.2.2. Next-Hop Client Leaving
The source peer in the overlay route the RELOAD message to the Node-
ID m of the client who has left the network, then message will be
routed to the Overlay Access Peer that is responsible for the leaving
client m.
This Overlay Access Peer who receives the RELOAD message can take two
response measures. If using reactive fault tolerance mechanism, it
then response an error RELOAD response message, and the message Error
Code name is Error_Not_Found which means that target resource is not
part of the destination peer management. Otherwise, the source peer
will receive an error RELOAD response message, and the message Error
Code name is Error_Request_Timeout which means that the request time
is out or the target peer is unreachable.
7. Replica Placement Policy
To achieve high availability and reliability, ONE-HOP-RELOAD
replicates data in multiple peers, three on default. The replica
placement policy has two kinds of design scheme.
1. Two backup data stored in two successors.
Peng, et al. Expires August 20, 2013 [Page 14]
Internet-Draft One Hop Lookups Plugin Feb 2013
2. The way defined in SandStone [SandStone] can also be used. The
first data is stored in the primary peer; the second replica is
saved in a peer of different unit but in the same slice; and the
third replica is saved in a peer in a different slice. The most
important issue of this scheme is the choosing of the replica peer.
Recommended that two levels of strip segmentation based ID
assignment mechanism can be used to allocate Node-ID and choose
backup nodes.
To guarantee the consistency among multiple replicas is a difficult
but inevitable task. When a peer receives a Store request for
Resource-ID k, and it is responsible for Resource-ID k, it MUST store
the data and returns a success response and then send a Store request
to its backup nodes to maintain replicas consistency. This part is
beyond the scope of the present document, specific details can be
found in reference papers.
The topology disturbance always leads to the changes of data backup
relationships between several the data storage peers. If using
replica placement policy similar to the SandStone, the distance
between the master data storage peer and backup peer may be far,
therefore the procedure of data migration caused by the topology
disturbance may not be triggered immediately. There are two kinds of
trigger the procedure of data migration strategy:
1. Reactive Notification: The joining or leaving peer or the peer who
detected topology disturbances should take the initiative to
inform the affected backup node, and then trigger the data
migration.
2. Passively Waiting: When the peer received topology change event
notification message, it should test whether its own data backup
relationship is affected or not. If affected, it will trigger data
migration process.
8. Joining
8.1. Joining Process
The joining process for a joining peer (JP) with Node-ID n is as
follows:
1. JP MUST connect to its chosen or preconfigured bootstrap node.
2. JP SHOULD send an Attach request to its admitting peer (AP) for
Node-ID n. The "send_update" flag should be used to acquire the
routing table and other information from AP.
Peng, et al. Expires August 20, 2013 [Page 15]
Internet-Draft One Hop Lookups Plugin Feb 2013
3. JP determines its peer type and Region-ID according to the routing
information of AP. If JP is an ordinary peer, it should send
Attach requests to initiate connections to each of the peers in
the neighbor table. After establishing connections with all the
peers in the neighbor table, JP MUST record all the peers it has
contacted into the neighbor table. If JP is other type of peer, it
will perform some additional processes for a special peer type
described as follows:
Unit Boundary:
If JP is the Unit Boundary where AP locates, AP is the old unit
boundary and JP will replace its role. Then JP piggybacks its
own peer types on the Update message to AP in order to inform
AP to convert peer role.
If JP and AP are not in the same unit or even slice, and JP and
PP are in the same unit, JP will replace the role of PP which
is the old Unit Boundary where JP locates.
If JP is the new Unit Leader and there is no peer within the
unit, JP also is the new Unit Boundary of this unit.
Unit Leader:
If JP is the new Unit Leader and AP is in the same unit with JP,
AP is the old leader who is about to be replaced. JP informs AP
to convert the peer type of AP and add Node-ID of the new Unit
Leader into the routing information of AP. Then JP should
notify its slice leader to modify the unit leader
identification information and trigger the topology disturbance
procedure.
If there is no peer within the unit and JP is both the new Unit
Leader and the new Unit Boundary whose Node-ID is pre-
configured manually, JP should notify its slice leader to
modify the unit leader identification information and trigger
the topology disturbance procedure.
Slice Leader:
If JP is the new Slice Leader where AP locates, AP is the old
leader who is about to be replaced. JP informs AP to convert
its peer type and modify its special identification information.
If using reactive scheme, JP should send Attach message to
establish TLS connection with all the other slice leaders and
the whole peers in the slice.
Peng, et al. Expires August 20, 2013 [Page 16]
Internet-Draft One Hop Lookups Plugin Feb 2013
If there is no peer within the slice and JP is the new Slice
Leader whose Node-ID is pre-configured manually, JP needs to
obtain the special identification information of Slice Leader
from the configuration server, and wait for peer joining its
slice.
4. If JP is not new slice leader, it MUST send an Attach request to
the slice leader where JP locations in order to timely report
topology changing information to its slice leader.
5. JP MUST send a Join to AP, the AP sends the response to the Join.
6. AP MUST do a series of Store requests to JP to store the data that
JP will be responsible for.
7. AP MUST send JP an Update explicitly labeling JP as its
predecessor. At this point, JP is part of the ring and responsible
for a section of the overlay. AP can now forget any data which is
assigned to JP and not AP.
8. JP piggybacks its routing information on the Update message to AP.
AP should start topology change event propagation process. The
process is described in section 10.2.
If JP sends an Attach to other peers in the overlay with send_update,
it will receive the Update messages from the other peers which carry
the routing information of other peers, including the type of the
peer, its region, and its own neighbor table. JP can construct its
own routing information from these information, and perform related
procedures to join the overlay and play its own role.
Note that in the CHORD-RELOAD algorithm [I-D.ietf-p2psip-base], each
peer keeps track of a finger table and a neighbor table, and peer has
established TLS connections with all the peers in the routing table
which is the union of the neighbor table and the finger table in the
process of joining in the overlay. But the ONE-HOP-RELOAD algorithm
differs from the CHORD-RELOAD. It defines that each peer maintains
the Whole Routing Table so that the routing information is too large
that the peer can't establish connections with all the peers in the
Whole Routing Table. Therefore, in the joining process described
above, we select some related peers in the overlay to establish
connections with the joining peer according to the different peer
type.
Peng, et al. Expires August 20, 2013 [Page 17]
Internet-Draft One Hop Lookups Plugin Feb 2013
8.2. Joinreq Message Structure
Before sending Joinreq message to AP, JP needs to construct its own
routing information and send Attach messages to related peers. If its
peer type is special, it also should trigger some relevant peers to
perform peer type conversion procedure. After completing these tasks,
JP sends Joinreq message to AP, informing AP that it has been
completed the preparatory work to join the overlay, and can start to
take over the overlay data which it is responsible for.
The overlay_specific_data field in Joinreq message MUST contain the
OneHopJoinData structure defined below:
struct {
OneHopPeerType joining_peer_type;
RegionId
region_id;
IpAddressPort joining_peer_address;
} OneHopJoinData;
The contents of this structure are as follows:
joining_peer_type:
The peer type of JP.
region_id:
The identification of the region where JP locates.
joining_peer_address:
The address information of JP may be IPv4 or IPv6 addresses. If AP
receives Joinreq message carried the address information of JP, it
should add the information into its own routing table.
AP which receives a Joinreq message from JP should check whether the
message is correct and return a success response if correct.
9. Leaving
9.1. Handling Neighbor Failures
Every time the peer in the overlay finds that its neighboring peer
has already left the network or is ready to leave (maybe as
Peng, et al. Expires August 20, 2013 [Page 18]
Internet-Draft One Hop Lookups Plugin Feb 2013
determined by connectivity pings or the Leave message from the
leaving peer), it MUST perform the following tasks:
o Obtaining the routing information of leaving node. When the peer
finds that its neighbor has left the network abnormally, it needs
to calculate the routing information of the leaving peer according
to its local routing information.
o Updating its routing information, including removing the entry
from its neighbor table and replacing it with the best match peer
in its own Whole Routing Table.
o If the peer is the successor of the leaving peer, it also should
start the topology disturbance propagation procedure to report and
propagate the topology disturbance event. The process is described
in section 10.2. It is also recommended that multiple relevant
peers should simultaneously report the events to its slice leader,
in order to avoid the loss of topology disturbance information.
o If the data management or backup relationship changes, triggering
the corresponding data migration procedure.
If the type of the leaving peer is special, it will perform some
additional processes for a special peer type described as follows:
Unit Boundary:
LP is the Unit Boundary where its predecessor or successor
locations, then one of the two neighbors will become the new unit
boundary and replace the role of LP.
Unit Leader:
If JP is the Unit Leader where its successor NP locates, NP will
become the new Unit Leader who will replace the role of LP. NP
piggybacks Unit Leader replacement information on the Update
message to the slice leader. The slice leader received this Update
message would modify the unit leader identification information.
Slice Leader:
If JP is the Slice Leader where its successor NP locates, NP will
become the new Slice Leader who will replace the role of LP. If
using reactive scheme, the new slice leader NP should send Attach
messages to establish TLS connections with all the other slice
leaders and the whole peers in its own slice; and then send the
topology disturbance event information to the peers in its own
Peng, et al. Expires August 20, 2013 [Page 19]
Internet-Draft One Hop Lookups Plugin Feb 2013
slice and all the other slice leaders. Note that the successor NP
MUST maintain the replica of routing information of LP which
contains the identifications of all the slice leaders and unit
leaders in the slice.
9.2. Leavereq Message Structure
Peers SHOULD send a Leave request to all members of their neighbor
table prior to exiting the Overlay Instance. The
overlay_specific_data field MUST contain the OneHopLeaveData
structure defined below:
struct {
NodeId predecessors<0...2^16-1>;
NodeId successors<0...2^16-1>;
} Neighbors;
struct {
OneHopNodeType
predecessors<0...2^16-1>;
NodeId successors<0...2^16-1>;
} OneHopLeaveData;
struct {
OneHopPeerType
leaving_peer_type;
RegionId
region_id;
Neighbors neighbor_table;
select (leaving_peer_type) {
case ordinary_peer:
NodeId unit_leader;
NodeId slice_leader;
case unit_leader:
NodeId slice_leader;
case slice_leader:
NodeId unit_leader_list<0...N>;
NodeId slice_leader_list<0...N>;
}
} OneHopLeaveData;
The contents of this structure are as follows:
region_id:
The identification of the region where JP locates.
Peng, et al. Expires August 20, 2013 [Page 20]
Internet-Draft One Hop Lookups Plugin Feb 2013
neighbor_table:
The neighbor_table contains all the Node-IDs of the current
entries in the neighbor table except for the leaving one.
The 'leaving_peer_type' field indicates the type of the leaving peer:
If the type of the leaving peer is 'ordinary_peer' the contents will
be:
unit_leader
The Node-ID of the Unit Leader of the unit where leaving peer
locates.
slice_leader
The Node-ID of the Slice Leader of the slice where leaving peer
locates.
If the type of the leaving peer is 'unit_leader' the contents will
be:
slice_leader
The Node-ID of the Slice Leader of the slice where leaving peer
locate s.
If the type of the leaving peer is 'slice_leader' the contents will
be:
unit_leader_list
The Node-ID list of each Unit Leader in the slice which the the
leaving Slice Leader manages.
slice_leader_list
The Node-ID list of all the other Slice Leader.
When a peer is ready to leave the overlay, it takes the initiative to
send a Leave message to all peers in its own neighbor table. Any peer
which receives a Leave for a peer in its neighbor set follows
procedures as if it had detected a peer failure as described in
Section 9.1.
Peng, et al. Expires August 20, 2013 [Page 21]
Internet-Draft One Hop Lookups Plugin Feb 2013
10. Updates
10.1. Topology Anomaly Detection
A peer MUST maintain connections with all the peers in its neighbor
table. A peer MUST try to maintain at least three predecessors and
successors at least. In the CHORD-RELOAD algorithm [I-D.ietf-p2psip-
base], it is RECOMMENDED that O (log (N)) predecessors and successors
be maintained in the neighbor table.
Every peer in the overlay including Slice Leader MUST periodically
send a keep-alive message (defined by Update message) to all the peer
in its neighbor table. The purpose of this is to keep the predecessor
and successor lists up to date and to detect failed peers. The
default time is about every ten minutes. A peer SHOULD randomly
offset these Update requests so they do not occur all at once.
When detecting the topology anomaly, the peer follows procedures as
described in Section 9.1.
10.2. Topology Disturbance Propagation Procedure
The successor of the joining peer or leaving peer will trigger the
topology disturbance propagation procedure that realizes topology
change event spread in the network in accordance with the given
direction. The purpose of this is to keep the Whole Routing Table up
to date and to adjust the network topology.
The topology disturbance propagation procedure for the successor with
Node-ID n who detects a topology change, i.e., new predecessor
joining and old predecessor leaving, as follows:
1. The successor MUST send an Update message which piggybacks the
topology change event information to its slice leader directly.
2. The slice leader MUST collect all topology change events it
receives from the peers in its own slice and aggregates them
during several seconds (default time is about 20 seconds); and
then sends Update messages to the other slice leaders to spread
these events. Best not to send Update messages to other slice
leaders at the same time.
3. The slice leader waits for a short minutes (default time is about
10 seconds), and aggregates all Update messages received during
this period; and then dispatch the Update message to all the unit
leaders in its own slice.
Peng, et al. Expires August 20, 2013 [Page 22]
Internet-Draft One Hop Lookups Plugin Feb 2013
4. A unit leader SHOULD forward the Update message to its successor
and predecessor.
5. Other peers propagate this Update message in one direction: if
they receive from their successors, they SHOULD send it to their
successors and vice versa. Note that the Unit Boundaries SHOULD
NOT send Update messages to their neighbor peers outside their
unit.
10.3. Updatereq Message Structure
The purpose of using Update message is to piggyback the routing
information of the peer or to spread the topology disturbance events
information on the overlay. Thus, Update message can carry two kinds
of content: routing information of the peer on the overlay and
topology disturbance event information, i.e., peer joining and peer
leaving.
The Update message for the ONE-HOP-RELOAD is defined as follows.
10.3.1. Update Types
enum { reserved (0),
routing_info (1), event_notification(2), (255)}
UpdateType;
The 'UpdateType' gives an enumeration of Update message including
'routing_info' and 'event_notification'
routing_info
The routing_info message piggybacks the routing information of
the message sending peer, including the routing table which may
be the union of the Whole Routing Table and neighbor table or
only the neighbor table, peer type, region ID and special
identification information.
event_notification
The event_notification message will take a list of topology
disturbance events information which indicates topology
changing in the overlay and may trigger peers to modify their
routing information.
Peng, et al. Expires August 20, 2013 [Page 23]
Internet-Draft One Hop Lookups Plugin Feb 2013
10.3.2. Routing Information
enum { reserved (0),
full (1), peer_info(2), (255)}
RoutingInfoType;
The 'RoutingInfoType' gives an enumeration of routing_info including
'full' and 'peer_info' it identifies the different types of
routing_info message:
full
The kind of routing_info message piggybacks all the routing
table of the message sending peer, including the Whole Routing
Table and neighbor table.
peer_info
The kind of routing_info message piggybacks only the neighbor
table of the message sending peer and some other identification
information.
struct {
NodeId peer_id;
IpAddressPort address;
} OneHopRoutingInfo;
The struct of 'OneHopRoutingInfo' is used to construct the Whole
Routing Table entry. The WRT entry should at least contain the Node-
ID and the address information that may be IPv4 or IPv6 addresses:
peer_id
The Node-ID of the peer in the overlay.
address
The address information that may be IPv4 or IPv6 addresses.
struct {
OneHopPeerType
peer_type;
Peng, et al. Expires August 20, 2013 [Page 24]
Internet-Draft One Hop Lookups Plugin Feb 2013
RegionId region_id;
Neighbors neighbor_table;
select (peer_type) {
case ordinary_peer:
NodeId unit_leader;
NodeId slice_leader;
case unit_leader:
NodeId slice_leader;
case slice_leader:
NodeId unit_leader_list<0...N>;
NodeId slice_leader_list<0...N>;
}
RoutingInfoType routing_info_type;
select (routing_info_type) {
case full:
OneHopRoutingInfo whole_routing_info<0...N>;
case peer_info:
}
} RoutingInfo;
The parameters of this structure are similar to the 'OneHopLeaveData'
struct described in section 9.2, except that the 'RoutingInfo' could
piggyback the Whole Routing Table.
The 'Routing_info_type' field indicates whether the message includes
the WRT or not. If the type of the field is 'Full' the contents will
be:
whole_routing_info
The variable represents the information of peers?routing table
which has all of the peers?information in the overlay.
If the type of the field is 'peer_info? the message does not carry
WRT.
10.3.3. Event notification
enum { reserved (0),
peer_joining (1), peer_leaving(2), (255)}
EventNotificationType;
The 'EventNotificationType' gives an enumeration of topology
disturbance event kinds, including peer joining and peer leaving.
Peng, et al. Expires August 20, 2013 [Page 25]
Internet-Draft One Hop Lookups Plugin Feb 2013
struct {
EventNotificationType event_notification_type;
select (event_notification_type) {
case peer_joining:
OneHopRoutingInfo joining_peer_info;
OneHopPeerType joining_peer_type;
RegionId region_id;
select (joining_peer_type) {
case unit_leader:
RegionId change_region_id;
NodeId old_unit_leader;
case slice_leader:
RegionId change_region_id;
NodeId old_slice_leader;
}
case peer_leaving:
OneHopRoutingInfo leaving_peer_info;
OneHopPeerType leaving_peer_type;
RegionId region_id;
select (leaving_peer_type) {
case unit_leader:
RegionId change_region_id;
NodeId new_unit_leader;
case slice_leader:
RegionId change_region_id;
NodeId new_slice_leader;
}
}
} EventNotificationItem;
The structure of 'EventNotificationItem' is an item of the
EventNotification message. The peer who receives this item should
refresh its routing information and do some other topology management
tasks.
The contents of this structure are as follows:
The 'Event_notification_type' field indicates the type of the
topology disturbance event, including 'peer_joining' and
'peer_leaving'
If the type of the event is 'peer_joining' the contents will be:
joining_peer_info
Peng, et al. Expires August 20, 2013 [Page 26]
Internet-Draft One Hop Lookups Plugin Feb 2013
The entry corresponds to the Node-ID of joining peer in the
Whole Routing Table
region_id
The identification of the region where joining peer locates.
The 'joining_peer_type' field indicates the peer type of the
joining peer whose type is special, including 'slice_leader' and
'unit_leader'
If the type of the joining peer is 'unit_leader' the contents
will be:
change_region_id
The identification of the region where joining peer locates.
The joining peer will take the place of the old Unit Leader.
old_unit_leader
The Node-ID of the old Unit Leader who has been replaced by the
joining peer.
If the type of the joining peer is 'slice_leader' the contents
will be:
change_region_id
The identification of the region where joining peer locates.
The joining peer will take the place of the old Slice Leader.
old_slice_leader
The Node-ID of the old Slice Leader who has been replaced by
joining peer.
If the type of the event is 'peer_leaving' the contents are similar
to the parameters of 'peer_joining' event, except that when the
leaving peer is the old Unit Leader or Slice Leader, the
corresponding parameters specify the Node-ID of the new Unit Leader
or Slice Leader.
10.3.4. ONE-HOP-RELOAD Update Data
struct {
UpdateType update_type;
Peng, et al. Expires August 20, 2013 [Page 27]
Internet-Draft One Hop Lookups Plugin Feb 2013
Select (update_type) {
case routing_info:
RoutingInfo routing_info;
case event_notification:
EventNotificationItem event_notification_list<0...N>;
}
} OneHopUpdateData;
This structure is the Update message used in the ONE-HOP-RELOAD. The
Update message is composed by two kinds of data. One of them is the
RoutingInfo structures, and the other is the EventNotification
structures. All the peers in the overlay can use this Update message
to carry routing information or spread topology disturbance event
information.
11. Security Considerations
There is no specific security consideration associated with this
draft.
12. IANA Considerations
There are no IANA considerations associated to this memo.
13. References
13.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2234] Crocker, D. and Overell, P.(Editors), "Augmented BNF for
Syntax Specifications: ABNF", RFC 2234, Internet Mail
Consortium and Demon Internet Ltd., November 1997.
[I-D.ietf-p2psip-base]
Jennings, C., Lowekamp, B., Rescorla, E., Baset, S., and H.
Schulzrinne, "REsource LOcation And Discovery (RELOAD) Base
Protocol", draft-ietf-p2psip-base-15 (work in progress),
May 2011.
13.2. Informative References
[RFC3174] Eastlake, D. and P. Jones, "US Secure Hash Algorithm 1
(SHA1)", RFC 3174, September 2001.
Peng, et al. Expires August 20, 2013 [Page 28]
Internet-Draft One Hop Lookups Plugin Feb 2013
[One-Hop-Lookups]
Gupta, A., Liskov, B., and R. Rodrigues, "One Hop Lookups
for Peer-to-Peer Overlays", June 2003.
[SandStone]
Shi, G., Chen, J., Gong, H., Fan, L., Xue, H., Lu, Q., and
L. Liang, "SandStone: A DHT based Carrier Grade Distributed
Storage System", September 2009.
[Gnutella]
S. Saroiu, P. K. Gummadi, and S. D. Gribble, "A measurement
study of per-to-peer file sharing systems", Jan 2002.
14. Acknowledgments
Peng, et al. Expires August 20, 2013 [Page 29]
Internet-Draft One Hop Lookups Plugin Feb 2013
Authors?Addresses
Jin Peng
China Mobile
Unit 2, 28 Xuanwumenxi Ave,
Xuanwu District
Beijing 100053
P.R.China
Email: pengjin@chinamobile.com
Qing Yu
China Mobile
Unit 2, 28 Xuanwumenxi Ave,
Xuanwu District
Beijing 100053
P.R.China
Email: yuqing@chinamobile.com
Yuan Li
Beijing University of Posts and Telecommunications
10 Xi Tu Cheng Rd.
Haidian District
Beijing 100876
P.R.China
Email: liyuan8903@139.com
Peng, et al. Expires August 20, 2013 [Page 30]