SAM Research Group | J.F. Buford |
Internet-Draft | Avaya Labs Research |
Intended status: Experimental | M. Kolberg, Ed. |
Expires: January 18, 2014 | University of Stirling |
July 17, 2013 |
Application Layer Multicast Extensions to RELOAD
draft-irtf-samrg-sam-baseline-protocol-05
We define a RELOAD Usage for Application Layer Multicast as well as a mapping to the RELOAD experimental message type to support ALM. The ALM Usage is intended to support a variety of ALM control algorithms in an overlay-independent way. Two example algorithms are defined, based on Scribe and P2PCast.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 18, 2014.
Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
The concept of scalable adaptive multicast includes both scaling properties and adaptability properties. Scalability is intended to cover:
Adaptability includes
Application Layer Multicast (ALM) has been demonstrated to be a viable multicast technology where native multicast isn't available. Many ALM designs have been proposed. This ALM Usage focuses on:
RELOAD [I-D.ietf-p2psip-base] has an application extension mechanism in which a new type of application defines a Usage. A RELOAD Usage defines a set of data types and rules for their use. In addition, this document describes additional message types and a new ALM algorithm plugin architectural component.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].
We adopt the terminology defined in section 2 of [I-D.ietf-p2psip-base], specifically the distinction between Node, Peer, and Client.
Overlay network - An application layer virtual or logical network in which end points are addressable and that provides connectivity, routing, and messaging between end points. Overlay networks are frequently used as a substrate for deploying new network services, or for providing a routing topology not available from the underlying physical network. Many peer-to-peer systems are overlay networks that run on top of the Internet. In Figure 1, "P" indicates overlay peers, and peers are connected in a logical address space. The links shown in the figure represent predecessor/successor links. Depending on the overlay routing model, additional or different links may be present.
        P    P    P   P     P
      ..+....+....+...+.....+...
     .                          +P
   P+                            .
     .                          +P
      ..+....+....+...+.....+...
        P    P    P   P     P
Figure 1: Overlay Network Example
Overlay Multicast (OM): Hosts participating in a multicast session form an overlay network and utilize unicast connections among pairs of hosts for data dissemination [BUFORD2009], [KOLBERG2010], [BUFORD2008]. The hosts in overlay multicast exclusively handle group management, routing, and tree construction, without any support from Internet routers. This is also commonly known as Application Layer Multicast (ALM) or End System Multicast (ESM). We call systems which use proxies connected in an overlay multicast backbone "proxied overlay multicast" or POM.
SSM tree: The creator of the tree is the source. It sends data messages to the tree root which are forwarded down the tree.
ASM tree: A node sending a data message sends the message to its parent and its children. Each node receiving a data message from one edge forwards it to remaining tree edges it is connected to.
Peer: an autonomous end system that is connected to the physical network and participates in and contributes resources to overlay construction, routing and maintenance. Some peers may also perform additional roles such as connection relays, super nodes, NAT traversal assistance, and data storage.
Peers connect in a large-scale overlay, which may be used for a variety of peer-to-peer applications in addition to multicast sessions. Peers may assume additional roles in the overlay beyond participation in the overlay and in multicast trees. We assume a single structured overlay routing algorithm is used. Any of a variety of multi-hop, one-hop, or variable-hop overlay algorithms could be used.
Castro et al. [CASTRO2003] compared multi-hop overlays and found that tree-based construction in a single overlay out-performed using separate overlays for each multicast session. We use a single overlay rather than separate overlays per multicast session.
An overlay multicast algorithm may leverage the overlay's mechanism for maintaining overlay state in the face of churn. For example, a peer may store a number of DHT (Distributed Hash Table) entries. When the peer gracefully leaves the overlay, it transfers those entries to the nearest peer. When another peer joins which is closer to some of the entries than the peer which currently holds them, then those entries are migrated to the new peer. Overlay churn affects multicast trees as well; remedies include automatic migration of the tree state and automatic re-join operations for dislocated child nodes.
The overlay supports concurrent multiple multicast trees. The limit on number of concurrent trees depends on peer and network resources and is not an intrinsic property of the overlay.
We use RELOAD [I-D.ietf-p2psip-base] as the Peer-to-Peer overlay for data storage and the mechanism by which the peers interconnect and route messages. RELOAD is a generic P2P overlay, and application support is defined by profiles called Usages.
Some nodes in the overlay may be in a private address space and behind firewalls. We use the RELOAD mechanisms for NAT traversal. We permit clients to be leaf nodes in an ALM tree.
All tree control messages are routed in the overlay. Two types of data or media topologies are envisioned: 1) tree edges are paths in the overlay, 2) tree edges are direct connections between a parent and child peer in the tree, formed using the RELOAD AppAttach method.
There are two changes as depicted in Figure 2. New ALM messages are mapped to RELOAD Message Transport using the RELOAD experimental message type. A plug-in for ALM algorithms handles the ALM state and control. The ALM Algorithm is under control of the application via the Group API [I-D.irtf-samrg-common-api].
                     +---------+
                     |Group API|
                     +---------+
                          |
   ------------------- Application ------------------------
         +-------+        |
         | ALM   |        |
         | Usage |        |
         +-------+        |
   -------------- Messaging Service Boundary --------------
                          |
   +--------+      +-----------+---------+    +---------+
   | Storage|<---> | RELOAD    | ALM     |<-->| ALM Alg |
   +--------+      | Message   | Messages|    +---------+
        ^          | Transport |         |
        |          +-----------+---------+
        v             |     |
       +-------------+      |
       | Topology    |      |
       | Plugin      |      |
       +-------------+      |
          ^                 |
          v                 v
       +-------------------+
       | Forwarding &      |
       | Link Management   |
       +-------------------+
   ---------- Overlay Link Service Boundary --------------
Figure 2: RELOAD Architecture Extensions
The ALM components interact with RELOAD as follows:
Applications of RELOAD are restricted in the data types that can be stored in the DHT. The profile of accepted data types for an application is referred to as a Usage. RELOAD is designed so that new applications can easily define new Usages. New RELOAD Usages are needed for multicast applications since the data types in base RELOAD and existing usages are not sufficient.
We define an ALM Usage in RELOAD. This ALM Usage is sufficient for applications which require ALM functionality in the overlay. Figure 2 shows the internal structure of the ALM Usage. It contains the Group API [I-D.irtf-samrg-common-api], an ALM algorithm plugin (e.g., Scribe), and the ALM messages, which are then sent out to the RELOAD network.
A RELOAD Usage is required [I-D.ietf-p2psip-base] to define the following:
An ALM GroupID is a RELOAD Node-ID. The owner of an ALM group creates a RELOAD Node-ID as specified in [I-D.ietf-p2psip-base]. This means that a GroupID is used as a RELOAD Destination for overlay routing purposes.
Peers use the overlay to support ALM operations such as:
There are a variety of algorithms for peers to form multicast trees in the overlay. The approach presented here permits multiple such algorithms to be supported in the overlay since different algorithms may be more suitable for certain application requirements, and to support experimentation. Therefore, overlay messaging corresponding to the set of overlay multicast operations MUST carry algorithm identification information.
For example, for small groups, the join point might be directly assigned by the rendezvous point, while for large trees the join request might be propagated down the tree with candidate parents forwarding their position directly to the new node.
Here is a simplistic notation for forming a multicast tree in the overlay. Its main advantage is the use of the overlay for routing both control and data messages. The group creator does not have to be the root of the tree or even in the tree. It does not consider per node load, admission control, or alternative paths. After the creation of a tree, the groupID is expected to be advertised or distributed out of band, perhaps by publishing in the DHT. Similarly, joining peers will discover the groupID out of band, perhaps by a lookup in the tree.
groupID = create();  // Allocate a unique groupId.
                     // The root is the nearest
                     // peer in the overlay.
joinTree(groupID); // sends "join groupID" message
As stated earlier, multiple algorithms will co-exist in the overlay.
In this document we define messages for overlay multicast tree creation, using an existing protocol (RELOAD) in the P2P-SIP WG [I-D.ietf-p2psip-base] for a universal structured peer-to-peer overlay protocol. RELOAD provides the mechanism to support a number of overlay topologies. Hence the overlay multicast framework defined in this document can be used with P2P-SIP, and makes the SAM framework overlay agnostic.
As discussed in the SAM requirements document [I-D.muramoto-irtf-sam-generic-require], there are a variety of ALM tree formation and tree maintenance algorithms. The intent of this specification is to be algorithm agnostic, similar to how RELOAD is overlay algorithm agnostic. We assume that all control messages are propagated using overlay routed messages.
The message types needed for ALM behavior are divided into the following categories:
The message codes are defined in Section 15.2 of this document. Messages are mapped to the RELOAD experimental message type.
In the following sections the protocol messages as mapped to RELOAD are discussed. Detailed example message flows are provided in Section 11.
In the following descriptions we use the datatype Dictionary which is a set of opaque values indexed by an opaque key with one value for each key. A single dictionary entry is represented by a DictionaryEntry as defined in Section 7.2.3 of the RELOAD document [I-D.ietf-p2psip-base]. The Dictionary datatype is defined as follows:
struct {
    DictionaryEntry elements<0..2^16-1>;
} Dictionary;
Peers use the overlay to transmit ALM (application layer multicast) operations defined in this section.
A new ALM tree is created in the overlay with the identity specified by group_id. The common interpretation in a DHT based overlay of group_id is that the peer with peer id closest to and less than the group_id is the root of the tree. However, other overlay types are supported. The tree has no children at the time it is created.
The group_id is generated from a well-known session key to be used by other peers to address the multicast tree in the overlay. The generation of the group_id from the session_key MUST be done using the overlay's id generation mechanism.
struct {
    node_id peer_id;
    opaque session_key<0..2^32-1>;
    node_id group_id;
    Dictionary options;
} ALMTree;
peer_id: the overlay address of the peer that creates the multicast tree.
session_key: a well-known string that when hashed using the overlay's id generation algorithm produces the group_id.
group_id: the overlay address of the root of the tree
options: name-value list of properties to be associated with the tree, such as the maximum size of the tree, restrictions on peers joining the tree, latency constraints, preference for distributed or centralized tree formation and maintenance, heartbeat interval.
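The derivation of group_id from session_key can be sketched as follows. This is a minimal illustration only: it assumes SHA-1 truncated to a 128-bit identifier, which is one possible id generation mechanism; an actual overlay uses whatever its configuration specifies. The function name derive_group_id and the id_bits parameter are hypothetical.

```python
import hashlib


def derive_group_id(session_key: bytes, id_bits: int = 128) -> int:
    """Hash a well-known session key into the overlay identifier space.

    Assumption: SHA-1, keeping only the leading `id_bits` bits.
    """
    digest = hashlib.sha1(session_key).digest()
    # Drop the trailing bits so the result fits the identifier space.
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - id_bits)


group_id = derive_group_id(b"example-session")
```

Because the mapping is deterministic, any peer that knows the session key computes the same group_id and can address the tree without further coordination.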
Tree creation is subject to access control since it involves a Store operation. The NODE-MATCH access policy defined in section 7.3.2 of RELOAD is used.
A successful Create Tree causes an ALMTree structure to be stored in the overlay at the node G responsible for the group_id. This node G performs the RELOAD-defined StoreReq operation as a side effect of performing the Create Tree. If the StoreReq fails, the Create Tree fails too.
After a successful Create Tree, peers can use the RELOAD Fetch method to retrieve the ALMTree struct at address group_id. The ALMTree kind is defined in Section 12.1.
After receiving a CreateTree message from node S, the peer sends a CreateTreeResponse to node S.
struct {
    Dictionary options;
} CreateTreeResponse;
options: A node may provide algorithm-dependent parameters about the created tree to the requesting node.
Causes the distributed algorithm for peer join of a specific ALM group to be invoked. The definition of the Join message is shown below. If successful, the joining peer is notified of one or more candidate parent peers in one or more JoinAccept messages. The particular ALM join algorithm is not specified in this protocol.
struct {
    node_id peer_id;
    node_id group_id;
    Dictionary options;
} Join;
peer_id: overlay address of joining peer
group_id: the overlay address of the root of the tree
options: name-value list of options proposed by joining peer
RELOAD is a request-response protocol. Consequently, the messages JoinAccept and JoinReject (defined below) are matching responses for Join. If JoinReject is received, then no further action on this request is carried out. If JoinAccept is received, then either a JoinConfirm or a JoinDecline message (see below) is sent. The matching response for JoinConfirm is JoinConfirmResponse. The matching response for JoinDecline is JoinDeclineResponse.
The following list shows the matching request-responses according to the request-response mechanism defined in RELOAD.
Thus Join, JoinConfirm, and JoinDecline are treated as requests as defined in RELOAD, are mapped to the RELOAD exp_a_req message, and are therefore retransmitted until either a retry limit is reached or a matching response received. JoinAccept, JoinReject, JoinConfirmResponse, and JoinDeclineResponse are treated as message responses as defined above, and are mapped to the RELOAD exp_a_ans message.
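The request-response pairing described above can be captured in a small lookup table. The sketch below is illustrative only; the names MATCHING_RESPONSES, reload_code_point, and follow_up are hypothetical, and declining is shown as always explicit even though sending JoinDecline is optional.

```python
# Request -> set of matching responses, as described in the text.
MATCHING_RESPONSES = {
    "Join": {"JoinAccept", "JoinReject"},
    "JoinConfirm": {"JoinConfirmResponse"},
    "JoinDecline": {"JoinDeclineResponse"},
}


def reload_code_point(message):
    """Return the RELOAD experimental message type that carries `message`."""
    if message in MATCHING_RESPONSES:
        return "exp_a_req"
    if any(message in answers for answers in MATCHING_RESPONSES.values()):
        return "exp_a_ans"
    raise ValueError(f"not a join-related message: {message}")


def follow_up(response, accept_parent):
    """Return the request the joining peer sends after a response to Join."""
    if response == "JoinReject":
        return None  # no further action on this request
    if response == "JoinAccept":
        return "JoinConfirm" if accept_parent else "JoinDecline"
    raise ValueError(f"unexpected response to Join: {response}")
```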
The Join behaviour can be described as follows:
if (checkAccept(msg)) {
    recvJoins.add(msg.source, msg.group_id)
    SEND(JOINAccept(node_id, msg.source, msg.group_id))
}
Tells the requesting joining peer that the indicated peer is available to act as its parent in the ALM tree specified by group_id, with the corresponding options specified. A peer MAY receive more than one JoinAccept from different candidate parent peers in the group_id tree. The peer accepts a peer as parent using a JoinConfirm message. A JoinAccept which receives neither a JoinConfirm nor a JoinDecline message MUST expire. RELOAD implementations are able to read a local configuration file for settings. It is assumed that this file contains the timeout value to be used.
struct {
    node_id parent_peer_id;
    node_id child_peer_id;
    node_id group_id;
    Dictionary options;
} JoinAccept;
parent_peer_id: overlay address of a peer which accepts the joining peer
child_peer_id: overlay address of joining peer
group_id: the overlay address of the root of the tree
options: name-value list of options accepted by parent peer
A peer receiving a Join message responds with a JoinReject response to indicate the request is rejected.
A peer receiving a JoinAccept message which it wishes to accept MUST explicitly accept it before the expiration of a timer for the JoinAccept message using a JoinConfirm message. The joining peer MUST include only those options from the JoinAccept which it also accepts, completing the negotiation of options between the two peers.
struct {
    node_id child_peer_id;
    node_id parent_peer_id;
    node_id group_id;
    Dictionary options;
} JoinConfirm;
child_peer_id: overlay address of joining peer which is a child of the parent peer
parent_peer_id: overlay address of the peer which is the parent of the joining peer
group_id: the overlay address of the root of the tree
options: name-value list of options accepted by both peers
The JoinConfirm message behaviour is described below:
if (recvJoins.contains(msg.source, msg.group_id)) {
    if (!groups.contains(msg.group_id)) {
        groups.add(msg.group_id)
        SEND(msg, msg.group_id)
    }
    groups[msg.group_id].children.add(msg.source)
    recvJoins.del(msg.source, msg.group_id)
}
A peer receiving a JoinConfirm message responds with a JoinConfirmResponse message.
A peer receiving a JoinAccept message which it does not wish to accept MAY explicitly decline it using a JoinDecline message.
struct {
    node_id peer_id;
    node_id parent_peer_id;
    node_id group_id;
} JoinDecline;
peer_id: overlay address of joining peer which declines the JoinAccept
parent_peer_id: overlay address of the peer which issued a JoinAccept to this peer
group_id: the overlay address of the root of the tree
The behaviour of the JoinDecline message is described as follows:
if (recvJoins.contains(msg.source, msg.group_id))
    recvJoins.del(msg.source, msg.group_id)
A peer receiving a JoinDecline message responds with a JoinDeclineResponse message.
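The bookkeeping a candidate parent performs for Join, JoinConfirm, and JoinDecline, including expiry of an unanswered JoinAccept, can be sketched as follows. This is a simplified illustration under stated assumptions: the class ParentState, its method names, and the injectable clock are hypothetical, the timeout is assumed to come from local configuration, and forwarding of the join towards the tree root is omitted.

```python
import time


class ParentState:
    """Hypothetical join bookkeeping kept by a candidate parent peer."""

    def __init__(self, join_timeout=5.0, clock=time.monotonic):
        self.join_timeout = join_timeout  # assumed to be read from local config
        self.clock = clock
        self.recv_joins = {}  # (child, group_id) -> JoinAccept expiry time
        self.children = {}    # group_id -> set of confirmed children

    def on_join(self, child, group_id, accept=True):
        """Return True if a JoinAccept is issued, False for a JoinReject."""
        if accept:
            expiry = self.clock() + self.join_timeout
            self.recv_joins[(child, group_id)] = expiry
        return accept

    def on_join_confirm(self, child, group_id):
        """Add the child if its JoinAccept is still pending and not expired."""
        expiry = self.recv_joins.pop((child, group_id), None)
        if expiry is None or self.clock() > expiry:
            return False
        self.children.setdefault(group_id, set()).add(child)
        return True

    def on_join_decline(self, child, group_id):
        """Forget the pending JoinAccept, if any."""
        self.recv_joins.pop((child, group_id), None)
```

Passing the clock in as a parameter keeps the expiry logic testable without real delays.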
A peer which is part of an ALM tree identified by group_id which intends to detach from either a child or parent peer SHOULD send a Leave message to the peer it wishes to detach from. A peer receiving a Leave message from a peer which is in neither its parent nor its child list SHOULD ignore the message.
struct {
    node_id peer_id;
    node_id group_id;
    Dictionary options;
} Leave;
peer_id: overlay address of leaving peer
group_id: the overlay address of the root of the tree
options: name-value list of options
The behaviour of the Leave message can be described as:
groups[msg.group_id].children.remove(msg.source)
if (groups[msg.group_id].children.size() == 0)
    SEND(msg, groups[msg.group_id].parent)
A peer receiving a Leave message responds with a LeaveResponse message.
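The handling of a Leave received from a child can be sketched in Python as follows. This is a minimal illustration; the function handle_leave, the dictionary layout of groups, and the send callback are hypothetical. A Leave received from a parent and the local membership of the receiving node are not considered.

```python
def handle_leave(groups, group_id, source, send):
    """Remove `source` as a child; propagate Leave upward if none remain.

    groups: dict mapping group_id -> {"parent": node or None,
                                      "children": set of nodes}
    send:   callable(message_name, group_id, destination)
    """
    state = groups.get(group_id)
    if state is None or source not in state["children"]:
        return  # Leave from a peer that is not a known child is ignored
    state["children"].remove(source)
    if not state["children"] and state["parent"] is not None:
        send("Leave", group_id, state["parent"])
```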
This triggers a reorganization of either the entire tree or only a sub-tree. It MAY include hints to specific peers of recommended parent or child peers to reconnect to. A peer receiving this message MAY ignore it, MAY propagate it to other peers in its subtree, and MAY invoke local algorithms for selecting preferred parent and/or child peers.
struct {
    node_id group_id;
    node_id peer_id;
    Dictionary options;
} Reform;
group_id: the overlay address of the root of the tree
peer_id: if omitted, then the tree is reorganized starting from the root, otherwise it is reorganized only at the sub-tree identified by peer_id.
options: name-value list of options
A peer receiving a Reform message responds with a ReformResponse message.
struct {
    Dictionary options;
} ReformResponse;
options: algorithm dependent information about the results of the reform operation
A child node signals to its adjacent parent nodes in the tree that it is alive. If a parent node does not receive a Heartbeat message within N heartbeat time intervals, it MUST treat this as an explicit Leave message from the unresponsive peer. N is configurable. RELOAD implementations are able to read a local configuration file for settings. It is assumed that this file contains the value for N to be used.
struct {
    node_id peer_id_src;
    node_id peer_id_dst;
    node_id group_id;
    Dictionary options;
} Heartbeat;
peer_id_src: source of heartbeat
peer_id_dst: destination of heartbeat
group_id: overlay address of the root of the tree
options: an algorithm may use the heartbeat message to provide state information to adjacent nodes in the tree
A parent node responds with a Heartbeat Response to a Heartbeat from a child node indicating that it has received the Heartbeat message.
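The timeout rule for missing Heartbeat messages can be sketched as a small helper. The function name expired_children and its parameters are hypothetical; the heartbeat interval and N are assumed to come from local configuration.

```python
def expired_children(last_heartbeat, now, heartbeat_interval, n):
    """Return children whose latest Heartbeat is older than n intervals.

    last_heartbeat: dict mapping child -> time the last Heartbeat arrived.
    The caller is expected to treat each returned child as if it had
    sent an explicit Leave.
    """
    limit = n * heartbeat_interval
    return sorted(
        child for child, seen in last_heartbeat.items() if now - seen > limit
    )
```

For example, with an interval of 10 seconds and N = 3, a child last heard from 31 seconds ago is reported, while one heard from 6 seconds ago is not.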
The NodeQuery message is used to obtain information about the state and performance of the tree on a per node basis. A set of nodes could be queried to construct a centralized view of the multicast trees, similar to a web crawler.
struct {
    node_id peer_id_src;
    node_id peer_id_dst;
} NodeQuery;
peer_id_src: source of query
peer_id_dst: destination of query
The response to a NodeQuery message contains a NodeStatistics instance for this node.
public struct {
    uint32 node_lifetime;
    uint32 total_number_trees;
    uint16 number_algorithms_supported;
    uint8 algorithms_supported[32];
    TreeData max_tree_data;
    uint16 active_number_trees;
    TreeData tree_data<0..2^8-1>;
    ImplementationInfo imp_info;
} NodeStatistics;
public struct {
    uint32 tree_id;
    uint8 algorithm;
    NodeId tree_root;
    uint8 number_parents;
    NodeId parent<0..2^8-1>;
    uint16 number_children_nodes;
    NodeId children<0..2^16-1>;
    uint32 path_length_to_root;
    uint32 path_delay_to_root;
    uint32 path_delay_to_child;
} TreeData;
public struct {
    uint32 join_confirm_timeout;
    uint32 heartbeat_interval;
    uint32 heartbeat_response_timeout;
    uint16 info_length;
    uint8 info<0..2^16-1>;
} ImplementationInfo;
A peer sends arbitrary multicast data to other peers in the tree. Nodes in the tree forward this message to adjacent nodes in the tree in an algorithm dependent way.
struct {
    node_id group_id;
    uint8 priority;
    uint32 length;
    uint8 data<0..2^32-1>;
} Push;
group_id: overlay address of root of the ALM tree
priority: the relative priority of the message; the highest priority is 255. A node may ignore this field.
length: length of the data field in bytes
data: the data
In pseudocode the behaviour of Push can be described as:
foreach (groups[msg.group_id].children as node_id)
    SEND(msg, node_id)
if memberOf(msg.group_id)
    invokeMessageHandler(msg.group_id, msg)
After receiving a Push message from node S, the receiving peer sends a PushResponse to node S.
struct {
    Dictionary options;
} PushResponse;
options: A node may provide feedback to the sender about previous push messages in some window, for example, the last N push messages. The feedback could include, for each push message received, the number of adjacent nodes which were forwarded the push message, and the number of adjacent nodes from which a PushResponse was received.
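The Push behaviour can be simulated in memory to see how data reaches members while pure forwarders only relay it. This is a sketch under stated assumptions: the Node class and its attributes are hypothetical, a direct method call stands in for sending a Push over the overlay, and PushResponse feedback is not modelled.

```python
class Node:
    """Hypothetical in-memory stand-in for a peer in an ALM tree."""

    def __init__(self, name, member=True):
        self.name = name
        self.member = member   # False for a pure forwarder
        self.children = {}     # group_id -> list of child Nodes
        self.delivered = []    # payloads handed to the application

    def on_push(self, group_id, data):
        # Forward to every child in this group (stands in for SEND).
        for child in self.children.get(group_id, []):
            child.on_push(group_id, data)
        # Only group members hand the data up to the application.
        if self.member:
            self.delivered.append(data)


root = Node("root", member=False)
a = Node("a")
b = Node("b", member=False)
c = Node("c")
root.children["g"] = [a, b]
b.children["g"] = [c]
root.on_push("g", b"payload")
```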
Figure 3 shows a mapping between RELOAD ALM messages (as defined in Section 5 of this document) and Scribe messages as defined in [CASTRO2002].
+---------+-------------------+-----------------+
| Section |RELOAD ALM Message | Scribe Message  |
+---------+-------------------+-----------------+
| 7.2.1   | CreateALMTree     | Create          |
+---------+-------------------+-----------------+
| 7.2.2   | Join              | Join            |
+---------+-------------------+-----------------+
| 7.2.3   | JoinAccept        |                 |
+---------+-------------------+-----------------+
| 7.2.4   | JoinConfirm       |                 |
+---------+-------------------+-----------------+
| 7.2.5   | JoinDecline       |                 |
+---------+-------------------+-----------------+
| 7.2.6   | Leave             | Leave           |
+---------+-------------------+-----------------+
| 7.2.7   | Reform            |                 |
+---------+-------------------+-----------------+
| 7.2.8   | Heartbeat         |                 |
+---------+-------------------+-----------------+
| 7.2.9   | NodeQuery         |                 |
+---------+-------------------+-----------------+
| 7.2.10  | Push              | Multicast       |
+---------+-------------------+-----------------+
|         | Note 1            | deliver         |
+---------+-------------------+-----------------+
|         | Note 1            | forward         |
+---------+-------------------+-----------------+
|         | Note 1            | route           |
+---------+-------------------+-----------------+
|         | Note 1            | send            |
+---------+-------------------+-----------------+
Figure 3: Mapping to Scribe Messages
Note 1: These Scribe messages are handled by RELOAD messages.
The following sections describe the Scribe algorithm in more detail.
This message will create a group with group_id. This message MUST be delivered to the node whose node_id is closest to the group_id. This node becomes the rendezvous point and root for the new multicast tree. Groups MAY have multiple sources of multicast messages.
To join a multicast tree a node SHOULD send a JOIN request with the group_id as the key. This message gets routed by the overlay to the rendezvous point of the tree. If an intermediate node is already a forwarder for this tree, it SHOULD add the joining node as a child. Otherwise the node SHOULD create a child table for the group and add the joining node. It SHOULD then send the JOIN request towards the rendezvous point, terminating the JOIN message from the child.
To adapt the Scribe algorithm into the ALM Usage proposed here, after a JOIN request is accepted, a JOINAccept message MUST be returned to the joining node.
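The hop-by-hop processing of a JOIN request can be sketched as follows. This is an illustration only: the function scribe_join is hypothetical, the overlay route is supplied as a plain list rather than computed by the overlay, and the JOINAccept returned to the joining node is represented only by the return value.

```python
def scribe_join(route, joiner, child_tables):
    """Process a JOIN along `route`, the overlay path from the joiner's
    first hop to the rendezvous point (last element).

    child_tables: dict mapping node -> set of children for this group.
    A node already present in the dict is a forwarder for the tree.
    Returns the node that takes the joiner as its child.
    """
    child = joiner
    for node in route:
        already_forwarder = node in child_tables
        # Create the child table if needed, then add the child.
        child_tables.setdefault(node, set()).add(child)
        if already_forwarder:
            break  # the JOIN terminates at an existing forwarder
        child = node  # a new forwarder joins towards the rendezvous point
    return route[0]


tables = {"R": set()}  # "R" is the rendezvous point
scribe_join(["A", "B", "R"], "N1", tables)
scribe_join(["A", "B", "R"], "N2", tables)
scribe_join(["C", "B", "R"], "N3", tables)
```

After these three joins, A holds N1 and N2, B holds A and C, and the rendezvous point R holds only B, so later joins stop at the first forwarder they meet.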
When leaving a multicast group a node SHOULD change its local state to indicate that it left the group. If the node has no children in its table it MUST send a LEAVE request to its parent, from where it SHOULD travel up the multicast tree and stop at a node which still has children remaining after removing the leaving node.
This message is not part of the Scribe protocol, but required by the basic protocol proposed in this document. Thus the usage MUST send this message to confirm a joining node accepting its parent node.
Like JoinConfirm, this message is not part of the Scribe protocol. Thus the usage MUST send this message if a peer receiving a JoinAccept message wishes to decline it.
A message to be multicast to a group MUST be sent to the rendezvous node from where it is forwarded down the tree. If a node is a member of the tree rather than just a forwarder it SHOULD pass the multicast data up to the application.
P2PCast [P2PCAST] creates a forest of related trees to increase load balancing. P2PCast is independent of the underlying P2P substrate. Its goals and approach are similar to Splitstream [SPLITSTREAM] (which assumes Pastry as the P2P overlay). In P2PCast the content provider splits the stream of data into f stripes. Each tree in the forest of multicast trees is an (almost) full tree of arity f. These trees are conceptually separate: every node of the system appears once in each tree, with the content provider being the source in all of them. To ensure that each peer contributes as much bandwidth as it receives, every node is a leaf in all the trees except for one, in which the node will serve as an internal node (proper tree of this node). The remainder of this section will assume f=2 for the discussion. This is to keep the complexity for the description down. However, the algorithm scales for any number f.
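Splitting a stream into f stripes, one per tree, can be illustrated as below. The round-robin assignment of packets to stripes is an assumption made for this sketch and is not prescribed by this document; the function names split_into_stripes and merge_stripes are hypothetical.

```python
def split_into_stripes(packets, f=2):
    """Assign packets to f stripes in round-robin order (an assumption)."""
    return [packets[i::f] for i in range(f)]


def merge_stripes(stripes):
    """Reassemble the original packet order from round-robin stripes."""
    merged = []
    longest = max((len(s) for s in stripes), default=0)
    for i in range(longest):
        for stripe in stripes:
            if i < len(stripe):
                merged.append(stripe[i])
    return merged


stripes = split_into_stripes(["p0", "p1", "p2", "p3", "p4"], f=2)
```

Each stripe would then be pushed down a different tree of the forest, and a receiver that is present in every tree can merge the stripes back into the original stream.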
P2PCast distinguishes the following types of nodes:
Figure 4 shows a mapping between RELOAD ALM messages (as defined in Section 5 of this document) and P2PCast messages as defined in [P2PCAST].
+---------+-------------------+-----------------+
| Section |RELOAD ALM Message | P2PCast Message |
+---------+-------------------+-----------------+
| 7.2.1   | CreateALMTree     | Create          |
+---------+-------------------+-----------------+
| 7.2.2   | Join              | Join            |
+---------+-------------------+-----------------+
| 7.2.3   | JoinAccept        |                 |
+---------+-------------------+-----------------+
| 7.2.4   | JoinConfirm       |                 |
+---------+-------------------+-----------------+
| 7.2.5   | JoinDecline       |                 |
+---------+-------------------+-----------------+
| 7.2.6   | Leave             | Leave           |
+---------+-------------------+-----------------+
| 7.2.7   | Reform            | Takeon          |
|         |                   | Substitute      |
|         |                   | Search          |
|         |                   | Replace         |
|         |                   | Direct          |
|         |                   | Update          |
+---------+-------------------+-----------------+
| 7.2.8   | Heartbeat         |                 |
+---------+-------------------+-----------------+
| 7.2.9   | NodeQuery         |                 |
+---------+-------------------+-----------------+
| 7.2.10  | Push              | Multicast       |
+---------+-------------------+-----------------+
Figure 4: Mapping to P2PCast Messages
The following sections describe the mapping of the P2PCast messages in more detail.
This message will create a group with group_id. This message MUST be delivered to the node whose node_id is closest to the group_id. This node becomes the rendezvous point and root for the new multicast tree. The rendezvous point will maintain f subtrees.
To join a multicast tree a joining node N MUST send a JOIN request to a random node A already part of the tree. Depending on the type of A the joining algorithm continues as follows:
P2PCast uses defined messages for communication between nodes during reorganisation. To use P2PCast in this context, these messages are encapsulated by the message type REFORM. In doing so, the P2PCast message is to be included in the options parameter of REFORM. The following reorganisation messages are defined by P2PCast:
To adapt the P2PCast algorithm into the ALM Usage proposed here, after a JOIN request is accepted, a JOINAccept message MUST be returned to the joining node (one for every subtree).
When leaving a multicast group a node will change its local state to indicate that it left the group. Disregarding the case where the leaving node is the root of the tree, the leaving node must be complete or incomplete in its proper tree. In the other trees the node is a leaf and can just disappear by notifying its parent. For the proper tree, if the node is incomplete, it is replaced by its child. However, if the node is complete, a gap is created which is filled by a random child. If this child is incomplete, it can simply fill the gap. However, if it is complete, it needs to shed a random child. This child is directed to its sibling, which sheds a random child. This process ripples down the tree until the next-to-last level is reached. The shed node is then taken as a child by the parent of the deleted node in the other stripe.
Again, for the reorganisation of the tree, the REFORM message type is used as defined in the previous section.
This message is not part of the P2PCast protocol, but required by the basic protocol defined in this document. Thus the usage MUST send this message to confirm a joining node accepting its parent node. As with Join and JoinAccept, this MUST be carried out for every subtree.
A message to be multicast to a group MUST be sent to the rendezvous node from where it is forwarded down the tree by being split into f stripes. Each stripe is then sent via a subtree. If a receiving node is a member of the tree rather than just a forwarder it MAY pass the multicast data up to the application.
All messages are mapped to the RELOAD experimental message type. The mapping is given in the following table. The message codes are given in Section 15.2. The format of the body of a message is given in Figure 5.
+-------------------------+------------------+
| Message                 |RELOAD Code Point |
+-------------------------+------------------+
| CreateALMTree           | exp_a_req        |
+-------------------------+------------------+
| CreateALMTreeResponse   | exp_a_ans        |
+-------------------------+------------------+
| Join                    | exp_a_req        |
+-------------------------+------------------+
| JoinAccept              | exp_a_ans        |
+-------------------------+------------------+
| JoinReject              | exp_a_ans        |
+-------------------------+------------------+
| JoinConfirm             | exp_a_req        |
+-------------------------+------------------+
| JoinConfirmResponse     | exp_a_ans        |
+-------------------------+------------------+
| JoinDecline             | exp_a_req        |
+-------------------------+------------------+
| JoinDeclineResponse     | exp_a_ans        |
+-------------------------+------------------+
| Leave                   | exp_a_req        |
+-------------------------+------------------+
| LeaveResponse           | exp_a_ans        |
+-------------------------+------------------+
| Reform                  | exp_a_req        |
+-------------------------+------------------+
| ReformResponse          | exp_a_ans        |
+-------------------------+------------------+
| Heartbeat               | exp_a_req        |
+-------------------------+------------------+
| HeartbeatResponse       | exp_a_ans        |
+-------------------------+------------------+
| NodeQuery               | exp_a_req        |
+-------------------------+------------------+
| NodeQueryResponse       | exp_a_ans        |
+-------------------------+------------------+
| Push                    | exp_a_req        |
+-------------------------+------------------+
| PushResponse            | exp_a_ans        |
+-------------------------+------------------+
Figure 5: RELOAD Message Code mapping
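The mapping table follows a simple pattern: requests map to exp_a_req, and their answers (including JoinAccept and JoinReject, which answer a Join) map to exp_a_ans. A minimal sketch of that rule is shown here; the function name `reload_code_point` and the string representation of the code points are illustrative assumptions, not part of this specification.

```python
# Message names as listed in the mapping table.
ALM_MESSAGES = [
    "CreateALMTree", "CreateALMTreeResponse", "Join", "JoinAccept",
    "JoinReject", "JoinConfirm", "JoinConfirmResponse", "JoinDecline",
    "JoinDeclineResponse", "Leave", "LeaveResponse", "Reform",
    "ReformResponse", "Heartbeat", "HeartbeatResponse", "NodeQuery",
    "NodeQueryResponse", "Push", "PushResponse",
]

# Answers whose names do not end in "Response".
_EXTRA_ANSWERS = {"JoinAccept", "JoinReject"}


def reload_code_point(message: str) -> str:
    """Return the RELOAD experimental code point name for an ALM message."""
    if message not in ALM_MESSAGES:
        raise ValueError(f"unknown ALM message: {message}")
    if message.endswith("Response") or message in _EXTRA_ANSWERS:
        return "exp_a_ans"
    return "exp_a_req"
```

For example, `reload_code_point("Join")` yields `"exp_a_req"`, while `reload_code_point("JoinAccept")` yields `"exp_a_ans"`.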
For Data Kind-IDs, the RELOAD specification states: "Code points in the range 0xf0000001 to 0xfffffffe are reserved for private use". ALM Usage Kind-IDs are defined in the private use range.
All ALM Usage messages map to the RELOAD Message Extension mechanism.
Code points for the kinds defined in this document MUST NOT conflict with any defined code points for RELOAD. RELOAD defines exp_a_req, exp_a_ans for experimental purposes. This specification uses only these message types for all ALM messages. RELOAD defines the MessageContents data structure. The ALM mapping uses the fields as follows:
struct {
  uint32 sam_token;
  uint16 alm_algorithm_id;
  uint8 version;
} ALMHeader;
The fields in ALMHeader are used as follows:
struct {
  uint16 alm_message_code;
  opaque alm_message_body;
} ALMMessageContents;
The fields in ALMMessageContents are used as follows:
Response codes are defined in section 6.3.3.1 in RELOAD. This specification maps to RELOAD ErrorResponse as follows:
ErrorResponse.error_code = Error_Exp_A;
Error_info contains an ALMErrorResponse instance.
public struct {
  uint16 alm_error_code;
  opaque alm_error_info<0..2^16-1>;
} ALMErrorResponse;
alm_error_code: The following error code values are defined. Numeric values for these are defined in Section 15.3.
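A minimal sketch of how an ALMErrorResponse might be encoded is given here. It assumes the usual presentation-language convention that a variable-length vector bounded by 2^16-1 carries a 2-byte length prefix in network byte order. The function names are hypothetical, and the content of alm_error_info in the example is invented for illustration.

```python
import struct


def pack_alm_error_response(alm_error_code: int, alm_error_info: bytes = b"") -> bytes:
    """Serialize ALMErrorResponse: uint16 code, then length-prefixed opaque info."""
    if len(alm_error_info) > 0xFFFF:
        raise ValueError("alm_error_info exceeds 2^16-1 bytes")
    return struct.pack("!HH", alm_error_code, len(alm_error_info)) + alm_error_info


def unpack_alm_error_response(data: bytes) -> tuple:
    """Parse an ALMErrorResponse, returning (alm_error_code, alm_error_info)."""
    code, length = struct.unpack_from("!HH", data)
    info = data[4:4 + length]
    if len(info) != length:
        raise ValueError("truncated alm_error_info")
    return code, info
```

For instance, error code 2 (Error_Child_Limit_Reached in the error code registry) with the illustrative info bytes `b"full"` encodes to 8 bytes: a 2-byte code, a 2-byte length, and the 4 info bytes.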
All peers in the examples are assumed to have completed bootstrapping. "Pn" refers to peer N. "GroupID" refers to a peer responsible for storing the ALMTree instance with GroupID.
A node with "NODE-MATCH" rights sends a request CreateTree to the group-id node, which also has NODE-MATCH rights for its own address. The group-id node determines whether to create the new tree, and if so, performs a local StoreReq. If the CreateTree succeeds, the ALMTree instance can be retrieved using Fetch. An example message flow for creating a tree is depicted in Figure 6.
P1      P2      P3       P4      GroupID
|       |       |        |       |
|       |       |        |       |
|       |       |        |       |
| CreateTree    |        |       |
|------------------------------->|
|       |       |        |       |
|       |       |        |       | StoreReq
|       |       |        |       |--+
|       |       |        |       |  |
|       |       |        |       |  |
|       |       |        |       |<-+
|       |       |        |       | StoreResponse
|       |       |        |       |--+
|       |       |        |       |  |
|       |       |        |       |  |
|       |       |        |       |<-+
|       |       |        |       |
|       |       |        |       |
|       |   CreateTreeResponse   |
|<-------------------------------|
|       |       |        |       |
|       |       |        |       |
|     Fetch     |        |       |
|------------------------------->|
|       |       |        |       |
|       |       |        |       |
|       |       | FetchResponse  |
|<-------------------------------|
|       |       |        |       |
Figure 6: Message flow example for CreateTree.
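The CreateTree exchange can be approximated by the following in-memory sketch. The class `GroupIdNode`, its method names, the Python field types, and the policy of refusing a second tree under the same group_id are all assumptions for illustration; a real implementation would issue RELOAD StoreReq and Fetch messages rather than use a local dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ALMTree:
    """Mirrors the fields of the ALMTree structure; Python types are assumed."""
    peer_id: str
    session_key: bytes
    group_id: str
    options: Dict[str, str] = field(default_factory=dict)


class GroupIdNode:
    """Hypothetical stand-in for the peer responsible for group_id."""

    def __init__(self) -> None:
        self._stored: Dict[str, ALMTree] = {}

    def create_tree(self, tree: ALMTree) -> bool:
        # Assumed policy: refuse if a tree already exists for this group_id.
        if tree.group_id in self._stored:
            return False
        # Stands in for the local StoreReq / StoreResponse exchange.
        self._stored[tree.group_id] = tree
        return True

    def fetch(self, group_id: str) -> Optional[ALMTree]:
        # Stands in for Fetch / FetchResponse.
        return self._stored.get(group_id)
```

In this sketch, a successful `create_tree` call followed by `fetch` with the same group_id returns the stored ALMTree instance, matching the order of messages in the figure.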
P1 joins the tree as a child of the GroupID node. P2 joins the tree as a child of P1. P4 joins the tree as a child of P1. The corresponding message flow is shown in Figure 7.
P1      P2      P3       P4      GroupID
|       |       |        |       |
|       |       |        |       |
| Join  |       |        |       |
|------------------------------->|
|       |       |        |       |
|       |       |     JoinAccept |
|<-------------------------------|
|       |       |        |       |
|       |       |        |       |
|       |Join   |        |       |
|       |----------------------->|
|       |       |        |       |
|       |       |        |   Join|
|<-------------------------------|
|       |       |        |       |
|JoinAccept     |        |       |
|------>|       |        |       |
|       |       |        |       |
|JoinConfirm    |        |       |
|<------|       |        |       |
|       |       |        |       |
|       |       |        |Join   |
|       |       |        |------>|
|       |       |        | Join  |
|<-------------------------------|
|       |       |        |       |
| Join  |       |        |       |
|------>|       |        |       |
|       |       |        |       |
| JoinAccept    |        |       |
|----------------------->|       |
|       |       |        |       |
|       | JoinAccept     |       |
|       |--------------->|       |
|       |       |        |       |
|       |   Join Confirm |       |
|<-----------------------|       |
|       |       |        |       |
|       |   Join Decline |       |
|       |<---------------|       |
|       |       |        |       |
|       |       |        |       |
Figure 7: Message flow example for tree Join.
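In the last part of the Join flow, P4 receives JoinAccept from both P1 and P2, confirms one, and declines the other. The sketch below captures that step. The function name `answer_join_accepts` is hypothetical, and choosing the first accepting parent is an assumed policy; the selection criterion is left to the ALM algorithm.

```python
from typing import List, Tuple


def answer_join_accepts(accepting_parents: List[str]) -> List[Tuple[str, str]]:
    """Return (message, destination) pairs sent by the joining node.

    Assumed policy: confirm the first parent that accepted and decline
    all others.
    """
    if not accepting_parents:
        return []
    chosen, *others = accepting_parents
    replies = [("JoinConfirm", chosen)]
    replies.extend(("JoinDecline", parent) for parent in others)
    return replies
```

Applied to the example, `answer_join_accepts(["P1", "P2"])` yields a JoinConfirm to P1 and a JoinDecline to P2.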
P1      P2      P3       P4      GroupID
|       |       |        |       |
|       |       |        |       |
|       |       |     Leave      |
|<-----------------------|       |
|       |       |        |       |
| LeaveResponse |        |       |
|----------------------->|       |
|       |       |        |       |
|       |       |        |       |
Figure 8: Message flow example for Leave tree.
The multicast data is pushed recursively P1 => GroupID => P1 => P2, P4 following the tree topology created in the Join example above. An example message flow is shown in Figure 9.
P1      P2      P3       P4      GroupID
|       |       |        |       |
| Push  |       |        |       |
|------------------------------->|
|       |       |        |       |
|       |       |    PushResponse|
|<-------------------------------|
|       |       |        |       |
|       |       |        |   Push|
|<-------------------------------|
|       |       |        |       |
| PushResponse  |        |       |
|------------------------------->|
|       |       |        |       |
|Push   |       |        |       |
|------>|       |        |       |
|       |       |        |       |
|PushResponse   |        |       |
|<------|       |        |       |
|       |       |        |       |
| Push  |       |        |       |
|----------------------->|       |
|       |       |        |       |
| PushResponse  |        |       |
|<-----------------------|       |
|       |       |        |       |
|       |       |        |       |
|       |       |        |       |
Figure 9: Message flow example for pushing data.
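The recursive push can be sketched as a walk over the tree built in the Join example. The function name `push_flow` and the dictionary representation of the tree are assumptions for illustration, and PushResponse messages are omitted for brevity.

```python
from collections import deque
from typing import Dict, List, Tuple


def push_flow(children: Dict[str, List[str]], root: str, sender: str) -> List[Tuple[str, str]]:
    """Return the (from, to) hops taken by Push messages.

    The sender first pushes to the root; the data is then forwarded
    level by level from each node to its children.
    """
    hops: List[Tuple[str, str]] = []
    if sender != root:
        hops.append((sender, root))
    pending = deque([root])
    while pending:
        parent = pending.popleft()
        for child in children.get(parent, []):
            hops.append((parent, child))
            pending.append(child)
    return hops
```

With `children = {"GroupID": ["P1"], "P1": ["P2", "P4"]}`, calling `push_flow(children, "GroupID", "P1")` gives the hops P1 to GroupID, GroupID to P1, P1 to P2, and P1 to P4, in that order.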
This section defines the ALMTree kind per section 7.4.5 in RELOAD. An instance of the ALMTree kind is stored in the overlay for each ALM tree instance. It is stored at the address group_id.
Kind-Id: 0xf0000001 (This is a private-use code-point per section 14.6 of RELOAD.) The Resource Name for the ALMTree Kind-ID is the session_key used to identify the ALM tree.
Data Model: The data model is the ALMTree structure.
Access Control: NODE-MATCH. The node performing the store operation is required to have NODE-MATCH access.
Meaning: The meaning of the fields is given in Section 7.2.1.
struct {
  node_id peer_id;
  opaque session_key<0..2^32-1>;
  node_id group_id;
  Dictionary options;
} ALMTree;
There are no ALM parameters defined for the RELOAD configuration file.
Version 05: Updated references. Fixed typos.
This section contains the new code points registered by this document. [NOTE TO IANA/RFC-EDITOR: Please replace RFC-to-be with the RFC number for this specification in the following list. ]
IANA SHALL create a "SAM ALM Algorithm ID" Registry. Entries in this registry are 16-bit integers denoting Application Layer Multicast algorithms as described in Section 10.1 of [RFC-to-be]. Code points in the range 0x3 to 0x7fff SHALL be registered via RFC 5226 [RFC5226] Expert Review. Code points in the range 0x7fff to 0xfffe are reserved for private use. The initial contents of this registry are:
+----------------+-------------------+-----------+
| Algorithm Name | ALM Algorithm ID  | RFC       |
+----------------+-------------------+-----------+
| INVALID-ALG    | 0                 | RFC-to-be |
| SCRIBE-SAM     | 1                 | RFC-to-be |
| P2PCAST-SAM    | 2                 | RFC-to-be |
| Reserved       | 0x3..0xffff       | RFC-to-be |
+----------------+-------------------+-----------+
Figure 10
These values have been made available for the purposes of experimentation. These values are not meant for vendor specific use of any sort and MUST NOT be used for operational deployments.
IANA SHALL create a "SAM ALM Message Code" Registry. Entries in this registry are 16-bit integers denoting message codes as described in Section 10.2 of [RFC-to-be]. Code points in the range 0x14 to 0x7fff SHALL be registered via RFC 5226 [RFC5226] Expert Review. Code points in the range 0x7fff to 0xfffe are reserved for private use. The initial contents of this registry are:
+-------------------------+----------------------+-----------+
| Message Code Name       | Message Code Value   | RFC       |
+-------------------------+----------------------+-----------+
| InvalidMessageCode      | 0                    | RFC-to-be |
| CreateALMTree           | 1                    | RFC-to-be |
| CreateALMTreeResponse   | 2                    | RFC-to-be |
| Join                    | 3                    | RFC-to-be |
| JoinAccept              | 4                    | RFC-to-be |
| JoinReject              | 5                    | RFC-to-be |
| JoinConfirm             | 6                    | RFC-to-be |
| JoinConfirmResponse     | 7                    | RFC-to-be |
| JoinDecline             | 8                    | RFC-to-be |
| JoinDeclineResponse     | 9                    | RFC-to-be |
| Leave                   | 10                   | RFC-to-be |
| LeaveResponse           | 11                   | RFC-to-be |
| Reform                  | 12                   | RFC-to-be |
| ReformResponse          | 13                   | RFC-to-be |
| Heartbeat               | 14                   | RFC-to-be |
| HeartbeatResponse       | 15                   | RFC-to-be |
| NodeQuery               | 16                   | RFC-to-be |
| NodeQueryResponse       | 17                   | RFC-to-be |
| Push                    | 18                   | RFC-to-be |
| PushResponse            | 19                   | RFC-to-be |
| Reserved                | 0x14..0xffff         | RFC-to-be |
+-------------------------+----------------------+-----------+
Figure 11
These values have been made available for the purposes of experimentation. These values are not meant for vendor specific use of any sort and MUST NOT be used for operational deployments.
IANA SHALL create a "SAM ALM Error Code" Registry. Entries in this registry are 16-bit integers denoting error codes as described in Section 10.3 of [RFC-to-be]. Code points in the range 0x14 to 0x7fff SHALL be registered via RFC 5226 [RFC5226] Expert Review. Code points in the range 0x7fff to 0xfffe are reserved for private use. The initial contents of this registry are:
+----------------------------------+--------------+-----------+
| Error Code Name                  | Code Value   | RFC       |
+----------------------------------+--------------+-----------+
| InvalidErrorCode                 | 0            | RFC-to-be |
| Error_Unknown_Algorithm          | 1            | RFC-to-be |
| Error_Child_Limit_Reached        | 2            | RFC-to-be |
| Error_Node_Bandwidth_Reached     | 3            | RFC-to-be |
| Error_Node_Conn_Limit_Reached    | 4            | RFC-to-be |
| Error_Link_Cap_Limit_Reached     | 5            | RFC-to-be |
| Error_Node_Mem_Limit_Reached     | 6            | RFC-to-be |
| Error_Node_CPU_Cap_Limit_Reached | 7            | RFC-to-be |
| Error_Path_Limit_Reached         | 8            | RFC-to-be |
| Error_Path_Delay_Limit_Reached   | 9            | RFC-to-be |
| Error_Tree_Fanout_Limit_Reached  | 10           | RFC-to-be |
| Error_Tree_Depth_Limit_Reached   | 11           | RFC-to-be |
| Error_Other                      | 12           | RFC-to-be |
| Reserved                         | 0x14..0xffff | RFC-to-be |
+----------------------------------+--------------+-----------+
Figure 12
These values have been made available for the purposes of experimentation. These values are not meant for vendor specific use of any sort and MUST NOT be used for operational deployments.
Overlays are vulnerable to denial-of-service (DoS) and collusion attacks. This document does not attempt to solve general overlay security issues. We assume the node authentication model as defined in [I-D.ietf-p2psip-base].
ALM Usage specific security issues:
Marc Petit-Huguenin, Michael Welzl, Joerg Ott, and Lars Eggert provided important comments on earlier versions of this document.