TSVWG C. Bestler, Ed.
Internet-Draft R. Novak
Intended status: Experimental Nexenta
Expires: March 14, 2015 September 10, 2014
Creation of Transactional Subset Multicast Groups
draft-bestler-transactional-subset-multicast-00
Abstract
This memo presents techniques for controlling the membership of
multicast groups which are constrained to be a subset of a pre-
existing multicast group, where such subset groups are only used for
short duration transactions which are multicast to a subset of the
larger multicast group.
Editor's Note
The proper working group for this draft has not yet been determined.
Alternate working groups include PIM and INT.
Nexenta has been developing a multicast-based transport/storage
protocol for Object Clusters. This applies multicast
datagrams to creation and replication of Objects such as those
supported by the Amazon Simple Storage Service ("S3") protocol or the
OpenStack Object Storage service ("Swift"). Creating replicas of
object payload on multiple servers is an inherent part of any storage
cluster, which makes multicast addressing very inviting. There are
issues of congestion control and reliability to settle, but new Layer
2 capabilities such as DCB (Data Center Bridging) make this doable.
However, we found that the existing protocols for controlling
multicast group membership (IGMP and MLD) are not suitable for our
storage application. The authors doubt this is unique to a single
application. It should apply to many clusters that need to
distribute transactional messages to multiple known recipients that
form a dynamically selected subset of a group within the cluster.
Computational clusters using MPI are also potential users of
transactional multicasting. Inter-server replication in a pNFS
cluster is another.
These are just examples of synchronizing cluster data where the
synchronization does not replicate all of the shared data with the
entire cluster. These are merely initial hunches; working group
feedback is expected to refine the characterization of the
applicability of transactional subset multicast groups.
Bestler & Novak Expires March 14, 2015 [Page 1]
Internet-Draft Transactional Subset Multicast Groups September 2014
This submission, and the ensuing discussion of this draft and its
successors, will make reference to specific applications, including
the Nexenta Replicast protocol for multicast replication in Nexenta's
Cloud Copy-on-Write (CCOW) Object Cluster used in the NexentaEdge
product. Such examples are merely for illustrative purposes. Any
IETF standardization of the Replicast storage protocols would be done
via the Storm or NFS groups, and would require adoption of a
definition of Object Storage as a service before standardizing any
specific protocol for providing Object Storage services.
At this stage in drafting, message formats have not yet been set for
the standardized version of the protocol. The pre-standard version
was limited to a single L2 physical network, which would be an
inappropriate limitation for an IETF standard. Working Group
feedback on the format of these messages will be sought during the
consensus building process.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on March 14, 2015.
Copyright Notice
Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1. Requirements Notation . . . . . . . . . . . . . . . . . . 4
2. Motivation . . . . . . . . . . . . . . . . . . . . . . . . . 4
3. An Example Application . . . . . . . . . . . . . . . . . . . 5
4. Generalized Usage of Transactional Subset Multicast Groups . 6
5. Transactional Subset Multicast Groups . . . . . . . . . . . . 6
5.1. Definition . . . . . . . . . . . . . . . . . . . . . . . 6
5.1.1. Dynamic Specification versus Dynamic Selection . . . 7
5.1.2. Push vs. Join . . . . . . . . . . . . . . . . . . . . 7
5.2. Applicability . . . . . . . . . . . . . . . . . . . . . . 8
5.2.1. How is the Group Selected? . . . . . . . . . . . . . 8
5.2.2. What are the endpoints that receive the messages? . . 9
5.2.3. What is the duration of the group? . . . . . . . . . 9
5.2.4. Who are the members of the group? . . . . . . . . . . 11
5.2.5. How much latency does the application tolerate? . . . 11
5.2.6. What must be done to maintain the Group? . . . . . . 12
6. Forwarding Control Agent . . . . . . . . . . . . . . . . . . 12
6.1. Network Topology . . . . . . . . . . . . . . . . . . . . 13
6.2. Isolated VLANs Strategy . . . . . . . . . . . . . . . . . 13
7. Forwarding Control Agent Methods . . . . . . . . . . . . . . 14
7.1. Dynamically Pushed Subset Groups . . . . . . . . . . . . 14
7.2. Persistent Transactional Subset Groups . . . . . . . . . 15
8. Relationship to Existing Multicast Membership Protocols . . . 16
9. Control Protocol . . . . . . . . . . . . . . . . . . . . . . 17
10. Forwarding Control Agent Methods . . . . . . . . . . . . . . 17
10.1. Create Transactional Multicast Address Block . . . . . . 17
10.2. Release Transactional Multicast Address Block . . . . . 18
10.3. Set Dynamic Transactional Multicast Group Membership
IPV6 . . . . . . . . . . . . . . . . . . . . . . . . . . 18
10.4. Set Dynamic Transactional Multicast Group Membership
IPV4 . . . . . . . . . . . . . . . . . . . . . . . . . . 19
10.5. Set Persistent Transactional Multicast Groups IPv6 . . . 19
10.6. Set Persistent Transactional Multicast Groups IPv4 . . . 20
10.7. Refresh Persistent Transactional Multicast Group . . . . 21
11. Security Considerations . . . . . . . . . . . . . . . . . . . 22
12. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 23
13. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
14. References . . . . . . . . . . . . . . . . . . . . . . . . . 23
14.1. Informative References . . . . . . . . . . . . . . . . . 23
14.2. Normative References . . . . . . . . . . . . . . . . . . 24
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 24
1. Introduction
Existing standards for controlling the membership of multicast groups
can be characterized as being Join-driven. These include
[RFC3376], [RFC3810], [RFC4541] and [RFC4604]. Due to their inherent
latency these techniques prove to be unsuitable for maintaining large
sets of related multicast groups. This memo details a new method of
maintaining such large sets of related multicast groups when they are
all subsets of a single master reference group. This is not a
restriction for most cluster-oriented applications which could use
transactional multicasting.
Transactional Subset Multicasting defines techniques that extend
existing control of a reference multicast group to a potentially
large set of multicast addresses used with a VLAN within each local
subnet that the reference multicast group reaches.
This specification makes no modifications to the forwarding of
multicast packets nor to the communications between mrouters. New
methods are defined to set Layer 2 multicast forwarding rules on
switches within each of the relevant Layer 2 subnets.
1.1. Requirements Notation
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
2. Motivation
Transactional Subset Multicast groups are maintained within each
VLAN. A 'Forwarding Control Agent' is defined within each VLAN that
is responsible for applying the forwarding information known for a
reference multicast group to efficiently set layer 2 multicast
forwarding rules within each local network.
The functionality of the Forwarding Control Agent is best understood
as extending the functionality of IGMP/MLD Snooping (See [RFC4541]).
An IGMP/MLD snooper interprets IGMP (see [RFC3376]) or MLD (see
[RFC3810]) messages to translate their Layer 3 objectives into Layer
2 multicast forwarding rules.
A Forwarding Control Agent interprets new messages defined in this
specification for a newly defined class of transactional subset
multicast groups into the same Layer 2 multicast forwarding rules.
Strategies for implementing Forwarding Control Agents would include
extending IGMP/MLD snooping implementations or building the
Forwarding Control Agent external to the existing L2 switch software.
The per transaction costs of using such groups are far lower than
with the existing methods. The ongoing maintenance work for
multicast forwarding elements is limited to the reference multicast
group; it is not replicated for each of the subset transactional
multicast groups.
3. An Example Application
The Replicast (see [Replicast]) usage of transactional subset
multicasting involves:
o Taking a Cryptographic Hash of each chunk to be stored. This
"hash id" is used with a distributed hash table to determine a
conventional multicast group which will be used to negotiate
placement of the chunk. This is the reference multicast group.
Replicast refers to it as a "Negotiating Group".
o Multicasting a request to put the chunk to the reference multicast
group. Receiving storage nodes will respond with a bid on when
they could store that chunk, or an indication that they already
have that chunk stored. Each of the storage nodes is offering a
provisional reservation of its input capacity for a specific time
window.
o Assuming that the chunk is not already stored, selecting the best
responses to make a transactional subset group. Determination of
'best' typically is driven by the earliest possible completion of
the transaction, but may factor the current available storage
capacity on each of the storage nodes as well.
o Form or select a "rendezvous group" which will be used to transfer
the chunk. When the core network is non-blocking, the transfer
will be able to proceed at close to full wire speed at the
reserved time because each of the selected storage nodes has
reserved its input capacity for bulk payload exclusively. A
multicast message to the reference group informs both those
selected and those not selected for the rendezvous transfer.
Those not selected will release the provisional reservation.
o At the designated time, multicast the chunk payload to the
transactional subset multicast group.
o Each recipient validates the cryptographic hash of the received
data, and unicasts a positive or negative acknowledgement to the
sender.
o If sufficient valid copies have been positively acknowledged, the
transaction is complete. Otherwise it is retried.
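As an illustration of the selection step in the list above, the
following Python sketch picks the bids with the earliest completion
times and reports which bidders should release their provisional
reservations. The Bid record, its field names, and the tie-breaking
rule are assumptions made for this example; they are not taken from
the Replicast specification, and a real implementation may also weigh
available storage capacity.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Bid:
    """Hypothetical bid returned by a storage node in the reference group."""
    node: str            # identifier of the bidding storage node
    completes_at: float  # earliest time the node could finish storing the chunk


def select_rendezvous_members(bids: List[Bid], replicas: int) -> Tuple[List[str], List[str]]:
    """Split bidders into (selected, released) by earliest completion time."""
    if len(bids) < replicas:
        raise ValueError("not enough bids to place the requested replicas")
    # Earliest completion first; the node name breaks ties deterministically.
    ranked = sorted(bids, key=lambda b: (b.completes_at, b.node))
    selected = [b.node for b in ranked[:replicas]]
    released = [b.node for b in ranked[replicas:]]
    return selected, released


bids = [Bid("n1", 5.0), Bid("n2", 2.0), Bid("n3", 3.0), Bid("n4", 9.0)]
selected, released = select_rendezvous_members(bids, replicas=3)
```

The selected nodes would form the transactional subset group; the
released nodes correspond to those told, via the message to the
reference group, to drop their provisional reservations.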
4. Generalized Usage of Transactional Subset Multicast Groups
Beyond a specific application, the generalized potential for dramatic
savings is that transactional messaging within a cluster is a
radically different use-case from traditional multicast. The set of
factors that differentiates this class of applications can be
examined through a series of questions:
o How is the group selected? Section 5.2.1
o What are the endpoints that receive the messages? Section 5.2.2
o What is the duration of the group? Section 5.2.3
o Who are the potential members of the group? Section 5.2.4
o How much latency does the application tolerate? Section 5.2.5
o What must be done to maintain the group? Section 5.2.6
5. Transactional Subset Multicast Groups
5.1. Definition
A Transactional Subset Multicast Group is a multicast group which:
o Is derived from a pre-existing multicast group created by means
independent of this standard. The membership of this derived
group is a subset of the reference existing multicast group.
o Has a multicast group address which is part of a block allocated
for transactional multicast groups.
o Will only be used for the duration of a transaction. A network
failure or re-configuration during the transaction will require an
upper layer retry of the transaction. Transactional Subset
Multicast groups are not suitable for streaming of content.
Transactional subset multicast groups may be persistent, in that
the same group continues to exist and be used for a series of
transactions. But each message sent to the group is part of a
single short duration transaction.
5.1.1. Dynamic Specification versus Dynamic Selection
There are two basic strategies for managing the membership of subset
multicast groups:
o Dynamic Specification: The selected members join a group that had
been pre-selected for the transaction.
o Dynamic Selection: A pre-existing group is selected to match the
subset desired. That group is allocated for this purpose and used
for the transaction.
These two strategies can also be combined to form a hybrid strategy.
If there is a pre-existing group for the desired membership list it
is allocated and used, otherwise an available group is allocated and
re-configured to have the required membership.
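The hybrid strategy can be sketched in Python as a small allocator.
This is only an illustration under stated assumptions: the class name,
the method names, and the idea of tracking "idle" and "busy" group
addresses in dictionaries are hypothetical and are not defined by this
memo.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Tuple


class HybridGroupAllocator:
    """Hypothetical allocator for transactional subset group addresses."""

    def __init__(self, addresses: Iterable[str]) -> None:
        # Idle address -> membership it is currently configured for
        # (None until the address has been configured once).
        self._idle: Dict[str, Optional[FrozenSet[str]]] = {a: None for a in addresses}
        self._busy: Dict[str, FrozenSet[str]] = {}

    def allocate(self, members: Iterable[str]) -> Tuple[str, bool]:
        """Return (address, needs_reconfiguration) for the desired membership."""
        wanted = frozenset(members)
        # Dynamic Selection: reuse an idle group that already matches.
        match = next((a for a, m in self._idle.items() if m == wanted), None)
        if match is not None:
            del self._idle[match]
            self._busy[match] = wanted
            return match, False
        # Dynamic Specification: take any idle group and re-configure it.
        if not self._idle:
            raise RuntimeError("no transactional group address available")
        addr = next(iter(self._idle))
        del self._idle[addr]
        self._busy[addr] = wanted
        return addr, True

    def release(self, addr: str) -> None:
        """Return the group to the idle pool, keeping its membership."""
        self._idle[addr] = self._busy.pop(addr)
```

A caller would push a new membership list to the Forwarding Control
Agent only when the second element of the returned tuple is true.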
5.1.2. Push vs. Join
Existing methods for managing membership of a multicast group can be
characterized as Join protocols. The receivers may join the group,
or subscribe to a specific source within a group, but the receivers
of multicast messages control their reception of multicast messages.
This model is well suited for multimedia transmission where the
sender does not necessarily know the full set of endpoints receiving
its multicast content. In many cluster applications the sender has
determined the set of receivers. Requiring the sender to communicate
with the recipients so that they can Join the group adds latency to
the entire transaction.
However, there would be a serious security concern if transactional
multicasting is not limited to transactional subset multicasting.
Requiring that every member of a subset multicast group already be a
member of a reference multicast group ensures that no new method of
sending traffic is being created. Without this guarantee a denial-
of-service attacker could simply push a multicast group membership
listing 1000 members, then flood that multicast group. The amount of
traffic delivered to the aggregate destinations would be multiplied
by a factor of 1000.
Transactional subset multicasting is defined to eliminate the latency
required for Join-directed multicast group membership, while avoiding
creating a new attack vector for denial-of-service flooding.
5.2. Applicability
Transactional Subset Multicast Groups are applicable for applications
that want to reduce overall latency by reducing the number of round-
trips required for their transactions when identical content must be
delivered to multiple cluster members, but the selected members are a
subset of a larger group that must be dynamically selected.
Parallel processing of payload and/or storage of payload are the
primary examples of such a pattern of communications.
Examples of such applications include:
o Computational Clusters, particularly those using MPI (see [MPI])
o Storage applications, including:
* pNFS (See [RFC5661]).
* Amazon Simple Storage Service (S3) (See [AmazonS3]).
* OpenStack Object Storage (Swift) (See [Swift]).
Dynamic selection of subsets ultimately enables multiple concurrent
transfers to occur, which would not have been possible if the message
had been sent to the entire reference multicast group. Applications
with relatively small payload to be multicast may find it easier to
use simple multicast and slightly over-deliver the message.
5.2.1. How is the Group Selected?
In Join-directed multicasting the membership of a multicast group is
controlled by the listeners joining and leaving the group. The
sender does not control or even know the recipients. This matches
the multicast streaming use-case very well. However it does not
match a cluster that needs to distribute a transactional message to a
subset of a known cluster.
The target group is also assumed to be stable for a long sequence of
packets, such as streaming a video. The targeted applications direct
transactions to a subset of a stable group.
One example of the need to distribute a transactional message to a
subset of a known cluster is replication of data within an object
cluster. A set of targets has been selected through a higher-layer
protocol. Join-directed group setup here adds excessive latency to
the process. The targets must be informed of their selection, and
they must execute IGMP joins and confirm their joining to the source
before the multicast delivery can begin. Only replication of large
storage assets can tolerate this setup penalty.
A distributed computation may similarly have data that is relevant to
a specific set of recipients within the cluster. Performing the
distribution serially to each target over unicast point-to-point
connections uses excessive bandwidth and increases the transaction's
latency. It is also undesirable to incur the latency of Join-driven
multicast group setup.
This specification creates two methods for a sender to form or select
a multicast group for transactional purposes. With these methods no
further transmissions are required from the selected targets until
the full transfer is complete.
The restriction that the targeted group must be a subset of an
existing multicast group is necessary to prevent a denial-of-service
flooding attack. Transactional multicast groups that were not
restricted to being a subset of an existing multicast group could be
used to flood a large number of targets that were unprepared to
process incoming multicast datagrams.
5.2.2. What are the endpoints that receive the messages?
The endpoints of the transactional messages may be higher layer
entities, where each network endpoint supports multiple instances of
the higher layer entities. For example, a storage application may
have IP addresses associated with specific virtual drives, as opposed
to an IP address associated with a server that hosted multiple
virtual drives.
Having an IP address for each drive makes migrating control over that
drive to a new server easier, and allows the servers to direct
incoming payload to the correct drive.
5.2.3. What is the duration of the group?
Join-directed multicasting is designed primarily for the multicast
streaming use-case. A group has an indefinite lifespan, and members
come and go at any time during this lifespan, which might be measured
in minutes, hours or days.
Transactional multicasting is designed to support applications where a
transaction lasts for microseconds or milliseconds (possibly even
seconds). Transactional multicasting seeks to identify a multicast
group for the duration of sending a set of multicast datagrams
related to a specific transaction. Recipients either receive the
entire set of datagrams or they do not. Multicast streaming
typically is transmitting error tolerant content, such as MPEG
encoded material. Transactional multicasting will typically transmit
data with some form of validating signature and transaction
identifier that allows each recipient to confirm full reception of
the transaction.
This obviously needs to be combined with applicable congestion
control strategies being deployed by the upper layer protocols. The
Nexenta Replicast protocol only does bulk transfers against reserved
bandwidth, but there are probably as many solutions for this problem
as there are applications. Replicast relies upon IEEE 802.1
Datacenter Bridging (DCB) protocols such as Priority Flow Control and
Congestion Notification to provide no-drop service. The DCB
protocols deal with the fine timing of congestion avoidance, but
require higher layer transport or application protocols to keep the
sustained traffic rates below the sustained capacity. Creating
explicit reservations for bulk transfers is the main method for
accomplishing this.
The relevant DCB protocols include:
o Congestion Notification: [IEEE.802.1Qau-2011]
o Enhanced Transmission Selection: [IEEE.802.1Qaz-2011]
o Priority Flow Control: [IEEE.802.1Qbb-2011]
The important distinction between Replicast and conventional
multicast applications is that there is no need to dynamically adjust
multicast forwarding tables during the lifespan of a transaction,
while IGMP and MLD are designed to allow the addition and deletion of
members while a multicast group is in use. This distinction is not
unique to any single storage application. Transactional replication
is a common element in cluster protocol design.
The limited duration of a transactional multicast group implies that
there is no need for the multicast forwarding element to rebuild its
forwarding tables after it restarts. Any transaction in progress
will have failed, and been retried by the higher-layer protocol.
Merely limiting the rate at which it fails and restarts is all that
is required of each forwarding element.
Another implication is that there is no need for the forwarding
elements to rebuild the membership list of a transactional multicast
group after the forwarding element has been reset. The transactions
using the forwarding element will all fail, and be retried by a
higher layer transport or application protocol. Assuming that
forwarding elements do not reset multiple times a minute this will
have very limited impact on overall application throughput.
The duration of a transaction is application specific, but inherently
limited. A failed transaction will be retried at the application
layer, so obviously it has a duration measured in seconds at the
longest.
5.2.4. Who are the members of the group?
Join-directed multicasting allows any number of recipients to join or
leave a group at will.
Transactional multicast requires that the group be identified as a
small subset of a pre-existing multicast group.
Building forwarding rules that are a subset of forwarding rules for
an existing multicast group can be done substantially faster than
creating forwarding rules to arbitrary and potentially previously
unknown destinations.
Some applications, including Object Clusters, benefit from considering
the members to be higher layer entities (such as virtual drives) rather
than simply being the base IP address of the servers that host the
higher layer entities. Doing so allows groups to be defined for each
set of logical endpoints, not merely sets of physical endpoints. An
Object Cluster, for example, could have two different groups ([A,B,C]
vs [A,B,D]) even when the destinations are the same Layer 2 MAC
address (i.e., C and D are hosted by the same server). This allows
the server hosting both C and D to distinguish which entity is
addressed using the Destination IP Address.
5.2.5. How much latency does the application tolerate?
While no application likes latency, multicast streaming is very
tolerant of setup latency. If the end application is viewing or
listening to media, the number of milliseconds required to subscribe
to the group will not have a measurable impact on the end user.
For transactions in a cluster, however, every millisecond delays
forward progress. The time it takes to do an IGMP join would be a
significant addition to the latency of storing an object in an object
cluster using a relatively fast storage technology (such as SSD,
Flash or Memristor).
5.2.6. What must be done to maintain the Group?
The Join-directed multicast protocols specify methods for the
required maintenance of multicast groups. Multicast forwarders,
switches or mrouters, must deal with new routes and new locations for
endpoints.
The reference multicast group will still be maintained by the
existing Join-directed multicast group protocols. The existing IGMP/
MLD snooping procedures will keep the L2 multicasting forwarding
rules updated as changes in the network topology are detected.
Nothing in this specification changes the handling of the reference
multicast group.
Transactional subset multicast groups are defined to be used only for
short transactions, allowing them to piggy-back on the maintenance of
the reference multicast group.
6. Forwarding Control Agent
The Forwarding Control Agent is responsible for translating
forwarding control messages as defined in Section 7 into Layer 2
multicast forwarding for one or more subnets associated with a single
physical layer 2 subnet.
Each Forwarding Control Agent can be thought of as extending the IGMP/
MLD snooping capabilities of an L2 forwarding element. It is
translating the forwarding control agent messages into configuration
of L2 multicast forwarding just as an IGMP/MLD snooper translates
IGMP/MLD messages into configuration of Layer 2 multicast forwarding.
This MAY be done external to the existing implementation, or it MAY
be integrated with the IGMP/MLD snooper implementation.
Each Forwarding Control Agent:
o MUST accept authenticated forwarding control agent messages
controlling the creation and membership of Transactional Subset
Multicast Groups within the context of a specified VLAN.
o MUST support at least one VLAN.
o MAY support multiple VLANs.
o MUST update the controlled Layer 2 forwarding element's multicast
forwarding rules to reflect the subset specified for the group.
o MUST update the controlled L2 forwarding element's multicast
forwarding rules to reflect changes in the mapping of IP addresses
to L2 MAC addresses between transactions for persistent
transactional subset multicast groups when informed of a prior
transactional failure with a Refresh Membership message (see
Figure 7).
o MAY refresh the Layer 2 multicast forwarding rules at any time.
6.1. Network Topology
Forwarding Control Agents are applicable for networks which consist
of one or more local subnets which have direct links with each other.
6.2. Isolated VLANs Strategy
Transactional Subset Multicast groups define a very large number of
multicast addresses which must be delivered within a closed set of IP
subnets without having to dynamically co-ordinate allocation of these
multicast addresses with a wider network.
This MAY be accomplished using an "Isolated VLANs Strategy" where the
reference multicast group and all transactional multicast groups
derived from it are used strictly inside of a single VLAN or a set of
interconnected VLANs which route these multicast groups solely within
this closed set.
Specifically, an implementation using the Isolated VLANs Strategy:
o MUST include only a pre-defined set of subnets, each enforced with
a VLAN.
o MUST provide for routing or forwarding of all packets using the
reference multicast group and all transactional subset multicast
groups derived from it amongst these subnets.
o MUST NOT allow any packet using the reference multicast group or
any transactional subset multicast groups derived from it to be
routed to any subnet that is not part of the identified Isolated
VLAN set.
o MAY/SHOULD guard the confidentiality of multicast packets routed
between subnets that transit subnets that are not part of the
Isolated VLAN set.
Applications MAY use the Isolated VLANs Strategy. Virtually all
applications will elect to do so because allocating a very large
block of adjacent multicast addresses would be very difficult.
Confining usage of these addresses to a single VLAN is highly
desirable.
Direct connections between the VLANs hosting Forwarding Control
Agents are required because the Transactional Subset Multicast Groups
are not known to any intermediate multicast routers that would
implement indirect links. Co-locating Forwarding Control Agents with
RBridges [RFC6325] MAY be a solution.
7. Forwarding Control Agent Methods
7.1. Dynamically Pushed Subset Groups
Each Pushed Subset Membership command MUST contain the following:
o Subset Transactional Multicast Group: Group multicast address that
is to have its multicast forwarding rules updated. This address
must be within a block of Transactional Multicast Groups
previously created using the Create Transactional Multicast
Address Block command (Section 10.1).
o Target List: List of IP Addresses which are to be the targets of
this group. These addresses are intended to be members of the
reference group. When formulating the list, non-members MUST NOT
be included. However, there is no transaction lock placed upon the
group, and therefore there may be changes in the group membership
before the message is received. Therefore the Forwarding Control
Agent MUST ignore any listed target that is not a member of the
reference group.
This sets the multicast forwarding rules for pre-existing multicast
forwarding address X to be the subset of the forwarding rules for
existing group Y required to reach a specified member list.
This is done by communicating the same instruction (above) to each
multicast forwarding network element. This can be done by unicast
addressing with each of them, or by multicasting the instructions.
Each multicast forwarder will modify its multicast forwarding port
set to be the union of the unicast forwarding ports it has for the
listed members, but the result must be a subset of the forwarding
ports for the parent group.
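The rule above, together with the requirement to ignore listed targets
that are not members of the reference group, can be sketched in Python
as follows. The function name, its parameters, and the use of plain
dictionaries and sets to stand in for a forwarder's tables are
assumptions made for illustration; no message format or switch API is
implied.

```python
from typing import Dict, Iterable, Set


def subset_forwarding_ports(
    targets: Iterable[str],
    reference_members: Set[str],
    unicast_port: Dict[str, int],
    parent_ports: Set[int],
) -> Set[int]:
    """Compute the port set for a transactional subset group on one forwarder."""
    # Ignore any listed target that is not a member of the reference group.
    accepted = [t for t in targets if t in reference_members]
    # Union of the unicast forwarding ports for the accepted members.
    ports = {unicast_port[t] for t in accepted if t in unicast_port}
    # The result must be a subset of the parent (reference) group's ports.
    return ports & parent_ports


ports = subset_forwarding_ports(
    targets=["A", "B", "E"],            # E is not in the reference group
    reference_members={"A", "B", "C"},
    unicast_port={"A": 1, "B": 2, "C": 7, "E": 9},
    parent_ports={1, 2, 7},
)
```

Intersecting with the parent group's ports is one simple way to
guarantee the subset property; an implementation could equally reject
the request when the two sets disagree.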
For example, consider an instruction to modify a transactional
multicast group I which is a subset of multicast group J to reach
addresses A,B and C.
Addresses A and B are attached directly to multicast forwarder X,
while C is attached to multicast forwarder Y.
On forwarder X the forwarding rule for new group I contains:
o The forwarding port for A.
o The forwarding port for B. The forwarding port to forwarder Y (a
hub link). This eventually leads to C.
While on forwarder Y the forwarding rule for the new group I will
contain:
o The forwarding port for forwarder X (a hub link). This eventually
leads to A and B.
o The forwarding port for C.
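The port-set computation in this example can be sketched as follows.
This is a minimal illustration, not part of the protocol: the port
names, the dictionaries standing in for each forwarder's unicast
forwarding table, the function name subset_ports, and the extra
reference group member D are all assumptions made for the sketch.

```python
def subset_ports(targets, unicast_port, parent_ports, members):
    """Return the forwarding port set for a transactional subset group.

    targets      -- addresses listed in the instruction
    unicast_port -- this forwarder's map from address to unicast egress port
    parent_ports -- ports in the reference group's forwarding rule
    members      -- current members of the reference group
    """
    ports = set()
    for addr in targets:
        if addr not in members:
            continue  # listed targets that are not members are ignored
        ports.add(unicast_port[addr])
    # The result must be a subset of the parent group's forwarding ports.
    return ports & parent_ports


# Reference group J; D is a hypothetical member attached to forwarder Y.
members_j = {"A", "B", "C", "D"}

# Hypothetical tables for forwarder X (A and B attached, hub link to Y).
x_unicast = {"A": "x-port-A", "B": "x-port-B",
             "C": "x-hub-to-Y", "D": "x-hub-to-Y"}
x_parent = {"x-port-A", "x-port-B", "x-hub-to-Y"}

# Hypothetical tables for forwarder Y (C and D attached, hub link to X).
y_unicast = {"A": "y-hub-to-X", "B": "y-hub-to-X",
             "C": "y-port-C", "D": "y-port-D"}
y_parent = {"y-hub-to-X", "y-port-C", "y-port-D"}

group_i_on_x = subset_ports(["A", "B", "C"], x_unicast, x_parent, members_j)
group_i_on_y = subset_ports(["A", "B", "C"], y_unicast, y_parent, members_j)
```

Under these assumptions forwarder X ends up with its two local ports
plus the hub link, while forwarder Y omits the port for D because D
is not in the Target List.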
This assumes that the Forwarding Control Agent can perform a two-step
translation: first from IP Address to MAC Address, and then from MAC
Address to forwarding port. For typical applications of
Transactional Subset Multicasting, all of the referenced IP Addresses
will have been involved in recent messaging, and therefore both
translations will frequently already be cached.
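The two-step translation might be organized as in the sketch below.
The class name PortResolver, its two lookup tables, and the refresh
method are hypothetical; a real Forwarding Control Agent would consult
whatever neighbor and forwarding tables its platform provides.

```python
class PortResolver:
    """Resolve IP address -> MAC address -> forwarding port, with a cache."""

    def __init__(self, ip_to_mac, mac_to_port):
        self._ip_to_mac = ip_to_mac      # stand-in for a neighbor table
        self._mac_to_port = mac_to_port  # stand-in for an L2 forwarding table
        self._cache = {}
        self.lookups = 0  # number of full two-step lookups (cache misses)

    def port_for(self, ip):
        """Return the forwarding port for ip, or None if it is unknown."""
        if ip in self._cache:
            return self._cache[ip]
        self.lookups += 1
        mac = self._ip_to_mac.get(ip)
        port = self._mac_to_port.get(mac) if mac is not None else None
        if port is not None:
            self._cache[ip] = port  # only successful translations are cached
        return port

    def refresh(self):
        """Discard cached translations so the next lookup is repeated."""
        self._cache.clear()


# Hypothetical tables using documentation addresses.
resolver = PortResolver(
    ip_to_mac={"192.0.2.1": "02:00:00:00:00:01"},
    mac_to_port={"02:00:00:00:00:01": 7},
)
first = resolver.port_for("192.0.2.1")
second = resolver.port_for("192.0.2.1")  # served from the cache
```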
Many Ethernet switches already support command line and/or SNMP
methods of setting these multicast forwarding rules, but it is
challenging for an application to reliably apply the same changes
using multiple vendor specific methods. Having a standardized method
of pushing the membership of a multicast group from the sender would
be desirable.
A Forwarding Control Agent MAY accept a request where the Target List
is expressed as a list of destination L2 MAC addresses.
7.2. Persistent Transactional Subset Groups
In this method a large number of multicast groups are pre-configured
to enumerate the possible subsets of a master group. The enumeration
covers a specific class of subsets, such as all combinations of 3
members of multicast group X. These groups are enumerated and
assigned successive multicast addresses within a block.
The sender first obtains exclusive permission to utilize a portion of
the reception capacity of each desired target, and then selects the
multicast address that will reach that group.
In a straightforward enumeration of 3 members out of a group of 20,
there are (20*19*18)/(3*2*1) or 1140 possible groups. Typically the
higher layer protocol will have negotiated the right to send the
transaction with the members prior to selecting the multicast group.
In making the final selection, the actual multicast group is selected
and some offered targets are declined.
Those 1140 possible groups can be enumerated in order (starting with
M1, M2 and M3 and ending with M18, M19 and M20) and assigned
multicast addresses from N to N+1139.
When the transaction requires reaching M4, M5 and M19, the sender
simply selects that group. Because exclusive rights to use
multicasting to M4, M5 and M19 have already been obtained through the
higher layer protocol, the group [M4,M5,M19] is already exclusively
claimed.
These 1140 groups may be set up through any of the following means:
o Traditional IGMP/MLD joining/leaving.
o Setting static forwarding rules using SNMP MIBs and/or switch-
specific command line interfaces. Note that the widespread
existence of command line interfaces to custom set multicast
forwarding rules is an indicator that there are existing
applications that find the existing IGMP/MLD protocols to be
inadequate to fulfill their needs.
o The Dynamically Pushed Multicast Group method. See Section 7.1.
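A sender does not need to store the whole enumeration to find the
address of a persistent subset group. The sketch below computes the
position of a subset in the ordered enumeration directly. It assumes
members are numbered M1..Mn and that subsets are enumerated in
lexicographic order, as in the example above; the function names and
the value chosen for the base address N are hypothetical.

```python
from math import comb


def subset_rank(members, n):
    """Return the 0-based position of a subset in lexicographic order.

    members -- member numbers (1..n) of the chosen subset
    n       -- size of the master group
    """
    k = len(members)
    rank = 0
    previous = 0
    for position, member in enumerate(sorted(members)):
        # Count every subset that sorts earlier because it holds a
        # smaller member number at this position.
        for skipped in range(previous + 1, member):
            rank += comb(n - skipped, k - position - 1)
        previous = member
    return rank


def group_number(base, members, n):
    """Return the group number assigned to a subset: base plus its rank."""
    return base + subset_rank(members, n)


BASE = 0x010000  # hypothetical value of N
offset = group_number(BASE, (4, 5, 19), 20) - BASE
```

With these assumptions the group [M4,M5,M19] is found at offset 473
from N, [M1,M2,M3] at offset 0, and [M18,M19,M20] at the last offset
in the block.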
8. Relationship to Existing Multicast Membership Protocols
TBD: briefly describe and cite IGMP, MLD and PIM.
Transactional Subset Multicast Groups are not a replacement for Join-
based management of Multicast Groups. Rather, they extend the group
maintenance performed by the Join-based multicast control protocols
from the reference group to an entire set of multicast addresses
that are subsets of it.
This extension requires no modification to the existing data-plane
multicast forwarding protocols or implementations. Transactional
Subset Multicast groups may be implemented solely in the sender,
receivers and the Forwarding Control Agents associated with each
multicast forwarder supporting the reference group.
The maintenance work of the Join-based multicast protocols performed
on the reference multicast group is leveraged to allow maintenance of
a potentially large number of derived Transactional Multicast groups.
This allows identification of a large number of subsets of the
reference group, without requiring a matching increase in the
maintenance traffic which would have been required had the derived
groups been formed with a Join-based protocol.
9. Control Protocol
Note: the pre-standard protocol relies on multicasting of commands
within a single secure VLAN. More general usage of these techniques
will require transmitting Forwarding Control Agent instructions
between subnets where they may be subject to interception and even
alteration. Therefore a more secure method of delivering Forwarding
Control Agent instructions is required.
The methods standardized by the KARP (Keying and Authentication for
Routing Protocols) working group are, in the Authors' opinion, fully
applicable to this protocol. See [RFC6518]. Working Group feedback
is sought as to how
to expand this section, whether to split the Control Protocol to a
separate document, or other methods of dealing with the control
protocol.
The following requirements apply to any Control Protocol used:
o Each request MUST be uniquely identified. This identification
MUST include the source IP address of the requester.
o The message MUST be authenticated.
o WG discussion is needed to reach a consensus as to whether the
message contents need to be kept confidential, or whether
preventing alteration is sufficient.
o The sender MUST NOT be required to transmit the command more than
once other than as required for retries. For example, requiring
SSH connections with each Forwarding Control Agent is not
acceptable.
o Barring network errors, the message MUST be delivered to all
Forwarding Control Agents that can receive the reference master
group.
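The first two requirements can be illustrated with the sketch below.
It is not a proposed wire format: this document does not select an
authentication mechanism, and the use of a pre-shared key with HMAC-
SHA256, the 64-bit sequence number, and the function names are all
assumptions made purely for illustration.

```python
import hashlib
import hmac
import ipaddress
import struct

TAG_LEN = 32  # length of an HMAC-SHA256 tag in bytes


def seal_request(key, source_ip, sequence, command):
    """Prefix a command with (source IP, sequence) and append an HMAC tag."""
    src = ipaddress.ip_address(source_ip).packed
    header = struct.pack("!B", len(src)) + src + struct.pack("!Q", sequence)
    body = header + command
    return body + hmac.new(key, body, hashlib.sha256).digest()


def open_request(key, message):
    """Return (source_ip, sequence, command), or None if not authentic."""
    if len(message) < TAG_LEN:
        return None
    body, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    addr_len = body[0]
    source = ipaddress.ip_address(body[1:1 + addr_len])
    (sequence,) = struct.unpack("!Q", body[1 + addr_len:9 + addr_len])
    return str(source), sequence, body[9 + addr_len:]


key = b"k" * 32  # hypothetical pre-shared key
sealed = seal_request(key, "192.0.2.10", 7, b"cmd")
opened = open_request(key, sealed)
```

The pair (source IP address, sequence number) serves as the unique
request identifier, and because the tag covers the whole message a
single multicast transmission can be verified by every Forwarding
Control Agent that holds the key.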
10. Forwarding Control Agent Methods
10.1. Create Transactional Multicast Address Block
TBD: This section will define the fields required for the command to
create a block of transactional subset multicast addresses within a
specific VLAN. The command defined here is delivered within a
control protocol.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              Opcode=CreateTransactionalMulticast              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  Base Multicast Group Number                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             Number of Addresses required in Block             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 1: Create Transactional Multicast Address Block Message
The Multicast Group Number is the 24-bit L2 Multicast MAC address.
This matches both the IPV4 and IPV6 addresses which map to it. A
given UDP datagram is sent using either an IPV4 or an IPV6 address,
so the membership of a Multicast Group is either IPV4 endpoints or
IPV6 endpoints at any given instant.
This command does not allow creating a numerically scattered group
of addresses. Doing so would have made the job of each Forwarding
Control Agent more complex, and would be of no benefit in the
recommended Isolated VLANs strategy (See Section 6.2).
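As an illustration of the layout in Figure 1, the sketch below encodes
and decodes the three 32-bit words in network byte order. The opcode
value is hypothetical, since this document does not yet assign opcode
values, and placing the 24-bit Multicast Group Number in the low-order
bits of its 32-bit word is likewise an assumption.

```python
import struct

OPCODE_CREATE_BLOCK = 1  # hypothetical; no opcode values are assigned yet
CREATE_BLOCK_FORMAT = "!III"  # opcode, base group number, address count


def pack_create_block(base_group, count):
    """Encode a Create Transactional Multicast Address Block message."""
    if not 0 <= base_group < (1 << 24):
        raise ValueError("group number must fit in 24 bits")
    if not 0 < count < (1 << 32):
        raise ValueError("address count must fit in 32 bits")
    return struct.pack(CREATE_BLOCK_FORMAT,
                       OPCODE_CREATE_BLOCK, base_group, count)


def unpack_create_block(message):
    """Decode the message and return (base_group, count)."""
    opcode, base_group, count = struct.unpack(CREATE_BLOCK_FORMAT, message)
    if opcode != OPCODE_CREATE_BLOCK:
        raise ValueError("unexpected opcode")
    return base_group, count


# A block large enough for every 3-member subset of a 20-member group.
message = pack_create_block(0x010000, 1140)
```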
note: add IANA language here
10.2. Release Transactional Multicast Address Block
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             Opcode=ReleaseTransactionalMulticast              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  Base Multicast Group Number                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2: Release Transactional Multicast Address Block Message
note: add IANA language here
10.3. Set Dynamic Transactional Multicast Group Membership IPV6
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Opcode=PushTransactionalMulticastMembershipIPV6        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   # members   |            Multicast Group Number             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|                  IPV6 Address of 1st Member                   |
|                                                               |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
...
Figure 3: Set Dynamic Transactional Multicast Group Membership
Message (IPv6)
Members: 8 bit unsigned number of IPV6 addresses that are to be the
target of this specified Multicast Group Number.
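The message in Figure 3 could be encoded as in the following sketch.
The opcode value and the function name are hypothetical; the field
widths follow the figure (a 32-bit opcode, an 8-bit member count, a
24-bit Multicast Group Number, then one 128-bit address per member).

```python
import ipaddress
import struct

OPCODE_PUSH_MEMBERSHIP_IPV6 = 3  # hypothetical; no opcode values assigned


def pack_push_membership_ipv6(group, members):
    """Encode a Set Dynamic Transactional Multicast Group Membership
    message carrying IPv6 member addresses."""
    if not 0 <= group < (1 << 24):
        raise ValueError("group number must fit in 24 bits")
    if len(members) > 255:
        raise ValueError("member count must fit in 8 bits")
    # The member count occupies the top 8 bits of the second word.
    word = (len(members) << 24) | group
    message = struct.pack("!II", OPCODE_PUSH_MEMBERSHIP_IPV6, word)
    for member in members:
        message += ipaddress.IPv6Address(member).packed
    return message


message = pack_push_membership_ipv6(
    0x010203, ["2001:db8::1", "2001:db8::2"])
```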
note: add IANA language here
10.4. Set Dynamic Transactional Multicast Group Membership IPV4
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Opcode=PushTransactionalMulticastMembershipIPV4        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   # members   |            Multicast Group Number             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  IPV4 Address of 1st member                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
...
Figure 4: Set Dynamic Transactional Multicast Group Membership
Message (IPv4)
Members: 8 bit unsigned number of IPV4 addresses that are to be the
target of this specified Multicast Group Number.
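On the receiving side, a Forwarding Control Agent might decode the
IPv4 form of this message as sketched below. The opcode value and
the function name are hypothetical, and the strict length check is a
design choice of the sketch rather than a requirement of this
document.

```python
import ipaddress
import struct

OPCODE_PUSH_MEMBERSHIP_IPV4 = 4  # hypothetical; no opcode values assigned


def unpack_push_membership_ipv4(message):
    """Decode the IPv4 membership message; return (group, member list)."""
    opcode, word = struct.unpack_from("!II", message, 0)
    if opcode != OPCODE_PUSH_MEMBERSHIP_IPV4:
        raise ValueError("unexpected opcode")
    count = word >> 24         # top 8 bits: number of members
    group = word & 0xFFFFFF    # low 24 bits: Multicast Group Number
    if len(message) != 8 + 4 * count:
        raise ValueError("length does not match member count")
    members = [
        str(ipaddress.IPv4Address(message[8 + 4 * i:12 + 4 * i]))
        for i in range(count)
    ]
    return group, members


sample = (struct.pack("!II", OPCODE_PUSH_MEMBERSHIP_IPV4,
                      (2 << 24) | 0x000123)
          + bytes([192, 0, 2, 1]) + bytes([192, 0, 2, 2]))
decoded = unpack_push_membership_ipv4(sample)
```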
note: add IANA language here
10.5. Set Persistent Transactional Multicast Groups IPv6
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Opcode=PushPersistentMulticastMembershipIPV6          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   select N    |     Base Multicast Group Number to be Set     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   # members   |         Reference Multicast Group Num         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  IPV6 Address of 1st Member                   |
|                                                               |
|                                                               |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
...
Figure 5: Set Persistent Transactional Multicast Groups Message IPV6
select N: 8-bit unsigned number of members that are to be included in
each Transactional Subset Group set by this command.
Base Multicast Group Number to be set: 24-bit L2 Multicast Group
Number of the first group set by this command.
# members: 8-bit unsigned number of members in the following list of
IPV6 addresses. These must all be members of the Reference Multicast
Group.
Reference Multicast Group Num: 24-bit L2 Multicast Group Number.
The motivation for supplying the list of IP addresses is to avoid
race conditions where an IGMP or MLD join is in progress. If there
were a method to refer to a specific generation of a multicast group
membership then it would be possible to omit this list. Working
Group suggestions are encouraged on this topic.
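A Forwarding Control Agent receiving this command has to expand the
member list into one target list per group number. The sketch below
shows one way to do so. It assumes that subsets are enumerated in
lexicographic order of the positions of the members in the supplied
list, which is an interpretation of Section 7.2 rather than something
this section states; the function name is hypothetical.

```python
from itertools import combinations


def expand_persistent_groups(base_group, select_n, members):
    """Map each successive group number to one subset of the members.

    base_group -- first Multicast Group Number in the block
    select_n   -- number of members in each subset group
    members    -- member addresses, in the order given in the message
    """
    return {
        base_group + offset: subset
        for offset, subset in enumerate(combinations(members, select_n))
    }


# Hypothetical reference group of four members, selecting 2 at a time.
members = ["2001:db8::1", "2001:db8::2", "2001:db8::3", "2001:db8::4"]
groups = expand_persistent_groups(0x020000, 2, members)
```

The sender and every Forwarding Control Agent must apply the same
enumeration order, otherwise a given group number would reach
different targets on different forwarders.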
note: add IANA language here
10.6. Set Persistent Transactional Multicast Groups IPv4
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Opcode=PushPersistentMulticastMembershipIPV4          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   select N    |     Base Multicast Group Number to be Set     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   # members   |         Reference Multicast Group Num         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  IPV4 Address of 1st Member                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
...
Figure 6: Set Persistent Transactional Multicast Groups Message IPv4
select N: 8-bit unsigned number of members that are to be included in
each Transactional Subset Group set by this command.
Base Multicast Group Number to be set: 24-bit L2 Multicast Group
Number of the first group set by this command.
# members: 8-bit unsigned number of members in the following list of
IPV4 addresses. These must all be members of the Reference Multicast
Group.
Reference Multicast Group Num: 24-bit L2 Multicast Group Number.
note: add IANA language here
10.7. Refresh Persistent Transactional Multicast Group
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               Opcode=RefreshMulticastMembership               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   reserved    |    Multicast Group Number to be Refreshed     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   reserved    |         Reference Multicast Group Num         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 7: Refresh Persistent Transactional Multicast Groups Message
The existing Join-directed multicast group control protocols maintain
delivery of a multicast group to the subscribers independent of
network topology changes at either Layer 2 or Layer 3. If a unicast
IP datagram to a member would be delivered, then the multicast
forwarding can be expected to also be current.
Transactional subset multicast groups do not require the same effort
for maintenance. For a given transaction the entire set of datagrams
is either delivered or it is not. There is no benefit to the
application that the Forwarding Control Agent can achieve by promptly
updating the L2 multicast forwarding tables after a network topology
change. The current transaction will miss at least one datagram, and
therefore does not care if it misses multiple datagrams.
However, a Persistent Transactional Subset Multicast Group is used
for a sequence of transactions targeting the same group. The upper
layer protocol sender must have obtained exclusive rights to use the
group for the period of time that it will be sending the transaction.
One method that it MAY use is to obtain the exclusive right to send
the specific type of transaction to each of the members of the
targeted group during negotiations conducted prior to use of the
transactional group. For example, a reservation on inbound bandwidth
may have been granted.
The Forwarding Control Agent MAY refresh its mapping from member IP
addresses to L2 MAC address and then to L2 forwarding port at any
time. However, it MUST do so after receipt of a Refresh Transactional
Subset Multicast Group message for the group.
The sender of a transaction SHOULD send a Refresh Transactional
Subset Multicast Group message after it fails to receive
acknowledgement of an attempted transaction.
11. Security Considerations
The methods described here enable no sender to multicast messages to
any destination that was not already addressable by it. Therefore no
new security vulnerabilities are enabled by these techniques.
Because authentication of subset commands is kept lightweight there
is an implicit trust within the application that transactional subset
groups will be formed or selected in accordance with application
layer expectations. The transport layer lacks sufficient information
to enforce application layer expectations. If a malicious actor
deliberately creates a transactional subset multicast group with an
incorrect group it may adversely impact the operation of the specific
upper layer application. However, in no case can it be used to launch
a denial of service attack on targets that have not already
voluntarily joined the reference group.
The protocol does not currently provide any mechanism to guard
against selecting an existing but unrelated multicast group as a
reference multicast group. Explicitly enabling use of an existing
multicast group to be a reference group would not solve the problem
that the existing management of multicast groups is not aware of the
need to explicitly forbid creation of derived multicast groups based
upon a multicast group that it creates.
12. IANA Considerations
To be completed.
13. Summary
The proposal provides two new methods to manage multicast group
membership. These are simple techniques, but they provide a cohesive
cluster-wide approach to transactional multicasting. These
techniques are better suited for transactional multicasting than the
existing methods, IGMP and MLD, which are oriented to streaming use
cases.
14. References
14.1. Informative References
[Replicast]
Bestler, C., "White Paper: Nexenta Replicast", November 2013,
<http://info.nexenta.com/rs/nexenta/images/
Nexenta_Replicast_White_Paper.pdf>.
[MPI] MPI Forum, "Message Passing Interface", 2012.
[AmazonS3]
Amazon, "Amazon Simple Storage Service (S3)", 2014,
<http://aws.amazon.com/s3/>.
[Swift] Openstack, "OpenStack Object Service (Swift)", 2014,
<http://docs.openstack.org/developer/swift/>.
[IEEE.802.1Qau-2011]
IEEE, "IEEE Standard for Local and Metropolitan Area
Networks: Virtual Bridged Local Area Networks - Amendment
10: Congestion Notification", IEEE Std 802.1Qau, 2011.
[IEEE.802.1Qaz-2011]
IEEE, "IEEE Standard for Local and Metropolitan Area
Networks: Virtual Bridged Local Area Networks - Amendment
18: Enhanced Transmission Selection.", IEEE Std 802.1Qaz,
2011.
[IEEE.802.1Qbb-2011]
IEEE, "IEEE Standard for Local and Metropolitan Area
Networks: Virtual Bridged Local Area Networks - Amendment
17: Priority-based Flow Control.", IEEE Std 802.1Qbb,
2011.
[RFC5661] Shepler, S., Eisler, M., and D. Noveck, "Network File
System (NFS) Version 4 Minor Version 1 Protocol", RFC
5661, January 2010.
14.2. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3376] Cain, B., Deering, S., Kouvelas, I., Fenner, B., and A.
Thyagarajan, "Internet Group Management Protocol, Version
3", RFC 3376, October 2002.
[RFC3810] Vida, R. and L. Costa, "Multicast Listener Discovery
Version 2 (MLDv2) for IPv6", RFC 3810, June 2004.
[RFC4541] Christensen, M., Kimball, K., and F. Solensky,
"Considerations for Internet Group Management Protocol
(IGMP) and Multicast Listener Discovery (MLD) Snooping
Switches", RFC 4541, May 2006.
[RFC4604] Holbrook, H., Cain, B., and B. Haberman, "Using Internet
Group Management Protocol Version 3 (IGMPv3) and Multicast
Listener Discovery Protocol Version 2 (MLDv2) for Source-
Specific Multicast", RFC 4604, August 2006.
[RFC6325] Perlman, R., Eastlake, D., Dutt, D., Gai, S., and A.
Ghanwani, "Routing Bridges (RBridges): Base Protocol
Specification", RFC 6325, July 2011.
[RFC6518] Lebovitz, G. and M. Bhatia, "Keying and Authentication for
Routing Protocols (KARP) Design Guidelines", RFC 6518,
February 2012.
Authors' Addresses
Caitlin Bestler (editor)
Nexenta Systems
455 El Camino Real
Santa Clara, CA
US
Email: caitlin.bestler@nexenta.com, cait@asomi.com
Robert Novak
Nexenta Systems
455 El Camino Real
Santa Clara, CA
US
Email: robert.novak@nexenta.com