Internet DRAFT - draft-zzhang-rift-multicast
draft-zzhang-rift-multicast
RIFT Z. Zhang
Internet-Draft Juniper Networks
Intended status: Standards Track P. Thubert
Expires: January 14, 2021 Cisco
July 13, 2020
Multicast Routing In Fat Trees
draft-zzhang-rift-multicast-01
Abstract
This document specifies multicast procedures with RIFT. Multicast in
RIFT is similar to Bidirectional Protocol Independent Multicast (PIM-
Bidir), with the Rendezvous Point Link (RP-Link) simulated by a
spanning tree of some Top of Fabric (TOF) nodes and sub-TOF nodes.
Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP
14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 14, 2021.
Copyright Notice
Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved.
Zhang & Thubert Expires January 14, 2021 [Page 1]
Internet-Draft mrift July 2020
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Specifications . . . . . . . . . . . . . . . . . . . . . . . 4
2.1. Multicast Capability . . . . . . . . . . . . . . . . . . 4
2.2. Optional Per-neighbor Flooding Scope . . . . . . . . . . 5
2.3. Multicast TIE . . . . . . . . . . . . . . . . . . . . . . 5
2.4. Building Spanning Tree among TOFs and sub-TOFs . . . . . 6
3. Security Considerations . . . . . . . . . . . . . . . . . . . 7
4. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 7
5. References . . . . . . . . . . . . . . . . . . . . . . . . . 7
5.1. Normative References . . . . . . . . . . . . . . . . . . 7
5.2. Informative References . . . . . . . . . . . . . . . . . 8
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 8
1. Introduction
Because of the simple north-south regular topology in Fat Tree
networks, the PIM-Bidir [RFC5015] solution is extended for multicast
in RIFT (referred to as MRIFT in this document). The following is a
summary of the changes and adaptations compared to PIM-Bidir.
With PIM-Bidir, PIM joins are sent towards a Rendezvous Point Address
(RPA), which could be an address not belonging to any router. The
RPA does belong to a RP Link (RPL), which could be attached to a
single router or multiple routers (e.g. RPL is a LAN). With MRIFT,
there is no concept of RPA any more (joins are simply sent
northbound). The joins are terminated on some sub-TOF nodes and the
RPL is simulated by a spanning tree among some TOF and sub-TOF nodes.
Instead of (*,G) trees in PIM-Bidir, MRIFT uses (*,G-Prefix) trees,
where the G-Prefix could be *, G, or anything in between (e.g.,
225.1.1.0/24). For light flows, they could just follow the (*,*)
tree. For heavy flows, individual (*,G) trees could be built. For
medium flows, some (*,G-prefix) trees could be shared. All the First
Hop Routers (FHRs, connecting to sources) and the Last Hop Routers
(LHRs, connecting to receivers) of a particular (*,G) flow must agree
on whether a (*,*) or (*,G) or (*,G-prefix) tree is used for the flow
Zhang & Thubert Expires January 14, 2021 [Page 2]
Internet-Draft mrift July 2020
so that they all join the same tree. This is done via out of band
control outside the scope of this document.
Because of the rich connections in Fat Trees, a router has to choose
one of its many north neighbors to send join to. This is done
through hashing. The hashing algorithm should lead to several but
not too many routers choosing the same north neighbor, so that fewer
routers are involved in multicast traffic forwarding, yet none of
those routers are overburdened by replicating to too many downstream
neighbors.
Instead of PIM messages, RIFT's own TIEs are used, similar to the
concept in [draft-zzhang-pim-pds]. This introduces the concept of
neighbor-scoped flooding - a multicast TIE is sent only to a chosen
upstream north neighbor that consumes it and then regenerates a new
TIE for the next upstream.
When a join reaches a sub-TOF node, the normal join process stops.
This forms a sub-tree rooted at this sub-TOF node. Multiple sub-
trees of the same tree may be joined by a single TOF node, or they
may have to be connected by a spanning tree serving as the RPL. For
example, in the following topology, in normal situations the two sub-
tree roots for the two pods, say Spine111 and Spine121, may be joined
by TOF21, but if the TOF21-Spine121 link is down, then TOF22 may be
used, and if the TOF22-Spine111 link is also down, then Spine111 and
Spine121 will have to be joined via
Spine111-TOF21-Spine112-TOF22-Spine121.
Zhang & Thubert Expires January 14, 2021 [Page 3]
Internet-Draft mrift July 2020
. +--------+ +--------+ ^ N
. |TOF 21| |TOF 22| |
.Level 2 ++-+--+-++ ++-+--+-++ <-*-> E/W
. | | | | | | | | |
. P111/2| |P121 | | | | S v
. ^ ^ ^ ^ | | | |
. | | | | | | | |
. +--------------+ | +-----------+ | | | +---------------+
. | | | | | | | |
. South +-----------------------------+ | | ^
. | | | | | | | All TIEs
. 0/0 0/0 0/0 +-----------------------------+ |
. v v v | | | | |
. | | +-+ +<-0/0----------+ | |
. | | | | | | | |
.+-+----++ optional +-+----++ ++----+-+ ++-----++
.| | E/W link | | | | | |
.|Spin111+----------+Spin112| |Spin121| |Spin122|
.+-+---+-+ ++----+-+ +-+---+-+ ++---+--+
. | | | South | | | |
. | +---0/0--->-----+ 0/0 | +----------------+ |
. 0/0 | | | | | | |
. | +---<-0/0-----+ | v | +--------------+ | |
. v | | | | | | |
.+-+---+-+ +--+--+-+ +-+---+-+ +---+-+-+
.| | (L2L) | | | | Level 0 | |
.|Leaf111~~~~~~~~~~~~Leaf112| |Leaf121| |Leaf122|
.+-+-----+ +-+---+-+ +--+--+-+ +-+-----+
. + + \ / + +
. Prefix111 Prefix112 \ / Prefix121 Prefix122
. multi-homed
. Prefix
.+---------- Pod 1 ---------+ +---------- Pod 2 ---------+
2. Specifications
2.1. Multicast Capability
A new optional field is added to the NodeCapabilities to indicate
that the node is enabled for multicast:
struct NodeCapabilities {
...
4: optional bool multicast_enabled;
}
Zhang & Thubert Expires January 14, 2021 [Page 4]
Internet-Draft mrift July 2020
2.2. Optional Per-neighbor Flooding Scope
This document introduces an optional per-neighbor flooding scope for
TIEs:
struct TIEHeader {
...
13: optional common.SystemIDType flooding_scope_neighbor;
}
When a node originates a TIE with a per-neighbor flooding scope, it
is sent to the specified neighbor only. When a node receives a TIE
with per-neighbor flooding scope, it is accepted only if the node is
the specified neighbor, and it is not reflooded any further.
2.3. Multicast TIE
Currently the multicast TIEs are only N-TIEs with per-neighbor
flooding scope except on TOFs and sub-TOFs. If a multicast TIE is
received from a node south of sub-TOFs without the per-neighbor
flooding scope specified, it MUST be discarded.
Zhang & Thubert Expires January 14, 2021 [Page 5]
Internet-Draft mrift July 2020
/** TIE for multicast */
struct IPMulticastTIEElement {
/** Multicast TIEs are for (*, group-prefix) joins.
The '*' is not encoded in the TIE. */
1: required common.IPPrefixType group_prefix;
/** fields used by TOFs and sub-TOFs to build spanning tree RPL */
2: optional common.SystemIDType chosen_or_highest_parent;
3: optional list<common.SystemIDType> sub_tof_children;
}
/** Type of TIE.
...
*/
enum TIETypeType {
...
TIETypeIPMulticast = 11,
TIETypeMaxValue = 12,
}
/** Single element in a TIE.
...
*/
union TIEElement {
...
/** IP multicast elements. */
10: optional IPMulticastTIEElement ip_multicast;
}
2.4. Building Spanning Tree among TOFs and sub-TOFs
Note: this is still subject to further discussion/change. It may be
replaced by another scheme upon further discussions.
If a sub-TOF node is the root of a sub-tree for a (*, G-prefix) tree,
it hashes to a TOF neighbor as its parent for the tree, and
originates a corresponding multicast N-TIE without the per-neighbor
flooding scope - flooded to all its north TOF neighbors. The
chosen_or_highest_parent field is set to the chosen TOF neighbor.
A receiving TOF node originates a corresponding S-TIE without the
per-neighbor flooding scope. The chosen_or_highest_parent field is
set to the highest chosen_or_highest_parent of all received N-TIEs
and S-TIEs for the tree, identifying the root of all sub-trees from
that TOF node's point of view. The sub_tof_children list all of sub-
TOF nodes that have chosen the root as parent.
Zhang & Thubert Expires January 14, 2021 [Page 6]
Internet-Draft mrift July 2020
If a sub-TOF node that is the root of a sub-tree receives from TOF
neighbors some S-TIE for the same tree but with different
chosen_or_highest_parent values, it chooses, from all its TOF
neighbors that are recorded as a chosen_or_highest_parent, the one
with the highest system-id and (re)parent to that neighbor if that
neighbor is not already its parent.
After the above steps, if a TOF node remains as the chosen parent of
some sub-TOF nodes but its system-id does not match the highest
chosen_or_highest_parent of all N-TIEs and S-TIEs (i.e. the root),
the TOF node needs to join towards the root through some intermediate
sub-TOF and TOF nodes. If it has a sub-TOF neighbor listed in the
sub_tof_children of the root, it originates an S-TIE with the per-
neighbor flooding scope set to the sub-TOF neighbor, i.e. the sub-TOF
neighbor now becomes the parent of the TOF node (that is a parent of
some other sub-TOF nodes).
In case the TOF node does not have a neighbor listed in the
sub_tof_children of the S-TIE for the root, further study is needed.
It could be that the topology is so partitioned that a spanning tree
could not be built.
3. Security Considerations
To be provided.
4. Acknowledgements
The authors thank Bruno Rijsman and Antoni Przygenda for their review
and suggestions.
5. References
5.1. Normative References
[I-D.ietf-rift-rift]
Przygienda, T., Sharma, A., Thubert, P., Rijsman, B., and
D. Afanasiev, "RIFT: Routing in Fat Trees", draft-ietf-
rift-rift-12 (work in progress), May 2020.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
Zhang & Thubert Expires January 14, 2021 [Page 7]
Internet-Draft mrift July 2020
5.2. Informative References
[I-D.zzhang-pim-pds]
Zhang, J. and K. Patel, "Protocol Dependent Multicast
Signaling", draft-zzhang-pim-pds-00 (work in progress),
October 2015.
[RFC5015] Handley, M., Kouvelas, I., Speakman, T., and L. Vicisano,
"Bidirectional Protocol Independent Multicast (BIDIR-
PIM)", RFC 5015, DOI 10.17487/RFC5015, October 2007,
<https://www.rfc-editor.org/info/rfc5015>.
Authors' Addresses
Zhaohui Zhang
Juniper Networks
EMail: zzhang@juniper.net
Pascal Thubert
Cisco Systems, Inc
EMail: pthubert@cisco.com
Zhang & Thubert Expires January 14, 2021 [Page 8]