Softwire WG | M. Xu |
Internet-Draft | Y. Cui |
Intended status: Standards Track | J. Wu |
Expires: December 11, 2019 | Tsinghua University |
S. Yang | |
Shenzhen University | |
C. Metz | |
Cisco Systems | |
June 9, 2019 |
IPv4 Multicast over an IPv6 Multicast in Softwire Mesh Network
draft-ietf-softwire-mesh-multicast-25
During the transition to IPv6, there will be scenarios where a backbone network internally running one IP address family (referred to as the internal IP or I-IP family), connects client networks running another IP address family (referred to as the external IP or E-IP family). In such cases, the I-IP backbone needs to offer both unicast and multicast transit services to the client E-IP networks.
This document describes a mechanism for supporting multicast across backbone networks where the I-IP and E-IP protocol families differ. The document focuses on IPv4-over-IPv6 scenario, due to lack of real-world use cases for IPv6-over-IPv4 scenario.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on December 11, 2019.
Copyright (c) 2019 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
During the transition to IPv6, there will be scenarios where a backbone network internally running one IP address family (referred to as the internal IP or I-IP family), connects client networks running another IP address family (referred to as the external IP or E-IP family).
One solution is to leverage the multicast functions inherent in the I-IP backbone to efficiently forward client E-IP multicast packets inside an I-IP core tree. The I-IP tree is rooted at one or more ingress Address Family Border Routers (AFBRs) [RFC5565] and branches out to one or more egress AFBRs.
[RFC4925] outlines the requirements for the softwire mesh scenario and includes support for multicast traffic. It is likely that client E-IP multicast sources and receivers will reside in different client E-IP networks connected to an I-IP backbone network. This requires the client E-IP source-rooted or shared tree to traverse the I-IP backbone network.
This could be accomplished by re-using the multicast VPN approach outlined in [RFC6513]. MVPN-like schemes can support the softwire mesh scenario and achieve a "many-to-one" mapping between the E-IP client multicast trees and the transit core multicast trees. The advantage of this approach is that the number of trees in the I-IP backbone network scales less than linearly with the number of E-IP client trees. Corporate enterprise networks, and by extension multicast VPNs, have been known to run applications that create too many (S,G) states, which is source specific states related with a specified multicast group [RFC7761][RFC7899]. Aggregation at the edge contains the (S,G) states for customer's VPNs and these need to be maintained by the network operator. The disadvantage of this approach is the possibility of inefficient bandwidth and resource utilization when multicast packets are delivered to a receiving AFBR with no attached E-IP receivers.
[RFC8114] provides a solution for delivering IPv4 multicast services over an IPv6 network. But it mainly focuses on the DS-lite [RFC6333] scenario, where IPv4 addresses assigned by a broadband service provider are shared among customers. This document describes a detailed solution for the IPv4-over-IPv6 softwire mesh scenario, where client networks run IPv4 and the backbone network runs IPv6.
Internet-style multicast is somewhat different to the [RFC8114] scenario in that the trees are source-rooted and relatively sparse. The need for multicast aggregation at the edge (where many customer multicast trees are mapped into one or more backbone multicast trees) does not exist and to date has not been identified. Thus the need for alignment between the E-IP and I-IP multicast mechanisms emerges.
[RFC5565] describes the "Softwire Mesh Framework". This document provides a more detailed description of how one-to-one mapping schemes ([RFC5565], Section 11.1) for IPv4-over-IPv6 multicast can be achieved.
Figure 1 shows an example of how a softwire mesh network can support multicast traffic. A multicast source S is located in one E-IP client network, while candidate E-IP group receivers are located in the same or different E-IP client networks that all share a common I-IP transit network. When E-IP sources and receivers are not local to each other, they can only communicate with each other through the I-IP core. There may be several E-IP sources for a single multicast group residing in different client E-IP networks. In the case of shared trees, the E-IP sources, receivers and rendezvous points (RPs) might be located in different client E-IP networks. In the simplest case, a single operator manages the resources of the I-IP core, although the inter-operator case is also possible and so not precluded.
+---------+ +---------+ | | | | +--------+ | E-IP | | E-IP +--+Source S| | network | | network | +--------+ +---+-----+ +--+------+ | | +-+--------+ +-------+--+ | | | upstream | +-| AFBR +--+ AFBR |-+ | +----------+ +----------+ | | | E-IP Multicast | I-IP transit core | packets are forwarded | | across the I-IP | +----------+ +----------+ | transit core +-|dowstream | |downstream|-+ | AFBR |--| AFBR | +--+-------+ +--------+-+ | | +---+----+ +---+----+ +--------+ | | | | +--------+ |Receiver+---+ E-IP | | E-IP +--+Receiver| +--------+ |network | |network | +--------+ +--------+ +--------+
Figure 1: Softwire Mesh Multicast Framework
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
Terminology used in this document:
o Address Family Border Router (AFBR) - A router interconnecting two or more networks using different IP address families. Additionally, in the context of softwire mesh multicast, the AFBR runs E-IP and I-IP control planes to maintain E-IP and I-IP multicast states respectively and performs the appropriate encapsulation/decapsulation of client E-IP multicast packets for transport across the I-IP core. An AFBR will act as a source and/or receiver in an I-IP multicast tree.
o Upstream AFBR: An AFBR that is closer to the source of a multicast data flow.
o Downstream AFBR: An AFBR that is closer to a receiver of a multicast data flow.
o I-IP (Internal IP): This refers to IP address family that is supported by the core network. In this document, the I-IP is IPv6.
o E-IP (External IP): This refers to the IP address family that is supported by the client network(s) attached to the I-IP transit core. In this document, the E-IP is IPv4.
o I-IP core tree: A distribution tree rooted at one or more AFBR source nodes and branched out to one or more AFBR leaf nodes. An I-IP core tree is built using standard IP or MPLS multicast signaling protocols (in this document, we focus on IP multicast) operating exclusively inside the I-IP core network. An I-IP core tree is used to forward E-IP multicast packets belonging to E-IP trees across the I-IP core. Another name for an I-IP core tree is multicast or multipoint softwire.
o E-IP client tree: A distribution tree rooted at one or more hosts or routers located inside a client E-IP network and branched out to one or more leaf nodes located in the same or different client E-IP networks.
o uPrefix64: The /96 unicast IPv6 prefix for constructing an IPv4-embedded IPv6 unicast address [RFC8114].
o mPrefix64: The /96 multicast IPv6 prefix for constructing an IPv4-embedded IPv6 multicast address [RFC8114].
o PIMv4, PIMv6: refer to [RFC8114].
o Inter-AFBR signaling: A mechanism used by downstream AFBRs to send PIMv6 messages to the upstream AFBR.
This document focuses on the IPv4-over-IPv6 scenario, as shown in the following diagram:
+---------+ +---------+ | IPv4 | | IPv4 | +--------+ | Client | | Client |--+Source S| | Network | | Network | +--------+ +----+----+ +----+----+ | | +--+-------+ +-------+--+ | | | Upstream | +-+ AFBR +--+ AFBR |-+ | +----------+ +----------+ | | | | IPv6 transit core | | | | +----------+ +----------+ | +-+Downstream+--+Downstream+-+ | AFBR | | AFBR | +--+-------+ +-------+--+ | | +----+----+ +----+----+ +--------+ | IPv4 | | IPv4 | +--------+ |Receiver+--+ Client | | Client +--+Receiver| +--------+ | Network | | Network | +--------+ +---------+ +---------+
Figure 2: IPv4-over-IPv6 Scenario
In Figure 2, the E-IP client networks run IPv4 and the I-IP core runs IPv6.
Because of the much larger IPv6 group address space, the client E-IP tree can be mapped to a specific I-IP core tree. This simplifies operations on the AFBR because it becomes possible to algorithmically map an IPv4 group/source address to an IPv6 group/source address and vice-versa.
The IPv4-over-IPv6 scenario is an emerging requirement as network operators build out native IPv6 backbone networks. These networks support native IPv6 services and applications but in many cases, support for legacy IPv4 unicast and multicast services will also need to be accommodated.
Routers in the client E-IP networks have routes to all other client E-IP networks. Through PIMv4 messages, E-IP hosts and routers have discovered or learnt of (S,G) or (*,G)[RFC7761] IPv4 addresses. Any I-IP multicast state instantiated in the core is referred to as (S',G') or (*,G') and is separated from E-IP multicast state.
Suppose a downstream AFBR receives an E-IP PIM Join/Prune message from the E-IP network for either an (S,G) tree or a (*,G) tree. The AFBR translates the PIMv4 message into an PIMv6 message with the latter being directed towards the I-IP IPv6 address of the upstream AFBR. When the PIMv6 message arrives at the upstream AFBR, it is translated back into an PIMv4 message. The result of these actions is the construction of E-IP trees and a corresponding I-IP tree in the I-IP network. An example of the packet format and translation is provided in Section 8.
In this case, it is incumbent upon the AFBRs to perform PIM message conversions in the control plane and IP group address conversions or mappings in the data plane. The AFBRs perform an algorithmic, one-to-one mapping of IPv4-to-IPv6.
A simple algorithmic mapping between IPv4 multicast group addresses and IPv6 group addresses is performed. Figure 3 is provided as a reminder of the format:
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | 0-------------32--40--48--56--64--72--80--88--96-----------127| +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | mPrefix64 | group address | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
Figure 3: IPv4-Embedded IPv6 Multicast Address Format
An IPv6 multicast prefix (mPrefix64) is provisioned on each AFBR. AFBRs will prepend the prefix to an IPv4 multicast group address when translating it to an IPv6 multicast group address.
The construction of the mPrefix64 for Source-Specific Multicast (SSM) is the same as the construction of the mPrefix64 described in Section 5 of [RFC8114].
With this scheme, each IPv4 multicast address can be mapped into an IPv6 multicast address (with the assigned prefix), and each IPv6 multicast address with the assigned prefix can be mapped into an IPv4 multicast address. The group address translation algorithm can be referred in Section 5.2 of [RFC8114].
There are two kinds of multicast: Any-Source Multicast (ASM) and SSM. Considering that the I-IP network and E-IP network may support different kinds of multicast, the source address translation rules needed to support all possible scenarios may become very complex. But since SSM can be implemented with a strict subset of the PIM-SM protocol mechanisms [RFC7761], we can treat the I-IP core as SSM-only to make it as simple as possible. There then remain only two scenarios to be discussed in detail:
With mesh multicast, PIMv6 messages originating from a downstream AFBR need to be propogated to the correct upstream AFBR, and every AFBR needs the /96 prefix in "IPv4-Embedded IPv6 Source Address Format" [RFC6052].
To achieve this, every AFBR MUST announce the address of one of its E-IPv4 interfaces in the "v4" field [RFC6052] alongside the corresponding uPreifx46. The announcement MUST be sent to the other AFBRs through MBGP [RFC4760]. Every uPrefix64 that an AFBR announces MUST be unique. "uPrefix64" is an IPv6 prefix, and the distribution mechanism is the same as the traditional mesh unicast scenario.
As the "v4" field is an E-IP address, and BGP messages are not tunneled through softwires or any other mechanism specified in [RFC5565], AFBRs MUST be able to transport and encode/decode BGP messages that are carried over the I-IP, and whose NLRI and NH are of the E-IP address family.
In this way, when a downstream AFBR receives an E-IP PIM (S,G) message, it can translate this message into (S',G') by looking up the IP address of the corresponding AFBR's E-IP interface. Since the uPrefix64 of S' is unique, and is known to every router in the I-IP network, the translated message will be forwarded to the corresponding upstream AFBR, and the upstream AFBR can translate the message back to (S,G).
When a downstream AFBR receives an E-IP PIM (*,G) message, S' can be generated with the "source address" field set to * (wildcard value). The translated message will be forwarded to the corresponding upstream AFBR. Since every PIM router within a PIM domain MUST be able to map a particular multicast group address to the same RP when the source address is set to wildcard value (see Section 4.7 of [RFC7761]), when the upstream AFBR checks the "source address" field of the message, it finds the IPv4 address of the RP, and ascertains that this is originally a (*,G) message. This is then translated back to the (*,G) message and processed.
AFBRs are responsible for the following functions:
E-IP (*,G) and (S,G) state maintenance for an AFBR is the same as E-IP (*,G) and (S,G) state maintenance for an mAFTR described in Section 7.2 of [RFC8114]
It is possible that the I-IP transit core runs another, non-transit, I-IP PIM-SSM instance. Since the translated source address starts with the unique "Well-Known" prefix or the ISP-defined prefix that MUST NOT be used by another service provider, mesh multicast will not influence non-transit PIM-SSM multicast at all. When an AFBR receives an I-IP (S',G') message, it MUST check S'. If S' starts with the unique prefix, then the message is actually a translated E-IP (S,G) or (*,G) message, and the AFBR translate this message back to a PIMv4 message and process it.
When an AFBR wishes to propagate a Join/Prune(S,G,rpt)[RFC7761] message to an I-IP upstream router, the AFBR MUST operate as specified in Section 6.5 and Section 6.6.
Assume that one downstream AFBR has joined an RPT of (*,G) and an SPT of (S,G), and decided to perform an SPT switchover (see Section 4.2.1 of [RFC7761]). According to [RFC7761], it should propagate a Prune(S,G,rpt) message along with the periodical Join(*,G) message upstream towards the RP. However, routers in the I-IP transit core do not process (S,G,rpt) messages since the I-IP transit core is treated as SSM-only. As a result, the downstream AFBR is unable to prune S from this RPT, so it will receive two copies of the same data for (S,G). In order to solve this problem, we introduce a new mechanism for downstream AFBRs to inform upstream AFBRs of pruning any given S from an RPT.
When a downstream AFBR wishes to propagate an (S,G,rpt) message upstream, it SHOULD encapsulate the (S,G,rpt) message, then send the encapsulated unicast message to the corresponding upstream AFBR, which we call "RP'".
When RP' receives this encapsulated message, it MUST decapsulate the message as in the unicast scenario, and retrieve the original (S,G,rpt) message. The incoming interface of this message may be different to the outgoing interface which propagates multicast data to the corresponding downstream AFBR, and there may be other downstream AFBRs that need to receive multicast data of (S,G) from this incoming interface, so RP' should not simply process this message as specified in [RFC7761] on the incoming interface.
To solve this problem, we introduce an "interface agent" to process all the encapsulated (S,G,rpt) messages the upstream AFBR receives. The interface agent's RP' should prune S from the RPT of group G when no downstream AFBR is subscribed to receive multicast data of (S,G) along the RPT.
In this way, we ensure that downstream AFBRs will not miss any multicast data that they need. The cost of this is that multicast data for (S,G) will be duplicated along the RPT received by AFBRs affected by the SPT switch over, if at least one downstream AFBR exists that has not yet sent Prune(S,G,rpt) messages to the upstream AFBR.
In certain deployment scenarios (e.g. if there is only a single downstream router), the interface agent function is not required.
The mechanism used to achieve this is left to the implementation. The following diagram provides one possible solution for an "interface agent" implementation:
+----------------------------------------+ | | | +-----------+----------+ | | | PIM-SM | UDP | | | +-----------+----------+ | | ^ | | | | | | | | v | | +----------------------+ | | | I/F Agent | | | +----------------------+ | | PIM ^ | multicast | | messages | | data | | | +-------------+---+ | | +--+--|-----------+ | | | | v | v | | +--------- + +----------+ | | | I-IP I/F | | I-IP I/F | | | +----------+ +----------+ | | ^ | ^ | | | | | | | | +--------|-----|----------|-----|--------+ | v | v
Figure 4: Interface Agent Implementation Example
Figure 4 shows an example of an interface agent implementation using UDP encapsulation. The interface agent has two responsibilities: In the control plane, it should work as a real interface that has joined (*,G), representing of all the I-IP interfaces which are outgoing interfaces of the (*,G) state machine, and process the (S,G,rpt) messages received from all the I-IP interfaces.
The interface agent maintains downstream (S,G,rpt) state machines for every downstream AFBR, and submits Prune (S,G,rpt) messages to the PIM-SM module only when every (S,G,rpt) state machine is in the Prune(P) or PruneTmp(P') state, which means that no downstream AFBR is subscribed to receive multicast data for (S,G) along the RPT of G. Once a (S,G,rpt) state machine changes to NoInfo(NI) state, which means that the corresponding downstream AFBR has switched to receive multicast data of (S,G) along the RPT again, the interface agent MUST send a Join (S,G,rpt) to the PIM-SM module immediately.
In the data plane, upon receiving a multicast data packet, the interface agent MUST encapsulate it at first, then propagate the encapsulated packet from every I-IP interface.
NOTICE: It is possible that an E-IP neighbor of RP' has joined the RPT of G, so the per-interface state machine for receiving E-IP Join/Prune (S,G,rpt) messages should be preserved.
After a new AFBR requests the receipt of traffic destined for a multicast group, it will receive all the data from the RPT at first. At this time, every downstream AFBR will receive multicast data from any source from this RPT, in spite of whether they have switched over to an SPT or not.
To minimize this redundancy, it is recommended that every AFBR's SwitchToSptDesired(S,G) function employs the "switch on first packet" policy. In this way, the delay in switchover to SPT is kept as small as possible, and after the moment that every AFBR has performed the SPT switchover for every S of group G, no data will be forwarded in the RPT of G, thus no more unnecessary duplication will be produced.
In addition to Join or Prune, other message types exist, including Register, Register-Stop, Hello and Assert. Register and Register-Stop messages are sent by unicast, while Hello and Assert messages are only used between directly linked routers to negotiate with each other. It is not necessary to translate these for forwarding, thus the processing of these messages is out of scope for this document.
In addition to states mentioned above, other states exist, including (*,*,RP) and I-IP (*,G') state. Since we treat the I-IP core as SSM-only, the maintenance of these states is out of scope for this document.
Refer to Section 7.4 of [RFC8114]. If there is at least one outgoing interface whose IP address family is different from the incoming interface, the AFBR MUST encapsulate this packet with mPrefix64-derived and uPrefix64-derived IPv6 address to form an IPv6 multicast packet.
Upon encapsulation, the TTL and hop account in the outer header SHOULD be set by policy. Upon decapsulation, the TTL and hop count in the inner header SHOULD be modified by policy, it MUST NOT be incremented and it MAY be decremented to reflect the cost of tunnel forwarding. Besides, processing of TTL and hop count information in protocol headers depends on the tunneling technology, which is out of scope of this document.
The encapsulation performed by an upstream AFBR will increase the size of packets. As a result, the outgoing I-IP link MTU may not accommodate the larger packet size. It is not always possible for core operators to increase the MTU of every link, thus source fragmentation after encapsulation and reassembling of encapsulated packets MUST be supported by AFBRs [RFC5565]. PMTUD [RFC8201] SHOULD be enabled and ICMPv6 packets MUST NOT be filtered in the I-IP network. Fragmentation and tunnel configuration considerations are provided in Section 8 of [RFC5565]. The detailed procedure can be referred in Section 7.2 of [RFC2473].
Because the PIM-SM Specification is independent of the underlying unicast routing protocol, the packet format in Section 4.9 of [RFC7761] remains the same, except that the group address and source address MUST be translated when traversing an AFBR.
For example, Figure 5 shows the register-stop message format in the IPv4 and IPv6 address families.
0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |PIM Ver| Type | Reserved | Checksum | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | IPv4 Group Address (Encoded-Group format) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | IPv4 Source Address (Encoded-Unicast format) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ (1). IPv4 Register-Stop Message Format 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |PIM Ver| Type | Reserved | Checksum | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | IPv6 Group Address (Encoded-Group format) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | IPv6 Source Address (Encoded-Unicast format) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ (2). IPv6 Register-Stop Message Format
Figure 5: Register-Stop Message Format
In Figure 5, the semantics of fields "PIM Ver", "Type", "Reserved", and "Checksum" can be referred in Section 4.9 of [RFC7761].
IPv4 Group Address (Encoded-Group format): The encoded-group format of the IPv4 group address described in Section 4.9.1 of [RFC7761]
IPv4 Source Address (Encoded-Group format): The encoded-unicast format of the IPv4 source address described in Section 4.9.1 of [RFC7761]
IPv6 Group Address (Encoded-Group format): The encoded-group format of the IPv6 group address described in Section 5.2.
IPv6 Source Address (Encoded-Group format): The encoded-unicast format of the IPv6 source address described in Section 5.3.
Softwire mesh multicast encapsulation does not require the use of any one particular encapsulation mechanism. Rather, it MUST accommodate a variety of different encapsulation mechanisms, and allow the use of encapsulation mechanisms mentioned in [RFC4925]. Additionally, all of the AFBRs attached to the I-IP network MUST implement the same encapsulation mechanism, and follow the requirements mentioned in Section 8 of [RFC5565].
The security concerns raised in [RFC4925] and [RFC7761] are applicable here.
The additional workload associated with some schemes, such as interface agents, could be exploited by an attacker to perform a DDoS attack.
Compared with [RFC4925], the security concerns should be considered more carefully: an attacker could potentially set up many multicast trees in the edge networks, causing too many multicast states in the core network. To defend against these attacks, BGP policies SHOULD be carefully configured, e.g., AFBRs only accept Well-Known prefix advertisements from trusted peers. Besides, cryptographic methods for authenticating BGP sessions [RFC7454] could be used.
This document includes no request to IANA.
Wenlong Chen, Xuan Chen, Alain Durand, Yiu Lee, Jacni Qin and Stig Venaas provided useful input into this document.