L3VPN P. Kurapati
Internet-Draft M. Rodrigues
Intended status: Standards Track K. Windisch
Expires: November 25, 2013 Juniper Networks
S. Asif
AT&T LABS
May 24, 2013

Dynamic RP encodings in BGP based MVPNs
draft-kurapati-l3vpn-dynamicrp-bgpmvpn-00.txt

Abstract

PIM Group-to-RP mappings are distributed dynamically using protocols such as BSR or Auto-RP. The BGP-MVPN specification provides for this information to be encapsulated in an I-PMSI or S-PMSI provider tunnel between the PEs in an MVPN environment. Since this is control information, it is desirable to signal this information in BGP between PEs, similar to carrying other customer control state such as C-Multicast routes. This document specifies the mechanisms and procedures to carry bootstrap information via BGP to provide true control and data plane separation.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on November 25, 2013.

Copyright Notice

Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

This document may contain material from IETF Documents or IETF Contributions published or made publicly available before November 10, 2008. The person(s) controlling the copyright in some of this material may not have granted the IETF Trust the right to allow modifications of such material outside the IETF Standards Process. Without obtaining an adequate license from the person(s) controlling the copyright in such materials, this document may not be modified outside the IETF Standards Process, and derivative works of it may not be created outside the IETF Standards Process, except to format it for publication as an RFC or to translate it into languages other than English.


1. Introduction

In PIM-SM [RFC4601], the Group-to-RP mapping information is distributed dynamically through methods such as BSR [RFC5059] or Auto-RP. When multicast is deployed in a VPN environment, PEs in the provider space need to carry this information transparently across the provider core so that CEs in all the sites can access this RP information. The MVPN specification [RFC6513] defines a mechanism in Section 5.3.4 by which BSR messages can be transmitted in the provider space over PMSI tunnels. However, carrying control messages such as BSR in the data tunnels is not always desirable. The BGP encodings in the BGP-MVPN specification [RFC6514] already define mechanisms to carry C-Multicast route information in BGP. This document specifies how to carry BSR Group-to-RP mapping information through BGP. The Auto-RP mechanism is out of scope for this specification.

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

This document uses the following terms:

"MVPN"

"Multicast in MPLS/BGP IP VPNs" [RFC6513] includes two different methods, BGP and PIM, for exchanging customer's multicast control information. This document only deals with BGP for exchanging the customer multicast control information. MVPN in the following sections refers to BGP-MVPN.

"C-Multicast routes"

MVPN customer multicast routing information that is carried in BGP is referred to in this document as C-Multicast routes.

"RP-Set"

RP-Set is the Group-to-RP mapping information distributed by BSR.

3. Motivation

3.1. Unified BGP based control plane

In BGP-based MVPN, PE Auto-Discovery and the exchange of PIM Join/Prune state are part of the customer multicast control plane and are accomplished via advertisements of BGP MVPN NLRI. The BSR and Auto-RP protocols also carry a type of customer multicast control information. Carrying this information in BGP is not only logical but also yields a unified BGP-based control plane for all customer-space control messages.

3.2. Manageability at a common provider location

A provider of MVPN services would be interested in knowing the customer's RP topology and the detailed mappings. This information is critical for operations staff who troubleshoot or manage the customer's multicast deployments. As with the C-Multicast routes, it is desirable to obtain this information at a centralized location such as a BGP Route Reflector. In the current MVPN specifications, a provider that wants this information must collect it from the customer VRFs on all PEs that import those VRFs. Moreover, polling each PE at intervals is prone to losing information between polls, especially with soft-state control protocols such as PIM. Because the RP-Set information is not currently carried in BGP, it cannot be obtained from a BGP Route Reflector.

3.3. Avoiding unnecessary Provider Tunnels

With the current specification, the RP-Set is carried via PMSI tunnels. There may be deployments that use only S-PMSIs. Carrying BSR information through an S-PMSI implies creating tunnels just to carry control information. Also, if the BSR is located at a receiver-only site, an S-PMSI tunnel needs to be created with the PE at the receiver site as root. That PE must set up an S-PMSI tunnel with a bursty source solely to distribute control messages, a tunnel that would otherwise not be required because the PE is receiver-only.

4. MCAST-VPN-BSR NLRI

This document defines a new NLRI called the MCAST-VPN-BSR NLRI. The format of the MCAST-VPN-BSR NLRI is identical to that defined in BGP-MVPN [RFC6514]:

      +-------------------------------------+
      |   Route Type (1 octet)              |
      +-------------------------------------+
      |   Length (1 octet)                  |
      +-------------------------------------+
      |   Route Type specific (variable)    |
      +-------------------------------------+
	  

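The following three Route Types are defined for the MCAST-VPN-BSR NLRI: BSR Parameters (Section 4.1), BSM Group Parameters (Section 4.2), and BSM RP Parameters (Section 4.3). Later sections refer to them as Type-1, Type-2, and Type-3, respectively.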

The MCAST-VPN-BSR NLRI is carried in BGP [RFC4271] with an AFI of 1 or 2 and a SAFI of MCAST-VPN-BSR. In order for two BGP speakers to exchange MCAST-VPN-BSR NLRIs, they must use a BGP Capabilities Advertisement to ensure that they both are capable of properly processing such an NLRI. This is done as specified in [RFC4760], by using capability code 1 (multiprotocol BGP) with an AFI of 1 or 2 and a SAFI of MCAST-VPN-BSR.
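The common NLRI framing can be sketched as follows. This is a minimal illustration, not a definitive implementation. It assumes, as in [RFC6514], that the Length octet counts only the route-type-specific part. The function names encode_nlri and decode_nlris are hypothetical and do not come from this document.

```python
import struct


def encode_nlri(route_type: int, body: bytes) -> bytes:
    """Prefix a route-type-specific body with Route Type and Length octets.

    Assumption: Length counts only the route-type-specific octets.
    """
    if not 0 <= route_type <= 255:
        raise ValueError("route type must fit in one octet")
    if len(body) > 255:
        raise ValueError("body does not fit the 1-octet Length field")
    return struct.pack("!BB", route_type, len(body)) + body


def decode_nlris(data: bytes):
    """Split concatenated NLRIs into a list of (route_type, body) pairs."""
    routes = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < 2:
            raise ValueError("truncated NLRI header")
        route_type, length = struct.unpack_from("!BB", data, offset)
        offset += 2
        body = data[offset:offset + length]
        if len(body) != length:
            raise ValueError("truncated NLRI body")
        routes.append((route_type, body))
        offset += length
    return routes
```

Because the Length field is a single octet, a route-type-specific body longer than 255 octets cannot be represented, which the encoder rejects explicitly.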

4.1. BSR Parameters NLRI

A BSR Parameters Route Type specific MCAST-VPN-BSR NLRI consists of the following:

      +-------------------------------------+
      |   RD   (8 octets)                   |
      +-------------------------------------+
      |   len of BSR Address (1 octet)      |
      +-------------------------------------+
      |   BSR Address (Variable length)     |
      +-------------------------------------+
      |   Hash Mask Length (1 octet)        |
      +-------------------------------------+
      |   BSR Priority (1 octet)            |
      +-------------------------------------+
      |   Originating PE's IP Address       |
      +-------------------------------------+

The RD is encoded as described in [RFC4364].

BSR Address is the IP address of the Elected or Candidate BSR, extracted from the Bootstrap message. The "len of BSR Address" field indicates whether the address is IPv4 or IPv6: a value of 32 indicates an IPv4 address, and a value of 128 indicates an IPv6 address.

BSR priority is a 1 octet value as defined in BSR specification [RFC5059].

Hash Mask Length is a 1-octet value that is used for RP selection by the routers running BSR. This value MUST be taken from the BSM and copied into the BSR Parameters route by the originating router.

Originating PE's IP Address field is set to the IP address that the PE places in the Global Administrator field of the VRF Route Import Extended Community of the VPN-IP routes advertised by the PE. For a given MVPN, a single such IP address MUST be used, and that same IP address MUST be used as the originating PE's IP address in all route types of the MCAST-VPN-BSR NLRI that the PE transmits.

The usage and details of this NLRI are discussed in the "Protocol Details" section.
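The field layout above can be sketched as a small encoder. The sketch makes two assumptions that this document does not state: the BSR Address and the Originating PE's IP Address are carried as raw address octets, and the "len of BSR Address" field is expressed in bits (32 or 128). The function name encode_bsr_parameters is hypothetical.

```python
import ipaddress
import struct


def encode_bsr_parameters(rd: bytes, bsr_addr: str, hash_mask_len: int,
                          bsr_priority: int, originating_pe: str) -> bytes:
    """Build the route-type-specific part of a BSR Parameters NLRI.

    Assumption: addresses are carried as raw octets, in the field order
    shown in the diagram of Section 4.1.
    """
    if len(rd) != 8:
        raise ValueError("RD must be 8 octets")
    bsr = ipaddress.ip_address(bsr_addr)
    pe = ipaddress.ip_address(originating_pe)
    return (rd
            + struct.pack("!B", bsr.max_prefixlen)  # 32 for IPv4, 128 for IPv6
            + bsr.packed
            + struct.pack("!BB", hash_mask_len, bsr_priority)
            + pe.packed)
```

For an IPv4 BSR and an IPv4 originating PE, the result is 8 + 1 + 4 + 1 + 1 + 4 = 19 octets.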

4.2. BSM Group Parameters NLRI

A BSM Group Parameters Route Type specific MCAST-VPN-BSR NLRI consists of the following:

      +-------------------------------------+
      |   RD   (8 octets)                   |
      +-------------------------------------+
      |   len of BSR Address (1 octet)      |
      +-------------------------------------+
      |   BSR Address (Variable length)     |
      +-------------------------------------+
      |   len of Group Address (1 Octet)    |
      +-------------------------------------+
      | Group Prefix (Encoded Group format) |
      +-------------------------------------+
      |   Originating PE's IP Address       |
      +-------------------------------------+

The Group Prefix can contain an IPv4 or IPv6 address. As with the BSR Address, the address family is indicated by the "len of Group Address" field: a value of 32 indicates an IPv4 group prefix, and a value of 128 indicates an IPv6 group prefix. Note that this field gives the length of the group address only, not the length of the encoded group prefix.

The Encoded-Group format is defined in the PIM-SM specification [RFC4601]; the field is copied unchanged from the BSM into this NLRI. Along with the group prefix, the encoded group format carries important information, such as the 'BiDir' bit and the Admin Scope zone bit, which needs to be carried across to the egress PEs.

The usage and details of this NLRI are discussed in the "Protocol Details" section.
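To illustrate what the Group Prefix field carries, the following sketch parses an Encoded-Group value. It assumes the layout given in Section 4.9.1 of [RFC4601]: Address Family, Encoding Type, a flags octet whose high bit is B (BiDir) and whose low bit is Z (Admin Scope zone), Mask Len, and then the group address. The function name and the dictionary keys are hypothetical.

```python
import ipaddress
import struct

# IANA address family numbers: 1 = IPv4, 2 = IPv6 (address length in octets).
ADDR_FAMILY_LEN = {1: 4, 2: 16}


def parse_encoded_group(data: bytes) -> dict:
    """Parse an Encoded-Group field (layout assumed from RFC 4601)."""
    if len(data) < 4:
        raise ValueError("truncated Encoded-Group header")
    family, _enc_type, flags, mask_len = struct.unpack_from("!BBBB", data)
    if family not in ADDR_FAMILY_LEN:
        raise ValueError("unsupported address family")
    addr_len = ADDR_FAMILY_LEN[family]
    raw = data[4:4 + addr_len]
    if len(raw) != addr_len:
        raise ValueError("truncated group address")
    return {
        # Value an originator would place in "len of Group Address".
        "len_of_group_address": addr_len * 8,
        "bidir": bool(flags & 0x80),
        "admin_scope_zone": bool(flags & 0x01),
        "prefix": f"{ipaddress.ip_address(raw)}/{mask_len}",
    }
```

The sketch shows why the "len of Group Address" value (32 or 128) differs from the length of the encoded field itself, which also includes the four header octets.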

4.3. BSM RP Parameters NLRI

A BSM RP Parameters Route Type specific MCAST-VPN-BSR NLRI consists of the following:

      +-------------------------------------+
      |   RD   (8 octets)                   |
      +-------------------------------------+
      |   len of BSR Address (1 octet)      |
      +-------------------------------------+
      |   BSR Address (Variable length)     |
      +-------------------------------------+
      |   len of RP Address (1 octet)       |
      +-------------------------------------+
      |   RP Address (Encoded unicast addr) |
      +-------------------------------------+
      |   len of Group Address (1 octet)    |
      +-------------------------------------+
      |   Group Prefix(Encoded Group format)|
      +-------------------------------------+	  
      |   RP Hold Time (2 Octets)           |
      +-------------------------------------+
      |   RP Priority (1 Octet)             |
      +-------------------------------------+	  
      |   Originating PE's IP Address       |
      +-------------------------------------+

RP Address is in the Encoded-Unicast format defined in the PIM-SM specification [RFC4601]; it is copied unchanged from the BSM into this NLRI.

Hold Time is a 2 octet value in seconds, as defined in BSR specification [RFC5059]. It is a per RP value which is taken from the BSM and filled in the NLRI by the originating router.

The usage and details of this NLRI are discussed in the "Protocol Details" section.

4.4. BSR-BGP Path Attribute

This document defines and uses a new BGP attribute called the "BSR-BGP attribute". This is an optional transitive BGP attribute. The format of this attribute is defined as follows:

      +-------------------------------------+
      |   Fragment Tag (2 Octets)           |
      +-------------------------------------+
      |   Group Count  (1 Octet)            |
      +-------------------------------------+
      |   RP Count  (1 Octet)               |
      +-------------------------------------+

All MCAST-VPN-BSR NLRI route types carry the BSR-BGP Path Attribute; which fields are set depends on the route type. Fragment Tag MUST be set for all three route types. The usage of Fragment Tag is discussed in the "Protocol Overview" and "Protocol Details" sections.

BSR Parameters NLRI also sets the Group Count field indicating the number of Groups this BSM is carrying. RP Count is not used by the originator of BSR Parameters NLRI and SHOULD be ignored by the receiver of BSR Parameters NLRI.

The BSM Group Parameters NLRI sets the RP Count, indicating the number of RPs that this group carries. The Group Count is not set for this NLRI and SHOULD be ignored by the receiver of the BSM Group Parameters NLRI.

BSM RP parameters NLRI sets only the Fragment Tag. Group Count and RP Count fields are not set by the originator and SHOULD be ignored by the receiver of BSM RP Parameters NLRI.
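The per-route-type rules above can be sketched as a pair of helpers for the 4-octet attribute value. Several points are assumptions: fields that are "not set" are encoded as zero, the route type numbers 1, 2, and 3 follow the Type-1, Type-2, and Type-3 labels used later in this document, and the BGP attribute header (flags, type code, length) is omitted because no type code is given here. All names are hypothetical.

```python
import struct
from typing import NamedTuple

# Assumed route type numbers (Type-1/2/3 as labelled in Section 6).
BSR_PARAMETERS = 1
BSM_GROUP_PARAMETERS = 2
BSM_RP_PARAMETERS = 3


class BsrBgpAttr(NamedTuple):
    fragment_tag: int
    group_count: int
    rp_count: int


def pack_attr(route_type: int, fragment_tag: int, count: int = 0) -> bytes:
    """Encode the attribute value; only the field relevant to the type is set."""
    group_count = count if route_type == BSR_PARAMETERS else 0
    rp_count = count if route_type == BSM_GROUP_PARAMETERS else 0
    return struct.pack("!HBB", fragment_tag, group_count, rp_count)


def unpack_attr(route_type: int, value: bytes) -> BsrBgpAttr:
    """Decode the attribute value, ignoring fields not used by the type."""
    tag, groups, rps = struct.unpack("!HBB", value)
    if route_type != BSR_PARAMETERS:
        groups = 0  # Group Count is ignored for Type-2 and Type-3
    if route_type != BSM_GROUP_PARAMETERS:
        rps = 0     # RP Count is ignored for Type-1 and Type-3
    return BsrBgpAttr(tag, groups, rps)
```

Zeroing ignored fields on receipt keeps later comparisons independent of whatever the originator happened to place in them.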

5. Protocol Overview

This section gives an overview of the mechanism to carry BSR information through BGP. The full specification is given in the next section. This specification does not change the processing of BSR messages sent or received on PE-CE links; that is, BSR election based on BSMs and the RP selection procedure continue to follow the BSR specification. This document only describes how the information carried in BSR is redistributed into BGP at the ingress PEs, and how the remote PEs redistribute this information from BGP back into BSR. The same applies when the PE itself is configured as a C-RP or BSR for the given VRF. When the PE is a C-RP, the RP advertisements are unicast to the elected BSR as described in the BSR specification. When the PE is configured as a BSR for a specific MVPN VRF, it MUST originate the BSMs as it would normally do and send them on its CE-facing interfaces. The PE also originates MCAST-VPN-BSR NLRIs from the RP-Set it created by virtue of being a BSR and advertises them to the remote PEs.

5.1. Handling Bootstrap messages

The BSR specification can be broadly classified into three stages:

(a) BSR election
(b) Candidate RP advertisements
(c) Elected BSR (E-BSR) advertising the RP-Set information

Candidate RP advertisements are unicast, and hence it is not necessary to carry them in PMSI tunnels or convert them into BGP routes. On the other hand, both the empty BSMs and the BSMs sent by the Elected BSRs (E-BSR) need to be converted into BGP routes.

5.1.1. Data for BSR Parameters Route

The Bootstrap Router (BSR) election is based on the Bootstrap Messages (BSMs) that the candidate BSRs transmit with their BSR priority. Whenever a PE receives a BSM on a PE-CE link, it needs to originate a BSR Parameters Route with the BSR Address and Priority extracted from the BSM. The BSR election on the PEs is based on the BSR Parameters Routes received. The BSR Parameters Route also MUST carry the BSR-BGP Path Attribute with the Fragment Tag and "Group Count" fields set. "Group Count" is the number of group prefixes this BSM is carrying. The "Group Count" field is not present in the original BSR specification and needs to be populated by the originating PE. The Fragment Tag field can be any locally generated unique value at the originating PE. The usage of this field is discussed in the next section.

5.1.2. Fragmented BSMs

The BSR specification also provides a way to deal with fragmentation: if the Group-to-RP mappings exceed the packet size, semantic fragmentation is performed. The 'Fragment Tag' in the BSM identifies the fragments of the same BSM. If the fragmentation boundary falls within a group prefix, the difference between "RP Count" and "Frag RP Cnt" in the BSM determines how many more RPs are to come. When the fragmentation boundary falls at a group prefix boundary (i.e., a group range fits entirely into a BSM fragment), there is no way to determine whether more such fragments are coming in other BSMs. An originating PE SHOULD wait for all the fragmented BSMs to arrive before propagating them in BGP if it is known, based on the "Frag RP Cnt" value, that more fragments are coming. Otherwise, the PE SHOULD start converting the mappings into BGP NLRIs and advertise the routes as soon as the first BSM is received. The receiving PE will start assembling the RP-Set based on the received Group-to-RP mapping routes and send the resulting BSM to the CE. If the originating PE receives subsequent fragments of the same BSM, it MUST re-advertise the "BSR Parameters" NLRI with the BSR-BGP Path Attribute changed to reflect the modified "Group Count". However, the existing "BSM Group Parameters" and "BSM RP Parameters" routes are still considered valid, and the new routes are to be treated as incremental routes by the egress PE. This is possible because the Fragment Tag value is the same for the new mappings in the subsequent fragment. The details are discussed in the next section.
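The decision of whether an originating PE knows that more fragments are coming can be sketched as follows. This is an illustration only. It assumes one tracker instance per BSR and Fragment Tag, and that each fragment is presented as a list of (group prefix, RP Count, Frag RP Cnt) tuples; the class and method names are hypothetical.

```python
from collections import defaultdict


class FragmentTracker:
    """Tracks RPs seen so far for the fragments of one BSM.

    Assumption: one instance is kept per (BSR address, Fragment Tag).
    """

    def __init__(self):
        self.expected = {}            # group prefix -> RP Count from the BSM
        self.seen = defaultdict(int)  # group prefix -> sum of Frag RP Cnt

    def add_fragment(self, groups):
        """groups: iterable of (group_prefix, rp_count, frag_rp_count)."""
        for prefix, rp_count, frag_rp_count in groups:
            self.expected[prefix] = rp_count
            self.seen[prefix] += frag_rp_count

    def more_fragments_known(self) -> bool:
        """True when some group still has RPs outstanding."""
        return any(self.seen[p] < n for p, n in self.expected.items())
```

When more_fragments_known() returns False, the PE cannot conclude that the BSM is complete, only that nothing in the fragments seen so far indicates otherwise; in that case the text above calls for advertising immediately and treating later fragments as incremental.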

5.1.3. Mappings arriving in different BGP UPDATES

Even without fragmentation, all the Group-to-RP mappings may not fit in a single BGP UPDATE message at the originating PE. The NLRIs can be split across multiple BGP UPDATE messages. In such a situation, egress PEs can determine whether all the mappings have been received by using the "Group Count" and the "RP Count" in the BSR-BGP Path Attributes carried with the BSR Parameters and BSM Group Parameters NLRIs, respectively. The BGP Graceful Restart (GR) specification [RFC4724] also defines an End-of-RIB marker that can be used for non-GR purposes. PEs MAY use the End-of-RIB marker to indicate the completion of all the route updates to the peer. The peer of the ingress PE can be a BGP Route Reflector. If the End-of-RIB marker is negotiated for the MCAST-VPN-BSR family, a BGP peer SHOULD wait for the End-of-RIB marker from the peer before advertising the routes to its other clients. An egress PE SHOULD wait for the End-of-RIB marker before considering the routes for BSM Group-to-RP mapping calculation and informing its corresponding CEs.
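The completeness check performed by an egress PE can be sketched as follows. The data model is an assumption made for illustration: group_count is the Group Count from the BSR Parameters route, and groups maps each received group prefix to the RP Count from its BSM Group Parameters route together with the set of RP addresses received so far. The function name is hypothetical.

```python
def rp_set_complete(group_count: int, groups: dict) -> bool:
    """Return True when every advertised group and RP has been received.

    groups: group prefix -> (rp_count, set of RP addresses received).
    """
    if len(groups) != group_count:
        return False  # some BSM Group Parameters routes are still missing
    return all(len(rps) == rp_count for rp_count, rps in groups.values())
```

For example, with a Group Count of 2, the check fails while only one group has arrived, and succeeds once both groups hold as many RPs as their RP Count indicates.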

5.2. Mapping individual groups to RP

For a router or management system to determine the RP mapping for an individual multicast stream, the "PIM Group-to-Rendezvous-Point Mapping" specification [RFC6226] requires that the following be evaluated in addition to any dynamic Group-to-RP mappings: source-specific multicast (SSM) group ranges, dense mode (DM) group-ranges, embedded-RP encoded in the IPv6 group address, and statically configured Group-to-RP mappings. This specific Group-to-RP mapping given by the algorithm in RFC6226 determines the RP that a router would use for joining PIM shared trees or sending PIM Register messages for individual streams.

In addition to the dynamic RP group-range-to-RP mappings obtainable from BGP, a router or management system that needs to make these specific group-to-RP mapping decisions for individual streams is assumed to have knowledge of the same information required by RFC6226 as all the routers in the multicast domain. Specifically, it must know the source-specific multicast (SSM) group ranges, dense mode (DM) group-ranges, embedded-RP encoded in the IPv6 group address, and statically configured group-range-to-RP mappings. How it learns these is outside the scope of this document.

5.3. Triggered BSMs by egress PE

Unlike regular BSR implementations, where the BSM is flooded, the egress PE in this specification acts as a proxy and generates BSMs. An egress PE MUST generate BSMs periodically from the mapping information taken from the MCAST-VPN-BSR NLRIs. In addition to the periodic BSMs, whenever there is a change in the BSM mappings, a triggered BSM MUST be generated to refresh the information on the CE. The ingress PE and the egress PE may not be in sync with each other in terms of timing. Assume that the egress PE sends a periodic BSM at t=0 and then receives a BGP route change in which one RP is withdrawn. In this case, the egress PE generates a new BSM without waiting for BS_Period to expire. There MUST, however, be a minimum of BS_Min_Interval between successive BSMs, as noted in [RFC5059]. As a result, an extra BSM is generated towards the CE whenever there is a change in the Group-to-RP mappings at the egress PE.
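The interaction of periodic and triggered BSMs can be sketched as a scheduling function. The constants are the default timer values from [RFC5059] (BS_Period of 60 seconds and BS_Min_Interval of 10 seconds); the function name and the use of plain seconds as timestamps are assumptions made for illustration.

```python
from typing import Optional

BS_PERIOD = 60.0        # seconds, default from RFC 5059
BS_MIN_INTERVAL = 10.0  # seconds, default from RFC 5059


def next_bsm_time(last_sent: float, change_at: Optional[float] = None) -> float:
    """Return the time at which the egress PE should send its next BSM.

    last_sent: time the previous BSM was sent towards the CE.
    change_at: time a mapping change was learned from BGP, if any.
    """
    periodic = last_sent + BS_PERIOD
    if change_at is None:
        return periodic
    # A triggered BSM is sent as soon as BS_Min_Interval allows.
    triggered = max(change_at, last_sent + BS_MIN_INTERVAL)
    return min(periodic, triggered)
```

With a BSM sent at t=0, a change learned at t=3 leads to a triggered BSM at t=10, while a change learned at t=25 leads to a triggered BSM immediately at t=25.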

5.4. Reverse Path Forwarding for Dynamic RP advertisements

BSR relies on variations of Reverse Path Forwarding (RPF) to ensure that advertisement messages do not loop through the network. RPF delivery semantics must also be maintained across the BGP-MVPN service provider core for dynamic RP advertisements encoded in BGP.

The BGP NLRI defined in this document for dynamic RP advertisements includes an Originating PE's IP Address field, which PEs receiving advertisements via BGP can use to conduct RPF checks. When a PE receives an MCAST-VPN-BSR NLRI for a particular MVPN from some other PE, it accepts the route only if the Originating PE's IP Address field identifies the selected upstream PE for the IP address of the Bootstrap Router. Otherwise, the PE simply discards the update.

5.5. Interoperation with tunneled BSR

In order for this specification to be incrementally deployable in a network, PEs that implement this specification must be able to interoperate with PEs that do not. Such PEs that are not capable of advertising dynamic RP information in BGP will send tunneled BSR messages.

For example, it is possible that a BSR router could be multihomed to multiple PEs, some of which advertise dynamic RP mappings in BGP and some of which encapsulate the native packets. In such a topology, it's possible that each of the PEs connecting the site of the BSR sender will forward the redundant advertisements for the same sender to the other PEs across the core via the different protocol mechanisms. Further, it is possible that different senders are connected to PEs with differing capabilities and unique advertisements will arrive from the core at PEs via different protocol mechanisms.

In these scenarios, in which there are non-capable PEs in the network, PEs sending dynamic RP advertisements via BGP may also choose to encapsulate the same advertisements as native BSR packets tunneled via the PMSIs of the BGP-MVPN for delivery to receiving PEs that are not capable of handling the dynamic RP advertisements from BGP. However, when it is known that all PEs are capable of dynamic RP advertisements in BGP, PEs should filter multicast BSR messages so that they are not encapsulated in PMSI tunnels.

PEs receiving dynamic RP advertisements from the service provider core must apply RPF rules to the received advertisements regardless of the mechanism of delivery. Propagation of native BSR encapsulated advertisements by receiving PEs enabled for dynamic RP advertisements in BGP should occur as if these advertisements were received from BGP, and as specified in this document.

5.6. Route Targets for Group-to-RP mapping routes

By default, the Group-to-RP mapping routes SHOULD have the same Route Targets as the VPN-IP unicast routes towards the BSR/Mapping Agent/C-RP carried in these routes. An implementation SHOULD allow the default to be modified via configuration. With the use of Route Target Constraint [RFC4684], the distribution of these routes can be limited to only those PEs that have the RT configured.

5.7. BSR multihomed

When a BSR is multihomed, say to two PEs, both the PEs will originate the MCAST-VPN-BSR NLRIs. In such a case, egress PEs SHOULD take the NLRIs from the PE based on single forwarder selection procedure described in section 9.1.2 of [RFC6513].

6. Protocol Details

6.1. Originating Group-to-RP Mapping route for bootstrap messages received on a VRF

When a PE router receives a BSM on a CE-facing interface that is the RPF interface towards the BSR, or when it is configured with a local RP and is the elected BSR, the router adds the mappings to its local copy of the Group-to-RP set. The PE router then forms BGP NLRIs, as described in the previous section, based on the received BSR message. The PE router determines whether there were any previous advertisements from the same BSR and whether the BSM content has changed. If the routes have already been advertised and are unchanged by the BSR message, they are not re-advertised in BGP. Refer to Section 3.4 of the BSR specification [RFC5059] for forwarding of received BSR messages. Unless the BSR implementation requires a particular BSM to be blocked, BSR messages need to be forwarded via BGP to the egress PEs.

When a new BSR comes up, it generates a BSM with empty content. In such a case, the PE generates only a "BSR Parameters" route, with the BSR address and priority fields filled in. Even if the BSR that sent the empty BSM is not the preferred BSR (the current E-BSR is better), a "BSR Parameters" NLRI MUST be generated with this BSR address and sent to the remote PEs in order to maintain consistency with the BSR implementation. The local BSR implementation will take care of choosing the right BSR.

If a PE is rebooted or newly added, it may receive a BSM with "No-Forward" bit set or a unicasted BSM from the CE to which it formed PIM neighborship. In either case, PE MUST originate the required NLRIs from the BSM and forward the same to the remote PEs. There is no need to carry "No-Forward" bit in BGP for this scenario.

As discussed in Section 5.1.2, if the BSM is fragmented and the fragmentation boundary is at a group prefix, there is no way to tell whether more fragments will arrive. Hence, the BSR routes are advertised as soon as the BSM is received. If another BSM is received with the same "Fragment Tag" field at a later time, the BSR implementation treats it as part of the same BSM that was received earlier. Hence, these BSMs are converted to the respective BSR NLRIs and advertised to the BGP peers.

6.2. Handling changes in BSR messages

A PE router may need to withdraw a Group-to-RP mapping for which it has originated an advertisement based on several conditions. If a BSM is received from a CE with a holdtime of zero for the mapping, or if the local PE is the BSR and an RP is unconfigured, then the advertisement MUST be withdrawn immediately. In addition, scenarios such as a BSM missing an RP mapping entry, or BSMs missing entirely, may necessitate withdrawal of advertised mappings. A change may also happen to a group: a new group may be added or an existing group may be removed. These changes need to be propagated accordingly through BGP.

6.2.1. Missing RP or Group mapping entry

Assume a scenario where a given Group prefix had 100 RPs in the BSM received from a BSR. In the next periodic update after the BS_Period interval, only 99 RPs are present for that group. This can happen when an RP did not go down gracefully (i.e., it did not advertise with Hold Time = 0). The BSR implementation on the PE will continue to keep the RP mapping until the "RP Hold Time" expires. However, this needs to be communicated via BGP.

In this scenario, originating PE will continue to keep the "BSR Parameters" route unchanged. The "BSM Group Parameters" route for the respective group is re-advertised with change in the "RP Count" value in the BSR-BGP path attribute. "Fragment Tag" field in the path attribute MUST NOT be changed. Along with that, the respective "BSM RP Parameters" route (Type-3) MUST be withdrawn.

An egress PE receiving a changed Type-2 route (BSM Group Parameters) MUST wait until the RP count matches before generating the BSM towards the CE with this change. If the "End-Of-RIB" is negotiated, this check SHOULD be performed only after the "End-Of-RIB" is received. The RP will not be removed from the local Group-to-RP mapping table until the RP hold time expires. The BSM that is advertised towards the CE is changed to reflect the missing RP. This procedure also handles any out-of-sequence messages. For example, if the Type-3 (BSM RP Parameters) withdrawal arrives before the changed Type-2 (BSM Group Parameters), the RP count check fails, making the egress PE wait until the update is complete.

Another scenario to consider is when an RP for a group is removed and a new RP is added. In this case, the "RP Count" remains the same, but the BSM RP Parameters route (Type-3) corresponding to the old RP is withdrawn, and a Type-3 route for the new RP is advertised. Even if these two arrive in two different BGP updates, the corresponding checks at the egress PE will ensure that a BSM is triggered only when the number of Type-3 routes for the group matches the RP count in the corresponding BSM Group Parameters route (Type-2).

Consider another scenario where initially the Group Count was 100. The new BSM received has the same Group Count, but one group has been removed and a new group added. In this situation, an ingress PE need not generate a new "BSR Parameters" (Type-1) route, since the group count did not change. For the group that was removed from the BSM, the corresponding Type-2 route and its Type-3 routes MUST be withdrawn, and a new BSM Group Parameters route (Type-2) with its BSM RP Parameters routes (Type-3) MUST be advertised. At the egress PE, as soon as the Type-2 withdrawal arrives, all the corresponding RP entries (Type-3) are marked inactive and MUST NOT be considered, irrespective of whether withdrawals for those routes have been received. However, PEs SHOULD keep those routes until the actual withdrawals arrive. The "Group Count" and the "RP Count" per group are to be matched before a BSM is generated towards the CE.

Lastly, consider a scenario where a new Group is added to the existing entries. In this case, the "BSR Parameters" route is re-advertised with a modified "Group Count" value, keeping the "Fragment Tag" the same. Along with that, a "BSM Group Parameters" route and "BSM RP Parameters" routes are generated for the new Group and its RPs. Again, the egress PE MUST wait for the values to match before generating the BSM towards the CE.

6.2.2. Missing BSM

As with a missing mapping, an entire BSM can also go missing for several reasons. It could be that the BSR went down ungracefully or that the BSM was lost due to congestion. The BSR specification [RFC5059] defines a timer, BS_Timeout (which defaults to 2*BS_Period + 10 seconds), that must expire before a BSR is declared dead and a new BSR is elected. While most of the scenarios are taken care of by the local BSR implementation on the PEs, the missing BSM also needs to be communicated between the PEs through BGP. In the scenario of a missing BSM, the corresponding "BSR Parameters" (Type-1) route is withdrawn; however, the corresponding "BSM Group Parameters" (Type-2) and "BSM RP Parameters" (Type-3) routes MUST NOT be withdrawn by the ingress PE.

Egress PE receiving a withdrawn "BSR Parameters" route (Type-1) MUST still keep the corresponding Type-2 and Type-3 entries. However, it MUST NOT advertise the BSM to the CE without the Type-1 route present. As soon as the Type-1 is withdrawn, BS_Timeout period has to be started at the egress and upon its expiry, all the Type-2 and Type-3 entries MUST be deleted.

Suppose the egress PE generated a BSM at t=0. At t=1, BS_Period expires at the ingress PE without the periodic BSM having arrived, so the ingress PE withdraws the Type-1 (BSR Parameters) route. The egress PE had already generated its BSM just before the Type-1 withdrawal was received. The egress PE skips the next periodic BSM towards the CE, but the CE is by now "off" by one BS_Period interval. Once BS_Timeout expires, the egress PE removes all the Type-2 and Type-3 entries, whereas the CEs connected to the egress PE would remove them a whole BS_Period later. Hence, to avoid this issue, once BS_Timeout expires, an egress PE MUST generate a new BSM towards the CE with the RP hold time set to "0" for all the Type-2 and Type-3 entries. This brings the CEs in sync with the PEs. After generating the BSM, the PE removes all the Type-2 and Type-3 entries as stated above.
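The expiry procedure above can be sketched as follows. The BS_Period value is the default from [RFC5059], and BS_Timeout is derived from it as described earlier in this section. The representation of the mapping table as a list of (group prefix, RP address, hold time) tuples and the function name are assumptions made for illustration.

```python
BS_PERIOD = 60                      # seconds, default from RFC 5059
BS_TIMEOUT = 2 * BS_PERIOD + 10     # 130 seconds with the default BS_Period


def expire_bsr(mappings: list) -> list:
    """Run when BS_Timeout expires after a Type-1 withdrawal.

    mappings: list of (group_prefix, rp_address, hold_time) entries built
    from the Type-2 and Type-3 routes of the timed-out BSR.
    Returns the contents of the final BSM, with every RP hold time set to
    zero, and then deletes the local entries.
    """
    final_bsm = [(group, rp, 0) for group, rp, _hold in mappings]
    mappings.clear()
    return final_bsm
```

Sending the zero-holdtime BSM before clearing the table is what lets the CEs drop the mappings at the same time as the PE rather than a BS_Period later.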

After the BSR has timed out (after BS_Timeout), when a new BSM arrives from the same BSR, a new "Fragment Tag" MUST be generated by the ingress PE.

6.2.3. Change of Elected BSR

As per the BSR specification [RFC5059], when a preferred BSM is received, the current 'Elected BSR' transitions to the 'Candidate BSR' state and forwards the received BSM. All the routers also change their elected BSR based on the preferred BSM. When an originating PE's local bootstrap module elects a new BSR, all the old Group-to-RP mapping entries advertised for the previous BSR MUST be withdrawn.

6.3. Receiving Group-to-RP Mapping routes for BSR

The PEs receiving the BGP Group-to-RP Mapping route NLRIs act as a proxy. The first step is to check whether the received routes are valid. If the "Fragment Tag" present in the "BSR Parameters" route does not match that of the "BSM Group Parameters" (Type-2) and "BSM RP Parameters" (Type-3) routes, then those entries are considered invalid. Similarly, if the length field of the BSR, RP, or Group Address contains any value other than "0", "32", or "128", the message MUST be considered malformed and MUST be discarded. The PE MUST also run an RPF check for the BSR IP address and verify that the originating PE is the router through which the BSR is reachable. If the preferred route to the BSR is through the core, the RPF check is done as per the MVPN upstream multicast hop (UMH) selection described in the MVPN specification [RFC6513]. If the advertising PE is not the PE matching the UMH selection, or if the preferred route to the BSR is through one of the CE interfaces, the RPF check fails and the routes MUST be ignored.
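
The validation steps above can be sketched as follows. This is a non-normative Python illustration; the function validate_routes and the field names (fragment_tag, addr_len, bsr_addr_len, originating_pe) are hypothetical and do not reflect the NLRI encoding. The UMH selection itself is assumed to be performed elsewhere, with umh_pe set to None when the preferred route to the BSR is through a CE interface.

```python
VALID_ADDR_LENGTHS = {0, 32, 128}

def validate_routes(bsr_route, child_routes, umh_pe):
    """Return the Type-2/Type-3 routes that pass all checks.

    bsr_route:    dict describing the Type-1 route (illustrative fields)
    child_routes: list of dicts describing Type-2 and Type-3 routes
    umh_pe:       PE chosen by UMH selection for the BSR address, or None
                  when the BSR is reached through a CE interface
    """
    # RPF check: the advertising PE must be the selected upstream PE.
    if umh_pe is None or bsr_route["originating_pe"] != umh_pe:
        return []
    # A malformed Type-1 route invalidates everything that depends on it.
    if bsr_route["bsr_addr_len"] not in VALID_ADDR_LENGTHS:
        return []
    accepted = []
    for route in child_routes:
        if route["addr_len"] not in VALID_ADDR_LENGTHS:
            continue  # malformed: discard
        if route["fragment_tag"] != bsr_route["fragment_tag"]:
            continue  # Fragment Tag mismatch: invalid entry
        accepted.append(route)
    return accepted
```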

The receiving PE collects the Group-to-RP mapping routes per BSR IP address and makes an entry in its BSR Group-to-RP mapping table. From the Group-to-RP mappings per BSR, the egress PE forms a BSM. In order to generate the BSM, the PE needs to construct certain fields, such as the Checksum, which are not available in the advertised NLRIs.

Unlike CE PIM routers, the PE receiving Group-to-RP mapping routes via BGP will not receive periodic soft-state refreshes of the mappings every BS_Period. The receiving PE MUST therefore generate periodic BSMs every BS_Period as specified in the BSR specification [RFC5059]. When there is a change in the corresponding Group-to-RP mapping routes, a fresh BSM MUST be triggered once the calculated "Group Count" and "RP Count" values match. The egress PE MUST also ensure that there is a minimum period of BS_Min_Interval between successive BSMs sent towards the CE, as noted in the BSR specification [RFC5059].
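
The interaction between periodic and triggered BSMs can be sketched as a single scheduling function. This is a non-normative Python illustration; the function next_bsm_time and its parameters are hypothetical, and the timer values are assumed to be the RFC 5059 defaults (BS_Period of 60 seconds, BS_Min_Interval of 10 seconds).

```python
BS_PERIOD = 60         # seconds; RFC 5059 default (assumed here)
BS_MIN_INTERVAL = 10   # seconds; RFC 5059 default (assumed here)

def next_bsm_time(now, last_sent, change_pending):
    """Earliest time at which the egress PE should send its next BSM.

    now:            current time in seconds
    last_sent:      time the previous BSM was sent, or None if none yet
    change_pending: True when the mapping routes changed and the
                    "Group Count" and "RP Count" values already match
    """
    if last_sent is None:
        return now
    if change_pending:
        # Triggered BSM, but never closer than BS_Min_Interval to the last.
        return max(now, last_sent + BS_MIN_INTERVAL)
    # No change: wait for the regular periodic refresh.
    return max(now, last_sent + BS_PERIOD)
```

For example, a change that completes 3 seconds after the previous BSM is held back until 10 seconds have elapsed, while an unchanged table is refreshed 60 seconds after the previous BSM.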

When a "BSR Parameters" (Type-1) route is received with a "Group Count" of 0, the PE MUST treat it as an empty BSM. An empty BSM MUST be formed and sent to the CEs with the other relevant fields populated. Other aspects of electing a BSR based on the BSR priority MUST be the same as specified in the BSR specification [RFC5059].

7. Security Considerations

Since a BSR message allows semantic fragmentation, a message can be very large and carry many mappings, thereby causing a PE to generate a large number of Group-to-RP mapping route NLRIs. An implementation SHOULD be able to restrict the number of Groups and RP mappings allowed at the VRF or interface level, so that the number of BGP routes generated for the mappings is controlled.

8. IANA Considerations

This document defines a new NLRI, called MCAST-VPN-BSR, to be carried in BGP using multiprotocol extensions. It requires assignment of a new SAFI.

This document defines a new BGP optional transitive attribute, called BSR-BGP.

9. Acknowledgments

The authors would like to thank Huajin Jeng (AT&T), Jeffrey Haas (Juniper), Yakov Rekhter (Juniper) and Eric Rosen (Cisco) for their valuable review and feedback.

10. References

10.1. Normative References

[RFC4271] Rekhter, Y., Li, T. and S. Hares, "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, January 2006.
[RFC4601] Fenner, B., Handley, M., Holbrook, H. and I. Kouvelas, "Protocol Independent Multicast - Sparse Mode (PIM-SM): Protocol Specification (Revised)", RFC 4601, August 2006.
[RFC4684] Marques, P., Bonica, R., Fang, L., Martini, L., Raszuk, R., Patel, K. and J. Guichard, "Constrained Route Distribution for Border Gateway Protocol/MultiProtocol Label Switching (BGP/MPLS) Internet Protocol (IP) Virtual Private Networks (VPNs)", RFC 4684, November 2006.
[RFC4724] Sangli, S., Chen, E., Fernando, R., Scudder, J. and Y. Rekhter, "Graceful Restart Mechanism for BGP", RFC 4724, January 2007.
[RFC5015] Handley, M., Kouvelas, I., Speakman, T. and L. Vicisano, "Bidirectional Protocol Independent Multicast (BIDIR-PIM)", RFC 5015, October 2007.
[RFC5059] Bhaskar, N., Gall, A., Lingard, J. and S. Venaas, "Bootstrap Router (BSR) Mechanism for Protocol Independent Multicast (PIM)", RFC 5059, January 2008.
[RFC6513] Rosen, E. and R. Aggarwal, "Multicast in MPLS/BGP IP VPNs", RFC 6513, February 2012.
[RFC6514] Aggarwal, R., Rosen, E., Morin, T. and Y. Rekhter, "BGP Encodings and Procedures for Multicast in MPLS/BGP IP VPNs", RFC 6514, February 2012.
[RFC6515] Aggarwal, R. and E. Rosen, "IPv4 and IPv6 Infrastructure Addresses in BGP Updates for Multicast VPN", RFC 6515, February 2012.
[RFC6226] Joshi, B., Kessler, A. and D. McWalter, "PIM Group-to-Rendezvous-Point Mapping", RFC 6226, May 2011.

10.2. Informative References

[RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4364, February 2006.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

Authors' Addresses

Pavan Kurapati Juniper Networks 1194 N. Mathilda Ave. Sunnyvale, CA 94089 USA EMail: kurapati@juniper.net URI: http://www.juniper.net/
Marco Rodrigues Juniper Networks 1194 N. Mathilda Ave. Sunnyvale, CA 94089 USA EMail: mprodrigues@juniper.net URI: http://www.juniper.net/
Kurt Windisch Juniper Networks 1194 N. Mathilda Ave. Sunnyvale, CA 94089 USA EMail: kurtw@juniper.net URI: http://www.juniper.net/
Saud Asif AT&T LABS 200 S Laurel Ave. Middletown, NJ 07748 USA EMail: sasif@att.com URI: http://www.att.com/