l3vpn                                                            D. Rao
Internet-Draft                                                   V. Jain
Intended status: Standards Track                           Cisco Systems
Expires: January 17, 2014                                  July 16, 2013
L3VPN Virtual Network Overlay Multicast
draft-drao-l3vpn-virtual-network-overlay-multicast-00
Abstract
Virtual network overlays are extremely useful for supporting
multicast applications in multi-tenant data center networks, by
distributing the per-tenant multicast state and forwarding actions at
the network edges with minimal control plane load and simpler
forwarding in the core. A virtual overlay network may use existing
encapsulations such as MPLS-in-GRE or newer IP-based encapsulations
such as VXLAN and NVGRE.
IP multicast sources and receivers are commonly spread across
multiple subnets, and may also be spread within and outside a single
network domain such as a data center. Hence, a Layer-3 multicast
paradigm is the most suitable and efficient approach for delivery of
IP multicast traffic.
BGP-based MVPNs provide a good basis for a solution that supports IP
multicast across these overlay networks. An appropriate subset of the
MVPN control plane and procedures is sufficient to support the
requirements, providing for a simpler model of operation.
This document describes the use of BGP-based MVPNs along with the new
IP-based virtual network overlay encapsulations to provide a Layer-3
virtualization solution for IP multicast traffic, and specifies
mechanisms to use the new encapsulations while continuing to leverage
the BGP MVPN control plane techniques and extensions.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 17, 2014.
Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction
2. Requirements Language
3. Solution Requirements
4. Network Topology
5. Solution Scenarios
6. Fine-grained Transit Pruning
7. Originating interest at a receiver edge device
8. Source mapping from a sender edge device
9. IANA Considerations
10. Security Considerations
11. Change Log
12. References
12.1. Normative References
12.2. Informative References
Authors' Addresses
1. Introduction
Virtual network overlays are extremely useful for supporting
multicast applications in multi-tenant data center networks, by
distributing the per-tenant multicast state and forwarding actions at
the network edges with minimal control plane load and simpler
forwarding in the core. A virtual overlay network may use existing
encapsulations such as MPLS-in-GRE or newer IP-based encapsulations
such as VXLAN and NVGRE.
IP multicast sources and receivers are commonly spread across
multiple subnets, and may also be spread within and outside a single
network domain such as a data center. Hence, a Layer-3 multicast
paradigm is the most suitable and efficient approach for delivery of
IP multicast traffic.
To send multicast data to multiple receivers across the overlay,
packets are sent on a core tree (P-tunnel) that is typically realized
using an IP multicast encapsulation such as the ones mentioned above.
In order to reduce the number of multicast P-tunnels that need to be
set up and maintained in the core network, aggregate core trees may
be used, with multiple VPNs being supported over a given P-tunnel.
P-tunnel encapsulations such as MPLS-in-GRE, VXLAN, and NVGRE carry a
VN-ID or VPN label in the encapsulation header, which can be used to
distinguish the VPN.
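As an illustration of this demultiplexing, the following sketch
(Python, purely for exposition; the function name and example values
are not part of any specification) extracts the 24-bit VNI from the
8-octet VXLAN header defined in [I-D.mahalingam-dutt-dcops-vxlan]:

   # Illustrative only: parse the VNI out of a VXLAN header.
   # VXLAN header: Flags (1 octet, I bit 0x08) | Reserved (3) |
   #               VNI (3) | Reserved (1)

   VXLAN_FLAG_I = 0x08  # "I" bit: VNI field is valid

   def parse_vxlan_vni(header: bytes) -> int:
       if len(header) < 8:
           raise ValueError("VXLAN header is 8 octets")
       if not header[0] & VXLAN_FLAG_I:
           raise ValueError("I flag not set; VNI not valid")
       return int.from_bytes(header[4:7], "big")  # octets 4..6

   # Example: flags=0x08, VNI=5000 (0x001388)
   assert parse_vxlan_vni(bytes([0x08, 0, 0, 0,
                                 0x00, 0x13, 0x88, 0])) == 5000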
BGP-based MVPNs provide a good basis for a solution that supports IP
multicast across these overlay networks. An appropriate subset of the
MVPN control plane and procedures is sufficient to support the
requirements, providing for a simpler model of operation.
This document describes the use of BGP-based MVPNs along with the IP-
based virtual network overlay encapsulations to provide a Layer-3
virtualization solution for IP multicast traffic, and specifies
mechanisms to use the new encapsulations while continuing to leverage
the BGP MVPN control plane techniques and extensions.
This mechanism provides an efficient incremental solution to support
forwarding of IP multicast traffic, irrespective of whether the
traffic stays within an IP subnet or crosses subnet boundaries.
2. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to
be interpreted as described in [RFC2119] only when they appear in all
upper case. They may also appear in lower or mixed case as English
words, without any normative meaning.
3. Solution Requirements
1. Support for a large number of senders and receivers for a given
group
2. Receivers and groups spread out across a large number of PEs or
edge nodes
3. Support for a large number of C-multicast routes
4. Support for multi-tenancy
5. Support for optimal multicast replication within and across
subnets
6. Support for optimal pruning on transit nodes
7. Minimal control plane churn
8. Minimal state in the core
9. Support for multiple overlay encapsulations
10. Support for redundancy and load-balancing
4. Network Topology
In an environment such as the data center where overlay networks are
used, the IP multicast sources and receivers are hosts typically
resident on servers attached to network access devices such as
virtual software routers or physical access switches. These hosts
may belong to different tenants who require isolation and segregation
in forwarding state and traffic. At the same time, the physical
transport or core network must be shared for traffic from multiple
tenants.
This requires the edge devices to support multiple VPNs facing the
access interfaces, with a shared overlay encapsulation towards the
core.
Due to the high volume of traffic that flows among the hosts on
various servers within the data center, a DC environment is likely to
use uniform, densely meshed topologies, such as a spine-leaf
topology, which provide high redundancy and bandwidth. The servers
may be dually attached to a pair of access switches for edge
redundancy.
In order to scale multicast in the core, as well as to efficiently
support the common case where a large number of senders and receivers
are present in a VPN, shared trees are used for the overlay; Bidir
multicast is the preferred mode. Also, to support a large number of
VPNs efficiently, aggregate trees are used, with multiple VPNs being
multiplexed over a core tree. The various IP overlay encapsulations
such as VXLAN, NVGRE, and MPLS-in-GRE contain a VN-ID or VPN label in
their header, which is used as a distinguisher for an MVPN in the data
plane.
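As a concrete illustration, the sketch below (Python, for exposition)
shows a receiver edge device resolving the VN-ID in the overlay header
to a tenant multicast table, and the inner (C-S, C-G) to a local
replication list; the table contents and interface names are
assumptions, not taken from this document:

   # Illustrative VN-ID demultiplexing at a receiver edge device.
   # VN-ID -> per-tenant multicast forwarding table;
   # (C-S, C-G) -> local access interfaces to replicate onto.
   vrf_by_vnid = {
       5000: {("10.1.1.10", "239.1.1.1"): ["eth1", "eth2"]},
       6000: {("10.2.2.20", "239.2.2.2"): ["eth3"]},
   }

   def replicate(vnid, c_source, c_group):
       vrf = vrf_by_vnid.get(vnid)
       if vrf is None:
           return []  # unknown tenant: drop
       return vrf.get((c_source, c_group), [])

   print(replicate(5000, "10.1.1.10", "239.1.1.1"))
   # -> ['eth1', 'eth2']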
5. Solution Scenarios
There are a couple of overall scenarios which are applicable for
these virtual overlay networks.
In one mode of operation, all overlay multicast traffic is
encapsulated within a fixed number of core trees or P-tunnels that
all edge nodes or PEs can be a part of. Both source and receiver edge
devices can then independently encapsulate and decapsulate overlay
traffic, without requiring additional signaling among them.
In this mode, source edge devices map a given C-flow from a locally
attached source onto a specific core multicast tree or P-tunnel based
on a local mapping decision. Receiver edge devices join all the
available P-tunnels, and hence are able to receive traffic from these
source edge devices. They then discard the traffic that they do not
have local receivers for, and replicate the interesting flows towards
local receivers.
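One plausible form of such a local mapping decision is a consistent
hash of the C-flow key over the fixed set of P-tunnel group addresses,
sketched below; the addresses and function name are illustrative, and
nothing in this document mandates a hash-based placement:

   import hashlib

   # Illustrative fixed set of core tree (P-tunnel) groups that
   # all edge devices join.
   P_TUNNELS = ["239.100.0.1", "239.100.0.2",
                "239.100.0.3", "239.100.0.4"]

   def select_p_tunnel(vnid, c_source, c_group):
       # Hash the (VN-ID, C-S, C-G) key so a given C-flow is
       # consistently pinned to one core tree.
       key = f"{vnid}:{c_source}:{c_group}".encode()
       digest = hashlib.sha1(key).digest()
       return P_TUNNELS[int.from_bytes(digest[:4], "big")
                        % len(P_TUNNELS)]

   print(select_p_tunnel(5000, "10.1.1.10", "239.1.1.1"))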
It is, however, desirable that receiver edge devices not receive all
traffic and then have to filter out the unnecessary flows. This is
especially applicable in high-bandwidth traffic environments.
Supporting this ability requires signaling from a receiver edge
device indicating its interest in specific C-groups.
In addition, high-bandwidth sourced traffic flows may be sent on a
specific P-tunnel determined by the source edge device, which
requires interested receiver edge devices to join that core tree. In
such cases, it is also beneficial if the sourced multicast traffic is
sent out into the overlay only if there are known to be receivers at
other edge devices or external to the overlay.
All such cases require signaling from both source and receiver edge
nodes: host group interest from receiver edge nodes, and the C-flow
to P-tunnel mappings from source edge nodes.
6. Fine-grained Transit Pruning
Typically, when an overlay is used, the transit nodes in the physical
topology only participate in the underlay control plane and forward
all traffic based on the outer headers of the encapsulated packets.
They are unaware of the inner header or payload. This allows them to
be simpler and to scale better.
One consequence of using an overlay is that multicast traffic must
reach the receiver edge nodes before it can be pruned when there are
no downstream receivers.
A widely deployed option to keep redundant traffic from reaching all
the receiver edge nodes is to use a larger number of core multicast
trees, thereby reducing the ratio of overlay flows to each multicast
tree and allowing the receiver edge nodes to join only the multicast
trees that the interesting flows map to. This has the side effect of
increasing the underlay multicast state in the core, and hence the
load on the underlay multicast protocols such as PIM. It also does
not provide for complete pruning of multicast traffic.
However, it is possible in certain topologies to support an
alternative mechanism for fine-grained pruning of multi-destination
traffic.
In the uniform, meshed topologies mentioned earlier, certain transit
nodes can use the receiver interest information sent by the receiver
edge devices for the overlay to filter traffic on outgoing links
towards the receiver edge nodes, on a per-VPN or more fine-grained
basis.
This allows the core multicast trees built by PIM to be smaller in
number. The transit nodes do need to run MP-BGP to obtain the overlay
multicast group information sent by the various receiver nodes; they
will typically obtain this information from the RRs.
This assumes the core devices can look into the inner headers of
packets to prune traffic based on either the VN-ID or the (*,G) and
(S,G) entries. In order for the join routes to be sent to the transit
nodes, the appropriate RTs will be provisioned on the transit nodes.
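The sketch below illustrates this per-link pruning, assuming the
transit node has built, from the join routes learned via the RRs, a
per-outgoing-link set of (VN-ID, C-group) entries; the data structure
and link names are assumptions made for exposition:

   # Per outgoing link, the (VN-ID, C-group) pairs for which some
   # downstream receiver edge device advertised interest via MP-BGP.
   interest = {
       "link-to-leaf1": {(5000, "239.1.1.1")},
       "link-to-leaf2": {(5000, "239.1.1.1"), (6000, "239.2.2.2")},
       "link-to-leaf3": set(),
   }

   def prune_replication(vnid, c_group, core_olist):
       # Keep only links with known downstream interest in this
       # (VN-ID, C-group); links without an entry are pruned.
       return [link for link in core_olist
               if (vnid, c_group) in interest.get(link, set())]

   olist = ["link-to-leaf1", "link-to-leaf2", "link-to-leaf3"]
   print(prune_replication(5000, "239.1.1.1", olist))
   # -> ['link-to-leaf1', 'link-to-leaf2']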
7. Originating interest at a receiver edge device
A receiver edge device will originate shared or source tree joins on
the overlay for receivers that are locally attached or downstream to
it. This will be triggered by locally received IGMP or PIM joins.
The receiver edge device will also typically act as a PIM first-hop
or last-hop router with attached sources and receivers.
These join routes need to be propagated towards potential sources as
well as the RPs. In a virtual network overlay topology, sources are
attached to edge devices, so the other edge devices must receive
these join routes.
Where multiple sources are spread out across a large number of edge
devices, or where the groups are Bidir groups, a shared tree join
route may be propagated to all edge devices.
For high bandwidth sources, it is desirable to direct the specific
group or source joins to the appropriate source leaf nodes. More
granular filtering is needed in this case: in addition to per-VRF
filtering, group-based active source discovery and advertisement is
used to control join propagation.
A receiver leaf originates a C-multicast route of type Shared Tree
Join (C-RP, C-G) or Source Tree Join (C-S, C-G). To be able to
propagate the shared joins to all edge devices, the join routes may be
originated with an RD that is specific to the originating device.
This RD can be the same value as that used by the unicast routes. The
routes are sent with an RT that all interested edge devices may use as
an import RT for this VPN, if they need to receive the join routes.
Route propagation is constrained based on policy (RT) along the path.
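For concreteness, a minimal sketch of such a join encoding follows,
using the C-multicast route NLRI layout from the BGP MVPN
specification (RFC 6514), in which Shared Tree Join is route type 6
and Source Tree Join is route type 7; the RD, AS number, and addresses
below are illustrative values:

   import ipaddress
   import struct

   SHARED_TREE_JOIN = 6  # route types per RFC 6514
   SOURCE_TREE_JOIN = 7

   def c_multicast_nlri(route_type, rd, source_as,
                        c_src_or_rp, c_group):
       src = ipaddress.ip_address(c_src_or_rp).packed
       grp = ipaddress.ip_address(c_group).packed
       body = (rd + struct.pack("!I", source_as)
               + bytes([len(src) * 8]) + src    # length in bits
               + bytes([len(grp) * 8]) + grp)
       # MCAST-VPN NLRI: route type (1 octet), length (1), body
       return bytes([route_type, len(body)]) + body

   # Originator-specific RD (type 0: AS 65000, assigned number 1),
   # the same value as used by this device's unicast routes.
   rd = struct.pack("!HHI", 0, 65000, 1)
   nlri = c_multicast_nlri(SHARED_TREE_JOIN, rd, 65000,
                           "192.0.2.1", "239.1.1.1")
   print(nlri.hex())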
8. Source mapping from a sender edge device
An edge device that is attached to a source may signal the active
source information on the overlay, along with the core multicast
group that the edge device decides to use. Receiver edge nodes can
use this information to join the core multicast trees.
This mapping is advertised using S-PMSI A-D routes, with the PMSI
Tunnel Attribute tunnel type indicating the appropriate overlay
encapsulation: VXLAN or NVGRE. Aggregate trees will be used, with the
VN-ID being signaled along with the route.
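A hedged sketch of the corresponding tunnel attribute follows, using
the PMSI Tunnel Attribute layout from RFC 6514 (Flags, Tunnel Type, a
3-octet label field, Tunnel Identifier). The VXLAN tunnel type value
shown is a placeholder pending IANA assignment, and carrying the
24-bit VN-ID in the 3-octet label field is an assumption made for
exposition:

   import ipaddress

   TUNNEL_TYPE_VXLAN = 0xFA  # placeholder; not an assigned value

   def pmsi_tunnel_attribute(vnid, core_group, flags=0):
       # Flags (1) | Tunnel Type (1) | label field (3) carrying
       # the VN-ID | Tunnel Identifier: the P-tunnel group address
       label_field = (vnid & 0xFFFFFF).to_bytes(3, "big")
       tunnel_id = ipaddress.ip_address(core_group).packed
       return (bytes([flags, TUNNEL_TYPE_VXLAN]) + label_field
               + tunnel_id)

   attr = pmsi_tunnel_attribute(vnid=5000,
                                core_group="239.100.0.1")
   print(attr.hex())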
9. IANA Considerations
To be completed
10. Security Considerations
To be completed
11. Change Log
12. References
12.1. Normative References
[RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A Border
Gateway Protocol 4 (BGP-4)", RFC 4271, January 2006.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2784] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P.
Traina, "Generic Routing Encapsulation (GRE)", RFC 2784,
March 2000.
[RFC5512] Mohapatra, P. and E. Rosen, "The BGP Encapsulation
Subsequent Address Family Identifier (SAFI) and the BGP
Tunnel Encapsulation Attribute", RFC 5512, April 2009.
12.2. Informative References
[I-D.mahalingam-dutt-dcops-vxlan]
Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
L., Sridhar, T., Bursell, M., and C. Wright, "VXLAN: A
Framework for Overlaying Virtualized Layer 2 Networks over
Layer 3 Networks", draft-mahalingam-dutt-dcops-vxlan-02
(work in progress), August 2012.
[I-D.sridharan-virtualization-nvgre]
Sridharan, M., Greenberg, A., Venkataramaiah, N., Wang,
Y., Duda, K., Ganga, I., Lin, G., Pearson, M., Thaler, P.,
and C. Tumuluri, "NVGRE: Network Virtualization using
Generic Routing Encapsulation", draft-sridharan-
virtualization-nvgre-02 (work in progress), February 2013.
Authors' Addresses
Dhananjaya Rao
Cisco Systems
170 W. Tasman Drive
San Jose, CA 95134
USA
Email: dhrao@cisco.com
Vipin Jain
Cisco Systems
170 W. Tasman Drive
San Jose, CA 95134
USA
Email: vipijain@cisco.com