L2VPN Working Group                                          Nabil Bitar
Internet Draft                                                   Verizon
Intended status: Informational
Expires: April 2014
                                                            Florin Balus
                                                           Marc Lasserre
                                                          Wim Henderickx
                                                          Alcatel-Lucent

                                                              John Drake
                                                        Juniper Networks

                                                             Ali Sajassi
                                                             Luyuan Fang
                                                                   Cisco

                                                               Lucy Yong
                                                                  Huawei

                                                          Yuichi Ikejiri
                                                      NTT Communications

                                                             Susan Hares
                                                                   ADARA

                                                           Mircea Pisica
                                                                      BT

                                                        October 21, 2013


       Cloud Networking: VPN Applicability and NVo3 Gap Analysis
               draft-bitar-nvo3-vpn-applicability-02.txt
Abstract
Multi-tenant data centers and clouds provide computing,
storage and network resources dedicated per tenant. The
current focus in the evolution of multi-tenant data-center and
cloud networks is to (1) support a large number of tenants
with a large number of communicating systems, (2) provide
isolation among tenant virtual networks, (3) provide for
efficient network utilization, and (4) support virtual machine
mobility and network elasticity that match compute and storage
elasticity.
The NVo3 work effort initially targets identifying the
requirements of large multi-tenant data centers and developing
a framework architecture that addresses those requirements. In
addition, it aims to identify existing or evolving solutions
used in cloud networking, their applicability to NVo3, and any
gaps they may have in addressing the NVo3 requirements. This
document describes the applicability of
existing work in various IETF Working Groups (e.g., RFCs and
drafts developed or evolving in IETF L2VPN and L3VPN Working
Groups) to cloud networking and NVo3, as well as the gaps and
problems that need to be further addressed.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet
Engineering Task Force (IETF), its areas, and its working
groups. Note that other groups may also distribute working
documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as
"work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed
at http://www.ietf.org/shadow.html
This Internet-Draft will expire on April 21, 2014.
Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as
the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date
of publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document.
Table of Contents
1. Introduction.............................................. 4
2. General terminology....................................... 6
2.1. Conventions used in this document.................... 7
3. Brief overview of Ethernet, L2VPN and L3VPN deployments... 7
4. Generic Cloud Networking Architecture..................... 9
5. Challenges in Existing Deployments....................... 13
5.1. VLAN Space Limitation............................... 14
5.2. MAC, IP, and ARP Issues............................. 14
5.3. Per VLAN flood containment.......................... 17
5.4. Convergence and multipath support................... 17
5.5. Optimal traffic forwarding.......................... 18
5.6. Efficient multicast................................. 20
5.7. L3 virtualization................................... 21
5.8. Connectivity to existing tenant VPN sites........... 21
5.9. DC Inter-connect requirements....................... 22
5.10. VM Mobility........................................ 22
6. L2VPN Applicability to Cloud Networking.................. 24
6.1. VLANs and L2VPN toolset............................. 24
6.2. E-VPN............................................... 27
6.3. PBB and L2VPN toolset............................... 30
6.3.1. Addressing VLAN space exhaustion and MAC
explosion............................................. 32
6.3.2. Fast convergence and L2 multi-pathing.......... 32
6.3.3. Per ISID flood containment..................... 34
6.3.4. Efficient multicast support.................... 34
6.3.5. Tunneling options for PBB ELAN: Ethernet, IP and
MPLS.................................................. 34
6.3.6. Use Case examples.............................. 35
6.3.7. NVo3 applicability............................. 38
6.3.8. Connectivity to existing VPN sites and Internet 40
6.3.9. DC Interconnect................................ 43
6.3.10. Interoperating with existing DC VLANs......... 44
6.4. TRILL and L2VPN toolset............................. 46
7. L3VPN applicability to Cloud Networking.................. 47
8. VM Mobility with E-VPN................................... 50
8.1. Layer 2 Extension Solution.......................... 50
8.2. VM Default Gateway Solutions........................ 53
8.2.1. VM Default Gateway Solution 1.................. 53
8.2.2. VM Default Gateway Solution 2.................. 54
9. Solutions and Considerations for other DC challenges..... 55
9.1. Addressing IP/ARP explosion......................... 55
9.2. Optimal traffic forwarding.......................... 55
9.3. VM Mobility......................................... 55
9.4. Dynamic provisioning of network services............ 56
9.5. Considerations for Layer2 and Layer3 VPNS on End-
systems.................................................. 57
10. Operator Considerations................................. 57
11. Security Considerations................................. 58
12. IANA Considerations..................................... 58
13. References.............................................. 58
13.1. Normative References............................... 58
13.2. Informative References............................. 59
14. Acknowledgments......................................... 61
1. Introduction
The initial Data Center (DC) networks were built to address the
needs of individual enterprises and/or individual applications.
Ethernet VLANs and regular IP routing were used to provide
connectivity among compute resources, storage resources, and the
related customer sites.
The virtualization of compute resources in a Data Center (DC)
environment provides the foundation for providing compute and
storage resources to multiple tenants (customers), and/or for
providing application services to multiple tenants. For example,
a tenant may be provided a group of Virtual Machines (VMs) that
may reside on server blades distributed throughout a DC or across
DCs. In this latter case, the DCs may be owned and operated by a
cloud service provider connected to one or more network service
providers, two or more cloud service providers each connected to
one or more network service providers, or a hybrid of DCs
operated by the customer and the cloud service provider(s). In
addition, multiple tenants may be assigned resources on the same
compute and storage hardware.
In order to provide access for multiple tenants to the
virtualized compute and storage resources, the DC network and DC
interconnect have to evolve from the basic VLAN and IP routing
architecture to provide equivalent connectivity virtualization at
a large scale.
[NVo3-problem-statement] describes the problems faced in large
multi-tenant data centers, and motivates the need for overlays to
address these problems. The main problems highlighted are: (1)
support for a large number of tenants, (2) network infrastructure
scale, (3) isolation among tenant virtual networks with
overlapping address spaces across tenants, and (4) support for
virtual machine mobility, network elasticity, and accompanying
dynamic network provisioning. [NVo3-fmwk] describes a framework
architecture for NVo3, while [NVo3-dp-reqts] and [NVo3-cp-reqts]
describe NVo3 data plane and control plane requirements,
respectively. Prior to the NVo3 effort initiation, a number of
technologies had been used to address network virtualization.
Some had also been deployed in data centers and cloud networks,
and/or had been further evolved to address requirements of large
multi-tenant data centers and cloud networks. The natural
question is how these technologies address multi-tenant cloud
networking problems as described in [NVo3-problem-statement],
what gaps they still need to address, and how they compare to
the NVo3 architecture framework [NVo3-fmwk]. This document
addresses that question. Further evolution of this document may
target a more detailed comparison of these technologies against
the evolving NVo3 data plane and control plane requirements.
Virtual LAN bridging and Virtual Private Network (VPN)
technologies had been developed and deployed to support virtual
networks with overlapping address spaces over a common
infrastructure. Some of these technologies also use various
overlay techniques to enable the sharing and scaling of an
underlay network infrastructure. Those technologies have been
used in data-center and cloud networks. However, these
technologies, originally developed for relatively static
environments in terms of communicating endpoints, do not address
all the requirements arising in cloud-computing environments,
specifically multi-tenant environments.
This document starts with a brief overview of Ethernet, Layer2
and Layer3 VPN deployments. It then describes a generic data
center architecture. This architecture is used in subsequent
sections as a basis for describing how different VPN technologies
apply in DCs and cloud networks, and which problems described in
[NVo3-problem-statement] they address. In addition, it provides a
comparison
among these technologies and the NVo3 architecture framework at
the functional level.
2. General terminology
Some general terminology is defined here; most of the
terminology used is from [802.1ah], [RFC4026] and [NVo3-fmwk].
Terminology specific to this memo is introduced as needed in
later sections.
DC: Data Center
ELAN: MEF ELAN, multipoint-to-multipoint Ethernet service
EVPN: Ethernet VPN as defined in [EVPN]
PBB: Provider Backbone Bridging, a new Ethernet encapsulation
designed to address VLAN exhaustion and MAC explosion issues;
specified in IEEE 802.1ah [802.1ah]
PBB-EVPN: defines how EVPN can be used to transport PBB frames
BMAC: Backbone MACs, the backbone source or destination MAC
address fields defined in the 802.1ah provider MAC
encapsulation header.
CMAC: Customer MACs, the customer source or destination MAC
address fields defined in the 802.1ah customer MAC
encapsulation header.
BEB: A backbone edge bridge positioned at the edge of a
provider backbone bridged network. It is usually the point in
the network where PBB encapsulation is added or removed from
the frame.
BCB: A backbone core bridge positioned in the core of a
provider backbone bridged network. It performs regular
Ethernet switching using the outer Ethernet header.
2.1. Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described
in RFC-2119 [RFC2119].
In this document, these words will appear with that
interpretation only when in ALL CAPS. Lower case uses of
these words are not to be interpreted as carrying RFC-2119
significance.
3. Brief overview of Ethernet, L2VPN and L3VPN deployments
Ethernet networks were initially deployed in LAN
environments, where the total number of hosts (and hence MAC
addresses) to manage was limited. Physical Ethernet topologies
in LANs were fairly simple. Hence, a simple loop resolution
protocol such as the Spanning Tree Protocol (STP) was
sufficient in the early days. Efficient utilization of
physical links was not a major concern in LANs; the priority
was to leverage existing and mature technologies.
As more hosts got connected to a LAN, or the need arose to
create multiple LANs on the same physical infrastructure, it
became necessary to partition the physical topology into
multiple Virtual LANs (VLANs) [802.1q]. A VLAN is identified
by a VLAN ID in the 802.1q VLAN tag inserted in the Ethernet
header. STP evolved to cope with multiple VLANs with Multiple-
STP (MSTP). Bridges/Switches evolved to learn behind which
VLAN specific MACs resided, a process known as qualified
learning, requiring MACs to be unique only in the VLAN
context. As Ethernet LANs moved into the provider space, the
12-bit VLAN space limitation (i.e., a total of 4094 usable
VLANs, with IDs 0 and 4095 reserved) led to VLAN stacking
(Q-in-Q) and later to Provider Backbone Bridging (PBB).
With PBB, not only can over 16M virtual LAN instances (24-bit
Service I-SID) be supported, but also a clean separation
between customer and provider domains has been defined with
separate MAC address spaces (Customer-MACs (CMACs) versus
Provider Backbone-MACs (BMACs)). CMACs are only learned at the
edge of the PBB network on PBB Backbone Edge Bridges (BEBs) in
the context of an I-component while only B-MACs are learnt by
PBB Backbone Core Bridges (BCBs). This results in BEB switches
creating MAC-in-MAC tunnels to carry customer traffic, thereby
hiding C-MACs in the core.
In the meantime, interconnecting L2 domains across
geographical areas has become a necessity. VPN technologies
have been defined to carry both L2 and L3 traffic across
IP/MPLS core networks. The same technologies could also be
used within the same data center to provide for scale or for
interconnecting services across L3 domains, as needed. Virtual
Private LAN Service (VPLS) has been used to provide
transparent LAN services over IP/MPLS WANs while IP VPNs,
including BGP/MPLS IP VPNs and IPsec VPNs, have been used to
provide virtual IP routing instances over a common IP/MPLS
core network.
All these technologies have been combined to maximize their
respective benefits. At the edge of the network, such as in
access networks, VLAN and PBB are commonly used technologies.
Aggregation networks typically use VPLS or BGP/MPLS IP VPNs to
groom traffic on a common IP/MPLS core.
It should be noted that Ethernet has kept evolving because of
its attractive features, specifically its auto-discovery
capabilities and the ability of hosts to physically relocate
on the same LAN without requiring renumbering. In addition,
Ethernet switches have become commodity hardware, creating a
financial incentive for interconnecting hosts in the same
community with Ethernet switches. The network layer (Layer3),
on the other hand, has become predominantly IP. Thus,
communication across LANs uses IP routing.
4. Generic Cloud Networking Architecture
A generic architecture for Cloud Networking is depicted in
Figure 1.
,---------.
,' `.
( IP/MPLS )
`. ,'
`-+------+'
+--+--+ +-+---+
| GW |+-+| GW |
+-+---+ +-----+
/ \
+----+---+ +---+-----+
| Core | | Core |
| SW/Rtr | | SW/Rtr |
+-+----`.+ +-+---+---+
/ \ .' \
+---+--+ +-`.+--+ +--+----+
| ToR | | ToR | | ToR |
+-+--`.+ +-+-`.-+ +-+--+--+
.' \ .' \ .' `.
__/_ _i./ i./_ _\__
:VSw : :VSw : :VSw : :VSw :
'----' '----' '----' '----'
Figure 1 : A Generic Architecture for Cloud Networking
A cloud network is composed of intra-Data Center (DC) networks
and network services, and inter-DC network connectivity. DCs
may belong to a cloud service provider connected to one or
more network service providers, different cloud service
providers each connected to one or more network service
providers, or a hybrid of DCs operated by the enterprise
customers and the cloud service provider(s). It may also
provide access to the public and/or enterprise customers.
The following network components are present in a DC:
- VSw or virtual switch: a software-based Ethernet switch
running inside the server blades. A VSw may be single- or
dual-homed to the Top of Rack switches (ToRs). The
individual VMs appear to a VSw as IP hosts connected via
logical Ethernet interfaces. The VSw may evolve to
support IP routing functionality.
- ToR or Top of Rack: a hardware-based Ethernet switch
aggregating all Ethernet links from the server blades in
a rack; it represents the entry point into the physical
DC network for the hosts. ToRs may also perform routing
functionality. ToRs are usually dual-homed to the Core
SWs. Other deployment scenarios may use an EoR (End of
Row) switch to provide a similar function to a ToR.
- Core SW (switch): high capacity core node aggregating
multiple ToRs. This is usually a cost effective Ethernet
switch. Core switches can also support IP routing
capabilities.
- DC GW: gateway to the outside world providing DC
interconnect and connectivity to Internet and VPN
customers. In the current DC network model, this may be a
router with virtual routing capabilities and/or an IP VPN
[RFC4364]/L2VPN [RFC4761][RFC4762] Provider Edge (PE).
A DC network also contains other network services, such as
firewalls, load-balancers, IPsec gateways, and SSL
acceleration gateways. These network services are not
currently discussed in this document as the focus is on the
routing and switching services. The traditional DC deployment
employs VLANs to isolate different VM groups throughout the
Ethernet switching network within a DC. The VM Groups are
mapped to VLANs in the vSws. The ToRs and Core SWs may employ
VLAN trunking to eliminate provisioning touches in the DC
network. In some scenarios, IP routing is extended down to the
ToRs, and may be further extended to the hypervisor as
discussed earlier. However, unless this routing provides a
virtual forwarding function, it is limited to one IP domain
addressed from a single address realm.
Any new DC and cloud networking technology should be able
to fit as seamlessly as possible into this existing DC model,
at least in a non-greenfield environment. In particular, it
should be possible to introduce enhancements to various tiers
in this model in a phased approach without disrupting the
other elements.
Depending upon the scale, DC distribution, operations model,
Capex and Opex aspects, DC switching elements can act as
strict L2 switches and/or provide IP routing capabilities,
including VPN routing and/or MPLS support if the operation
environment allows. In smaller DCs, it is likely that some
tier layers get collapsed, and that Internet connectivity,
inter-DC connectivity and VPN support will be handled by Core
Nodes that perform the DC GW role as well.
The DC network architecture described in this section can be
used to provide generic L2-L3 service connectivity to each
tenant as depicted in Figure 2:
---+--. ---+---
....( VRF1 )...... ( VRF2 )
| '-----' | '-----'
| Tenant1 |ELAN12 Tenant1|
|ELAN11 ....|........ |ELAN13
'':'''''''':' | | '':'''''''':'
,'. ,'. ,+. ,+. ,'. ,'.
(VM )....(VM ) (VM )... (VM ) (VM )....(VM )
`-' `-' `-' `-' `-' `-'
Figure 2 : Logical Service connectivity for one tenant
In this example, one or more virtual routing contexts
distributed across multiple DC GWs, and one or more ELANs
(e.g., one per application) running on DC switches, are
assigned to DC tenant 1. ELAN is a generic term for an
Ethernet multipoint service, which in the current DC
environment is implemented
using 12-bit VLAN tags. Other possible ELAN technologies are
discussed in section 6.
For a multi-tenant DC, this type of service connectivity or a
variation could be used for each tenant. In some cases only L2
connectivity is required, i.e., only an ELAN may be used to
interconnect VMs and customer sites.
5. Challenges in Existing Deployments
This section summarizes the challenges faced with the present
mode of operation described in the previous section and the
issues arising for next generation DC networks as described in
[NVo3-problem-statement].
With the introduction of multi-tenant DCs, providing each
tenant dedicated virtual compute and storage resources and/or
application services, the DC network must also provide each
tenant access to these resources and services. In addition,
some tenants may require some aspects of their services to be
available to other businesses in a B-to-B model or to the
public. Every tenant requires service connectivity to its own
resources with secure separation from other tenant domains.
Connectivity needs to support various deployment models,
including interconnecting customer-hosted data center
resources to Cloud Service Provider (CSP) hosted resources
(Virtualized DC for the tenant). This connectivity may be at
layer2 or layer3.
Currently, large DCs are often built on a service architecture
where each tenant is assigned two or more VLANs. VLANs
configured in Ethernet edge and core switches are
interconnected by IP routing running in a few centralized
routers. There may be some cases, though, where IP routing might
be used in the DC core nodes or even in the ToRs inside a DC.
5.1. VLAN Space Limitation
Existing DC deployments provide customer separation and flood
containment, including support for DC infrastructure
interconnectivity, using Ethernet VLANs [802.1q]. A 12-bit
VLAN tag provides support for a maximum of 4094 VLANs.
4094 VLANs are inadequate for a CSP looking to expand its
customer base. For example, there are a number of VPN
deployments (VPLS and IP VPN) that serve more than 20K
customers. If a VPN service provider with 20K VPN customers
wants to provide cloud services to these customers or teams up
with an independent CSP that does, and if 50% (10K) of these
customers become cloud customers, each requiring multiple VLANs
in a multi-tenant DC, 4094 VLANs will not be able to support
the demand. In general, 4094 VLANs will support fewer than 4K
tenants in a multi-tenant DC unless constraints are imposed on
VM placement so that the DC is subdivided into multiple
non-congruent domains, each with 4K VLANs.
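As a rough illustration of the arithmetic above (the numbers
below simply restate the example in this section, they are not
new data), the 12-bit VLAN space falls well short of the demand:

   # Rough sizing sketch for the example above: a 12-bit VLAN space
   # versus the demand from a VPN provider's customer base.
   VLAN_ID_BITS = 12
   USABLE_VLANS = 2**VLAN_ID_BITS - 2       # 4094: IDs 0 and 4095 reserved

   vpn_customers = 20_000                   # existing VPN customers
   cloud_uptake = 0.5                       # fraction expected to buy cloud services
   vlans_per_tenant = 2                     # each tenant needs two or more VLANs

   demand = int(vpn_customers * cloud_uptake) * vlans_per_tenant
   print(f"usable VLANs: {USABLE_VLANS}, demanded VLANs: {demand}")
   print(f"shortfall factor: {demand / USABLE_VLANS:.1f}x")
   # -> roughly 20,000 VLANs needed versus 4094 available, a ~5x shortfall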
The cloud networking infrastructure needs to provide support
for a much larger number of virtual Layer2 (L2) domains than
4K, as discussed in [NVo3-problem-statement] Section 2.7,
allowing for resource placement flexibility and efficient
resource utilization as discussed in [NVo3-problem-statement]
Section 2.2.
5.2. MAC, IP, and ARP Issues
Virtual Machines are the basic compute blocks provided to
cloud tenants. Every server blade typically supports 16-40 VMs
today with 100 or more VMs per server blade possibly becoming
common in the near future. Every VM may have multiple
interfaces for provider and enterprise management, VM mobility
and tenant access, each with its own MAC and IP addresses. For
a sizable DC, this may translate into millions of VM IP and
MAC addresses. From a cloud network viewpoint, this scale
number will be an order of magnitude higher.
Supporting this amount of IP and MAC addresses, including the
associated dynamic behavior (e.g., ARP), throughout the DC
Ethernet switches and routers is very challenging in an
Ethernet VLAN and regular routing environment.
A Core Ethernet switch supporting VLAN bridging domains
[802.1q] learns the MAC addresses for every single VM
interface that sends traffic through the switch albeit in the
context of VLANs to which these MACs belong. VLANs, as
discussed earlier, provide for MAC address separation across
tenants and therefore address the problem in [NVo3-problem-
statement] Section 2.5 for L2 bridged domains. Adding memory
to increase the MAC Forwarding DataBase (FDB) size increases
the cost of these switches, and there could still be a
scale constraint. MAC address table scale is highlighted in
[NVo3-problem-statement] Section 2.3. In addition, as the
number of MACs that switches need to learn increases,
convergence time could increase, and flooding activity will
increase upon a topology change as the core switches flush and
re-learn the MAC addresses. Simple operational mistakes may
lead to duplicate MAC entries within the same VLAN domain and
security issues due to administrative MAC assignment used
today for VM interfaces. Similar concerns about memory
requirements and related cost apply to DC Edge switches
(ToRs/EoRs) and DC GWs.
From a router perspective, it is important to maximize the
utilization of available resources in both control and data
planes through flexible mapping of VMs and related VLANs to
routing interfaces. This is not easily done in the current
VLAN based deployment environment where the use of VLAN
trunking limits the allocation of VMs to only local routers.
The amount of ARP traffic grows linearly with the number of
hosts on a LAN. For 1 million VM hosts, the amount of ARP
traffic can be expected to be in the range of half a million
ARPs per second at the peak, which corresponds to over
200 Mbps of ARP traffic [MYERS]. Similarly, on a server, the
amount of ARP traffic grows linearly with the number of
virtual L2 domains/ELANs instantiated on that server and the
number of VMs in that domain. Besides the link capacity
wasted, which may be small compared to the link capacities
deployed in DCs, the computational burden may be prohibitive.
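The figures quoted above can be checked with a back-of-the-envelope
calculation; the peak ARP rate and frame size below are assumptions
consistent with the [MYERS] estimate, not measured values:

   # Back-of-the-envelope check of the ARP load quoted above
   # (inputs are assumptions, not measurements).
   hosts = 1_000_000                 # VM hosts on the layer2 domain(s)
   peak_arps_per_sec = 500_000       # ~0.5 ARP/s per host at peak (assumption)
   arp_frame_bytes = 64              # minimum Ethernet frame carrying an ARP message

   arp_bps = peak_arps_per_sec * arp_frame_bytes * 8
   print(f"peak ARP traffic: {arp_bps / 1e6:.0f} Mbps")   # ~256 Mbps, i.e. over 200 Mbps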
In a large-DC environment, the large number of hosts and the
distribution of ARP traffic may lead to a number of
challenges:
- Processing overload and overload of ARP entries on the
Server/Hypervisor. This is caused by the increased number
of VMs per server blade and the size of related ELAN
domains. For example, a server blade with 100 VMs, each
in a separate L2 domain with 100 VMs each would need to
support 10K ARP entries and the associated ARP processing
while performing the other compute tasks.
- Processing overload and exhaustion of ARP entries on the
Routers/PEs and any other L3 Service Appliances (Firewall
(FW), Load-Balancer (LB) etc.). This issue is magnified
by the L3 virtualization at the service gateways. For
example, a gateway PE handling 10K ELANs each with 10 VMs
will result in 100K hosts sending/receiving traffic
to/from the PE, thus requiring the PE to learn 100K ARP
entries. It should be noted that if the PE supports
Integrated Routing and Bridging (IRB), it must support
the associated virtual IP RIBs/FIBs and MAC FDBs for
these hosts in addition to the ARP entries.
- Flood explosion throughout the Ethernet switching network.
This is caused by the use of VLAN trunking and implicitly
by the lack of per-VPN flood containment.
DC and DC-interconnect technologies, including control
plane, that minimize the negative impact of ARP, MAC and IP
entry explosion on individual network elements in a DC or
cloud network hierarchy are needed.
5.3. Per VLAN flood containment
From an operational perspective, DC operators try to minimize
the provisioning touches required for configuring a VLAN
domain by employing VLAN trunks on the L2 switches. This comes
at the cost of flooding broadcast, multicast and unknown
unicast frames outside of the boundaries of the actual VLAN
domain. Containment of a broadcast domain identified by a VLAN
ID to a POD, and connecting a broadcast domain to a local
router limits the L2 broadcast domain span but also limits the
flexibility of placing VMs across PODs in a DC or a cloud.
This is the problem identified in [NVo3-problem-statement]
Section 3.4.
The cloud-networking infrastructure needs to prevent
unnecessary traffic from being sent/leaked to undesired
locations.
5.4. Convergence and multipath support
Spanning Tree is used in the current DC environment for loop
avoidance in the Ethernet switching domain.
STP can take 30 to 50 seconds to repair a topology. Practical
experience shows that Rapid STP (RSTP) can also take multiple
seconds to converge, such as when the root bridge fails.
STP eliminates loops by disabling ports. The result is that
only one path is used to carry traffic. The capacity of
disabled links cannot be utilized, leading to inefficient use
of resources.
In a small DC deployment, multi-chassis LAG (MC-LAG) support
may be sufficient initially to provide for loop-free
redundancy as an STP alternative. However, in medium or large
DCs it is challenging to use MC-LAGs solely across the network
to provide for resiliency and loop-free paths without
introducing a layer2 routing protocol: i.e. for multi-homing
of server blades to ToRs, ToRs to Core SWs, Core SWs to DC
GWs. MC-LAG may work as a local mechanism but it has no
knowledge of the end-to-end paths so it does not provide any
degree of traffic steering across the network.
Efficient and mature link-state protocols, such as IS-IS,
provide rapid failover times, can compute optimal paths and
can fully utilize multiple parallel paths to forward traffic
between 2 nodes in the network.
Unlike OSPF, IS-IS runs directly at L2 (i.e., with no reliance
on IP) and does not require any IP addressing configuration.
Therefore, IS-IS based DC networks are to be favored over
STP-based networks.
IEEE Shortest Path Bridging (SPB), based on IEEE 802.1aq and
IEEE 802.1Qbp, and IETF TRILL [RFC6325] are technologies that
enable Layer2 networks using IS-IS for Layer2 routing.
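As a small illustration of the multipath point above, the sketch
below enumerates the equal-cost shortest paths between two ToRs in
a hypothetical two-spine fabric; a link-state control plane
(IS-IS/SPB, TRILL) can load-share across all of them, whereas STP
would block all but one. The topology and node names are
illustrative.

   # Enumerate equal-cost shortest paths (by hop count) in a small,
   # hypothetical leaf-spine fabric.
   from collections import deque

   topology = {                      # adjacency list: ToRs dual-homed to two spines
       "tor1": ["spine1", "spine2"],
       "tor2": ["spine1", "spine2"],
       "spine1": ["tor1", "tor2"],
       "spine2": ["tor1", "tor2"],
   }

   def equal_cost_paths(graph, src, dst):
       """Return every shortest path from src to dst (breadth-first search)."""
       best, paths = None, []
       queue = deque([[src]])
       while queue:
           path = queue.popleft()
           if best is not None and len(path) > best:
               break                               # longer than the shortest found
           node = path[-1]
           if node == dst:
               best, paths = len(path), paths + [path]
               continue
           for nbr in graph[node]:
               if nbr not in path:                 # avoid loops
                   queue.append(path + [nbr])
       return paths

   print(equal_cost_paths(topology, "tor1", "tor2"))
   # -> [['tor1', 'spine1', 'tor2'], ['tor1', 'spine2', 'tor2']]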
5.5. Optimal traffic forwarding
Optimal traffic forwarding requires (1) efficient utilization
of all available link capacity in a DC and DC-interconnect,
and (2) traffic forwarding on the shortest path between any
two communicating VMs within the DC or across DCs.
Optimizing traffic forwarding between any VM pair in the same
virtual domain is dependent on (1) the placement of these VMs
and their relative proximity from a network viewpoint, and (2)
the technology used for computing the routing/switching path
between these VMs. The latter is especially important in the
context of VM Mobility, moving a VM from one network location
to another, while maintaining its layer2 and Layer3 (IP)
addresses.
Ethernet-based forwarding between two VMs in traditional DCs
relies on the MAC-destination Address that is unique per VM
interface in the context of a virtual domain (e.g., VLAN). In
traditional IEEE technologies (e.g., 802.1q, 802.1ad, 802.1ah)
and IETF L2VPN (i.e., VPLS), Ethernet MAC reachability is
always learned in the data plane. Other IEEE and IETF
technologies allow MAC reachability to be learned in the
control plane, as discussed further in Section 6. In all
these cases, it is important that as a VM is moved from one
location to another: (1) VM MAC reachability convergence
happens fast to minimize traffic black-holing, and (2)
forwarding takes the shortest path.
IP-based forwarding relies on the destination IP address. ECMP
load balancing relies on flow-based criteria. An IP host
address is unique per VM interface. However, hosts on a LAN
share a subnet mask, and IP routing entries are based on that
subnet address. Thus, when VMs are on the same LAN and
traditional forwarding takes place, these VMs forward traffic
to each other by relying on ARP or IPv6 Neighbor discovery to
identify the MAC address of the destination, and on the
underlying layer2 network to deliver the resulting MAC frame
to its destination. However, when VMs, as IP hosts across
layer2 virtual domains, need to communicate, they rely on the
underlying IP routing infrastructure.
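As a sketch of the flow-based ECMP criterion mentioned at the
start of this paragraph, the function below hashes a 5-tuple onto
one of the available equal-cost next hops so that all packets of
a flow stay on one path; the field choice and next-hop names are
illustrative, not a prescribed hashing scheme.

   # Minimal sketch of flow-based ECMP: hash the 5-tuple so that all
   # packets of a flow follow the same equal-cost path.
   import hashlib

   def ecmp_next_hop(src_ip, dst_ip, proto, src_port, dst_port, next_hops):
       key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
       digest = hashlib.sha256(key).digest()
       index = int.from_bytes(digest[:4], "big") % len(next_hops)
       return next_hops[index]

   paths = ["spine1", "spine2", "spine3", "spine4"]
   print(ecmp_next_hop("10.1.1.10", "10.2.2.20", "tcp", 49152, 443, paths))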
In addition, when a DC is an all-IP DC, VMs are assigned a
host address with /32 subnet in the IPv4 case, or /64 or /128
host address in the IPv6 case, and rely on the IP routing
infrastructure to route the IP packets among VMs. In this
latter case, there is really no need for layer2 awareness
potentially beyond the hypervisor switch at the server hosting
the VM. In either case, when a VM moves location from one
physical router to another while maintaining its IP identity
(address), the underlying IP network must be able to route the
traffic to the destination and must be able to do that on the
shortest path.
Thus, in the case of IP address aggregation as in a subnet,
optimality in traffic forwarding to a VM will require
reachability to the VM host address rather than only the
subnet. That is what is often referred to as punching a hole
in the aggregate at the expense of routing and forwarding
table size increase.
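The "hole punching" described above is ordinary longest-prefix-match
behavior. The sketch below, with illustrative addresses and next
hops, shows a /32 host route for a moved VM overriding the subnet
route that still points at the VM's old location:

   # Longest-prefix-match sketch: a /32 host route ("hole" punched in
   # the aggregate) attracts traffic for a moved VM away from the
   # subnet route. Addresses and next hops are illustrative.
   import ipaddress

   rib = [
       (ipaddress.ip_network("10.10.1.0/24"), "gw-old-pod"),    # subnet aggregate
       (ipaddress.ip_network("10.10.1.37/32"), "gw-new-pod"),   # moved VM host route
   ]

   def lookup(dst):
       addr = ipaddress.ip_address(dst)
       matches = [(net, nh) for net, nh in rib if addr in net]
       return max(matches, key=lambda m: m[0].prefixlen)[1]     # longest prefix wins

   print(lookup("10.10.1.37"))   # -> gw-new-pod (host route)
   print(lookup("10.10.1.5"))    # -> gw-old-pod (covered only by the /24)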
As in layer2, layer3 may capitalize on hierarchical tunneling
to optimize the routing/FIB resource utilization at different
places in the network. If a hybrid of subnet-based routing and
host-based routing (host-based routing here is used to refer
to hole-punching in the aggregate) is used, then during VM
mobility, routing transition can take place, and traffic may
be routed to a location based on subnet reachability or to a
location where the VM used to be attached. In either of these
cases, traffic must not be black-holed. It must be directed
potentially via tunneling to the location where the VM is.
This requires that the old routing gateway knows where the VM
is currently attached. How to obtain that information can be
based on different techniques with tradeoffs. However, this
traffic triangulation is not optimal and must only exist
during the transition, until the network converges to a
shortest path to the destination.
5.6. Efficient multicast
STP bridges typically perform IGMP and/or PIM snooping in
order to optimize multicast data delivery. However, this
snooping is performed locally by each bridge following the STP
topology where all the traffic goes through the root bridge.
This may result in sub-optimal multicast traffic delivery. In
addition, each customer multicast group is associated with a
forwarding tree throughout the Ethernet switching network.
Solutions must provide for efficient Layer2 multicast. In an
all-IP network, explicit multicast trees in the DC network can
be built via multicast signaling protocols (e.g., PIM-SSM)
that follow the shortest paths between the source(s) and
destinations. In an IPVPN context, Multicast IPVPN based on
[MVPN] can be used to build multicast trees shared among
IPVPNs, specific to VPNs, and/or shared among multicast groups
across IPVPNs.
5.7. L3 virtualization
In order to provide tenant L3 separation while supporting
overlapping IP addressing and privacy across tenants, as
discussed in [NVo3-problem-statement] Section 2.5, a number of
schemes have been implemented in the DC environment. Some of
these schemes, such as double NATing, are operationally complex
and prone to operator errors. Virtual routing contexts, virtual
device contexts, or dedicated hardware routers are positioned
in the DC environment as an alternative to these mechanisms.
Every customer is assigned a dedicated routing context with
associated control plane protocols. For instance, every
customer gets an IP routing instance controlled by its own
routing protocols. Assigning virtual or hardware routers to
each customer, while supporting thousands of customers in a DC,
is neither scalable nor cost-efficient. Section 6 further
discusses the applicability of BGP/MPLS IP VPNs to L3
virtualization.
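A minimal way to picture the virtual routing contexts described
above is a per-tenant routing table keyed by the tenant's context,
so that the same prefix can appear in two tenants without conflict.
The sketch below uses illustrative tenant names, prefixes, and
next hops:

   # Per-tenant virtual routing contexts (VRF-like): the same
   # 10.0.1.0/24 prefix exists in two tenants with different next
   # hops and never collides, because every lookup is scoped by the
   # tenant's routing context.
   import ipaddress

   vrfs = {
       "tenant-A": {ipaddress.ip_network("10.0.1.0/24"): "vm-gw-A"},
       "tenant-B": {ipaddress.ip_network("10.0.1.0/24"): "vm-gw-B"},
   }

   def vrf_lookup(tenant, dst):
       addr = ipaddress.ip_address(dst)
       table = vrfs[tenant]
       matches = [net for net in table if addr in net]
       return table[max(matches, key=lambda n: n.prefixlen)]

   print(vrf_lookup("tenant-A", "10.0.1.10"))   # -> vm-gw-A
   print(vrf_lookup("tenant-B", "10.0.1.10"))   # -> vm-gw-B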
5.8. Connectivity to existing tenant VPN sites
It is expected that cloud services will have to span larger
geographical areas in the near future and that existing VPN
customers will require access to VM and storage facilities
for virtualized data center applications. Hence, the DC
network virtualization must interoperate with deployed and
evolving VPN solutions (e.g., IP VPN, VPLS, VPWS, PBB-VPLS,
E-VPN and PBB-EVPN).
Section 6 discusses this type of connectivity.
5.9. DC Inter-connect requirements
Cloud computing requirements such as VM Mobility across DCs,
Management connectivity, and support for East-West traffic
between customer applications located in different DCs imply
that inter-DC connectivity must be supported. These DCs can be
part of a hybrid cloud operated by the cloud service
provider(s) and/or the end-customers.
Mature VPN technologies can be used to provide L2/L3 DC
interconnect among VLANs/virtual domains located in different
DCs. DC-interconnect using existing VPN technologies is
described in Section 6.
5.10. VM Mobility
The ability to move VMs within a resource pool, whether it is
a local move within the same DC to another server or to a
distant DC, offers multiple advantages for a number of
scenarios, for example:
- In the event of a possible natural disaster, moving VMs to a
safe DC location decreases downtime and allows for meeting
Service Level Agreement (SLA) requirements.
- Optimized resource location: VMs can be moved to locations
that offer significant cost reduction (e.g. power savings),
or locations close to the application users. They can also
be moved to simply load-balance across different locations.
When VMs change location, it is often important to maintain
the existing client sessions. The VM MAC and IP addresses must
be preserved, and the state of the VM sessions must be copied
to the new location.
Current VM mobility tools like VMware VMotion require L2
connectivity among the hypervisors on the servers
participating in a VMotion pool. This is in addition to
"tenant ELAN" connectivity that provides for communication
between the VM and the client(s).
A VMotion ELAN might need to cross multiple DC networks to
provide the required protection or load-balancing. In
addition, in the current VMotion procedure, the new VM
location must be part of the tenant ELAN domain. When the new
VM is activated, a Gratuitous ARP is sent so that the MAC FIB
entries in the tenant ELAN are updated to direct traffic
destined to that VM to the new VM location. In addition, if a
portion of the path requires IP forwarding, the VM
reachability information must be updated to direct the traffic
on the shortest path to the VM.
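The Gratuitous ARP mentioned above is an ordinary broadcast ARP
whose sender and target protocol addresses are both the VM's own
IP address. The sketch below only constructs such a frame (it does
not transmit it); the MAC and IP values are illustrative:

   # Build a gratuitous ARP frame, as a VM or hypervisor would emit
   # after a move, to refresh MAC FDB and ARP caches along the tenant
   # ELAN. The frame is only constructed here, not sent.
   import struct

   def gratuitous_arp(vm_mac: bytes, vm_ip: bytes) -> bytes:
       bcast = b"\xff" * 6
       eth_hdr = bcast + vm_mac + struct.pack("!H", 0x0806)       # EtherType ARP
       arp_pdu = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)        # Ethernet/IPv4, request
       target_hw = b"\x00" * 6                                    # unused in a GARP request
       arp_pdu += vm_mac + vm_ip + target_hw + vm_ip              # sender IP == target IP
       return eth_hdr + arp_pdu

   frame = gratuitous_arp(bytes.fromhex("0050569a0001"), bytes([10, 1, 1, 25]))
   print(len(frame), frame.hex())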
VM mobility requirements may be addressed through the use of
Inter-DC VLANs to address VMotion and "tenant ELANs". However,
expanding "tenant ELANs" across two or more DCs will
accelerate VLAN exhaustion and MAC explosion issues. In
addition, STP needs to run across DCs leading to increased
convergence times and the blocking of expensive WAN bandwidth.
VLAN trunking used throughout the network creates
indiscriminate flooding across DCs.
L2 VPN solutions over IP/MPLS are designed to interconnect
sites located across the WAN as described in Section 6.
6. L2VPN Applicability to Cloud Networking
The following sections will discuss different solution
alternatives, re-using IEEE and IETF technologies that can
provide a gradual migration path from the current Ethernet
switching VLAN-based model to more advanced Ethernet switching
and IP/MPLS based models. In addition, they discuss how these
solutions compare to the NVo3 framework [NVo3-fmwk] and which
problems in [NVo3-problem-statement] they would still need to
address. This evolution is targeted to address inter-
DC requirements, cost considerations, and the efficient use of
processing/memory resources on DC networking components.
6.1. VLANs and L2VPN toolset
One approach to address some of the DC challenges discussed in
the previous section is to gradually deploy additional
technologies within existing DC networks. For example, an
operator may start by breaking its DC VLAN domains into
different VLAN islands so that each island can support up to
4K VLANs. VLAN Domains can then be interconnected via VPLS
using the DC GW as a VPLS PE [RFC4761][RFC4762]. An ELAN
service can be identified with one VLAN ID in one island and
another VLAN ID in another island with the appropriate VLAN ID
processed at the GW.
As the number of tenants in individual VLAN islands surpasses
4K and no further sub-division of VLAN domains is feasible or
desired, the operator could push VPLS deployment deeper into
the DC network, closer to the tenant systems as defined in
[NVo3-fmwk]. In the end, it is possible to retain the existing
VLAN-based solution only in the VSw and to provide L2VPN
support starting at the ToRs. The ToR and DC core elements then
need to be MPLS-enabled with existing VPLS solutions.
VPLS represents a mature virtualization and overlay technology
for private LAN services. This is the way it has been deployed
in service provider networks. It also addresses many of the
problems described in Section 5 and in [NVo3-problem-
statement] but still lacks some capabilities to address
others.
Table 1 provides a comparison between the VPLS functional
elements and the NVo3 framework functional elements [NVo3-
fmwk].
Table 1: Functional comparison between VPLS and the NVo3 framework

   NVo3 Function                    Matching VPLS Function
   ----------------------------------------------------------------
   Virtual Access Point (VAP)       Attachment Circuit (AC)

   Network Virtual Edge (NVE)       Provider Edge (PE)

   Virtual Network Instance (VNI)   Virtual Switching Instance (VSI)

   Virtual Network Context          A 20-bit MPLS label identifier
   (VN Context)

   Overlay Module and tunneling     - PWE3 over IP/GRE in an IP
                                      network
                                    - PWE3 and MPLS in an MPLS
                                      network

   Control Plane: TBD               Control plane:
                                    Service signaling:
                                    - PWE3 T-LDP or MP-BGP
                                    Core routing:
                                    - IGP: OSPF/IS-IS(-TE)
                                    Core signaling:
                                    - RSVP or LDP for MPLS LSPs
Depending on the implementation model, VPLS can address some
of the issues described in Section 5 and in [NVo3-problem-
statement], but not all:
-Dynamic Provisioning as described in [NVo3-problem-
statement] Section 2.1: This is not addressed today in
VPLS solutions, as it has not been in scope of that work.
VPLS provisioning today requires management of both VLAN
and L2VPN addressing, and mapping of service profiles.
Per VLAN, per port and per VPLS configurations are
required at the ToR, increasing the time it takes to
bring up service connectivity and complicating the
operational model. However, a mechanism may be developed
to perform such provisioning dynamically as compute
resources are configured. It should be noted that VPLS
currently supports auto-discovery of PEs with instances
of the same VPLS service, as a component of the dynamic
provisioning of a VPLS service.
-VM Mobility as also defined in [NVo3-problem-statement]
Section 2.2: VPLS supports MAC discovery as in any LAN
switch, based on MAC learning in the data plane. Thus, as
a VM moves, a VPLS PE may learn the new location of a MAC
from an ARP message initiated by the VM or by seeing
Ethernet frames from that VM.
-MAC table sizes in Switches as also described in [NVo3-
problem-statement] Section 2.3: As opposed to an 802.1q
based core Ethernet network, tenant VM addresses are only
learned at a VPLS PE with a corresponding service
instance. This is because VPLS is built as an overlay on
a core IP/MPLS network and the core interconnecting the
PEs will have no knowledge of the tenant MACs.
-VLAN limitation as also described in [NVo3-problem-
statement] Section 2.7: VPLS enables service instance
scale in a DC as it connects VLAN domains as described
earlier and as the service identifier for a VPLS instance
at a PE is based on a 20-bit MPLS label.
This model does not solve the potential MAC explosion on VPLS
PEs; its severity depends on how close to the tenant systems the
PE functionality is deployed. The closer to the tenant systems,
the smaller the number of VPLS instances that need to be
supported on a VPLS PE, and the lower the MAC scale requirement.
6.2. E-VPN
Ethernet VPN (E-VPN) [E-VPN] is evolving work in the IETF L2VPN
WG. Ethernet VPN provides a private LAN service over an IP/MPLS
core.
E-VPN was driven by some gaps in the existing VPLS solution, and
by large multi-tenant DC requirements. E-VPN, similar to VPLS, is
provided on a PE where an E-VPN instance (EVI) provides the
virtual LAN bridging service. E-VPN defines types of EVIs
depending on the bridging domains supported in an EVI. As opposed
to VPLS, E-VPN provides for active-active multi-homing of CEs to
different PEs while eliminating loops and traffic duplications.
In addition, it provides for effective load-balancing across the
IP/MPLS core to PEs with access to the same MAC address on
connected CEs. In addition, as opposed to the IEEE 802.1q/ad/ah
standards and VPLS, where MAC reachability is learned in the data
plane, E-VPN distributes MAC reachability across the IP/MPLS
core using MP-BGP extensions. Along with MAC address
distribution, E-VPN also distributes the IP address(es)
associated with the MAC, equivalent in IPv4 to ARP entries. In
addition, as opposed to VPLS, and more in synergy with BGP/MPLS
VPNs [RFC4364], E-VPN uses (MP)-BGP extensions to discover and
signal the service MPLS label(s) among PEs across the IP/MPLS
core and does not require a Pseudowire (PW) mesh among PEs per E-
VPN. E-VPN also allows an option for flooding suppression of BUM
traffic.
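The control-plane learning model described above can be pictured
at the data-structure level as an FDB populated from advertised
(EVI, MAC, IP, label, next-hop PE) records rather than from
data-plane flooding. The sketch below models only the table
updates, not the MP-BGP route encoding, and all values are
illustrative:

   # Data-structure sketch of control-plane MAC learning (E-VPN style):
   # the FDB of a receiving PE is filled from advertised records instead
   # of data-plane flooding/learning. This is not the MP-BGP encoding.
   from dataclasses import dataclass

   @dataclass(frozen=True)
   class MacAdvertisement:
       evi: int            # EVPN instance the MAC belongs to
       mac: str
       ip: str             # optional IP bound to the MAC (ARP-like information)
       label: int          # 20-bit MPLS service label from the advertising PE
       next_hop_pe: str

   fdb: dict[tuple[int, str], MacAdvertisement] = {}

   def on_advertisement(route: MacAdvertisement) -> None:
       """Install or update the FDB entry; a re-advertisement from a new
       PE (e.g. after a VM move) simply overwrites the previous location."""
       fdb[(route.evi, route.mac)] = route

   on_advertisement(MacAdvertisement(100, "00:50:56:9a:00:01", "10.1.1.25", 3001, "PE-1"))
   on_advertisement(MacAdvertisement(100, "00:50:56:9a:00:01", "10.1.1.25", 3002, "PE-2"))
   print(fdb[(100, "00:50:56:9a:00:01")].next_hop_pe)   # -> PE-2 after the VM moved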
E-VPN can be implemented at the same network elements as VPLS,
discussed in the previous section. However, with a reduced set
of protocols needed (no PW signaling via T-LDP), and in synergy
with [endsystem], E-VPN is more likely than VPLS to be
implemented at an end-system.
E-VPN represents an evolving virtualization and overlay
technology for private LAN services, albeit capitalizing on the
synergy with mature BGP/MPLS IPVPNs. It also addresses many of
the problems described in Section 5 and in [NVo3-problem-
statement] and some of the VPLS problems, but still lacks some
capabilities to address others.
Table 2 provides a comparison between the E-VPN functional
elements and the NVo3 framework functional elements [NVo3-fmwk].
Table 2: Functional comparison between E-VPN and the NVo3 framework

   NVo3 Function                    Matching E-VPN Function
   ----------------------------------------------------------------
   Virtual Access Point (VAP)       Attachment Circuit (AC) based
                                    on VLAN ID

   Network Virtual Edge (NVE)       PE

   Virtual Network Instance (VNI)   EVPN Instance (EVI)

   Virtual Network Context          A 20-bit MPLS label identifier
   (VN Context)

   Overlay Module and tunneling     - MPLS over MPLS tunnels
                                    - MPLS over IP/GRE in an
                                      IP network

   Control Plane: TBD               Control plane:
                                    - MP-BGP for E-VPN
                                    Core routing:
                                    - IGP: OSPF/IS-IS(-TE)
                                    Core signaling:
                                    - RSVP or LDP for MPLS LSPs
Depending on the implementation model, E-VPN can address some
of the issues described in Section 5 and in [NVo3-problem-
statement], but not all:
-Dynamic Provisioning as described in [NVo3-problem-statement]
Section 2.1: This is not addressed today in E-VPN solutions,
as it has not been in scope of that work. E-VPN provisioning
today requires management of VLAN and service profiles. Per
VLAN, per port and per E-VPN configurations are required,
increasing the time it takes to bring up service connectivity
and complicating the operational model. However, a mechanism
may be developed to perform such provisioning dynamically as
compute resources are configured. It should be noted that E-
VPN currently supports auto-discovery of PEs with instances of
the same E-VPN service, as a component of the dynamic
provisioning of an E-VPN service.
-VM Mobility as also defined in [NVo3-problem-statement]
section 2.2: E-VPN supports VM mobility as described in
Section 8.
-MAC-table sizes in switches as also described in [NVo3-
problem-statement] Section 2.3: As opposed to an 802.1q based
core Ethernet network, tenant VM addresses are only learned at
an E-VPN PE with a corresponding service instance. This is
because E-VPN is built as an overlay on a core IP/MPLS network
and the core interconnecting the PEs will have no knowledge of
the tenant MACs.
-VLAN limitation as also described in [NVo3-problem-statement]
Section 2.7: E-VPN enables service instance scale in a DC as
it connects VLAN domains similarly to VPLS and as the service
identifier for an E-VPN instance at a PE is based on a 20-bit
MPLS label.
This model does not solve the potential MAC explosion on E-VPN
PEs; its severity depends on how close to the tenant systems the PE
functionality is deployed. The closer to the tenant systems, the
smaller the number of E-VPN instances that need to be supported
on an E-VPN PE, and the lower the MAC scale requirement. E-VPN
could potentially be implemented at an end-system hosting the VMs
to which the E-VPN services are provided.
6.3. PBB and L2VPN toolset
As highlighted in Section 5, the expected large number of VM
MAC addresses in the DC calls out for a VM MAC hiding solution
so that the ToRs and the Core Switches only need to handle a
limited number of MAC addresses.
PBB IEEE 802.1ah encapsulation is a standard L2 technique
developed by IEEE to achieve this goal. It was designed also
to address other limitations of VLAN-based encapsulations
while maintaining the native Ethernet operational model
deployed in the DC network.
A conceptual PBB encapsulation is described in Figure 3 (for
detailed encapsulation see [802.1ah]):
+-------------+
Backbone | BMAC DA,SA |12B
Ethernet |-------------|
Header |BVID optional| 4B
|-------------|
Service ID| PBB I-tag | 6B
|-------------|
Regular |VM MAC DA,SA |
Payload |-------------|
| |
|VM IP Payload|
| |
+-------------+
Figure 3 PBB encapsulation
The original Ethernet packet used in this example for Inter-VM
communication is encapsulated in the following PBB header:
- I-tag field - organized similarly to the 802.1q VLAN
tag; it includes the Ethertype, the PCP and DEI bits, and
a 24-bit I-SID which replaces the 12-bit VLAN ID,
extending the number of supported virtual L2 domains to
16 million. It should be noted that the PBB I-Tag also
includes some reserved bits and, most importantly, the
C-MAC DA and SA. What is designated as 6 bytes in the
figure is the I-tag information excluding the C-MAC DA
and SA.
- An optional Backbone VLAN field (BVLAN) may be used if
grouping of tenant domains is desired.
- An outer Backbone MAC header contains the source and
destination MAC addresses for the related server blades,
assuming the PBB encapsulation is done at the hypervisor
virtual switch on the server blade.
- The total resulting PBB overhead added to the VM-
originated Ethernet frame is 18 or 22 Bytes (depending on
whether the BVID is excluded or not).
- Note that the original PBB encapsulation allows the use
of CVLAN and SVLAN tags in between the VM MACs and the IP
payload. These fields were removed from Figure 3 since in
a VM environment they do not need to be used on the VSw;
their function is relegated to the I-SID tag. A byte-level
sketch of this encapsulation is shown below.
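The encapsulation of Figure 3 can be expressed concretely as a
byte layout: outer B-MAC header, optional B-TAG, a 6-byte I-TAG
carrying the 24-bit I-SID, then the unmodified customer frame.
The sketch below is a minimal layout illustration with made-up
addresses and I-SID, not a full 802.1ah implementation:

   # Minimal byte-layout sketch of PBB (IEEE 802.1ah) encapsulation as
   # in Figure 3: outer B-MAC header, optional B-TAG, 6-byte I-TAG with
   # a 24-bit I-SID, followed by the unmodified customer (VM) frame.
   import struct

   def pbb_encapsulate(b_da: bytes, b_sa: bytes, isid: int,
                       customer_frame: bytes, bvid: int | None = None) -> bytes:
       hdr = b_da + b_sa                                     # 12B backbone MACs
       if bvid is not None:
           hdr += struct.pack("!HH", 0x88A8, bvid & 0x0FFF)  # optional 4B B-TAG
       i_tci = (0 << 29) | (isid & 0xFFFFFF)                 # PCP/DEI/res = 0, 24-bit I-SID
       hdr += struct.pack("!HI", 0x88E7, i_tci)              # 6B I-TAG (TPID + I-TCI)
       return hdr + customer_frame                           # C-MAC DA/SA + VM IP payload

   vm_frame = (bytes.fromhex("0050569a0002") + bytes.fromhex("0050569a0001")
               + b"\x08\x00" + b"payload")
   pbb = pbb_encapsulate(bytes.fromhex("02aabbcc0001"), bytes.fromhex("02aabbcc0002"),
                         isid=0x12345, customer_frame=vm_frame)
   print(len(pbb) - len(vm_frame))   # -> 18 bytes of PBB overhead (22 with a B-TAG)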
6.3.1. Addressing VLAN space exhaustion and MAC explosion
In a DC environment, PBB maintains the traditional Ethernet
forwarding plane and operational model. For example, a vSw
implementation of PBB can make use of the 24-bit I-SID
instead of the 12-bit VLAN tag to identify the virtual
bridging domains associated with different VM groups. The vSw
uplink towards the ToR in Figure 1 can still be treated as an
Ethernet backbone interface. A frame originated by a VM can be
encapsulated with the ISID assigned to the VM vSw interface
and with the outer DA and SA MACs associated with the
respective destination and source server blades, and then sent
to the ToR switch. Performing this encapsulation at the vSw
distributes the VM MAC learning to server blades with
instances in the corresponding layer2 domain, and therefore
alleviates this load from ToRs that aggregate multiple server
blades. Alternatively, the PBB encapsulation can be done at
the ToR.
With PBB encapsulation, ToRs and Core SWs do not have to
handle VM MAC addresses so the size of their MAC FDB tables
may decrease by two or more orders of magnitude, depending on
the number of VMs configured in each server blade and the
number of VM virtual interfaces and associated MACs.
The original PBB specification [802.1ah] did not introduce any
new control plane or new forwarding concepts for the PBB core.
Spanning Tree and regular Ethernet switching based on MAC
learning and flooding were maintained to provide a smooth
technology introduction in existing Ethernet networks.
6.3.2. Fast convergence and L2 multi-pathing
Additional specification work for PBB control plane has been
done since then in both IEEE and IETF L2VPN.
As stated earlier, STP-based layer2 networks underutilize the
available network capacity as links are put in an idle state
to prevent loops. Similarly, existing VPLS technology for
interconnecting Layer2 network-islands over an IP/MPLS core
does not support active-active dual homing scenarios.
IS-IS controlled layer2 networks allow traffic to flow on
multiple parallel paths between any two servers, spreading
traffic among available links on the path. IEEE 802.1aq
Shortest Path Bridging (SPB) [802.1aq] and emerging IEEE
802.1Qbp [802.1Qbp] are PBB control plane technologies that
utilize different methods to compute parallel paths and
forward traffic in order to maximize the utilization of
available links in a DC. In addition, a BGP based solution
[PBB-EVPN] is progressing in the IETF L2VPN WG.
One or both mechanisms may be employed as required. IS-IS
could be used inside the same administrative domain (e.g., a
DC), while BGP may be employed to provide reachability among
interconnected Autonomous Systems. Similar architectural
models have been widely deployed in the Internet and for large
VPN deployments.
IS-IS and/or BGP are also used to advertise Backbone MAC
addresses and to eliminate B-MAC learning and unknown unicast
flooding in the forwarding plane, albeit with tradeoffs. The
B-MAC FIB entries are populated as required from the resulting
IS-IS or BGP RIBs.
Legacy loop avoidance schemes using Spanning Tree and local
Active/Active MC-LAG are no longer required as their function
(layer2 routing) is replaced by the indicated routing
protocols (IS-IS and BGP).
6.3.3. Per ISID flood containment
Service auto-discovery provided by 802.1aq SPB [802.1aq] and
BGP [PBB-EVPN] is used to distribute ISID related information
among DC nodes, eliminating any provisioning touches
throughout the PBB infrastructure. This implicitly creates
backbone distribution trees that provide per ISID automatic
flood and multicast containment.
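The effect of the auto-discovery can be sketched as below; the
data structures are hypothetical, but the principle is the one
described above: a node floods BUM traffic for a given ISID
only toward backbone nodes that advertised membership in that
ISID.
   # Sketch: derive per-ISID flood lists from auto-discovered
   # (node, ISID) memberships, as SPB or BGP [PBB-EVPN] would
   # distribute them.  Names and values are hypothetical.
   from collections import defaultdict

   advertisements = [
       ("tor-1", 100001), ("tor-2", 100001), ("tor-3", 200002),
       ("pe-gw-1", 100001), ("pe-gw-1", 200002),
   ]

   flood_list = defaultdict(set)
   for node, isid in advertisements:
       flood_list[isid].add(node)

   def flood_targets(isid, local_node):
       """Backbone nodes that should get BUM traffic for an ISID."""
       return flood_list[isid] - {local_node}

   print(flood_targets(100001, "tor-1"))   # {'tor-2', 'pe-gw-1'}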
6.3.4. Efficient multicast support
IS-IS [802.1aq] and BGP [PBB-EVPN] could be used to build
optimal multicast distribution trees. In addition, PBB and
IP/MPLS tunnel hierarchy may be used to aggregate multiple
customer multicast trees sharing the same nodes by associating
them with the same backbone forwarding tree that may be
represented by a common Group BMAC and optionally a P2MP LSP.
More details will be discussed in a further version of the
draft.
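As one concrete element of that hierarchy, the sketch below
derives the default Backbone Service Instance Group address that
[802.1ah] associates with an ISID (the 01-1E-83 prefix followed
by the 24-bit ISID). Tenant multicast groups mapped to the same
ISID then share this Group BMAC and, optionally, a common P2MP
LSP.
   # Sketch: default Backbone Service Instance Group address per
   # [802.1ah], i.e. the 01-1E-83 prefix followed by the 24-bit
   # ISID.
   def default_group_bmac(isid):
       assert 0 < isid < 2**24
       return "01-1E-83-{:02X}-{:02X}-{:02X}".format(
           (isid >> 16) & 0xFF, (isid >> 8) & 0xFF, isid & 0xFF)

   print(default_group_bmac(100001))       # 01-1E-83-01-86-A1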
6.3.5. Tunneling options for PBB ELAN: Ethernet, IP and MPLS
A solution for DC ELAN domains based on PBB ISIDs, PBB
encapsulation and IS-IS and/or BGP control plane was
introduced.
IETF L2 VPN specifications [PBB-VPLS] or [PBB-EVPN] enable the
transport of PBB frames using PW over MPLS or simply MPLS, and
implicitly allow the use of MPLS Traffic Engineering and
resiliency toolset to provide for advanced traffic steering
and faster convergence.
Transport over IP/L2TPv3 [RFC4719] or IP/GRE [RFC4797] is
also possible as an alternative to MPLS tunneling. Additional
header optimization for PBB over IP/GRE encapsulated packets
may be feasible. These specifications would allow for ISID
based L2 overlay using a regular IP backbone.
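For reference, the resulting header stacks can be listed as
follows; this simply enumerates the options named above (PBB
over MPLS PWs per [PBB-VPLS]/[PBB-EVPN], PBB over IP/L2TPv3 per
[RFC4719], and PBB over IP/GRE) and introduces no new
encapsulation.
   # Outer-to-inner header stacks for carrying a PBB (802.1ah)
   # frame over the tunneling options discussed in this section.
   PBB_TUNNELING_OPTIONS = {
       "mpls-pw": ["MPLS LSP label(s)", "PW label", "PBB frame"],
       "ip-l2tpv3": ["IP", "L2TPv3", "PBB frame"],
       "ip-gre": ["IP", "GRE", "PBB frame"],
   }

   for name, stack in PBB_TUNNELING_OPTIONS.items():
       print(name, "->", " / ".join(stack))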
6.3.6. Use Case examples
6.3.6.1. PBBN in DC, L2VPN in DC GW
DC environments based on VLANs and native Ethernet operational
model may want to consider using the native PBB option to
provide L2 multi-tenancy, in effect the DC ELAN from Figure 2.
An example of a network architecture that addresses this
scenario is depicted in Figure 4:
,---------.
,' Inter-DC `.
(L2VPN (PBB-VPLS)
`.or PBB-EVPN),'
`|-------|-'
+--+--+ +-+---+
|PE GW|+-+|PE GW|
.+-----+ +-----+.
.' `-.
.-' `\
,' `.
+ Intra-DC PBBN \
| +
: ;
`\+------+ +------+ +--+----+-'
| ToR |.. | ToR |..| ToR |
+-+--+-+ +-+--+-+ +-+--+--+
.'PBB `. .'PBB `. .'PBB `.
+--+-+ +-+-++ +-++-+ +-+--+
|VSw | :VSw : :VSw : :VSw :
+----+ +----+ +----+ +----+
Figure 4 PBB in DC, PBB-VPLS or PBB-EVPN for DC Interconnect
PBB inside the DC core interoperates seamlessly with VPLS used
for L2 DC-Interconnect to extend ELAN domains across DCs. This
expansion may be required to address VM Mobility requirements
or to balance the load on DC PE gateways. Note that in the PBB-
VPLS case, just one or a handful of infrastructure B-VPLS
instances are required, providing the Backbone VLAN equivalent
function.
PBB encapsulation addresses the expansion of the ELAN service
identification space with 16M ISIDs and solves MAC explosion
through VM MAC hiding from the Ethernet core.
PBB SPB [802.1aq] is used for core routing in the ToRs, Core
SWs and PEs. If the DCs that need to be interconnected at L2
are part of the same administrative domain, and scaling is not
an issue, SPB/IS-IS may be extended across the VPLS
infrastructure. If different AS domains are present, if better
load balancing is required between the DCs and the WAN, or if
extending IS-IS across DCs causes scaling issues, then the BGP
extensions described in [PBB-EVPN] must be employed.
The forwarding plane, MAC FIB requirements and the Layer2
operational model in the ToR and Core SW are maintained. The
VSw sends PBB encapsulated frames to the ToR as described in
the previous section. ToRs and Core SWs still perform standard
Ethernet switching using the outer Ethernet header.
From a control plane perspective, the VSw uses a default gateway
configuration to send traffic to the ToR, as in the regular IP
routing case. VSw BMAC learning on the ToR is done through
either LLDP or the VM Discovery Protocol (VDP) described in
[802.1Qbg]. Identical mechanisms may be used for the ISID.
Once this information is learned on the ToR, it is
automatically advertised through SPB. If PBB-EVPN is used in
the DC GWs, Multiprotocol BGP (MP-BGP) will be used to advertise
the ISID and BMAC over the WAN as described in [PBB-EVPN].
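The control plane sequence at the ToR and DC GW can be sketched
as below. The record layouts and helper names are hypothetical;
they only illustrate the order of operations described above
(learn the VSw BMAC and ISID locally via LLDP/VDP, advertise
through SPB inside the DC, and re-advertise BMAC/ISID
reachability in MP-BGP at a PBB-EVPN GW).
   # Sketch of the ToR/GW control plane sequence described above.
   # Record layouts and helpers are hypothetical.
   def on_lldp_or_vdp_learn(tor_state, vsw_bmac, isid):
       """Local learning of a VSw BMAC and ISID on a ToR port."""
       tor_state.setdefault("local", []).append((vsw_bmac, isid))
       advertise_in_spb(tor_state, vsw_bmac, isid)

   def advertise_in_spb(tor_state, bmac, isid):
       # IS-IS/SPB floods BMAC reachability and ISID membership
       # inside the DC.
       tor_state.setdefault("spb_lsp", []).append(
           {"bmac": bmac, "isid": isid})

   def readvertise_in_bgp(gw_state, bmac, isid, route_target):
       # A PBB-EVPN GW exports BMAC reachability and ISID
       # membership toward the WAN in MP-BGP [PBB-EVPN].
       gw_state.setdefault("bgp_rib_out", []).append(
           {"bmac": bmac, "isid": isid, "rt": route_target})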
6.3.6.2. PBBN in vSw, L2VPN in the ToR
A variation of the use case example from the previous section
is depicted in Figure 5:
,---------.
,' Inter-DC `.
(L2VPN (PBB-VPLS)
`.or PBB-EVPN),'
`|-------|-'
+--+--+ +-+---+
|PE GW|+-+|PE GW|
.+-----+ +-----+.
.' `-.
.-' `\
,' `.
+ Intra-DC L2VPN over \
| IP or MPLS tunneling +
: ;
`\+------+ +------+ +--+----+-'
| ToR |.. | ToR |..| ToR |
+-+--+-+ +-+--+-+ +-+--+--+
.'PBB `. .'PBB `. .'PBB `.
+--+-+ +-+-++ +-++-+ +-+--+
|VSw | :VSw : :VSw : :VSw :
+----+ +----+ +----+ +----+
Figure 5 PBB in VSw, L2VPN at the ToR
The procedures from the previous section are used at the VSw:
PBB encapsulation and Ethernet BVLANs can be used on the VSw
uplink. The L2VPN infrastructure replaces the BVLAN at the ToR,
enabling the use of IP (GRE or L2TPv3) or MPLS tunneling.
L2 networking still has the same control plane choices: IS-IS
[802.1aq] and/or BGP [PBB-EVPN], independently from the
tunneling choice.
6.3.7. NVo3 applicability
Table 3 provides a comparison between the PBB-VPLS VPN functional
elements and the NVo3 framework functional elements [NVo3-fmwk].
Table 3: Functional comparison between PBB-VPLS and NVo3
framework
Nvo3 Function Matching PBB-VPLS
Function
----------------------------------------------------------
Virtual Access Point (VAP) Attachment Circuit (AC)
based on I-SID
Network Virtual Edge (NVE) PE
Virtual Network Instance (VNI) VSI
Virtual Network Context (VN MPLS-label for PBB-VSI
Context) identifier and I-SID if the PE is
PBB edge
Overlay Module and tunneling -MPLS over MPLS tunnels
-MPLS over IP/GRE in an
IP network
Control Plane: TBD Control plane:
- MP-BGP for auto-
discovery
- PWE3 T-LDP for PW
signaling
Core Routing:
- IGP: OSPF/ISIS -(TE)
Core Signaling:
- RSVP or LDP for MPLS LSPs
Table 4 provides a comparison between the PBB-EVPN functional
elements and the NVo3 framework functional elements [NVo3-fmwk].
Table 4: Functional comparison between E-VPN and NVo3
framework
Nvo3 Function Matching PBB-EVPN
Function
----------------------------------------------------------
Virtual Access Point (VAP) Attachment Circuit (AC)
based on I-SID
Network Virtual Edge (NVE) PE
Virtual Network Instance (VNI) EVPN Instance (EVI)
Virtual Network Context (VN MPLS label for PBB-EVI
Context) identifier and I-SID if the PE is
PBB edge
Overlay Module and tunneling -MPLS over MPLS tunnels
-MPLS over IP/GRE in an
IP network
Control Plane: TBD Control plane:
- MP-BGP for E-VPN
Core Routing:
- IGP: OSPF/ISIS -(TE)
Core Signaling:
- RSVP or LDP for MPLS LSPs
Depending on the implementation model, PBB-EVPN and PBB-VPLS
can address some of the issues described in Section 5 and in
[NVo3-problem-statement], but not all:
-Dynamic Provisioning as described in [NVo3-problem-
statement] Section 2.1: This is not addressed today in
PBB and PBB-VPN solutions, as it has not been in scope of
the work for either. However, a mechanism may be
developed to perform such provisioning dynamically as
compute resources are configured. It should be noted that
PBB-VPLS and PBB-EVPN currently support auto-discovery of
PEs with instances of the same VPLS or E-VPN service, as
a component of the dynamic provisioning of a VPLS/E-VPN
service.
-VM Mobility as also defined in [NVo3-problem-statement]
Section 2.2: PBB-EVPN and PBB-VPLS support VM MAC
mobility as the 802.1Q and VPLS solutions do, based on MAC
learning in the data plane.
-MAC table sizes in Switches as also described in [NVo3-
problem-statement] Section 2.3: As opposed to an 802.1Q-
based core Ethernet network, tenant VM addresses are only
learned at a PBB edge. If the vSw implements PBB edge
functionality and the ToR implements PBB-EVPN or PBB-
VPLS, then the vSw will learn the MAC addresses of other
VMs and devices in the same LAN, but the ToR will learn
only the MAC addresses of Backbone bridges, which will be
on the order of the number of servers, not VMs, conserving
MAC FDB entries on the ToR. This is because there are two
layers of overlay, one at the vSw for PBB, and one at the
ToR for VPLS or E-VPN, on a core IP/MPLS network.
-VLAN limitation as also described in [NVo3-problem-
statement] Section 2.7: The number of service instances
that can be supported is 16 million.
6.3.8. Connectivity to existing VPN sites and Internet
The main reason for extending the ELAN space beyond the 4K
VLANs is to be able to serve multiple DC tenants whereby the
total number of service domains needed exceeds 4K. Figure 6
represents the logical service view where PBB ELANs are used
inside one or multiple DCs to connect to existing IP VPN
sites. It should be noted that the PE GW should be able to
perform integrated routing in a VPN context and bridging in
VSI context:
Tenant 1 sites connected over IP VPN
,--+-'. ;-`.--.
( PE ) VRFs on PEs . PE )
'-----' '-----'
| |
,-------------------------------.
( IP VPN over IP/MPLS WAN )
`---.'-----------------------`.-'
+--+--+ IP VPN VRF on PE GWs +-+---+
.....|PE GW|...... |PE GW|
DC with PBB | +-----+ | +--+--+
Tenant 1 | |PBB ELAN12 |
view PBB|ELAN11 ......|...... PBB|ELAN13
'':'''''''':' | | '':'''''''':'
,'. ,'. ,+. ,+. ,'. ,'.
(VM )....(VM ) (VM )... (VM ) (VM )....(VM )
`-' `-' `-' `-' `-' `-'
Compute Resources inside DC
Figure 6 Logical Service View with IP VPN
DC ELANs are identified with 24-bit ISIDs instead of VLANs. At
the PE GWs, an IP VPN VRF is configured for every DC tenant.
Each "ISID ELAN" for Tenant 1 is seen as a logical Ethernet
endpoint and is assigned an IP interface on the Tenant 1 VRF.
Tenant 1 enterprise sites are connected to IP VPN PEs
distributed across the WAN. IP VPN instances on PE GWs can be
automatically discovered and connected to the WAN IP VPN using
standard procedures [RFC4364].
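A configuration-style sketch of this binding is shown below.
The object model and all values are hypothetical; it only
mirrors the text: each 24-bit ISID ELAN of a tenant appears as
a logical Ethernet endpoint with an IP interface placed in that
tenant's VRF on the PE GW.
   # Sketch: binding tenant ISID ELANs to IP interfaces in an
   # IP VPN VRF at the PE GW.  Classes and values are hypothetical.
   class Vrf:
       def __init__(self, name, route_distinguisher, route_targets):
           self.name = name
           self.rd = route_distinguisher
           self.rts = route_targets
           self.ip_interfaces = {}      # ifname -> (ISID, IP address)

       def bind_isid_elan(self, isid, ip_address):
           ifname = "irb-isid-{}".format(isid)
           self.ip_interfaces[ifname] = (isid, ip_address)
           return ifname

   tenant1 = Vrf("tenant1", "65000:1", ["target:65000:1"])
   tenant1.bind_isid_elan(100011, "10.1.1.1/24")   # ELAN11
   tenant1.bind_isid_elan(100012, "10.1.2.1/24")   # ELAN12
   tenant1.bind_isid_elan(100013, "10.1.3.1/24")   # ELAN13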
In certain cases, the DC GW PEs are part of the IPVPN service
provider network providing IPVPN services to the enterprise
customers. In other cases, DC PEs are operated and managed by
the DC/cloud provider and interconnect to multiple IPVPN
service providers using inter-AS BGP/MPLS models A, B, or C
[RFC4364]. The same discussion applies to the case of IPSec
VPNs from a PBB ELAN termination perspective.
If tenant sites are connected to the DC using WAN VPLS, the PE
GWs need to implement the BEB function described in the PBB-
VPLS PE model [PBB-VPLS] and the procedures from [PBB-Interop]
to perform the required translation. Figure 7 describes the
VPLS WAN scenario:
Customer sites connected over VPLS
,--+-'. ;-`.--.
( PE ) VPLS on PEs . PE )
'-----' '-----'
| |
,-------------------------------.
( VPLS over IP/MPLS WAN )
`---.'-----------------------`.-'
+--+--+ +-+---+
|PE GW| <-- PBB-VPLS/BEB --> |PE GW|
DC with PBB +--+--+ +--+--+
Tenant 1 | |
view PBB|ELAN11 PBB|ELAN13
'':'''''''':' '':'''''''':'
,'. ,'. ,'. ,'.
(VM ) .. (VM ) (VM ) .. (VM )
`-' `-' `-' `-'
Compute Resources inside DC
Figure 7 Logical Service View with VPLS WAN
One VSI is required at the PE GW for every DC ELAN domain. As
in the IP VPN case, DC PE GWs may be fully integrated as part
of the WAN provider network or interconnected using the Inter-
AS/Inter-Provider models A, B or C.
The VPN connectivity may be provided by one or multiple PE
GWs, depending on capacity need and/or the operational model
used by the DC/cloud operator.
If a VM group is serving Internet-connected customers, the
related ISID ELAN will be terminated into a routing context
(global public instance or another VRF) connected to the
Internet. As in the IP VPN case, the 24-bit ISID will be
represented as a logical Ethernet endpoint on the Internet
routing context and an IP interface will be allocated to it.
The same PE GW may be used to provide both VPN and Internet
connectivity, with the routing contexts separated internally
using the IP VPN models.
6.3.9. DC Interconnect
L2 DC interconnect may be required to expand the ELAN domains
for Management, VM Mobility or when a VM Group needs to be
distributed across DCs.
PBB may be used to provide ELAN extension across multiple DCs
as depicted in Figure 8:
,-------------------------------.
( IP/MPLS WAN )
`---.'------------------------`.'
+--+--+ +-+---+
|PE GW| <----- PBB BCB ----> |PE GW|
DC with PBB +--+--+ +--+--+
Tenant 1 | |
view PBB|ELAN11 PBB|ELAN11
'':'''''''':' '':'''''''':'
,'. ,'. ,'. ,'.
(Hvz) .. (Hvz) (Hvz) .. (Hvz)
`-' `-' `-' `-'
Compute Resources inside DC
Figure 8 PBB BCB providing VMotion ELAN
ELAN11 is expanded across DCs to provide interconnect for the
pool of server blades assigned to the same VMotion domain. In
this case, hypervisors are connected directly to ELAN11. The PE
GW operates in this case as a PBB Backbone Core Bridge (BCB)
combined with PBB-EVPN capabilities [PBB-EVPN]. The I-SID
ELANs do not require any additional provisioning touches and
do not consume additional MPLS resources on the PE GWs. Per I-
SID auto-discovery and flood containment is provided by IS-
IS/SPB [802.1aq] and BGP [PBB-EVPN].
6.3.10. Interoperating with existing DC VLANs
While green field deployments will definitely benefit from all
the advantages described in the previous sections, in many
other scenarios, existing DC VLAN environments will have to be
gradually migrated to the new architecture. Figure 9 depicts
an example of a possible migration scenario where both PBB and
VLAN technologies are present:
,---------.
,' Inter-DC `.
(L2VPN (PBB-VPLS)
`.or PBB-EVPN),'
`-/------\-'
+---+-+ +-+---+
|PE GW|+-+|PE GW|
.-+-----+ +-----+:-.
.-' `-.
,' `-:.
+ PBBN/SPB DC \
| +
: ;
`-+------+ +------+ +--+----+-'
| ToR |.. | ToR |..| ToR |
+-+--+-+ +-+--+-+ +-+--+--+
.'PBB `. .' `. .'VLAN`.
+--+-+ +-+-++ +-++-+ +-+--+
|VSw | :VSw : :VSw : :VSw :
+----+ +----+ +----+ +----+
Figure 9 DC with PBB and VLANs
This example assumes that the two VSws on the right do not
support PBB but the ToRs do. The VSws on the left side are
running PBB while the ones on the right side are still using
VLANs. The left ToR is performing only Ethernet switching
whereas the one on the right is translating from VLANs to
ISIDs and performing PBB encapsulation using the BEB function
[802.1ah] and [PBB-VPLS]. The ToR in the middle is performing
both functions: core Ethernet tunneling for the PBB VSws and
the BEB function for the VLAN VSws.
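The BEB translation performed at the right-hand ToR can be
sketched as a simple mapping step; the table contents and field
names are hypothetical.
   # Sketch: BEB function at a ToR facing a VLAN-only VSw.  The
   # ToR maps the customer VLAN to an ISID and adds the 802.1ah
   # backbone header.  Mapping values are hypothetical.
   vlan_to_isid = {10: 100010, 20: 100020}

   def beb_encapsulate(vm_frame, vlan_id, local_bmac, remote_bmac,
                       b_vid=1):
       isid = vlan_to_isid[vlan_id]        # VLAN -> ISID translation
       return {"b_da": remote_bmac, "b_sa": local_bmac,
               "b_vid": b_vid, "i_sid": isid, "inner": vm_frame}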
The SPB control plane is still used between the ToRs,
providing the benefits described in the previous section. The
VLAN VSw must use regular multi-homing functions to the ToRs:
for example STP or Multi-chassis-LAG.
DC VLANs may also be present initially on some of the legacy
ToRs or Core SWs. PBB interoperability will be performed as
follows:
-If VLANs are used in the ToRs, PBB BEB function may be
performed by the Core SW(s) where the ToR uplink is
connected.
-If VLANs are used in the Core SW, PBB BEB function may
be performed by the PE GWs where the Core SW uplink is
connected.
It is possible that some DCs may run PBB or a PBB/VLAN
combination while others may still be running VLANs. An
example of this interoperability scenario is described in
Figure 10:
,-------------------------------.
( IP/MPLS WAN )
`------/-----------------\-------'
+--/--+ +--\--+
|PE GW|PBB-VPLS |PE GW|VPLS
.'+-----+-' .'+------+.
/ \ / \
| | | |
| PBB DC | | VLAN DC |
\ / \ /
+---+ +---+ +---+ +---+
|VSw|.|VSw| |VSw|.|VSw|
+---+ +---+ +---+ +---+
Figure 10 Interoperability to a VLAN-based DC
Interoperability with existing VLAN DC is required for DC
interconnect. The PE GW in the PBB DC or the PE GW in the VLAN
DC must implement the PBB-VPLS PE model described in [PBB-VPLS].
This interoperability scenario is addressed in detail in [PBB-
Interop].
Connectivity to existing VPN customer sites (IP VPN, VPLS,
IPSec) or Internet does not require any additional procedures
beyond the ones described in the VPN connectivity section. The
PE GW in the DC VLAN will aggregate DC ELANs through IP
interfaces assigned to VLAN logical endpoints whereas the PE
GW in the PBB DC will assign IP interfaces to ISID logical
endpoints.
If EVPN is used to interconnect the two DCs, PBB-EVPN
functions described in [PBB-EVPN] must be implemented in one
of the PE-GWs.
6.4. TRILL and L2VPN toolset
TRILL and SPB control planes provide similar functions. IS-IS
is the base protocol used in both specifications to provide
multi-pathing and fast convergence for core networking. [PBB-
EVPN] describes how seamless Inter-DC connectivity can be
provided over an MPLS/IP network for both TRILL [RFC6325] and
SPB [802.1aq]/[802.1Qbp] networks.
The main differences exist in the encapsulation and data plane
forwarding. TRILL encapsulation [RFC6325] was designed
initially for large enterprise and campus networks where 4k
VLANs are sufficient. As a consequence the ELAN space in
[RFC6325] is limited to 4K VLANs; however, this VLAN scale
issue is being addressed in [Fine-Grained].
7. L3VPN applicability to Cloud Networking
This section discusses the role of IP VPN technology in
addressing the L3 Virtualization challenges described in
section 5.
IP VPN technology defined in L3VPN working group may be used
to provide L3 virtualization in support of multi-tenancy in
the DC network as depicted in Figure 11.
,-------------------------------.
( IP VPNs over IP/MPLS WAN )
`----.'------------------------`.'
,--+-'. ;-`.--.
..... VRF1 )...... . VRF2 )
| '-----' | '-----'
| Tenant1 |ELAN12 Tenant1|
|ELAN11 ....|........ |ELAN13
'':'''''''':' | | '':'''''''':'
,'. ,'. ,+. ,+. ,'. ,'.
(VM )....(VM ) (VM )... (VM ) (VM )....(VM )
`-' `-' `-' `-' `-' `-'
Figure 11 Logical Service View with IP VPN
Tenant 1 might buy Cloud Services in different DC locations
and choose to associate the VMs in 3 different groups, each
mapped to a different ELAN: ELAN11, ELAN12 and ELAN13. L3
interconnect between the ELANs belonging to tenant1 is
provided using a BGP/MPLS IPVPN and associated VRF1 and VRF2,
possibly located in different DCs. Each tenant that requires
L3 virtualization will be allocated a different IP VPN
instance. Using a full-fledged IP VPN for L3 virtualization
inside a DC presents the following advantages compared with
existing DC technologies like Virtual Routing:
- Interoperates with existing WAN VPN technology
- Deployment tested, provides a full networking toolset
- Scalable core routing: only one MP-BGP routing instance
is required compared with one per customer/tenant in the
Virtual Routing case
- Service Auto-discovery: automatic discovery and route
distribution between related service instances
- Well defined and deployed Inter-Provider/Inter-AS models
- Supports a variety of VRF-to-VRF tunneling options
accommodating different operational models: MPLS
[RFC4364], IP or GRE [RFC4797]
To provide Cloud services to related customer IP VPN instances
located in the WAN, the following connectivity models may be
employed:
- DC IP VPN instance may participate directly in the WAN IP
VPN
- Inter-AS Options A, B or C models may be employed with
applicability to both Intra and Inter-Provider use cases
[RFC4364]
VRF implementation could be done in the end-system [endsystem]
to facilitate direct end-system to end-system communication.
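A minimal sketch of the forwarding step out of such an
end-system VRF, assuming the MPLS-over-GRE option of [RFC4797];
the route contents, labels and addresses are hypothetical and
would normally be learned via MP-BGP as in [endsystem].
   # Sketch: forwarding out of an end-system VRF using MPLS over
   # GRE [RFC4797].  Route contents are hypothetical.
   vrf_routes = {
       "10.2.0.0/16": {"vpn_label": 3001,
                       "remote_endsystem": "192.0.2.20"},
   }

   def forward(matched_prefix, packet):
       # Longest-prefix match against the VRF is elided here.
       route = vrf_routes[matched_prefix]
       return {"outer_ip_dst": route["remote_endsystem"],
               "encap": "GRE",
               "mpls_label": route["vpn_label"],  # selects remote VRF
               "payload": packet}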
Table 5 summarizes the comparison between BGP/MPLS IPVPN
functional elements and those of NVo3 [NVo3-fmwk].
Table 5: Functional comparison between BGP/MPLS IPVPN and NVo3
functional elements
Nvo3 Function                  Matching BGP/MPLS-IPVPN Function
-------------------------------------------------------------
Virtual Access Point (VAP) Attachment Circuit (AC)
Network Virtual Edge (NVE) Provider Edge (PE)
Virtual Network Instance (VNI) Virtual Routing and
Forwarding (VRF)
Virtual Network Context (VN A 20-bit MPLS label
Context) identifier
Overlay Module and tunneling -MPLS over MPLS tunnels
-MPLS over IP/GRE in an
IP network
Control Plane: TBD Control plane:
- MP-BGP for VPN
signaling /routing
Core Routing:
- IGP: OSPF/ISIS -(TE)
Core Signaling:
- RSVP or LDP for MPLS LSPs
Depending on the implementation model, BGP/MPLS IPVPN can
address some of the issues described in Section 5 and in
[NVo3-problem-statement], but not all:
-Dynamic Provisioning as described in [NVo3-problem-
statement] Section 2.1: This is not addressed today in
the BGP/MPLS IPVPN solution, as it was not a requirement
for that solution. However, a mechanism may be developed
to perform such provisioning dynamically as compute
resources are configured. Considerations must be given to
the cases where VMs and the VRF, providing connectivity
to these VMs, are co-located on the same end-system vs.
being on different physical devices. It should be noted
that BGP/MPLS IPVPN currently supports auto-discovery of
PEs with instances of the same IPVPN, as a component of
the dynamic provisioning of an IPVPN service.
-VM Mobility as also defined in [NVo3-problem-statement]
section 2.2: VM mobility is supported in [endsystem] when
the NVE, being a VRF, is co-located with the VM(s) to
which the VRF provides connectivity. However, further
enhancements must provide support for VM mobility in
other cases.
-IP table sizes in edge routers as also described in
[NVo3-problem-statement] Section 2.3: Tenant IP routes are
held only in the VRFs of PEs that have attachment circuits
to the corresponding IPVPN, with Route Target filtering
limiting route import to those PEs. Core (P) routers carry
no tenant routes, only infrastructure routes and tunnel
state, so their table sizes are independent of the number
of tenants and tenant systems. When the VRF is implemented
on the end-system [endsystem], each end-system only needs
to hold the routes of the IPVPNs of its locally attached
VMs.
-VLAN limitation as also described in [NVo3-problem-
statement] Section 2.7: Tenant separation in the core does
not rely on VLANs; the VN context is a 20-bit MPLS label,
so the number of service instances is not constrained by
the 4K VLAN space.
8. VM Mobility with E-VPN
8.1. Layer 2 Extension Solution
This document illustrates a solution for the layer 2 extension
based on E-VPNs [E-VPN]. That is, the L2 sites that contain VMs
of a given L2-based Community User Group (CUG) or Virtual Network
(VN) are interconnected using E-VPN. Thus, a given E-VPN
corresponds to a single L2-based VN. An L2-based VN is
associated with a single E-VPN Ethernet Tag Identifier.
This section provides a brief overview of how E-VPN is used as
the solution for the "layer 2 extension problem". Details of E-
VPN operations can be found in [E-VPN].
A single L2 site could be as large as the whole network within a
single data center, in which case the Data Center Border Routers
(DCBRs) of that data center, in addition to acting as IP routers
for the L2-based VNs present in the data center, also act as PEs.
In this scenario, E-VPN is used to handle VM migration between
servers in different data centers.
A single L2 site could be as small as a single ToR with the
server connected to it, in which case the ToR acts as a PE. In
this scenario, E-VPN is used to handle VM migration between
servers that are either in the same or in different data centers.
Note that in this scenario this document assumes that DCBRs, in
addition to acting as IP routers for the L2-based VNs present in
their data center, also participate in the E-VPN procedures,
acting as BGP Route Reflectors for the E-VPN routes originated by
the ToRs acting as PEs.
In the case where E-VPN is used to interconnect L2 sites in
different data centers, the network that interconnects DCBRs of
these data centers could provide either (a) only Ethernet or
IP/MPLS connectivity service among these DCBRs, or (b) may offer
the E-VPN service. In the former case DCBRs exchange E-VPN routes
among themselves relying only on the Ethernet or IP/MPLS
connectivity service provided by the network that interconnects
these DCBRs. The network does not directly participate in the
exchange of these E-VPN routes. In the latter case the routers at
the edge of the network may be either co-located with DCBRs, or
may establish E-VPN peering with DCBRs. Either way, in this case
the network facilitates exchange of E-VPN routes among DCBRs (as
in this case DCBRs would not need to exchange E-VPN routes
directly with each other).
Please note that for the purpose of solving the layer 2 extension
problem the propagation scope of E-VPN routes for a given L2-
based VN is constrained by the scope of the PEs connected to the
L2 sites that presently contain VMs of that VN. This scope is
controlled by the Route Target of the E-VPN routes. Controlling
propagation scope could be further facilitated by using Route
Target Constrain [RFC4684].
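The Route Target based scoping can be sketched as an import
filter at a PE; the EVI names, Route Targets and route contents
are hypothetical.
   # Sketch: Route Target based import filtering of E-VPN routes.
   # A PE installs a route only into EVIs whose import RTs match,
   # which is what constrains the propagation scope per VN.
   local_evis = {"evi-red": {"target:65000:100"},
                 "evi-blue": {"target:65000:200"}}

   def import_route(route):
       """route: dict with 'mac', 'next_hop' and a set of 'rts'."""
       return [evi for evi, rts in local_evis.items()
               if rts & route["rts"]]      # empty -> not installed

   print(import_route({"mac": "00:aa:bb:cc:dd:01",
                       "next_hop": "198.51.100.1",
                       "rts": {"target:65000:100"}}))  # ['evi-red']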
Use of E-VPN ensures that traffic among members of the same L2-
based VN is optimally forwarded, irrespective of whether members
of that VN are within the same or in different data centers. This
follows from the observation that E-VPN inherently enables
(disaggregated) forwarding at the granularity of the MAC address
of the VM.
Optimal forwarding among VMs of a given L2-based VN that are
within the same data center requires propagating VM MAC
addresses, and comes at the cost of disaggregated forwarding
within a given data center. However, such disaggregated
forwarding is not necessary between data centers if a given L2-
based VN spans multiple data centers. For example, when a given
ToR acts as a PE, this ToR has to maintain MAC advertisement
routes only to the VMs within its own data center (and
furthermore, only to the VMs that belong to the L2-based VNs
whose site(s) are connected to that ToR), and then point a
"default" MAC route to one of the DCBRs of that data center. In
this scenario a DCBR of a given data center, when it receives MAC
advertisement routes from DCBR(s) in other data centers, does not
re-advertise these routes to the PEs within its own data center,
but just advertises a single "default" MAC advertisement route to
these PEs.
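The lookup behaviour at such a ToR can be sketched as below; the
table contents are hypothetical and only illustrate the
"specific routes for local VMs, default MAC route toward a
DCBR" idea described above.
   # Sketch: MAC lookup at a ToR acting as PE, with a "default"
   # MAC route pointing at a DCBR for destinations outside the
   # local data center.  Values are hypothetical.
   local_mac_routes = {
       "00:aa:00:00:00:01": "tor-2",
       "00:aa:00:00:00:02": "tor-3",
   }
   DEFAULT_MAC_NEXT_HOP = "dcbr-1"   # advertised by the DCBR

   def next_hop_for(dst_mac):
       return local_mac_routes.get(dst_mac, DEFAULT_MAC_NEXT_HOP)

   print(next_hop_for("00:aa:00:00:00:01"))  # tor-2 (intra-DC)
   print(next_hop_for("00:bb:00:00:00:09"))  # dcbr-1 (default route)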
When a given VM moves to a new L2 site, if in the new site this
VM is the only VM from its L2-based VN, then the PEs connected to
the new site need to be provisioned with the E-VPN Instances
(EVI) of the E-VPN associated with this L2-based VN. Likewise, if
after the move the old site no longer has any VMs that are in the
same L2-based VN as the VM that moved, the PEs connected to the
old site need to be de-provisioned with the EVI of the E-VPN.
Procedures to
accomplish this are outside the scope of this document.
8.2. VM Default Gateway Solutions
Once a VM moves to a new L2 site, solving the VM Default Gateway
problem would require PEs connected to that L2 site to apply IP
forwarding to the inter-L2VN/inter-subnet traffic originated from
that VM. That implies that (a) PEs should be capable of
performing both MAC-based and IP-based forwarding (although IP-
based forwarding functionality could be limited to just
forwarding either based on IP host routes, or based on the IP
default route), and (b) PEs should be able to distinguish between
intra-L2VN/intra-subnet and inter-L2VN/inter-subnet traffic
originated by that VM (in order to apply MAC-based forwarding to
the former and IP-based forwarding to the latter).
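In the forwarding plane this distinction reduces to a check on
the destination MAC address, as sketched below; the gateway MAC
value is a hypothetical placeholder.
   # Sketch: classifying VM traffic at a PE into intra-subnet
   # (MAC-based forwarding) and inter-subnet (IP-based forwarding)
   # based on the destination MAC address.
   GATEWAY_MACS = {"00:00:5e:00:01:01"}    # hypothetical GW MAC(s)

   def classify(frame):
       if frame["dst_mac"] in GATEWAY_MACS:
           return "ip-forwarding"          # inter-L2VN/inter-subnet
       return "mac-forwarding"             # intra-L2VN/intra-subnet

   print(classify({"dst_mac": "00:00:5e:00:01:01"}))  # ip-forwarding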
As a VM moves to a new L2 site, the default gateway IP address of
the VM may not change. Further, the ARP cache of the VM may not
time out. Thus, the destination MAC address in the inter-
L2VN/inter-subnet traffic originated by that VM would not change
as the VM moves to the new site. Given that, how would PEs
connected to the new L2 site be able to recognize inter-
L2VN/inter-subnet traffic originated by that VM? The following
describes two possible solutions.
Both of the solutions assume that for inter-L2VN/inter-subnet
traffic between a VM and its peers outside of the VM's own data
center, one or more DCBRs of that data center act as fully
functional default gateways for that traffic.
8.2.1. VM Default Gateway Solution 1
The first solution relies on the use of an anycast default
gateway IP address and an anycast default gateway MAC address.
If DCBRs act as PEs for an E-VPN corresponding to a given L2-
based VN, then these anycast addresses are configured on these
DCBRs. Likewise, if ToRs act as PEs, then these anycast addresses
are configured on these ToRs. All VMs of that L2-based VN are
(auto)configured with the (anycast) IP address of the default
gateway. This ensures that a particular DCBR (or ToR), acting as
a PE, can always apply IP forwarding to the packets sent by a VM
to the (anycast) default gateway MAC address. It also ensures
that such DCBR (or ToR) can respond to the ARP Request generated
by a VM for the default gateway (anycast) IP address.
Note that with this approach when originating E-VPN MAC
advertisement routes for the MAC address of the default gateways
of a given L2-based VN, all these routes MUST indicate that this
MAC address belongs to the same Ethernet Segment Identifier
(ESI).
8.2.2. VM Default Gateway Solution 2
The second solution does not require configuration of the anycast
default gateway IP and MAC address on the PEs.
Each DCBR (or each ToR) that acts as a default gateway for a
given L2-based VN advertises in the E-VPN control plane its
default gateway IP and MAC address using the MAC advertisement
route, and indicates that such route is associated with the
default gateway. The MAC advertisement route MUST be advertised
as per procedures in [E-VPN]. The MAC address in such an
advertisement MUST be set to the default gateway MAC address of
the DCBR (or ToR). The IP address in such an advertisement MUST
be set to the default gateway IP address of the DCBR (or ToR). To
indicate that such a route is associated with a default gateway,
the route MUST carry the "Default Gateway" community.
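A sketch of the receiving PE behaviour follows: on importing a
MAC advertisement route that carries the "Default Gateway"
community, the PE installs forwarding state that hands frames
destined to that MAC to IP forwarding. The route encoding is
simplified and the field names are hypothetical.
   # Sketch: handling an E-VPN MAC advertisement route carrying
   # the "Default Gateway" community, per the behaviour above.
   DEFAULT_GATEWAY_COMMUNITY = "default-gateway"

   mac_fib = {}        # destination MAC -> forwarding action

   def on_mac_route(route):
       """route: dict with 'mac', 'ip', 'communities', 'next_hop'."""
       if DEFAULT_GATEWAY_COMMUNITY in route["communities"]:
           # Frames to this MAC are inter-subnet: IP-forward locally.
           mac_fib[route["mac"]] = ("ip-forward", route["ip"])
       else:
           mac_fib[route["mac"]] = ("mac-forward", route["next_hop"])

   on_mac_route({"mac": "00:11:22:33:44:55", "ip": "10.0.0.1",
                 "communities": {"default-gateway"},
                 "next_hop": "203.0.113.1"})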
Each PE that receives this route and imports it as per the
procedures of [E-VPN] MUST create MAC forwarding state that
enables it to apply IP forwarding to the packets destined to the
MAC address carried in the route. The PE that receives this
E-VPN route follows the procedures in Section 12 of [E-VPN] when
replying to ARP Requests that it receives if such Requests are
for the IP address
in the received E-VPN route.
9. Solutions and Considerations for other DC challenges
9.1. Addressing IP/ARP explosion
This section will be updated in the next revision.
9.2. Optimal traffic forwarding
IP networks, built using link-state protocols such as OSPF or
IS-IS together with BGP, provide optimal traffic forwarding
through the use of equal cost multipath (ECMP) routing and ECMP
traffic load-balancing, and through the use of traffic
engineering tools based on BGP and/or MPLS-TE as applicable.
TRILL based protocols provide for load-balancing across
parallel paths or equal cost paths between two nodes. Traffic
follows the shortest path. For multicast, data plane
replication at layer2 or layer3 happens in the data plane
albeit with different attributes after multicast trees are
built via a control plane and/or snooping. In the presence of
VM mobility, optimal forwarding relates to avoiding
triangulation and providing for optimum forwarding between any
two VMs. Solutions that provide for routing in presence of VM
mobility are described in [VM-Mobility].
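As an illustration of the ECMP load-balancing mentioned above,
the sketch below selects a next hop by hashing the flow 5-tuple
so that packets of one flow stay on one path while flows spread
across the equal-cost paths; the hash shown is a common choice,
not a mandated algorithm.
   # Sketch: per-flow ECMP next-hop selection by hashing the
   # 5-tuple over the set of equal-cost next hops.
   import hashlib

   def ecmp_next_hop(src_ip, dst_ip, proto, sport, dport, next_hops):
       key = "{}|{}|{}|{}|{}".format(src_ip, dst_ip, proto,
                                     sport, dport)
       digest = hashlib.sha256(key.encode()).digest()
       return next_hops[int.from_bytes(digest[:4], "big")
                        % len(next_hops)]

   paths = ["core-sw-1", "core-sw-2", "core-sw-3", "core-sw-4"]
   print(ecmp_next_hop("10.0.0.1", "10.1.0.9", 6, 33000, 443, paths))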
9.3. VM Mobility
IP VPN technology may be used to support DC Interconnect for
different functions like VM Mobility and Cloud Management. A
description of VM Mobility between server blades located in
different IP subnets, using extensions to existing MP-BGP and
IP VPN procedures, can be found in [VM-Mobility]. Support for VM
mobility is also described in [endsystem]. Other solutions can
exist as well. What is needed is a solution that provides for
fast convergence toward the steady state whereby communication
between any two VMs can take place on the shortest or most
optimal path, transient triangulation time is minimized, traffic
IPv4 and IPv6 is controllable or minimized.
9.4. Dynamic provisioning of network services
The need for fast dynamic provisioning of virtual network
services is described in [NVo3-problem-statement] to match the
elasticity and mobility in compute and storage resource
allocation. Such dynamic provisioning was not part of initial
L2VPN or L3VPN work except for some work to provide for
dynamic bandwidth access [VPN-RSVP-TE]. In current L2VPN and
L3VPN targeted deployments, the customer equipment connected
to a VPN PE is static in location. Thus, the logical
forwarding instance on the connected PE (e.g., IPVPN VRF, VSI
or EVI) and the attachment circuit to that instance, as well as
any routing and forwarding policies within that instance, are
provisioned via network management systems upon a service
order at a much larger time scale than needed in this case. In
dynamic data centers, services (e.g., VRF, attachment circuit)
need to be established and torn down at a much smaller time
scale to match the dynamicity of compute resources connected
via these services. Mechanisms to provide for such timely
dynamic provisioning at scale are needed.
In addition, CEs in traditional L3VPN deployments are routers
able to exchange signaling and routing protocol information
with the PE, providing for the dynamic exchange of routing
information and liveness checks between the CEs and the PEs.
In NVo3, the CE equivalent is a TS that may be a virtualized
or a physical CE with the same capabilities as the traditional
CE in L3VPN deployments. However, in some other cases, VMs
providing compute rather than network services may connect
directly to the NVE providing Layer3 forwarding service
(equivalent to a PE). In that case, control plane mechanisms
that enable fast and dynamic connectivity of a VM to an NVE
and reachability exchange among NVEs providing connectivity
among VMs in the same VN must be provided.
9.5. Considerations for Layer2 and Layer3 VPNs on End-systems
With the advent of computing power on end-systems providing VM
services, and to provide for more efficient communication
among VMs minimizing middle boxes in the path, there is a
strong requirement for enabling Layer2 and Layer3 VN
forwarding on these servers. Layer2 VN forwarding today is
supported via vSW implementations and is often limited to
intra data centers. Evolving proprietary technologies such as
vxlan and provide for L2 service transport over an IP network.
If Layer2 and Layer3 VN forwarding solutions on end-systems
are to leverage existing L2VPN and L3VPN solutions,
considerations should be given to new PE models and
specifically to decoupling of forwarding from control plane
functions across different systems to best utilize compute
resources of end-systems and provide for scale. [endsystem] is
one of the solutions being adopted for implementation of
BGP/MPLS VPNs in a DC end-system environment. In that case,
the end-system uses XMPP to exchange labeled IP VPN routes
with a route server that supports MP-BGP labeled IPVPN route
exchange with traditional VPN Route Reflectors and PEs. [sdn-
control] proposes a more generic model for PE functionality
decomposed across forwarding end-systems and control plane
systems that control the forwarding function on these end-
systems and interact with other systems such as other similar
control systems, PEs and Route reflectors. These efforts
targeting new PE models that best fit a scalable multi-tenant
environment may also require extensions of existing protocols
or definition of new ones.
10. Operator Considerations
To be filled in a later version of this document.
11. Security Considerations
No new security issues are introduced beyond those described
already in the related L2VPN and L3VPN solutions drafts and
RFCs in relation to the VPN technologies themselves when the
deployment model and PE model remain the same. Allowing for
dynamic provisioning of VPN services within a DC must ensure
that tenant network privacy is preserved. In addition, when
provisioning, dynamically or statically, VPN services for a
tenant across domain boundaries, the tenant privacy must be
preserved. Dynamic provisioning must include communication over
a secure channel and ensure that the service is provided to an
authorized tenant and connected to the right tenant service.
In addition, changing the PE model by separating the
forwarding plane and control plane must consider and address
security implications.
12. IANA Considerations
IANA does not need to take any action for this draft.
13. References
13.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC4761] Kompella, K. and Rekhter, Y. (Editors), "Virtual
Private LAN Service (VPLS) Using BGP for Auto-
Discovery and Signaling", RFC 4761, January 2007.
[RFC4762] Lasserre, M. and Kompella, V. (Editors), "Virtual
Private LAN Service (VPLS) Using Label Distribution
Protocol (LDP) Signaling", RFC 4762, January 2007.
[PBB-VPLS] Balus, F. et al. "Extensions to VPLS PE model for
Provider Backbone Bridging", draft-ietf-l2vpn-pbb-
vpls-pe-model-07.txt (work in progress), June 2013.
[PBB-Interop] Sajassi, A. et al. "VPLS Interoperability with
Provider Backbone Bridging", draft-ietf-l2vpn-pbb-
vpls-interop-05.txt (work in progress), July 2013.
[802.1ah] IEEE 802.1ah "Virtual Bridged Local Area Networks,
Amendment 6: Provider Backbone Bridges", Approved
Standard June 12th, 2008.
[802.1aq] IEEE Draft P802.1aq/D4.3 "Virtual Bridged Local Area
Networks, Amendment: Shortest Path Bridging", Work
in Progress, September 21, 2011.
[RFC6325] Perlman, et al., "Routing Bridges (Rbridges): Base
Protocol Specification", RFC 6325, July 2011.
[RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual
Private Networks (VPNs)", RFC 4364, February 2006.
[RFC4797] Rosen, E. and Y. Rekhter, "Use of Provider Edge to
Provider Edge (PE-PE) Generic Routing encapsulation
(GRE) or IP in BGP/MPLS IP Virtual Private
Networks", RFC 4797, January 2007.
13.2. Informative References
[RFC4026] Andersson, L., et al., "Provider Provisioned Virtual
Private Network (VPN) Terminology", RFC 4026, May
2005.
[802.1Qbp] IEEE Draft P802.1Qbp/D0.1 "Virtual Bridged Local
Area Networks, Amendment: Equal Cost Multiple Paths
(ECMP)", Work in Progress, October 13, 2011.
[802.1Qbg] IEEE Draft P802.1Qbg/D1.8 "Virtual Bridged Local
Area Networks, Amendment: Edge Virtual Bridging",
Work in Progress, October 17, 2011.
[E-VPN] Aggarwal, R., et al., "BGP MPLS based Ethernet VPN",
draft-ietf-l2vpn-evpn-04.txt (work in
progress), July 2013.
[PBB-EVPN] Sajassi, A. et al. "PBB-EVPN", draft-ietf-l2vpn-
pbb-evpn-05.txt (work in progress), July 2013.
[VM-Mobility] Aggarwal, R., et al., "Data Center Mobility based
on BGP/MPLS, IP Routing and NHRP", draft-raggarwa-
data-center-mobility-05.txt (work in progress), June
2013.
[RFC4719] Aggarwal, R. et al., "Transport of Ethernet over
Layer 2 Tunneling Protocol Version 3 (L2TPv3)", RFC
4719, November 2006.
[MVPN] Rosen, E. and Aggarwal, R., "Multicast in MPLS/BGP IP
VPN", RFC 6513, February 2012.
[ARPproxy] Carl-Mitchell, S. and Quarterman, S., "Using ARP to
implement transparent subnet gateways", RFC 1027,
October 1987.
[MYERS] Myers, A., Ng, E. and Zhang, H., "Rethinking the
Service Model: Scaling Ethernet to a Million Nodes",
http://www.cs.cmu.edu/~acm/papers/myers-
hotnetsIII.pdf.
[Fine-Grained] Eastlake, D., et al., "RBridges: Fine-Grained
Labeling", draft-eastlake-trill-rbridge-fine-
labeling-02.txt (work in progress), October 2011.
[NVo3-problem-statement] Narten, T., et al., "Problem
Statement: Overlays for Network Virtualization",
draft-ietf-nvo3-overlay-problem-statement-04.txt
(work in progress), July 2013.
[NVo3-fmwk] Lasserre, M., et al., "Framework for DC Network
Virtualization", draft-ietf-nvo3-framework-03.txt
(work in progress), July 2013.
[Nvo3-cp-reqts] Kreeger, L., et al., "Network Virtualization
Overlay Control Protocol Requirements", draft-
kreeger-nvo3-overlay-cp-04.txt (work in progress),
June 2013.
[Nvo3-dp-reqts] Bitar, N., Lasserre, M., et al., "NVO3 Data
Plane Requirements", draft-ietf-nvo3-dataplane-
requirements-01.txt (work in progress), July
2013.
[endsystem] Marques, P., et al., "BGP-signaled end-system
IP/VPNs", draft-ietf-l3vpn-end-system-01.txt (work in
progress), April 2013.
14. Acknowledgments
In addition to the authors the following people have
contributed to this document:
Javier Benitez, Colt
Dimitrios Stiliadis, Alcatel-Lucent
Samer Salam, Cisco
Yakov Rekhter, Juniper
Authors' Addresses
Nabil Bitar
Verizon
40 Sylvan Road
Waltham, MA 02145
Email: nabil.bitar@verizon.com
Florin Balus
Alcatel-Lucent
777 E. Middlefield Road
Mountain View, CA, USA 94043
Email: florin.balus@alcatel-lucent.com
Marc Lasserre
Alcatel-Lucent
Email: marc.lasserre@alcatel-lucent.com
Wim Henderickx
Alcatel-Lucent
Email: wim.henderickx@alcatel-lucent.com
Ali Sajassi
Cisco
170 West Tasman Drive
San Jose, CA 95134, USA
Email: sajassi@cisco.com
Luyuan Fang
Cisco
111 Wood Avenue South
Iselin, NJ 08830
Email: lufang@cisco.com
Yuichi Ikejiri
NTT Communications
1-1-6, Uchisaiwai-cho, Chiyoda-ku
Tokyo, 100-8019 Japan
Email: y.ikejiri@ntt.com
Mircea Pisica
BT
Telecomlaan 9
Brussels 1831, Belgium
Email: mircea.pisica@bt.com
John E. Drake
Juniper Networks
Email: jnadeau@juniper.net
Lucy Yong
Huawei Technologies (USA)
5340 Legacy Drive
Plano, TX75025
Email: lucy.yong@huawei.com
Susan Hares
ADARA
Email: shares@ndzh.com