Internet DRAFT - draft-dunbar-nvo3-overlay-mobility-issues
NVo3 L. Dunbar
Internet Draft Huawei
Intended status: Informational June 28, 2012
Expires: December 2012
Issues of Mobility in DC Overlay network
draft-dunbar-nvo3-overlay-mobility-issues-00.txt
Abstract
This draft describes the issues introduced by VM mobility in data
center overlay networks.
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on December 28, 2012.
Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the BSD License.
Expires December 28, 2012 [Page 1]
Internet-Draft Mobility Issues in Overlay June 28, 2012
Table of Contents
1. Introduction ................................................ 2
2. Terminology ................................................. 3
3. Issues associated with Multicast in Overlay Network........... 3
4. Issues associated with more than 4k Tenant Separation......... 4
4.1. Collision of local VLAN Identifiers when VMs Move........ 7
4.1.1. Local VIDs Managed by External Controller.......... 10
4.1.2. Local VIDs Managed by NVE ......................... 11
4.2. Tenant Virtual Network separation at the physical gateway
routers .................................................... 11
5. Summary and Recommendations................................. 12
6. Manageability Considerations................................ 13
7. Security Considerations..................................... 13
8. IANA Considerations ........................................ 13
9. Acknowledgments ............................................ 13
10. References ................................................ 13
Authors' Addresses ............................................ 14
Intellectual Property Statement................................ 14
Disclaimer of Validity ........................................ 14
1. Introduction
Overlay networks, such as VXLAN and NVGRE, have been proposed to
scale data center networks hosting a massive number of hosts as a
result of server virtualization and business demand.
An overlay network can hide the massive number of VM addresses from
the switches/routers in the core (i.e., the underlay network).
One of the key requirements stated in [NVo3-problem] is the ability
to move VMs across a wide range of locations, which could be
multiple server racks, PODs, or sites, without changing the VMs'
IP/MAC addresses. That means the association of VMs with their
corresponding NVEs changes as VMs migrate. This dynamic nature of
VM mobility in data centers introduces new challenges and
complications to overlay networks.
This draft describes some of the issues introduced by VM migration
in an overlay environment. The purpose of the draft is to ensure
that those issues will be addressed by future solutions.
2. Terminology
CE: VPN Customer Edge Device
DC: Data Center
DA: Destination Address
EOR: End of Row switches in data center.
VNID: Virtual Network Identifier
NVE: Network Virtualization Edge
PE: VPN Provider Edge Device
SA: Source Address
ToR: Top of Rack Switch, also known as an access switch
VM: Virtual Machine
VPLS: Virtual Private LAN Service
3. Issues associated with Multicast in Overlay Network
Some data centers avoid the use of IP multicast, primarily due to
perceptions of configuration/protocol complexity and multicast
scaling limits. There are also many data center operators for whom
multicast is critical. Among the latter group, multicast is used for
Internet Television (IPTV), market data, cluster load balancing, and
gaming, just to name a few.
The use of multicast in an overlay environment can introduce issues
to the network when VMs move. In particular:
The association between multicast group members and NVEs becomes
dynamic as VMs move. At one moment, all members of a multicast
group could be attached to one NVE. At another moment, some
members of the multicast group could be attached to different
NVEs. Among the VMs attached to one NVE, some can send, while
others can only receive.
In addition, the overlay, which hides VM addresses, breaks IGMP
snooping in the core. With NVEs adding an outer header to data
frames from VMs (i.e., applications), multicast addresses are hidden
from the underlay network, so switches in the underlay network
cannot snoop on the IGMP reports from multicast members.
For unicast data frames, an overlay network edge (e.g., a TRILL
edge) can learn the inner-outer address mapping by observing data
frames passing by. Since a multicast address is never placed in the
inner header's SA field of a data frame, the learning approach used
for unicast won't work for multicast in an overlay.
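The unicast learning behavior described above, and why it never populates an entry for a multicast destination, can be sketched as follows. This is an illustrative sketch only (the table shape and names are assumptions, not part of any specification):

```python
# Hypothetical sketch: how an overlay edge could learn the
# inner-address -> outer-address (NVE) mapping from unicast traffic,
# and why the same approach fails for multicast destinations.

def is_multicast(mac: str) -> bool:
    # A MAC address is multicast when the least significant bit of the
    # first octet is set (e.g. 01:00:5e:... for IP multicast).
    return int(mac.split(":")[0], 16) & 1 == 1

class EdgeLearningTable:
    def __init__(self):
        self.inner_to_outer = {}  # inner MAC -> egress NVE address

    def learn(self, inner_sa: str, outer_sa: str):
        # Unicast learning: the inner SA of a received frame is reachable
        # via the outer SA (the ingress NVE that encapsulated it).
        self.inner_to_outer[inner_sa] = outer_sa

    def lookup(self, inner_da: str):
        # Multicast addresses never appear as an inner SA, so they are
        # never learned; lookup always misses for them.
        return self.inner_to_outer.get(inner_da)

table = EdgeLearningTable()
table.learn("00:aa:bb:cc:dd:01", "nve-7")   # learned from a data frame
print(table.lookup("00:aa:bb:cc:dd:01"))    # -> nve-7
print(table.lookup("01:00:5e:00:00:fb"))    # multicast, never learned -> None
```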
TRILL solves the multicast inner-outer address learning issue by
creating common multicast trees in the TRILL domain. If TRILL's
multicast approach is used for a DC with VM mobility, the multicast
states maintained by switches/routers in the underlay network have
to change as VMs move, which means switches in the underlay network
have to be aware of VM mobility and change their multicast states
accordingly.
Overall, VM mobility in an overlay environment makes multicast more
complicated for switches/routers in the underlay network and for
NVEs.
4. Issues associated with more than 4k Tenant Separation
The [NVo3-framework] has a good figure showing the logical network
seen by each tenant: L2 domains connected by an L3 infrastructure.
Each tenant can have multiple virtual networks, identified by IEEE
802.1Q-compliant 12-bit VLAN IDs, under its logical routers (Rtr).
Any VMs communicating with peers in different subnets, either within
or outside the DC, will address their L2 frames to their local
router (Rtr in the figure below).
+----- L3 Infrastructure ----+
| |
,--+-'. ;--+--.
..... Rtr1 )...... . Rtr2 )
| '-----' | '-----'
| Tenant1 |LAN12 Tenant1|
|LAN11 ....|........ |LAN13
'':'''''''':' | | '':'''''''':'
,'. ,'. ,+. ,+. ,'. ,'.
(VM ) .. (VM ) (VM ) .. (VM ) (VM ) .. (VM )
`-' `-' `-' `-' `-' `-'
Figure 1: Logical Service Connectivity for a single tenant
The overlay introduced by [NVo3-problem] ensures that the forwarding
tables of the core (i.e., underlay network) switches/routers are not
impacted when VMs belonging to different tenants are placed or moved
anywhere, as shown in the figure below (copied from [NVo3-
framework]).
+--------+ +--------+----+
| Tenant | | TES: |VM1 |
| End +--+ +---| Blade |VM2 |
| System | | | | server |.. |
+--------+ | ................... | +--------+----+
| +-+--+ +--+-+ |----VM-a
| | NVE| |NVE | |----VM
+--| #1 | |#2 |--+----VM
+-+--+ +--+-+
/ . L3 Overlay . \
+--------+ / . Network . \ +--------+
| Tenant +--+ . . +----| Tenant |
| End | . . | End |
| System | . +----+ . | System |
+--------+ .....|NVE |........ +--------+
|#3 |
+----+
|
|
+--------+
| Tenant |--VM-b
| End |--VM
| System |
+--------+
Figure 2: Overlay example
For client traffic from "VM-a" to "VM-b", the ingress NVE
encapsulates the client payload with an outer header that includes
at least the egress NVE as DA, the ingress NVE as SA, and a VNID.
The VNID is a 24-bit identifier proposed by [NVo3-problem] to
separate tens of thousands of tenant virtual networks. When the
egress NVE receives the data frame on its ports facing the underlay
network, it decapsulates the outer header and then forwards the
decapsulated data frame to the attached VMs.
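The encapsulation/decapsulation steps above can be sketched as follows. The field names are taken from the draft's description (egress NVE as DA, ingress NVE as SA, 24-bit VNID); the actual wire format is defined by the chosen overlay encapsulation, not by this sketch:

```python
# Illustrative sketch only, not a definitive implementation of any
# particular overlay encapsulation.

def encapsulate(frame: bytes, ingress_nve: str, egress_nve: str, vnid: int) -> dict:
    assert 0 <= vnid < 2**24, "VNID is a 24-bit identifier"
    return {
        "outer_da": egress_nve,   # egress NVE address
        "outer_sa": ingress_nve,  # ingress NVE address
        "vnid": vnid,             # selects the tenant virtual network
        "payload": frame,         # original client frame, untouched
    }

def decapsulate(packet: dict) -> tuple:
    # The egress NVE strips the outer header and uses the VNID to pick
    # the tenant virtual network before delivering the inner frame.
    return packet["vnid"], packet["payload"]

pkt = encapsulate(b"client-frame", "nve-1", "nve-3", vnid=0x1234AB)
vnid, inner = decapsulate(pkt)
```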
When "VM-b" is on the same subnet (or VLAN) as "VM-a" and located
within the same data center, the corresponding egress NVE is usually
on a virtual switch in a server, on a ToR switch, or on a blade
switch.
When "VM-b" is on a different subnet (or VLAN), the corresponding
egress NVE should be next to (or located on) the logical Rtr (Figure
1), which is most likely located on the data center gateway
router(s).
4.1. Collision of local VLAN Identifiers when VMs Move
Since the VMs attached to one NVE could belong to different virtual
networks, the traffic under each NVE has to be identified by local
network identifiers, usually VLAN IDs if the VMs are attached to the
NVE's access ports via L2.
To support tens of thousands of virtual networks, the local VID
associated with the client payload under each NVE has to be locally
significant. If the ingress NVE simply adds an outer header to data
frames received from VMs and forwards the encapsulated data frames
to the egress NVE via the underlay network, the egress NVE can't
simply decapsulate the outer header and send the decapsulated data
frames to the attached VMs, as is done in TRILL. The egress NVE
needs to convert the VID carried in the data frame to the local VID
for the virtual network before forwarding the frame to the attached
VMs.
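The egress-side VID conversion described above amounts to a per-NVE lookup keyed by the virtual network. A minimal sketch, assuming a VNID-keyed table (the table shape and names are illustrative assumptions, not from any specification):

```python
# Sketch: the VID carried in the frame was only significant at the
# ingress NVE; the egress NVE replaces it with its own local VID for
# the same virtual network before forwarding to the attached VMs.

class EgressVidMapper:
    def __init__(self):
        self.vnid_to_local_vid = {}  # VNID -> local VID at this NVE

    def assign(self, vnid: int, local_vid: int):
        self.vnid_to_local_vid[vnid] = local_vid

    def translate(self, vnid: int, carried_vid: int) -> int:
        # The carried VID is ignored for forwarding; only the VNID
        # identifies the virtual network across NVEs.
        return self.vnid_to_local_vid[vnid]

mapper = EgressVidMapper()
mapper.assign(0x00AB12, 150)   # this NVE uses VID 150 for VNID 0x00AB12
local = mapper.translate(0x00AB12, carried_vid=120)  # ingress used 120
```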
In VPLS, the operator has to configure the local VIDs under each PE
for specific VPN instances. In VPLS, the mapping of local VIDs to
VPN instance IDs doesn't change very much. In addition, a CE is most
likely not shared by multiple tenants, so the VIDs on one physical
PE-to-CE port belong to only one tenant. For the rare occasion of
multiple tenants sharing one CE, the CE can convert the tuple [local
customer VID, tenant access port] to the VID designated by the VPN
operator for each VPN instance on the shared link between the CE
port and the PE port. For example, in the figure below, the VIDs
under CE#21 and the VIDs under CE#22 can be duplicated as long as
the CEs can convert the local VIDs from their downstream links to
the VIDs given by the VPN operators for the links between the PEs
and CEs.
+--------+ +--------+
| CE | | CE |-> local VIDs
| #11 +--+ +---| |
| | | | | #21 |
+--------+ | ................... | +--------+
| +-+--+ +--+-+ |
| | PE | | PE | |<-VIDs configured by VPN operator
+--| 1 | | 2 |--+
+-+--+ +--+-+
/ . VPLS . \
+--------+ / . Network . \ +--------+
| CE +--+ . . +----| CE |-> Local VIDs
| #12 | . . | #22 |
| | . +----+ . | |
+--------+ .....| PE |........ +--------+
| 3 |
+----+
|
|
+--------+
| CE |
| #31 |
| |
+--------+
Figure 3: VPLS example
When all VMs of one virtual network are moved away from an NVE, the
local VID that was designated for this virtual network might need to
be reused for a different virtual network whose VMs move in later.
In the figure below, NVE#1 may have local VIDs #100-#200 assigned to
some attached virtual networks. NVE#2 may have local VIDs #100-#150
assigned to different virtual networks. With the VNID encoded in the
outer header of data frames, the traffic in the L3 overlay network
is strictly separated.
+--------+ +--------+
| Tenant | | TES: |
| End +--+ +---| Blade |
| System | | | | server |
+--------+ | ................... | +--------+
| +-+--+ +--+-+ |
| | NVE| |NVE | |
+--| #1 | |#2 |--+
+-+--+ +--+-+ <-local VID to global VNID mapping
/ . L3 Overlay . \ becomes dynamic
+--------+ / . Network . \ +--------+
| Tenant +--+ . . +----| Tenant |
| End | . . | End |
| System | . +----+ . | System |
+--------+ .....|NVE |........ +--------+
|#3 | <-May not aware of VMs added/removed
+----+
|
|
+--------+
| Tenant |
| End |
| System |
+--------+
Figure 4: Overlay example
When some VMs associated with Virtual Network X, using VID 120 under
NVE#1, are moved to NVE#2, a new VID must be assigned for Virtual
Network X under NVE#2.
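The collision-avoiding assignment above can be sketched with a simple per-NVE allocator. This is an assumed mechanism for illustration; the draft only requires that the new local VID not collide with VIDs already in use under NVE#2:

```python
# Sketch: when a virtual network gains its first VM under an NVE, the
# NVE picks a free local VID; when its last VM leaves, the VID returns
# to the pool and may later be reused by a different virtual network.

class LocalVidAllocator:
    def __init__(self, low=2, high=4094):
        self.free = set(range(low, high + 1))   # usable 802.1Q VID range
        self.by_vn = {}                         # virtual network -> local VID

    def vid_for(self, vn_id: str) -> int:
        if vn_id not in self.by_vn:
            vid = min(self.free)                # any free VID will do
            self.free.discard(vid)
            self.by_vn[vn_id] = vid
        return self.by_vn[vn_id]

    def release(self, vn_id: str):
        vid = self.by_vn.pop(vn_id)
        self.free.add(vid)

nve2 = LocalVidAllocator()
vid_other = nve2.vid_for("vn-other")  # a VN already present under NVE#2
vid_x = nve2.vid_for("vn-X")          # moved-in VN X gets a non-colliding VID
```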
It gets complicated when the local VIDs are tagged by non-NVE
devices, e.g., the VMs themselves, blade server switches, or virtual
switches within servers.
The devices that add VIDs to untagged frames need to be informed of
the local VID. If data frames from VMs already have a VID encoded,
then there has to be a mechanism to notify the first switch port
facing the VMs to convert the VID encoded by the VMs to the local
VID assigned for the virtual network under the new NVE. That means
when a VM is moved to a new location, its immediately adjacent
switch port has to be informed of the local VID so it can convert
the VID encoded in the data frames from the VM.
The NVE will need the mapping between the local VID and the VNID
used towards the L3 underlay network.
4.1.1. Local VIDs Managed by External Controller
Most likely the assignment of a VM to a physical location is managed
by a non-networking entity, e.g., a VM manager or a server manager.
NVEs may not be aware of VMs being added or deleted unless they have
a northbound interface to a controller that can communicate with the
VM/server manager(s).
When an NVE can be informed of VMs being added/deleted and their
associated tenant virtual networks via its controller, the NVE
should be able to get the specific VNID from its controller for
untagged data frames arriving at its Virtual Access Points
[NVo3-framework 3.1.1].
Since local VIDs under each NVE are only locally significant, it
might be less confusing to the egress NVE if the ingress NVE removes
the local VID attached to the data frame, so that the egress NVE
always assigns its own local VID to the data frame before sending
the decapsulated frame to the attached VMs.
If, for whatever reason, it is necessary to have a local VID in the
data frames before encapsulating the outer header (egress-NVE DA /
ingress-NVE SA / VNID), the NVE should get the specific local VID
from the external controller for untagged data frames coming to
each Virtual Access Point.
If a data frame is tagged before reaching the NVE's Virtual Access
Point (e.g., tagged data frames from VMs) and the NVE is more than
one hop away from the VMs, the first (virtual) port facing the VMs
has to be informed by the external controller of the new local VID
to replace the VID encoded in the data frames. For the reverse
direction, i.e., data frames coming from the core towards the VMs,
the first switching port facing the VMs has to convert the VIDs
encoded in the data frames to the VIDs used by the VMs.
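The bidirectional conversion at the first port facing the VMs can be sketched as a symmetric VID rewrite (an illustrative sketch under assumed names; the VM keeps tagging with its original VID while the network side uses the new NVE's local VID):

```python
# Sketch: per-port VID rewriting in both directions after a VM move.

class PortVidRewriter:
    def __init__(self, vm_vid: int, local_vid: int):
        self.vm_vid = vm_vid        # VID the VM encodes in its frames
        self.local_vid = local_vid  # VID assigned under the new NVE

    def to_network(self, vid: int) -> int:
        # VM -> core: replace the VM's VID with the local VID.
        return self.local_vid if vid == self.vm_vid else vid

    def to_vm(self, vid: int) -> int:
        # Core -> VM: restore the VID the VM expects.
        return self.vm_vid if vid == self.local_vid else vid

port = PortVidRewriter(vm_vid=120, local_vid=300)
```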
The IEEE 802.1Qbg VDP protocol (Virtual Station Interface (VSI)
Discovery and Configuration Protocol) requires the hypervisor to
send the VM profile when a new VM is instantiated. However, not all
hypervisors support this function.
4.1.2. Local VIDs Managed by NVE
If NVEs don't have an interface to any controller that can inform
them of VMs being added to or deleted from NVEs, then NVEs have to
learn of new VMs/VLANs being attached, figure out to which tenant
virtual network those VMs/VLANs belong, and/or age out VMs/VLANs
after a specified timer expires. The network management system has
to assist NVEs in making these decisions, even if the network
management system doesn't have an interface to the VM/server
managers.
When an NVE receives a tagged data frame with a new VM address
(e.g., MAC) from its Virtual Access Point, the new VM could be from
an existing local virtual network, from a different virtual network
(brought in as the VM is added), or from an illegal VM.
When an NVE learns that a new VM has been added, either by learning
a new MAC address or a new VID, it needs its management system to
confirm the validity of the new VID and/or address. If the new
address or VID is from an invalid or illegal source, the data frame
has to be dropped.
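The validation step above can be sketched as follows. The callback interface to the management system is a hypothetical assumption for illustration; the draft does not specify how the check is performed:

```python
# Sketch: an NVE that sees a new (MAC, VID) pair asks its management
# system whether the pair is legitimate before forwarding.

def handle_new_source(mac, vid, known, is_valid):
    """Decide what to do with a frame whose (MAC, VID) is unknown.

    known    -- set of (mac, vid) pairs already accepted at this NVE
    is_valid -- assumed management-system callback: (mac, vid) -> bool
    """
    if (mac, vid) in known:
        return "forward"
    if is_valid(mac, vid):
        known.add((mac, vid))        # accept the newly moved-in VM
        return "forward"
    return "drop"                    # invalid/illegal source

known = {("00:aa:bb:cc:dd:01", 100)}
valid = lambda mac, vid: vid in (100, 200)   # assumed policy
print(handle_new_source("00:aa:bb:cc:dd:02", 200, known, valid))  # forward
print(handle_new_source("00:de:ad:be:ef:00", 999, known, valid))  # drop
```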
4.2. Tenant Virtual Network separation at the physical gateway routers
When a VM communicates with peers in different subnets, data frames
will be sent to the tenant's logical router (Rtr1 or Rtr2 in Figure
1). Very often, the logical routers of all tenants in a data center
are just logical entities (e.g., VRFs) on the gateway router(s).
That means all the VLANs for all tenants will be terminated at the
data center gateway router(s), as shown in the figure below.
,---------.
,' `.
( IP/MPLS WAN )
`. ,'
`-+------+'
+--+--+ +-+---+
|DC GW|+-+|DC GW|
+-+---+ +-----+
/ \ <---- All VLANs of all tenants
/ \
+-------+ +------+
+/------+ | +/-----+ |
| Aggr11| + ----- |AggrN1| + Aggregation
+---+---+/ +------+/
/ \ / \
/ \ / \
+---+ +---+ +---+ +---+
|T11|... |T1x| |T21| .. |T2y| Access Layer
+---+ +---+ +---+ +---+
| | | |
+-|-+ +-|-+ +-|-+ +-|-+
| |... | | | | .. | |
+---+ +---+ +---+ +---+ Server racks
| |... | | | | .. | |
+---+ +---+ +---+ +---+
| |... | | | | .. | |
+---+ +---+ +---+ +---+
Figure 5: Data Center Physical topology
Gateway routers can mitigate the overwhelming number of virtual
network instances by integrating the NVE function within the
router(s). That requires routers to map VNIDs directly to VRFs if
the routers' outbound connectivity to external networks is VPN
based. That in turn requires routers to support tens of thousands of
VRF instances, which can be challenging for routers.
A data center can also use multiple gateway routers, with each
handling a subset of the tenants in the data center. That means each
tenant's VMs are only reachable via their designated routers or
router ports. With the typical DC design shown in Figure 5, the
number of server racks reachable by each gateway router is limited
by the number of router ports enabled for the tenant virtual
networks. That means the range of locations across which each
tenant's VMs can be moved is limited.
When VMs in a data center communicate with external peers, data
frames have to go through the gateway. Even though the majority of
data centers have much more east-west than north-south traffic
volume, the majority (as high as 90%) of the applications (hosted on
servers or VMs) in a data center still communicate with external
peers; the volume of north-south traffic is simply much lower in
many data centers.
5. Summary and Recommendations
Overlay network can hide individual VMs addresses, making
switches/routers in the core scalable. However overlay introduces
other challenges, especially when VMs move across wide range of NVEs.
This draft is to identify those issues introduced by mobility in
an overlay environment, to ensure that they will be addressed by
future solutions.
6. Manageability Considerations
7. Security Considerations
Security will be addressed in a separate document.
8. IANA Considerations
None.
9. Acknowledgments
We want to acknowledge the following people for their valuable
comments on this draft: David Black, Ben Mack-Crane, Peter
Ashwood-Smith, Lucy Yong, and Young Lee.
This document was prepared using 2-Word-v2.0.template.dot.
10. References
[NVo3-problem] Narten, T., et al., "Problem Statement: Overlays for
Network Virtualization", draft-narten-nvo3-overlay-problem-
statement-02, work in progress, June 2012.
[NVo3-framework] Lasserre, M., et al., "Framework for DC Network
Virtualization", draft-lasserre-nvo3-framework-02, work in
progress, June 2012.
[IEEE802.1Qbg] "MAC Bridges and Virtual Bridged Local Area Networks -
Edge Virtual Bridging", IEEE P802.1Qbg/D2.2, work in
progress, February 2012.
[ARMD-Problem] Narten, T., et al., draft-ietf-armd-problem-statement,
work in progress, October 2011.
[ARMD-Multicast] McBride, M., et al., draft-mcbride-armd-mcast-
overview-01, work in progress, March 2012.
[Gratuitous ARP] Cheshire, S., "IPv4 Address Conflict Detection",
RFC 5227, July 2008.
Authors' Addresses
Linda Dunbar
Huawei Technologies
5340 Legacy Drive, Suite 175
Plano, TX 75024, USA
Phone: (469) 277 5840
Email: ldunbar@huawei.com
Intellectual Property Statement
The IETF Trust takes no position regarding the validity or scope of
any Intellectual Property Rights or other rights that might be
claimed to pertain to the implementation or use of the technology
described in any IETF Document or the extent to which any license
under such rights might or might not be available; nor does it
represent that it has made any independent effort to identify any
such rights.
Copies of Intellectual Property disclosures made to the IETF
Secretariat and any assurances of licenses to be made available, or
the result of an attempt made to obtain a general license or
permission for the use of such proprietary rights by implementers or
users of this specification can be obtained from the IETF on-line IPR
repository at http://www.ietf.org/ipr
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
any standard or specification contained in an IETF Document. Please
address the information to the IETF at ietf-ipr@ietf.org.
Disclaimer of Validity
All IETF Documents and the information contained therein are provided
on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
WARRANTY THAT THE USE OF THE INFORMATION THEREIN WILL NOT INFRINGE
ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE.
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.