INTERNET-DRAFT Congjie Chen
Intended Status: Standards Track Dan Li
Expires: Feb 2016 Tsinghua University
Jun Li
University of Oregon
August 2015
SVDC: Software Defined Data Center Network Virtualization Architecture
draft-chen-svdc-00
Abstract
This document describes SVDC, a highly scalable, low-overhead
virtualization architecture designed for large layer-2 data center
networks. By leveraging the emerging software-defined networking
framework, SVDC decouples the global identifier of a virtual network
from the identifier carried in the packet header. Hence, SVDC can
scale to a very large number of virtual networks while carrying only
a very short tag in the packet header, which no previous network
virtualization solution achieves. SVDC enhances MAC-in-MAC
encapsulation so that packets with overlapping MAC addresses are
correctly forwarded even without an in-packet global identifier to
differentiate the virtual networks they belong to. In addition, SVDC
supports scalable and efficient layer-2 multicast and broadcast
within virtual networks. This document also introduces a basic
framework to illustrate SVDC deployment.
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
Copyright and License Notice
Copyright (c) 2015 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
This Internet-Draft will expire in February 2016.
Table of Contents

   1. Introduction
      1.1 Terminology
   2. SVDC Architecture
      2.1 Virtual Switch
      2.2 Edge Switches
      2.3 SVDC Controller
   3. Packet Forwarding
      3.1 Unicast Traffic
      3.2 Multicast/Broadcast Traffic
      3.3 SVDC Frame Format
   4. SVDC Deployment Considerations
      4.1 VM Migration
      4.2 Fault Tolerance
   5. Security Considerations
   6. IANA Considerations
   7. References
      7.1 Normative References
      7.2 Informative References
   Authors' Addresses
1. Introduction

Due to its simplicity and ease of management, a large layer-2 network
is widely accepted as the fabric for building a data center network.
Scalable layer-2 architectures, for example, TRILL [RFC6325] and SPB
[802.1aq], have been proposed as industry standards. A large layer-2
network segment can even cross the Internet via virtualization
services such as VPLS [RFC4762]. However, these layer-2 network
fabric designs mainly focus on routing/forwarding rules in the
network, and how to run a multi-tenant network virtualization scheme
on top of a large layer-2 network fabric remains an open issue.
Existing network virtualization solutions, including VLAN [802.1q],
VXLAN [RFC7348], and [NVGRE], either face severe scalability problems
or are not specifically designed for layer-2 networks. In particular,
designing a virtualization solution for a large layer-2 network needs
to address the following challenges.
For a large-scale, geographically distributed layer-2 network
operated by a cloud provider, the potential number of tenants and
virtual networks can be huge. Network virtualization based on VLAN
can support at most 4094 virtual networks, which is clearly not
enough. Although VXLAN [RFC7348] and [NVGRE] can support 16,777,216
virtual networks, they do so at the cost of using many more bits in
the packet header. The fundamental issue is that, in existing network
virtualization proposals, the number of virtual networks that can be
differentiated depends on the number of bits used in the packet
header.
Given the possibly overlapping MAC addresses of VMs in different
virtual networks and the limited forwarding table size in data center
switches, it is inevitable to encapsulate the original MAC address of
a packet when transmitting it in the core network. The MAC-in-UDP
encapsulation used in VXLAN [RFC7348] incurs unnecessary packet
header overhead for a layer-2 network. A MAC-in-MAC encapsulation
framework is more applicable in a multi-tenant large layer-2 network
where the MAC addresses of VMs largely overlap.
Multicast service is common in data center networks, but how to
support scalable multicast service in a multi-tenant virtualized
large layer-2 network is still an open issue. A desired capability of
a layer-2 network virtualization framework is to support efficient
and scalable layer-2 multicast as well as broadcast.
This document describes SVDC, which leverages the framework of [SDN]
to address the challenges above and achieves a highly scalable,
low-overhead large layer-2 network virtualization architecture. It
decouples the global identifier of a virtual network from the
in-packet tag so as to support a very large number of virtual
networks with a minimal tag length in the packet header. The global
identifier is maintained in the SVDC controller, while the in-packet
identifier is only used to differentiate virtual networks residing on
the same server. To mask VM MAC address overlap in the core network,
SVDC uses MAC-in-MAC encapsulation at ingress edge switches and
employs two techniques to guarantee correct packet forwarding on the
first hop and the last hop without an in-packet global virtual
network identifier. Moreover, within the same framework as unicast,
SVDC can efficiently support up to tens of billions of multicast and
broadcast groups in a layer-2 network, with possibly overlapping
multicast or broadcast addresses in different virtual networks.
1.1 Terminology
This document uses the following terminology.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
Virtual Network (VN): A VN is a logical abstraction of a physical
network that provides L2 network services to a set of Tenant Systems.
Virtual Machine (VM): An instance of an operating system running on
top of a hypervisor on a physical machine or server. Multiple VMs can
share the same physical server via the hypervisor, yet they are
completely isolated from each other in terms of compute, storage, and
other OS resources.
Virtual Switch (vSwitch): A function within a hypervisor (typically
implemented in software) that provides similar forwarding services to
a physical Ethernet switch. A vSwitch forwards Ethernet frames
between VMs running on the same server or between a VM and a physical
Network Interface Card (NIC) connecting the server to a physical
Ethernet switch or router. A vSwitch also enforces network isolation
between VMs that by policy are not permitted to communicate with each
other (e.g., by honoring VLANs).
Global Tenant Network Identifier (GTID): A GTID is a global
identifier of a virtual network. It is never carried in the packets
that VMs send out; instead, it is maintained in the SVDC controller.
Local Tenant Network Identifier (LTID): An LTID is a local identifier
that is used to differentiate virtual networks on the same server.
For the same virtual network, the LTIDs on different servers can be
either different or the same. When a new virtual network is created,
it is assigned an LTID on each server that hosts its VMs.
Global Identifier of a Multicast/Broadcast Group (Group-G): It
denotes the address of a multicast/broadcast group as used in the
physical network in the SVDC architecture. When a new
multicast/broadcast group needs to send traffic across the core
network, an available Group-G is assigned to it. When all the
receivers of a multicast group leave, or a broadcast group lacks
activity for a long duration, the corresponding Group-G is removed.
Local Identifier of a Multicast/Broadcast Group (Group-L): It denotes
the address of a multicast/broadcast group within a virtual network.
Group-L values in different virtual networks can overlap.
Edge Switch Identifier (EID): It denotes the identifier of an edge
switch. Any unique identifier of a switch, such as its MAC address,
can serve as the EID.
Server Identifier (SID): It denotes the identifier of a physical
server, analogous to the EID.
Virtual Machine MAC Address (VMAC): This is the MAC address assigned
to the virtual NIC of each VM. It is visible to VMs and applications
running within VMs.
Egress Port Identifier (p-ID): It denotes the outgoing port to which
the egress edge switch should forward the packet.
2. SVDC Architecture
The basic architecture of SVDC is depicted in Figure 1.
+--------------------+ +--------------------+
| Server 1 | | Server 2 |
| +----------+ | | +-----------+ |
| | VN 1 | | | | VN 2 | |
| | +-------+| | | | +-------+ | |
| | | VM 1 || | | | | VM 2 | | |
| | | VMAC 1|| | | | | VMAC 2| | |
| | +-------+| | | | +-------+ | |
| | | | | | | | | |
| +----------+ | | +-----------+ |
| | | | | |
| +----------------+ | | +----------------+ |
| |Virtual Switch 1| | | |Virtual Switch 2| |
| +----------------+ | | +----------------+ |
| | | | | |
+--------------------+ +--------------------+
| |
+-------------+ +----------------+
|Ingress Edge | | Ingress Edge |
| Switch 1 | | Switch 2 |
+-------------+ +----------------+
| | | |
| | ,----------. | |
| | ,' `. | |
| +-----( Core Network )--+ |
| `. ,' |
| `-+-------+' |
| |
| +-------------------------+ |
+-----| SVDC Controller |-----+
+-------------------------+
Figure 1 SVDC Architecture
In its minimum configuration, the SVDC architecture contains only an
SVDC controller and the updated edge switches. The controller
interacts with the edge switches using an SDN protocol such as
[OPENFLOW]. A very lightweight modification to the virtual switch is
required to fill the server-local identifier of a virtual network
into the packet. Core switches and VMs simply run legacy protocols
and can be unaware of SVDC.
In the core network, any layer-2 forwarding scheme can be used, for
example, the Spanning Tree Protocol (STP) [802.1D], the TRILL
protocol [RFC6325], or Shortest Path Bridging [802.1aq] for unicast,
and a global multicast tree formation protocol for multicast.
Alternatively, depending on the operator's configuration, the SVDC
controller can also use [OPENFLOW] to configure the
unicast/multicast forwarding entries in the core network. SVDC can
seamlessly coexist with any forwarding fabric in the core network,
either SDN or non-SDN.
Every virtual switch maintains a local FIB table with entries
destined to VMs on the local server, while packets sent to all other
VMs are simply forwarded to the edge switch it connects to. An edge
switch maintains both a unicast encapsulation table and a multicast
encapsulation table, used in the MAC-in-MAC encapsulation of every
packet. When the first packet of a flow arrives at an ingress edge
switch, the encapsulation table lookup fails and the packet is
directed to the SVDC controller. The SVDC controller then looks up
its mapping tables, which maintain the global information of the
network, and responds to the ingress switch with the information
needed to update its encapsulation table. Subsequent packets of the
flow are directly encapsulated by looking up the encapsulation table,
without involving the SVDC controller again. Multicast group join
requests are also directed to the SVDC controller, and the controller
then updates the multicast decapsulation table in the corresponding
egress switches with the group membership.
SVDC supports a very large number of virtual networks by maintaining
a global identifier for every virtual network in the SVDC controller,
but never carrying that identifier in the packet. Instead, a
server-local identifier is carried in the packet header to identify a
virtual network on a certain physical server. The SVDC controller
maintains the mapping relationship between the global and local
identifiers and is responsible for the translation when the first
packet of a flow is directed to it. The translation includes both
mapping a server-local virtual network identifier to the global
identifier and vice versa. SVDC reuses the 12-bit VLAN [802.1q] field
as the in-packet server-local virtual network identifier, which is
adequate since the number of virtual networks on a physical server
cannot exceed 4096.
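
The decoupling can be made concrete with the following non-normative
Python sketch of per-server LTID allocation in the controller. The
class and function names (LtidAllocator, ltid_for) are invented for
this example and are not part of any SVDC interface.

   # Non-normative sketch: per-server LTIDs decoupled from GTIDs.
   # GTIDs are unbounded integers held only in the controller; LTIDs
   # must fit the 12-bit VLAN field (usable 802.1Q values 1..4094).
   class LtidAllocator:
       """Tracks which LTID each virtual network uses on one server."""
       def __init__(self):
           self.gtid_to_ltid = {}
           self.free = set(range(1, 4095))   # VIDs 0 and 4095 reserved

       def assign(self, gtid):
           # Reuse the existing LTID if this virtual network already
           # has a VM on the server; otherwise take any free local tag.
           if gtid not in self.gtid_to_ltid:
               self.gtid_to_ltid[gtid] = self.free.pop()
           return self.gtid_to_ltid[gtid]

   servers = {}   # SID -> LtidAllocator

   def ltid_for(sid, gtid):
       return servers.setdefault(sid, LtidAllocator()).assign(gtid)

   # The same GTID may receive different LTIDs on different servers,
   # and the global number of virtual networks is not limited by the
   # 12-bit in-packet tag.
   print(ltid_for("server-1", 7000000))
   print(ltid_for("server-2", 7000000))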
To minimize the packet header overhead introduced by encapsulating
the original Ethernet packets from VMs in a layer-2 network, SVDC
uses MAC-in-MAC encapsulation at ingress switches. This not only
masks the MAC address overlap of VMs in different virtual networks,
but also minimizes the number of forwarding entries in core switches.
The key point is how to guarantee correct packet forwarding on the
first hop and the last hop, since no information is carried in the
packet to globally differentiate the virtual networks in a direct
way. SVDC employs two approaches to deal with these problems.
First, for the ingress switch to identify the virtual network an
incoming packet belongs to, the server-local identifier carried in
the VLAN field alone is not enough. However, the VLAN field together
with the incoming port of the switch suffices, since the incoming
port uniquely identifies the physical server from which the packet
was sent.
Second, when the egress switch decapsulates the outer MAC header, it
needs a way to correctly forward the packet to an outgoing port. A
local table lookup cannot help, because the in-packet virtual network
identifier is not the global one and can thus overlap. SVDC's
approach is to reuse the VLAN field of the outer MAC header to
indicate the forwarding port at the egress switch. The field is
filled by the ingress switch for a unicast packet by looking up the
unicast encapsulation table, and by the egress switch for a multicast
packet by looking up the multicast decapsulation table. The 12-bit
VLAN tag is also more than enough to identify the different servers
connected to the egress switch, unless the egress switch has more
than 4096 ports, which does not happen in practice.
SVDC supports multicast and broadcast within each virtual network
with possibly overlapping group addresses. To avoid traffic leakage
among virtual networks, the SVDC controller maps each multicast group
or broadcast address in a virtual network to a global multicast
group, which is identified by the global multicast group address,
composed of a 23-bit multicast MAC address and the 12-bit VLAN field.
This 35-bit global multicast group address is enough to support a
potentially huge number of multicast/broadcast groups within virtual
networks and can be carried in the outer Ethernet header.
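
As a non-normative sanity check of the address space, and to
illustrate how a 35-bit Group-G can be split across the two outer
header fields, consider the following Python sketch. The exact bit
layout in pack_group_g is an assumption made for illustration; the
document does not specify one.

   # Non-normative: size of the global multicast group address space.
   # 23 bits in the multicast MAC address + 12 bits in the VLAN field.
   GROUPS = 2**23 * 2**12      # = 2**35 = 34,359,738,368 (~34 billion)
   assert GROUPS == 2**35

   def pack_group_g(group_g):
       """Split a 35-bit Group-G into its 23-bit MAC part and 12-bit
       VLAN part (assumed layout: high bits to the MAC address)."""
       assert 0 <= group_g < 2**35
       return group_g >> 12, group_g & 0xFFF

   def unpack_group_g(mac23, vlan12):
       return (mac23 << 12) | vlan12

   mac23, vlan12 = pack_group_g(123456789)
   assert unpack_group_g(mac23, vlan12) == 123456789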
The following sections describe the design details of each component
of the SVDC architecture.
2.1 Virtual Switch
Every virtual switch configures its FIB table entries towards VMs on
the local server, and sets the forwarding port of the default entry
towards the edge switch connected to the server it resides on. The
key of a FIB table entry in the virtual switch is the tuple
(LTID, VMAC), which uniquely identifies a VM on a physical server.
Note that in SVDC, VMs are not aware of the virtualized network
infrastructure, and thus the Ethernet frames sent by a VM do not
contain any LTID.
When a virtual switch receives an Ethernet packet, it first
determines whether the packet is from a local VM or from the outbound
port. If it is from a local VM, the virtual switch adds the LTID to
the VLAN field of the Ethernet header based on the incoming port and
then forwards the packet out. If it is from the outbound port, the
operations depend on whether it is a unicast packet or a
multicast/broadcast packet. For a unicast packet, the virtual switch
directly looks up the FIB table and forwards it to a certain VM on
the local server; for a broadcast packet, the virtual switch forwards
it to all VMs within the same virtual network on the local server;
and for a tenant-defined multicast packet, the virtual switch
forwards it towards the VMs that are interested in it, which can be
learned by snooping the multicast group join messages sent by VMs.
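
The forwarding behavior of the virtual switch can be summarized by
the following non-normative Python sketch; the table names and the
packet representation (a dict with 'dst' and 'vlan' fields) are
assumptions for illustration.

   # Non-normative sketch of virtual switch forwarding.
   local_fib = {}   # (LTID, VMAC) -> local VM port
   port_ltid = {}   # vSwitch port of a local VM -> LTID of its network
   UPLINK = "uplink-to-edge-switch"

   def from_local_vm(pkt, in_port):
       # Tag with the server-local identifier, then forward: locally
       # if the destination VM is on this server, otherwise uplink.
       pkt["vlan"] = port_ltid[in_port]   # insert LTID into VLAN field
       return local_fib.get((pkt["vlan"], pkt["dst"]), UPLINK)

   def from_uplink(pkt):
       # The ingress edge switch already rewrote the VLAN field to the
       # local LTID, so a (LTID, VMAC) lookup resolves the VM port.
       return local_fib[(pkt["vlan"], pkt["dst"])]

   port_ltid["vnic-1"] = 7
   local_fib[(7, "00:16:3e:00:00:09")] = "vnic-2"
   pkt = {"dst": "00:16:3e:00:00:09", "src": "00:16:3e:00:00:01"}
   print(from_local_vm(pkt, "vnic-1"))   # -> 'vnic-2', same-server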
2.2 Edge Switches

Edge switches bear most of the data-plane intelligence in SVDC. They
are responsible for rewriting the VLAN field in the inner Ethernet
header and for encapsulating/decapsulating the original Ethernet
packets.
Every ingress edge switch maintains a unicast encapsulation table
which maps (in-port, LTID-s, VM-d) to (LTID-d, ES-d, p-ID), where
in-port is the incoming port of the packet, LTID-s is the LTID of the
virtual network on the source server, VM-d is the MAC address of the
destination VM in the original Ethernet header, LTID-d is the LTID of
the virtual network on the destination server, ES-d is the MAC
address of the egress edge switch, and p-ID is the outgoing port to
which the egress edge switch should forward the packet. If the lookup
hits, the ingress edge switch performs the following operations.
First, it rewrites LTID-s in the VLAN field of the original Ethernet
header to LTID-d. Second, it encapsulates the packet by adding an
outer Ethernet header, with ES-d as the destination MAC address, its
own MAC address (ES-s) as the source MAC address, and p-ID as the
VLAN field. Third, it forwards the encapsulated packet by looking up
the forwarding table. If the lookup fails, however, the ingress edge
switch directs the packet, along with its incoming port, to the SVDC
controller, which helps the controller obtain the information
required to install an entry in the unicast encapsulation table.
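
The hit path and miss path of the unicast encapsulation table can be
expressed as the following non-normative sketch; the packet and table
representations are assumed, and the SVDC Ethertype is a placeholder
since no value has been assigned.

   # Non-normative sketch of the ingress edge switch unicast path.
   unicast_encap = {}   # (in_port, LTID_s, VM_d) -> (LTID_d, ES_d, p_ID)
   SVDC_ETHERTYPE = 0x0000   # placeholder; no value assigned yet

   def ingress_unicast(pkt, in_port, es_s, punt_to_controller):
       key = (in_port, pkt["vlan"], pkt["dst"])
       if key not in unicast_encap:
           # Miss: direct the packet, plus its incoming port, to the
           # SVDC controller, which installs the encapsulation entry.
           punt_to_controller(pkt, in_port)
           return None
       ltid_d, es_d, p_id = unicast_encap[key]
       pkt["vlan"] = ltid_d                  # rewrite LTID-s -> LTID-d
       outer = {"dst": es_d, "src": es_s,    # MAC-in-MAC outer header
                "ethertype": SVDC_ETHERTYPE, "vlan": p_id}
       return {"outer": outer, "inner": pkt}

   unicast_encap[(3, 10, "00:16:3e:00:00:02")] = (7, "ES-2", 5)
   pkt = {"dst": "00:16:3e:00:00:02", "vlan": 10}
   print(ingress_unicast(pkt, 3, "ES-1", print))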
Each ingress edge switch also maintains a multicast encapsulation
table, which maps the tuple (in-port, LTID-s, Group-L) to the global
multicast group address Group-G to fill into the outer Ethernet
header. If the lookup hits, the switch encapsulates the
multicast/broadcast packet with Group-G as the destination MAC
address and VLAN ID, and ES-s as the source MAC address. If the
lookup misses, it sends the packet to the SVDC controller to update
the multicast encapsulation table.
Since the VMs of a certain group can have different LTIDs on
different servers, egress edge switches must rewrite the LTID in the
inner Ethernet header of each packet duplication destined to a
different server. Thus, every egress edge switch maintains a
multicast decapsulation table, which maps Group-G to multiple
(Out-PORT, LTID-d) tuples, where Out-PORT is an output port for a
multicast/broadcast packet duplication and LTID-d is the LTID of the
virtual network on the destination server connected to that Out-PORT.
Entries in this table are inserted by the SVDC controller when the
multicast group join message sent by a VM is directed to it. When an
egress edge switch receives a multicast packet, it first duplicates
the packet once for each (Out-PORT, LTID-d) tuple. Then, it
decapsulates each duplication, rewrites the LTID in its inner
Ethernet header as indicated by LTID-d, and sends it towards the
destination server as indicated by Out-PORT.
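
The egress-side handling of a multicast packet can be sketched as
follows (non-normative); the frame representation matches the
ingress-side sketch above and is likewise an assumption.

   # Non-normative sketch of the egress edge switch multicast path.
   import copy

   multicast_decap = {}   # Group-G -> [(Out_PORT, LTID_d), ...]

   def egress_multicast(frame, send):
       # Group-G spans the outer destination MAC and outer VLAN field.
       group_g = (frame["outer"]["dst"], frame["outer"]["vlan"])
       for out_port, ltid_d in multicast_decap.get(group_g, []):
           dup = copy.deepcopy(frame["inner"])   # one copy per tuple
           dup["vlan"] = ltid_d                  # rewrite inner LTID
           send(dup, out_port)                   # outer header removed

   multicast_decap[("01:00:5e:01:02:03", 9)] = [(1, 7), (2, 12)]
   frame = {"outer": {"dst": "01:00:5e:01:02:03", "vlan": 9},
            "inner": {"dst": "01:00:5e:00:00:05", "vlan": 0}}
   egress_multicast(frame, lambda p, port: print(port, p["vlan"]))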
2.3 SVDC Controller
The SVDC controller keeps several groups of mapping tables based on
its global knowledge of the network.
- LT-GT MAP: (SID, LTID) is mapped to GTID.
  It is used to identify the global identifier of a virtual
  network based on a physical server identifier and the local
  virtual network identifier on that server.

- VM-LT MAP: (GTID, VMAC) is mapped to (SID, LTID).
  Based on the global identifier of a virtual network and a
  given MAC address, this map uniquely identifies the physical
  server a VM resides on as well as the local identifier of the
  virtual network on that server.

- SID-ES MAP: (EID, port) is mapped to SID, and vice versa.
  This mapping table can be obtained directly from the network
  topology; it is used to identify the server connected to a
  certain port of an edge switch, and vice versa.

- GL-GG MAP: (GTID, Group-L) is mapped to Group-G.
  It is used to map a multicast group or broadcast address within
  a virtual network to its global multicast group address.
The main function of the SVDC controller is to respond to requests
from edge switches with the information they need, which is used to
install the encapsulation/decapsulation table entries in the
ingress/egress edge switches. When an ingress edge switch receives
the first packet of a flow, it directs the packet to the controller
along with the packet's incoming port and queries the controller for
the required information.
If it is a unicast data packet, the controller first uses the SID-ES
MAP to get the SID of the source server. From the source server's SID
and the LTID in the original packet, the controller then identifies
the GTID of the virtual network via the LT-GT MAP. Based on the GTID
and the destination MAC address of the original packet, the
controller uses the VM-LT MAP to further identify the destination SID
and the LTID of the virtual network on the destination server.
Finally, the controller consults the SID-ES MAP again to get the MAC
address of the egress edge switch as well as the port of the egress
edge switch connected to the destination server. The SVDC controller
can now return all the information the ingress edge switch needs to
construct a unicast encapsulation table entry, as illustrated by the
sketch below.
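
The following non-normative Python sketch walks through this
resolution chain, with the mapping tables represented as
dictionaries. All table contents, MAC addresses, and identifier
values are invented for illustration.

   # Non-normative sketch of the controller's unicast resolution
   # chain, using the Section 2.3 mapping tables as Python dicts.
   SID_ES = {("ES-1", 3): "server-1", ("ES-2", 5): "server-2"}
   ES_SID = {v: k for k, v in SID_ES.items()}     # reverse direction
   LT_GT  = {("server-1", 10): 42, ("server-2", 7): 42}
   VM_LT  = {(42, "00:16:3e:00:00:02"): ("server-2", 7)}

   def resolve_unicast(eid, in_port, ltid_s, vm_d):
       sid_s = SID_ES[(eid, in_port)]             # source server
       gtid = LT_GT[(sid_s, ltid_s)]              # global network id
       sid_d, ltid_d = VM_LT[(gtid, vm_d)]        # destination side
       es_d, p_id = ES_SID[sid_d]                 # egress switch, port
       return ltid_d, es_d, p_id

   # The returned tuple is exactly what the ingress edge switch needs
   # for its unicast encapsulation table entry:
   print(resolve_unicast("ES-1", 3, 10, "00:16:3e:00:00:02"))
   # -> (7, 'ES-2', 5)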
If it is a multicast data packet, the controller uses the SID-ES MAP
and the LT-GT MAP sequentially to get the GTID of the virtual
network, as described above. Then, if the controller finds a
corresponding entry in the GL-GG MAP, it returns the resulting
Group-G to the ingress switch to build the multicast encapsulation
table entry. If not, it finds an available global multicast group
address Group-G, inserts a new entry into the GL-GG MAP, and returns
the new Group-G to the ingress edge switch.
If it is a multicast group join request, the SVDC controller first
gets the GTID of the virtual network by using the SID-ES MAP and the
LT-GT MAP sequentially. Then, it looks up the GL-GG MAP to find the
corresponding Group-G. If the SVDC controller finds one, it simply
responds to the edge switch with this information. If not, the SVDC
controller finds an available Group-G and inserts a new entry into
the GL-GG MAP before responding to the edge switch. After the edge
switch receives the Group-G from the SVDC controller, it inserts a
new entry into the multicast decapsulation table, with Out-PORT set
to the incoming port of the join request and LTID-d set to the LTID
carried in the request.
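
This find-or-allocate behavior of the GL-GG MAP can be summarized by
the following non-normative sketch; the sequential allocator is an
assumption made for brevity, not a specified allocation policy.

   # Non-normative sketch: find-or-allocate a Group-G in the GL-GG MAP.
   import itertools

   GL_GG = {}                              # (GTID, Group-L) -> Group-G
   _next_group_g = itertools.count(1)      # naive allocator (assumed)

   def group_g_for(gtid, group_l):
       key = (gtid, group_l)
       if key not in GL_GG:
           GL_GG[key] = next(_next_group_g)  # allocate a fresh Group-G
       return GL_GG[key]

   # Overlapping Group-L addresses in different virtual networks map
   # to distinct global groups, so no traffic leaks between tenants:
   assert group_g_for(42, "01:00:5e:00:00:01") != \
          group_g_for(43, "01:00:5e:00:00:01")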
If the cloud provider's layer-2 data center networks are
geographically distributed across the Internet, the SVDC controller
needs to maintain the information of all the data center networks of
this cloud provider. In practice, each data center network has its
own controller, and the global information is synchronized among the
controllers periodically.
3. Packet Forwarding
3.1 Unicast Traffic
When a unicast packet is generated by a VM and sent out to the local
virtual switch, it carries the destination MAC address (VM-d) and the
source MAC address (VM-s), and leaves the VLAN field empty.
The virtual switch then adds the local LTID (LTID-s) into the VLAN
field of the packet and looks up the local FIB table for forwarding.
If the destination VM is within the local server, the packet is
forwarded directly to it. Otherwise, the packet is delivered to the
ingress edge switch ES-s.
Next, the ingress edge switch ES-s looks up its unicast encapsulation
table using (in-port, LTID-s, VM-d) as the key. On a miss, the
ingress edge switch directs the packet to the controller, and the
controller installs the encapsulation entry for the flow. On a hit,
the ingress edge switch obtains the tuple (LTID-d, ES-d, p-ID). The
VLAN field of the original Ethernet header is then changed from
LTID-s to LTID-d, and an outer Ethernet header is added. The ingress
edge switch then looks up the FIB table to forward the packet.
After that, the packet is delivered by the core switches towards the
egress edge switch ES-d. The egress edge switch reads p-ID from the
VLAN field of the outer Ethernet header, decapsulates the outer
Ethernet header, and forwards the packet to port p-ID.
Finally, the packet arrives at the destination virtual switch. The
virtual switch looks up the FIB table based on (LTID-d, VM-d) and
delivers the packet to the destination VM.
3.2 Multicast/Broadcast Traffic
When a VM generates a multicast packet, the destination address field
of the Ethernet header is filled with the layer-2 multicast group
address, denoted as Group-L. This packet then goes to the virtual
switch, which inserts LTID-s into the VLAN field and forwards it
towards the ingress edge switch.
The ingress edge switch ES-s looks up its multicast encapsulation
table using (in-port, LTID-s, Group-L) as the key. On a miss, the
ingress edge switch directs the packet to the controller, which
installs the multicast encapsulation entry into the ingress edge
switch and the multicast decapsulation entries into the egress edge
switches. On a hit, the ingress edge switch gets the global multicast
group address Group-G to fill into the outer Ethernet header.
The packet is then forwarded towards the egress edge switches along
the multicast tree. When an egress edge switch receives the packet,
it takes the Group-G in the outer Ethernet header as the key and gets
multiple (Out-PORT, LTID-d) tuples. It then duplicates the packet
once per tuple, decapsulates each duplication, rewrites its LTID, and
forwards it out the corresponding Out-PORT.

Finally, the packet arrives at the destination virtual switches and
is forwarded to the VMs that have joined the multicast group in the
virtual network.
3.3 SVDC Frame Format
To mask overlapping VM MAC addresses and mitigate the limitation of
forwarding table size in switches, SVDC enhances MAC-in-MAC
encapsulation to guarantee correct packet forwarding. Figure 2 shows
the packet format of the MAC-in-MAC encapsulation used in SVDC.
Outer Ethernet Header:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Outer Destination MAC Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Outer Destination MAC Address | Outer Source MAC Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Outer Source MAC Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ethertype = SVDC Ethertype | Outer.VLAN Tag (p-ID) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Inner Ethernet Header:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Inner Destination MAC Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Inner Destination MAC Address | Inner Source MAC Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Inner Source MAC Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ethertype = C-Tag [802.1q] | Inner.VLAN Tag (LTID) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Payload:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Ethertype of Original Payload | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| Original Ethernet Payload |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Frame Check Sequence:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| New FCS (Frame Check Sequence) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2. SVDC MAC-in-MAC Packet Format
The outer Ethernet header: The source Ethernet address in the outer
Ethernet header is set to the MAC address of the ingress edge switch.
The destination Ethernet address is set either to the MAC address of
the egress edge switch (for unicast traffic) or to the multicast MAC
address portion of the Group-G assigned to the group (for
multicast/broadcast traffic). To distinguish SVDC packets, the
Ethertype of the outer Ethernet header is set to a dedicated SVDC
Ethertype. The outer VLAN field indicates either the egress port of
the packet at the egress edge switch (for unicast traffic) or the
12-bit VLAN portion of the Group-G (for multicast/broadcast traffic).
The inner Ethernet header: The source and destination Ethernet
addresses in the inner Ethernet header are set to the MAC addresses
of the source and destination VMs, respectively. The value of the
VLAN tag indicates the LTID of the virtual network this packet
belongs to on the destination server. The payload of the inner
Ethernet header consists of the Ethertype of the original payload and
the original Ethernet payload.
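
As a non-normative illustration, the following Python sketch
serializes the frame of Figure 2 with the struct module. The SVDC
Ethertype value is a placeholder, since no value has been assigned
yet, and all addresses are invented.

   # Non-normative sketch: serializing the SVDC MAC-in-MAC frame of
   # Figure 2. The FCS is computed by the NIC and omitted here.
   import struct

   SVDC_ETHERTYPE = 0x0000      # placeholder; no value assigned yet
   CTAG_ETHERTYPE = 0x8100      # 802.1Q C-Tag

   def mac(s):
       return bytes(int(b, 16) for b in s.split(":"))

   def svdc_frame(es_d, es_s, p_id, vm_d, vm_s, ltid, etype, payload):
       outer = mac(es_d) + mac(es_s) + struct.pack(
           "!HH", SVDC_ETHERTYPE, p_id)       # outer header + p-ID tag
       inner = mac(vm_d) + mac(vm_s) + struct.pack(
           "!HH", CTAG_ETHERTYPE, ltid)       # inner header + LTID tag
       return outer + inner + struct.pack("!H", etype) + payload

   frame = svdc_frame("02:00:00:00:00:02", "02:00:00:00:00:01",
                      p_id=5, vm_d="00:16:3e:00:00:02",
                      vm_s="00:16:3e:00:00:01", ltid=7,
                      etype=0x0800, payload=b"\x00" * 46)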
4. SVDC Deployment Considerations
4.1 VM Migration
To handle VM migration, a central VM manager that can communicate
with all hosts needs to be deployed in the network, and the SVDC
controller needs to be co-located with this central VM manager. In
this scenario, when a VM is about to migrate, the VM manager notifies
the SVDC controller of the destination server ID, the IP address, and
the GTID of the VM.
The SVDC controller needs to check whether an LTID has been assigned
to the virtual network of this VM on the destination server before
the migration starts. If not, an LTID is allocated and the virtual
switch on the destination server is configured accordingly.
After the VM migration completes, a gratuitous ARP message is sent
from the destination server to announce the new location of the VM.
When this ARP message arrives at the edge switch, it is directed to
the SVDC controller as a broadcast-entry query. In this way, SVDC can
confirm the completion of the VM migration and update the location
information of this VM in its mapping tables.
To maintain the communication states destined for the migrated VM in
edge switches, the SVDC controller broadcasts an entry update message
to all edge switches immediately after it receives the gratuitous ARP
message. This message contains the (LTID, ES, p-ID) tuple the
migrated VM uses after migration. All edge switches that maintain
encapsulation table entries towards the migrated VM update their
encapsulation tables, preserving the communication states towards the
migrated VM, as sketched below. The gratuitous ARP message is then
sent to the VMs within the same virtual network to update their ARP
tables.
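
The entry update performed on receiving this message can be sketched
as follows (non-normative); the table layout follows Section 2.2 and
the helper name is invented.

   # Non-normative sketch: patching installed unicast encapsulation
   # entries after a VM migrates (triggered by the gratuitous ARP).
   def update_after_migration(tables, vm_d, new_ltid, new_es, new_pid):
       """tables: unicast encapsulation tables, one per edge switch,
       each a dict (in_port, LTID_s, VM_d) -> (LTID_d, ES_d, p_ID)."""
       for table in tables:
           for key in list(table):
               if key[2] == vm_d:          # entry toward migrated VM
                   table[key] = (new_ltid, new_es, new_pid)

   t = {(3, 10, "00:16:3e:00:00:02"): (7, "ES-2", 5)}
   update_after_migration([t], "00:16:3e:00:00:02", 4, "ES-3", 2)
   print(t)   # the entry now points at the VM's new location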
4.2 Fault Tolerance
An important aspect of a large virtualized data center network is the
increased likelihood of failures. SVDC tolerates server failures as
well as edge switch failures, because no "hard state" is associated
with a specific virtual switch or edge switch. In a large virtualized
data center, it is reasonable to assume that virtual network and
physical network management systems are in place that are responsible
for detecting failed virtual switches or edge switches.
However, SVDC must handle failures of controller instances and of the
control links between controller instances and edge switches. To
handle failures of controller instances, more than one controller
instance can be used to manage each network element. All controller
instances synchronize network information periodically and can work
in hot-backup or cold-backup mode; when one controller instance
fails, another instance can replace it in time. To handle failures of
control links, traditional fault-tolerant routing protocols, e.g.,
the Spanning Tree Protocol [802.1D], can be applied in an out-of-band
management network deployment. For an in-band management network
deployment, we assume the layer-2 routing scheme in the core network
can take responsibility for handling link failures.
5. Security Considerations

Since SVDC enhances the MAC-in-MAC technique to implement network
virtualization, it faces several security challenges that traditional
Ethernet networks also face, such as layer-2 traffic snooping, packet
flooding causing denial-of-service attacks, and MAC address spoofing.
In SVDC, a malicious endpoint can attack the SVDC controller by
forging a large number of communication requests with different
source and destination pairs, or can hijack the MAC address of an
edge switch to interfere with the normal communication between the
SVDC controller and the edge switches.
Traditional layer-2 techniques can be deployed in SVDC to handle
these problems; for example, the IEEE 802.1 port-based admission
control mechanism [802.1X] can be used to mitigate the spoofing
problem. The security of the communication channel between the edge
switches and the SVDC controller relies on security mechanisms at the
transport layer.
6. IANA Considerations

This document has no actions for IANA, but SVDC needs to be assigned
a new Ethertype.
7. References
7.1 Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
7.2 Informative References
[802.1aq] IEEE, "Standard for Local and metropolitan area networks --
Media Access Control (MAC) Bridges and Virtual Bridged
Local Area Networks -- Amendment 20: Shortest Path
Bridging", IEEE P802.1aq-2012, 2012.
[802.1D] IEEE, "Draft Standard for Local and Metropolitan Area
Networks/ Media Access Control (MAC) Bridges", IEEE
P802.1D-2004, 2004.
[802.1q] IEEE, "Standards for Local and Metropolitan Area Networks:
Virtual Bridged Local Area Networks.", IEEE Standard
802.1Q, 2005 Edition, May 2006.
[802.1X] IEEE, "IEEE Standard for Local and Metropolitan area
networks -- Port-Based Network Access Control", IEEE Std
802.1X-2010, February 2010.
[RFC4762] Lasserre, M. and V. Kompella, "Virtual Private LAN Service
          (VPLS) Using Label Distribution Protocol (LDP) Signaling",
          RFC 4762, January 2007.
[RFC6325] Perlman, R., Eastlake 3rd, D., Dutt, D., Gai, S., and A.
Ghanwani, "Routing Bridges (RBridges): Base Protocol
Specification", RFC 6325, July 2011.
[RFC7348] Mahalingam, M., Dutt, D., Duda, K., and Agarwal, P.,
"Virtual eXtensible Local Area Network (VXLAN): A
Framework for Overlaying Virtualized Layer 2 Networks over
Layer 3 Networks", RFC 7348, August 2014.
[NVGRE] Sridharan, M., Greenberg, A., Venkataramiah, N., Wang, Y.,
          Duda, K., Ganga, I., Lin, G., Pearson, M., Thaler, P., and
          C. Tumuluri, "NVGRE: Network Virtualization Using Generic
          Routing Encapsulation", Work in Progress, April 2015.
[SDN] Open Networking Foundation White Paper, "Software-Defined
Networking: The New Norm for Networks", April 2012.
[OPENFLOW] McKeown, N., Anderson, T., Balakrishnan, H., Parulkar, G.,
          Peterson, L., Rexford, J., Shenker, S., and J. Turner,
          "OpenFlow: Enabling Innovation in Campus Networks (OpenFlow
          White Paper)", http://www.openflowswitch.org, 2008.
Authors' Addresses
Congjie Chen
4-104, FIT Building,
Tsinghua University,
Hai Dian District,
Beijing, China
EMail: ccjguangzhou@gmail.com
Dan Li
4-104, FIT Building,
Tsinghua University,
Hai Dian District,
Beijing, China
EMail: tolidan@tsinghua.edu.cn
Jun Li
Network and Security Research Laboratory,
Department of Computer and Information Science,
University of Oregon,
1585 E 13th Ave.
Eugene, OR 97403
EMail: lijun@cs.uoregon.edu