Internet DRAFT - draft-sarikaya-nvo3-proxy-vxlan

draft-sarikaya-nvo3-proxy-vxlan







Network Working Group                                        B. Sarikaya
Internet-Draft                                                    F. Xia
Expires: April 26, 2015                                       Huawei USA
                                                        October 23, 2014


        Virtual eXtensible Local Area Network over IEEE 802.1Qbg
                 draft-sarikaya-nvo3-proxy-vxlan-00.txt

Abstract

   In data centers there is interest in offloading network functions to
   the switches in order to keep the server focused on computation not
   networking.  IEEE 802.1Qbg or Virtual Ethernet Port Aggregator (VEPA)
   at the hypervisor simply forces each frame sent out to the external
   switch regardless of destination.  In this case, the eXtensible Local
   Area Network operation or proxying at a higher level switch is
   needed.  Communication functions of the eXtensible Local Area Network
   are moved above to the Top of Rack switches which is called Proxy
   VXLAN.  Proxy VXLAN is a Network Virtualization Edge that does VXLAN
   encapsulation/decapsulation.  Proxy VXLAN also takes part in virtual
   machine creation, virtual machine operation and virtual machine
   mobility.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 26, 2015.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents



Sarikaya & Xia           Expires April 26, 2015                 [Page 1]

Internet-Draft                 Proxy VXLAN                  October 2014


   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
   2.  Terminology . . . . . . . . . . . . . . . . . . . . . . . . .   4
   3.  Problem Statement . . . . . . . . . . . . . . . . . . . . . .   4
   4.  Proxy VXLAN Architecture  . . . . . . . . . . . . . . . . . .   5
   5.  Overview of the protocol  . . . . . . . . . . . . . . . . . .   5
   6.  Encapsulation/Decapsulation Operation . . . . . . . . . . . .   6
   7.  Virtual Machine Creation  . . . . . . . . . . . . . . . . . .   7
     7.1.  VXLAN Tunnel Endpoint Notification  . . . . . . . . . . .   7
   8.  Virtual Machine Mobility and Operation  . . . . . . . . . . .   7
   9.  P Flag Definition . . . . . . . . . . . . . . . . . . . . . .   8
   10. Security Considerations . . . . . . . . . . . . . . . . . . .   8
   11. IANA considerations . . . . . . . . . . . . . . . . . . . . .   8
   12. Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .   9
   13. References  . . . . . . . . . . . . . . . . . . . . . . . . .   9
     13.1.  Normative References . . . . . . . . . . . . . . . . . .   9
     13.2.  Informative References . . . . . . . . . . . . . . . . .  10
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  10

1.  Introduction

   Data center networks are being increasingly used by telecom operators
   as well as by enterprises.  Currently these networks are organized as
   one large Layer 2 network in a single building.  In some cases such a
   network is extended geographically using virtual Local Area Network
   (VLAN) technologies still as an even larger Layer 2 network
   connecting the virtual machines (VM), each with its own MAC address.

   Another important requirement was growing demand for multitenancy,
   i.e. multiple tenants each with their own isolated network domain.
   In a data center hosting multiple tenants, each tenant may
   independently assign MAC addresses and VLAN IDs and this may lead to
   potential duplication.

   What we need is IP based tunneling scheme based overlay network
   called Virtual eXtensible Local Area Network (VXLAN).  VXLAN overlays
   a Layer 2 network over a Layer 3 network.  Each overlay is identified
   by the VXLAN Network Identifier (VNI).  This allows up to 16M VXLAN
   segments to coexist within the same administrative domain [RFC7348].



Sarikaya & Xia           Expires April 26, 2015                 [Page 2]

Internet-Draft                 Proxy VXLAN                  October 2014


   In VXLAN, each MAC frame is transmitted after encapsulation, i.e. an
   outer Ethernet header, an IPv4/IPv6 header, UDP header and VXLAN
   header are added.  Outer Ethernet header indicates an IPv4 or IPv6
   payload.  VXLAN header contains 24-bit VNI.

   VXLAN tunnel end point (VTEP) is the hypervisor on the server which
   houses the VM.  VXLAN encapsulation is only known to the VTEP, the
   virtual machines (VM) that the hypervisor runs never see it.  Also
   the tunneling is stateless, each MAC frame is encapsulated
   independent on any other MAC frame.

   It should be noted that in this document, VTEP plays the role of the
   Network Virtualization Edge (NVE) according to NVO3 architecture for
   overlay networks like VXLAN or NVGRE defined in [I-D.ietf-nvo3-arch].
   NVE interfaces the tenant system underneath with the L3 network
   called the Virtual Network (VN).

   Instead of using UDP header, Generic Routing Encapsulation (GRE)
   encapsulation can be used.  A 24-bit Virtual Subnet Identifier (VSID)
   is placed in the GRE key field.  The resulting encapsulation is
   called Network Virtualization using Generic Routing Encapsulation
   (NVGRE) [I-D.sridharan-virtualization-nvgre].  Note that VSID is
   similar to VNI.  Although VXLAN terminology is used throughout, the
   protocol defined in this document applies to VXLAN as well as NVGRE.

   One deployment strategy for VXLAN is to upgrade data center server
   hypervisors for VXLAN compatibility.  Data center servers that can
   not be upgraded can also be given VXLAN capability using proxying.
   For proxying to work, IEEE 801.1Qbg [IEEE802.1Qbg] or Virtual
   Ethernet Port Aggregator (VEPA) functionality is needed in legacy
   server hypervisors.

   In a virtual server environment the most common way to provide
   Virtual Machine (VM) switching connectivity is a Virtual Ethernet
   Bridge (VEB) or a vSwitch.  VEB acts similar to a Layer 2 hardware
   switch providing inbound/outbound and inter-VM communication.  VEB
   aggregates multiple VMs traffic across a set of links as well as
   provides frame delivery between VMs based on MAC address. .

   However VEB lacks network management, monitoring and security
   functions.  IEEE 801.1Qbg or Virtual Ethernet Port Aggregator (VEPA)
   provides a simple solution.  VEPA simply sends each VM frame out to
   the external switch regardless of destination to be handled by an
   external switch, i.e.  Proxy VXLAN switch.

   VXLAN is a server-based network virtualization solution, and
   hypervisors are responsible for all networking work.  At the same
   time, IEEE 801.1Qbg [IEEE802.1Qbg] follows a totally different



Sarikaya & Xia           Expires April 26, 2015                 [Page 3]

Internet-Draft                 Proxy VXLAN                  October 2014


   philosophy that servers should do as little as possible networking
   job, and it defines a way for virtual switches to send all traffic
   and forwarding decisions to the adjacent physical switch.  This
   removes the burden of VM forwarding decisions and network operations
   from the host CPU.  It also leverages the advanced management
   capabilities in the access or aggregation layer switches.

   In this document, we develop Proxy VXLAN switch behavior.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].  The
   terminology in this document is based on the definitions in [RFC7348]

3.  Problem Statement

   In a virtual server environment the most common way to provide
   Virtual Machine (VM) switching connectivity is a Virtual Ethernet
   Bridge (VEB) or a vSwitch.  VEB acts similar to a Layer 2 hardware
   switch providing inbound/outbound and inter-VM communication.  VEB
   aggregates multiple VMs traffic across a set of links as well as
   provides frame delivery between VMs based on MAC address.  There are
   a number of disadvantages of VEB solution.  First of all, vSwitch
   consumes valuable CPU and memory bandwidth.  The higher the traffic
   load, the greater the number of CPU and memory cycles required to
   move traffic through the vSwitch, reducing the ability to support
   larger numbers of VMs in a physical server.  Secondly, the solution
   lacks network-based visibility. vSwitches have a limited feature set.
   They don't provide local traffic visibility or have capabilities for
   enterprise data monitoring, security, or network management.  Finaly,
   it lacks network policy enforcement.  Modern external switches have
   many advanced features such as port security, quality of service
   (QoS), and access control lists (ACL).  But vSwitches often do not
   have, or have limited support for such features.

   To solve the management challenges with VEBs, Edge Virtual Bridging
   (EVB) in the IEEE 802.1Qbg standard was proposed.  The primary goals
   of EVB are to combine the best of software and hardware vSwitches
   with the best of external L2 network switches.  EVB is based on VEPA
   (Virtual Ethernet Port Aggregator) technology.  It is a way for
   virtual switches to send all traffic to the adjacent physical switch
   and let forwarding decisions be made by the adjacent switch.  This
   removes the burden of VM forwarding decisions and network operations
   from the host CPU.  It also leverages the advanced management
   capabilities in the access or aggregation layer switches.




Sarikaya & Xia           Expires April 26, 2015                 [Page 4]

Internet-Draft                 Proxy VXLAN                  October 2014


   VXLAN idea is mainly developed on vSwitch not IEEE 802.1Qbg.  As
   described in [RFC7348] servers built using IEEE 802.1Qbg switches are
   Non-VXLAN servers.  VXLAN Gateway is needed in a connected upstream
   switch.  [RFC7348] provides a short description of VXLAN Gateway and
   no details are given on how VXLAN Gateway works.

   VXLAN gateway located in a ToR switch is called Proxy VXLAN in this
   document.  We provide the details of proxy VXLAN behavior in the
   following sections.

4.  Proxy VXLAN Architecture

   Proxy VXLAN is composed of servers that can host virtual machines.
   The servers are not involved in any communications required by VXLAN.
   This function is moved to the switches above the server.  Top of Rack
   switches are examples of switches that can host proxy functions, i.e.
   VXLAN Tunnel End Point, VTEPs or NVEs.  Servers support IEEE
   802.1Qbg.

   VTEPs or NVEs receive raw packets from the servers and send packets
   upstream after VXLAN encapsulation.  Proxy VXLAN is assumed to be
   connected to VXLAN enabled servers.  NVE in a Proxy VLAN architecture
   always tags the outgoing frames to let VXLAN enabled servers know
   that these frames are proxied.

   A given ToR switch hosting NVE can serve one of more legacy servers.
   Virtual machine creation/deletion is done by the management center.

5.  Overview of the protocol

   The steps involved in the protocol are explained below:

   Encapsulation/Decapsulation of Frames

      In a hybrid Proxy VXLAN, when a frame is received on the VXLAN
      connected interface, the proxy switch decapsulates the frame and
      forwards the packet to the non VXLAN server.  When an incoming
      frame from the non-VXLAN interface is received, the proxy switch
      encapsulates it and forwards it to the VXLAN server.

   Virtual Machine Creation

      Virtual machine creation is initiated by the management center.
      The management center notifies a given non-VXLAN server to create
      a VM.  The center assigns a MAC address and VXLAN Network
      Identifier to the VM.





Sarikaya & Xia           Expires April 26, 2015                 [Page 5]

Internet-Draft                 Proxy VXLAN                  October 2014


   NVE Notification  Management Center notifies the ToR switch that is
      responsible for the server of this newly created virtual machine.
      The center sends MAC address, VNI of the virtual machine to the
      ToR switch which will act as the NVE for this VM.

   Virtual Machine Operation  Virtual machine execution usually starts
      with ARP/ND Request to get the IP address of the destination
      virtual machine.  After ARP/ND, virtual machine enters into IP
      communication with the destination virtual machine.

   Core switches          +----+  +----+  +----+
                          |    |  |    |  |    |
                          |    |  |    |  |    |
                          +----+  +--+-+  +---\+
   Management            --           |         \\
   Center             ---            |            \
                   --               |             \
                ---                 |              \\
              --                   |                 \
         +----+                  +-+--+            +--\-+
         |    | ToR switch       |    |ToR switch  |    |VXLAN enabled
         |    | NVE1             |    |NVE2        |    |NVE3
         +----+  -+          +---+----+\           +-/--+
           +--+   |     +--+-|     +--+ \\     +--+//
           |  +---|     |  |       |  +---\    |  /
           +--+         +--+       +--+        +--+
            |            |          |           |
        +---------+ +---------+ +---------+ +---------+
        |+--+ +--+| |+--+ +--+| |+--+ +--+| |+--+ +--+|
        ||vm| |vm|| ||vm| |vm|| ||vm| |vm|| ||vm| |vm||
        |+--+ +--+| |+--+ +--+| |+--+ +--+| |+--+ +--+|
        +---------+ +---------+ +---------+ +---------+

                    Figure 1: Proxy VXLAN Architecture

6.  Encapsulation/Decapsulation Operation

   In a hybrid Proxy VXLAN, when a frame is received on the VXLAN
   connected interface, the proxy switch removes the VXLAN header.  It
   checks the destination MAC address of the inner Ethernet frame and
   forwards the packet to a physical port based on this MAC address.
   When an incoming frame from the non-VXLAN interface is received, the
   proxy switch first adds a VXLAN header.  VXLAN Network ID (VNI) is
   set to the value which is provided by the management center.  I Flag
   is set to 1.

   A new flag, P flag is set to 1 to indicate that this frame is coming
   from Proxy VXLAN.  P flag is defined in Section 9.  The need for P



Sarikaya & Xia           Expires April 26, 2015                 [Page 6]

Internet-Draft                 Proxy VXLAN                  October 2014


   flag stems from the fact that if an incoming frame used VLAN ID in
   the inner Ethernet header that frame will be discarded by the proxy.
   Also for outgoing frames, proxy will strip VLAN tag in the
   encapsulated frame.

   Source port is assigned by the proxy switch.  Destination port is set
   to 4789.  UDP checksum is set to zero.

   Source IPv4/v6 address is the proxy switch IPv4/v6 address.
   Destination IPv4/v6 address is obtained based on the inner
   destination MAC address.  It is a multicast address if the incoming
   frame belongs to ARP/ND or multicast communication.  Otherwise the
   proxy switch looks up its ARP/ND cache to find the IP address
   corresponding to the inner destination MAC address and places the
   result in the destination IPv4/IPv6 address.

   When an incoming frame from the non-VXLAN interface is received, the
   proxy switch checks if the destination is within the host (another VM
   in the same VLAN).  In that case the frame is forwarded back down the
   port it was received on.

7.  Virtual Machine Creation

   Virtual machines are created by the management center.  The
   management center creates a virtual machine, assigns a server to it.
   Usually each server may host more than one virtual machine.

   When a virtual machine is created, the management center assigns a
   MAC address and its VXLAN Network Identifier.  The center sends MAC
   address and VNI to the server.

7.1.  VXLAN Tunnel Endpoint Notification

   At the time when the virtual machine is created, the management
   center also notifies the ToR switch hosting a NVE that is responsible
   for the server in which the VM was created.  ToR switch receives MAC
   address and VNI value for this virtual machine.

   ToR switch MUST keep all MAC address/VNI values for each virtual
   machine that it serves.  These values are used in encapsulating the
   packets coming from the virtual machines and in virtual machine
   operation.

8.  Virtual Machine Mobility and Operation

   In Proxy VXLAN, virtual machine mobility can be achieved using the
   following steps:




Sarikaya & Xia           Expires April 26, 2015                 [Page 7]

Internet-Draft                 Proxy VXLAN                  October 2014


   Step 1.  Source NVE is notified with destination NVE of the moving
   VM,

   Step 2 Source NVE tunnels all packets for the VM to destination NVE

   Step 3 When the VM is ready, it would send gratuitous ARP to all VMs

   Step 4 When the source NVE receives the gratuitous ARP, it removes
   the VM MAC from its original forwarding table and stops tunneling for
   this virtual machine.

   When a VM is created or after VM is moved, VM starts its operation,
   e.g. by sending ARP/ND packets.  Non-VXLAN server sends the packet to
   the upstream switch which finally reaches the ToR switch hosting the
   NVE Figure 1.  NVE normally converts this packet, i.e. broadcast
   packet into a multicast packet, encapsulates it and sends it out to
   VXLAN enabled servers.  How ARP/ND packets are processed is out of
   scope.

9.  P Flag Definition

   P flag is defined in Figure 2.

        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |P|R|R|R|I|R|R|R|            Reserved                           |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                VXLAN Network Identifier (VNI) |   Reserved    |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 2: P Flag in VXLAN Header

   Flags (8 bits)- where the P flag MUST be set to 1 for a proxied
   packet.  The other bits are set according to [RFC7348].

10.  Security Considerations

   The security considerations in [RFC2131], [RFC2132] and [RFC3315]
   apply.  Special considerations in [RFC7348] are also applicable.

11.  IANA considerations

   This specification defines a new flag (P) in the VXLAN header.







Sarikaya & Xia           Expires April 26, 2015                 [Page 8]

Internet-Draft                 Proxy VXLAN                  October 2014


12.  Acknowledgements

13.  References

13.1.  Normative References

   [RFC0826]  Plummer, D., "Ethernet Address Resolution Protocol: Or
              converting network protocol addresses to 48.bit Ethernet
              address for transmission on Ethernet hardware", STD 37,
              RFC 826, November 1982.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2131]  Droms, R., "Dynamic Host Configuration Protocol", RFC
              2131, March 1997.

   [RFC2132]  Alexander, S. and R. Droms, "DHCP Options and BOOTP Vendor
              Extensions", RFC 2132, March 1997.

   [RFC3315]  Droms, R., Bound, J., Volz, B., Lemon, T., Perkins, C.,
              and M. Carney, "Dynamic Host Configuration Protocol for
              IPv6 (DHCPv6)", RFC 3315, July 2003.

   [RFC4511]  Sermersheim, J., "Lightweight Directory Access Protocol
              (LDAP): The Protocol", RFC 4511, June 2006.

   [RFC4513]  Harrison, R., "Lightweight Directory Access Protocol
              (LDAP): Authentication Methods and Security Mechanisms",
              RFC 4513, June 2006.

   [RFC4861]  Narten, T., Nordmark, E., Simpson, W., and H. Soliman,
              "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861,
              September 2007.

   [RFC7348]  Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
              L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
              eXtensible Local Area Network (VXLAN): A Framework for
              Overlaying Virtualized Layer 2 Networks over Layer 3
              Networks", RFC 7348, August 2014.

   [I-D.ietf-nvo3-arch]
              Black, D., Hudson, J., Kreeger, L., Lasserre, M., and T.
              Narten, "An Architecture for Overlay Networks (NVO3)",
              draft-ietf-nvo3-arch-01 (work in progress), February 2014.






Sarikaya & Xia           Expires April 26, 2015                 [Page 9]

Internet-Draft                 Proxy VXLAN                  October 2014


   [IEEE802.1Qbg]
              IEEE, "Edge Virtual Bridging", IEEE Std 802.1Qbg-2012, May
              2012.

13.2.  Informative References

   [I-D.sridharan-virtualization-nvgre]
              Sridharan, M., Greenberg, A., Wang, Y., Garg, P.,
              Venkataramiah, N., Duda, K., Ganga, I., Lin, G., Pearson,
              M., Thaler, P., and C. Tumuluri, "NVGRE: Network
              Virtualization using Generic Routing Encapsulation",
              draft-sridharan-virtualization-nvgre-06 (work in
              progress), October 2014.

Authors' Addresses

   Behcet Sarikaya
   Huawei USA
   5340 Legacy Dr. Building 3
   Plano, TX  75024

   Phone: +1 972-509-5599
   Email: sarikaya@ieee.org


   Frank Xia
   Huawei USA
   Nanjing, China

   Phone: +1 972-509-5599
   Email: xiayangsong@huawei.com




















Sarikaya & Xia           Expires April 26, 2015                [Page 10]