Internet DRAFT - draft-kj-nvo3-pion-architecture
Network Working Group L. Jin
Internet-Draft ZTE
Intended status: Informational B. Khasnabish
Expires: November 12, 2012 ZTE USA
May 11, 2012
Architecture of PSN Independent Overlay Network (PION)
draft-kj-nvo3-pion-architecture-00.txt
Abstract
This draft introduces the PSN independent overlay network (PION)
architecture for intra- and inter-datacenter (DC) connections. The
motivations, protocol layers, and applications of PION are also
discussed. PION provides a virtualized network that is independent
of the underlying PSN in order to maximize the reuse of IETF protocol
definitions and implementations. The inter- and intra-DC connection
provided by PION could be from endpoint to endpoint, or endpoint to
network, or network to network. The packet transport capabilities
provided by the overlay network are determined by the capability of
the underlying PSN.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on November 12, 2012.
Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
Jin & Khasnabish Expires November 12, 2012 [Page 1]
Internet-Draft draft-kj-nvo3-pion-architecture-00 May 2012
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction
2. List of Acronyms
3. PION Definition
4. PION Motivation
5. PION Protocol model
5.1. Protocol Layers
5.2. Encapsulation Layer
5.3. Tenant Network Identifier Layer
5.4. PSN Layer Encapsulation
5.5. Associate tenant and PSN Layer
6. Network Architecture
7. Applicability of PION
7.1. PION over IP PSN
7.2. PION over MPLS PSN
8. Control Plane Consideration
9. Acknowledgments
10. Informative References
Authors' Addresses
1. Introduction
This draft introduces the architecture of the PSN independent overlay
network (PION) for intra- and inter-datacenter connections, together
with its motivation, protocol layers, and applications. The PSN in
this draft refers to an IP or MPLS network. PION provides a
virtualized network that is independent of the underlying PSN, so as
to maximize the reuse of IETF protocol definitions and
implementations. That means the overlay network can work over any
underlying PSN layer and reuse the capabilities of that layer.
The inter- and intra-DC connection provided by PION could be from
endpoint to endpoint, or endpoint to network, or network to network.
The packet transport capabilities provided by the overlay network are
determined by the capability of the underlying PSN. Making the
overlay network independent of the underlying PSN allows it to
benefit from the capabilities of different kinds of underlying PSN.
2. List of Acronyms
PSN: Packet Switched Network
PION: PSN Independent Overlay Network
PION Header: a header that includes an encapsulation layer and a tenant ID layer
Tenant Packet: a customer packet encapsulated with a PION header
BW: Bandwidth
ECMP: Equal-Cost Multipath
MPLS: Multiprotocol Label Switching
NVE: Network Virtualization Edge
QoS: Quality of Service
SLA: Service Level Agreement
TNI: Tenant Network Identifier
3. PION Definition
The PSN independent overlay network (PION) provides an overlay
network for the datacenter, offering intra- and inter-datacenter
connections between end-points over various kinds of underlying PSN.
PION could provide service bandwidth and QoS assurance, multicast, traffic
engineering, security and other capabilities by relying on different
kinds of underlying PSN capabilities. PION is designed to isolate
traffic and addresses among different tenants, and to be scalable
enough to accommodate millions of end-points.
PION provides the following functionality:
1. Be PSN independent, to maximize reuse of existing IETF-defined
PSN technologies.
2. Provide traffic and address isolation for each tenant.
3. Scale to accommodate two million VMs running on more than one
hundred thousand physical servers.
4. Provide differentiated service to different tenants, including
bandwidth and QoS.
4. PION Motivation
As a core requirement, emerging datacenters need to support both
multi-tenancy and high scalability. The packet transport
capabilities provided by the overlay network are determined by the
capability of the underlying PSN. Making the overlay network
independent of the underlying PSN allows it to benefit from the
capabilities of different kinds of underlying PSN.
The IP PSN could provide the maximum connection availability for the
overlay network, and it also provides stateless IP connections, which
ease the operation of PSN tunnels. Approaches such as VXLAN
[I-D.mahalingam-dutt-dcops-vxlan], NVGRE
[I-D.sridharan-virtualization-nvgre], and STT [I-D.davie-stt] allow a
Layer 2 overlay network to be set up over UDP, IP, or even a TCP-like
encapsulation, respectively.
The MPLS PSN is now widely deployed in wide area networks, and has
been proven capable of providing connections with bandwidth
guarantees, differentiated QoS assurance, and high resiliency.
The MPLS PSN has some capabilities that the IP PSN lacks. One
typical example is given below:
1. Bandwidth must be guaranteed per tenant, rather than provisioned
as a shared pool among tenants. The IP connection resources between
NVEs serve all tenants, so connections dedicated to individual
tenants cannot be set up. Some "Gold" class tenants may require a
bandwidth guarantee for their service. If
tenants of other categories (e.g., Silver, Bronze, etc.) are mixed/
shared with Gold category tenants, and the traffic flows from all
categories of tenants are transferred over the same connection, the
desired bandwidth of the Gold-class tenants may not be guaranteed.
When the overlay network crosses a WAN, the bandwidth guarantee
problem is exacerbated by the limited bandwidth in the WAN. The
MPLS PSN has the capability to provide tenant-aware traffic
transport. For example, when the connection provided by the overlay
network crosses a WAN with IP/MPLS enabled, the traffic of a
specified tenant could be traffic engineered by the IP/MPLS network,
which would greatly improve the transport quality of the tenant
service.
The purpose of the PSN independent overlay network (PION) is to reuse
various kinds of existing IETF-defined PSN technologies, while
keeping the tenant packet encapsulation uniform over different types
of PSN connections/tunnels. The PSN here mainly refers to IP and
MPLS; Layer 2 PSN technologies are excluded.
5. PION Protocol model
5.1. Protocol Layers
PION protocol layering model is shown below:
+-------------------------------------------+
| Customer Payload |
| ~~~ |
/===========================================\
H Tenant Network Identifier H
H-------------------------------------------H <--Tenant Header
H Encapsulation H
\===========================================/
| PSN Layer |
+-------------------------------------------+
| Data-Link |
+-------------------------------------------+
| Physical Layer |
+-------------------------------------------+
Figure 1
The customer payload in a datacenter would typically be an Ethernet
payload, but other types of payload, e.g., an IP payload, are not
precluded.
The encapsulation layer provides packet transport with some
capabilities that other layers could not provide. For more detail,
see Section 5.2.
The Tenant Network Identifier (TNI) layer provides customer traffic
and address isolation among different tenants. This identifier may
be unique per NVE, but MUST be unique per connection between two NVEs.
The PSN layer provides physical network transport for the virtualized
network in the datacenter, and reuses IETF-defined protocols as far
as possible. The ECMP transport capability of the PSN layer should be
able to hash the traffic per flow per tenant.
The data-link and physical layers are out of the scope of this
document.
5.2. Encapsulation Layer
There are several functions/services that the encapsulation layer
could provide. This draft lists the following functions/services:
1. Customer payload indication, to distinguish different types of
customer payload.
2. Packet sequencing and fragmentation capability.
3. Flow entropy value, to add flow-based entropy by tagging all the
packets of a flow with the same entropy value.
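As an illustration only, the three functions listed above, together
with the TNI, could be carried in a fixed header such as the one
sketched below. This draft does not define a wire format: the field
order, the field widths, the code points, and the names PionHeader,
encapsulate, and decapsulate are all assumptions made for this sketch.

```python
import struct
from typing import NamedTuple, Tuple

# Hypothetical payload-type code points (not defined by this draft).
PAYLOAD_ETHERNET = 1
PAYLOAD_IP = 2

# Assumed flag bit: set when the packet is a fragment.
FLAG_FRAGMENT = 0x01

# Assumed layout, network byte order:
# payload type (1 byte), flags (1 byte), flow entropy (2 bytes),
# sequence number (4 bytes), tenant network identifier (4 bytes).
_PION_HEADER = struct.Struct("!BBHII")


class PionHeader(NamedTuple):
    payload_type: int
    flags: int
    entropy: int
    sequence: int
    tni: int


def encapsulate(header: PionHeader, customer_payload: bytes) -> bytes:
    """Prepend the (hypothetical) PION header to a customer payload."""
    return _PION_HEADER.pack(*header) + customer_payload


def decapsulate(tenant_packet: bytes) -> Tuple[PionHeader, bytes]:
    """Split a tenant packet into its PION header and customer payload."""
    header = PionHeader(*_PION_HEADER.unpack_from(tenant_packet))
    return header, tenant_packet[_PION_HEADER.size:]
```

The PSN header (IP, UDP, GRE, or MPLS) would be added outside this
header by the NVE and is not modeled here.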
The customer payload could be Ethernet in many cases, but an IP
payload is not precluded. An indication value in the encapsulation
layer could be provided to indicate the customer payload type. Some
applications that use UDP transport require packets to be delivered
in sequence and need to detect packet loss. Some applications
require large packets to be transported to improve efficiency; packet
fragmentation is then required, and is preferably performed in
hardware. The encapsulation layer has the capability to carry packet
fragmentation information. Some PSN connections used by PION, e.g.,
GRE, do not provide ECMP capability. The encapsulation layer would
provide such ECMP capability by adding a flow entropy value, and all
the packets from one flow are required to be tagged with the same
entropy value.
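The requirement that all packets of one flow carry the same entropy
value can be sketched as a deterministic hash over a flow key. The
5-tuple flow key, the use of CRC-32, the default 16-bit width, and the
names FlowKey and flow_entropy are assumptions for illustration; the
draft does not specify how the value is computed.

```python
import zlib
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical flow key: the 5-tuple of the customer packet."""
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int


def flow_entropy(flow: FlowKey, bits: int = 16) -> int:
    """Map a flow to a fixed-width entropy value.

    The hash is deterministic, so every packet of one flow receives the
    same value, while different flows are likely to receive different
    values and can therefore be spread over ECMP paths.
    """
    data = "|".join(str(field) for field in flow).encode("ascii")
    return zlib.crc32(data) & ((1 << bits) - 1)
```

A wider value could be requested with the bits parameter when the
field that carries the entropy is larger than 16 bits.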
When the PSN layer uses UDP encapsulation, the entropy value could be
carried in the UDP source port, and the flow entropy value in the
encapsulation layer could then be omitted.
When the PSN layer uses a GRE tunnel, the flow entropy value in the
encapsulation layer should be added if per-flow ECMP is required.
When the PSN layer uses TCP-like [I-D.davie-stt] encapsulation,
sequencing and fragmentation could be provided by the IP layer, and
the sequencing and fragmentation capability in the tenant header
could then be omitted. The entropy value could be carried in the TCP
source port, and the flow entropy value in the encapsulation layer
could then be omitted.
When the PSN layer uses an MPLS tunnel, the sequencing and
fragmentation in the tenant header would be applied if required. The
entropy value could be carried in the MPLS flow label, and the flow
entropy value in the encapsulation layer could then be omitted.
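The per-PSN rules in this section can be summarized as a small lookup.
The sketch below is one reading of the text above, not a normative
table: the names Psn, ENTROPY_CARRIER, and encapsulation_fields are
hypothetical, and the treatment of sequencing for UDP and GRE (applied
in the encapsulation layer when required) is an assumption, since the
text states it explicitly only for the TCP-like and MPLS cases.

```python
from enum import Enum
from typing import Dict, Optional, Set


class Psn(Enum):
    UDP = "udp"
    GRE = "gre"
    TCP_LIKE = "tcp-like"
    MPLS = "mpls"


# Where the PSN layer itself can carry the flow entropy.
# None means the PSN offers no such field.
ENTROPY_CARRIER: Dict[Psn, Optional[str]] = {
    Psn.UDP: "UDP source port",
    Psn.GRE: None,
    Psn.TCP_LIKE: "TCP source port",
    Psn.MPLS: "MPLS flow label",
}


def encapsulation_fields(psn: Psn, need_ecmp: bool,
                         need_sequencing: bool) -> Set[str]:
    """Return the encapsulation-layer functions that cannot be omitted."""
    fields: Set[str] = set()
    if need_ecmp and ENTROPY_CARRIER[psn] is None:
        fields.add("flow_entropy")
    # With TCP-like encapsulation, sequencing and fragmentation are
    # provided below the tenant header and can be omitted from it.
    if need_sequencing and psn is not Psn.TCP_LIKE:
        fields.add("sequencing_and_fragmentation")
    return fields
```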
5.3. Tenant Network Identifier Layer
The tenant network identifier (TNI) could be an integer that
indicates the tenant to which each customer packet belongs. One
example is to use an explicit integer, like a VLAN ID. In a
datacenter, for example, an explicit tenant ID simplifies
interoperation in the inter-datacenter connection environment.
However the TNI has been allocated by the control plane, whether by
static configuration or by dynamic allocation, overlay networks with
different control planes can always interoperate by using the same
TNI. That would be particularly useful when interconnecting two
datacenters with different control planes: the operator only needs to
ensure that the same TNI is used (or that TNIs are translated) to
interoperate.
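The TNI translation option mentioned above can be sketched as a pair
of lookup tables at a gateway between two datacenters. The class name
TniTranslator and the rule that a TNI without a table entry is assumed
to be identical in both datacenters are assumptions for this sketch.

```python
from typing import Dict


class TniTranslator:
    """Gateway table mapping local TNIs to the TNIs of a peer datacenter."""

    def __init__(self, local_to_remote: Dict[int, int]) -> None:
        self._outbound = dict(local_to_remote)
        self._inbound = {remote: local
                         for local, remote in local_to_remote.items()}
        if len(self._inbound) != len(self._outbound):
            raise ValueError("two local TNIs map to the same remote TNI")

    def outbound(self, local_tni: int) -> int:
        # A TNI with no entry is assumed to be the same on both sides.
        return self._outbound.get(local_tni, local_tni)

    def inbound(self, remote_tni: int) -> int:
        return self._inbound.get(remote_tni, remote_tni)
```

With an empty table the translator is the identity, which corresponds
to the case where the operator uses the same TNI in both datacenters.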
5.4. PSN Layer Encapsulation
The PSN layer required for PION could be any kind of PSN connection
that is capable of transmitting tenant packets. In general, two
kinds of PSN connection could be provided: IP and MPLS.
5.5. Associate tenant and PSN Layer
It is the NVE's responsibility to associate the tenant with the PSN
connection to a peer NVE, which could be done by configuration or in
another implementation-specific way. Different types of PSN
connection could be used between different NVEs within one tenant.
The NVE should have the capability to set up the specified PSN
connection if required. For example, if only IP connections are
required between NVEs, the NVEs need IP connection setup capability.
If one NVE requires a bandwidth-guaranteed connection to a peer NVE
located in another datacenter across a WAN, the NVE should set up a
hierarchical MPLS LSP as specified in Section 7.2, and specify the
bandwidth required.
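The association described in this section can be sketched as a table
in the NVE keyed by tenant and peer. The names Nve, PsnConnection, and
associate are hypothetical, and the rule used here (an MPLS connection
is chosen whenever a bandwidth is requested, IP otherwise) is an
assumption that simplifies the example in the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class PsnConnection:
    kind: str                             # "ip" or "mpls"
    bandwidth_mbps: Optional[int] = None  # set only for MPLS connections


class Nve:
    def __init__(self) -> None:
        # (tenant network identifier, peer NVE name) -> PSN connection
        self._connections: Dict[Tuple[int, str], PsnConnection] = {}

    def associate(self, tni: int, peer: str,
                  bandwidth_mbps: Optional[int] = None) -> PsnConnection:
        """Bind a tenant to a PSN connection toward one peer NVE."""
        if bandwidth_mbps is None:
            connection = PsnConnection("ip")
        else:
            connection = PsnConnection("mpls", bandwidth_mbps)
        self._connections[(tni, peer)] = connection
        return connection

    def lookup(self, tni: int, peer: str) -> PsnConnection:
        return self._connections[(tni, peer)]
```

Because the table is keyed by both tenant and peer, one tenant can use
an IP connection toward one NVE and an MPLS connection toward another.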
6. Network Architecture
One important application of the overlay network is to provide intra-
and inter-datacenter connections between end-points, or between an
end-point and a network; see the figure below.
/-----DC1-----\ /-------WAN-------\ /-----DC2-----\
+---+ | | | | +---+
|NVE|--\ | | | | /--|NVE|
| 1 | \ / | 3 |
+---+ \ +-------+ +-------+ / +---+
\--| Edge | | Edge |--/
/--|Router1|-----------|Router2|--\
/ +-------+ +-------+ \
+---+ / \ +---+
|NVE|--/ \--|NVE|
| 2 | | | | | | 4 |
+---+ | | | | +---+
\-----DC1-----/ \-------WAN-------/ \-----DC2-----/
Figure 2
There are two datacenters, DC1 and DC2, managed by the same
administrator, and the two datacenters are connected by a WAN, which
would be an IP/MPLS network. The intra-datacenter overlay network
connects end-points within a datacenter (e.g., between NVE1 and NVE2,
or NVE3 and NVE4), and should be scalable without being restricted by
the topology of the underlying datacenter network.
The inter-datacenter overlay network connects end-points in two
different datacenters (e.g., between NVE1 and NVE3 if NVE1 and NVE3
are the gateways). In this case, the overlay network crosses a wide
area network, where network resources are always limited. The
overlay network should have the capability to provide a QoS/BW
guarantee per tenant, even when crossing a WAN. Large network
providers have already deployed MPLS technology widely in the WAN,
and the MPLS network has the capability to provide traffic
engineering, QoS/BW guarantees, and higher reliability. The overlay
network should be able to benefit from the underlying network to
provide high-quality service.
7. Applicability of PION
7.1. PION over IP PSN
Most datacenters have IP transport capability, and the end-points can
reasonably be assumed to be IP reachable in most cases. The PSN
layer would be an IP layer with UDP encapsulation, GRE encapsulation,
or TCP-like [I-D.davie-stt] encapsulation. If security is required,
IPsec could be used as a PSN tunnel.
The IP connection is indexed by destination IP address, and all
tenants would share the same IP connection when sending to the same
destination. For a PSN tunnel per destination per tenant, see
Section 7.2.
7.2. PION over MPLS PSN
The MPLS LSP could be set up per destination per tenant, and provide
optimized traffic transmission for the overlay network, which would
greatly improve the service quality, especially when interconnecting
different datacenters. A hierarchical MPLS LSP could provide flexible
connections across different domains.
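The difference between the shared IP connection of Section 7.1 and
the per-destination, per-tenant MPLS LSP described above comes down to
the key under which a connection is indexed. The sketch below counts
the connections each model needs for a set of flows; the function
names and the representation of a flow as a (destination, TNI) pair
are assumptions for illustration.

```python
from typing import Iterable, Set, Tuple

# (destination NVE address, tenant network identifier)
Flow = Tuple[str, int]


def ip_connections(flows: Iterable[Flow]) -> Set[Tuple[str]]:
    """IP PSN: connections are indexed by destination only."""
    return {(dest,) for dest, _tni in flows}


def mpls_lsps(flows: Iterable[Flow]) -> Set[Flow]:
    """MPLS PSN: one LSP per destination per tenant."""
    return {(dest, tni) for dest, tni in flows}
```

For two tenants sending to the same destination, the first function
yields one shared connection, while the second yields two LSPs, each
of which could be given its own resources.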
PION over MPLS PSN does not require all the nodes in the network to
be MPLS enabled. A hierarchical MPLS LSP over IP would be used to
adapt to the IP environment of a datacenter. Most deployments would
only require the NVE and the edge router in the datacenter to be MPLS
capable. One typical deployment use case for an MPLS tunnel per
destination per tenant would be PION deployment across a WAN. See
the figure below.
/----DC1----\ /----WAN--------\ /----DC2---\
/ \ / \ / \
+---+ +-------+ +-------+ +---+
|NVE| | Edge | | Edge | |NVE|
| 1 |--------|Router1|-----------|Router2|--------| 2 |
+---+ +-------+ +-------+ +---+
|| | | ||
||<=============>|<=================>|<=============>||
| T1(IP) T2(MPLS) T3(IP) |
|<--------------------------------------------------->|
End to End MPLS Tunnel
Figure 3
The two interconnected datacenters would be connected across a WAN
that is MPLS enabled. The MPLS tunnel per destination address per
tenant ID provided for a "Gold" class tenant could have dedicated
network resources.
Assume that only the WAN, where congestion typically occurs, is
required to provide a bandwidth guarantee. When setting up a
connection from NVE1 to NVE2, where the two NVEs are gateways for the
specified tenant, there would be a hierarchical LSP from NVE1 to NVE2.
The underlying Tunnel3 (T3 in the figure) between Edge Router2 and
NVE2, and the underlying Tunnel1 (T1 in the figure) between Edge
Router1 and NVE1, could be IP connections within the datacenter
(e.g., GRE encapsulated). The underlying Tunnel2 (T2 in the figure)
between Edge Router1 and Edge Router2 could be an MPLS-TE tunnel,
which would provide a QoS/BW guarantee. The allocated MPLS label is
an inner label that associates the different underlying tunnels in
different domains. The inner MPLS label is switched only at the
underlying tunnel stitching points, e.g., Edge Router1 and Edge
Router2.
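The stitching behavior described above can be traced with a small
sketch in which the inner label is swapped only at the two edge
routers while the underlying tunnel changes from T1 to T2 to T3. The
label values, the table layout, and the names SWAP_TABLES and forward
are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical inner-label swap tables at the stitching points:
# incoming inner label -> (outgoing inner label, next underlying tunnel)
SWAP_TABLES: Dict[str, Dict[int, Tuple[int, str]]] = {
    "EdgeRouter1": {1001: (2001, "T2 (MPLS-TE)")},
    "EdgeRouter2": {2001: (3001, "T3 (IP)")},
}


def forward(inner_label: int) -> List[Tuple[str, int]]:
    """Trace (underlying tunnel, inner label) pairs from NVE1 to NVE2."""
    trace = [("T1 (IP)", inner_label)]
    for node in ("EdgeRouter1", "EdgeRouter2"):
        inner_label, tunnel = SWAP_TABLES[node][inner_label]
        trace.append((tunnel, inner_label))
    return trace
```

Nodes inside each underlying tunnel forward on the outer encapsulation
only and do not appear in the swap tables, which is why they need not
be MPLS capable in the datacenter segments.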
In the above case, only the NVEs and edge routers are required to be
MPLS capable, and the MPLS network in the WAN could be optimized to
provide high-quality service to the tenant. The edge router in the
above case is designed to be tenant-aware, so that tenant traffic can
be optimized in a standard IETF way.
8. Control Plane Consideration
There are three kinds of control plane functions for PION:
1. The first is between the end-point and the NVE, and is used to
signal the behavior requirements of the end-point to the NVE. One
option for implementing this control plane is to reuse VDP, defined
by IEEE.
2. The second is among NVEs, to synchronize the mapping between
end-points and PION connections among NVEs. Please refer to the
requirements in [I-D.kreeger-nvo3-overlay-cp] for more detail.
One option for implementing the second control plane above within one
datacenter is to use one centralized server; a standard interface
between the centralized server and the NVE should be defined. The
centralized server would collect all the mapping information from
each NVE through this standard interface. When an NVE receives a
packet for which it has no forwarding entry, it would request the
entry from the centralized server and install it with an appropriate
lifetime.
It is also possible to employ two or more centralized servers in one
datacenter. The different centralized servers should be able to
synchronize the mapping information, and a standard interface between
them should be defined.
When inter-connecting two datacenters, a standard interface between
the corresponding two centralized servers should also be defined; the
interface would be the same as the one used within a datacenter. All
the PION mapping information would be exchanged between the two
centralized servers through this standard interface.
3. An additional control plane function for PION is to set up the PSN
connection. The IP connection within an IP PSN is set up by normal
routing protocols, and the IETF-defined control plane could be reused.
The control plane for an MPLS connection per destination per tenant
would need to be defined; one possible way is to reuse MP-BGP or XMPP.
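The centralized-server option described for the second control plane
function can be sketched as a mapping server plus an on-demand cache
in each NVE. The interface between the two is not defined by this
draft; the class names, the method names, the (TNI, end-point) key,
and the default lifetime are assumptions for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[int, str]  # (tenant network identifier, end-point address)


class MappingServer:
    """Centralized server holding end-point to NVE mappings."""

    def __init__(self) -> None:
        self._table: Dict[Key, str] = {}

    def register(self, tni: int, endpoint: str, nve: str) -> None:
        self._table[(tni, endpoint)] = nve

    def resolve(self, tni: int, endpoint: str) -> Optional[str]:
        return self._table.get((tni, endpoint))


class NveCache:
    """Forwarding entries installed on demand with a lifetime."""

    def __init__(self, server: MappingServer, lifetime_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._server = server
        self._lifetime_s = lifetime_s
        self._clock = clock
        self._entries: Dict[Key, Tuple[str, float]] = {}

    def next_hop(self, tni: int, endpoint: str) -> Optional[str]:
        key = (tni, endpoint)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # entry is still within its lifetime
        # No usable entry: request it from the centralized server.
        nve = self._server.resolve(tni, endpoint)
        if nve is None:
            self._entries.pop(key, None)
            return None
        self._entries[key] = (nve, now + self._lifetime_s)
        return nve
```

The lifetime bounds how long an NVE keeps using a stale mapping after
an end-point moves, at the cost of more requests to the server when
the lifetime is short.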
9. Acknowledgments
The authors would like to thank Igor Gashinsky, David McDysan,
Patricia Thaler, Thomas Morin, Vishwas Manral for their review and
contributions.
10. Informative References
[I-D.davie-stt]
Davie, B. and J. Gross, "A Stateless Transport Tunneling
Protocol for Network Virtualization (STT)",
draft-davie-stt-01 (work in progress), March 2012.
[I-D.kreeger-nvo3-overlay-cp]
Black, D., Dutt, D., Kreeger, L., Sridharan, M., and T.
Narten, "Network Virtualization Overlay Control Protocol
Requirements", draft-kreeger-nvo3-overlay-cp-00 (work in
progress), January 2012.
[I-D.mahalingam-dutt-dcops-vxlan]
Sridhar, T., Bursell, M., Kreeger, L., Dutt, D., Wright,
C., Mahalingam, M., Duda, K., and P. Agarwal, "VXLAN: A
Framework for Overlaying Virtualized Layer 2 Networks over
Layer 3 Networks", draft-mahalingam-dutt-dcops-vxlan-01
(work in progress), February 2012.
[I-D.sridharan-virtualization-nvgre]
Sridharan, M., Duda, K., Ganga, I., Greenberg, A., Lin,
G., Pearson, M., Thaler, P., Tumuluri, C., and Y. Wang,
"NVGRE: Network Virtualization using Generic Routing
Encapsulation", draft-sridharan-virtualization-nvgre-00
(work in progress), September 2011.
[RFC6513] Rosen, E. and R. Aggarwal, "Multicast in MPLS/BGP IP
VPNs", RFC 6513, February 2012.
Authors' Addresses
Lizhong Jin
ZTE
889, Bibo Road
Shanghai, 201203, China
Email: lizhong.jin@zte.com.cn, lizho.jin@gmail.com
Bhumip Khasnabish
ZTE USA, Inc.
55 Madison Avenue, Suite 160
Morristown, NJ 07960 USA
Email: bhumip.khasnabish@zteusa.com, vumip1@gmail.com