TRILL Weiguo Hao
Yizhou Li
Donald Eastlake
Internet Draft Huawei
Radia Perlman
Intel Labs
Intended status: Standards Track February 14, 2014
Expires: August 2014
TRILL anycast Layer 3 Gateway
draft-hao-trill-anycast-gw-00.txt
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. This document may not be modified,
and derivative works of it may not be created, and it may not be
published except as an Internet-Draft.
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. This document may not be modified,
and derivative works of it may not be created, except to publish it
as an RFC and to translate it into languages other than English.
This document may contain material from IETF Documents or IETF
Contributions published or made publicly available before November
10, 2008. The person(s) controlling the copyright in some of this
material may not have granted the IETF Trust the right to allow
modifications of such material outside the IETF Standards Process.
Without obtaining an adequate license from the person(s) controlling
the copyright in such materials, this document may not be modified
outside the IETF Standards Process, and derivative works of it may
not be created outside the IETF Standards Process, except to format
it for publication as an RFC or to translate it into languages other
than English.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
Hao & Li,etc Expires August 14, 2014 [Page 1]
Internet-Draft TRILL anycast Layer 3 Gateway February 2014
at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
This Internet-Draft will expire on August 14, 2014.
Copyright Notice
Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided without
warranty as described in the Simplified BSD License.
Abstract
This draft describes a centralized anycast layer 3 gateway solution
for a TRILL campus. Compared to the traditional VRRP based
active-standby layer 3 gateway solution, this solution achieves
better load balancing and scalability. An anycast nickname, an
anycast gateway IP address, and an anycast MAC address are
introduced. They ensure that inter-subnet traffic is forwarded in
flow-based load balancing mode among all physical layer 3 gateways.
To avoid sending duplicate ARP replies to an end system, an ARP
master gateway election mechanism is introduced. The election
algorithm is described in this draft.
Table of Contents
1. Introduction
2. Conventions used in this document
3. VRRP based gateways
4. Anycast layer 3 gateway
4.1. ARP Handling
4.2. Data traffic forwarding
5. Node failure
6. Anycast MAC aging on edge node
7. TRILL protocol extension
7.1. The Anycast Gateway TLV
8. Security Considerations
9. IANA Considerations
10. Normative References
11. Informative References
12. Acknowledgments
1. Introduction
In a TRILL based multi-tenancy data center network (DCN), each
tenant normally owns one routing domain (RD), which may consist of
one or more IP subnets. It is common practice for one layer 2
virtual network (VN) to map to a unique IP subnet. A layer 2 virtual
network in a TRILL campus is identified by a 12-bit VLAN ID or a
24-bit Fine Grained Label [FGL].
All inter-subnet (inter-VN) communication needs to pass through an
L3 GW. Different subnets of one tenant are usually allowed to
communicate with each other freely. The gateway plays an important
role in both this west-to-east traffic and traditional
north-to-south traffic.
Figure 1 shows a typical data center network topology. Multiple core
switches serve as the layer 3 gateways. All the network nodes are
RBridges running the TRILL protocol. Gateway functions co-exist with
traditional RBridge functions at the GW switch. There are several
ways to organize the gateways. A traditional way is to use VRRP
based gateways, which is explained in Section 3. However, it has
scalability and efficiency issues. To avoid a single point of
failure and achieve better load balancing, an anycast gateway group
can be used. The key idea of an anycast gateway is to make multiple
physical gateways share the same gateway IP and MAC address for a
single virtual network (VN).
                              ,---------.
                            ,'           `.
                           (  IP/MPLS WAN  )
                            `.           ,'
                          *    -+------+'    *
                         *                    *
                        *                      *
                    ---------              ---------
                    |  GW1  |              |  GW2  |
                    |       | ************ |       |
                    ---------              ---------
                        *                      *
                        *                      *
                        *     TRILL Campus     *
                        *                      *
                        *                      *
   ---------          ---------          ---------          ---------
   | TOR1  | ******** | TOR2  | ******** | TOR3  | ******** | TOR4  |
   |       |          |       |          |       |          |       |
   ---------          ---------          ---------          ---------
    |    |             |    |             |    |             |    |
   ____ ____          ____ ____          ____ ____          ____ ____
   |T | |T |          |T | |T |          |T | |T |          |T | |T |
   |S1| |S2|          |S3| |S4|          |S5| |S6|          |S7| |S8|
   ---- ----          ---- ----          ---- ----          ---- ----
          Figure 1 Centralized layer 3 gateway in TRILL campus
For inter-subnet layer 3 traffic, a centralized layer 3 gateway is
normally used and placed at the boundary between the TRILL network
and the external IP network. In Figure 1 above, GW1 and GW2 are
integrated devices combining layer 3 gateway and TRILL RB functions.
The TRILL protocol runs on the TOR and GW devices. West-to-east IP
traffic among different VNs and north-to-south IP traffic between
the TRILL network and the external IP network both pass through the
layer 3 gateway. When the gateway receives unicast TRILL
encapsulated traffic from one layer 2 VN, it removes the TRILL
encapsulation header. If the destination MAC in the inner Ethernet
header is the gateway's MAC, the gateway removes the inner Ethernet
header and looks up its local IP forwarding table. If the
destination IP belongs to another VN in the TRILL campus, the
gateway encapsulates the frame in TRILL format and sends it to the
destination.
To eliminate the single point of gateway failure and to enhance
reliability, multiple layer 3 gateways are deployed. These gateways
can work in active-standby mode or active-active mode. In
active-standby mode, for each VN only one gateway acts as master and
is responsible for IP traffic forwarding between VNs. Network
bandwidth usage is inefficient with such a deployment. In a cloud
computing data center, it is estimated that about 70% of traffic is
east-west traffic, which requires non-blocking forwarding for
line-speed transmission between servers.
For inter-subnet layer 3 traffic, multiple centralized layer 3
gateways working in flow-based active-active mode enhance network
efficiency. This draft describes such an anycast layer 3 gateway
solution for a TRILL campus. An anycast nickname, an anycast gateway
IP address, and an anycast MAC address are introduced. The anycast
gateway IP and MAC address are set on each layer 3 gateway for each
VN to terminate Ethernet traffic. The anycast nickname is also
shared by multiple gateways. TRILL traffic with the anycast nickname
as egress nickname can go to any one of the gateways through the
ECMP support native to the TRILL protocol, so flow-based load
balancing among physical gateways is achieved. Compared to a
traditional VRRP based active-standby layer 3 gateway, the anycast
gateway achieves better load balancing and scalability.
This document is organized as follows: Section 3 describes the VRRP
based gateway solution and its disadvantages. Section 4 gives an
overview of the anycast gateway solution, with ARP handling in
Section 4.1 and data traffic forwarding in Section 4.2. Section 5
describes node failure. Section 6 describes anycast MAC aging on
edge nodes. Section 7 describes the TRILL protocol extension.
Familiarity with [RFC6325] is assumed in this document.
2. Conventions used in this document
ARP - Address Resolution Protocol.
ES - End Station.
VN - Virtual Network. In a TRILL network, each VN can be identified
by a 12-bit VLAN ID or a 24-bit Fine Grained Label.
3. VRRP based gateways
Assume that in Figure 1 above, GW1 and GW2 are centralized gateways
in active-standby mode. The TRILL protocol runs on the TOR and GW
devices. ES is end station. ES1, ES3, ES5 and ES7 belong to VLAN1.
ES2, ES4, ES6 and ES8 belong to VLAN2.
The Virtual Router Redundancy Protocol (VRRP) is designed to
eliminate the single point of gateway failure. VRRP is an election
protocol that dynamically assigns responsibility for a virtual
router to one of the VRRP routers on a layer 2 VN. Any of the
virtual router's IP addresses on a LAN can then be used as the
default first hop router by end-hosts. The layer 3 gateway acting as
VRRP master is responsible for forwarding packets destined to the
virtual router. If the VRRP master fails, a VRRP backup takes over.
The VRRP based solution has the following issues:
1. Inefficient network bandwidth usage. Only the VRRP master gateway
forwards the traffic. The VRRP backup is idle most of the time.
2. Low scalability. A VRRP session among physical layer 3 gateways
has to be established per layer 2 VN. A large number of layer 2 VNs
causes heavy CPU workload on each layer 3 gateway.
4. Anycast layer 3 gateway
Multiple gateways share the same IP and MAC address for each VN.
These addresses are called the anycast IP and anycast MAC address
respectively. The anycast IP is used as the default gateway IP
address for all end hosts in the corresponding VN. Gateways always
respond with the anycast MAC address when receiving an ARP request
for the anycast IP. As different VNs are allowed to have overlapping
MAC address space, different anycast IP addresses can map to the
same anycast MAC. That is to say, each VN should have a unique
anycast gateway IP, but multiple anycast gateway IPs may map to the
same anycast MAC. It is recommended to configure only one anycast
MAC for all VNs on each gateway device for simplicity. Each physical
gateway performs layer 2 Ethernet traffic termination when the inner
destination MAC of the incoming frame equals its anycast MAC.
To support layer 3 traffic load balancing among all gateways, an
anycast nickname is introduced in addition to each layer 3 gateway's
own nickname. Multiple gateways share the same anycast nickname.
Each gateway announces the anycast nickname to the TRILL network
through the Nickname Sub-TLV specified in [RFC6326] and MUST ignore
the nickname collision check defined in the basic TRILL protocol for
it. The anycast nickname used by the gateways should be set to the
highest priority. With this setting, if some other RBridge tries to
use the same nickname, the gateways always win the nickname
conflict.
Besides the anycast nickname/IP/MAC, each physical gateway also has
its own gateway IP and MAC for each VN and its own nickname.
The source MAC of an ARP reply responding to an ES's ARP request for
the anycast IP is always the anycast MAC. The ingress nickname
should be the anycast nickname when the ARP reply message is a
unicast TRILL frame. For a proactive ARP request from a gateway to
an ES, the source MAC is the gateway's own MAC. In this case the
ingress nickname in the TRILL header should be the gateway's own
nickname. Edge nodes, i.e. ToRs, learn the consistent correspondence
between anycast MAC and anycast nickname, and between each gateway's
physical MAC and nickname, through the normal data plane learning
mechanism.
An ES has no knowledge that the MAC address it gets for a gateway is
actually an anycast address. The ES operates in the normal way. It
acquires the correspondence between anycast MAC and anycast IP
through normal ARP procedures. When the ES sends traffic across
subnets, it sends the frame to the gateway first, using the anycast
MAC as destination MAC. Because the edge nodes, ToRs in this case,
have already learned the consistent correspondence between the
anycast MAC and the anycast nickname, a frame sent from the end host
to the gateway can go to any one of the gateways through the ECMP
support native to the TRILL protocol. The workload is spread over
all the core switches. When one gateway fails, the rest take over
the workload automatically without running any VRRP-like keepalive
protocol among them.
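The flow-based spreading described above can be pictured with a small
sketch. TRILL does not mandate a particular ECMP hash, so the hash
function, the Flow key, and the gateway names below are illustrative
assumptions and not part of this draft. The sketch also assumes a
simplified campus in which each equal-cost path toward the anycast
nickname ends at a different physical gateway.

```python
import hashlib
from typing import NamedTuple, Sequence


class Flow(NamedTuple):
    """Hypothetical flow key; real RBridges choose their own hash inputs."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int


def pick_gateway(flow: Flow, reachable_gateways: Sequence[str]) -> str:
    """Pick one equal-cost path toward the anycast nickname.

    Hashing the flow key keeps every frame of one flow on the same
    path, so a given flow always lands on the same physical gateway
    as long as the set of reachable gateways does not change.
    """
    if not reachable_gateways:
        raise ValueError("no reachable gateway for the anycast nickname")
    digest = hashlib.sha256(repr(tuple(flow)).encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(reachable_gateways)
    return reachable_gateways[index]


gateways = ["GW1", "GW2"]
flow = Flow("10.1.0.1", "10.2.0.4", 40000, 80, 6)
chosen = pick_gateway(flow, gateways)

# After GW1 fails and the campus reconverges, only GW2 is reachable,
# so the same flow is carried by GW2 without any keepalive protocol.
after_failure = pick_gateway(flow, ["GW2"])
```

The point of the sketch is only that the choice is per flow and needs
no state shared between gateways.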
Telnet to a physical gateway using the anycast IP address should not
be allowed. The packets of a single telnet session may go to
different physical gateways when the anycast gateway IP address is
used. Consequently the state machine at the telnet initiator side
may end up in unpredictable and disordered states. To overcome this,
it is recommended to use the gateway's own physical IP for telnet.
ARP tables age independently on each physical gateway. A physical
gateway should use its own MAC to send ARP request messages
proactively to all ESs belonging to a VN in order to acquire their
ARP entries. The source MAC of the ARP request message should be the
gateway's own MAC instead of the anycast MAC, so that the
destination ES uses the physical gateway's own MAC as destination
MAC of its ARP reply message. This ensures that the ARP reply from
the destination ES reaches that physical gateway. Inter-subnet
traffic from a gateway to an ES can use either the gateway's own
physical MAC or the anycast MAC as source MAC.
4.1. ARP Handling
Before an ES begins inter-subnet communication, it sends an ARP
request to ask for the MAC address of the gateway. As the ES uses
the anycast gateway IP as the target address, all physical layer 3
gateways could respond to it. To avoid sending duplicate ARP replies
to the end system, only one physical gateway should be elected to
respond. The physical gateway that responds to the ARP request
message is called the ARP master gateway. Assuming there are k
physical gateways, the algorithm to elect the ARP master gateway for
each VN is as follows:
1. All physical gateways are ordered and numbered from 0 to k-1 in
ascending order according to the 7-octet IS-IS ID.
2. For VN ID m, choose the RB whose number equals (m mod k) as ARP
master gateway.
The algorithm guarantees that each VN has a consistent ARP master
gateway. Only the ARP master gateway sends an ARP reply to an ES's
ARP request for that VN. The remaining gateways should ignore the
ARP request.
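The two election steps above can be sketched as follows. This is a
minimal illustration, assuming that "ascending order" means unsigned
byte-wise comparison of equal-length IS-IS IDs. The function name and
the example IDs are hypothetical.

```python
from typing import Sequence


def elect_arp_master(gateway_ids: Sequence[bytes], vn_id: int) -> bytes:
    """Return the IS-IS ID of the ARP master gateway for one VN."""
    if not gateway_ids:
        raise ValueError("no gateways in the anycast group")
    # Step 1: order the k gateways ascending and number them 0..k-1.
    ordered = sorted(set(gateway_ids))
    # Step 2: the gateway numbered (m mod k) is the ARP master.
    return ordered[vn_id % len(ordered)]


# Hypothetical 7-octet IS-IS IDs for two gateways.
gw1 = bytes.fromhex("00000000000100")
gw2 = bytes.fromhex("00000000000200")
group = [gw2, gw1]  # input order does not matter

master_vn10 = elect_arp_master(group, 10)  # 10 mod 2 = 0, lowest ID
master_vn11 = elect_arp_master(group, 11)  # 11 mod 2 = 1, next ID
```

Because every gateway sees the same set of IS-IS IDs in the link
state database, each one can run this computation locally and reach
the same result without exchanging extra messages.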
The Sender protocol address (SPA) and Sender hardware address (SHA)
in the ARP reply message are set to the anycast IP address and
anycast MAC address. The ARP reply message is unicast TRILL
encapsulated and sent to the ES. The ingress nickname should be the
anycast nickname. The egress nickname is set to the nickname of the
egress RB connecting to the ES.
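As a rough illustration of the field settings above, the following
sketch builds the reply from a received request. The class and
function names are hypothetical, and the sketch assumes that the edge
RB connecting to the ES is the one whose nickname appeared as ingress
nickname on the ARP request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArpRequest:
    sha: str               # sender hardware address (ES MAC)
    spa: str               # sender protocol address (ES IP)
    tpa: str               # target protocol address being resolved
    ingress_nickname: int  # edge RB that ingressed the request


@dataclass(frozen=True)
class TrillArpReply:
    ingress_nickname: int
    egress_nickname: int
    sha: str
    spa: str
    tha: str
    tpa: str


def build_arp_reply(req: ArpRequest, anycast_ip: str, anycast_mac: str,
                    anycast_nickname: int) -> Optional[TrillArpReply]:
    """Build the ARP master's reply, or None if the target is not ours."""
    if req.tpa != anycast_ip:
        return None
    return TrillArpReply(
        ingress_nickname=anycast_nickname,      # lets edge RBs learn the
        egress_nickname=req.ingress_nickname,   # anycast MAC/nickname pair
        sha=anycast_mac,
        spa=anycast_ip,
        tha=req.sha,
        tpa=req.spa,
    )


request = ArpRequest("00:00:5e:00:53:a1", "10.1.0.1", "10.1.0.254", 0x0001)
reply = build_arp_reply(request, "10.1.0.254", "00:00:5e:00:53:01", 0x1000)
```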
As the ES broadcasts the ARP request message to the TRILL campus,
all physical gateways can learn the correspondence <ES MAC, ES IP,
VN ID, Ingress Nickname> from the frame. Gateways can use this
information to generate the IP forwarding table entry for that ES.
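The learning step can be pictured as a small per-gateway table. The
sketch below is an assumption about one possible data layout, keyed
by VN ID and ES IP. It is not a structure defined by this draft, and
all names and values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class HostEntry:
    mac: str
    edge_nickname: int  # ingress nickname seen on the ARP request


class HostTable:
    """Per-gateway table of end stations, keyed by (VN ID, ES IP)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[int, str], HostEntry] = {}

    def learn_from_arp_request(self, vn_id: int, es_ip: str, es_mac: str,
                               ingress_nickname: int) -> None:
        # Every gateway receives the broadcast, so every gateway can
        # record <ES MAC, ES IP, VN ID, Ingress Nickname>.
        self._entries[(vn_id, es_ip)] = HostEntry(es_mac, ingress_nickname)

    def lookup(self, vn_id: int, es_ip: str) -> Optional[HostEntry]:
        return self._entries.get((vn_id, es_ip))


table = HostTable()
table.learn_from_arp_request(1, "10.1.0.1", "00:00:5e:00:53:a1", 0x0001)
entry = table.lookup(1, "10.1.0.1")
```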
In summary, through the above ARP process:
1. Edge RBs, i.e. TORs, learn the anycast MAC address associated
with the anycast nickname.
2. The ES learns the anycast MAC address associated with the anycast
gateway IP.
3. All physical gateways learn the <ES MAC, ES IP, connected edge RB
nickname> for all end systems.
ARP tables age independently on each layer 3 gateway. To avoid
unnecessary flooding due to ARP table aging, the layer 3 gateway
should periodically send ARP detection messages in proactive mode to
refresh the ARP table state. In this case, it is suggested that the
source MAC in the inner Ethernet header and the Sender hardware
address (SHA) in the ARP request message be the gateway's own MAC,
and that the ingress nickname be the gateway's own nickname when the
message is unicast TRILL encapsulated. When the ES receives the ARP
request message, it returns a unicast ARP reply message whose
destination MAC is the layer 3 gateway's own MAC. When the edge RB
connecting the ES receives the ARP reply message, it forwards the
packet to the layer 3 gateway that sent the ARP request, so the
reply reaches only that gateway.
4.2. Data traffic forwarding
After an ES acquires the anycast MAC associated with the anycast IP
through the above ARP handling process, it can start to send
inter-subnet IP traffic. Assume ES1 sends data to ES4 in Figure 1.
They belong to different subnets. The IP traffic forwarding process
is as follows:
1. ES1 sends unicast IP traffic to ES4. The destination IP is ES4's
IP address and the destination MAC is the anycast gateway MAC.
2. TOR1 receives the frame from ES1. Because TOR1 has already
learned the anycast MAC address associated with the anycast nickname
through the above ARP process, it sends the packet with unicast
TRILL encapsulation, using the anycast nickname as egress nickname
in the TRILL header. The TRILL data frame reaches one of the
physical gateways through ECMP. Assume it reaches GW1.
3. GW1 receives the TRILL data frame from TOR1. It decapsulates the
frame to get the native packet. It looks up its local IP forwarding
table based on the destination IP and tries to forward the packet to
ES4. If an entry <ES4 MAC, ES4 IP, VLAN2, Nickname of TOR2> is
stored on GW1, GW1 encapsulates the frame based on this information
and sends it to the egress RB. The source MAC can be the gateway's
own MAC or the anycast MAC. If the gateway's own MAC is used as
source MAC, the ingress nickname of the TRILL frame should be GW1's
own nickname. If the anycast MAC is used, the ingress nickname
should be the anycast nickname. (If the entry is not available on
GW1, the gateway sends an ARP request message to ES4 proactively.)
4. TOR2 receives the TRILL data frame from GW1. It decapsulates the
frame and forwards the payload to ES4.
All layer 3 traffic is processed in flow-based load balancing mode
among all physical gateways. The anycast gateway achieves better
bandwidth utilization and scalability than a VRRP-like mechanism.
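Step 3 of the forwarding process is the part with the most branching,
so a minimal sketch of it may help. All addresses, nicknames, and
names below are made-up illustration values, and the host table is
reduced to a plain dictionary keyed by destination IP within one
routing domain.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical values for illustration only.
ANYCAST_MAC = "00:00:5e:00:53:01"
ANYCAST_NICKNAME = 0x1000
GW1_MAC = "00:00:5e:00:53:11"
GW1_NICKNAME = 0x0011


@dataclass(frozen=True)
class HostEntry:
    mac: str
    vn_id: int
    edge_nickname: int


@dataclass(frozen=True)
class TrillFrame:
    ingress_nickname: int
    egress_nickname: int
    vn_id: int
    src_mac: str
    dst_mac: str
    dst_ip: str


def forward_at_gw1(inner_dst_mac: str, dst_ip: str,
                   hosts: Dict[str, HostEntry],
                   use_anycast_source: bool = False) -> Optional[TrillFrame]:
    """Route a decapsulated packet at GW1 and re-encapsulate it."""
    if inner_dst_mac not in (ANYCAST_MAC, GW1_MAC):
        return None  # not addressed to the gateway, no layer 3 lookup
    entry = hosts.get(dst_ip)
    if entry is None:
        return None  # GW1 would send an ARP request for dst_ip instead
    # Source MAC and ingress nickname are always used as a matching
    # pair so that edge RBs keep learning a consistent correspondence.
    if use_anycast_source:
        src_mac, ingress = ANYCAST_MAC, ANYCAST_NICKNAME
    else:
        src_mac, ingress = GW1_MAC, GW1_NICKNAME
    return TrillFrame(ingress, entry.edge_nickname, entry.vn_id,
                      src_mac, entry.mac, dst_ip)


# Entry for ES4: VLAN2, attached to TOR2 (nickname 0x0002 assumed).
hosts = {"10.2.0.4": HostEntry("00:00:5e:00:53:a4", 2, 0x0002)}
frame = forward_at_gw1(ANYCAST_MAC, "10.2.0.4", hosts)
anycast_frame = forward_at_gw1(ANYCAST_MAC, "10.2.0.4", hosts, True)
```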
5. Node failure
When one of the layer 3 gateways fails, TRILL traffic to the anycast
nickname reaches only the remaining gateways once the network has
converged. The ARP master gateway is re-elected among the remaining
gateways. No VRRP-like protocol session among layer 3 gateways is
required to detect the node failure. Network convergence relies
purely on the TRILL protocol.
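To see what re-election means in practice, the sketch below recomputes
the ARP master for a few VNs before and after one gateway disappears,
using the (m mod k) rule from Section 4.1. The helper names, the
7-octet IDs, and the VN range are hypothetical.

```python
from typing import Dict, Iterable


def arp_masters(gateway_ids: Iterable[bytes],
                vn_ids: Iterable[int]) -> Dict[int, bytes]:
    """Map each VN ID to its ARP master using the (m mod k) rule."""
    ordered = sorted(set(gateway_ids))
    if not ordered:
        raise ValueError("anycast gateway group is empty")
    return {vn: ordered[vn % len(ordered)] for vn in vn_ids}


def isis_id(n: int) -> bytes:
    """Build a made-up 7-octet IS-IS ID for the example."""
    return n.to_bytes(7, "big")


g1, g2, g3 = isis_id(1), isis_id(2), isis_id(3)
vns = range(10, 16)

before = arp_masters([g1, g2, g3], vns)
after = arp_masters([g1, g3], vns)  # g2 has failed

# VNs whose ARP master changed as a result of the failure.
moved = [vn for vn in vns if before[vn] != after[vn]]
```

In this example VNs 14 and 15 change master even though their
previous master did not fail, because k changed from 3 to 2. This is
a property of the modulo rule and is harmless as long as all
remaining gateways recompute the election from the same converged
view.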
6. Anycast MAC aging on edge node
If the anycast MAC has aged out on an edge node, then when the edge
node receives inter-subnet traffic from a connected ES, it floods
the unicast traffic to the TRILL campus as unknown unicast traffic.
All physical gateways receive the traffic. Only one of the physical
gateways should forward it; all others should drop it to avoid
forwarding duplicate data to the destination ES. It is suggested
that the forwarding gateway be the same as the ARP master gateway.
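The drop-or-forward decision above can be sketched as a single
predicate. The sketch assumes that a gateway can tell a flooded frame
from a normal one because flooded unknown unicast arrives as a TRILL
multi-destination frame. The function and parameter names are
hypothetical.

```python
def should_route_frame(inner_dst_mac: str, anycast_mac: str,
                       multi_destination: bool,
                       is_arp_master_for_vn: bool) -> bool:
    """Decide whether this gateway routes a frame sent to the anycast MAC."""
    if inner_dst_mac != anycast_mac:
        return False  # not inter-subnet traffic for the anycast gateway
    if not multi_destination:
        # Normal case: unicast TRILL frame to the anycast nickname,
        # delivered to exactly one gateway by ECMP.
        return True
    # Flooded case: every gateway got a copy, only the master routes it.
    return is_arp_master_for_vn


ANYCAST_MAC = "00:00:5e:00:53:01"  # made-up value
master_forwards = should_route_frame(ANYCAST_MAC, ANYCAST_MAC, True, True)
others_drop = should_route_frame(ANYCAST_MAC, ANYCAST_MAC, True, False)
```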
7. TRILL protocol extension
All layer 3 gateways should announce the anycast gateway TLV
defined in Section 7.1 in their LSPs to the TRILL campus. A gateway
that receives the anycast gateway TLV from other RBs with the same
anycast GW nickname considers them members of the same anycast
gateway group. All the gateways should ensure that the anycast
nickname configuration is consistent. If a received anycast nickname
differs from the locally configured one, a configuration error has
occurred and a network warning or SNMP trap should be sent to the
network management system. The anycast nickname is also carried in
the Nickname Sub-TLV specified in [RFC6326]. Each gateway MUST
ignore the nickname collision check for the anycast nickname.
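The consistency check described above amounts to a comparison against
local configuration. The following sketch is one way to express it.
The enum, the function name, and the idea of returning a status
(rather than raising a trap directly) are assumptions made for the
example.

```python
from enum import Enum


class PeerStatus(Enum):
    SAME_GROUP = "same-group"
    CONFIG_ERROR = "config-error"


def check_peer_anycast_nickname(local_nickname: int,
                                received_nickname: int) -> PeerStatus:
    """Classify a peer gateway from the nickname in its anycast GW TLV."""
    if received_nickname == local_nickname:
        return PeerStatus.SAME_GROUP
    # Mismatch: the caller is expected to raise a warning or SNMP trap.
    return PeerStatus.CONFIG_ERROR


ok = check_peer_anycast_nickname(0x1000, 0x1000)
bad = check_peer_anycast_nickname(0x1000, 0x1001)
```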
7.1. The Anycast Gateway TLV
+-+-+-+-+-+-+-+-+
|Type= ANY-GW   |                  (1 byte)
+-+-+-+-+-+-+-+-+
|   Length      |                  (1 byte)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Anycast GW Nickname          |  (2 bytes)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
o Type: TLV Type, TBD.
o Length: indicates the length of the Anycast GW Nickname field; it
is a fixed value of 2.
o Anycast GW Nickname: the nickname shared by all the physical
gateways in the anycast gateway group. All inter-subnet traffic to
the anycast gateways MUST use this nickname as egress nickname in
the TRILL header.
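A minimal encoding and decoding sketch for this TLV follows. The Type
value is still TBD in this draft, so 0xF0 below is a placeholder
chosen only to make the example runnable. The sketch also assumes
that Length counts the 2 octets of the nickname value, in the usual
IS-IS TLV manner.

```python
import struct

ANY_GW_TLV_TYPE = 0xF0  # placeholder only: the real Type value is TBD
NICKNAME_LEN = 2        # assumed: Length covers the 2-byte nickname


def encode_anycast_gw_tlv(nickname: int) -> bytes:
    """Encode Type (1 byte), Length (1 byte), Nickname (2 bytes)."""
    if not 0 <= nickname <= 0xFFFF:
        raise ValueError("nickname must fit in 16 bits")
    return struct.pack("!BBH", ANY_GW_TLV_TYPE, NICKNAME_LEN, nickname)


def decode_anycast_gw_tlv(data: bytes) -> int:
    """Return the anycast GW nickname carried in the TLV."""
    if len(data) < 2:
        raise ValueError("truncated TLV header")
    tlv_type, length = struct.unpack_from("!BB", data)
    if tlv_type != ANY_GW_TLV_TYPE:
        raise ValueError("not an anycast gateway TLV")
    if length != NICKNAME_LEN or len(data) < 2 + length:
        raise ValueError("bad anycast gateway TLV length")
    (nickname,) = struct.unpack_from("!H", data, 2)
    return nickname


encoded = encode_anycast_gw_tlv(0x1234)
decoded = decode_anycast_gw_tlv(encoded)
```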
8. Security Considerations
The default priority of the anycast nickname should be set to the
highest value. If a nickname on a non-gateway RBridge collides with
the anycast nickname on the gateways, this minimizes the probability
that the anycast nickname has to be changed.
9. IANA Considerations
TBD
10. Normative References
[RFC6165] Banerjee, A. and D. Ward, "Extensions to IS-IS for
Layer-2 Systems", RFC 6165, April 2011.
[RFC6325] Perlman, R., et al., "Routing Bridges (RBridges): Base
Protocol Specification", RFC 6325, July 2011.
[RFC6326bis] Eastlake, D., Banerjee, A., Dutt, D., Perlman, R.,
and A. Ghanwani, "TRILL Use of IS-IS", draft-eastlake-isis-
rfc6326bis, work in progress.
11. Informative References
[RFC3768] Hinden, R., Ed., "Virtual Router Redundancy Protocol
(VRRP)", RFC 3768, April 2004.
12. Acknowledgments
The authors wish to acknowledge the important contributions of Zhang
Chengsong.
Authors' Addresses
Weiguo Hao
Huawei Technologies
101 Software Avenue,
Nanjing 210012
China
Phone: +86-25-56623144
Email: haoweiguo@huawei.com
Yizhou Li
Huawei Technologies
101 Software Avenue,
Nanjing 210012
China
Phone: +86-25-56625375
Email: liyizhou@huawei.com
Donald E. Eastlake
Huawei Technologies
155 Beaver Street
Milford, MA 01757 USA
Phone: +1-508-333-2270
EMail: d3e3e3@gmail.com
Radia Perlman
Intel Labs
2200 Mission College Blvd.
Santa Clara, CA 95054-1549 USA
Phone: +1-408-765-8080
EMail: Radia@alum.mit.edu