INTERNET-DRAFT                                               Ali Sajassi
Intended Status: Standards Track                             Samer Salam
                                                            Sami Boutros
                                                             Keyur Patel
                                                                   Cisco
Expires: January 17, 2013                                  July 16, 2012
E-VPN Ethernet Segment Route
draft-sajassi-l2vpn-evpn-segment-route-01.txt
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
Copyright and License Notice
Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Sajassi et al. Expires January 17, 2013 [Page 1]
INTERNET DRAFT draft-sajassi-l2vpn-evpn-segment-route July 16, 2012
Abstract
[E-VPN] defines a solution and architecture for BGP MPLS-based
Ethernet VPNs. This document describes procedures and additional BGP
route attributes that enhance the multi-homing capabilities of the
solution. The attributes are the DF Election Attribute and the
Inter-chassis Communication Attribute. This draft describes their
usage, advantages, and encoding.
Table of Contents
1 Introduction
1.1 Terminology
2 Motivation and Usage
2.1 Support of Multi-Chassis Ethernet Bundles
2.2 Avoiding Relearning of Subscriber/Session State
2.3 Preventing Transient Loops and Packet Duplication
3 BGP Encoding
3.1 DF Election Attribute
3.2 Inter-chassis Communication Attribute
4 DF Election with Paxos Algorithm
5 LACP State Synchronization
6 Subscriber/Session State Synchronization
7 Security Considerations
8 IANA Considerations
9 References
9.1 Normative References
9.2 Informative References
Author's Addresses
1 Introduction
[E-VPN] defines a solution and architecture for BGP MPLS-based
Ethernet L2VPN services with advanced multi-homing capabilities. In
this draft we define procedures and extensions that enhance the
multi-homing capabilities of the E-VPN solution in the following
areas:
- Preventing transient loops and packet duplication
- Support of multi-chassis Ethernet bundles
- Avoiding relearning of subscriber/session state
Two new BGP route attributes are defined: the DF Election attribute
and the Inter-chassis Communication attribute.
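Section 2 discusses the motivation and usage of the new attributes.
Section 3 describes the BGP encoding. Section 4 specifies the DF
election handshake, and Sections 5 and 6 describe LACP and
subscriber/session state synchronization.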
1.1 Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
2 Motivation and Usage
This section explains why the two new BGP attributes are defined and
describes their usage in E-VPN.
2.1 Support of Multi-Chassis Ethernet Bundles
When a CE is multi-homed to a set of PE nodes using the [802.1AX]
Link Aggregation Control Protocol (LACP), the PEs must act as if they
were a single LACP speaker for the Ethernet links to form a bundle,
and operate correctly as a Link Aggregation Group (LAG). To achieve
this, the PEs connected to the same multi-homed CE must synchronize
LACP configuration and operational data among them. The
synchronization is required for the following reasons:
- to determine if the links in the Ethernet bundle are to operate in
all-active or hot-standby resiliency mode
- to detect and handle CE mis-configuration when LACP Port Key is
configured on the PE
- to detect and handle mis-wiring between CE and PE when LACP Port
Key is configured on the PE
- to deterministically agree on which link(s) should join a bundle
based on port and system priorities, especially when the number of
links exceeds the aggregation capacity of the PEs, and the PE LACP
System Priority is higher than the CE's
- to detect and react to actor/partner churn where the LACP speakers
are not able to converge
Synchronization of LACP state between PEs is performed using the
Inter-chassis Communication attribute carried in the Ethernet Segment
route, as described in the 'LACP State Synchronization' section
below.
2.2 Avoiding Relearning of Subscriber/Session State
For certain applications, the PE builds and maintains per-subscriber
or per-session 'soft' state that is used either to optimize traffic
forwarding or to enforce security. Examples of such per-subscriber or
per-session state include:
- multicast state derived from IGMP or PIM snooping
- IP address to MAC address bindings gleaned from snooping ARP and/or
DHCP packets, and used to prevent address spoofing or masquerading
When a set of PE nodes provides multi-homed connectivity for an
Ethernet Segment, this 'soft' state is built on the active PE node
that forwards and snoops the relevant protocol packets. In case of a
link or node failure, the state must be reconstructed on the backup
PE (e.g. by waiting for the next IGMP query or ARP message or by
issuing unsolicited queries). This may cause traffic disruption and
affect the availability of the service. Alternatively, the state can
be synchronized among the PE nodes via BGP, which speeds up service
convergence after a failure.
Synchronization of subscriber/session state between PE nodes is
performed using the Inter-chassis Communication attribute carried in
the Ethernet Segment route, as described in the 'Subscriber/Session
State Synchronization' section below.
2.3 Preventing Transient Loops and Packet Duplication
During routing transients, different PEs may end up electing
different DFs for the same Ethernet Segment due to inconsistent views
of the network. If the Ethernet Segment is a multi-homed device, this
may lead to transient packet duplication. If the Ethernet Segment is
a multi-homed network, the presence of multiple DFs may lead to
transient forwarding loops in addition to potential packet
duplication.
To eliminate these issues, an optional handshake mechanism is defined
to ensure that the PE nodes connected to the same Ethernet Segment
share a common view of the access network topology. This handshake is
performed using the DF Election attribute carried in the Ethernet
Segment route, as discussed in Section 4, 'DF Election with Paxos
Algorithm'.
3 BGP Encoding
This section defines the encoding of the BGP attributes.
3.1 DF Election Attribute
+---------------------------------------+
| State (2 octets)                      |
+---------------------------------------+
| Sequence No. (4 octets)               |
+---------------------------------------+
| Local No. of links (2 octets)         |
+---------------------------------------+
| Total No. of links (2 octets)         |
+---------------------------------------+
| Flags (1 octet)                       |
+---------------------------------------+
| No. of IP addresses (1 octet)         |
+---------------------------------------+
| Ordered list of tuples:               |
| [IP address Length (1 octet),         |
|  IP Address (4 or 16 octets)]         |
+---------------------------------------+
The State field can take one of the following values:
   0x0000 Initializing
   0x0001 Proposal Pending
   0x0002 Promise Pending
   0x0003 Active
The Flags field is encoded as follows:
   Most significant 7 bits: reserved
   Least significant bit: Protecting flag
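The field layout above can be illustrated with a short Python sketch.
It is not part of the specification and rests on assumptions the draft
does not state: fields are in network byte order, and the IP address
Length is counted in octets (4 or 16). The class and constant names are
hypothetical.

```python
import ipaddress
import struct
from dataclasses import dataclass, field
from typing import List, Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]

# State codes listed for the State field.
STATE_INITIALIZING = 0x0000
STATE_PROPOSAL_PENDING = 0x0001
STATE_PROMISE_PENDING = 0x0002
STATE_ACTIVE = 0x0003

FLAG_PROTECTING = 0x01  # least significant bit of the Flags octet

# Fixed part: State(2) Seq(4) Local links(2) Total links(2) Flags(1) Count(1).
# "!" selects network byte order; this is an assumption, not stated in the draft.
_FIXED = struct.Struct("!HIHHBB")


@dataclass
class DFElectionAttr:
    state: int
    sequence: int
    local_links: int
    total_links: int
    protecting: bool = False
    addresses: List[IPAddress] = field(default_factory=list)

    def encode(self) -> bytes:
        flags = FLAG_PROTECTING if self.protecting else 0
        out = bytearray(_FIXED.pack(self.state, self.sequence,
                                    self.local_links, self.total_links,
                                    flags, len(self.addresses)))
        for addr in self.addresses:
            raw = addr.packed      # 4 octets for IPv4, 16 for IPv6
            out.append(len(raw))   # assumed: length is counted in octets
            out += raw
        return bytes(out)

    @classmethod
    def decode(cls, data: bytes) -> "DFElectionAttr":
        state, seq, local, total, flags, count = _FIXED.unpack_from(data, 0)
        offset = _FIXED.size
        addrs: List[IPAddress] = []
        for _ in range(count):
            length = data[offset]
            offset += 1
            if length not in (4, 16):
                raise ValueError(f"unexpected IP address length {length}")
            raw = bytes(data[offset:offset + length])
            if len(raw) != length:
                raise ValueError("truncated IP address")
            addrs.append(ipaddress.ip_address(raw))
            offset += length
        return cls(state, seq, local, total,
                   bool(flags & FLAG_PROTECTING), addrs)
```

Keeping the address list in the order supplied by the caller preserves
the "ordered list" semantics of the attribute; the sketch does not sort
or deduplicate it.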
3.2 Inter-chassis Communication Attribute
+---------------------------------------+
| Type (2 octets)                       |
+---------------------------------------+
| Length (1 or 2 octets)                |
+---------------------------------------+
| Opaque (var)                          |
+---------------------------------------+
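A minimal Python sketch of this Type/Length/Opaque layout follows. The
draft does not say how a receiver learns whether the Length field is 1
or 2 octets wide, so the sketch takes the width as a caller-supplied
parameter. It also assumes network byte order and that Length counts
only the Opaque octets. The function names are hypothetical.

```python
import struct
from typing import Tuple


def encode_icc(attr_type: int, opaque: bytes, length_octets: int = 2) -> bytes:
    """Pack Type (2 octets), Length (1 or 2 octets), Opaque (variable)."""
    if length_octets == 1:
        header = struct.pack("!HB", attr_type, len(opaque))
    elif length_octets == 2:
        header = struct.pack("!HH", attr_type, len(opaque))
    else:
        raise ValueError("Length field must be 1 or 2 octets wide")
    return header + opaque


def decode_icc(data: bytes, length_octets: int = 2) -> Tuple[int, bytes]:
    """Return (type, opaque); the Length width must be known in advance."""
    if length_octets not in (1, 2):
        raise ValueError("Length field must be 1 or 2 octets wide")
    fmt = "!HB" if length_octets == 1 else "!HH"
    attr_type, length = struct.unpack_from(fmt, data, 0)
    start = struct.calcsize(fmt)
    opaque = data[start:start + length]
    if len(opaque) != length:
        raise ValueError("truncated Opaque field")
    return attr_type, opaque
```

The Opaque payload is passed through untouched, which matches the
draft's position that applications own its encoding and decoding.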
4 DF Election with Paxos Algorithm
The procedures in this section guarantee that all PE nodes in a given
redundancy group agree on a unique DF for a given Ethernet Segment.
This eliminates the problem of transient forwarding loops and
transient packet duplicates described above. The procedures can be
broken down into the following steps:
1. When a PE discovers the ESI of the attached Ethernet Segment, it
advertises an Ethernet Segment route with the associated ES-Import
extended community attribute and with the 'Initializing' code in the
State field of the DF Election attribute.
2. The PE then starts a timer to allow the reception of Ethernet
Segment routes from other PE nodes in the same redundancy group.
3. When the timer expires, each PE builds an ordered list of the IP
addresses of all the PE nodes connected to the Ethernet Segment
(including itself), in increasing numeric value.
4. The first PE in the ordered list then elects itself as the Arbiter
Node (AN). It initiates the handshake by sending an Ethernet Segment
route with 'Proposal Pending' code in the State field of the DF
Election attribute.
5. When a PE node receives an Ethernet Segment route with the
'Proposal Pending' code, it takes one of the following options:
a. If the receiving PE ranks the transmitting PE's IP address as
the top entry in its local ordered list, it acknowledges the
handshake by responding with an Ethernet Segment route with the
'Promise Pending' code in the State field of the DF Election
attribute. This includes the scenario where the receiving PE
forfeits the AN role to another advertising PE with a numerically
lower IP address.
b. If the receiving PE does not rank the transmitting PE's IP
address as the top entry in its local ordered list, and the
receiving PE had advertised an Ethernet Segment route with the
'Initializing' code or with the 'Proposal Pending' code, then the
PE takes no further action.
6. When the AN receives 'Promise Pending' from all of the PE nodes in
the ordered list, it sends an updated Ethernet Segment route with the
'Active' code in the DF Election attribute.
7. When the other PE nodes in the redundancy group receive the
'Active' code from the AN, they respond with an updated Ethernet
Segment route with the 'Active' code in the DF Election attribute.
This concludes the handshake.
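The ordering and response rules in Steps 3 through 6 can be sketched
in Python as follows. This is an illustration under stated assumptions,
not a definitive implementation: the function names are hypothetical,
only IPv4 addresses are handled, and Step 6 is read as requiring a
'Promise Pending' from every PE in the list other than the AN itself.

```python
import ipaddress
from typing import Iterable, List, Optional

# State codes from the DF Election attribute (Section 3.1).
INITIALIZING, PROPOSAL_PENDING, PROMISE_PENDING, ACTIVE = 0, 1, 2, 3

Addr = ipaddress.IPv4Address  # assumption: IPv4-only redundancy group


def ordered_pe_list(addresses: Iterable[str]) -> List[Addr]:
    """Step 3: order the PE addresses by increasing numeric value."""
    return sorted({Addr(a) for a in addresses})


def arbiter(ordered: List[Addr]) -> Addr:
    """Step 4: the first PE in the ordered list is the Arbiter Node."""
    return ordered[0]


def on_proposal(ordered: List[Addr], sender: str) -> Optional[int]:
    """Step 5: return the state code to advertise, or None for no action."""
    if Addr(sender) == ordered[0]:
        return PROMISE_PENDING   # 5a: sender tops the local list
    return None                  # 5b: take no further action


def arbiter_may_go_active(ordered: List[Addr],
                          promised: Iterable[str]) -> bool:
    """Step 6: true once every non-AN PE has sent 'Promise Pending'."""
    others = set(ordered[1:])
    return others <= {Addr(p) for p in promised}
```

Comparing parsed addresses rather than strings matters here: sorted as
text, "10.0.0.2" precedes "9.0.0.1", whereas by numeric value 9.0.0.1
comes first.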
In the case where the DF election is performed at the granularity of
an Ethernet Segment, i.e. there is a single DF for all VLANs on the
segment, the Arbiter Node is effectively the Designated Forwarder for
the segment. All PE nodes start off in Step 1 with their ports that
connect to the segment blocked for multi-destination traffic from the
core. In Step 6, the PE confirmed as the AN (i.e. the DF) unblocks
its port towards the Ethernet Segment. DF election at
the granularity of (Ethernet Segment, VLAN) is discussed in the "VLAN
Carving" section below.
5 LACP State Synchronization
To support CE multi-homing with multi-chassis Ethernet bundles, the
PE nodes connected to a given CE should synchronize [802.1AX] LACP
state amongst each other. This includes the following LACP specific
configuration parameters:
- System Identifier (MAC Address): uniquely identifies a LACP
speaker.
- System Priority: determines which LACP speaker's port priorities
are used in the Selection logic.
- Aggregator Identifier: uniquely identifies a bundle within a LACP
speaker.
- Aggregator MAC Address: identifies the MAC address of the bundle.
- Aggregator Key: used to determine which ports can join an
Aggregator.
- Port Number: uniquely identifies an interface within a LACP
speaker.
- Port Key: determines the set of ports that can be bundled.
- Port Priority: determines a port's precedence level to join a
bundle in case the number of eligible ports exceeds the maximum
number of links allowed in a bundle.
The above information must be synchronized between the PE nodes
wishing to form a multi-chassis bundle with a given CE, in order for
the former to appear as a single LACP peer to that CE. This is required
for initial system bring-up and upon any configuration change.
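The role of Port Key and Port Priority described above can be sketched
as a simple selection function. This is a simplified illustration, not
the [802.1AX] Selection logic itself. It assumes that a numerically
lower Port Priority is preferred (the [802.1AX] convention), that Port
Number breaks ties, and that Port Numbers are unique across the PEs
acting as one LACP speaker. The Port type and its field names are
hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Port(NamedTuple):
    pe: str          # hypothetical PE name, for illustration only
    number: int      # Port Number
    priority: int    # Port Priority (assumed: lower value is preferred)
    key: int         # Port Key


def select_ports(ports: Iterable[Port], aggregator_key: int,
                 max_links: int) -> List[Port]:
    """Pick up to max_links ports whose key matches the Aggregator Key."""
    eligible = [p for p in ports if p.key == aggregator_key]
    # Deterministic order: priority first, then port number as tie-break.
    eligible.sort(key=lambda p: (p.priority, p.number))
    return eligible[:max_links]
```

Because every PE sorts the same synchronized data with the same rule,
each one arrives at the same set of selected links without further
negotiation.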
Furthermore, the PEs must also synchronize operational (run-time)
data, in order for the LACP Selection logic state-machines to
execute. This operational data includes the following LACP
operational parameters, on a per port basis:
- Partner System Identifier: CE's System MAC address.
- Partner System Priority: CE's LACP System Priority.
- Partner Port Number: CE's AC port number.
- Partner Port Priority: CE's AC Port Priority.
- Partner Key: CE's key for this AC.
- Partner State: CE's LACP State for the AC.
- Actor State: PE's LACP State for the AC.
- Port State: PE's AC port status.
The above state needs to be communicated between PEs forming a
multi-chassis bundle during LACP initial bring-up, upon any
configuration change and upon the occurrence of a failure.
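As an illustration, the configuration and operational parameters
listed in this section could be modeled as two records. This is a
sketch only: the draft does not define an encoding for this data, and
the class names, field names, and field types are assumptions. The
choice of which fields must match across PEs (SHARED_FIELDS) is
likewise an illustrative guess at what "a single LACP peer" requires.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LacpConfig:
    system_id: str          # System Identifier (MAC address)
    system_priority: int    # System Priority
    aggregator_id: int      # Aggregator Identifier
    aggregator_mac: str     # Aggregator MAC Address
    aggregator_key: int     # Aggregator Key
    port_number: int        # Port Number
    port_key: int           # Port Key
    port_priority: int      # Port Priority


@dataclass(frozen=True)
class LacpPortOperState:
    partner_system_id: str        # CE's System MAC address
    partner_system_priority: int  # CE's LACP System Priority
    partner_port_number: int      # CE's AC port number
    partner_port_priority: int    # CE's AC Port Priority
    partner_key: int              # CE's key for this AC
    partner_state: int            # CE's LACP State (assumed: one octet)
    actor_state: int              # PE's LACP State (assumed: one octet)
    port_up: bool                 # PE's AC port status


# Assumption: these fields must agree for the PEs to look like one speaker.
SHARED_FIELDS = ("system_id", "system_priority",
                 "aggregator_mac", "aggregator_key")


def config_mismatches(local: LacpConfig, remote: LacpConfig) -> List[str]:
    """Names of shared fields whose values differ between two PEs."""
    return [f for f in SHARED_FIELDS
            if getattr(local, f) != getattr(remote, f)]
```

A PE could run such a check on each Ethernet Segment route it receives
and flag a misconfiguration when the returned list is non-empty.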
It should be noted that the above configuration and operational state
is localized in scope and is only relevant to PEs within a given
Redundancy Group, i.e. those that connect to the same Ethernet Segment
over a given Ethernet bundle. Furthermore, the communication of state
changes, upon failures, must occur with minimal latency, in order to
minimize the switchover time and consequent service disruption.
Without synchronization of the above parameters, the system is
subject to the issues outlined in Section 2.1 above.
6 Subscriber/Session State Synchronization
Synchronization of subscriber/session state between PE nodes is
performed using the Inter-chassis Communication attribute carried in
the Ethernet Segment route. The various applications are responsible
for the encoding and decoding of the relevant data, and this is
outside the scope of this draft. BGP provides a reliable transport
service in this case.
7 Security Considerations
There are no additional security aspects beyond those of VPLS/H-VPLS
that need to be considered.
8 IANA Considerations
To be added in a later revision.
9 References
9.1 Normative References
[RFC2119] S. Bradner, "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
9.2 Informative References
[E-VPN] Aggarwal et al., "BGP MPLS Based Ethernet VPN",
draft-raggarwa-sajassi-l2vpn-evpn-02.txt, work in progress,
March 2011.
[EVPN-REQ] Sajassi et al., "Requirements for Ethernet VPN (E-VPN)",
draft-sajassi-raggarwa-l2vpn-evpn-req-00.txt, work in
progress, October 2010.
Author's Addresses
Ali Sajassi
Cisco
170 West Tasman Drive
San Jose, CA 95134, US
Email: sajassi@cisco.com
Samer Salam
Cisco
595 Burrard Street, Suite 2123
Vancouver, BC V7X 1J1, Canada
Email: ssalam@cisco.com
Sami Boutros
Cisco
170 West Tasman Drive
San Jose, CA 95134, US
Email: sboutros@cisco.com
Keyur Patel
Cisco
170 West Tasman Drive
San Jose, CA 95134, US
Email: keyupate@cisco.com