Internet DRAFT - draft-bryant-perlman-trill-pwe-encap
draft-bryant-perlman-trill-pwe-encap
Network Working Group S. Bryant
Internet-Draft Cisco Systems
Expires: April 19, 2006 R. Perlman
Sun Microsystems
A. Atlas
Google
D. Fedyk
Nortel Networks
October 16, 2005
TRILL using Pseudo-Wire Emulation (PWE) Encapsulation
draft-bryant-perlman-trill-pwe-encap-00
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on April 19, 2006.
Copyright Notice
Copyright (C) The Internet Society (2005).
Abstract
A new layer of encapsulation is required with RBridges. This layer
must contain at least a time-to-live and an RBridge identifier field.
This document proposes that the reuse of the encapsulation defined by
Bryant, et al. Expires April 19, 2006 [Page 1]
Internet-Draft draft-bryant-perlman-trill-pwe-encap-00 October 2005
PWE3 for encapsulation of Ethernet frames over an MPLS packet
switched network.
Table of Contents
1. Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Forwarding Considerations . . . . . . . . . . . . . . . . . . 4
2.1. Forwarding Table Population . . . . . . . . . . . . . . . 5
2.2. QoS Treatment . . . . . . . . . . . . . . . . . . . . . . 5
2.3. Load Balancing . . . . . . . . . . . . . . . . . . . . . . 6
2.4. Multicast and Broadcast Frames . . . . . . . . . . . . . . 6
3. Dynamic Assignment of 19-bit Nicknames . . . . . . . . . . . . 7
4. Security Considerations . . . . . . . . . . . . . . . . . . . 8
5. References . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 9
Intellectual Property and Copyright Statements . . . . . . . . . . 10
Bryant, et al. Expires April 19, 2006 [Page 2]
Internet-Draft draft-bryant-perlman-trill-pwe-encap-00 October 2005
1. Motivation
The TRILL encapsulation requires a TTL and an RBridge ID, which could
be the ingress or the egress depending upon the particular packet.
There are four encapsulation mechanism that TRILL could use:
a. It could design its own encapsulation from scratch.
b. It could use an Ethernet based encapsulation.
c. It could use an IP based encapsulation.
d. It could use an MPLS based encapsulation.
Adding, or removing an encapsulation, or forwarding a packet based on
an encapsulation is one of the most time critical operation in any
networking equipment, and usually requires hardware support. The use
of a new network encapsulation type is always problematic because new
hardware is usually required. This is expensive to design and
deploy, and frequently has a significant time and risk impact on the
market acceptance of a new network architecture. The use of a new,
TRILL specific, encapsulation should therefore, if possible, to be
avoided.
TRILL could opt to use an Ethernet based encapsulation. The nesting
of 802.x tags is a well understood technology and suitable hardware
is widely deployed. However the absence of a TTL field in the header
means that a controlled convergence technology needs to be used
[CCONV] to avoid the collateral damage caused by microlooping packets
during network convergence. Although convergence control
technologies are now available, they are not well understood by the
networking industry, and their use by TRILL may not be accepted by
the industry.
TRILL could use an IP encapsulation, but using an IP header for this
purpose has issues (see Section 5.5 in [RBRIDGE]). Such issues
include the encapsulation overhead, the complexity of providing L2
services within the L3 subnet, and the additional potential work for
fragmentation and reassembly.
The simplest existing encapsulation that meets the TRILL requirement
is that defined by PWE3 for the encapsulation of Ethernet frames over
an MPLS packet switched network [PWE3-ETHER]. The forwarding
functionality required by TRILL is very similar to that needed to
implement virtual private lan service (VPLS [VPLS]). Equipment
capable of encapsulating Ethernet packets for carriage over an MPLS
core is widely available, and the modifications necessary to support
TRILL would reside primarily in the control plane.
Bryant, et al. Expires April 19, 2006 [Page 3]
Internet-Draft draft-bryant-perlman-trill-pwe-encap-00 October 2005
The encapsulation described in [PWE3-ETHER] consists of an MPLS label
stack [RFC3032] plus an OPTIONAL four byte control word. At least
one MPLS label stack entry (LSE) will be present in the TRILL packet.
In addition to containing the label (delivery address), the LSE also
contains the TTL field required by TRILL, and a QoS field (exp bits)
that may also be of use.
The control word carries some information that prevents the packet
being mistaken for an IP packet in an MPLS network and incorrectly
being subjected to ECMP. This functionality is not required in a
TRILL network. The control word also contains a sequence number
which is used to prevent the out of order delivery of PWE3 Ethernet
payloads. If order preservation is required the control word MUST be
used, otherwise a TRILL implementation MAY omit the PWE3 control
word.
The use of the PWE3 Ethernet over MPLS encapsulation by TRILL would
facilitate the integration of TRILL and MPLS networking.
2. Forwarding Considerations
As described in Section 3, each RBridge can obtain two 19-bit
nicknames. The first nickname can be used for the RBridge when
unicast traffic is directed to it; it is the egress RBridge nickname.
The second nickname can be used for multicast and broadcast traffic
from the RBridge; it will be the ingress RBridge nickname.
An MPLS shim header contains a 20-bit label field. The same format
can be used for the TRILL shim header; the labels will be distributed
via the link-state protocol used between RBridges; those labels will
be unique within this RBridge network instance. The Ethertype will
indicate that it is a TRILL frame; this will be used to provide the
correct forwarding context for the label space. The bottom-most bit
of the label field can indicate whether the top 19 bits indicate a
unicast nickname or a multicast and broadcast nickname. The
forwarding behavior will differ based upon this.
In the unicast case, when an Ethernet frame is received without the
new TRILL ethertype, the ingress RBridge will lookup the egress
RBridge, as specified in [RBRIDGE], and obtain its egress RBridge
nickname. The ingress RBridge will also determine if the Ethernet
frame has a priority specified as in 802.1p and will extract that
3-bit priority field. Then the original Ethernet frame will be
encapsulated as follows:
Bryant, et al. Expires April 19, 2006 [Page 4]
Internet-Draft draft-bryant-perlman-trill-pwe-encap-00 October 2005
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Egress Nickname |0| Exp |S| TTL |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| Received Ethernet Frame ///
/// |
| |
+---------------------------------------------------------------+
Exp: Indicates Priority
S: Bottom of Stack, 1 bit
TTL: Time to Live, 8 bits
Figure 1: Unicast Encapsulation
Traditional bridges avoid misordering; it is an Ethernet invarient.
During a traditional network convergence using a link-state protocol,
it is possible for packets to be misordered. The PWE3 control word
can be used for this purpose with pseudo-wires (Section 3.7 in [PWE3-
ETHER]); such use might require too much hardware state due to the
desired load-balancing of flows.
This gives the encapsulated frame the same format as an Ethernet
pseudo-wire [PWE3-ETHER]. The forwarding path can be exactly the
same as that used for an Ethernet pseudo-wire.
2.1. Forwarding Table Population
When an RBridge X learns a new egress nickname A, on each interface,
the top 19 bits of the label are filled out with the new nickname and
the bottom bit (the unicast/other) is set to 0; an insegment for that
label is created (usually by adding an entry into the input label
mapping (ILM) table.) A corresponding outsegment is installed for
each interface that is on the shortest path tree from the RBridge X
to the RBridge indicated by A. That out-segment does a label swap
operation, where the label swapped to is the same constructed label.
The created in-segment is connected to the created out-segments with
load balancing specified; only one out-segment will be used for a
particular frame.
2.2. QoS Treatment
The encapsulation preserves the priority, if specified, of the frame
without requiring intermediate RBridges to examine the encapsulated
frame. The ingress RBridge extracts the priority from the 802.1p
Bryant, et al. Expires April 19, 2006 [Page 5]
Internet-Draft draft-bryant-perlman-trill-pwe-encap-00 October 2005
field and stores that in the EXP field of the shim header.
When an RBridge adds the outer Ethernet frame to an TRILL
encapsulated frame, the RBridge can specify an 802.1p field with a
priority equal to that stored in the EXP field of the shim header.
If the EXP field is 0, then no 802.1p field is necessary.
2.3. Load Balancing
Load balancing between multiple equal cost paths is a concern for
RBridges. To properly load balance TRILL encapsulated frames, an
RBridge should identify TRILL encapsulated frames and implement a
specific hashing algorithm for this ethertype. A specific Ethertype
would be used for TRILL frames, making them trivial to identify.
The load balancing that would be provided by current mechanisms is
not sufficient. Without the PWE3 control word, either the TRILL
encapsulated frame would appear as non-IP and would be load balanced
based on a hash of the label stack (known as LABEL ECMP [MPLS-ECMP])
or it would be mis-identified as IP and load balanced based on the
bits located where IP addresses would be if the encapsulated Ethernet
frame were an IP packet. The former case would provide no flow
diversity, since all TRILL encapsulated frames would have the same
label, corresponding to the same egress RBridge nickname. The latter
case could risk packet re-ordering. Current mechanisms seeing the
PWE3 control-word would use LABEL EMP and thus provide no flow
diversity.
2.4. Multicast and Broadcast Frames
For multicast/broadcast frames, the ingress RBridge nickname
indicates the spanning tree which should be used. As with the
unicast case, a label is formed of the nickname field and the
unicast/other field (label[19:1] = nickname[18:0] and label[0] = 1).
The treatment of the TTL field and the EXP fields are the same.
When an RBridge learns of a new ingress RBridge nickname, an ILM
entry corresponding to the label is created. An out-segment is
created for each interface that is in the SPT rooted at the ingress
RBridge. The in-segment is connected to the created out-segments
with multicasting specified; subject to filtering, each frame will be
sent out each out-segment. Except for the egress filtering, the
above forwarding behavior is already part of MPLS; it is used to
support point-to-multipoint MPLS LSPs.
Filtering may be applied based upon the frame and the outgoing
interface's membership. For instance, if a frame is being broadcast
along a VLAN and an interface is marked as not being connected to any
Bryant, et al. Expires April 19, 2006 [Page 6]
Internet-Draft draft-bryant-perlman-trill-pwe-encap-00 October 2005
bridges or RBridges with VLAN membership, then the frame need not be
sent out that interface. Similarly, if a frame is being multicasted,
the RBridge could decide to filter the frame if the interface is
explicitly known to not be part of the multicast tree.
3. Dynamic Assignment of 19-bit Nicknames
We assume each RBridge has a unique 6-byte system ID, which it uses
as its IS-IS ID. In order to use the compressed MPLS-like encoding
of the shim header, we need to create an identifier which is 19-bits.
This gives a space of half a million nicknames, large enough that
there will be enough nicknames. We do, however, need a method for
assigning nicknames to RBridges so that the nicknames are unique
within the RBridge domain.
We will assign a new type value to be carried in LSPs. The TLV will
carry the nickname the LSP source wishes to use. The TLV will be:
+------+--------+-----------------------+
| type | length | value=19 bit nickname |
+------+--------+-----------------------+
Figure 2: Nickname TLV
Each RBridge chooses its own nickname. However, each RBridge is also
responsible for ensuring that its nickname is unique. If R1 chooses
nickname x, and R1 discovers, through receipt of R2's LSP, that R2
has also chosen x, then the RBridge with the lower system ID keeps
the nickname, and the other one must choose a new nickname.
If two RBridge domains merge, then there might be a lot of nickname
collisions for a short time, but as soon as each side receives the
link state packets of the other, the RBridges that need to change
nicknames will quickly become aware of this, and choose new nicknames
that do not, to the best of their ability, collide with any existing
nicknames.
To minimize the probability of nickname collisions, each RBridge
chooses its nickname randomly from the set of assigned nicknames.
Alternatively, we could use some sort of hash algorithm (such as the
bottom 19 bits of the MD5 of the RBridge's system ID), to choose the
first nickname, and then if there is a collision, go to the next 19
bits of the MD5, and so on, until all 128 bits of the MD5 hash are
exhausted, in which case the RBridge hashes its own system ID again,
this time together with the constant "1".
There is no reason for all RBridges to use the same algorithm for
Bryant, et al. Expires April 19, 2006 [Page 7]
Internet-Draft draft-bryant-perlman-trill-pwe-encap-00 October 2005
choosing nicknames. Picking them at random, or using a hash, are an
attempt to avoid collisions when the network starts up, but that is
only an optimization. Even if all RBridges used the same algorithm,
say as a worst case, they all start with "1" and count up
sequentially until they find an uncontested nickname, the network
will eventually stabilize. And once it is stable, nicknames should
remain stable even as routers go up or down.
To minimize the probability of a new RBridge usurping a nickname
already in use, an RBridge should wait to acquire the link state
database from a neighbor before it announces its own nickname.
4. Security Considerations
The security implications of selecting this format have not yet been
considered.
5. References
[CCONV] Bryant, S. and M. Shand, "Applicability of Loop-free
Convergence", draft-bryant-shand-lf-conv-frmwk-00.txt
(work in progress), June 2005.
[MPLS-ECMP]
Swallow, G., Bryant, S., and L. Andersson, "Avoiding Equal
Cost Multipath Treatment in MPLS Networks",
draft-ietf-mpls-ecmp-bcp-01.txt (work in progress),
July 2005.
[PWE3-ETHER]
Martini, L., Rosen, E., and G. Heron, "Encapsulation
Methods for Transport of Ethernet Over MPLS Networks",
draft-ietf-pwe3-ethernet-encap-10.txt (work in progress),
June 2005.
[RBRIDGE] Perlman, R., Touch, J., and A. Yegin, "RBridges:
Transparent Routing", draft-perlman-rbridge-03.txt (work
in progress), May 2005.
[RFC3032] Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y.,
Farinacci, D., Li, T., and A. Conta, "MPLS Label Stack
Encoding", RFC 3032, January 2001.
[VPLS] Lasserre, M. and V. Kompella, "Virtual Private LAN
Services over MPLS", draft-ietf-l2vpn-ldp-07.txt (work in
progress), July 2005.
Bryant, et al. Expires April 19, 2006 [Page 8]
Internet-Draft draft-bryant-perlman-trill-pwe-encap-00 October 2005
Authors' Addresses
Stewart Bryant
Cisco Systems
250, Longwater, Green Park
Reading RG2 6GB
United Kingdom
Email: stbryant@cisco.com
Radia Perlman
Sun Microsystems
Email: Radia.Perlman@sun.com
Alia K. Atlas
Google
1600 Amphitheatre Parkway
Mountain View, CA 94043
USA
Email: akatlas@alum.mit.edu
Don Fedyk
Nortel Networks
600 Technology Park
Billerica, MA 01821
USA
Phone: +1 978 288 3041
Email: dwfedyk@nortelnetworks.com
Bryant, et al. Expires April 19, 2006 [Page 9]
Internet-Draft draft-bryant-perlman-trill-pwe-encap-00 October 2005
Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Disclaimer of Validity
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright Statement
Copyright (C) The Internet Society (2005). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.
Bryant, et al. Expires April 19, 2006 [Page 10]