Internet DRAFT - draft-gibanez-trill-abridge
TRILL WG Guillermo Ibanez
Internet Draft Alberto Garcia
Expires: Dec 2006 Arturo Azcorra
June 6, 2006
ABridges as RBridges: Transparent Routing with Simplified
Multiple Spanning Trees.
draft-gibanez-trill-abridge-01.txt
Status of this Memo
By submitting this Internet-Draft, each author represents
that any applicable patent or other IPR claims of which he or
she is aware have been or will be disclosed, and any of which
he or she becomes aware will be disclosed, in accordance with
Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet
Engineering Task Force (IETF), its areas, and its working
groups. Note that other groups may also distribute working
documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of
six months and may be updated, replaced, or obsoleted by
other documents at any time. It is inappropriate to use
Internet-Drafts as reference material or to cite them other
than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed
at
http://www.ietf.org/shadow.html
This Internet-Draft will expire on Dec 16, 2006.
Abstract
RBridges are link layer devices that use routing protocols as
a control plane but are not designed to scale up to large
campus networks. This document contains an alternative
proposal to link-state RBridges, named ABridges. ABridges
overcome the L2 network size restrictions of RBridges,
allowing their application to very large Ethernet campus
networks while maintaining zero configuration and high
performance, by assuming a topological restriction that is
enforced automatically. The proposal includes a two-layered
network architecture with two hierarchical, independent
spanning tree layers. Convergence is expected to be fast,
probably below two seconds.
G. ibanez Informational Expires May 2006 1
INTERNET DRAFT abridge June 6, 2006
ABridges use multiple simplified spanning trees rooted at
core edge bridges to achieve results comparable to RBridges
with lower computational complexity. Two implementation
variants of simplified multiple spanning trees are proposed:
The first one is a fundamental simplification of the standard
Multiple Spanning Tree protocol and the second one (still in
a very preliminary stage) consists of an N-multiple
simultaneous execution of the Rapid Spanning Tree protocol at
each RBridge.
An optional mechanism of ARP/ABridge servers/registrars (with
load splitting) is proposed to limit ARP traffic in large
scale Ethernet networks and to enhance scalability and
security. This mechanism can also be used for host-Designated
RBridge resolution as an alternative to the interchange of
Hosts Lists between RBridges.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as
described in RFC-2119 [1].
Table of Contents
1. Introduction
2. Terminology
3. Network Architecture
4. Protocols
4.1 RSTP Protocol
4.2 MSTP Protocol
4.3 Core Layer: AMSTP Protocol
4.4 AMSTP versus MSTP
4.5 Designated (and Root) ABridge
4.6 Forwarding Scenario
4.7 Learning End Node Location
4.8 Routing versus Learning Bridges Addresses
4.9 Header on 802 Links
4.10 Distributed ARP Query
4.11 ABridge Identities and Addresses
5. ARP/ABridge Server/Registrars
6. Issues
6.1 Per Ingress Spanning Tree
6.2 Symmetrical Path Problem
6.3 Traffic Aggregation at Root bridge
6.4 VLANs
6.5 Optimizing ARP/ND
7. Security Considerations
8. IANA Considerations
9. NRSTP
10. Conclusions
11. Acknowledgments
12. References
Author's Addresses
Intellectual Property Statement
Disclaimer of Validity
Copyright Statement
1. Introduction
Current IP-based campus networks use one address prefix per
link to support routing. This implies administration and
configuration of IP addresses. IP addresses are link-related,
so the IP address of an end node changes when its point of
attachment to the network changes.
Bridges do not require this kind of configuration because
they forward in the switched domain using flat layer 2
addresses. However, standard bridge protocols do not scale:
the spanning tree protocol enables only some selected links
in order to prevent loops, so network utilization is low.
Also, the routes along the spanning tree are not pair-wise
shortest paths, and temporary loops may produce packet
proliferation across the entire switched domain.
RBridges have been proposed as a hybrid of routers and
bridges, showing the advantages of routers while preserving
at the same time the zero configuration capability of
bridges.
However, RBridges currently do not fulfill an important
requirement: scaling to large Ethernet campus networks. The
importance of this requirement is growing with the increasing
size of campus networks and the foreseeable increase in
connected devices (displays, IP phones, cameras, 802.11 PDAs,
sensors, etc.). This lack of scalability derives from the use
of flat MAC addresses to perform routing: being
non-aggregatable, MAC addresses produce long tables in
RBridges when used in large campus networks. Another
potential weakness of RBridges is that, while exhibiting
unrestricted topological compatibility with standard bridges,
RBridges depend on the bridged links to communicate among
themselves and to perform the IS-IS Designated Router
election. This dependency increases their complexity and
makes the whole system vulnerable to inter-RBridge
communication problems. It also increases the overall
convergence time, because the spanning tree convergence time
adds up to the IS-IS DR election time.
This draft proposes an architecture for Ethernet campus
networks based on a new type of Ethernet hierarchical
switches for campus cores. The architecture is oriented to
provide high performance, minimal configuration, and
scalability in very large Ethernet campus networks. The
proposed network architecture consists of a high capacity
core composed of an arbitrary mesh of switches named
ABridges, and a number of access networks with standard
bridges connected to the core.
The document proposes an alternative implementation of
RBridges [10] (Routing Bridges), identified as AMSTP Bridges
(ABridges), that combine the advantages of bridges and
routers. Like bridges and RBridges, ABridges require zero
configuration and are transparent to IP nodes. Like routers
and RBridges, ABridges also forward on pair-wise shortest
paths.
We propose to use multiple L2 spanning trees between ABridges
to forward via shortest paths in the core of the campus
network. The AMSTP protocol is a simplification of the
standard MSTP protocol, oriented to zero configuration. The
core edge bridges provide backbone connectivity to lower
layer (Access Layer) networks. The active topology of the
Access Layer networks consists of standard spanning trees of
switches (RSTP/STP). Each Edge Switch acts as the root bridge
of two independent spanning trees: the spanning tree of its
lower layer Access network, and one spanning tree instance of
the core network. The architecture provides shortest paths in
most traffic situations for client-server traffic (for
servers located in a server farm) and adapts well to traffic
aggregation. Additional mechanisms can be designed to achieve
high network availability.
Due to the access port mode, ABridges are compatible with
current bridges as well as current IPv4 and IPv6 routers and
end nodes. They are as transparent to current IP routers as
bridges and RBridges are. Like routers, they terminate a
bridged spanning tree.
Packets in the Core of ABridges must be encapsulated such
that:
- Forwarding is performed in the Core across per egress
bridge tree instances, while maintaining the original L2
header so that end destination bridges can learn the
location of the source from the source address of packets.
- ABridges can learn the location of end nodes. They can
learn the location and layer 2 addresses of attached nodes
from the source address of data packets, as bridges and
RBridges do. However, very large campus networks with tens
of thousands of nodes may require more scalable and safer
solutions for locating end nodes. For this case, the use of
ARP/ABridge Server/Registrars is proposed.
Support of VLANs traditionally requires configuring the
bridges with the ports and links that belong to each VLAN.
To achieve true zero configuration, we recommend that bridges
do not separate per-VLAN traffic in the campus core, and do
not use a separate spanning tree for each broadcast domain.
In a campus without VLANs, this means a single spanning tree
would be used for delivery of packets with unknown or group
(multicast/broadcast) layer 2 destination addresses.
ABridges can suppress the broadcast/multicast for Neighbour
discovery by using ARP servers/registrars or, similarly to
RBridges, by conventional proxy ARP (IPv4) or proxy ND
(IPv6).
ABridges are fully compatible with current IPv4 and IPv6
routers and end nodes. They are as invisible to current IP
routers as bridges are, and they participate in two
hierarchically linked but separate bridged spanning trees.
2. Terminology
AMSTP: Alternative Multiple Spanning Tree Protocol
ABridge: An RBridge implemented as an AMSTP Bridge
Access network: subnetwork of standard bridges connected to
an ABridge.
ARP/ABridge Server/Registrar: Server that provides ARP
resolution and the ID of a destination host's Designated
ABridge (see DR).
Campus network: set of network elements (standard bridges and
ABridges) connected to one or more routers.
Core: set of ABridges directly interconnected through point
to point links.
Core port: The port of an ABridge connected to another
ABridge through a point to point link.
Access port: The port of an ABridge connected to a link that
has active standard bridges connected. It executes the
standard spanning tree protocol and provides connection to
the Access Network.
DR: Designated RBridge. In the context of an ABridge, it
means the Designated ABridge that coincides with the STP/RSTP
Root bridge of the Access network.
MSTP: Multiple Spanning Tree Algorithm and Protocol.
NRSTP: Variant implementation of AMSTP through execution of N
independent RSTP instances.
RBridge: Routing Bridge as defined by Radia Perlman and the
TRILL WG proposal.
RSTP: Rapid Spanning Tree Algorithm and Protocol
3. Network Architecture
Campus network designs are currently based on a layered
architecture (core, distribution and access layers) to obtain
network scalability and predictability. Segmentation of
networks is obtained using routers or devices called
multilayer switches, which segment the network into IP
segments or subnets.
A similar approach is proposed here, but with the network
segmentation performed at layer 2 instead of layer 3. The
proposed network architecture is shown in Figure 1. It uses a
two-layer hierarchical L2 network to achieve scalability to
large scale Ethernet networks. The upper layer acts as a
Core-Distribution Layer (Core) and the lower Layer acts as an
Access Layer. The core layer uses the AMSTP protocol for
interconnection between core ABridges while the Access Layer
uses the standard spanning tree protocol (RSTP or STP) to
connect hosts of the access network with other hosts via
their root bridge at the core (ABridge).
The ABridges constitute the core network and are
interconnected by dedicated point-to-point links. The
point-to-point link requirement derives from the need for
fast convergence of standard layer 2 spanning tree
algorithms, but such links are also required for high
performance and enhanced security (802.1X). Point-to-point
links are thus becoming a requirement for Ethernet networks,
at least at the core and distribution layers.
Other bridges connect to ABridges without requiring a
point-to-point connection, and form the Access Layer. The
Access Layer is segmented into multiple access networks. Each
access network is formed by the devices connected to a core
ABridge; it may have an arbitrary topology, but its active
topology uses the standard spanning tree as the basic
forwarding mechanism. More sophisticated protocols could make
better use of the infrastructure inside each access network,
but they are out of the scope of this proposal.
---------
| network|
/ ---------
/
A -----A
/ \ / \ Core layer
/ \ / \
A------A-----A
/ \ \
-------- / \ \-----------
|network| \ | network | Access Layer
-------- \ -----------
\ ---------
| network |
----------
A: ABridge
Figure 1. Campus network topology
ABridges must auto-configure each port to participate either
in the Core or in an Access network. A port that is not
connected by a point-to-point link to another ABridge
configures itself as an access port (an ingress and egress
point for traffic to and from the core); ports directly
connected to another ABridge act as core ports. To tell the
two cases apart, each port detects, from the type of BPDU
(STP, RSTP or AMSTP) received on its link upon
initialization, whether the device connected to the link is a
standard bridge or an ABridge. If the BPDUs received are
standard 802.1D BPDUs, the link is assigned to the Access
Network and the port is automatically configured in access
port mode. Any standard bridge connected to an ABridge is
thus automatically excluded from the core function.
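The port auto-configuration rule above can be sketched as a
small decision function. This is a minimal illustration under
stated assumptions, not part of the proposal: the enum names,
the value used to tag AMSTP BPDUs, and the treatment of MSTP
BPDUs and of silent links (no BPDUs at all) as access cases are
all hypothetical.

```python
from enum import Enum
from typing import Optional


class BpduKind(Enum):
    """Kind of BPDU first heard on a port's link.

    STP=0, RSTP=2 and MSTP=3 are the IEEE protocol version identifiers;
    the AMSTP value is a hypothetical placeholder, not an assigned value.
    """
    STP = 0
    RSTP = 2
    MSTP = 3
    AMSTP = 0x7F  # hypothetical


class PortMode(Enum):
    CORE = "core"
    ACCESS = "access"


def configure_port(heard: Optional[BpduKind], point_to_point: bool) -> PortMode:
    """Choose the port mode from the BPDU kind heard at initialization.

    Core only if the link is point-to-point and the neighbour speaks AMSTP;
    standard BPDUs, shared links, or silence (e.g. only end hosts attached)
    all give an access port running the standard spanning tree protocol.
    """
    if heard is BpduKind.AMSTP and point_to_point:
        return PortMode.CORE
    return PortMode.ACCESS


# Usage: a point-to-point link to another ABridge versus a legacy RSTP bridge.
print(configure_port(BpduKind.AMSTP, point_to_point=True).value)  # core
print(configure_port(BpduKind.RSTP, point_to_point=True).value)   # access
```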
Figure 1 shows an example of the proposed network topology. A
core of ABridges constitutes the campus backbone and
interconnects different area networks formed by standard
802.1D bridges.
4. Protocols
In this section the proposed protocols are described. The
Alternative Multiple Spanning Tree Protocol [7] is an
evolution of the standard MSTP [6] and RSTP [3] protocols. In
the following paragraphs, the RSTP and MSTP protocols are
first briefly introduced to provide the context required to
describe the AMSTP protocol. The differences between AMSTP
and MSTP are summarized after the description of AMSTP.
4.1 RSTP Protocol
A standard protocol for bridges is the Rapid Spanning Tree
Protocol, included in IEEE 802.1D [5]. It provides much faster
convergence than the previous standard protocol, STP [4]. To
achieve convergence in (typically) fractions of a second,
RSTP replaces the timer-based mechanism that STP uses to
ensure convergence of the algorithm by a locally controlled
proposal-agreement mechanism between adjacent switches, which
moves port states to forwarding in a controlled way. This
mechanism requires point-to-point links to operate without
loops. Other mechanisms are also used to ensure rapid
convergence.
4.2 MSTP Protocol
The Multiple Spanning Tree Protocol (IEEE 802.1Q) is based on
RSTP (IEEE 802.1D) and creates different tree instances that
are used by sets of VLANs according to the configuration of
the bridge. MSTP implements a set of multiple, independent
spanning tree instances (MSTIs) in a network region. Each
region is interconnected with other MST regions via a Common
Spanning Tree (CST). Inside a region, several VLANs can be
mapped to a single tree instance, and multiple tree instances
per region make it possible to improve link utilization. In
each region there is a tree instance, the IST, identified
with the number 0, that acts as the basic spanning tree. The
CIST, or total spanning tree, comprises the CST, which
connects all the regions, and the ISTs, which provide
connectivity inside each region. This allows the regions to
be managed separately: each region appears to the outside as
a single, separate "superbridge", i.e. the whole region
connects to the CST via one port of its Regional Root Bridge
and a number of designated ports, like a single bridge.
Therefore, topology changes outside a region do not affect
the topology inside it. MSTP allows more efficient network
infrastructure usage by assigning different spanning trees to
different sets of VLANs.
However, MSTP is complex to configure. Tree instances must be
planned and VLANs must be mapped to those tree instances, and
the configuration table must be verified to be exactly the
same on all bridges of the same region. Serious malfunctions
occur if the VLAN mappings of bridges in the same region
differ.
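To illustrate why the mapping must be identical, the sketch
below mimics how IEEE 802.1Q bridges decide region membership:
they compare a configuration name, a revision level and a digest
of the VLAN-to-instance table, so a single differing entry
places two bridges in different regions. The digest here is
SHA-256 over a 4096-entry table purely for illustration (the
standard uses a keyed HMAC-MD5), and the function names are
hypothetical.

```python
import hashlib
from typing import Dict


def config_digest(vlan_to_msti: Dict[int, int]) -> str:
    """Digest over the full 4096-entry VLAN-to-instance table.

    Illustrative stand-in: 802.1Q uses a keyed HMAC-MD5 here, not SHA-256.
    VLANs absent from the mapping default to instance 0 (the IST).
    """
    table = b"".join(vlan_to_msti.get(vid, 0).to_bytes(2, "big") for vid in range(4096))
    return hashlib.sha256(table).hexdigest()


def same_region(a: tuple, b: tuple) -> bool:
    """a and b are (name, revision, mapping) triples of two neighbouring bridges."""
    name_a, rev_a, map_a = a
    name_b, rev_b, map_b = b
    return (name_a, rev_a, config_digest(map_a)) == (name_b, rev_b, config_digest(map_b))


bridge1 = ("campus", 1, {10: 1, 20: 2})
bridge2 = ("campus", 1, {10: 1, 20: 1})  # one VLAN mapped differently
print(same_region(bridge1, bridge1))  # True
print(same_region(bridge1, bridge2))  # False: the bridges no longer share a region
```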
4.3 Core layer: AMSTP Protocol
In the proposed architecture, the AMSTP protocol works as the
Core Layer protocol, providing shortest path interconnection
between Access Networks and network segmentation that prevents
failures from spreading to the whole switched domain. The
AMSTP protocol was proposed previously [AMSTP] for
metropolitan Ethernet backbones, but it can be extended, with
some modifications, to campus networks as well. AMSTP is a
simplified multiple spanning tree protocol that uses one tree
instance rooted at each edge bridge of the core to forward
frames. A complete multi-tree is the set of all tree
instances, one rooted at every edge bridge, each of which
interconnects all bridges in the backbone. Only the ABridge
ports connected to other ABridges participate in the multiple
spanning tree protocol; the remaining ports participate in
standard spanning tree protocols such as RSTP or STP (IEEE
802.1D).
To describe the AMSTP protocol, we consider its two main
functionalities: building and maintaining the spanning trees
(control plane), and processing and forwarding frames in the
bridges (user plane).
4.3.1 Building the Trees
The process of tree building consists of two parts: building
the basic (standard) RSTP tree, and building the rest of the
instances, called Alternate Multiple Spanning Tree Instances
(AMSTI), until one tree instance per bridge has been built, as
shown in Figure 2. The main tree is built exactly as in RSTP.
Every bridge autonomously emits Bridge Protocol Data Units
(BPDUs) to neighbouring bridges every Hello Time
(configurable in the millisecond range). First, the bridge
with the lowest Bridge ID (the configured priority with the
MAC address appended) is elected as Root Bridge of the main
spanning tree. Every bridge receiving BPDUs from this bridge
adopts it as Root and propagates it in the BPDUs it emits.
These BPDUs contain the minimum path cost from the emitting
bridge to the elected Root Bridge. Every bridge attaches to
the spanning tree by selecting as its root port the port that
receives the "best" BPDU, i.e. the one announcing the minimum
path cost to the Root Bridge. Each bridge builds its own BPDU
from the BPDUs received from other bridges, selecting
"superior" BPDUs according to the standard STP criteria
(lower Bridge ID, lower path cost, lower port priority, lower
port ID), and transmits it via the main tree for the
continuous maintenance of the optimum main spanning tree.
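The BPDU comparison above can be sketched with tuples, which
Python compares element by element. The sketch follows the
standard IEEE 802.1D priority-vector order (root ID, root path
cost, sender bridge ID, sender port ID, receiving port ID),
slightly fuller than the criteria listed in the text; the class
and function names and the example costs are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

BridgeId = Tuple[int, int]  # (priority, MAC as integer): lower compares better


@dataclass(frozen=True)
class Bpdu:
    root_id: BridgeId     # Root Bridge announced by the sender
    root_path_cost: int   # sender's path cost to that root
    sender_id: BridgeId   # bridge that transmitted the BPDU
    sender_port: int      # sender's port ID (priority in the high bits)


def select_root_port(my_id: BridgeId, received: Dict[int, Bpdu],
                     port_cost: Dict[int, int]) -> Tuple[Optional[int], BridgeId, int]:
    """Return (root port, root ID, root path cost) for this bridge.

    Each received BPDU becomes a priority vector by adding the receiving
    port's path cost; the smallest tuple is the superior vector. If no
    received vector names a root better than this bridge's own ID, the
    bridge is itself the root and has no root port.
    """
    best = None
    for port, b in received.items():
        vector = (b.root_id, b.root_path_cost + port_cost[port],
                  b.sender_id, b.sender_port, port)
        if best is None or vector < best:
            best = vector
    if best is None or my_id <= best[0]:
        return None, my_id, 0
    return best[4], best[0], best[1]


root = (32768, 0x10)
rx = {
    1: Bpdu(root, 0, root, 0x8001),           # heard directly from the root
    2: Bpdu(root, 2, (32768, 0x20), 0x8001),  # via a bridge at cost 2 from the root
}
print(select_root_port((32768, 0x30), rx, {1: 4, 2: 4}))  # (1, (32768, 16), 4)
```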
A -----A
/ \ / \
/ \ / \
A------A-----A
A ----A A A A A R-----A A---R
/ \ / \ \ / \ \ / \
/ \ / \ \ / \ \ / \
R-----A----A A----R----A A---A----R A A A A---A A
Fig.2. A five node network and its five self-rooted AMSTP
Spanning Tree Instances (R: root bridge).
The other tree instances, one per bridge, are built as
follows. Each core bridge appends to the main tree BPDU the
information of all AMSTI tree instances in which the bridge
participates. The information appended per tree instance is
called an AM-Record and contains information similar to that
of a basic BPDU, used to build that tree instance. The key
difference from other spanning tree protocols is that there
is no bridge election: each ABridge claims itself as Root
Bridge of its own instance and equally accepts every other
ABridge as the Root of that bridge's own instance. A bridge is
accepted as root by the other bridges without negotiation
(except when a malfunction is detected). This self-rooted
tree instance is identified by the bridge ID of the edge
ABridge (its root). The rest of the process is analogous to
the building of the MSTI tree instances used by MSTP inside an
MST region [4]: the tree is built by selecting tree paths at
every bridge according to the same minimum path cost criteria
as MSTP, using port priority and port ID for tie breaking. A
flags octet, identical to the one used for building the basic
tree instance, is used by the bridges to communicate and
negotiate transitions of port states and roles per tree
instance.
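The following sketch computes the converged result of this
process, one self-rooted instance per bridge, for a
hypothetical five-bridge mesh with unit link costs (the bridge
names T1, T2, B1, B2 and B3 are invented). It is a centralized
Dijkstra stand-in for the distributed BPDU exchange, and it
breaks equal-cost ties on the lower neighbour name instead of
port priority and port ID. Following root ports in the instance
rooted at the egress bridge yields the unicast path described in
Section 4.3.2.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, int]]  # bridge -> {neighbour: link path cost}


def build_instance(graph: Graph, root: str) -> Dict[str, str]:
    """Converged AMSTI rooted at `root`, as {bridge: neighbour on its root port}.

    Each bridge ends up attached through the neighbour with the minimum
    path cost to the root; equal-cost ties go to the lower neighbour name.
    """
    parent: Dict[str, str] = {}
    done = set()
    heap: List[Tuple[int, str, str]] = [(0, root, root)]  # (cost, via, bridge)
    while heap:
        cost, via, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node != root:
            parent[node] = via
        for nbr, w in graph[node].items():
            if nbr not in done:
                heapq.heappush(heap, (cost + w, node, nbr))
    return parent


def unicast_path(multitree: Dict[str, Dict[str, str]], ingress: str, egress: str) -> List[str]:
    """Follow root ports of the instance rooted at the egress ABridge."""
    parent = multitree[egress]
    path = [ingress]
    while path[-1] != egress:
        path.append(parent[path[-1]])
    return path


# Hypothetical five-bridge mesh with unit link costs.
links = [("T1", "T2"), ("T1", "B1"), ("T1", "B2"), ("T2", "B2"),
         ("T2", "B3"), ("B1", "B2"), ("B2", "B3")]
graph: Graph = {}
for a, b in links:
    graph.setdefault(a, {})[b] = 1
    graph.setdefault(b, {})[a] = 1

multitree = {r: build_instance(graph, r) for r in graph}  # one instance per bridge
print(unicast_path(multitree, "B1", "B3"))  # ['B1', 'B2', 'B3']
```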
4.3.2 Frame processing in Core Switches
When processing a frame, a Core Switch (ABridge) may act as an
ingress, transit or egress ABridge.
As ingress ABridge, the switch encapsulates the frame with an
additional Layer 2 header containing its own MAC address as
source address and the MAC address of the egress ABridge as
destination address. The ingress ABridge forwards the
encapsulated frame through the branch of the spanning tree
instance rooted at the egress ABridge. This path is a
pair-wise shortest path, because each tree is built by
minimizing the path cost from its root to every other node.
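A minimal sketch of ingress encapsulation and the matching
egress decapsulation. The draft does not assign an EtherType to
AMSTP, so 0x88B5 (an IEEE 802 local experimental EtherType) is
used as a placeholder; the helper names are hypothetical and FCS
handling is omitted.

```python
import struct
from typing import Tuple

# Placeholder EtherType: 0x88B5 is the IEEE 802 "local experimental"
# value, used here only because the draft assigns no number to AMSTP.
AMSTP_ETHERTYPE = 0x88B5
OUTER = struct.Struct("!6s6sH")  # DA, SA, EtherType (network byte order)


def mac_bytes(mac: str) -> bytes:
    return bytes(int(octet, 16) for octet in mac.split(":"))


def encapsulate(frame: bytes, ingress_mac: str, egress_mac: str) -> bytes:
    """Prepend the outer core header: DA = egress ABridge, SA = ingress ABridge.

    The original frame, with its own Ethernet header, travels unchanged as
    payload, so egress bridges can still learn the end node's source address.
    """
    return OUTER.pack(mac_bytes(egress_mac), mac_bytes(ingress_mac), AMSTP_ETHERTYPE) + frame


def decapsulate(packet: bytes) -> Tuple[bytes, bytes, bytes]:
    """Strip the outer header at the egress ABridge; return (DA, SA, inner frame)."""
    da, sa, ethertype = OUTER.unpack_from(packet)
    if ethertype != AMSTP_ETHERTYPE:
        raise ValueError("not an AMSTP-encapsulated frame")
    return da, sa, packet[OUTER.size:]


inner = bytes.fromhex("020000000002020000000001") + b"\x08\x00" + b"payload"
pkt = encapsulate(inner, "02:00:00:00:0a:01", "02:00:00:00:0a:03")
print(pkt[:14].hex())  # 020000000a03020000000a0188b5
```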
Traffic forwarding in the core depends on the traffic type.
Unicast traffic to a known ABridge is forwarded through the
tree instance rooted at the egress ABridge: each bridge sends
the frame through its root port in that instance. Broadcast
and multicast traffic, and traffic to unknown destinations,
are forwarded via the tree instance rooted at the ingress
ABridge.
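The instance-selection rule can be expressed as one function.
This sketch assumes a hypothetical "location" table mapping known
end-node MAC addresses to their egress ABridge (filled, for
example, by learning or by the ARP/ABridge servers); group
addresses are recognised by the I/G bit of the destination MAC.

```python
from typing import Dict


def is_group_address(mac: str) -> bool:
    """True for broadcast/multicast MACs: I/G bit (LSB of first octet) set."""
    return bool(int(mac.split(":")[0], 16) & 0x01)


def tree_instance_for(ingress: str, dst_mac: str, location: Dict[str, str]) -> str:
    """Root of the AMSTI used for a frame entering the core at `ingress`.

    Known unicast -> instance rooted at the egress ABridge (sent via root
    ports); broadcast, multicast or unknown unicast -> instance rooted at
    the ingress ABridge (flooded down that tree).
    """
    if not is_group_address(dst_mac) and dst_mac in location:
        return location[dst_mac]
    return ingress


location = {"02:00:00:00:00:02": "A3"}  # hypothetical host MAC -> egress ABridge
print(tree_instance_for("A1", "02:00:00:00:00:02", location))  # A3
print(tree_instance_for("A1", "ff:ff:ff:ff:ff:ff", location))  # A1
```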
ABridges may learn from the received frames both the MAC
addresses of other ABridges and the MAC addresses of the
connected end nodes, by inspecting the outer and inner
Ethernet MAC addresses of the encapsulated frames. This
learning process is called double MAC learning and is
applicable only in networks with a moderate number of end
nodes, such as a backbone with routers connected to it [7].
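A sketch of double MAC learning as described: the outer source
address identifies the ingress ABridge and the inner source
address the end node behind it. Class and method names are
hypothetical. The host table grows with the number of end nodes,
which is why the text restricts this approach to networks with a
moderate number of them.

```python
from typing import Dict, Optional, Set


class DoubleMacLearning:
    """Tables filled from encapsulated frames seen by an ABridge (illustrative)."""

    def __init__(self) -> None:
        self.abridges: Set[str] = set()    # outer SAs: known ABridges
        self.host_at: Dict[str, str] = {}  # inner SA -> ABridge it sits behind

    def learn(self, outer_sa: str, inner_sa: str) -> None:
        self.abridges.add(outer_sa)        # the ingress ABridge exists
        self.host_at[inner_sa] = outer_sa  # end node located at that ABridge

    def egress_for(self, host_mac: str) -> Optional[str]:
        return self.host_at.get(host_mac)


table = DoubleMacLearning()
table.learn("02:00:00:00:0a:01", "02:00:00:00:00:01")
print(table.egress_for("02:00:00:00:00:01"))  # 02:00:00:00:0a:01
```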
The MAC learning process relies on frames broadcast over the
switched network. These broadcasts are commonly ARP packets
issued by end nodes for layer 2 destination address
resolution. In this process, the bridges learn the originating
MAC address at the receiving ports and the hosts add the
IP-MAC pair to their ARP tables. In networks with a high
number of end nodes, processing a high number of ARP requests
at every end node may impose a significant load on the end
nodes. A different mechanism is therefore needed to prevent
ARP packets from being broadcast/multicast in large Ethernet
campus networks.
The ports of switches that are not connected to AMSTP-capable
Core Switches do not run AMSTP, so they are kept out of the
core forwarding mechanism. For Core Switches running AMSTP to
interoperate with legacy switches running STP or RSTP, a
mechanism like the standard port migration protocol used by
MSTP, RSTP and STP is needed. In essence, if a port of an MSTP
switch receives BPDUs of protocol version 0 (STP), it emits
only STP BPDUs from then on. Recovery is not automatic: the
port does not emit MSTP BPDUs again until a configuration
command restarts the protocol migration process, forcing
renegotiation between neighbouring switches.
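The latching behaviour of port protocol migration can be
modelled with a single flag. This is a simplified sketch with
hypothetical names: a version-0 BPDU forces STP-only
transmission, and only an explicit restart (the management
action that IEEE 802.1D-2004 calls mcheck) returns the port to
its native BPDUs.

```python
class MigratingPort:
    """Simplified STP-compatibility latch for one port (illustrative)."""

    STP_VERSION = 0

    def __init__(self, native: str = "AMSTP") -> None:
        self.native = native
        self.send_stp_only = False

    def on_bpdu(self, protocol_version: int) -> None:
        # Hearing a legacy STP BPDU latches the port into STP mode; later
        # non-STP BPDUs do not clear the latch by themselves.
        if protocol_version == self.STP_VERSION:
            self.send_stp_only = True

    def restart_migration(self) -> None:
        # Configuration command: resume native BPDUs and renegotiate.
        self.send_stp_only = False

    def tx_format(self) -> str:
        return "STP" if self.send_stp_only else self.native


port = MigratingPort()
port.on_bpdu(0)
print(port.tx_format())  # STP
port.restart_migration()
print(port.tx_format())  # AMSTP
```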
4.3.3 AMSTP BPDU layout
AMSTP BPDUs have a structure that resembles that of MSTP BPDUs
[4]: both consist essentially of a basic BPDU followed by
several appended records. The AMSTP BPDU structure is shown in
Figure 3. The basic BPDU is used for negotiation of the basic
tree (instance 0) between switches. Each of the appended
AM-Records is used to negotiate a specific tree instance
(AMSTI).
As in MSTP, the BPDUs that carry the rapid spanning tree
information for instance 0 also carry the information of all
the other spanning tree instances, appended to the RSTP BPDU
as AM-Records. This reduces BPDU traffic and simplifies BPDU
processing at the switches.
--------------------------
! Basic RSTP BPDU !
! Tree instance 0 !
-------------------------- -------------------------
! [AMSTP header] ! /! AMSTI flags !
! ! / -------------------------
--------------------------/ ! Root bridge ID (edge)!
! Tree Instance 1 ! -------------------------
! Root 1 ! ! Root path cost !
! ! -------------------------
-------------------------- ! Dest. Port Address !
! Tree Instance 2 !\ ! of Root bridge !
! Root 2 !| -------------------------
! ! \ ! Port priority !
-------------------------- | -------------------------
........... \! Remaining hops !
-------------------------- -------------------------
! Tree Instance N !
! Root N !
--------------------------
Fig. 3. AMSTP BPDU layout
Every AM-Record includes a flags octet identical to the one
described for the RSTP tree. These flags are used to negotiate
all transitions of each tree instance between connected ports
of neighbouring switches.
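Fig. 3 names the AM-Record fields but not their sizes, so the
encoding below is a hypothetical layout chosen for illustration:
a 1-octet flags field as stated above, a 6-octet root bridge ID
(the AMSTP bridge ID in Table I is a 6-byte MAC), and assumed
sizes for the remaining fields. It packs and parses the
AM-Record area that follows the basic RSTP BPDU; the basic BPDU
itself is not modelled.

```python
import struct
from dataclasses import dataclass
from typing import List

# Hypothetical layout: flags (1) | root bridge ID (6) | root path cost (4) |
# dest. port address (6) | port priority (1) | remaining hops (1)
# -> 19 octets, no padding ("!" = network byte order).
AM_RECORD = struct.Struct("!B6sI6sBB")


@dataclass
class AmRecord:
    flags: int
    root_bridge_id: bytes
    root_path_cost: int
    dest_port_addr: bytes
    port_priority: int
    remaining_hops: int

    def pack(self) -> bytes:
        return AM_RECORD.pack(self.flags, self.root_bridge_id, self.root_path_cost,
                              self.dest_port_addr, self.port_priority, self.remaining_hops)

    @classmethod
    def unpack_all(cls, data: bytes) -> List["AmRecord"]:
        """Split the AM-Record area that follows the basic RSTP BPDU."""
        if len(data) % AM_RECORD.size:
            raise ValueError("truncated AM-Record")
        return [cls(*AM_RECORD.unpack_from(data, off))
                for off in range(0, len(data), AM_RECORD.size)]


r1 = AmRecord(0x3C, bytes.fromhex("020000000a01"), 20000, bytes.fromhex("020000000a02"), 128, 20)
r2 = AmRecord(0x3C, bytes.fromhex("020000000a03"), 40000, bytes.fromhex("020000000a02"), 128, 19)
area = r1.pack() + r2.pack()
print(len(area))  # 38
```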
Minimum configuration is an important requirement for Core
Switches. While multiple spanning tree algorithms enable much
better usage of the existing infrastructure, they are usually
complex to configure because a way to assign frames to tree
instances is needed. In the case of MSTP, this means that the
mapping of VLANs to tree instances (MSTIs) has to be
configured manually at each bridge, resulting in a complex and
error-prone process.
AMSTP uses self-rooted spanning tree instances instead of
VLAN-mapped trees, and all tree instances are created
automatically, so no tree configuration is needed. The
parameters to configure are those common to RSTP, such as the
selection of the Root Bridge and the configuration of the
Backup Bridges for the region and their priorities.
Multicast (L2 group address) traffic. Multicast traffic in
the campus core is forwarded via the same tree instances as
unicast traffic, over pair-wise shortest paths to the
destination ABridges. The difference from unicast traffic is
that the spanning tree used is the one rooted at the ingress
ABridge instead of the one rooted at the destination ABridge.
The multicast trees are therefore always shortest-path trees
from the ingress ABridge, without the construction of
additional tree instances. As with RBridges, ABridges may
treat multicast traffic as broadcast or may use current
techniques like IGMP snooping to limit flooding.
4.4 AMSTP versus MSTP
Table I below compares the main protocol differences between
MSTP and AMSTP. The first difference is the criterion used to
assign frames to a tree instance for processing; in other
words, how the bridge knows which spanning tree instance to
use to forward the frame. The second is the criterion used to
create a tree instance.
TABLE I
MSTP VS AMSTP - MAIN PROTOCOL DIFFERENCES
------------------------------------------------------------------
Protocol feature      MSTP                    AMSTP
------------------------------------------------------------------
Criteria for frame    VLAN tag on frame       Destination MAC of
assignment to a       (802.1Q)                frame (tree root)
tree instance
------------------------------------------------------------------
Tree instance         Configured: sets of     Automatic: one per
formation criteria    VLANs are mapped to     core bridge
                      every tree instance
------------------------------------------------------------------
Number of tree        Configured: 1 to 64     One per core
instances                                     bridge (*)
------------------------------------------------------------------
Root bridge           As RSTP (lower bridge   No election: every
election              ID, including bridge    bridge is the root
                      priority)               of its tree instance
------------------------------------------------------------------
Bridge ID             4 MSB bits priority,    6-byte MAC
                      12-bit VLAN ID
------------------------------------------------------------------
Single or multiple    Multiple                Single
MST regions
------------------------------------------------------------------
Main application      Interconnected VLAN-    Cores, backbones
environment           based regions
------------------------------------------------------------------
(*) An ABridge with no access ports (a transit ABridge rather
than an edge ABridge) does not create a self-rooted instance.
4.5 Designated (and Root) ABridge
As with RBridges, one ABridge on each link has special
duties: it acts as the Designated RBridge of that link. The
DR function combines very well with being the root bridge of
the spanning tree of that link. For ABridges to be elected
automatically as roots of their respective access networks,
it would suffice that the default bridge ID of ABridges be
lower than that of standard bridges (whose default priority
is midrange). In this way, an ABridge may become the root
bridge of any link. DR election and root bridge election are
one and the same operation, performed according to the
standard procedure [5]. DR election therefore does not depend
on any external mechanism, and the spanning tree convergence
time of a link is not added to an IS-IS DR election time, as
it is in the RBridge case. A separate DR election process is
avoided completely.
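A sketch of how a lower default bridge priority makes the
ABridge win the root (and hence DR) election on its access
network. The 32768 default is the IEEE 802.1D value; the
ABridge default of 16384 and the bridge names are hypothetical.

```python
from typing import List, Tuple

STANDARD_DEFAULT_PRIORITY = 32768  # IEEE 802.1D default bridge priority (midrange)
ABRIDGE_DEFAULT_PRIORITY = 16384   # hypothetical lower default for ABridges

Bridge = Tuple[int, int, str]  # (priority, MAC as integer, name)


def elect_root(bridges: List[Bridge]) -> str:
    """Lowest (priority, MAC) wins; on an access network that bridge is also the DR."""
    return min(bridges)[2]


access_net = [
    (STANDARD_DEFAULT_PRIORITY, 0x000000000001, "b1"),  # lowest MAC, default priority
    (STANDARD_DEFAULT_PRIORITY, 0x0000000000A0, "b2"),
    (ABRIDGE_DEFAULT_PRIORITY, 0x0000000000FF, "A1"),   # the ABridge
]
print(elect_root(access_net))  # A1
```

Without the lower priority, b1 would win on its lower MAC
address, and the DR would no longer coincide with the ABridge.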
4.6 Forwarding scenario
Now the basic forwarding scenario is described. Figure 4 shows
two hosts H1 and H2 connected at different access networks.
First the ARP and destination ABridge resolution are
described, and then the forwarding process.
4.6.1 ARP and ABridge Resolution
Using ARP servers is the optional mechanism proposed to limit
broadcast/multicast traffic. However, the standard ARP
mechanism must be kept to ensure that hosts that silently move
from one part of the campus to another can be located.
Besides ARP for host resolution, the servers may also be used
to resolve the destination ABridge. Each server stores a table
of tuples, each containing the IP address and L2 address of an
end node and the L2 address of its Designated Bridge (Root
ABridge). The tuples stored at a given server are those of the
IP addresses that yield the same hash value (a few bits wide),
so the hash of an end node's IP address identifies the server
responsible for it.
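The partitioning of tuples across servers can be sketched as
follows. The draft only requires a hash of a few bits shared by
all ABridges; the CRC-32 hash, the 2-bit width (four servers)
and the class name are assumptions.

```python
import ipaddress
import zlib
from typing import Dict, Optional, Tuple

Entry = Tuple[str, str]  # (end-node L2 address, Designated ABridge L2 address)


def server_index(ip: str, bits: int = 2) -> int:
    """Which of the 2**bits ARP/ABridge servers holds the tuple for `ip`.

    Hypothetical hash: low `bits` bits of CRC-32 over the packed address.
    Any function works as long as every ABridge uses the same one.
    """
    return zlib.crc32(ipaddress.ip_address(ip).packed) & ((1 << bits) - 1)


class ArpAbridgeServer:
    """One server's share of the (IP, host L2, Designated ABridge L2) tuples."""

    def __init__(self) -> None:
        self.tuples: Dict[str, Entry] = {}

    def store(self, ip: str, host_l2: str, abridge_l2: str) -> None:
        self.tuples[ip] = (host_l2, abridge_l2)

    def lookup(self, ip: str) -> Optional[Entry]:
        return self.tuples.get(ip)


servers = [ArpAbridgeServer() for _ in range(4)]  # 2 hash bits -> 4 servers
ip = "10.1.2.3"
servers[server_index(ip)].store(ip, "02:00:00:00:00:02", "02:00:00:00:0a:03")
print(servers[server_index(ip)].lookup(ip))
```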
The sequence for communication between H1 and H2 in Figure 4
is as follows.
Host H1 first sends a broadcast ARP packet to resolve host
H2's L2 address. The packet is distributed through the
spanning tree of the access network and arrives at the root
ABridge. The root ABridge detects the ARP packet, calculates
hash(IP destination address) and uses the result to identify
the server responsible for that IP address. The server looks
up H2's IP address, obtains H2's L2 address and the ID of the
(egress) ABridge of H2's access network, and sends the reply
in a packet to the ingress ABridge. The ingress ABridge
extracts this information and forwards a standard ARP reply
packet to host H1. Host H1 can then send packets to H2's L2
address. The ingress ABridge also registers the originating
host by sending a registration packet containing the ARP
packet to the corresponding ARP/ABridge server, obtained by
computing hash(IP origin).
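The resolution sequence can be sketched from the ingress ABridge's point of view. In this Python sketch the servers are modelled as in-memory tables indexed by hash value rather than remote hosts reached with encapsulated packets; the hash function, the use of two servers and the fallback to standard ARP for unknown targets are assumptions made for illustration.

```python
import ipaddress
import zlib
from typing import Dict, Optional, Tuple

HASH_BITS = 1  # assumption: two servers, selected by a 1-bit hash


def ip_hash(ip: str) -> int:
    # Hypothetical hash function; the draft does not specify one.
    return zlib.crc32(ipaddress.ip_address(ip).packed) & ((1 << HASH_BITS) - 1)


class Server:
    def __init__(self) -> None:
        # IP address -> (host L2 address, egress ABridge ID)
        self.table: Dict[str, Tuple[str, str]] = {}


class IngressABridge:
    def __init__(self, my_id: str, servers: Dict[int, Server]) -> None:
        self.my_id = my_id
        self.servers = servers            # hash value -> responsible server
        self.cache: Dict[str, str] = {}   # host L2 address -> egress ABridge ID

    def handle_arp_request(self, src_ip: str, src_mac: str, dst_ip: str) -> Optional[str]:
        """Return H2's L2 address for the standard ARP reply to H1, or None."""
        # Resolve the target at the server responsible for hash(IP destination).
        entry = self.servers[ip_hash(dst_ip)].table.get(dst_ip)
        # Register (or refresh) the originator at the server for hash(IP origin).
        self.servers[ip_hash(src_ip)].table[src_ip] = (src_mac, self.my_id)
        if entry is None:
            return None  # unknown target: fall back to the standard ARP broadcast
        dst_mac, egress = entry
        self.cache[dst_mac] = egress  # used later to encapsulate H1's frames
        return dst_mac
```

The cached pair (H2's L2 address, egress ABridge) is what the forwarding step relies on when H1's data frames arrive.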
b---b
b: standard bridge / Access Layer
A: ABridge b---b
Path: H1-b-b-b-A-A-A-b-b-b-H2 / .............
A A
\\ \\ Core layer
\\ \\
A======A=====A
/ \ \ ..........
/ \ \ Access Layer
H1---- b---b--b b b---b---b----H2
/ / / \ \
b-/ b---b b- b b-b b---b---b
Figure 4. End to End forwarding scenario
Note: If the destination host is connected to the same access
network, it replies directly by emitting an ARP reply packet.
Note: The ABridge registers a host at the corresponding ARP
server/registrar whenever it detects a frame from an unknown
host connected to its access network.
4.6.2 Forwarding
The frame forwarding process is as follows. The standard frame
sent by host H1 to H2 (IP(H2), L2(H2)) arrives at the root
bridge (ABridge) of the access network. Its Ethernet
destination address (DA) is that of the destination end node.
The root (Designated) ABridge looks up in its cache the ID of
the Designated ABridge of the destination end node; this cache
entry, the pair (L2 address, L2 address of egress ABridge),
was filled in just before from the ARP/ABridge server
response. The ABridge encapsulates the frame with a header of
the form (DA = egress ABridge, SA = ingress ABridge,
Ethertype = AMSTP). It then determines the applicable tree
instance from the destination ABridge and forwards the frame
through its root port for the tree instance rooted at the
destination ABridge. The frame arrives at a designated port of
the next ABridge, which inspects the outer destination MAC
address and forwards the frame via its own root port for the
corresponding tree instance. The frame is forwarded in this
way, via root ports, until the egress ABridge is reached. The
egress ABridge detects that it is the destination of the
frame, removes the encapsulation header and forwards the
original frame via the access port where the L2 host has been
learnt, or via all access ports if H2 is unknown. The frame
then travels from the egress bridge (root) to H2 along a
branch of the tree rooted at the egress bridge. Frame
forwarding in the access networks is performed in the standard
way, over the spanning tree set up by STP or RSTP. A packet
leaving an ABridge through an access port must look to
ordinary bridges like an ordinary layer 2 packet and must not
be encapsulated.
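The forwarding steps above can be traced on a small example. The Python sketch below assumes a hypothetical linear core A1-A2-A3, loosely modelled on Figure 4, and hard-codes each ABridge's root port per tree instance (identified by the neighbour it leads to), which in a real network would be computed by AMSTP. The string "AMSTP" stands in for the Ethertype that is still to be assigned.

```python
from __future__ import annotations

from dataclasses import dataclass

AMSTP = "AMSTP"  # stands in for the ABridge Ethertype, still to be assigned


@dataclass(frozen=True)
class Frame:
    da: str
    sa: str
    payload: object         # inner frame or user data
    ethertype: str = "IPv4"


# Hypothetical linear core A1 - A2 - A3. ROOT_PORT[bridge][instance] names the
# neighbour reached through the bridge's root port in the tree rooted at `instance`.
ROOT_PORT = {
    "A1": {"A2": "A2", "A3": "A2"},
    "A2": {"A1": "A1", "A3": "A3"},
    "A3": {"A1": "A2", "A2": "A2"},
}


def forward(ingress: str, egress: str, native: Frame) -> tuple[list[str], Frame]:
    """Encapsulate at the ingress, follow root ports, decapsulate at the egress."""
    outer = Frame(da=egress, sa=ingress, payload=native, ethertype=AMSTP)
    path, here = [ingress], ingress
    while here != outer.da:               # transit ABridges never rewrite the DA
        here = ROOT_PORT[here][outer.da]  # root port of the instance rooted at DA
        path.append(here)
    return path, outer.payload            # egress removes the outer header
```

The original frame returned at the egress is unchanged, matching the requirement that frames leaving through access ports are not encapsulated.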
Alternatively, an ABridge may learn the destination ABridge
by exchanging host lists. The forwarding behaviour of RBridges
is as follows: "When a DR R1 receives a native packet with
layer 2 address S and layer 2 destination address D, R1 looks
up the location of D. If D is claimed by egress RBridge R2,
then R1 encapsulates the packet, directing it towards R2".
ABridges may use the same behaviour, but in this case the
network might not scale to one hundred thousand end nodes,
because the Campus Transit Tables (CTT) would become too big.
In contrast to an RBridge, when an ABridge receives an
encapsulated packet, it forwards it based on the destination
ABridge address in the DA and does not replace the DA with a
"next-hop" address. The next hop is selected by forwarding the
frame via the root port of the tree instance rooted at the
destination ABridge. A packet in the core must look like an
Ethernet frame, but must be distinguishable by ABridges from
a native layer 2 packet. To accomplish this, a new layer 2
protocol type ("Ethertype") is used.
4.7 Learning End node Location
ABridges learn end node locations on access ports as standard
bridges do. ABridges learn the root bridge IDs of the multiple
core tree instances from the AMSTP BPDUs they receive.
Similarly to RBridges, the core (edge) ABridge, acting as root
bridge and Designated RBridge, may work in two modes:
- As a standard Designated RBridge that learns the L2
  addresses of attached end nodes, initiates a distributed
  ARP when an ARP query is received for an unknown
  destination, and answers ARP queries when the target node
  is known. This mechanism is an alternative to the use of
  ARP servers/registrars.
- Learning from data packets: the ABridge learns (layer 3,
  layer 2) address pairs, for the purpose of supporting proxy
  ARP/ND, by listening to ARP or ND replies.
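The second mode amounts to a small cache of address bindings filled by snooping. This Python sketch (class and method names are hypothetical) records the sender's (IP, L2) pair from each ARP reply and answers later requests locally when the target is known.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProxyArpCache:
    """(layer 3, layer 2) pairs learnt by listening to ARP replies."""
    pairs: dict[str, str] = field(default_factory=dict)  # IP address -> L2 address

    def on_arp_reply(self, sender_ip: str, sender_mac: str) -> None:
        # The sender fields of an ARP reply carry the binding being advertised.
        self.pairs[sender_ip] = sender_mac

    def answer(self, target_ip: str) -> Optional[str]:
        # Known target: the ABridge can answer the ARP request itself (proxy ARP).
        # Unknown target (None): the request must be resolved by other means.
        return self.pairs.get(target_ip)
```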
4.8 Routing in ABridges vs Learning Bridges Addresses
Some recent proposals, such as Shortest Path Bridging (SPB) as
proposed at the IEEE [12][13], also use multiple tree
instances rooted at edge bridges. However, SPB faces the
problem of asymmetrical spanning trees. This happens when the
path between A and B chosen by the tree rooted at A differs
from the path chosen by the tree rooted at B. The problem
occurs when there are ties in the path costs of tree
instances: in the instance rooted at node A the tie may be
resolved by choosing one path, while in the instance rooted
at node B it may be resolved by choosing a different path.
But the spanning trees must be symmetrical for address
learning to work correctly. Bridge B learns the address of A
at the port where frames from A arrive over the spanning tree
rooted at A. If B later forwards frames to A through that same
port, over the spanning tree rooted at B, the path may be
blocked because a different root port was elected at A in the
tree instance rooted at B.
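The tie-breaking effect can be reproduced on a small example. This Python sketch uses a hypothetical six-bridge ring with two equal-cost paths between A and B, builds a shortest-hop tree for each root, and breaks ties with a simplified form of the 802.1D rule that prefers the lowest designated bridge ID. The bridge IDs are chosen to expose the asymmetry.

```python
from __future__ import annotations

from collections import deque

# Hypothetical ring with two equal-cost A-B paths; lower bridge ID wins ties.
IDS = {"A": 10, "B": 20, "C": 1, "D": 2, "E": 4, "F": 3}
LINKS = [("A", "C"), ("C", "E"), ("E", "B"), ("A", "D"), ("D", "F"), ("F", "B")]


def neighbours(node: str) -> list[str]:
    return [b if a == node else a for a, b in LINKS if node in (a, b)]


def tree(root: str) -> dict[str, str]:
    """Shortest-hop tree rooted at `root`, as a node -> parent (root port) map."""
    dist, queue = {root: 0}, deque([root])
    while queue:
        n = queue.popleft()
        for m in neighbours(n):
            if m not in dist:
                dist[m] = dist[n] + 1
                queue.append(m)
    # Among equal-cost upstream neighbours, pick the lowest bridge ID.
    return {n: min((m for m in neighbours(n) if dist[m] == dist[n] - 1), key=IDS.get)
            for n in dist if n != root}


def path_to_root(t: dict[str, str], node: str) -> list[str]:
    path = [node]
    while path[-1] in t:
        path.append(t[path[-1]])
    return path
```

With these IDs, `path_to_root(tree("A"), "B")` gives B, F, D, A, while `path_to_root(tree("B"), "A")` gives A, C, E, B: the two directions use disjoint paths, so an address learnt at B's port towards F is unusable over the tree rooted at B.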
ABridges work differently because they do not learn addresses
in the core. ABridges only build spanning trees and assign
traffic to them according to the destination ABridge. AMSTP
always uses the root port to send frames towards the
destination bridge (in the instance rooted at the
destination), so the routing function of an ABridge is as
follows:
- The bridge ID of the destination ABridge corresponding to
  the destination end node is obtained from the ABridge
  server.
- The bridge ID of the destination is translated into the
  port MAC address of the destination ABridge, using an
  internal ABridge table.
- The frame is encapsulated with an external L2 header
  carrying the destination bridge ID.
- ABridges only forward a frame received at a designated port
  upstream, via the root port. The external L2 destination
  address can be the destination bridge ID itself. When the
  encapsulated frame arrives at the destination bridge, that
  bridge must recognize its own bridge ID in the DA, remove
  the L2 encapsulation and forward the frame downstream to the
  access network via the access port(s).
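The per-ABridge forwarding rule in the list above can be expressed as a single decision function. This is a Python sketch with hypothetical names (`CoreABridge`, `on_encapsulated`); ports are plain integers, and dropping a frame that arrives on the root port of the destination instance is this sketch's interpretation of the rule that frames are only forwarded when received at a designated port.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CoreABridge:
    bridge_id: str
    root_port: dict[str, int]   # tree instance (root ABridge ID) -> local root port
    access_ports: list[int]
    host_port: dict[str, int] = field(default_factory=dict)  # learnt on access ports

    def on_encapsulated(self, outer_da: str, inner_da: str, in_port: int) -> list[int]:
        """Return the output port(s) for an encapsulated frame (empty list = drop)."""
        if outer_da == self.bridge_id:
            # Egress: the outer header is removed and the original frame goes to
            # the access port where the host was learnt, or to all access ports.
            port = self.host_port.get(inner_da)
            return [port] if port is not None else list(self.access_ports)
        out = self.root_port[outer_da]
        # Transit: only frames received on a designated port move upstream,
        # out of the root port of the tree rooted at the destination ABridge.
        return [] if in_port == out else [out]
```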
4.9 Header on 802 Links
ABridges, like RBridges, must coexist with ordinary bridges.
The encapsulated L2 format must therefore be compatible with
the Ethernet format. No additional fields, such as a TTL, are
required if the fast convergence procedure of RSTP is used.
An encapsulated packet would look as follows:
+--------------+-----------------+-----+
| outer header | original packet | CRC |
+--------------+-----------------+-----+
Figure 5. Encapsulated packet
The outer header contains:
o L2 destination = destination (egress) ABridge
o L2 source = origin (ingress) ABridge
o protocol type = "to be assigned...ABridge encapsulated
packet" (AMSTP)
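The outer header of Figure 5 is an ordinary 14-byte Ethernet header, so building it needs nothing beyond standard frame packing. In this Python sketch the Ethertype value 0x88B5 is only a stand-in (it is the IEEE 802 Local Experimental Ethertype), since the real value is still to be assigned; the sketch also assumes that the original frame is carried without its own CRC and that the transmitting hardware appends a new CRC over the whole frame.

```python
import struct

ABRIDGE_ETHERTYPE = 0x88B5        # placeholder (IEEE Local Experimental); real value TBA
OUTER = struct.Struct("!6s6sH")   # DA, SA, protocol type: 14 bytes, network order


def mac_bytes(mac: str) -> bytes:
    """Convert "02:00:00:00:00:03" or "02-00-00-00-00-03" to 6 bytes."""
    return bytes.fromhex(mac.replace(":", "").replace("-", ""))


def encapsulate(original: bytes, egress: str, ingress: str) -> bytes:
    """Prepend the outer header: DA = egress ABridge, SA = ingress ABridge."""
    return OUTER.pack(mac_bytes(egress), mac_bytes(ingress), ABRIDGE_ETHERTYPE) + original


def decapsulate(frame: bytes) -> bytes:
    """Strip the outer header at the egress ABridge and return the original frame."""
    _da, _sa, ethertype = OUTER.unpack_from(frame)
    if ethertype != ABRIDGE_ETHERTYPE:
        raise ValueError("not an ABridge-encapsulated frame")
    return frame[OUTER.size:]
```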
4.10 Distributed ARP Query
ABridges may perform distributed ARP queries as RBridges do,
but for large campus networks the use of ARP/ABridge
servers/registrars is recommended to reduce multicast traffic
and processing load at end nodes.
4.11 ABridge identities and addresses.
Each ABridge needs a unique ID within the campus. The
simplest such ID is a unique 6-byte value, which is easily
obtained by using any of the EUI-48 addresses owned by that
ABridge.
A new Ethertype must be assigned to indicate an ABridge-
encapsulated packet.
A layer 2 multicast address is used as the "all ABridges"
destination address in distributed ARP queries and any other
intercommunication message.
An optional layer 2 multicast address is needed to address
"all ARP/ABridge servers" (if such servers are used), so that
they can inform each other of the available servers and the
hash value(s) each one supports.
The AMSTP protocol distributes BPDUs addressed to the local
multicast protocol address used by the spanning tree protocol
(Bridge Group Address 01-80-C2-00-00-00). Frames sent to this
address are forwarded neither by bridges nor by RBridges or
ABridges.
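The addressing rules above determine how an ABridge treats each received frame. The following Python sketch classifies frames by destination address and type field; as elsewhere, 0x88B5 is only a placeholder for the Ethertype that is still to be assigned, and the category names are hypothetical.

```python
BRIDGE_GROUP_ADDRESS = bytes.fromhex("0180C2000000")  # 01-80-C2-00-00-00
ABRIDGE_ETHERTYPE = 0x88B5  # placeholder for the Ethertype still to be assigned


def classify(frame: bytes) -> str:
    """Classify a received frame by destination address and type field."""
    if frame[0:6] == BRIDGE_GROUP_ADDRESS:
        return "bpdu"          # consumed by the spanning tree protocol, never forwarded
    if int.from_bytes(frame[12:14], "big") == ABRIDGE_ETHERTYPE:
        return "encapsulated"  # core frame: forward on the destination ABridge ID
    return "native"            # ordinary access-network frame
```

The group address is checked first because BPDUs carry an LLC length field rather than an Ethertype, so their type field never matches the ABridge value.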
5. ARP/ABridge Servers/Registrars
ABridges, like RBridges, may suppress broadcast/multicast for
neighbour discovery by performing proxy ARP (IPv4) or proxy ND
(IPv6). However, the mechanism proposed for large campus
networks to suppress broadcast/multicast for neighbour
discovery consists of ARP servers/registrars, at which end
nodes are registered by the Designated ABridge when it detects
their frames.
Although all ARP/ABridge servers might work in parallel, it
seems more efficient to distribute the load statistically
uniformly among the servers, assigning the IP addresses to be
resolved to the available servers with a hash-based
mechanism. The process is as follows. When a host issues an
ARP packet, the packet is forwarded up the spanning tree of
the access network to the root bridge (ABridge). The ABridge,
acting as Designated ABridge, hashes the destination IP
address and uses the hash result to obtain the ID of the
ARP/ABridge server in charge of that IP address. The ABridge
has previously learnt this server ID from announcement packets
sent by the ARP servers, each containing the server's IP
address, L2 address, server ID and the hash values it serves.
The ABridge encapsulates the ARP packet originated by the end
node with an additional L2 header whose destination address
is that of the corresponding server for ARP resolution.
The ABridge also prepares a registration packet with the
source IP address in order to register (or refresh) the host
originating the ARP at the corresponding ARP/ABridge server.
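The Designated ABridge's side of this process can be sketched as a map from hash values to announced servers. In this Python sketch the announcement fields follow the list above, while the 2-bit hash (CRC-32 of the address, truncated) and the class and method names are assumptions; real announcements would arrive as packets rather than method calls.

```python
from __future__ import annotations

import ipaddress
import zlib
from dataclasses import dataclass

HASH_BITS = 2  # assumption: 2-bit hash results, i.e. values 0..3


@dataclass(frozen=True)
class Announcement:
    """Contents of a server announcement, as listed in the text."""
    server_ip: str
    server_mac: str
    server_id: int
    hash_values: frozenset[int]


def ip_hash(ip: str) -> int:
    # Hypothetical hash: CRC-32 of the packed address, truncated to HASH_BITS.
    return zlib.crc32(ipaddress.ip_address(ip).packed) & ((1 << HASH_BITS) - 1)


class DesignatedABridge:
    def __init__(self) -> None:
        self.server_for_hash: dict[int, Announcement] = {}

    def on_announcement(self, ann: Announcement) -> None:
        for value in ann.hash_values:
            self.server_for_hash[value] = ann

    def resolution_server(self, dst_ip: str) -> str:
        """DA of the extra L2 header added around the host's ARP packet."""
        return self.server_for_hash[ip_hash(dst_ip)].server_mac

    def registration_server(self, src_ip: str) -> str:
        """DA of the registration (or refresh) packet for the ARP originator."""
        return self.server_for_hash[ip_hash(src_ip)].server_mac
```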
To avoid redundant load, ARP/ABridge servers must share the
load, each server being assigned a share of the values of
hash(IP destination). The total number of servers may be
dimensioned according to the length of the hash result used,
or by additional grouping of hash values. An additional
protocol between ARP/ABridge servers can be designed to
handle dynamic load splitting among the available
servers/registrars as they come into and out of service. A
server coming into service takes charge of a hash value
handed over by a running server. The new server performs the
new registrations and forwards unresolved requests to the
previous server. Once the expiration time of the first
registration performed at the new server is reached, the
handover process is complete, as no valid registrations
remain at the previous server.
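The handover procedure can be sketched for a single hash value. In this Python sketch the registration lifetime of 300 seconds and all names are assumptions. Completion is measured from the start of the handover rather than from the first registration at the new server, which yields the same guarantee provided the previous server accepts no new registrations once the handover starts.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

LIFETIME = 300.0  # assumption: registration lifetime in seconds


@dataclass
class Registrar:
    entries: dict[str, tuple[str, float]] = field(default_factory=dict)  # ip -> (data, expiry)

    def register(self, ip: str, data: str, now: float) -> None:
        self.entries[ip] = (data, now + LIFETIME)

    def lookup(self, ip: str, now: float) -> Optional[str]:
        hit = self.entries.get(ip)
        return hit[0] if hit is not None and hit[1] > now else None


@dataclass
class Handover:
    """A new server taking charge of one hash value from a running server."""
    new: Registrar
    previous: Registrar
    started: float

    def register(self, ip: str, data: str, now: float) -> None:
        self.new.register(ip, data, now)  # new registrations go to the new server

    def resolve(self, ip: str, now: float) -> Optional[str]:
        hit = self.new.lookup(ip, now)
        if hit is None and not self.complete(now):
            hit = self.previous.lookup(ip, now)  # unresolved: ask the previous server
        return hit

    def complete(self, now: float) -> bool:
        # After one lifetime, no valid registration can remain at the previous server.
        return now >= self.started + LIFETIME
```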
6. Issues
This section describes and comments on the identified
issues, whether they affect RBridges, ABridges or both.
6.1 Per Ingress Spanning Tree.
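A per-ingress multicast spanning tree is implemented by
default in ABridges, since broadcast and multicast frames are
sent over the tree instance rooted at the ingress ABridge.
Multicast paths therefore always traverse the minimum number
of hops. There is no issue here.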
6.2 Symmetrical Paths Problem.
Shortest Path Bridging (SPB) [12], the current proposal at
the IEEE for pair-wise shortest paths, depends on symmetrical
tree instances between bridge pairs for L2 address learning
to work properly. In case of a path cost tie during tree
instance calculation, different paths might be elected in
opposite directions. The proposal in [13] describes a change
to the MSTP protocol to prevent this, but at the cost of
increased convergence times.
ABridges are not subject to this problem because they forward
unicast traffic through one branch of the destination
ABridge's tree instance: each ABridge forwards a packet via
the port elected as root port in the tree instance rooted at
the destination ABridge. Unicast forwarding in the campus core
therefore always follows the path from designated port to
root port at each ABridge traversed until the destination is
reached. No address learning is used for filtering, as the
packet is always forwarded via a single port (the root port
of the ABridge for that instance).
6.3 Traffic Aggregation at Root.
A common argument against spanning trees is that traffic
accumulates near the root bridge, causing congestion. In
practice, traffic in campus networks is predominantly
client-server and is itself distributed in a tree form.
Moreover, current bridge designs and Ethernet technologies,
with their various speeds (100 Mbps, 1 Gbps, 10 Gbps), make
efficient switch designs possible (such as N*100 Mbps ports
with two 1 Gbps uplinks) that aggregate traffic efficiently.
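The uplink example above can be quantified with a simple oversubscription ratio. This worked calculation assumes that both 1 Gbps uplinks carry traffic at the same time (for instance, aggregated into a single logical link); if the spanning tree blocks one of them, only 1 Gbps is available and the ratio doubles.

```latex
\rho(N) = \frac{N \times 100\,\mathrm{Mbit/s}}{2 \times 1000\,\mathrm{Mbit/s}} = \frac{N}{20},
\qquad \rho(N) \le 1 \iff N \le 20
```

For example, a hypothetical switch with N = 24 access ports gives a ratio of 1.2, that is, 1.2:1 oversubscription (2.4:1 with a single active uplink).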
6.4 VLANs
VLAN usage in the campus core requires detailed configuration
of which ABridge port belongs to which VLAN.
ABridges may learn, as VLAN-aware bridges do, which port
belongs to which VLAN by inspecting incoming VLAN-tagged
frames. This may help simplify VLAN configuration in ABridges,
but it does not eliminate the need to configure VLANs in
campus networks: VLAN-tagged frames must be generated either
by manually configured bridges or by the hosts originating the
frames. In the latter case, a VLAN must be assigned to each
host through a dynamic VLAN server, which itself requires
configuration.
VLANs are used to separate broadcast domains. ABridges
broadcast frames when the destination is unknown, and the
ingress ABridge broadcasts over its own tree instance (the one
rooted at that ABridge). Limiting a broadcast to the ports
belonging to a VLAN requires filtering by VLAN. This means
either building separate tree instances for VLAN forwarding,
which increases complexity, or at least applying additional
filtering on the tree instance used for broadcast, based on
the VLAN tag inside the encapsulated frame.
The recommended default behaviour is that VLAN-tagged frames
are encapsulated in the same way as untagged frames and that
no VLAN-specific forwarding is performed in the ABridges.
6.5 Optimizing ARP/ND
Mechanisms proposed for RBridges for ARP/ND optimization
[10] are feasible in ABridges as well. However, if the
proposed ARP/ABridge servers are used for ARP and destination
ABridge resolution, these mechanisms become redundant.
7. Security Considerations [To be added]
As for RBridges, the objective of ABridges is to keep at
least the same security level as bridged networks, without
introducing additional risks.
However, the position of ABridges and their role as root
bridges, combined with the use of ARP servers/registrars,
provide efficient means to enhance network security, such as
easier localization of attackers and fast detection of
spoofed MAC addresses through successive, duplicated or
inconsistent registrations.
If IEEE 802.1X is used on the ports of links connecting
ABridges, security in the network core is greatly enhanced,
although 802.1X cannot prevent malicious behaviour by trusted,
authenticated ABridges.
However, authentication requires some additional
configuration, which partly contradicts the zero-configuration
objective of RBridges and ABridges.
8. IANA Considerations.
A new Ethertype must be assigned to indicate an ABridge-
encapsulated packet.
A layer 2 multicast address is used as the "all ABridges"
destination address in distributed ARP queries and any other
intercommunication message.
An optional layer 2 multicast address is needed to address
"all ARP/ABridge servers" (if used), so that they can inform
each other of the available servers and the hash value(s)
each one supports.
A new Ethertype is required for the AMSTP protocol.
If ARP/ABridge servers/registrars are used, an L2 group
multicast address is required.
9. NRSTP Protocol.
This concept is at an early stage and requires detailed
analysis; it is described only briefly here because of its
simplicity.
An alternative to implementing multiple simplified spanning
trees, as AMSTP does, is the simultaneous and independent
construction of N spanning trees (one per ABridge) by running
N fully independent executions of RSTP (single code, multiple
data) at each ABridge. Each ABridge executes the RSTP protocol
N times simultaneously to participate in N tree instances. In
one of the N executions, the ABridge claims itself as the
non-negotiable root bridge; in the other N-1 executions, it
joins the N-1 RSTP tree instances proposed by the other N-1
ABridges of the core. As for AMSTP, the destination ABridge's
tree instance is used to forward unicast frames, while the
originating ABridge's tree instance is used for broadcast and
multicast. The number of BPDUs is multiplied, but processing
and implementation may be simplified.
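The per-instance setup and tree selection of NRSTP can be sketched as follows. In this Python sketch the RSTP state machines themselves are not modelled; the priorities 0 (for the instance an ABridge owns) and 61440 (for all other instances) are hypothetical values chosen so that each ABridge wins root election only in its own instance.

```python
from dataclasses import dataclass

ROOT_PRIORITY = 0        # hypothetical: best priority, used only in the own instance
MEMBER_PRIORITY = 61440  # hypothetical: worst 802.1D-2004 priority step


@dataclass(frozen=True)
class RstpInstance:
    """Configuration of one independent RSTP execution (state machine not modelled)."""
    root_owner: str       # ABridge that claims root in this instance
    bridge_priority: int  # this ABridge's priority within the instance


class NrstpABridge:
    def __init__(self, my_id: str, core: list) -> None:
        self.my_id = my_id
        # One RSTP execution per core ABridge; only in its own does it claim root.
        self.instances = {
            owner: RstpInstance(owner, ROOT_PRIORITY if owner == my_id else MEMBER_PRIORITY)
            for owner in core
        }

    def instance_for(self, ingress: str, egress: str, unicast: bool) -> RstpInstance:
        """Unicast uses the destination's tree; broadcast/multicast the originator's."""
        return self.instances[egress if unicast else ingress]
```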
10. Conclusions
An alternative implementation for RBridges has been
described. It provides pair-wise shortest paths using
multiple L2 spanning trees across ABridges instead of link
state L2 routing. The proposal has lower computational
complexity than RBridges and is scalable to large scale
Ethernet campus networks. A topological restriction,
automatically controlled, is introduced: core forwarding only
operates on dedicated links that interconnect ABridges.
Obtainable convergence is likely similar to that obtained by
the standard IEEE Rapid Spanning Tree protocol, less than 2
seconds, typically in the hundreds of milliseconds range. The
design is compatible with current IP nodes and routers and
with standard bridges, but any connected standard bridge
connected to an ABridge always works outside the network
core, in the access layer.
11. Acknowledgments
This draft used the current RBridges draft as the basis for
its structure and for some of its text, to aid comprehension
and comparison between the two.
Thanks to Matt Hutton who performed the English language
review.
For feedback and contributions, join the RBridge mailing list
at http://www.postel.org/rbridge
12. References
[1] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[2] "The RBridge archives",
http://www.postel.org/pipermail/rbridge/
[3] "Rapid Reconfiguration of Spanning Tree",
http://www.ieee802.org/1/pages/802.1w.html
[4] IEEE 802.1D-1998, "IEEE standard for local and
metropolitan area networks--Common specifications--Media
access control (MAC) Bridges".
[5] IEEE 802.1D-2004, "IEEE standard for local and
metropolitan area networks--Common specifications--Media
access control (MAC) Bridges".
[6] IEEE 802.1Q-2003, "IEEE standard for Local and
Metropolitan Area Networks--Virtual Bridged Local Area
Networks".
[7] Ibanez, G., Garcia, A., and A. Azcorra, "Alternative
Multiple Spanning Tree Protocol (AMSTP) for Optical Ethernet
Backbones", IEEE HSLN (LCN 2004), Tampa, November 2004.
[8] Plummer, D., "Ethernet Address Resolution Protocol: Or
converting network protocol addresses to 48.bit Ethernet
address for transmission on Ethernet hardware", STD 37, RFC
826, November 1982.
[9] Narten, T., Nordmark, E., and W. Simpson, "Neighbor
Discovery for IP Version 6 (IPv6)", RFC 2461 (Standards
Track), December 1998.
[10] Perlman, R., "RBridges: Transparent Routing", Proc.
Infocom 2004.
[11] Perlman, R., Touch, J., and A. Yegin, "RBridges:
Transparent Routing", draft-perlman-rbridge-03.txt, May 2005,
http://www.ietf.org/internet-drafts/draft-perlman-rbridge-03.txt
[12] Seaman, M., "Shortest Path Bridging",
http://www.ieee802.org/1/files/public/docs2005/new-seaman-shortest-path-par-0405-02.htm
[13] Finn, N., "An Update on Networking Technologies",
http://www.ieee802.org/802_tutorials/july05/nfinn-shortest-path-bridging.pdf
[14] Iwata, A., et al., "Global Open Ethernet Architecture for
a Cost-Effective Scalable VPN Solution", IEICE Trans. on
Communications, vol. E87-B, no. 1, pp. 142-151, January 2004.
Author's Addresses
Guillermo Ibanez
Universidad Carlos III Madrid
Email: gibanez@it.uc3m.es
Alberto Garcia
Universidad Carlos III Madrid
Email: alberto@it.uc3m.es
Arturo Azcorra
Universidad Carlos III Madrid
Email: azcorra@it.uc3m.es
Intellectual Property Statement
The IETF takes no position regarding the validity or scope
of any Intellectual Property Rights or other rights that might
be claimed to pertain to the implementation or use of the
technology described in this document or the extent to which
any license under such rights might or might not be available;
nor does it represent that it has made any independent effort
to identify any such rights. Information on the procedures
with respect to rights in RFC documents can be found in BCP 78
and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and
any assurances of licenses to be made available, or the result
of an attempt made to obtain a general license or permission
for the use of such proprietary rights by implementers or
users of this specification can be obtained from the IETF on-
line IPR repository at http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its
attention any copyrights, patents or patent applications, or
other proprietary rights that may cover technology that may be
required to implement this standard. Please address the
information to the IETF at ietf-ipr@ietf.org.
Disclaimer of Validity
This document and the information contained herein are
provided on an "AS IS" basis and THE CONTRIBUTOR, THE
ORGANIZATION HE/SHE
REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET
SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM
ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT
LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR
PURPOSE.
Copyright Statement
Copyright (C) The Internet Society (2006).
This document is subject to the rights, licenses and
restrictions contained in BCP 78, and except as set forth
therein, the authors retain all their rights.