OSPF A. Smirnov
Internet-Draft Cisco Systems, Inc.
Intended status: Standards Track April 10, 2015
Expires: October 12, 2015
OSPF for large-scale networks with regular topologies
draft-smirnov-ospf-dive-01
Abstract
Many popular topologies for large-scale networks have a highly
regular structure with a distinctive design pattern. Examples of
such topologies include hub-and-spoke (also known as "star"), common
in enterprise WAN networks, and fat-tree and Clos topologies, common
in datacenters. For a number of reasons, distance-vector protocols
perform better than OSPF in such large-scale networks. On the other
hand, network backbones have no highly regular topology pattern, and
there OSPF outperforms distance-vector protocols. As a result,
large-scale networks frequently employ different routing protocols in
different regions of the network, complicating network operations.
This document proposes OSPF extensions to improve scalability of
routing for large-scale networks.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on October 12, 2015.
Copyright Notice
Copyright (c) 2015 IETF Trust and the persons identified as the
document authors. All rights reserved.
Smirnov Expires October 12, 2015 [Page 1]
Internet-Draft OSPF Routing in large-scale networks April 2015
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction
2. Requirements Language
3. Problem definition
3.1. Typical regular network topologies
3.1.1. Hub-and-spoke topology
3.1.2. Fat-tree topology
3.1.3. Clos topology
3.2. Problems with OSPF routing in large-scale networks
4. Solution requirements
5. Functional summary
6. Protocol Details
6.1. The DIVE area
6.2. Hello packets and the database exchange on DIVE area interfaces
6.3. LSA generation into the DIVE area
6.3.1. Metric Sub-TLV
6.4. SPF calculation in the DIVE area
6.5. Translation of LSAs and route propagation
6.5.1. Hub routers: Propagation of routes from the core network into
the DIVE area
6.5.2. Hub routers: Propagation of routes from the DIVE area into the
backbone area
6.5.3. Hub routers: Propagation of routes from the DIVE area into the
non-backbone area
6.5.4. Route propagation on Spoke routers
7. Other considerations for the DIVE area
7.1. Routing considerations
7.2. LSDB size considerations
7.3. Optimal DIVE area design
8. Backward Compatibility
9. Security Considerations
10. IANA Considerations
11. Acknowledgements
12. References
12.1. Normative References
12.2. Informative References
Author's Address
1. Introduction
OSPF is a link-state protocol which was designed to provide routing
in networks of arbitrary topology. Large modern networks may have
thousands of routers providing the same type of service. To simplify
network design and operations, as well as to unify hardware and
software configurations of routers, such networks are frequently
built by replicating a basic design element hundreds or thousands of
times. The resulting network has a highly regular topology
exhibiting a distinctive pattern. Such regular designs include the
hub-and-spoke topology common in enterprise networks and the
"fat-tree" and Clos topologies common in data center networks.
Running routing protocols in such networks poses a number of problems
arising mostly from the very large number of routers in the network.
On the other hand, the regular pattern of the topology allows certain
simplifications.
OSPF (and link-state protocols in general) can be used to provide
routing in networks with regular topologies, but it does not make any
use of the regularity. This makes OSPF especially vulnerable to the
effects of large scale.
Real-life networks combine regions of regular topology with
(smaller-scale) regions of free topology, where OSPF works best.
Continuing the examples above, the latter are the headquarters (HQ)
network of the enterprise or the interconnections between
datacenters. For operational simplicity it is desirable to have the
same routing protocol running in both parts of the network. The
present document specifies extensions to OSPF to improve its
scalability in very large-scale networks with regular topologies.
Section 3 of this document details typical problems seen if OSPF is
used as the routing protocol in large-scale networks with regular
topologies. Section 4 lists requirements for the routing solution.
Section 5 gives a brief overview of the solution. Section 6 defines
details of the new protocol behavior.
2. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
3. Problem definition
3.1. Typical regular network topologies
3.1.1. Hub-and-spoke topology
Hub-and-spoke topology (also frequently called a "star" topology)
comes in many variants, but they all share a set of common properties
that allows them to be grouped into a single category:
o A few (usually one or two) centrally located routers which on
one side connect to each remote site by a point-to-point Layer-2
connection and on the other side connect to the network backbone.
These routers are usually referred to as 'hubs'.
o A remote site may have a single router, possibly dual-homed to hub
routers, or multiple routers each connecting to a hub router.
These routers are frequently referred to as 'spokes'. Internally
the site's network may consist of multiple devices, but typically
it is relatively small and simple. To the outside world the site's
internal reachability is described by just a few prefixes.
o The connection between hubs and spokes is the only connection
between a site and the rest of the network. All traffic going
in/out of the site flows through the hubs. Inter-spoke traffic is
minimal and may even be blocked, for example for security reasons.
o A hub-and-spoke network usually provides some sort of redundancy.
Redundant hub routers and redundant connections to sites are
common; redundant spoke routers are not unusual.
o The number of sites/spokes may be very large, requiring each hub
router to handle thousands of spoke neighbors.
o All sites share the same routing administrative policy, i.e. the
same route filtering and summarization rules.
On hub routers, hub-and-spoke Layer-2 connections may be presented to
OSPF either as a single point-to-multipoint interface with a large
number of neighbors or as multiple point-to-point Layer-3 interfaces.
Examples of Layer-2 technologies commonly used to create hub-and-
spoke networks are (from historical to modern technologies): SMDS,
ISDN, Frame-Relay, IPsec VPN tunnels, VPLS.
Hub-and-spoke topology is the most common topology for building
enterprise WAN networks. It naturally maps into HQ/branch enterprise
structure and provides optimal link capacity for traffic flowing
between branches and centralized services located in the HQ.
3.1.2. Fat-tree topology
Fat-tree topology is a hierarchical structure resembling a tree
turned upside down, with link capacity growing from the "leaves" up
toward the "root". This topology is common in datacenter designs.
Since parts of the tree may run Layer-2 switching, IP routing
protocols usually 'see' 2 or 3 levels in the tree hierarchy.
Typically, lower-level nodes have redundant connections to the upper
level.
This topology may be seen as a hub-and-spoke topology where each
spoke site in turn has a tree-like organization.
3.1.3. Clos topology
Clos topology is a multistage switching network where each node in a
stage connects to multiple nodes in the previous and subsequent
stages. This is another topology frequently used in datacenters.
The key advantage of this topology is that the number of elements in
each stage and the number of links connecting an element with the
previous and subsequent stages may be chosen in such a way that all
upstream and downstream links have the same capacity. A Clos network
always provides multiple same-cost paths between each pair of leaf
nodes. On a node, the number of ECMP paths for each destination is
equal to the number of connections to elements in the next stage of
the Clos topology. In real-life deployments there may be as many as
several dozen ECMP paths for each routing destination prefix.
3.2. Problems with OSPF routing in large-scale networks
Hub-and-spoke networks pose challenges to routing scalability:
o All spoke sites placed into a single OSPF area will, by virtue of
the link-state protocol, receive full topology information
describing each spoke in the area and its connection to hubs.
This information is at best excessive, as all links from the site
go to hubs and knowledge of the many other spoke links in the area
cannot reveal alternative paths to destinations outside of the
site. Worse yet, spoke routers are usually much smaller devices
than hub routers and are intended to serve a tiny site network
with a few routes and light traffic; spoke routers do not have
sufficient resources (either memory or CPU) to hold and process
the same link-state database as hub routers. Distance-vector
protocols in the same topology propagate only prefix reachability
information and do not tax spoke routers with a topology view.
o Inter-site visibility may be undesirable, either to decrease the
size of the routing table on spoke devices or for security reasons.
Link-state protocols do not allow routing information to be
filtered within the area flooding scope. By comparison,
distance-vector protocols allow route filtering and summarization
on a per-neighbor basis.
o Having full topology visibility within an area may also lead hub
routers to compute suboptimal paths. Consider an example hub-and-
spoke network with two hubs A and B and two spoke sites S1 and S2.
Each spoke site is dual-connected to both hubs. Hubs A and B are
Area Border Routers (ABRs) between the hub-and-spoke WAN and the
backbone. If the link between hub A and site S1 goes down then A
will choose (or at least consider) the intra-area WAN route A -> S2
-> B -> S1. But typically a spoke site (S2 in this example) must
not be considered for any transit traffic.
o The size and stability of the Router LSA of hub routers is also
problematic due to the very large number of point-to-point links in
it describing connections to all spokes in the area. Both the size
of the Router LSA and the probability that it needs to be rebuilt
grow in direct proportion to N, the number of spokes in the area.
Thus the overall CPU resources consumed by flooding and processing
a hub's Router LSA grow as O(N^2).
o An alternative network design is to place each spoke into an area
of its own. This solves the problems of spoke routers but
transfers the cumulative burden of supporting multiple areas to hub
routers. This design requires a hub router to be able to support
thousands of NSSA areas, originate as many Router LSAs, and
translate multiple LSAs from/into each area. Managing a common
route filtering and summarization policy is also difficult.
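The O(N^2) estimate in the Router LSA bullet above can be illustrated
with a small calculation. This is a minimal sketch rather than a
measurement: the function name, the changes_per_spoke parameter and the
simplification of one 12-byte link description per spoke are
assumptions made only for illustration.

```python
def hub_router_lsa_load(n_spokes, bytes_per_link=12, changes_per_spoke=1.0):
    """Relative flooding load caused by one hub's Router LSA.

    bytes_per_link: assumed size of one link description (12 bytes for an
        OSPFv2 Router LSA link with no additional TOS metrics).
    changes_per_spoke: hypothetical number of link state changes per spoke
        link within some fixed observation window.
    """
    lsa_size = n_spokes * bytes_per_link             # grows as N
    reoriginations = n_spokes * changes_per_spoke    # grows as N
    return lsa_size * reoriginations                 # grows as N^2


for n in (1000, 2000, 4000):
    print(n, hub_router_lsa_load(n))
```

Doubling the number of spokes quadruples the result, which is the
quadratic growth the bullet describes.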
Some problems described above call for designing areas as small as
possible, while others, conversely, are resolved by designing big
areas. In a relatively small network it is possible to find a
sensible design compromise, but as the number of spokes grows into
the thousands, finding a working compromise becomes more and more
challenging and the balance becomes more and more fragile.
As noted in Section 3.1.2, fat-tree topology can be viewed as a
particular case of the hub-and-spoke topology. For this reason many
problems described above for hub-and-spoke networks equally affect
fat-tree networks.
Clos networks add one more problem specific to this topology:
o Section 3.1.3 underlined that Clos networks provide massive equal-
cost multipath for most destinations. When a link goes down, this
rarely means that the node has lost connectivity to any
destination. OSPF in this situation rebuilds its Router LSA and
floods it to all routers in the area. The ensuing SPF run on all
nodes in the area results in no routing change on any router not
connected to the failed link. For comparison, distance-vector
routing protocols detect that there was no change in the
reachability of prefixes or their metrics, and hence no update is
sent to neighbors.
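The distance-vector behavior described in the bullet above can be
sketched as follows. The function names and the dictionary-based
representation of next hops are hypothetical and chosen only for
illustration; real protocols track considerably more state.

```python
def best_cost(paths):
    """Return the lowest cost among next hops, or None if unreachable."""
    return min(paths.values()) if paths else None


def update_needed(paths, failed_next_hop):
    """True if losing one next hop changes what is advertised to neighbors."""
    remaining = {nh: cost for nh, cost in paths.items() if nh != failed_next_hop}
    return best_cost(remaining) != best_cost(paths)


ecmp_paths = {"spine1": 20, "spine2": 20, "spine3": 20}
print(update_needed(ecmp_paths, "spine2"))      # remaining paths keep cost 20
print(update_needed({"spine1": 20}, "spine1"))  # the last path is lost
```

With several equal-cost next hops, losing one of them leaves the best
cost unchanged, so nothing needs to be sent; only the loss of the last
path (or of the only cheapest path) triggers an update.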
As can be seen from these examples, full knowledge of the topology
within an area, which is a key property of link-state protocols and
works so well in networks with arbitrary topology, becomes the
biggest factor limiting routing scalability in networks with regular
topologies. For this reason distance-vector protocols are the tool
of choice for network designers working with large hub-and-spoke
networks. Factors specific to networks with regular topologies,
such as the link between hub and spoke being the only connectivity to
the rest of the network and the low number of prefixes advertised in
each direction, negate the slow convergence which affects distance-
vector protocols in more complex topologies.
4. Solution requirements
For the OSPF solution to be as scalable as a distance-vector
protocol, these design goals were taken into consideration:
o Spoke routers MUST be protected from routing information sent
from/to other spoke routers unless explicitly required by the
network's policy.
o The solution MUST require no modification to OSPF routers other
than those connected to the hub-and-spoke network itself.
o The number of LSAs which a hub router originates MUST NOT grow
faster than O(N), where N is the number of spoke sites, and
preferably should not depend on N. The size of LSAs MUST NOT
depend on N.
o The solution MUST protect against routing loops should a spoke
site become connected to another site and/or to the rest of the
network. Supporting such a configuration is not a goal, since such
a configuration loses an important property defining a spoke site
of a hub-and-spoke network. But the network must be protected from
routing meltdown in case of accidental misconfiguration.
o The solution SHOULD provide an easy and scalable way to apply a
common administrative routing policy via centralized configuration.
5. Functional summary
Area Border Routers in OSPF propagate inter-area routing information
by announcing reachability and routing metrics. Thus inter-area and
external routes are announced in OSPF in the same way as in distance-
vector routing protocols.
The solution satisfying the requirements laid out in Section 4 is to
create a new type of OSPF area which is devoid of LSAs carrying
topology information (i.e. Router and Network LSAs). The routing
information in this area is propagated only by prefix LSAs flooded
with link-local scope. In many respects such an area behaves
similarly to distance-vector routing protocols. For this reason the
area type is called the DIstance-VEctor or DIVE area.
In a hub-and-spoke network the DIVE area would cover only the links
between hub and spoke routers. Other interfaces on hub routers will
be placed into a regular OSPF area carrying link-state routing
information. This may or may not be the backbone area. In the rest
of this document this area and the network behind it are called "the
core network".
Spoke routers are connected to an area covering the site network.
An example hub-and-spoke network using a DIVE area is depicted in
Figure 1.
                            .................   ...............
                            .    DIVE       .   .             .
                            .             +-----+  Site 1     .
                            .    area     |     |             .
                            .        _____|Spoke|   NSSA      .
                            .       /     |     |             .
............. ............. .      /      +-----+..............
. Backbone  . . Regular   . .     /         .
. area 0    . . area      . .    /          .
.         +-----+       +-----+ /         +-----+
.         |     |       |     |/          |     |..............
.         | ABR |       | Hub |___________|Spoke|             .
.         |     |       |     |\          |     |             .
.         +-----+       +-----+ \         +-----+             .
.           . .           . .    \          .   .   Site 2    .
............. ............. .     \       +-----+             .
                            .      \      |     |   NSSA      .
                            .       \_____|Spoke|             .
                            .             |     |             .
                            .             +-----+..............
                            .               .
                            .................
                           Figure 1
Note that a network using the DIVE area design resembles a [RFC4364]
BGP/MPLS VPN network where OSPF is used as the PE-CE protocol
[RFC4577]. The DIVE area is the analogue of the BGP 'superbackbone'
in an MPLS VPN network, and hub and spoke routers are analogues of
MPLS VPN PE routers.
The DIVE area does not have LSAs with area flooding scope. This
solves problems related to excessive visibility of routing
information where it is undesirable. There is also no visibility or
reachability of routers other than immediately connected neighbors.
Each router in the DIVE area is an ABR and there is no concept of a
router internal to the DIVE area. In this sense the DIVE area is
always a one-hop area.
Since the DIVE area deviates from the traditional one-level hierarchy
of OSPF areas (backbone area/all other areas connected to it) it must
employ strict rules for accepting and propagating routing information
to prevent routing information from looping. These rules are further
discussed in Section 6.5. ABRs connected to the DIVE area perform
translation of LSAs from and into the DIVE area, similar to the
translation of NSSA LSAs into External LSAs on an ABR between NSSA
and backbone areas.
Routing information propagated throughout DIVE area is encoded into
Prefix Attribute LSAs for OSPFv2 ([I-D.ietf-ospf-prefix-link-attr])
and Extended Prefix LSAs for OSPFv3
([I-D.ietf-ospf-ospfv3-lsa-extend]). In both cases 'old' style LSAs
carrying either topological or routing information are not originated
or flooded into the DIVE area.
6. Protocol Details
6.1. The DIVE area
The current specification defines a new type of OSPF area called the
DIVE area. There is no concept of a router internal to the DIVE
area; all DIVE area routers are ABRs. When a DIVE area is
configured, the role of the router in the area must be provided as a
configuration parameter. Currently supported DIVE area router roles
are Hub and Spoke. The router role has important implications for
the translation of routes between the DIVE area and other connected
area(s). Details of LSA translation are covered in Section 6.5. In
the subsequent text the terms 'Hub' and 'Spoke' (capitalized) are
used to denote a router's configured role in the DIVE area, while the
terms 'hub' and 'spoke' (lowercase) describe the position of the
router in the hub-and-spoke topology.
A router of any role may be connected to multiple DIVE areas.
Hierarchical DIVE areas are not defined by the current specification.
In other words, a router's role in all connected DIVE areas SHOULD be
the same.
A Hub router must be connected to either the backbone or a non-
backbone regular area. A Hub router cannot be connected to either a
stub or an NSSA area.
The site area (or areas) connected to a Spoke router must be a non-
backbone regular area, NSSA or stub area. Note that route filtering
and summarization are best applied on the hub routers. This both
protects the high-scale DIVE area from flooding of unnecessary
information and provides a centralized location to manage the route
filtering/summarization policy on a few hub routers rather than on
many spokes. So stub and NSSA areas on spoke sites would provide
limited benefit compared to a regular non-backbone area and SHOULD be
used only if there exist direct spoke-to-spoke neighborships between
some sites.
Virtual links through a DIVE area are not supported.
Routers connected to the DIVE area MUST support Prefix Attribute LSAs
for OSPFv2 ([I-D.ietf-ospf-prefix-link-attr]) and Extended LSAs for
OSPFv3 ([I-D.ietf-ospf-ospfv3-lsa-extend]).
6.2. Hello packets and the database exchange on DIVE area interfaces
All routers connected to the DIVE area must agree on the area's
configuration and learn the roles of their neighbors. The roles of
the local router and of the neighbor determine LSA translation and
route propagation. Section 6.5 details these rules.
A router's role is advertised in Hello packets sent on interfaces in
the DIVE area. Two new bits, called DV-bits, are used to encode the
router's role in the DIVE area. DV-bits are allocated in the
Extended Options and Flags LLS TLV for OSPFv2 [RFC5613] and in the
Options field of OSPFv3:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+--+
| | | | | | | | | | | | | | | | | | | | | | | | | | |DV |F|I|RS|LR|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+--+
Bits in Extended Options and Flags TLV
 0                   1                   2
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+-+-+--+-+-+--+-+--+
| | | | | | | | | | | | | DV|L|AF|*|*|DC|R|N|MC|E|V6|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+-+-+--+-+-+--+-+--+
OSPFv3 Options Field
The meaning of the DV bits is defined as:
0 0 - Interface of sending router does not belong to DIVE area
0 1 - Sending router has role Hub
1 0 - Sending router has role Spoke
1 1 - Reserved
When a Hello packet is received from a previously unknown neighbor,
the DV bits are checked to see if the neighbor's interface belongs to
a DIVE area. If the neighbor advertises either DV bit set and the
receiving interface does not belong to a DIVE area, OR if both DV
bits advertised by the neighbor are clear and the receiving interface
belongs to a DIVE area, then the received packet MUST be silently
discarded.
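The DV bit encoding and the discard rule above can be sketched as
follows. This is an illustrative sketch only: the names DV_ROLES,
neighbor_role and accept_hello are hypothetical, and raising an error
for the reserved value 1 1 is an assumption, since the text does not
specify how a reserved combination is handled.

```python
DV_ROLES = {0b00: None, 0b01: "Hub", 0b10: "Spoke"}  # 0b11 is reserved


def neighbor_role(dv_bits):
    """Map the two DV bits to a role name; None means 'not a DIVE interface'."""
    if dv_bits not in DV_ROLES:
        # Assumption: the text leaves handling of the reserved value open.
        raise ValueError("reserved or invalid DV bits: %r" % (dv_bits,))
    return DV_ROLES[dv_bits]


def accept_hello(dv_bits, local_interface_in_dive):
    """Apply the discard rule for a Hello from a previously unknown neighbor.

    The Hello is kept only when both sides agree on whether the link
    belongs to a DIVE area; otherwise it is silently discarded.
    """
    neighbor_in_dive = dv_bits != 0b00
    return neighbor_in_dive == local_interface_in_dive


print(accept_hello(0b01, True))   # Hub neighbor on a DIVE interface
print(accept_hello(0b00, True))   # non-DIVE neighbor on a DIVE interface
```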
The DV bits advertised by the neighbor must be stored in the neighbor's
data structure and compared when receiving subsequent Hello packets from
the neighbor. A change in the advertised DV bits MUST generate the
BadLSReq Neighbor FSM event. Processing this event will cause the
adjacency with the neighbor to be reset and the LSDB exchange to
[re]start.
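A minimal sketch of the comparison against stored DV bits is shown
below. The class name, the list standing in for a neighbor FSM event
queue, and the choice to overwrite the stored bits with the new value
are all assumptions made for illustration.

```python
class DiveNeighborState:
    """Per-neighbor record of the DV bits seen in an accepted Hello."""

    def __init__(self, dv_bits):
        self.dv_bits = dv_bits
        self.events = []   # stands in for a real neighbor FSM event queue

    def on_hello(self, dv_bits):
        """Compare the DV bits of a subsequent Hello with the stored ones."""
        if dv_bits != self.dv_bits:
            # Assumption: remember the new role before the adjacency resets.
            self.dv_bits = dv_bits
            self.events.append("BadLSReq")


nbr = DiveNeighborState(0b10)   # neighbor first seen with role Spoke
nbr.on_hello(0b10)              # same role: nothing happens
nbr.on_hello(0b01)              # role changed to Hub: event is generated
print(nbr.events)
```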
The LS database of a DIVE area may contain only Opaque LSAs for
OSPFv2 and Extended LSAs for OSPFv3. LSA types defined in [RFC2328]
and [RFC5340] are not flooded into the DIVE area, including AS
External LSAs with domain flooding scope. OSPFv2 opaque LSAs with
domain flooding scope are not flooded into DIVE areas. OSPFv3
flooding of unknown LSA types is performed as described by [RFC5340].
The choice of whether to establish a full adjacency with a neighbor
or to stop neighborship formation at the 2-Way Neighbor FSM state
does NOT depend on the DIVE area roles of the local router and of the
neighbor and works as described in [RFC2328]. On broadcast and NBMA
interfaces of Spoke routers in the DIVE area, an implementation
SHOULD set Router Priority to 0 by default.
If during the LS database exchange with a neighbor in the DIVE area a
router receives a Database Description packet describing an LSA of a
type not allowed in the DIVE area, then the SeqNumberMismatch
Neighbor FSM event MUST be generated and the LSDB exchange must
restart.
If the OSPF interface type is broadcast then an implementation SHOULD
support Incremental Hellos as described by [RFC5820]. If Incremental
Hellos are supported then they MUST be enabled by default on
broadcast interfaces in the DIVE area. On point-to-multipoint
interfaces Hub routers SHOULD default to sending unicast Hellos to
discovered neighbors rather than sending multicast Hello packets
listing all known neighbors.
6.3. LSA generation into the DIVE area
The following types of LSAs, containing the listed TLV types, may be
originated into the DIVE area:
For OSPFv2 (see [I-D.ietf-ospf-prefix-link-attr]):
o OSPFv2 Extended Prefix Opaque LSA
* OSPFv2 Extended Prefix TLV
The Extended Prefix Opaque LSA MUST have LSA Type 9.
The Extended Prefix TLV as defined by [I-D.ietf-ospf-prefix-link-attr]
may advertise attributes for several route types. Only the following
route types may be present in the Extended Prefix TLV in LSAs
originated into the DIVE area:
1 - Intra-Area
3 - Inter-Area
5 - AS External
This specification defines one new sub-TLV of OSPFv2 Extended Prefix
TLV - Metric Sub-TLV, see Section 6.3.1.
For OSPFv3 (see [I-D.ietf-ospf-ospfv3-lsa-extend]):
o E-Intra-Area-Prefix-LSA
o E-Inter-Area-Prefix-LSA
o E-AS-External-LSA
o E-Link-LSA
Extended Prefix LSAs may contain the following TLV types:
o 6 - Intra-Area Prefix TLV
o 3 - Inter-Area Prefix TLV
o 5 - External Prefix TLV
All Extended prefix LSAs originated into the DIVE area MUST have
link-local flooding scope. Thus their LSA types will be:
LSA function code   LS Type   Description
-----------------   -------   -----------------------
35                  0x8023    E-Inter-Area-Prefix-LSA
37                  0x8025    E-AS-External-LSA
40                  0x8028    E-Link-LSA
41                  0x8029    E-Intra-Area-Prefix-LSA
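The LS Type values in the table follow from the OSPFv3 LS Type layout
of [RFC5340]: a U-bit, two flooding scope bits and a 13-bit function
code. The sketch below recomputes them; the constant and function
names are hypothetical, while the function codes are the ones listed
in the table.

```python
U_BIT = 0x8000             # store and flood even if the LSA type is unknown
SCOPE_LINK_LOCAL = 0x0000  # S2 S1 = 0 0
SCOPE_AREA = 0x2000        # S2 S1 = 0 1 (shown for comparison, unused here)
SCOPE_AS = 0x4000          # S2 S1 = 1 0 (shown for comparison, unused here)

FUNCTION_CODES = {
    "E-Inter-Area-Prefix-LSA": 35,
    "E-AS-External-LSA": 37,
    "E-Link-LSA": 40,
    "E-Intra-Area-Prefix-LSA": 41,
}


def dive_ls_type(function_code):
    """LS Type of an Extended LSA flooded with link-local scope."""
    return U_BIT | SCOPE_LINK_LOCAL | function_code


for name, code in FUNCTION_CODES.items():
    print("%-24s 0x%04X" % (name, dive_ls_type(code)))
```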
6.3.1. Metric Sub-TLV
One new sub-TLV is defined for OSPFv2 to carry the metric of the
route. This is required because in the DIVE area Extended Prefix
Opaque LSAs do not accompany [RFC2328] LSAs and must carry all route
information.
The Metric Sub-TLV is a Sub-TLV of the OSPF Extended Prefix TLV
defined in [I-D.ietf-ospf-prefix-link-attr]. It MAY appear more than
once in the top level TLV and has the following format:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              Type             |             Length            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|E|    MT-ID    |                    Metric                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
where:
Type: TBD, suggested value is 3.
Length: 4.
E: one-bit field. For AS External routes this defines the type of
external metric. Its function and meaning are fully analogous to the
E-bit of the Type-5 LSA [RFC2328]. This bit is always 0 for intra-
and inter-area route metrics.
MT-ID: Multi-Topology ID (as defined in [RFC4915]).
Metric: The cost of this route. For inter-area and external routes
all 24 bits of the field may be used to encode the route metric. For
intra-area routes the upper 8 bits must be 0, thus a valid metric for
an intra-area route is in the range 1 to 2^16-1.
If more than one instance of the Metric Sub-TLV is present in the
Extended Prefix TLV then each instance MUST describe the metric for a
different Multi-Topology ID.
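The Metric Sub-TLV layout above (16-bit Type, 16-bit Length, then a
1-bit E flag, a 7-bit MT-ID and a 24-bit Metric) can be packed and
parsed as in the following sketch. The function names are
hypothetical, and the Type value 3 is only the value suggested in this
section, not an assigned code point.

```python
import struct

METRIC_SUBTLV_TYPE = 3   # suggested value from the text; not yet assigned
METRIC_SUBTLV_LENGTH = 4


def encode_metric_subtlv(metric, mt_id=0, e_bit=False):
    """Pack a Metric Sub-TLV in network byte order."""
    if not 0 <= metric < 2 ** 24:
        raise ValueError("metric must fit in 24 bits")
    if not 0 <= mt_id < 2 ** 7:
        raise ValueError("MT-ID must fit in 7 bits")
    value = (int(e_bit) << 31) | (mt_id << 24) | metric
    return struct.pack("!HHI", METRIC_SUBTLV_TYPE, METRIC_SUBTLV_LENGTH, value)


def decode_metric_subtlv(data):
    """Unpack the first 8 bytes of data as a Metric Sub-TLV."""
    sub_type, length, value = struct.unpack("!HHI", data[:8])
    if sub_type != METRIC_SUBTLV_TYPE or length != METRIC_SUBTLV_LENGTH:
        raise ValueError("not a Metric Sub-TLV")
    return {
        "e_bit": bool(value >> 31),
        "mt_id": (value >> 24) & 0x7F,
        "metric": value & 0xFFFFFF,
    }


wire = encode_metric_subtlv(10)
print(wire.hex())
print(decode_metric_subtlv(wire))
```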
6.4. SPF calculation in the DIVE area
Intra-area SPF calculation within the DIVE area is reduced to walking
the list of neighbors in the area and adding neighbors which have
reached the FULL adjacency state to the table of reachable routers.
Each reachable router is marked as both ABR and ASBR. The cost of
the routing table entry is equal to the cost of the local interface
associated with the neighbor.
Unlike in other area types, routers in the DIVE area which do not
have a fully established adjacency between them do not have a valid
intra-area path to reach each other.
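The reduced intra-area calculation can be sketched as a single pass
over the neighbor list. The DiveNeighbor type and the table layout are
hypothetical; keeping the cheapest interface when the same neighbor is
Full on several interfaces is an assumption that the text does not
address.

```python
from dataclasses import dataclass


@dataclass
class DiveNeighbor:
    router_id: str
    state: str            # neighbor FSM state, e.g. "Full" or "2-Way"
    interface_cost: int   # cost of the local interface toward the neighbor


def dive_reachable_routers(neighbors):
    """Build the table of reachable routers for one DIVE area."""
    table = {}
    for nbr in neighbors:
        if nbr.state != "Full":
            continue   # no full adjacency means no intra-area path
        known = table.get(nbr.router_id)
        # Assumption: prefer the cheapest interface for a repeated neighbor.
        if known is None or nbr.interface_cost < known["cost"]:
            table[nbr.router_id] = {
                "cost": nbr.interface_cost,
                "abr": True,
                "asbr": True,
            }
    return table


example = [
    DiveNeighbor("192.0.2.1", "Full", 10),
    DiveNeighbor("192.0.2.2", "2-Way", 10),
    DiveNeighbor("192.0.2.3", "Full", 25),
]
print(dive_reachable_routers(example))
```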
Calculation of inter-area and AS External routes follows the
algorithms described in [RFC2328] for OSPFv2 and [RFC5340] for OSPFv3
with the following caveat. The OSPFv2 Extended Prefix LSA does not
provide ordering of prefixes by prefix type. Hence there are no
separate phases of computing inter-area and then AS external routes.
Instead,
all Extended Prefix LSAs and all Extended Prefix TLVs in them are
examined in turn, and the type of the calculated route is determined
by the Route Type field of the Extended Prefix TLV being examined.
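The single-pass examination of Extended Prefix TLVs can be sketched as
below. The dictionary-based LSA representation is hypothetical and
stands in for parsed LSAs; silently skipping a TLV whose Route Type is
not one of the three allowed values is an assumption.

```python
# Route Type values allowed in the DIVE area (see Section 6.3).
ROUTE_TYPES = {1: "intra-area", 3: "inter-area", 5: "external"}


def classify_prefixes(lsas):
    """Walk every Extended Prefix TLV once and label each prefix."""
    routes = []
    for lsa in lsas:
        for tlv in lsa["tlvs"]:
            kind = ROUTE_TYPES.get(tlv["route_type"])
            if kind is not None:   # assumption: ignore other route types
                routes.append((tlv["prefix"], kind))
    return routes


sample_lsas = [
    {"tlvs": [{"prefix": "198.51.100.0/24", "route_type": 5},
              {"prefix": "10.1.0.0/16", "route_type": 3}]},
    {"tlvs": [{"prefix": "10.2.3.0/24", "route_type": 1}]},
]
print(classify_prefixes(sample_lsas))
```

The external prefix is processed before the inter-area one simply
because it appears first; no ordering by route type is imposed.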
6.5. Translation of LSAs and route propagation
Each router connected to a DIVE area is an Area Border Router and
will originate LSAs into connected non-DIVE areas to describe
reachability of prefixes received via the DIVE area. And vice versa,
it will originate LSAs into the DIVE area to describe reachability of
routes learned via other connected areas.
Moreover, in the DIVE area all LSAs propagating routing information
have link-local scope. In those cases where routing information
should propagate between routers which do not have a direct
adjacency, intermediate routers will originate their own LSAs
carrying routing information one hop further. In accordance with
distance-vector routing principles, the metric of such routes will be
increased to reflect the cost of the path to reach the destination
from the router originating the LSA. There are two cases when
routing information has to be re-advertised within the DIVE area:
o If inter-spoke site traffic is not prohibited then hub routers
must advertise inter-spoke routing information to spokes. This
may be either in the form of summarized routing information
covering multiple spoke sites (including advertisement of a default
route) or in the form of non-summarized routing information the hub
received from spoke routers. In the latter case the hub would re-
advertise in the DIVE area routing information received from spoke
neighbors in the area.
o The DIVE area is attached to the core network via redundant hub
routers, and hubs advertise into the core network summarized
routing information covering multiple site prefixes. If the link
between one of the hubs and a spoke site is lost then the hub must
know alternative paths to the spoke network via other hubs. Direct
neighborship between hub routers in the DIVE area would provide
such an alternate path. Thus in this scenario hub routers
advertise summarized routing information into the core network and
exchange non-summarized spoke prefix reachability via the DIVE area
adjacency.
Note that in both scenarios above re-advertisement of routing
information within the DIVE area is done by Hub routers, and the
information being re-advertised was received by the Hub from Spoke
routers.
Routers whose role in the DIVE area was configured as Spoke MUST NOT
re-advertise into the DIVE area routing information received via a
DIVE area. Routers whose role in the DIVE area was configured as Hub
MUST NOT
re-advertise routing information received from other Hub routers in
this DIVE area.
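The re-advertisement restrictions above can be summarized as a small
predicate. The following is a minimal sketch, not part of the
protocol definition; the function name and the role strings "hub" and
"spoke" are hypothetical and chosen only for illustration.

```python
def may_readvertise_in_dive(local_role: str, learned_from_role: str) -> bool:
    """Return True if a route learned via the DIVE area may be
    re-advertised into the same DIVE area.

    local_role        -- role configured on this router ("hub" or "spoke")
    learned_from_role -- role of the neighbor the route was received from
    Both parameters are illustrative names, not protocol fields.
    """
    if local_role == "spoke":
        # Spokes MUST NOT re-advertise anything received via the DIVE area.
        return False
    if local_role == "hub":
        # Hubs re-advertise only information received from Spoke routers;
        # information received from other Hubs MUST NOT be re-advertised.
        return learned_from_role == "spoke"
    raise ValueError(f"unknown role: {local_role!r}")
```

Because only Hubs re-advertise, and only Spoke-originated information,
a route crosses at most two DIVE hops (Spoke to Hub, then Hub to a
Spoke or to another Hub).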
Route filtering and/or summarization is frequently configured on Hub
routers. Summarization reduces the number of LSAs to originate,
maintain and flood. Managing LSDB size is an important aspect of
scalability in a large-scale network. Summarization may be performed
in both directions: to summarize reachability of core networks
advertised toward the spoke sites (in the ultimate summarization case
Hubs may advertise toward spokes only one route, the default route)
and to summarize into the core reachability of remote sites connected
by the hub-and-spoke network. To improve stability of LSAs
advertising summarized routing information, an implementation MUST
allow the cost of the summary route to be statically provided via
configuration and SHOULD use static assignment of the summary cost (as
opposed to dynamically computing the cost of the summary route from
the costs of component routes falling into the summary range) as the
default cost selection mechanism.
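The difference between static and dynamic summary cost can be
illustrated with a short sketch. The function name and parameters are
hypothetical. This document does not define how a dynamic cost is
derived from component routes; taking the largest component cost is an
assumption of this sketch only.

```python
from typing import Iterable, Optional


def summary_cost(static_cost: Optional[int],
                 component_costs: Iterable[int]) -> int:
    """Pick the cost advertised for a summary route.

    static_cost     -- cost provided via configuration, or None
    component_costs -- costs of reachable routes inside the summary range
    """
    if static_cost is not None:
        # Static assignment: the advertised cost does not depend on
        # component routes, so their changes do not alter the summary.
        return static_cost
    costs = list(component_costs)
    if not costs:
        raise ValueError("summary has no reachable component routes")
    # Assumption of this sketch: dynamic cost is the largest component cost.
    return max(costs)
```

With a static cost, a change in the cost of any single component route
leaves the advertised summary unchanged, so the LSA carrying the
summary does not need to be re-originated.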
A spoke site may, for redundancy reasons, be connected to the
hub-and-spoke network by more than one spoke router. To prevent
looping of routing information, routes propagated from the DIVE area
into the spoke site network must not be re-advertised back into the
DIVE area by another spoke router. This is achieved by Spoke routers
setting the Down bit in LSAs advertised into the spoke site network.
Unlike loop prevention for MPLS VPN PE routers [RFC4576], Spoke
routers are allowed to install into their own routing table routes
derived from LSAs with the Down bit set, but they MUST NOT
re-advertise them into the DIVE area.
The following sections describe route propagation and
re-advertisement rules. For Hub routers, LSA translation rules for
routes learned from the DIVE area depend on whether the Hub is
connected to the backbone area or to a non-backbone area.
6.5.1. Hub routers: Propagation of routes from the core network into
the DIVE area
After completing calculation of routes during SPF, a DIVE area Hub
router performs Area Border Router functions. This section lists
rules to propagate routing information from the core network into the
DIVE area. Each prefix being propagated will be described by one
Prefix TLV, see Section 6.3. The strategy of packing Prefix TLVs into
LSAs (one or multiple Prefix TLVs per LSA, LS ID selection, etc.) is
outside the scope of this document.
To advertise routing information into DIVE area a Hub router MUST:
o Examine each reachable prefix in its routing table. If the best
path for the prefix lies through the DIVE area then proceed to the
next prefix.
o Check if the route falls into any configured summary range. If it
does, mark the summary as reachable. Compare the type of the route
with the type of the summary using usual OSPF route preference rules
(intra-area route is preferred over inter-area; external Type-1 is
preferred over external Type-2 etc.). If the route's type is more
preferable, store it as the new type of the summary route and
proceed to the next route.
o Otherwise add into the DIVE area's LSAs a route TLV of the
appropriate type. The LSA MUST be added to the LSDB of all
interfaces on which there are Spoke neighbor(s) in a state above
Down:
* If route is intra-area or inter-area then originate inter-area
route TLV. Use cost of the route as cost advertised in the TLV
* If route is Type-1 AS external route then originate Type-1 AS
External route TLV. Use cost of the route as cost advertised
in the TLV
* If route is Type-2 AS external route then originate Type-2 AS
External route TLV. Use Type-2 metric of the LSA which
contributed to the route plus one as cost advertised in the TLV
o For routes which became unreachable advertise the LSA without TLV
corresponding to the route or flush the LSA if applicable
Calculation of the summary route reachability and type, as well as
flushing TLVs of unreachable routes is the same for all router roles
and route propagation scenarios, so for brevity they are omitted in
the following sections.
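The per-route decision a Hub makes when propagating routes from the
core network into the DIVE area can be sketched as follows. This is a
minimal illustration, not a definitive implementation: the class and
function names are hypothetical, LSA packing and flushing are omitted,
and a route falling into a summary range is simply skipped here because
the summary is advertised in its place.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RouteType(Enum):
    INTRA_AREA = "intra-area"
    INTER_AREA = "inter-area"
    EXT_TYPE1 = "external-type-1"
    EXT_TYPE2 = "external-type-2"


@dataclass(frozen=True)
class Route:                  # hypothetical routing table entry
    prefix: str
    rtype: RouteType
    cost: int                 # OSPF cost of the route
    type2_metric: int = 0     # Type-2 metric of the contributing LSA
    via_dive: bool = False    # best path lies through the DIVE area


@dataclass(frozen=True)
class PrefixTlv:              # hypothetical view of a Prefix TLV
    prefix: str
    rtype: RouteType
    cost: int


def hub_core_to_dive(route: Route,
                     in_summary_range: bool = False) -> Optional[PrefixTlv]:
    """Return the TLV a Hub advertises toward Spokes, or None."""
    if route.via_dive:
        return None           # learned via the DIVE area: skip
    if in_summary_range:
        return None           # covered by a configured summary
    if route.rtype in (RouteType.INTRA_AREA, RouteType.INTER_AREA):
        return PrefixTlv(route.prefix, RouteType.INTER_AREA, route.cost)
    if route.rtype is RouteType.EXT_TYPE1:
        return PrefixTlv(route.prefix, RouteType.EXT_TYPE1, route.cost)
    # Type-2 external: advertise the Type-2 metric plus one.
    return PrefixTlv(route.prefix, RouteType.EXT_TYPE2,
                     route.type2_metric + 1)
```

For example, an intra-area route with cost 20 yields an inter-area
route TLV with cost 20, while a Type-2 external route whose
contributing LSA carries metric 5 yields a Type-2 TLV with cost 6.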
6.5.2. Hub routers: Propagation of routes from the DIVE area into the
backbone area
If the Hub router is connected to (i.e. has interfaces in) the
backbone area then route advertisement rules are:
o For routes whose LSA was originated by a Spoke router originate
into the backbone area LSA of the corresponding type:
* For intra- and inter-area routes originate Type-3 Summary LSA
(OSPFv2) or Inter-Area-Prefix LSA (OSPFv3) using cost of the
route
* For Type-1 AS external routes originate Type-5 External LSA
(OSPFv2) or AS-External LSA (OSPFv3) advertising Type-1
external route and using cost of the route
* For Type-2 AS external routes originate Type-5 External LSA
(OSPFv2) or AS-External LSA (OSPFv3) advertising Type-2
external route and using metric received in the Prefix TLV plus
one
* When advertising AS external routes the Hub router MUST also
announce itself as ASBR
o If the LSA was not originated into the backbone because the route
is subsumed by summarization then instead add the TLV to the LSA in
the LSDB of all interfaces on which there are Hub neighbor(s) in a
state above Down. Otherwise, to provide inter-spoke connectivity,
the TLV MAY be added to the LSA in the LSDB of all interfaces on
which there are Spoke neighbor(s) in a state above Down. In either
case the TLV MUST have the same route type as the route being
advertised. For intra-area, inter-area and Type-1 external routes
the advertised cost is taken as the cost of the route. For Type-2
external routes the cost is equal to the metric received in the
Prefix TLV plus one.
o Note that a Hub router MUST NOT advertise into either the backbone
or to other Hubs routes received from Hubs
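The rules of this section can be sketched as two small functions: one
choosing the LSA originated into the backbone area, the other choosing
the set of DIVE neighbors toward which the route is re-advertised. All
names and string constants are hypothetical; this is an illustration
under those assumptions rather than a normative algorithm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DiveRoute:               # hypothetical route learned via the DIVE area
    prefix: str
    rtype: str                 # "intra", "inter", "ext1" or "ext2"
    cost: int                  # cost of the route as computed by the Hub
    tlv_metric: int = 0        # metric received in the Prefix TLV
    from_hub: bool = False     # LSA was originated by another Hub


@dataclass(frozen=True)
class BackboneLsa:             # hypothetical summary of the originated LSA
    prefix: str
    kind: str                  # "inter-area" or "as-external"
    metric: int
    ext_type: Optional[int] = None   # 1 or 2 for AS external routes
    asbr: bool = False         # Hub must announce itself as ASBR


def hub_dive_to_backbone(route: DiveRoute,
                         subsumed: bool = False) -> Optional[BackboneLsa]:
    """LSA a backbone-attached Hub originates for a DIVE route, or None."""
    if route.from_hub or subsumed:
        return None            # Hub-learned or covered by a summary
    if route.rtype in ("intra", "inter"):
        return BackboneLsa(route.prefix, "inter-area", route.cost)
    if route.rtype == "ext1":
        return BackboneLsa(route.prefix, "as-external", route.cost, 1, True)
    if route.rtype == "ext2":
        return BackboneLsa(route.prefix, "as-external",
                           route.tlv_metric + 1, 2, True)
    raise ValueError(f"unknown route type: {route.rtype!r}")


def hub_readvertise_target(route: DiveRoute, subsumed: bool) -> Optional[str]:
    """DIVE neighbors that may receive the re-advertised TLV, or None."""
    if route.from_hub:
        return None            # never re-advertise Hub-learned routes
    # Subsumed routes go to Hub neighbors; otherwise the Hub MAY (it is
    # optional) advertise to Spoke neighbors for inter-spoke connectivity.
    return "hub-neighbors" if subsumed else "spoke-neighbors"
```

The split into two functions mirrors the text: origination into the
backbone and re-advertisement inside the DIVE area are decided
independently for each route.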
6.5.3. Hub routers: Propagation of routes from the DIVE area into the
non-backbone area
If the Hub router is not connected to the backbone area then it
cannot advertise inter-area routing information. As a compromise
between network design flexibility and compatibility with
[RFC2328]/[RFC5340] implementations, the Hub router will advertise
routing information as AS external routes.
For routes whose LSA was originated by a Spoke the Hub router MAY
originate into the non-backbone area LSA of the following type:
o For intra- and inter-area routes originate Type-5 External LSA
(OSPFv2) or AS-External LSA (OSPFv3) advertising Type-1 AS
external route with metric equal to cost of the route
o For Type-1 AS external routes originate Type-5 External LSA
(OSPFv2) or AS-External LSA (OSPFv3) advertising Type-2 external
route with metric of one
o For Type-2 AS external routes originate Type-5 External LSA
(OSPFv2) or AS-External LSA (OSPFv3) advertising Type-2 external
route and using metric received in the Prefix TLV plus one
Propagation of the route to other Hub or Spoke routers in the same
DIVE area is the same as described in the previous section.
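The translation into AS external routes described in this section
amounts to a three-row mapping. The sketch below restates it; the
function name and the route type strings are hypothetical, and the
result is reduced to a pair of external metric type and metric.

```python
from typing import Tuple


def hub_dive_to_nonbackbone(rtype: str, cost: int,
                            tlv_metric: int = 0) -> Tuple[int, int]:
    """Return (external_type, metric) for the AS external LSA a Hub
    without a backbone connection originates for a Spoke-originated route.

    rtype      -- "intra", "inter", "ext1" or "ext2" (illustrative names)
    cost       -- cost of the route
    tlv_metric -- metric received in the Prefix TLV (used for "ext2")
    """
    if rtype in ("intra", "inter"):
        return (1, cost)             # Type-1 external, metric = route cost
    if rtype == "ext1":
        return (2, 1)                # Type-2 external, metric of one
    if rtype == "ext2":
        return (2, tlv_metric + 1)   # Type-2 external, TLV metric plus one
    raise ValueError(f"unknown route type: {rtype!r}")
```

Each row makes the route less preferable than it was in the DIVE area:
an internal route becomes Type-1 external, and a Type-1 external route
becomes Type-2 external.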
6.5.4. Route propagation on Spoke routers
To advertise routing information received from the DIVE area into
areas of the site network a Spoke router MUST:
o For intra- and inter-area routes originate into the site area
Type-3 Summary LSA (OSPFv2) or Inter-Area-Prefix LSA (OSPFv3).
Metric advertised in the LSA is set equal to cost of the route
o For Type-1 AS external routes originate into the site area Type-5
External LSA (OSPFv2) or AS-External LSA (OSPFv3) advertising
Type-1 external route with metric equal to cost of the route
o For Type-2 AS external routes originate Type-5 External LSA
(OSPFv2) or AS-External LSA (OSPFv3) advertising Type-2 external
route with metric equal to the metric in TLV contributing to the
route
o In all above cases the LSA MUST have the Down bit set
To advertise into the DIVE area routing information learned from the
attached site network area, a Spoke router:
o MUST skip routes which were produced from LSAs with the Down bit
set
o Since the site area is a non-backbone area, a Spoke router MUST NOT
have inter-area routes learned via the site network
o For other route types add into the DIVE area's LSAs a route TLV of
the type listed below. The LSA MUST be added to the LSDB of all
interfaces on which there are Hub neighbor(s) in a state above Down:
* If route is intra-area then originate inter-area route TLV.
Use cost of the route as cost advertised in the TLV
* If route is Type-1 AS external or translatable NSSA route then
originate Type-1 AS External route TLV. Use cost of the route
as cost advertised in the TLV
* If route is Type-2 AS external or translatable NSSA route then
originate Type-2 AS External route TLV. Use Type-2 metric of
the LSA which contributed to the route as cost advertised in
the TLV
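Both directions of Spoke route propagation, including the Down bit
check, can be sketched as below. The names are hypothetical.
Translatable NSSA routes are treated like the corresponding external
type, and an inter-area route learned from the site (which the rules
above say a Spoke does not have) is simply skipped; that handling is an
assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SiteLsa:                 # hypothetical summary of the originated LSA
    kind: str                  # "inter-area" or "as-external"
    metric: int
    ext_type: Optional[int]    # 1 or 2 for AS external routes
    down_bit: bool             # always set by a Spoke router


def spoke_dive_to_site(rtype: str, cost: int,
                       tlv_metric: int = 0) -> SiteLsa:
    """LSA a Spoke originates into the site area for a DIVE route."""
    if rtype in ("intra", "inter"):
        return SiteLsa("inter-area", cost, None, True)
    if rtype == "ext1":
        return SiteLsa("as-external", cost, 1, True)
    if rtype == "ext2":
        # Type-2 metric is copied from the TLV without increment.
        return SiteLsa("as-external", tlv_metric, 2, True)
    raise ValueError(f"unknown route type: {rtype!r}")


def spoke_site_to_dive(rtype: str, cost: int, type2_metric: int = 0,
                       down_bit: bool = False) -> Optional[Tuple[str, int]]:
    """Return (TLV route type, cost) advertised toward Hubs, or None."""
    if down_bit:
        return None            # came from the DIVE area via another Spoke
    if rtype == "inter":
        return None            # assumption: skip, should not occur
    if rtype == "intra":
        return ("inter", cost)
    if rtype == "ext1":        # also translatable NSSA Type-1
        return ("ext1", cost)
    if rtype == "ext2":        # also translatable NSSA Type-2
        return ("ext2", type2_metric)
    raise ValueError(f"unknown route type: {rtype!r}")
```

A route that one Spoke injected into the site with the Down bit set is
therefore never returned to the DIVE area by a second Spoke attached
to the same site.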
7. Other considerations for the DIVE area
7.1. Routing considerations
Route propagation rules in the DIVE area make sure that information
is advertised between Hubs and Spokes and into their respective
connected areas. These rules prohibit multiple re-advertisement of
routing information within the DIVE area. Thus the DIVE area may only
serve as a shim layer between traditional OSPF areas, and it is not
possible to build a full OSPF network functioning on the principles of
a distance-vector protocol.
Routing information traveling through the DIVE area loses track of
its true originator. To prevent routing loops, routes delivered via
the DIVE area are made worse. For routes carrying a metric comparable
with the cost of an intra-domain path this is done by adding the cost
of links to reach the route's origin. For routes carrying a cost
external to the OSPF domain this is done by incrementing the external
cost.
This increment in the metric also solves the problem of an originator
receiving back its own routing information. For example, if spokes
are connected to a Hub by a point-to-multipoint interface and the Hub
wants to advertise to spokes a prefix received from a Spoke router,
then the Spoke router which originated the prefix will receive its own
information back even though the LSA has link-local flooding scope.
Fast-poisoning of routes which became unreachable is ensured by rules
which prevent a Spoke router from re-advertising back to Hubs
(directly or indirectly via other Spoke routers connected to the same
spoke site) any routing information received on the DIVE area
interface.
7.2. LSDB size considerations
LSAs in DIVE area have link-local flooding scope. This solves
scalability problems of spoke routers because they don't have to deal
with information originated for or from the other spokes (unless it
is desired). This also solves input-output constraints on hub
routers by limiting volume of information which has to be exchanged
with each spoke. On the other hand this may have an adverse effect on
the size of the link-state database a hub router has to maintain.
This is the case when spoke routers are connected by point-to-point
OSPF interfaces. In this case the database size of the hub router is
multiplied by the number of interfaces to spoke sites. This problem can
be addressed by grouping spoke connections into a smaller number of
point-to-multipoint interfaces.
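The multiplication effect can be made concrete with a small
calculation. The function name and the figures used below (20
link-local LSAs per interface, 500 spoke sites, 4 point-to-multipoint
interfaces) are hypothetical and serve only as an illustration.

```python
def hub_dive_lsa_copies(lsas_per_interface: int,
                        spoke_interfaces: int) -> int:
    """Number of link-local LSA copies a Hub keeps toward Spokes.

    Each interface has its own link-local LSDB, so LSAs the Hub
    originates toward Spokes are held once per interface.
    """
    return lsas_per_interface * spoke_interfaces


# 500 spoke sites, one point-to-point interface each.
p2p_copies = hub_dive_lsa_copies(20, 500)

# The same 500 spoke sites grouped behind 4 point-to-multipoint interfaces.
p2mp_copies = hub_dive_lsa_copies(20, 4)
```

Under these assumed figures the Hub holds 10000 LSA copies in the
point-to-point design and 80 in the point-to-multipoint design.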
7.3. Optimal DIVE area design
Given these considerations, the recommended DIVE area design for Hub
routers is:
o Spoke routers are connected via a small number of
point-to-multipoint interfaces
o Hub routers, if necessary, are interconnected within the DIVE area
via interfaces separate from connections to Spokes
o Hub routers do route summarization of routing information they
advertise both into the core network and into the DIVE area toward
Spoke routers.
8. Backward Compatibility
Devices attached to the DIVE area MUST conform to this specification.
Support for this specification is checked via new option bits in Hello
packets before the start of adjacency formation, thus devices not
supporting this specification cannot join the DIVE area.
This specification is fully backward compatible with devices not
immediately connected to the DIVE area. New information defined by
this specification is not propagated to such devices. This
specification includes measures to protect a network in case of basic
misconfiguration or design problems.
9. Security Considerations
This document does not introduce any new security implications.
General security considerations described in
[I-D.ietf-ospf-prefix-link-attr] and
[I-D.ietf-ospf-ospfv3-lsa-extend] apply to LSAs in DIVE area.
10. IANA Considerations
This specification updates several IANA OSPF registries:
o New bits (DV-bits) are reserved in the "LLS Type 1 Extended
Options and Flags" registry of the Extended Options and Flags Link
Local Signaling TLV
o New bits (DV-bits) are registered in the "OSPFv3 Options" registry
o One new value is being added to the registry of OSPFv2 Extended
Prefix TLV Sub-TLVs (Metric sub-TLV)
11. Acknowledgements
The author would like to thank Paul Wells and Alvaro Retana for early
discussions.
12. References
12.1. Normative References
[I-D.ietf-ospf-ospfv3-lsa-extend]
Lindem, A., Mirtorabi, S., Roy, A., and F. Baker, "OSPFv3
LSA Extendibility", draft-ietf-ospf-ospfv3-lsa-extend-04
(work in progress), September 2014.
[I-D.ietf-ospf-prefix-link-attr]
Psenak, P., Gredler, H., Shakir, R., Henderickx, W.,
Tantsura, J., and A. Lindem, "OSPFv2 Prefix/Link Attribute
Advertisement", draft-ietf-ospf-prefix-link-attr-01 (work
in progress), September 2014.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998.
[RFC5340] Coltun, R., Ferguson, D., Moy, J., and A. Lindem, "OSPF
for IPv6", RFC 5340, July 2008.
[RFC5613] Zinin, A., Roy, A., Nguyen, L., Friedman, B., and D.
Yeung, "OSPF Link-Local Signaling", RFC 5613, August 2009.
[RFC5820] Roy, A. and M. Chandra, "Extensions to OSPF to Support
Mobile Ad Hoc Networking", RFC 5820, March 2010.
12.2. Informative References
[RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
Networks (VPNs)", RFC 4364, February 2006.
[RFC4576] Rosen, E., Psenak, P., and P. Pillay-Esnault, "Using a
Link State Advertisement (LSA) Options Bit to Prevent
Looping in BGP/MPLS IP Virtual Private Networks (VPNs)",
RFC 4576, June 2006.
[RFC4577] Rosen, E., Psenak, P., and P. Pillay-Esnault, "OSPF as the
Provider/Customer Edge Protocol for BGP/MPLS IP Virtual
Private Networks (VPNs)", RFC 4577, June 2006.
[RFC4915] Psenak, P., Mirtorabi, S., Roy, A., Nguyen, L., and P.
Pillay-Esnault, "Multi-Topology (MT) Routing in OSPF", RFC
4915, June 2007.
Author's Address
Anton Smirnov
Cisco Systems, Inc.
De Kleetlaan 6a
Diegem 1831
Belgium
Email: as@cisco.com