By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”
The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.
This Internet-Draft will expire on March 19, 2009.
The continued growth in the Default Free Routing Table (DFRT) stresses the global routing system in a number of ways. One of the most costly stresses is FIB size: ISPs often must upgrade router hardware simply because the FIB has run out of space, and router vendors must design routers that have adequate FIB. FIB suppression is an approach to relieving stress on the FIB by NOT loading selected RIB entries into the FIB. This document specifies two styles of FIB suppression. Edge suppression (ES) allows ISPs that deploy a core-edge topology to shrink the FIBs of their edge routers, including those that interface to other ISPs and exchange the full DFRT. Virtual Aggregation (VA) allows ISPs to shrink the FIBs of any and all routers. Both styles may be deployed autonomously by an ISP (cooperation between ISPs is not required), and can co-exist with legacy routers in the ISP.
1.  Introduction
    1.1.  Scope of this Document
    1.2.  Requirements notation
    1.3.  Terminology
        1.3.1.  Terms common to both VA and ES
        1.3.2.  Terms unique to VA
        1.3.3.  Terms unique to ES
    1.4.  Temporary Sections
        1.4.1.  Status as of September 2008
        1.4.2.  Document revisions
        1.4.3.  Open Questions
2.  Overview of Virtual Aggregation (VA)
    2.1.  Mix of legacy and VA routers
    2.2.  Summary of Tunnels and Paths
3.  Specification of Edge Suppression (ES)
4.  Specification of VA
    4.1.  Requirements for VA
    4.2.  VA Operation
        4.2.1.  Legacy Routers
        4.2.2.  Advertising and Handling Virtual Prefixes (VP)
        4.2.3.  Border VA Routers
        4.2.4.  Advertising and Handling Sub-Prefixes
        4.2.5.  Suppressing FIB Sub-prefix Routes
    4.3.  Requirements Discussion
        4.3.1.  Response to router failure
        4.3.2.  Traffic Engineering
        4.3.3.  Incremental and safe deploy and start-up
        4.3.4.  VA security
    4.4.  New Configuration
5.  IANA Considerations
6.  Security Considerations
    6.1.  Properly Configured VA
    6.2.  Mis-configured VA
7.  Acknowledgements
8.  References
    8.1.  Normative References
    8.2.  Informative References
§  Authors' Addresses
§  Intellectual Property and Copyright Statements
1.  Introduction
ISPs today manage constant DFRT growth in a number of ways. Most commonly, ISPs will upgrade their router hardware before DFRT growth outstrips the size of the FIB. In cases where an ISP wants to continue to use routers whose FIBs are not large enough, it may deploy them at edge locations where a full DFRT is not needed, for instance at the customer interface. Packets for which there is no route are defaulted to a "core" infrastructure that does contain the full DFRT. While this helps, it cannot be used for all edge routers, for instance those that interface with other ISPs. Alternatively, some lower-tier ISPs may simply ignore some routes, for instance /24's that fall within the aggregate of another route.
FIB Suppression is an approach to shrinking FIB size that requires no changes to BGP, no changes to packet forwarding mechanisms in routers, and relatively minor changes to control mechanisms in routers and configuration of those mechanisms. The core idea behind FIB suppression is to run BGP as normal, and in particular to not shrink the RIB, but rather to not load certain RIB entries into the FIB, for instance by not committing them to the Routing Table. This approach minimizes changes to routers, and in particular is simpler than more general routing architectures that try to shrink both RIB and FIB. With FIB suppression, there are no changes to BGP per se. The BGP decision process does not change. The selected AS-path does not change, and except on rare occasion the exit router does not change. ISPs can deploy FIB suppression autonomously and with no coordination with neighbor ASes.
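The core idea can be illustrated with a minimal sketch. The `Router` class, its `learn` method, and the next-hop names below are hypothetical and only model the distinction between RIB and FIB; they are not taken from any router implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Toy router: the RIB keeps every route, the FIB only the installed ones."""
    rib: dict = field(default_factory=dict)  # prefix -> BGP NEXT_HOP
    fib: dict = field(default_factory=dict)  # prefix -> BGP NEXT_HOP

    def learn(self, prefix: str, next_hop: str, install: bool) -> None:
        # BGP is unchanged: every selected route is stored in the RIB.
        self.rib[prefix] = next_hop
        # FIB suppression: only selected RIB entries are loaded into the FIB.
        if install:
            self.fib[prefix] = next_hop


router = Router()
router.learn("0.0.0.0/0", "core-1", install=True)        # installed
router.learn("192.0.2.0/24", "peer-a", install=False)    # suppressed
router.learn("198.51.100.0/24", "peer-b", install=True)  # installed
```

The suppressed route remains available to BGP (it is still in `router.rib`), but it consumes no FIB space.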
This document describes two styles of FIB suppression, "Edge Suppression" (ES) and "Virtual Aggregation" (VA). ES can be used in ISPs that deploy a "core-edge" topology, where edge routers can default route to core routers. In fact, this basic approach is in use today with edge routers whose external peers do not require the full DFRT, for instance stub networks. ES extends this to edge routers whose external peers do require the full DFRT, including neighbor ISPs and many multi-homed stub networks. ES requires that core routers load the full DFRT into FIBs (i.e. do no FIB suppression). ES operates by tunneling MPLS packets from the core, through edge routers, to external peers (although edge routers strip the MPLS header before forwarding packets to external peers). ES works with legacy core routers, although they must be capable of using MPLS tunnels. ES also works with any mix of legacy and upgraded edge routers. ES imposes minimal new configuration requirements on network operators.
By contrast, Virtual Aggregation (VA) allows for FIB suppression in any and all routers within an ISP. The savings can be dramatic, easily 5x or 10x with only a slight path length and router load increase [va‑tech‑report‑08] (Francis, P., Ballani, H., and T. Cao, “Virtual Aggregation: A Configuration-only Approach to Reducing FIB Size,” July 2008.). VA operates by organizing the IP (v4 or v6) address space into Virtual Prefixes (VP), and using tunnels to aggregate the (regular) sub-prefixes within each VP.
1.1.  Scope of this Document
The scope of this document is limited to Intra-domain ES and VA operation, that is, the case where a single ISP autonomously operates ES or VA internally without any coordination with neighboring ISPs.
Note that this document assumes that the ES or VA "domain" (i.e. the unit of autonomy) is the AS (that is, different ASes run VA independently and without coordination). For the remainder of this document, the terms ISP, AS, and domain are used interchangeably.
This document applies equally to IPv4 and IPv6.
ES or VA may operate with a mix of upgraded routers and legacy routers. There are no topological restrictions placed on the mix of routers. In order to avoid loops between upgraded and legacy routers, however, any legacy routers that require a full FIB MUST participate in tunnel formation (MPLS).
ES and VA use tunnels. While in principle a variety of tunnels may be used---any tunnel that works for deploying a VPN---this document limits itself to the use of MPLS tunnels, and indeed the terms "tunnel" and "LSP" (Label Switched Path) are used somewhat interchangeably. This document also generally assumes the use of the Label Distribution Protocol (LDP) as the default method of establishing LSPs [RFC5036] (Andersson, L., Minei, I., and B. Thomas, “LDP Specification,” October 2007.). Other methods of establishing LSPs may be used. Future versions of this document may specify the use of other tunnel types.
1.2.  Requirements notation
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119] (Bradner, S., “Key words for use in RFCs to Indicate Requirement Levels,” March 1997.).
1.3.  Terminology
1.3.1.  Terms common to both VA and ES
- Install and Suppress:
- The terms "install" and "suppress" are used to describe whether a RIB entry has been loaded or not loaded into the FIB (or, equivalently, the Routing Table). In other words, the phrase "install a route" means "install a route into the FIB", and the phrase "suppress a route" means "do not install a route into the FIB".
- Legacy Router:
- A router that does not run VA or ES, and has no knowledge of VA or ES. Legacy routers, however, must participate in tunneling (with the exception of edge routers in ES that do not carry the full DFRT).
- Popular Prefix:
- A popular prefix is a sub-prefix that is installed in a router in addition to the sub-prefixes it holds by virtue of being an Aggregation Point Router (in the case of VA), or in addition to the default route (in the case of ES). The popular prefix allows packets to follow the shortest path. Note that different routers do not need to have the same set of popular prefixes.
- Routing Table:
- The term Routing Table is defined here the same way as in Section 3.2 of [RFC4271] (Rekhter, Y., Li, T., and S. Hares, “A Border Gateway Protocol 4 (BGP-4),” January 2006.): "Routing information that the BGP speaker uses to forward packets (or to construct the forwarding table used for packet forwarding) is maintained in the Routing Table." As such, FIB Suppression can be achieved by not installing a route into the Routing Table.
- Routing Information Base (RIB):
- The term RIB is used rather sloppily in this document to refer either to the Loc-RIB (as used in [RFC4271] (Rekhter, Y., Li, T., and S. Hares, “A Border Gateway Protocol 4 (BGP-4),” January 2006.)), or to the combined Adj-RIBs-In, the Loc-RIB, and the Adj-RIBs-Out.
1.3.2.  Terms unique to VA
- Aggregation Point Router (APR):
- An Aggregation Point Router (APR) is a router that aggregates a Virtual Prefix (VP) by installing routes (into the FIB) for all of the sub-prefixes within the VP. APRs advertise the VP to other routers with BGP. For each sub-prefix within the VP, APRs have a Label Switched Path (LSP) from themselves to the external peer where packets for that prefix should be delivered.
- non-APR Router:
- In discussing VPs, it is often necessary to distinguish between routers that are APRs for that VP, and routers that are not APRs for that VP (but of course may be APRs for other VPs not under discussion). In these cases, the term "APR" will be taken to mean "a VA router that is an APR for the given VP", and the term "non-APR" will be taken to mean "a VA router that is not an APR for the given VP". The term non-APR router will not be used to refer to legacy routers.
- Sub-Prefix:
- A regular (physically aggregatable) prefix. These are equivalent to the prefixes that would normally comprise the DFRT in the absence of VA. A VA router will contain a sub-prefix entry either because the sub-prefix falls within a virtual prefix for which the router is an APR, or because the sub-prefix is installed as a popular prefix. Legacy routers hold the same sub-prefixes they hold today.
- VA router:
- A router that operates Virtual Aggregation according to this document.
- Virtual Prefix (VP):
- A Virtual Prefix (VP) is a prefix used to aggregate its contained regular prefixes (sub-prefixes). A VP is not physically aggregatable, and so it is aggregated at APRs through the use of tunnels.
- VP-List:
- A list of all VPs that must be statically configured into every VA router.
1.3.3.  Terms unique to ES
- Core router:
- A router deployed in the core of a core-edge topology. Core routers may be legacy routers, but they MUST participate in tunnel creation (i.e. they must run MPLS), and they MUST NOT do FIB suppression.
- ES router:
- An edge router that operates Edge Suppression according to this document.
1.4.  Temporary Sections
This section contains temporary information, and will be removed in the final version.
1.4.1.  Status as of September 2008
A "configuration-only" variant of VA (i.e. one that can be deployed with today's legacy routers) has been configured and tested on a small testbed of commercial routers, as described in [va‑tech‑report‑08] (Francis, P., Ballani, H., and T. Cao, “Virtual Aggregation: A Configuration-only Approach to Reducing FIB Size,” July 2008.). While this serves as proof that the data-plane portion of Virtual Aggregation works, this configuration is relatively complex, and there are some control-plane performance issues associated with the routers that we configured. The changes specified by this document (i.e. Section 4 (Specification of VA)) are currently under development.
2.  Overview of Virtual Aggregation (VA)
For descriptive simplicity, this section starts by describing VA assuming that there are no legacy routers in the domain. Section 2.1 (Mix of legacy and VA routers) describes the additional functions required by VA routers to accommodate legacy routers.
A key concept behind VA is to operate BGP as normal, and in particular to populate the RIB with the full DFRT, but to suppress many or most prefixes from being loaded into the FIB. By populating the RIB as normal, we avoid any changes to BGP, and changes to router operation are relatively minor. The basic idea behind VA is quite simple. The address space is partitioned into large prefixes --- larger than any aggregatable prefix in use today. These prefixes are called virtual prefixes (VP). Different VPs do not need to be the same size. They may be a mix of /6, /7, /8 (for IPv4), and so on. Each ISP can independently select the size of its VPs.
VPs are not themselves physically aggregatable. VA makes the VPs aggregatable through the use of tunnels, as follows. Associated with each VP are one or more "Aggregation Point Routers" (APR). An APR (for a given VP) is a router that installs routes for all sub-prefixes (i.e. real physically aggregatable prefixes) within the VP. By "install routes" here, we mean:
Note that the AS-path is not affected at all by VA. Furthermore, the external peer selected by the ISP is the same whether or not VA is operating. This path may not follow the shortest path within the ISP (where shortest path is defined here as the path that would have been taken if VA were not operating), because the APR may not be on the shortest path between the ingress and egress routers. When this happens, the packet experiences additional latency and creates extra load (by virtue of taking more hops than it otherwise would have).
VA can avoid traversing the APR for selected routes by installing these routes in ingress routers. In other words, even if an ingress router is not an APR for a given sub-prefix, it may install that sub-prefix into its FIB. Packets in this case are tunneled directly from the ingress to the egress. These routes are called "Popular Prefixes", and are typically installed for policy reasons (e.g. customer routes are always installed), or for sub-prefixes that carry a high volume of traffic (Section 4.2.5.1 (Selecting Popular Prefixes)). Different routers may have different popular prefixes. As such, an ISP may assign popular prefixes per router, per POP, or uniformly across the ISP. A given router may have zero popular prefixes, or the majority of its FIB may consist of popular prefixes. The effectiveness of popular prefixes in reducing traffic load relies on the fact that traffic volumes follow something like a power-law distribution: i.e. that 90% of traffic is destined to 10% of the destinations. Internet traffic measurement studies over the years have consistently shown that traffic patterns follow this distribution, though there is no guarantee that they always will.
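As a rough illustration of volume-based selection, the sketch below ranks sub-prefixes by measured traffic and fills the free FIB slots with the busiest ones. The function names and the byte counts are hypothetical; this document does not mandate any particular selection algorithm.

```python
def select_popular(traffic_bytes: dict, free_slots: int) -> list:
    """Pick the highest-volume sub-prefixes that fit in the free FIB slots."""
    ranked = sorted(traffic_bytes, key=traffic_bytes.get, reverse=True)
    return ranked[:free_slots]


def direct_share(traffic_bytes: dict, popular: list) -> float:
    """Fraction of traffic that is tunneled straight to the egress."""
    total = sum(traffic_bytes.values())
    return sum(traffic_bytes[p] for p in popular) / total


# Hypothetical per-prefix byte counts with a heavily skewed distribution.
traffic = {
    "203.0.113.0/24": 900,
    "198.51.100.0/24": 50,
    "192.0.2.0/24": 30,
    "20.1.0.0/16": 20,
}
popular = select_popular(traffic, free_slots=1)
share = direct_share(traffic, popular)
```

With this skewed input, a single popular prefix already keeps most of the traffic off the longer path through an APR, which is the effect the power-law argument relies on.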
Note that for routing to work properly, every packet must sooner or later reach a router that has installed a sub-prefix route that matches the packet. This would obviously be the case for a given sub-prefix if every router has installed a route for that sub-prefix (which of course is the situation in the absence of VA). If this is not the case, then there must be at least one Aggregation Point Router (APR) for the sub-prefix's virtual prefix (VP). Ideally, every POP contains at least two APRs for every virtual prefix. By having APRs in every POP, the latency imposed by routing to the APR is minimal (the extra hop is within the POP). By having more than one APR, there is a redundant APR should one fail. In practice it is often not possible to have an APR for every VP in every POP. This is because some POPs may have only one or a few routers, and therefore there may not be enough cumulative FIB space in the POP to hold every sub-prefix. Note that any router ("edge", "core", etc.) may be an APR.
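The two-stage lookup described in this section can be sketched with ordinary longest-prefix matching. The FIB contents and tunnel names below are hypothetical, and the `lookup` helper is an illustration only, not a forwarding-plane implementation.

```python
import ipaddress


def lookup(fib: dict, dst: str):
    """Longest-prefix match; returns the tunnel for dst, or None to discard."""
    addr = ipaddress.ip_address(dst)
    best_net, best_tunnel = None, None
    for prefix, tunnel in fib.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_tunnel = net, tunnel
    return best_tunnel


# Non-APR ingress router: one VP route plus one popular prefix.
ingress_fib = {
    "20.0.0.0/7": "tunnel-to-apr",
    "198.51.100.0/24": "tunnel-to-peer-b",
}
# APR for 20.0.0.0/7: holds the sub-prefixes within the VP.
apr_fib = {
    "20.1.0.0/16": "tunnel-to-peer-a",
    "21.5.0.0/16": "tunnel-to-peer-c",
}
```

A packet to 20.1.2.3 matches only the VP at the ingress and is tunneled to the APR, where the sub-prefix route selects the tunnel to the external peer. A packet to 198.51.100.7 matches the popular prefix and bypasses the APR.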
2.1.  Mix of legacy and VA routers
It is important that an ISP be able to operate with a mix of "VA routers" (routers upgraded to operate VA as described in this document) and "legacy routers". This allows ISPs to deploy VA in an incremental fashion and to continue to use routers that for whatever reason cannot be upgraded. This document allows such a mix, and indeed places no topological restrictions on that mix. It does, however, require that legacy routers establish and use LSPs, so that APRs can forward packets to them. Specifically, when a legacy router is a border router, it must initiate LSPs to itself, for instance using LDP [RFC5036] (Andersson, L., Minei, I., and B. Thomas, “LDP Specification,” October 2007.), and must use its own address as the BGP NEXT_HOP in routes received from external peers.
VA prevents the routing loops that might otherwise occur when VA routers and legacy routers are mixed, as follows. First of all, note that once a packet reaches a VA router (either because the ingress router is a VA router, or because a legacy router forwards the packet to a VA router), it will follow tunnels all the way to the egress router (Section 2 (Overview of Virtual Aggregation (VA))). If the egress router is a VA router, then the packet is forwarded via the LSP mapping. If the egress router is a legacy router, then it will forward the packet to the appropriate external peer using its FIB entry.
If the ingress router is a legacy router, then it will forward the packet to the BGP NEXT_HOP via the associated tunnel.
Note that even in the unexpected case that some ingress legacy router does not use the tunnel but rather forwards the packet to the IGP-resolved next hop, the packet will work its way towards the egress router: it will either progress through a series of legacy routers (in which case the IGP prevents loops), or it will eventually reach a VA router (after which it will exit the AS via tunnels as described above).
2.2.  Summary of Tunnels and Paths
To summarize, the following tunnels are created:
   Ingress   Some        APR         Egress      External
   Router    Router      Router      Router      Peer
   -------   ------      ------      ------      --------
1. VA===================>VA=========>VA(pop)====>LR
2. VA===================>VA=========>LR--------->LR
3. VA===============================>VA(pop)====>LR
4. VA===============================>LR--------->LR
5. LR===============================>VA(pop)====>LR
6. LR===============================>LR--------->LR
(the following two are not expected, but may exist with some legacy router)
7. LR------->VA (remaining paths as in 1 to 4 above)
8. LR------->LR--------------------->LR--------->LR
The first and second paths represent the case where the ingress router does not have a popular prefix for the destination, and must tunnel the packet to an APR. The third and fourth paths represent the case where the ingress router does have a popular prefix for the destination, and so tunnels the packet directly to the egress. The fifth and sixth paths are similar, but where the ingress is a legacy router, and effectively has the popular prefix by virtue of holding the entire DFRT. (Note that some ISPs have only partial RIBs in their customer-facing edge routers, and default route to a router that holds the full DFRT. This case is not shown here.) Finally, paths 7 and 8 represent the unexpected case where legacy routers use an IGP-resolved next hop rather than a tunnel.
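The six expected paths can be restated as a small decision function. This is only a reading aid for the summary above; the function name and its boolean parameters are hypothetical.

```python
def path_number(ingress_is_va: bool, has_popular_prefix: bool, egress_is_va: bool) -> int:
    """Map a forwarding situation to the expected path number (1 to 6)."""
    if not ingress_is_va:
        # A legacy ingress holds the full DFRT and tunnels straight to the egress.
        return 5 if egress_is_va else 6
    if has_popular_prefix:
        # A VA ingress with a popular prefix also bypasses the APR.
        return 3 if egress_is_va else 4
    # Otherwise the VA ingress tunnels to an APR first.
    return 1 if egress_is_va else 2
```

Paths 7 and 8 are deliberately not modeled, since they arise only when a legacy router fails to use the tunnel.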
3.  Specification of Edge Suppression (ES)
Edge Suppression can be thought of as VA with only a single VP (i.e. the /0). Its operation, however, is much simpler. The topology for ES consists of core routers and ES routers. Core routers MUST install (into the FIB) the full DFRT, and MUST participate in tunnels as described below. Any legacy router with tunneling capability and a large enough FIB can be a core router.
ES routers are deployed at the edge. They MUST have a default route to (or towards) a core router, which MUST be installed. This style of configuration is common today, and so it is not necessary to specify here how the default route is configured and managed. The default route is the only route that ES routers must install, although they may (and typically will) install additional routes. Note that core routers or route reflectors that iBGP peer with an ES router may choose to filter routes they send to the ES router, with the obvious result that the ES router RIB will not contain the full DFRT. This can only be done if the ES router's external peers do not require the full DFRT. Whether or not an ISP chooses to do this is orthogonal to the operation of ES per se, and is not mentioned again.
ES routers initiate MPLS Label Switched Paths (LSP, or tunnel) that terminate at each of their external peers, which are then used by other routers to forward packets to their external peers. Specifically, ES routers MUST do the following:
It is important that any router that has a tunnel to the BGP NEXT_HOP of a route use that tunnel. This should be normal behavior for any router, but ISPs must take care to ensure that this is the case.
Sometimes an ES router may receive a packet from one external peer that needs to be forwarded to another of its external peers. If the only route in the FIB is the default route, then the packet will be routed to a core router, which will forward the packet back to the ES router via a tunnel. The extra hops can be avoided if the ES router installs additional prefixes into the FIB, but under certain constraints to prevent loops. Specifically, the router SHOULD install any routes where the IGP next hop router is not the same router as that of the default route, but only under the following conditions:
New configuration requirements for Edge Suppression (i.e. in addition to the configuration required today to deploy a core-edge topology with default routes at the edge) are minimal. The administrator must tell the ES router that it is an ES router, and must indicate the default route (including backup defaults). Given this, the ES router can automatically establish the appropriate tunnels, install the default route and the additional routes, and suppress all other routes.
4.  Specification of VA
This section describes how to operate VA. It starts with a brief discussion of requirements, followed by a specification of router support for VA.
4.1.  Requirements for VA
While the core requirement is of course to be able to manage FIB size, this must be done in a way that:
In short, operation of VA must not significantly affect the way ISPs operate their networks today. Section 4.3 (Requirements Discussion) discusses the extent to which these requirements are met by the design presented in Section 4.2 (VA Operation).
4.2.  VA Operation
In this section, the detailed operation of VA is specified.
4.2.1.  Legacy Routers
VA can operate with a mix of VA and legacy routers. Although legacy routers have no notion of VA, they nevertheless MUST satisfy the following requirements:
As long as legacy routers install LSPs as described here, there are no topological restrictions on the legacy routers. They may be freely mixed with VA routers without the possibility of forming sustained loops (Section 2.1 (Mix of legacy and VA routers)).
4.2.2.  Advertising and Handling Virtual Prefixes (VP)
VA routers must be able to distinguish VP's from sub-prefixes. This is primarily in order to know which routes to install. In particular, non-APR routers must know which prefixes are VPs before they receive routes for those VPs, for instance when they first boot up. This is in order to avoid the situation where they unnecessarily start filling their FIB with routes that they ultimately don't need to install (Section 4.2.5 (Suppressing FIB Sub-prefix Routes)).
It MUST be possible to statically configure the complete list of VP's into all VA routers. This list is known as the VP-List.
From the point of view of best-match routing semantics, VPs are treated identically to any other prefix. In other words, if the longest matching prefix is a VP, then the packet is routed towards the VP. If a packet matching a VP reaches an Aggregation Point Router (APR) for that VP, and the APR does not have a better matching route, then the packet is discarded by the APR (just as a router that originates any prefix will discard a packet that does not have a better match).
The overall semantics of VPs, however, are subtly different from those of real prefixes (well, maybe not so subtly). Without VA, when a router originates a route for a (real) prefix, the expectation is that the addresses within the prefix are within the originating AS (or a customer of the AS). For VPs, this is not the case. APRs originate VPs whose sub-prefixes exist in different ASes. Because of this, it is important that VPs not be advertised across AS boundaries.
It is up to individual domains to define their own VPs. VPs MUST be "larger" (span a larger address space) than any real sub-prefix. If a VP is smaller than a real prefix, then packets that match the real prefix will nevertheless be routed to an APR owning the VP, at which point the packet will be dropped if it does not match a sub-prefix within the VP (Section 6 (Security Considerations)).
(Note that, in principle there are cases where a VP could be smaller than a real prefix. This is where the egress router to the real prefix is a VA router. In this case, the APR could theoretically tunnel the packet to the appropriate external peer, which would then forward the packet correctly. On the other hand, if the egress router is a legacy router, then the APR could not tunnel matching packets to the egress. This is because the egress would view the VP as a better match, and would loop the packet back to the APR. For this reason we require that VPs be larger than any real prefixes, and that APR's never install prefixes larger than a VP in their FIBs.)
It is valid for a VP to be a subset of another VP. For example, 20/7 and 20/8 can both be VPs. In fact, this capability is necessary for "splitting" a VP without increasing the FIB size in any router (Section 4.2.2.5 (Adding and deleting VP's)).
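An operator could check a proposed VP-List against the real prefixes seen in the RIB before configuring it. The sketch below is one possible offline check, under the assumption that a violation means some real prefix is equal to or covers a VP; `vp_violations` is a hypothetical helper, and nested VPs are deliberately not flagged.

```python
import ipaddress


def vp_violations(vp_list: list, real_prefixes: list) -> list:
    """Return (vp, real) pairs where a real prefix is as large as, or larger than, the VP."""
    bad = []
    for vp in map(ipaddress.ip_network, vp_list):
        for real in map(ipaddress.ip_network, real_prefixes):
            # subnet_of() requires both networks to be the same IP version.
            if vp.version == real.version and vp.subnet_of(real):
                bad.append((str(vp), str(real)))
    return bad


vp_list = ["20.0.0.0/7", "20.0.0.0/8"]           # overlapping VPs are allowed
ok_rib = ["20.1.0.0/16", "198.51.100.0/24"]
bad_rib = ok_rib + ["16.0.0.0/4"]                # a real prefix covering both VPs
```

Such a check is meant to be run by the operator as a diagnostic; it is not an automatic mechanism that changes the VP-List.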
Any router may be configured as an Aggregation Point Router (APR) for one or more Virtual Prefixes (VP). For each VP for which a router is an APR, the router does the following:
An ISP is free to select APRs however it chooses. The details of this are outside the scope of this document. Nevertheless, a few comments are made here. In general, APRs should be selected such that the distance to the nearest APR for any VP is small---ideally within the same POP. Depending on the number of routers in a POP, and the sizes of the FIBs in the routers relative to the DFRT size, it may not be possible for all VPs to be represented in a given POP. In addition, there should be multiple APRs for each VP, again ideally in each POP, so that the failure of one does not unduly disrupt traffic.
APRs may be (and probably should be) statically assigned. They may also, however, be dynamically assigned, for instance in response to APR failure. For instance, each router may be assigned as a backup APR for some other APR. If the other APR crashes (as indicated by the withdrawal of its routes to its VPs), the backup APR can install the appropriate sub-prefixes and advertise the VP as specified above. Note that doing so may require it to first remove some popular prefixes from its FIB to make room.
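The backup behavior described above, including making room by removing popular prefixes, might look like the following sketch. The function, its parameters, and the FIB capacity model are hypothetical; in particular, ordering popular prefixes from least to most valuable is an assumption of this example.

```python
def take_over_vp(fib: set, capacity: int, popular_least_first: list, vp_subs: list) -> list:
    """Make a backup APR install all sub-prefixes of a VP; return evicted prefixes."""
    needed = [p for p in vp_subs if p not in fib]
    overflow = len(fib) + len(needed) - capacity
    # Only popular prefixes outside the VP may be evicted.
    evictable = [p for p in popular_least_first if p in fib and p not in vp_subs]
    if overflow > len(evictable):
        raise RuntimeError("not enough FIB space to become an APR for this VP")
    evicted = evictable[:max(overflow, 0)]
    fib.difference_update(evicted)
    fib.update(needed)
    return evicted


fib = {"192.0.2.0/24", "203.0.113.0/24", "198.51.100.0/24"}
popular_least_first = ["192.0.2.0/24", "203.0.113.0/24", "198.51.100.0/24"]
vp_subs = ["20.1.0.0/16", "20.2.0.0/16", "21.5.0.0/16"]
evicted = take_over_vp(fib, capacity=4, popular_least_first=popular_least_first, vp_subs=vp_subs)
```

Only after the sub-prefixes are installed would the backup router advertise the VP.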
Note that, although VPs MUST be larger than real prefixes, there is intentionally no mechanism designed to automatically ensure that this is the case. Such a mechanism would be dangerous. For instance, if an ISP somewhere advertised a very large prefix (a /4, say), then this would cause APRs to throw out all VPs that are smaller than this. For this reason, VPs must be set through static configuration only.
4.2.2.4.  Non-APR Routers
A non-APR router MUST install at least the following routes:
If the non-APR has a tunnel to the BGP NEXT_HOP of any such route, it MUST use the tunnel to forward packets to the BGP NEXT_HOP.
When an APR fails, routers MUST select another APR to send packets to (if there is one). This happens, however, through normal internal BGP convergence mechanisms. Note that it is strongly recommended that routers keep at least two VP routes in their RIB at all times. The main reason is that if the currently used VP route is withdrawn, the second VP route can be immediately installed, and the issue of whether to temporarily install sub-prefixes in the FIB is avoided (Section 4.2.5 (Suppressing FIB Sub-prefix Routes)). Another reason is that the IGP can be used to even more quickly detect that the APR has crashed, again allowing the second VP route to be immediately installed.
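The benefit of holding a second VP route can be sketched as follows. The `VpRoutes` class is hypothetical, and it simply treats the order of advertisement as the preference order; a real router would use the normal BGP decision process.

```python
class VpRoutes:
    """Tracks the VP routes in the RIB and the one currently installed in the FIB."""

    def __init__(self):
        self.rib = {}  # vp -> list of APR next hops, most preferred first
        self.fib = {}  # vp -> APR next hop currently installed

    def advertise(self, vp: str, apr: str) -> None:
        self.rib.setdefault(vp, []).append(apr)
        self.fib[vp] = self.rib[vp][0]

    def withdraw(self, vp: str, apr: str) -> None:
        self.rib[vp].remove(apr)
        if self.rib[vp]:
            # A second VP route is already in the RIB: install it immediately.
            self.fib[vp] = self.rib[vp][0]
        else:
            # No APR left: the VP route disappears from the FIB, and the router
            # would have to consider installing the sub-prefixes instead.
            del self.fib[vp]


routes = VpRoutes()
routes.advertise("20.0.0.0/7", "apr-1")
routes.advertise("20.0.0.0/7", "apr-2")
routes.withdraw("20.0.0.0/7", "apr-1")
```

After the withdrawal, the FIB entry for the VP points at `apr-2` without any sub-prefix having been installed.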
4.2.2.5.  Adding and deleting VP's
An ISP may from time to time wish to reconfigure its VP-List. There are a number of reasons. For instance, early in its deployment an ISP may configure one or a small number of VPs in order to test VA. As the ISP gets more confident with VA, it may increase the number of VPs. Or, an ISP may start with a small number of large VPs (e.g. /4's), and over time move to more smaller VPs in order to save even more FIB. In this case, the ISP will need to "split" a VP. Finally, since the address space is not uniformly populated with prefixes, the ISP may want to change the size of VPs in order to balance FIB size across routers. This can involve both splitting and merging VPs. Of course, an ISP MUST be able to modify its VP-List without 1) interrupting service to any destinations, or 2) temporarily increasing the size of any FIB (i.e. the FIB size during the change must be no bigger than its size either before or after the change).
Adding a VP is straightforward. The first step is to configure the APRs for the VP. This causes the APRs to originate routes for the VP. Non-APR routers will install this route according to the rules in Section 4.2.2.4 (Non-APR Routers), even though they do not yet recognize that the prefix is a VP. Subsequently the VP is added to the VP-List of non-APR routers. The Non-APR routers can then start suppressing the sub-prefixes with no loss of service.
To delete a VP, the process is reversed. First, the VP is removed from the VP-Lists of non-APRs. This causes the non-APRs to install the sub-prefixes. After all sub-prefixes have been installed, the VP may be removed from the APRs.
In many cases, it is desirable to split a VP. For instance, consider the case where two routers, R1 and R2, are APRs for the same prefix. It would be possible to shrink the FIB in both routers by splitting the VP into two VPs (i.e. split one /6 into two /7's), and assigning each router to one of the VPs. While this could in theory be done by first deleting the larger VP, and then adding the smaller VPs, doing so would temporarily increase the FIB size in non-APRs, which may not have adequate space for such an increase. For this reason, we allow overlapping VPs.
To split a VP, first the two smaller VPs are added to the VP-lists of all non-APR routers (in addition to the larger superset VP). Next, the smaller VPs are added to the selected APRs (which may or may not be APRs for the larger VP). Because the smaller VPs are a better match than the larger VP, this will cause the non-APR routers to forward packets to the APRs for the smaller VPs. Next, the larger VP can be removed from the VP-lists of all non-APR routers. Finally, the larger VP can be removed from its APRs.
Finally, to merge two VPs, the new larger VP is configured in all non-APRs. This has no effect on FIB size or APR selection, since the smaller VPs are better matches. Next the larger VP is configured in its selected APRs. Next the smaller VPs are deleted from all non-APRs. Finally, the smaller VPs are deleted from their corresponding APRs.
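The split procedure can be walked through with a small simulation. It assumes a simplified suppression rule: a non-APR router suppresses a sub-prefix only if it falls within a VP that is both in its VP-List and currently advertised by some APR. That rule, the `non_apr_fib` helper, and the prefixes used are assumptions made for illustration.

```python
import ipaddress


def non_apr_fib(vp_list: set, advertised_vps: set, sub_prefixes: list) -> set:
    """FIB of a non-APR router: VP routes plus any sub-prefix it cannot suppress."""
    fib = set(advertised_vps)
    usable = [ipaddress.ip_network(v) for v in vp_list if v in advertised_vps]
    for sub in sub_prefixes:
        net = ipaddress.ip_network(sub)
        if not any(net.subnet_of(vp) for vp in usable):
            fib.add(sub)
    return fib


SUBS = ["20.1.0.0/16", "23.2.0.0/16"]
BIG, LOW, HIGH = "20.0.0.0/6", "20.0.0.0/7", "22.0.0.0/7"

# (VP-List of the non-APR router, VPs advertised by APRs) at each step.
steps = [
    ({BIG}, {BIG}),                        # before the split
    ({BIG, LOW, HIGH}, {BIG}),             # smaller VPs added to the VP-List
    ({BIG, LOW, HIGH}, {BIG, LOW, HIGH}),  # APRs advertise the smaller VPs
    ({LOW, HIGH}, {BIG, LOW, HIGH}),       # larger VP removed from the VP-List
    ({LOW, HIGH}, {LOW, HIGH}),            # larger VP removed from its APRs
]
installed_subs = [sorted(non_apr_fib(vl, adv, SUBS) & set(SUBS)) for vl, adv in steps]
```

Under these assumptions no sub-prefix is ever installed in the non-APR router during the split, whereas deleting the larger VP first (an empty VP-List with nothing advertised) would force both sub-prefixes into the FIB.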
VA routers that are border routers MUST do the following:
(Note that an alternative approach would be to use stacked labels, with the outer label terminating at the border router, and the inner label identifying the external peer and distributed in BGP as described in [RFC3107]. This approach requires that fewer tunnels be installed by LDP. The need for this approach is for further study.)
Sub-prefixes are advertised and handled by BGP as normal. VA does not affect this behavior. The only difference in the handling of sub-prefixes is that they might not be installed in the FIB, as described in Section 4.2.5 (Suppressing FIB Sub-prefix Routes).
In those cases where the route is installed, packets forwarded to prefixes external to the AS MUST be transmitted via the LSP established as described in Section 4.2.3 (Border VA Routers).
Any route not for a known VP (i.e. not in the VP-List) is taken to be a sub-prefix. The following rules are used to determine if a sub-prefix route can be suppressed.
Individual routers may independently choose which sub-prefixes are popular prefixes. There is no need for different routers to install the same sub-prefixes. There is therefore significant leeway as to how routers select popular prefixes. As a general rule, routers should fill the FIB as much as possible, because the cost of doing so is relatively small, and more FIB entries lead to fewer packets taking a longer path. Broadly speaking, an ISP may choose to fill the FIB by making routers APRs for as many VPs as possible, or by assigning relatively few APRs and instead filling the FIB with popular prefixes. Several basic approaches to selecting popular prefixes are outlined here. Router vendors are free to implement whatever approaches they want.
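As one illustration of the leeway described above, a router could fill its spare FIB space with the sub-prefixes that carry the most traffic. The Python sketch below is a minimal example of that idea and is not a mechanism defined by this document; the function name, the per-prefix traffic counters, and the notion of a fixed number of free FIB slots are all assumptions.

```python
import heapq

def select_popular(volume_by_prefix, free_slots):
    """Pick up to free_slots sub-prefixes with the highest measured
    traffic volume (hypothetical counters, e.g. bytes per day)."""
    if free_slots <= 0:
        return []
    return heapq.nlargest(free_slots, volume_by_prefix,
                          key=volume_by_prefix.get)

# Hypothetical measurements for four suppressible sub-prefixes.
volume = {
    "22.1.0.0/16": 900,
    "22.2.0.0/16": 50,
    "23.9.0.0/16": 400,
    "21.7.0.0/16": 10,
}

popular = select_popular(volume, free_slots=2)
print(popular)
```

Each router can run such a selection independently, since there is no requirement that different routers install the same popular prefixes.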
This section describes the extent to which VA satisfies the list of requirements given in Section 4.1 (Requirements for VA).
VA introduces a new failure mode in the form of Aggregation Point Router (APR) failure. There are two basic approaches to protecting against APR failure: static APR redundancy and dynamic APR assignment (see Section 4.2.2.3.1 (Selecting APRs)). In static APR redundancy, enough APRs are assigned for each Virtual Prefix (VP) so that if one goes down, there are others to absorb its load. Failover to a static redundant APR is automatic with existing BGP mechanisms. If an APR crashes, BGP will cause packets to be routed to the next nearest APR. Nevertheless, there are three concerns here: convergence time, load increase at the redundant APR, and latency increase for diverted flows.
Regarding convergence time, note that, while fast-reroute mechanisms apply to the rerouting of packets to a given APR or egress router, they don't apply to APR failure. Convergence time was discussed in Section 4.2.2.4 (Non-APR Routers), which suggested that BGP convergence times are likely to be adequate and that, if they are not, IGP mechanisms may be used.
Regarding load increase, in general this is relatively small. This is because substantial reductions in FIB size can be achieved with almost negligible increase in load. For instance, [va-tech-report-08] shows that a 5x reduction in FIB size yields a less than one percent increase in load overall. Given this, depending on the configuration of redundant APRs, failure of one APR increases the load of its backups by only a few percent. This is well within the variation seen in normal traffic loads.
Regarding latency increase, some flows may see a significant increase in delay (and, specifically, an increase that puts them outside their SLA boundaries). Normally a redundant APR would be placed within the same POP, and so increased latency would be minimal (assuming that load is also quite small, and so there is no significant queuing delay). It is not always possible, however, to have an APR for every VP within every POP, much less a redundant APR within every POP, and so sometimes failure of an APR will result in significant latency increases for a small fraction of traffic.
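The static-redundancy failover described at the start of this section, and the latency concern that follows from it, can be pictured with a toy model. The Python sketch below only models the outcome of route selection (the nearest APR that is still alive); it does not model BGP itself. The APR names and IGP costs are hypothetical.

```python
def nearest_apr(igp_cost, alive):
    """Return the live APR with the lowest IGP cost from this router,
    or None if no APR for the VP is alive."""
    live = {apr: cost for apr, cost in igp_cost.items() if apr in alive}
    return min(live, key=live.get) if live else None

# Hypothetical IGP costs from one non-APR router to three APRs of a VP.
igp_cost = {"apr-same-pop": 10, "apr-nearby-pop": 25, "apr-remote-pop": 40}

before = nearest_apr(igp_cost, {"apr-same-pop", "apr-nearby-pop", "apr-remote-pop"})
after = nearest_apr(igp_cost, {"apr-nearby-pop", "apr-remote-pop"})
print(before, after)
```

In this example the failure of the in-POP APR diverts traffic to an APR at a higher IGP cost, which corresponds to the latency increase discussed above when no redundant APR exists within the same POP.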
VA complicates traffic engineering because the placement of APRs and the selection of popular prefixes influence how packets flow. (To repeat, however, the increase in load is likely to be minimal, and so the effect on traffic engineering should not be great.) Since the majority of packets may be forwarded by popular prefixes (and therefore follow the shortest path), it is particularly important that popular prefixes be selected appropriately. As discussed in Section 4.2.5.1 (Selecting Popular Prefixes), there are static and dynamic approaches to this. [va-tech-report-08] shows that high-volume prefixes tend to stay high-volume for many days, and so a static strategy is probably adequate. VA can operate correctly using either RSVP-TE [RFC3209] or LDP to establish tunnels.
It must be possible to install and configure VA in a safe and incremental fashion, as well as start it up when routers reboot. This document allows for a mixture of VA and legacy routers, allows a fraction or all of the address space to fall within virtual prefixes, and allows different routers to suppress different FIB entries (including none at all). As a result, it is generally possible to deploy and test VA in an incremental fashion. Although MPLS and LDP must be operational everywhere, once done, an ISP can incrementally increase the number of VA routers, the number of VPs, and the number of suppressed FIB entries over time.
Likewise, routers can bootstrap VA by first bringing up the IGP, then establishing LSPs, then establishing routes to all required sub-prefixes, and finally advertising VPs.
Regarding ingress filtering, because in VA the RIB is effectively unchanged, routers contain the same information they have today for installing ingress filters [RFC2827]. Presumably, installing an ingress filter in the FIB takes up some memory space. Since ingress filtering is most effective at the "edge" of the network (i.e. at the customer interface), the number of FIB entries for ingress filtering should remain relatively small, equal to the number of prefixes owned by the customer. Whether this is true in all cases remains for further study.
Regarding DoS attacks, there are two issues that need to be considered. First, does VA result in new types of DoS attacks? Second, does VA make it more difficult to deploy DoS defense systems? Regarding the first issue, one possibility is that an attacker targets a given router by flooding the network with traffic to prefixes that are not popular, and for which that router is an APR. This would cause a disproportionate amount of traffic to be forwarded to the APR(s). While it is up to individual ISPs to decide if this attack is a concern, it does not strike the authors that this attack is likely to significantly worsen the DoS problem.
Regarding DoS defense system deployment, more input about specific systems is needed. It is the authors' understanding, however, that at least some of these systems use dynamically established Routing Table entries to divert victims' traffic into LSPs that carry the traffic to scrubbers. The expectation is that this mechanism simply over-rides whatever route is in place (with or without VA), and so the operation of VA should not limit the deployment of these types of DoS defense systems. Nevertheless, more study is needed here.
VA places new configuration requirements on ISP administrators. Namely, the administrator must:
There are no IANA considerations.
We consider the security implications of VA under two scenarios, one where VA is configured and operated correctly, and one where it is mis-configured. A cornerstone of VA operation is that the basic behavior of BGP doesn't change, especially inter-domain. Among other things, this makes it easier to reason about security.
If VA is configured and operated properly, then the external behavior of an AS does not change. The same upstream ASes are selected, and the same prefixes and AS-paths are advertised. Therefore, a properly configured VA domain has no security impact on other domains.
This document discusses intra-domain security concerns in Section 4.3.4 (VA security), which argues that any new security concerns appear to be relatively minor.
If another ISP starts advertising a prefix that is larger than a given VP, this prefix will be ignored by APRs that have a VP that falls within the larger prefix (Section 4.2.2.3 (Aggregation Point Routers (APR))). As a result, packets that might otherwise have been routed to the new larger prefix will be dropped at the APRs. Note that the trend in the Internet is towards large prefixes being broken up into smaller ones, not the reverse. Therefore, such a larger prefix is likely to be invalid. If it is determined without a doubt that the larger prefix is valid, then the ISP will have to reconfigure its VPs.
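The check described above, in which an APR ignores a received prefix that is larger than one of its own VPs, can be sketched as follows. This Python fragment is illustrative only; the function name and example prefixes are assumptions, and the actual rule is the one given in Section 4.2.2.3.

```python
import ipaddress

def apr_accepts(route, own_vps):
    """Return False if the received route strictly covers any VP for
    which this router is an APR; such a route is ignored."""
    return not any(vp != route and vp.subnet_of(route) for vp in own_vps)

# Hypothetical VP for which this router is an APR.
own_vps = [ipaddress.ip_network("20.0.0.0/6")]

covering = ipaddress.ip_network("16.0.0.0/4")     # contains 20.0.0.0/6
sub_prefix = ipaddress.ip_network("22.1.0.0/16")  # inside the VP
unrelated = ipaddress.ip_network("64.0.0.0/4")    # disjoint from the VP

for route in (covering, sub_prefix, unrelated):
    print(route, apr_accepts(route, own_vps))
```

Only the covering /4 is ignored in this example; sub-prefixes and unrelated prefixes are handled as usual.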
VA introduces the possibility that a VP is advertised outside of an AS. This should in fact be a low-probability event, but it is considered here nonetheless.
If an AS leaks a large VP (i.e. larger than any real prefixes), then the impact is minimal. Smaller prefixes will be preferred because of best-match semantics, and so the only impact is that packets that otherwise have no matching routes will be sent to the misbehaving AS and dropped there. If an AS leaks a small VP (i.e. smaller than a real prefix), then packets to that AS will be hijacked by the misbehaving AS and dropped. This can happen with or without VA, and so doesn't represent a new security problem per se.
The authors would like to acknowledge the efforts of Xinyang Zhang and Jia Wang, who worked on CRIO (Core Router Integrated Overlay), an early inter-domain variant of FIB suppression, and the efforts of Hitesh Ballani and Tuan Cao, who worked on the configuration-only variant of VA that works with legacy routers. We would also like to thank Hitesh and Tuan, as well as Scott Brim, Daniel Ginsburg, Robert Raszuk, and Rajiv Asati for their helpful comments. In particular, Daniel's comments significantly simplified the spec (eliminating the need for a new External Communities Attribute), and Robert suggested Edge Suppression.
[RFC1997] Chandrasekeran, R., Traina, P., and T. Li, “BGP Communities Attribute,” RFC 1997, August 1996.
[RFC2119] Bradner, S., “Key words for use in RFCs to Indicate Requirement Levels,” BCP 14, RFC 2119, March 1997.
[RFC2328] Moy, J., “OSPF Version 2,” STD 54, RFC 2328, April 1998.
[RFC2827] Ferguson, P. and D. Senie, “Network Ingress Filtering: Defeating Denial of Service Attacks which employ IP Source Address Spoofing,” BCP 38, RFC 2827, May 2000.
[RFC3107] Rekhter, Y. and E. Rosen, “Carrying Label Information in BGP-4,” RFC 3107, May 2001.
[RFC3209] Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V., and G. Swallow, “RSVP-TE: Extensions to RSVP for LSP Tunnels,” RFC 3209, December 2001.
[RFC4271] Rekhter, Y., Li, T., and S. Hares, “A Border Gateway Protocol 4 (BGP-4),” RFC 4271, January 2006.
[RFC5036] Andersson, L., Minei, I., and B. Thomas, “LDP Specification,” RFC 5036, October 2007.
[va-tech-report-08] Francis, P., Ballani, H., and T. Cao, “Virtual Aggregation: A Configuration-only Approach to Reducing FIB Size,” Cornell Technical Report, http://hdl.handle.net/1813/11058, July 2008.
Paul Francis
Cornell University
4108 Upson Hall
Ithaca, NY 14853
US
Phone: +1 607 255 9223
Email: francis@cs.cornell.edu
Xiaohu Xu
Huawei Technologies
No.3 Xinxi Rd., Shang-Di Information Industry Base, Hai-Dian District
Beijing, Beijing 100085
P.R.China
Phone: +86 10 82836073
Email: xuxh@huawei.com
Hitesh Ballani
Cornell University
4130 Upson Hall
Ithaca, NY 14853
US
Phone: +1 607 279 6780
Email: hitesh@cs.cornell.edu
Copyright © The IETF Trust (2008).
This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.
This document and the information contained herein are provided on an “AS IS” basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.