MPLS | C. Villamizar, Ed. |
Internet-Draft | Outer Cape Cod Network Consulting |
Intended status: Informational | K. Kompella |
Expires: April 10, 2013 | Contrail Systems |
October 09, 2012 |
MPLS Forwarding Compliance and Performance Requirements
draft-villamizar-mpls-forwarding-00
This document provides guidelines for implementors regarding MPLS forwarding and a basis for evaluations of forwarding implementations. Guidelines cover basic MPLS forwarding, forwarding when a deep MPLS label stack is encountered, MPLS UHP operations which require one or more label POP plus a PUSH, guidelines for hashing an MPLS stack and payload for multipath, and conformance and performance requirements for recent pseudowire and MPLS standards.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 10, 2013.
Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
The document addresses concerns raised on the MPLS WG mailing list about shortcomings in implementations of MPLS forwarding.
Although this document is informational, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are used. For those who wish to take the advice of this document, these keywords SHOULD be interpreted as described in RFC 2119 [RFC2119]. Similarly, the References section is split into Normative and Informative subsections. In this case references which are normative for forwarding are listed as normative. References which describe signaling only, though normative with respect to signaling, are listed as informative here, as they are informative with respect to MPLS forwarding.
In early generations of forwarding silicon (which may now be behind us), there apparently were some misconceptions about MPLS. The following statements may clear up some of these misconceptions.
See Section 2.2.
This document is intended for multiple audiences: implementor (implementing MPLS forwarding in silicon or in software); systems designer (putting together an MPLS forwarding system); deployer (running an MPLS network). These guidelines are intended to serve the following purposes:
The implementor, systems designer, and deployer have a transitive supplier-customer relationship. It is in the best interest of the supplier to review their product against their customer's checklist and, if applicable, their customer's customer's checklist.
A brief review of forwarding issues is provided in the subsections that follow. This section provides some background on why some of these requirements exist. The questions to ask of suppliers and the tests to perform are covered in the following sections, Section 3 and Section 4.
Basic MPLS architecture and MPLS encapsulation, and therefore packet forwarding, are defined in [RFC3031] and [RFC3032]. RFC3031 and RFC3032 are somewhat LDP centric. RSVP-TE supports traffic engineering (TE) and fast reroute, features that LDP lacks. The base document for RSVP-TE based MPLS is [RFC3209].
A few RFCs update RFC3032. Those with impact on forwarding include the following.
A few RFCs update RFC3209. Those that are listed as updating RFC3209 generally impact only RSVP-TE signaling. Forwarding is modified by major extensions built upon RFC3209. Some of these extensions are discussed in the following subsections.
MPLS deployments in the early part of the prior decade (circa 2000) tended to support either LDP or RSVP-TE. LDP was favored by some for its ability to scale close to the network edges without adding deployment complexity. RSVP-TE was favored where traffic engineering or fast reroute were considered important.
The use of MPLS FRR [RFC4090] added a second label to MPLS traffic, but only when FRR protection was in use.
At least one major service provider made use of LDP over RSVP-TE in their core network in the circa 2000-2005 time frame. LDP supported VPN services to the provider edges. RSVP-TE provided TE and FRR in the core. This yields two labels on nearly all packets in the core. They also used FRR which yields three labels on a large subset of traffic while FRR protection is active. VPNs added yet another label, bringing the label stack depth (with FRR) to four.
MPLS Link Bundling was the first RFC to address the need for multiple parallel links between nodes [RFC4201]. MPLS Link Bundling is notable in that it tried not to change MPLS forwarding, except in specifying the "All-Ones" component link. MPLS Link Bundling is seldom if ever deployed. Instead multipath techniques described in Section 2.3 are used.
MPLS hierarchy is defined in [RFC4206]. Although RFC4206 is considered part of GMPLS, the Packet Switching Capable (PSC) portion of the MPLS hierarchy is applicable to MPLS and may be supported in an otherwise GMPLS-free implementation. The MPLS PSC hierarchy remains the most likely means of providing further scaling in an RSVP-TE MPLS network, particularly where the network is designed to provide RSVP-TE connectivity to the edges. This is the case for envisioned MPLS-TP networks. The use of the MPLS PSC hierarchy can add as many as four labels to a label stack, though it is likely that only one layer of PSC will be used in the near future.
While the average packet size of Internet traffic may be large, long sequences of small packets have both been predicted in theory and observed in practice. Traffic compression and TCP ACK compression can conspire to create long sequences of packets of 40-44 bytes in payload length. If carried over Ethernet, the 64 byte minimum frame size applies, yielding a packet rate of approximately 150 Mpps (million packets per second) on a nominal 100 Gb/s link for the duration of the burst. The peak rate is higher for other encapsulations, as high as 250 Mpps.
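The packet rate figure above can be checked with a little arithmetic. The sketch below is illustrative only and rests on stated assumptions: a nominal 100 Gb/s Ethernet link, back-to-back 64 byte frames, and 20 bytes of per-frame overhead on the wire (8 bytes of preamble and start-of-frame delimiter plus a 12 byte inter-frame gap). The function name and parameters are hypothetical.

```python
def ethernet_packet_rate(link_bps, frame_bytes=64, overhead_bytes=20):
    """Packets per second for back-to-back frames of one size.

    overhead_bytes is the assumed preamble/SFD (8) plus inter-frame gap (12).
    """
    bits_on_wire = (frame_bytes + overhead_bytes) * 8
    return link_bps / bits_on_wire


# Assumed nominal 100 Gb/s link carrying minimum-size frames.
rate = ethernet_packet_rate(100e9)
print(f"{rate / 1e6:.1f} Mpps")
```

Under these assumptions the result is a little under 150 Mpps, consistent with the approximate figure quoted above.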
The loss of some TCP ACK packets is not the primary concern when such a burst occurs. When a burst occurs, any other packets, regardless of packet length, are dropped once input buffers are exceeded. Buffers in front of the packet decision engine are often very small.
Internet service providers and content providers generally specify full rate forwarding with 40 byte payload packets as a requirement. This requirement often can be waived if the provider can be convinced that no packets will be dropped when long sequences of short packets occur.
With adequate buffers before the packet decision engine, an LSR can absorb a long sequence of short packets. Even if the output is slowed to the point where light congestion occurs, the packets, having cleared the decision process, can make use of larger VOQ or output side buffers and be dealt with according to configured QoS treatment, rather than dropped completely at random.
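As a rough illustration of what "adequate buffers" might mean, the sketch below estimates the peak backlog ahead of the decision engine during a burst. It assumes a constant arrival rate and a constant decision engine rate for the duration of the burst, which is a simplification; the function and parameter names are hypothetical.

```python
import math


def packets_to_buffer(burst_packets, arrival_pps, engine_pps):
    """Peak queue depth, in packets, ahead of the decision engine.

    Assumes constant arrival and service rates for the whole burst.
    """
    if engine_pps >= arrival_pps:
        return 0  # the engine keeps up, so no backlog builds
    # While the burst arrives, the queue grows at (arrival - engine) pps.
    backlog = burst_packets * (arrival_pps - engine_pps) / arrival_pps
    return math.ceil(backlog)


# Hypothetical example: a 1000 packet burst arriving 1.5x faster than
# the decision engine can process packets.
print(packets_to_buffer(1000, 150_000_000, 100_000_000))
```

If the input buffer holds fewer packets than this estimate, packets of any length arriving during the burst are at risk of being dropped before QoS treatment can be applied.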
Packet rate requirements apply regardless of which network tier equipment is deployed in. Whether deployed in the network core or near the network edges, packets must be processed at full line rate or with sufficient buffering prior to the packet decision engine.
In any large provider, whether a service provider or a content provider, hash based multipath techniques are used in the core. In many of these providers hash based multipath is used in the edge as well, and in some cases the metro.
The most common multipath techniques are ECMP applied at the IP forwarding level, Ethernet LAG with inspection of the IP payload, and multipath on links carrying both IP and MPLS, where the IP header is inspected below the MPLS label stack. In most core networks, the vast majority of traffic is MPLS encapsulated.
In order to support an adequately even load distribution across multiple links, IP addresses must be used. Common practice today is to reinspect the IP addresses at each LSR and use the label stack and IP addresses in a hash performed at each LSR.
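The per-LSR hash described above can be sketched as follows. This is a minimal illustration, not a description of any particular implementation: it walks the label stack to the bottom-of-stack entry, guesses the payload type from the first nibble, and hashes the labels together with the IP source and destination addresses. The use of CRC32, the `max_labels` limit, and all function names are assumptions made for the example.

```python
import struct
import zlib


def encode_lse(label, bottom, ttl=64):
    """Build one 4 byte label stack entry (TC bits left at zero)."""
    return struct.pack("!I", (label << 12) | (bottom << 8) | ttl)


def multipath_hash_key(packet, max_labels=8):
    """Collect label values and, if the payload looks like IP, its addresses."""
    key = bytearray()
    offset = 0
    for _ in range(max_labels):
        if offset + 4 > len(packet):
            return bytes(key)  # truncated stack: hash on what was seen
        (entry,) = struct.unpack_from("!I", packet, offset)
        offset += 4
        key += (entry >> 12).to_bytes(3, "big")  # 20 bit label value
        if (entry >> 8) & 1:  # S bit set: bottom of stack reached
            break
    else:
        return bytes(key)  # stack deeper than max_labels: labels only
    if offset >= len(packet):
        return bytes(key)
    version = packet[offset] >> 4
    if version == 4 and len(packet) >= offset + 20:
        key += packet[offset + 12:offset + 20]  # IPv4 source + destination
    elif version == 6 and len(packet) >= offset + 40:
        key += packet[offset + 8:offset + 40]  # IPv6 source + destination
    return bytes(key)


def select_link(packet, n_links):
    """Pick a component link; CRC32 is an arbitrary choice of hash here."""
    return zlib.crc32(multipath_hash_key(packet)) % n_links
```

Because the key depends only on fields that are constant within a flow, all packets of a flow take the same link, while different flows under the same label stack can be spread across links.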
The use of this technique is so ubiquitous in large core networks that lack of support for multipath makes any product unsuitable for use in large core networks. This will continue to be the case in the near future, even as deployment of MPLS Entropy Label begins to relax the core LSR multipath performance requirements, given the existing deployed base of edge equipment without the ability to add an Entropy Label.
A generation of edge equipment supporting the ability to add an MPLS Entropy Label is needed before the performance requirements for core LSR can be relaxed. However, it is likely that two generations of deployment in the future will allow core LSR to support full packet rate only when a relatively small number of MPLS labels need to be inspected before hashing. For now, don't count on it.
Within the core of a network some form of multipath is almost certain to be used. Multipath techniques deployed today are likely to be looking beneath the label stack for an opportunity to hash on IP addresses.
A pseudowire encapsulated at a network edge must have a means to prevent reordering within the core if the pseudowire will be crossing a network core, or any part of a network topology where multipath is used.
Not supporting the ability to encapsulate a pseudowire with a control word may lock a product out from consideration. A pseudowire capability without control word support might be sufficient for applications which are strictly both intra-metro and low bandwidth. However a provider with other applications will very likely not tolerate having equipment which can only support a subset of their pseudowire needs.
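The reordering risk can be illustrated with a short sketch. It assumes that a payload-inspecting LSR treats a first nibble of 4 or 6 beneath the label stack as IPv4 or IPv6, and that the generic pseudowire control word begins with a zero nibble (our reading of RFC 4385, which is not cited in this document). The function names are hypothetical, and the flags, fragmentation, and length fields are simply left at zero.

```python
import struct


def pw_control_word(sequence=0):
    """Generic PW control word: first nibble 0, other fields zero, 16 bit sequence."""
    return struct.pack("!HH", 0x0000, sequence & 0xFFFF)


def looks_like_ip(payload):
    """Heuristic assumed for payload-inspecting multipath: first nibble 4 or 6."""
    return len(payload) > 0 and (payload[0] >> 4) in (4, 6)


# Hypothetical Ethernet frame whose destination MAC happens to start with 0x4.
frame = bytes.fromhex("4000deadbeef") + bytes(8)

print(looks_like_ip(frame))                       # misread as IPv4
print(looks_like_ip(pw_control_word(1) + frame))  # control word prevents this
```

Without the control word, bytes of the carried frame may be hashed as if they were IP addresses, so packets of one pseudowire can be spread over several paths and arrive out of order.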
Unlike a pseudowire control word, a pseudowire flow label [RFC6391] is required only for relatively large capacity pseudowires. There are many cases where a pseudowire flow label makes sense. Any service such as a VPN which carries IP traffic within a pseudowire can make use of a pseudowire flow label.
Any pseudowire which does not carry a flow label is in effect a single microflow (in [RFC2475] terms). Where multipath makes use of a simple hash (see Section 2.3), the presence of a large microflow that consumes 10% of the capacity of a potentially congested link can upset the traffic balance and in effect reduce the effective capacity of the entire set of parallel links by far more than 10%. Therefore, in a network where a significant number of parallel 10 Gb/s links exist, even a 1 Gb/s pseudowire should carry a flow label if possible.
The MPLS Entropy Label simplifies flow group identification [I-D.ietf-mpls-entropy-label] in the network core. Prior to the MPLS Entropy Label, core LSR needed to inspect the entire label stack and often the IP headers to provide an adequate distribution of traffic when using multipath techniques (see Section 2.3). With the use of MPLS Entropy Label, a hash can be performed closer to network edges, placed in the label stack, and used within the network core.
The MPLS Entropy Label avoids full label stack and payload inspection within the core where performance levels are most difficult to achieve (see Section 2.2). The label stack inspection can be terminated as soon as the first Entropy Label is encountered, which is generally after a small number of labels are inspected.
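Early termination of label stack inspection might look like the sketch below. It assumes the Entropy Label is announced by an Entropy Label Indicator (ELI) with label value 7, the value assigned when the entropy label draft was later published as RFC 6790; that value and all function names here are assumptions for illustration.

```python
import struct

ELI = 7  # assumed Entropy Label Indicator value (per RFC 6790)


def lse(label, bottom, ttl=64):
    """Build one 4 byte label stack entry (TC bits left at zero)."""
    return struct.pack("!I", (label << 12) | (bottom << 8) | ttl)


def entropy_from_stack(packet):
    """Return (entropy_label or None, number of stack entries inspected)."""
    offset = 0
    saw_eli = False
    while offset + 4 <= len(packet):
        (entry,) = struct.unpack_from("!I", packet, offset)
        label = entry >> 12
        bottom = (entry >> 8) & 1
        offset += 4
        if saw_eli:
            return label, offset // 4  # entry after the ELI is the Entropy Label
        if label == ELI and not bottom:
            saw_eli = True
            continue
        if bottom:
            break  # reached bottom of stack without finding an Entropy Label
    return None, offset // 4


stack = lse(1000, 0) + lse(ELI, 0) + lse(54321, 0) + lse(2000, 1)
print(entropy_from_stack(stack))
```

When no Entropy Label is found, a real implementation would presumably fall back to hashing on the full label stack and payload, as described in Section 2.3.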
In order to provide these benefits in the core, LSR closer to the edge must be capable of adding an entropy label. This support may not be required in the access tier, the tier closest to the customer, but is likely to be required in the edge or the border to the network core. LSR peering with external networks will also need to be able to add an Entropy Label.
MPLS-TP introduces forwarding demands that will be extremely difficult to meet in a core network. Most troublesome is the requirement for Ultimate Hop Popping (UHP, the opposite of Penultimate Hop Popping or PHP). Using UHP opens the possibility of one or more MPLS POP operations plus an MPLS SWAP operation for each packet. The potential for multiple lookups and multiple counter instances per packet exists.
As networks grow and tunneling of LDP LSPs into RSVP-TE LSPs is used, and/or RSVP-TE hierarchy is used, the requirement to perform one or two or more MPLS POP operations plus an MPLS SWAP operation (and possibly a PUSH or two) increases. If MPLS-TP LM (link monitoring) OAM is enabled at each layer, then a packet and byte count must be maintained for each POP and SWAP operation.
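The per-packet cost of UHP can be made concrete with a small model. The sketch below is hypothetical throughout: the label values, the table layout, and the choice to count the payload plus the remaining label stack as the byte count for each layer are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical incoming label map: label -> (operation, outgoing label).
ILM = {
    300: ("POP", None),   # outer tunnel LSP terminates here (UHP)
    200: ("POP", None),   # inner hierarchical LSP also terminates here
    100: ("SWAP", 101),   # innermost LSP continues to the next hop
}

# One packet and byte counter pair per label, as per-layer LM would need.
counters = defaultdict(lambda: {"packets": 0, "bytes": 0})


def forward(stack, payload_len):
    """Process a label stack (top first); return (outgoing stack, lookups)."""
    stack = list(stack)
    lookups = 0
    while stack:
        label = stack[0]
        op, out_label = ILM[label]  # one table lookup per operation
        lookups += 1
        counters[label]["packets"] += 1
        # Assumed byte accounting: payload plus the labels still on the stack.
        counters[label]["bytes"] += payload_len + 4 * len(stack)
        if op == "POP":
            stack.pop(0)
            continue  # the next label must be looked up as well
        stack[0] = out_label  # SWAP ends processing at this LSR
        break
    return stack, lookups


print(forward([300, 200, 100], payload_len=40))
```

In this model a single 40 byte packet with a three label stack costs three lookups and three counter updates, which is the kind of per-packet multiplication that makes UHP demanding at full line rate.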
The following questions should be asked of a supplier. These questions are grouped into broad categories.
Testing the packet rate performance of equipment supporting a large number of 10 Gb/s or 100 Gb/s links is not possible using desktop computers or workstations. The use of high end workstations as a source of test traffic was barely viable 20 years ago, but is no longer at all viable. Though custom microcode has been used on specialized router forwarding cards to serve the purpose of generating test traffic and measuring it, for the most part performance testing will require specialized test equipment. There are multiple sources of suitable equipment.
The set of tests listed here does not correspond one-to-one to the set of questions in Section 3. The same categorization is used and these tests largely serve to validate answers provided to the prior questions, and can also provide answers where a supplier is unwilling to disclose compliance or performance.
Performance testing is the domain of the IETF Benchmark Methodology Working Group (BMWG). Below are brief descriptions of conformance and performance tests. Some very basic tests are specified in [RFC5695] which partially cover only the basic performance test T#2.
The following tests should be performed by the systems designer, or deployer, or performed by the supplier on their behalf if it is not practical for the potential customer to perform the tests directly. These tests are grouped into broad categories.
This memo includes no request to IANA.
This document reviews forwarding behaviour specified elsewhere and points out compliance and performance requirements. As such it introduces no new security requirements or concerns. Knowledge of potential performance shortcomings may serve to help avoid pitfalls, but in very unlikely circumstances such knowledge could in principle be the basis of denial of service. In practice, such extreme data and packet rates would be needed that this type of denial of service is extremely unlikely, and undetectable denial of service is impossible.