Network Working Group | F. L. Templin, Ed. |
Internet-Draft | Boeing Research & Technology |
Intended status: Informational | June 23, 2012 |
Expires: December 23, 2012 |
Generic Tunnel MTU Determination
draft-generic-v6ops-tunmtu-06.txt
The Maximum Transmission Unit (MTU) for popular IP-within-IP tunnels is currently recommended to be set to 1500 (or less) minus the length of the encapsulation headers when static MTU determination is used. This requires the tunnel ingress to either fragment any IP packet larger than the MTU or drop the packet and return an ICMP Packet Too Big (PTB) message. Concerns for operational issues with Path MTU Discovery (PMTUD) point to the possibility of MTU-related black holes when a packet is dropped due to an MTU restriction. The current "Internet cell size" is effectively 1500 bytes (i.e., the minimum MTU configured by the vast majority of links in the Internet) and should therefore also be the minimum MTU assigned to tunnels, but the desired end state is full accommodation of MTU diversity. This document therefore presents a method to boost the tunnel MTU to larger values.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on December 23, 2012.
Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
The Maximum Transmission Unit (MTU) for popular IP-within-IP tunnels is currently recommended to be set to 1500 (or less) minus the length of the encapsulation headers when static MTU determination is used. This requires the tunnel ingress to either fragment any IP packet larger than the MTU or drop the packet and return an ICMP Packet Too Big (PTB) message [RFC0791][RFC2460]. Concerns for operational issues with Path MTU Discovery (PMTUD) [RFC1191][RFC1981] point to the possibility of MTU-related black holes when a packet is dropped due to an MTU restriction. The current "Internet cell size" is effectively 1500 bytes (i.e., the minimum MTU configured by the vast majority of links in the Internet) and should therefore also be the minimum MTU assigned to tunnels, but the desired end state is full accommodation of MTU diversity. This document therefore presents a method to boost the tunnel MTU to larger values.
Pushing the tunnel MTU to 1500 bytes or beyond is met with the challenge that the addition of encapsulation headers would cause an inner IP packet that is 1500 byte (or slightly less) to appear as a slightly larger than 1500 byte outer IP packet on the wire, where it may be too large to traverse a link on the path in one piece. One alternative is to perform IP fragmentation on the outer IP packet following encapsulation, however existing tunneling protocols do not require the egress to reassemble packets as large as 1500 bytes plus the size of the encapsulation headers. The tunnel ingress therefore has no way of knowing whether the egress can reassemble larger sizes. In the case of fragmentable packets, the tunnel ingress can instead perform IP fragmentation on the inner packet before encapsulating each fragment in outer headers. Considerations for both inner and outer fragmentation are presented in the following sections. A third alternative known as tunnel fragmentation is also given.
Existing tunneling protocols have by and large relied on PMTUD in order to provide necessary packet size feedback to the original source. When an IP tunnel configures an MTU smaller than 1500 bytes, packets that are small enough to traverse earlier links in the path toward the final destination may be dropped at the tunnel ingress with a PTB message returned to the original source. However, operational experience has shown that the PTB messages can be lost in the network due to filtering in which case the source does not receive notification of the loss. It is therefore highly desirable that the tunnel configure an MTU of at least 1500 bytes, even though encapsulation would cause the tunneled packet to be larger than 1500 bytes.
One possibility is to use IP fragmentation of the outer IP layer protocol so that inner packets up to 1500 bytes are delivered even if the tunnel encapsulation causes the outer packet to be larger than 1500 bytes. However, IPv4 fragmentation has been shown to be dangerous at high data rates due to the Identification field wrapping while reassemblies are still active [RFC4963]. Also, if outer IP fragmentation were used the tunnel ingress has no assurance that the egress can reassemble packets larger than 1500 bytes.
A second possibility is to enable PMTUD on the outer packet. However, the PTB messages that may result could either be lost on the return path to the tunnel ingress or may not contain enough information for translation into an inner packet PTB for delivery to the original source. Still another possibility is for the tunnel ingress to maintain state about MTU sizes for various tunnel egresses, but this becomes unwieldy when the number of egresses is large.
In short, PMTUD for existing tunneling protocols is a mess and a new approach is needed.
Section 3.2 of [RFC4213] presents both static and dynamic MTU determination algorithms. Similar algorithms appear in other tunneling mechanisms. These algorithms have been shown to be problematic in many instances, as discussed in Section 2.
The desired end state is for tunnels to support assured delivery of packets that are no larger than 1500 bytes while admitting larger packets into the tunnel without explicit assurances of delivery. Hosts should therefore set a tunnel ingress MTU of at least 1500 bytes, but should take care to not set so large an MTU that applications would be delayed by excessive PMTUD messages. Routers should instead set a constant value "HLEN" to the length of the encapsulation headers, then set an indefinite tunnel MTU, with a maximum upper bound of ((2^32 -1 ) - HLEN) for tunnels over IPv6 and ((2^16 - 1) - HLEN) for tunnels over IPv4.
This document therefore proposes a generic MTU determination method suitable for all tunnel types. In particular, the tunnel ingress admits inner packets into the tunnel based on their size, and may need to use inner packet fragmentation, outer packet fragmentation and/or tunnel fragmentation as necessary. The following sections discuss considerations for the approaches.
The tunnel ingress uses an algorithm for admitting packets of various sizes into the tunnel. In this algorithm, the ingress sets a constant value "HLEN" to the length of the encapsulation headers, and sets "MINMTU" to 576 for tunnels over IPv4 or 1280 for tunnels over IPv6. The algorithm used is as follows:
1) if the packet is larger than 1500: a. if the packet is an atomic packet (*) admit it into the tunnel if it is no larger than the MTU of the underlying interface to be used to carry the tunneled packet; otherwise, drop the packet and return a PTB message. b. if the packet is not an atomic packet, break it into N pieces (where N is minimized and each piece is at most (MINMTU-HLEN) bytes) and admit each piece into the tunnel. 2) if the packet is larger than (MINMTU-HLEN) but no larger than 1500: a. if the packet is an atomic packet, admit it into the tunnel and use either tunnel fragmentation or outer fragmentation to fragment it into pieces that are no larger than (MINMTU-HLEN). Also, for IPv6, return a PTB message with MTU set to (MINMTU-HLEN). b. if the packet is not an atomic packet, break it into N pieces (where N is minimized and each piece is at most (MINMTU-HLEN) bytes) and admit each piece into the tunnel. 3) if the packet is (MINMTU-HLEN) or less: a. admit the packet into the tunnel
[I-D.ietf-intarea-ipv4-id-update].
In the above algorithm, clause 1a.) requires that large atomic packets not be subject to reassembly at the tunnel egress. Instead, the tunnel ingress should process any PTB messages returned by the tunnel and translate them into a corresponding PTB message to return to the original source. (The size 1500 is chosen with the expectation that hosts that send packets larger than this also used host-based MTU determination, e.g., per [RFC4821].)
In clause 2a.), the ingress must use either tunnel fragmentation (see: Section 5) or outer fragmentation (see: Section 6) to break the inner packet into pieces that are no larger than (MINMTU-HLEN) before encapsulating each piece and sending them to the tunnel egress. For IPv6 inner packets, the ingress then also returns a PTB message with MTU set to (MINMTU-HLEN) as an indication to the source host that it must begin including a fragment header in the packets it sends (see Section 5 of [RFC2460]).
Clauses 1b.) and 2b.) in the algorithm perform inner fragmentation using the Identification value already present in the packet header. The fragment size of (MINMTU-HLEN) is chosen so that further fragmentation within the tunnel will not occur. The tunnel ingress then admits each fragment into the tunnel unconditionally, since it is the original source (and not the tunnel ingress) that asserts the uniqueness of the packet's Identification value.
Finally, clause 3a.) admits packets that are no larger than (MINMTU-HLEN) into the tunnel without need for tunnel or outer fragmentation.
Clause 2a.) in the algorithm of Section 4 requires the tunnel ingress to perform either tunnel fragmentation or outer fragmentation, with tunnel fragmentation preferred when it is available. Tunnel fragmentation requires separate packet Identification and segmentation control bits in a mid-layer of encapsulation that is added between the inner and outer IP headers. As for outer fragmentation, the tunnel egress is responsible for reassembly.
Tunnel fragmentation can be particularly useful for tunnels over IPv4, since the mid-layer encapsulation can include an extended Identification field that avoids the identification wrapping issues seen for IPv4 fragmentation [RFC4963]. Furthermore, when tunnel fragmentation is used the tunnel ingress has assurance that the egress has a minimum reassembly buffer size ("MINMRU") of at least (1500 + HLEN) bytes since both the ingress and egress are required to implement the scheme.
An example of tunnel fragmentation appears in SEAL [I-D.templin-intarea-seal].
When tunnel fragmentation cannot be used, clause 2a.) in the algorithm of Section 4 requires the tunnel ingress to perform outer fragmentation. As for tunnel fragmentation, reassembly is performed by the tunnel egress.
Using outer fragmentation, the ingress splits the inner packet into pieces that are no larger than (MINMTU-HLEN) bytes then adds the encapsulation headers. The tunnel ingress must however take care to ensure that the egress has a large enough MINMRU in order to reassemble packets up to (1500+HLEN) bytes. If this cannot be ensured, the ingress must instead drop the packet and return a PTB message with MTU size (MINMRU-HLEN).
When outer fragmentation is needed, the ingress must also ensure that it does not admit more packets into the tunnel than would cause the Identification value in the outer header to wrap while other fragmented packets may still be awaiting reassembly. See [I-D.ietf-intarea-ipv4-id-update] for considerations for this requirement.
Implementations can choose to ignore the outer fragmentation rule if there is reasonable assurance that all links on the path between the ingress and egress configure a sufficiently-large MTU. This approach is seen in widely-deployed tunnels over IPv4, but runs the risk of path MTU-related black holes and/or reassembly collisions when link MTUs cannot be controlled.
Following any inner, tunnel or outer fragmentation, the ingress must allow the encapsulated packets or fragments to be further fragmented by a router on the path that configures a link with a too-small MTU. These fragments would be reassembled by the tunnel egress the same as if the fragmentation occurred within the tunnel ingress.
This approach applies to existing IPv6-within-IPv4 transition mechanisms, including configured tunnels [RFC4213], 6to4 [RFC3056], ISATAP [RFC5214], DSMIP [RFC5555], 6rd [RFC5969], etc.
This same approach can further be applied to existing IP-within-IP tunneling mechanisms of all varieties, including GRE [RFC1701], IPv4-in-IPv4 [RFC2003], IPv6-in-IPv6 [RFC2473], IPv4-in-IPv6 [RFC6333], IPsec [RFC4301], Teredo [RFC4380], etc.
There are no IANA considerations for this document.
The security considerations for the various tunneling mechanisms apply also to this document.
This method was inspired through discussion on the IETF v6ops and NANOG mailing lists in the May/June 2012 timeframe.
[RFC0791] | Postel, J., "Internet Protocol", STD 5, RFC 791, September 1981. |
[RFC2460] | Deering, S.E. and R.M. Hinden, "Internet Protocol, Version 6 (IPv6) Specification", RFC 2460, December 1998. |