Network Working Group T. Pauly
Internet-Draft E. Kinnear
Intended status: Informational Apple Inc.
Expires: December 27, 2018 June 25, 2018
TCP Encapsulation Considerations
draft-pauly-tcp-encapsulation-00
Abstract
Network protocols other than TCP, such as UDP, are often blocked or
suboptimally handled by network middleboxes. One strategy that
applications can use to continue to send non-TCP traffic on such
networks is to encapsulate datagrams or messages within a TCP
stream. However, encapsulating datagrams within TCP streams can lead
to performance degradation. This document provides guidelines for
how to use TCP for encapsulation, a summary of performance concerns,
and some suggested mitigations for these concerns.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on December 27, 2018.
Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
Pauly & Kinnear Expires December 27, 2018 [Page 1]
Internet-Draft TCP Encapsulation June 2018
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction
2. Motivations for Encapsulation
2.1. UDP Blocking
2.2. UDP NAT Timeouts
3. Encapsulation Formats
3.1. Multiplexing Flows
4. Deployment Considerations
5. Performance Considerations
5.1. Loss Recovery
5.1.1. Concern
5.1.2. Mitigation
5.2. Bufferbloat
5.2.1. Concern
5.2.2. Mitigation
5.3. Head of Line Blocking
5.3.1. Concern
5.3.2. Mitigation
6. Security Considerations
7. IANA Considerations
8. Informative References
Authors' Addresses
1. Introduction
TCP streams are sometimes used as a mechanism for encapsulating
datagrams or messages, which is referred to in this document as "TCP
encapsulation". Encapsulation may be used to transmit data over
networks that block or suboptimally handle non-TCP traffic. The
current motivations for using encapsulation generally revolve around
the treatment of UDP packets (Section 2).
Implementing a TCP encapsulation strategy consists of mapping
datagram messages into a stream protocol, often with a length-value
record format (Section 3). While these formats are described here as
applying to encapsulating datagrams in a TCP stream, the formats are
equally suited to encapsulating datagrams within any stream
abstraction. For example, the same format may be used for both raw
TCP streams and TLS streams running over TCP.
2. Motivations for Encapsulation
The primary motivations for enabling TCP encapsulation that will be
explored in this document relate mainly to the treatment of UDP
packets on a given network. UDP can be used for real-time network
traffic, as a mechanism for deploying non-TCP transport protocols,
and as a tunneling protocol that is compatible with Network Address
Translators (NATs).
2.1. UDP Blocking
Some network middleboxes block any IP packets that do not appear to
be used for HTTP traffic, either as a security mechanism to block
unknown traffic or as a way to restrict access to whitelisted
services. Network applications that rely on UDP to transmit data
will be blocked by these middleboxes. In this case, the application
can attempt to use TCP encapsulation to transmit the same data over a
TCP stream.
2.2. UDP NAT Timeouts
Other networks may not altogether block non-TCP traffic, but instead
make other protocols unsuitable for use. For example, many Network
Address Translation (NAT) devices will maintain TCP port mappings for
long periods of time, since the end of a TCP stream can be detected
by the NAT. Since UDP packet flows do not signal when no more
packets will be sent, NATs often use short timeouts for UDP port
mappings. Thus, applications can attempt to use TCP encapsulation
when long-lived flows are required on networks with NATs.
3. Encapsulation Formats
The simplest approach for encapsulating datagram messages within a
TCP stream is to use a length-value record format. That is, a header
consisting of a length field, followed by the datagram message
itself.
For example, if an encapsulation protocol uses a 16-bit length field
(allowing a length value of up to 65535 bytes), it will use a
format like the following:
                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   ~                       Datagram Payload                        ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The format of the length header field could be longer or shorter
depending on the needs of the protocol. 16 bits is most appropriate
when encapsulating datagrams that would otherwise be sent directly in
IP packets, since the payload length field for an IP header is also
16 bits.
The length field must be specified to either include itself in the
length of the entire record, or to only describe the length of the
payload field. The protocol used for encapsulating IKE and ESP
packets in TCP [RFC8229] does include the length field itself in the
length of the record. This may make it slightly easier for
implementations to parse records, since they do not need to add the
size of the length field when computing record offsets within a
stream.
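As a rough sketch of the record format described above, the following
Python fragment frames datagrams with a 16-bit length field in network
byte order and parses records back out of a byte stream. It assumes
the convention in which the length field counts itself, as in
[RFC8229]. The names encode_record and RecordParser are illustrative
and are not defined by any specification.

```python
import struct
from typing import List

LENGTH_FIELD_SIZE = 2      # 16-bit length field, network byte order
MAX_RECORD_SIZE = 0xFFFF   # largest value a 16-bit field can carry


def encode_record(datagram: bytes) -> bytes:
    """Prefix a datagram with a length that counts the length field itself."""
    total = LENGTH_FIELD_SIZE + len(datagram)
    if total > MAX_RECORD_SIZE:
        raise ValueError("datagram too large for a 16-bit length field")
    return struct.pack("!H", total) + datagram


class RecordParser:
    """Incrementally splits a byte stream back into datagrams."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, data: bytes) -> List[bytes]:
        """Add bytes read from the stream; return any complete datagrams."""
        self._buffer += data
        datagrams = []
        while len(self._buffer) >= LENGTH_FIELD_SIZE:
            (total,) = struct.unpack_from("!H", self._buffer, 0)
            if total < LENGTH_FIELD_SIZE:
                raise ValueError("invalid record length")
            if len(self._buffer) < total:
                break  # the record is not complete yet
            datagrams.append(bytes(self._buffer[LENGTH_FIELD_SIZE:total]))
            del self._buffer[:total]
        return datagrams
```

Because TCP does not preserve message boundaries, a single read from
the stream may return a partial record or several records at once,
which is why the parser buffers input rather than assuming one record
per read.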
3.1. Multiplexing Flows
Since TCP encapsulation is used to avoid failures caused by NATs or
firewalls, some implementations re-use one TCP port or one
established TCP stream for multiple kinds of encapsulated traffic.
Using a single port or stream allows re-use of NAT bindings and
reduces the chance that a firewall will block some flows, but not
others.
If multiple kinds of traffic are multiplexed on the same listening
TCP port, individual streams opened to that port need to be
differentiated. This may require adding a one-time header that is
sent on the stream to indicate the type of encapsulated traffic that
will follow. For example, TCP encapsulated IKE [RFC8229] uses a
stream prefix to differentiate its encapsulation strategy from
proprietary Virtual Private Network (VPN) protocols.
Multiplexing multiple kinds of datagrams, or independent flows of
datagrams, over a single TCP stream requires adding a per-record type
field or marker to the encapsulation record format. For ease of
parsing records, this value should be placed after the length field
of the record format. For example, various ESP packet flows are
identified by the four-byte Security Parameter Index (SPI) that
comprises the first bytes of the datagram payload, while IKE packets
in the same TCP encapsulated stream are differentiated by using all
zeros for the first four bytes.
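The demultiplexing rule in the ESP and IKE example might be sketched
as follows. The sketch assumes that a record payload has already been
extracted from the stream and that the four zero bytes are stripped
before the IKE packet is handed on. The function name
classify_payload and the returned labels are hypothetical.

```python
import struct
from typing import Tuple

NON_ESP_MARKER = b"\x00\x00\x00\x00"  # four zero bytes identify IKE


def classify_payload(payload: bytes) -> Tuple[str, int, bytes]:
    """Return (kind, spi, packet) for one decapsulated record payload."""
    if len(payload) < 4:
        raise ValueError("payload too short to carry a marker or SPI")
    if payload[:4] == NON_ESP_MARKER:
        # IKE: strip the marker; there is no SPI to report.
        return ("ike", 0, payload[4:])
    # ESP: the SPI is the first four bytes of the ESP packet itself,
    # so the packet is returned unmodified.
    (spi,) = struct.unpack_from("!I", payload, 0)
    return ("esp", spi, payload)
```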
4. Deployment Considerations
In general, any new TCP encapsulation protocol should allocate a new
TCP port. If TCP is being used to encapsulate traffic that is
normally sent over UDP, then the most obvious port choice for the
TCP encapsulated version is the equivalent port value in the TCP port
namespace.
Simply using TCP instead of UDP may be enough in some cases to
mitigate the connectivity problems of using UDP with NATs and other
middleboxes. However, it may be useful to also add a layer of
encryption to the stream using TLS to obfuscate the contents of the
stream. This may be done for security and privacy reasons, or to
prevent middleboxes from mishandling encapsulated traffic or
ossifying around a particular format for encapsulation.
5. Performance Considerations
Many encapsulation or tunnelling protocols utilize an underlying
transport like UDP, which does not provide stateful features such as
loss recovery or congestion control. Because encapsulation using TCP
involves an additional layer of state that is shared among all
traffic inside the tunnel, there are additional performance
considerations to address.
Even though this document describes encapsulating datagrams or
messages inside a TCP stream, some protocols, such as ESP, themselves
often encapsulate additional TCP streams, such as when transmitting
data for a VPN protocol [RFC8229]. This introduces several potential
sources of suboptimal behavior, as multiple TCP contexts act upon the
same traffic.
For the purposes of this discussion, we will refer to the TCP
encapsulation context as the "outer" TCP context, while the TCP
context applicable to any encapsulated protocol will be referred to
as the "inner" TCP context.
The use of an outer TCP context may cause signals from the network to
be hidden from the inner TCP contexts. Depending on the signals that
the inner TCP contexts use for indicating congestion, events that
would otherwise result in a modification of behavior may go
unnoticed, or may build up until a large modification of behavior is
necessary. Generally, the main areas of concern are signals that
inform loss recovery, Bufferbloat and delay avoidance, and head of
line blocking between streams.
5.1. Loss Recovery
5.1.1. Concern
The outer TCP context experiences packet loss on the network
directly, while any inner TCP contexts present observe the effects of
that loss on the delivery of their packets by the encapsulation
layer. Furthermore, inner TCP contexts still observe direct network
effects for any network segments that are traversed outside of the
encapsulation, as is common with a VPN.
In this way, the outer TCP context masks packet loss from the inner
contexts by retransmitting encapsulated segments to recover from
those losses. An inner context observes this as a delay while the
packets are retransmitted rather than a loss. This can lead to
spurious retransmissions if the recovery of the lost packets takes
longer than the inner context's retransmission timeout (RTO). Since
the outer context is retransmitting the packets to make up for the
losses, the spurious retransmissions waste bandwidth that could be
used for packets that advance the progress of the flows being
encapsulated. An RTO event on an inner TCP context also hinders
performance beyond generating spurious retransmissions, as many TCP
congestion control algorithms dramatically reduce the sending rate
after an RTO is observed.
When recovery from a loss event on the outer TCP context completes,
the network or endpoint on the other end of the encapsulation will
receive a potentially large burst of packets as the retransmitted
packets fill in any gaps and the entire set of pending data can be
delivered.
If content from multiple inner flows is shared within a single TCP
packet in the outer context, the effects of lost packets from the
outer context will be experienced by more than one inner flow at a
time. However, this loss is actually shared by all inner flows,
since forward progress for the entire encapsulation tunnel is
generally blocked until the lost segments can be filled in. This is
discussed further in Section 5.3.
5.1.2. Mitigation
Generally, TCP congestion controls and loss recovery algorithms are
capable of recovering from loss events very efficiently, and the
inner TCP contexts observe brief periods of added delay without much
penalty.
A TCP congestion control should be selected and tuned to be able to
gracefully handle extremely variable RTT values, which may already be
the case for some congestion controls, as RTT variance is often
greatly increased in mobile and cellular networks.
Additionally, use of a TCP congestion control that considers delay to
be a sign of congestion may help the coordination between inner and
outer TCP contexts. LEDBAT [RFC6817] and BBR
[I-D.cardwell-iccrg-bbr-congestion-control] are two examples of
delay-based congestion control that an inner TCP context could use to
properly interpret loss events experienced by the outer TCP context.
Care must be taken to ensure that any TCP congestion control in use
is also appropriate for an inner context to use on any network
segments that are traversed outside of the encapsulation.
Since any losses will be handled by the outer TCP context, it might
seem reasonable to modify the inner TCP contexts' loss recovery
algorithms to prevent retransmissions. However, there are often
network segments outside of the encapsulated segments that still rely
on the inner contexts' loss recovery algorithms. Instead, spurious
retransmissions can be reduced by ensuring that RTO values are tuned
such that the outer TCP context will fully time out before any inner
TCP contexts.
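To illustrate why this tuning matters, the sketch below models an
inner context's RTO and checks whether a stall caused by loss recovery
in the outer context would exceed it. It assumes the inner TCP uses
the smoothed RTT calculation from RFC 6298, which is not otherwise
referenced by this document. The class and function names, the RTO
floors, and the delay values are illustrative only.

```python
class RtoEstimator:
    """Smoothed RTT/RTO estimator following the RFC 6298 update rules."""

    ALPHA = 1 / 8
    BETA = 1 / 4
    K = 4

    def __init__(self, min_rto: float, granularity: float = 0.001) -> None:
        self.min_rto = min_rto          # implementation-chosen RTO floor
        self.granularity = granularity  # clock granularity, in seconds
        self.srtt = None
        self.rttvar = None

    def update(self, rtt: float) -> None:
        """Fold one RTT sample (in seconds) into the estimator."""
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # RTTVAR is updated before SRTT, using the old SRTT.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - rtt))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt

    @property
    def rto(self) -> float:
        if self.srtt is None:
            return max(self.min_rto, 1.0)  # initial RTO before any sample
        computed = self.srtt + max(self.granularity, self.K * self.rttvar)
        return max(self.min_rto, computed)


def is_spurious_timeout_likely(inner: RtoEstimator, base_rtt: float,
                               outer_recovery_delay: float) -> bool:
    """True if a segment stalled by outer recovery would outlive the inner RTO."""
    return base_rtt + outer_recovery_delay > inner.rto
```

For example, on a steady 50 ms path, an inner context with a 200 ms
RTO floor would be expected to time out spuriously if the outer
context needs 300 ms to recover a lost segment, while an inner context
with a 1 second floor would not.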
5.2. Bufferbloat
5.2.1. Concern
"Bufferbloat", or delay introduced by consistently full large buffers
along a network path [TSV2011] [BB2011], can increase observed RTTs
along a network path, which can harm the performance of latency
sensitive applications. Any spurious retransmissions sent on the
network take up space in queues that would otherwise be filled by useful
data. In this case, any retransmission sent by an inner TCP context
for a loss or timeout along the network segments also covered by the
outer TCP context is considered to be spurious. This can pose a
performance problem for implementations that rely on interactive data
transfer.
Additionally, because there may be multiple inner TCP contexts being
multiplexed over a single outer TCP context, even a minor reduction
in sending rate by each of the inner contexts can result in a
dramatic decrease in data sent through the outer context. Similarly,
an increase in sending rate is also amplified.
5.2.2. Mitigation
Great care should be taken in tuning the inner TCP congestion control
to avoid spurious retransmissions as much as possible. However, in
order to provide effective loss recovery for the segments of the
network outside the tunnel, the set of parameters used for tuning
needs to be viable both inside and outside the tunnel. Adjusting the
retransmission timeout (RTO) value for the TCP congestion control on
the inner TCP context to be greater than that of the outer TCP context
will often help to reduce the number of spurious retransmissions
generated while the outer TCP context attempts to catch up with lost
or reordered packets.
In most cases, fast retransmit will be sufficient to recover from
losses on network segments after the inner flows leave the tunnel,
although loss events that trigger a full RTO on those last-mile
segments will carry a higher penalty with such tuning. However, in
many deployments, the last-mile segments will often observe lower
loss rates than the first-mile segments, leading to a balance that
often favors spurious retransmission avoidance on the first-mile over
loss recovery speed on the last-mile.
5.3. Head of Line Blocking
5.3.1. Concern
Because TCP provides in-order delivery and reliability, even if there
are multiple flows being multiplexed over the encapsulation layer,
loss events, spurious retransmissions, or other recovery efforts will
cause data for all other flows to back up and not be delivered to the
client. In deployments where there are additional network segments
to traverse beyond the encapsulation boundary, this may mean that
flows are not delivered onto those segments until recovery for the
outer TCP context is complete.
With UDP encapsulation, packet reordering and loss do not
necessarily prevent data from being delivered, even if it is
delivered out of order. Because TCP groups all data being
encapsulated into one outer congestion control and loss recovery
context, this may cause significant delays for flows not directly
impacted by a recovery event.
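The effect can be seen in a toy model of in-order delivery. In the
sketch below, each outer TCP segment carries exactly one record
belonging to one inner flow, which is a simplifying assumption since
real records need not align with segments. The receiver releases data
only once all earlier sequence numbers have arrived. The class name
InOrderReceiver and the flow labels are illustrative.

```python
from typing import Dict, List


class InOrderReceiver:
    """Delivers segments strictly in sequence order, as TCP does."""

    def __init__(self) -> None:
        self._next_seq = 0
        self._pending: Dict[int, str] = {}

    def receive(self, seq: int, flow: str) -> List[str]:
        """Accept one segment; return the flows whose data is now deliverable."""
        if seq >= self._next_seq:
            self._pending[seq] = flow  # hold until all earlier data arrives
        delivered = []
        while self._next_seq in self._pending:
            delivered.append(self._pending.pop(self._next_seq))
            self._next_seq += 1
        return delivered
```

If segment 0 (flow "A") is lost while segments 1 and 2 (flow "B")
arrive, nothing is delivered for flow "B" until segment 0 is
retransmitted, at which point all three segments are released in a
single burst.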
Reordering on the network will also cause problems in this case, as
it will often trigger fast retransmissions on the outer TCP context,
blocking all inner contexts from being able to deliver data until the
retransmissions are complete. However, a well behaved TCP will
reorder the data that arrived out of order and deliver it before the
retransmissions arrive, reducing the detrimental impact of such
reordering.
5.3.2. Mitigation
One option to help address the head of line blocking would be to run
multiple tunnels, one for throughput sensitive flows and one for
latency sensitive flows. This can help to reduce the amount of time
that a latency sensitive flow can possibly be blocked on recovery for
any other flow. Latency sensitive flows should take extra care to
ensure that only the necessary amount of data is in flight at any
given time.
Explicit Congestion Notification (ECN) ([RFC3168], [RFC5562]) could
also be used to communicate between outer and inner TCP contexts
during any recovery scenario. In a strategy similar to that taken by
tunnelling of ECN fields in IP-in-IP tunnels [RFC6040], if an
implementation supports such behavior, any ECN markings communicated
to the outer TCP context by the network could be passed through to
any inner TCP contexts transported by a given packet. Alternatively,
an implementation could elect to pass through such markings to all
inner TCP contexts if a greater reduction in sending rate was deemed
to be necessary.
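One way an encapsulation endpoint might apply such a marking is
sketched below for an inner IPv4 packet. This is only an assumption
about how pass-through could work, loosely modeled on the
decapsulation behavior in [RFC6040]: a packet whose inner header is
ECN-capable is re-marked as Congestion Experienced (CE) and its header
checksum is recomputed, while a packet that is not ECN-capable cannot
carry the signal. The function names are hypothetical.

```python
import struct
from typing import Optional

ECN_MASK = 0x03     # low two bits of the second IPv4 header byte
ECN_NOT_ECT = 0x00  # sender did not negotiate ECN
ECN_CE = 0x03       # Congestion Experienced


def ipv4_header_checksum(header: bytes) -> int:
    """Compute the ones'-complement checksum over an IPv4 header."""
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def mark_congestion_experienced(packet: bytes) -> Optional[bytes]:
    """Return the packet with CE set, or None if it is not ECN-capable."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not a well-formed IPv4 packet")
    header_len = (packet[0] & 0x0F) * 4
    if header_len < 20 or len(packet) < header_len:
        raise ValueError("not a well-formed IPv4 packet")
    if packet[1] & ECN_MASK == ECN_NOT_ECT:
        return None  # the ECN field cannot convey the signal
    marked = bytearray(packet)
    marked[1] |= ECN_CE
    marked[10:12] = b"\x00\x00"  # zero the checksum before recomputing
    checksum = ipv4_header_checksum(bytes(marked[:header_len]))
    marked[10:12] = struct.pack("!H", checksum)
    return bytes(marked)
```

When None is returned, the caller has to decide how to convey
congestion to that flow. The analogous tunnel case in [RFC6040] drops
the packet, but whether that is appropriate here is left open.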
6. Security Considerations
Any attacker on the path that observes the encapsulation could
potentially discard packets from the outer TCP context and cause
significant delays due to head of line blocking. However, an
attacker in a position to arbitrarily discard packets could have a
similar effect on the inner TCP context directly or on any other
encapsulation schemes.
7. IANA Considerations
This document makes no requests of IANA.
8. Informative References
[BB2011] "Bufferbloat: Dark Buffers in the Internet", n.d.
[I-D.cardwell-iccrg-bbr-congestion-control]
Cardwell, N., Cheng, Y., Yeganeh, S., and V. Jacobson,
"BBR Congestion Control", draft-cardwell-iccrg-bbr-
congestion-control-00 (work in progress), July 2017.
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP",
RFC 3168, DOI 10.17487/RFC3168, September 2001,
<https://www.rfc-editor.org/info/rfc3168>.
[RFC5562] Kuzmanovic, A., Mondal, A., Floyd, S., and K.
Ramakrishnan, "Adding Explicit Congestion Notification
(ECN) Capability to TCP's SYN/ACK Packets", RFC 5562,
DOI 10.17487/RFC5562, June 2009,
<https://www.rfc-editor.org/info/rfc5562>.
[RFC6040] Briscoe, B., "Tunnelling of Explicit Congestion
Notification", RFC 6040, DOI 10.17487/RFC6040, November
2010, <https://www.rfc-editor.org/info/rfc6040>.
[RFC6817] Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind,
"Low Extra Delay Background Transport (LEDBAT)", RFC 6817,
DOI 10.17487/RFC6817, December 2012,
<https://www.rfc-editor.org/info/rfc6817>.
[RFC8229] Pauly, T., Touati, S., and R. Mantha, "TCP Encapsulation
of IKE and IPsec Packets", RFC 8229, DOI 10.17487/RFC8229,
August 2017, <https://www.rfc-editor.org/info/rfc8229>.
[TSV2011] "Bufferbloat: Dark Buffers in the Internet", March 2011.
Authors' Addresses
Tommy Pauly
Apple Inc.
One Apple Park Way
Cupertino, California 95014
United States of America
Email: tpauly@apple.com
Eric Kinnear
Apple Inc.
One Apple Park Way
Cupertino, California 95014
United States of America
Email: ekinnear@apple.com