Internet-Draft | IP Traffic Flow Security | January 2021 |
Hopps | Expires 23 July 2021 | [Page] |
This document describes a mechanism to enhance IPsec traffic flow security by adding traffic flow confidentiality to encrypted IP encapsulated traffic. Traffic flow confidentiality is provided by obscuring the size and frequency of IP traffic using a fixed-sized, constant-send-rate IPsec tunnel. The solution allows for congestion control as well as non-constant send-rate usage.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 23 July 2021.¶
Copyright (c) 2021 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.¶
Traffic Analysis ([RFC4301], [AppCrypt]) is the act of extracting information about data being sent through a network. While one may directly obscure the data through the use of encryption [RFC4303], the traffic pattern itself exposes information due to variations in its shape and timing ([I-D.iab-wire-image], [AppCrypt]). Hiding the size and frequency of traffic is referred to as Traffic Flow Confidentiality (TFC) per [RFC4303].¶
[RFC4303] provides for TFC by allowing padding to be added to encrypted IP packets and allowing for transmission of all-pad packets (indicated using protocol 59). This method has the major limitation that it can significantly under-utilize the available bandwidth.¶
The IP-TFS solution provides for full TFC without the aforementioned bandwidth limitation. This is accomplished by using a constant-send-rate IPsec [RFC4303] tunnel with fixed-sized encapsulating packets; however, these fixed-sized packets can contain partial, whole or multiple IP packets to maximize the bandwidth of the tunnel. A non-constant send-rate is allowed, but the confidentiality properties of its use are outside the scope of this document.¶
For a comparison of the overhead of IP-TFS with the RFC4303 prescribed TFC solution see Appendix C.¶
Additionally, IP-TFS provides for dealing with network congestion [RFC2914]. This is important for when the IP-TFS user is not in full control of the domain through which the IP-TFS tunnel path flows.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
This document assumes familiarity with IP security concepts described in [RFC4301].¶
As mentioned in Section 1, IP-TFS utilizes an IPsec [RFC4303] tunnel (SA) as its transport. To provide for full TFC, fixed-sized encapsulating packets are sent at a constant rate on the tunnel.¶
The primary input to the tunnel algorithm is the requested bandwidth to be used by the tunnel. Two values are then required to provide this bandwidth: the fixed size of the encapsulating packets and the rate at which to send them.¶
The fixed packet size MAY either be specified manually or be determined through other methods such as Packetization Layer MTU Discovery (PLMTUD) ([RFC4821], [RFC8899]) or Path MTU Discovery (PMTUD) ([RFC1191], [RFC8201]). PMTUD is known to have issues, so PLMTUD is considered the more robust option.¶
Given the encapsulating packet size and the requested tunnel bandwidth, the corresponding packet send rate can be calculated. The packet send rate is the requested bandwidth divided by the size of the encapsulating packet.¶
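As a rough illustration of the calculation above, the following Python sketch derives the packet send rate and the matching inter-packet interval. The function name and the choice of units (bits per second for bandwidth, octets for packet size) are assumptions made for this example and are not defined by this document.

```python
def packet_send_rate(bandwidth_bps: float, packet_size_octets: int) -> float:
    """Packets per second needed to fill bandwidth_bps with fixed-size packets.

    Assumption: bandwidth is given in bits per second and the packet size in
    octets; whether the size includes outer header overhead is left to policy.
    """
    if bandwidth_bps <= 0 or packet_size_octets <= 0:
        raise ValueError("bandwidth and packet size must be positive")
    return bandwidth_bps / (packet_size_octets * 8)


def send_interval_seconds(bandwidth_bps: float, packet_size_octets: int) -> float:
    """Time between consecutive encapsulating packets at a constant rate."""
    return 1.0 / packet_send_rate(bandwidth_bps, packet_size_octets)
```

Under these assumptions, a requested bandwidth of 12 Mbit/s with 1500-octet encapsulating packets works out to 1000 packets per second, i.e., one packet every millisecond.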
The egress of the IP-TFS tunnel MUST allow for and expect the ingress (sending) side of the IP-TFS tunnel to vary the size and rate of sent encapsulating packets, unless constrained by other policy.¶
As previously mentioned, one issue with the TFC padding solution in [RFC4303] is the large amount of wasted bandwidth as only one IP packet can be sent per encapsulating packet. In order to maximize bandwidth IP-TFS breaks this one-to-one association.¶
IP-TFS aggregates as well as fragments the inner IP traffic flow into fixed-sized encapsulating IPsec tunnel packets. Padding is only added to the tunnel packets if there is no data available to be sent at the time of tunnel packet transmission, or if fragmentation has been disabled by the receiver.¶
This is accomplished using a new Encapsulating Security Payload (ESP, [RFC4303]) type which is identified by the number AGGFRAG_PAYLOAD (Section 6.1).¶
Other non-IP-TFS uses of this aggregation and fragmentation encapsulation have been identified, such as increased performance through packet aggregation, as well as handling MTU issues using fragmentation. These uses are not defined here, but are also not restricted by this document.¶
The AGGFRAG_PAYLOAD payload content defined in this document is comprised of a 4 or 24 octet header followed by either a partial, a full or multiple partial or full data blocks. The following diagram illustrates this payload within the ESP packet. See Section 6.1 for the exact formats of the AGGFRAG_PAYLOAD payload.¶
The BlockOffset value is either zero or some offset into or past the end of the DataBlocks data.¶
If the BlockOffset value is zero it means that the DataBlocks data begins with a new data block.¶
Conversely, if the BlockOffset value is non-zero it points to the start of the new data block, and the initial DataBlocks data belongs to a previous data block that is still being re-assembled.¶
The BlockOffset can point past the end of the DataBlocks data which indicates that the next data block occurs in a subsequent encapsulating packet.¶
Having the BlockOffset always point at the next available data block allows for recovering the next inner packet in the presence of outer encapsulating packet loss.¶
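The BlockOffset cases described above can be sketched as a small helper that splits the DataBlocks data of one payload. This is only an illustration; the function name and its return convention are hypothetical and not part of this specification.

```python
def split_datablocks(block_offset: int, datablocks: bytes):
    """Split one payload's DataBlocks data according to BlockOffset.

    Returns (continuation, new_blocks, carry_over):
      continuation - octets belonging to a data block started earlier
      new_blocks   - octets starting at the first new data block
      carry_over   - octets of the continued block still expected in
                     subsequent payloads (non-zero only when BlockOffset
                     points past the end of this DataBlocks data)
    """
    if block_offset < 0:
        raise ValueError("BlockOffset cannot be negative")
    size = len(datablocks)
    if block_offset >= size:
        # Everything here continues the earlier data block.
        return datablocks, b"", block_offset - size
    return datablocks[:block_offset], datablocks[block_offset:], 0
```

A receiver that has lost the preceding outer packet can discard the continuation part and resume parsing at the new data blocks, which is the recovery property noted above.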
An example IP-TFS packet flow can be found in Appendix A.¶
A data block is defined by a 4-bit type code followed by the data block data. The type values have been carefully chosen to coincide with the IPv4/IPv6 version field values so that no per-data block type overhead is required to encapsulate an IP packet. Likewise, the length of the data block is extracted from the encapsulated IPv4 or IPv6 packet's length field.¶
It's worth noting that since a data block type is identified by its first octet there is never a need for an implicit pad at the end of an encapsulating packet. Even when the start of a data block occurs near the end of an encapsulating packet such that there is no room for the length field of the encapsulated header to be included in the current encapsulating packet, the fact that the length comes at a known location and is guaranteed to be present is enough to fetch the length field from the subsequent encapsulating packet payload. Only when there is no data to encapsulate is end padding required, and then an explicit Pad Data Block would be used to identify the padding.¶
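To make the type and length handling above concrete, the following Python sketch reads the 4-bit type code and, when enough octets are present, the length from the encapsulated IP header. The function name is hypothetical, and treating a pad block as running to the end of the supplied buffer is an assumption of this example.

```python
import struct

IPV6_FIXED_HEADER_LEN = 40  # PayloadLength excludes the fixed IPv6 header


def data_block_info(buf: bytes):
    """Return (kind, length) for the data block starting at buf[0].

    length is None when the inner header's length field has not arrived yet
    and must be read from the subsequent encapsulating packet.
    """
    if not buf:
        raise ValueError("empty buffer")
    kind = buf[0] >> 4  # same position as the IP version field
    if kind == 0x0:
        # Assumption: padding extends to the end of the given buffer.
        return "pad", len(buf)
    if kind == 0x4:
        if len(buf) < 4:
            return "ipv4", None
        return "ipv4", struct.unpack_from("!H", buf, 2)[0]  # TotalLength
    if kind == 0x6:
        if len(buf) < 6:
            return "ipv6", None
        payload_len = struct.unpack_from("!H", buf, 4)[0]  # PayloadLength
        return "ipv6", IPV6_FIXED_HEADER_LEN + payload_len
    raise ValueError(f"unknown data block type {kind:#x}")
```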
In order for a receiver to be able to reassemble fragmented inner-packets, the sender MUST send the inner-packet fragments back-to-back in the logical outer packet stream (i.e., using consecutive ESP sequence numbers). However, the sender is allowed to insert "all-pad" payloads (i.e., payloads with a BlockOffset of zero and a single pad DataBlock) in between the packets carrying the inner-packet fragment payloads. This possible interleaving of all-pad payloads allows the sender to always be able to send a tunnel packet, regardless of the encapsulation computational requirements.¶
When a receiver is reassembling an inner-packet and it receives an "all-pad" payload, it increments the sequence number in which it expects the next inner-packet fragment to arrive.¶
Given the above, the receiver will need to handle out-of-order arrival of outer ESP packets prior to reassembly processing. ESP already provides for optionally detecting replay attacks. Detecting replay attacks normally utilizes a window method. A similar sequence number based sliding window can be used to correct re-ordering of the outer packet stream. Receiving a larger (newer) sequence number packet advances the window, and received older ESP packets whose sequence numbers the window has passed by are dropped. A good choice for the size of this window depends on the amount of re-ordering the user may normally experience.¶
As the amount of reordering that may be present is hard to predict the window size SHOULD be configurable by the user. Implementations MAY also dynamically adjust the reordering window based on actual reordering seen in arriving packets. Finally, we note that as IP-TFS is sending a continuous stream of packets there is no requirement for timers (although there's no prohibition either) as newly arrived packets will cause the window to advance and older packets will then be processed as they leave the window. Implementations that are concerned about memory use when packets are delayed (e.g., when an SA deletion is delayed) can of course use timers to drop packets as well.¶
While ESP guarantees an increasing sequence number with subsequently sent packets, it does not actually require the sequence numbers to be generated with no gaps (e.g., sending only even numbered sequence numbers would be allowed as long as they are always increasing). Gaps in the sequence numbers will not work for this specification so the sequence number stream is further restricted to not contain gaps (i.e., each subsequent outer packet must be sent with the sequence number incremented by 1).¶
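One possible shape for the sequence number based reordering window described above is sketched below in Python. The class and method names are hypothetical, the sketch relies on the gap-free sequence numbering required above, and it ignores sequence number wrap and timers for brevity.

```python
class ReorderWindow:
    """Sliding window that releases outer packets in sequence order."""

    def __init__(self, size: int, first_seq: int = 1):
        if size < 1:
            raise ValueError("window size must be at least 1")
        self.size = size
        self.next_seq = first_seq  # next sequence number to release
        self.buffered = {}         # seq -> packet held for reordering

    def push(self, seq: int, packet):
        """Accept one packet; return a list of (seq, packet) in order.

        A released packet of None marks a sequence number the window passed
        without ever receiving it (treated as lost).
        """
        released = []
        if seq < self.next_seq:
            return released  # the window has already passed this packet
        self.buffered[seq] = packet
        # A newer packet advances the window, pushing out the oldest slots.
        while seq - self.next_seq >= self.size:
            released.append((self.next_seq, self.buffered.pop(self.next_seq, None)))
            self.next_seq += 1
        # Release any run of consecutive packets now available.
        while self.next_seq in self.buffered:
            released.append((self.next_seq, self.buffered.pop(self.next_seq)))
            self.next_seq += 1
        return released
```

Because the tunnel carries a continuous packet stream, newly arriving packets are what drive the window forward in this sketch; no timer is involved.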
When using the AGGFRAG_PAYLOAD in conjunction with replay detection, the window size for both MAY be reduced to share the smaller of the two window sizes. This is because packets outside of the smaller window but inside the larger would still be dropped by the mechanism with the smaller window size.¶
Finally, as sequence numbers are reset when switching SAs (e.g., when re-keying a child SA), an implementation SHOULD NOT send initial fragments of an inner packet using one SA and subsequent fragments in a different SA.¶
When the tunnel bandwidth is not being fully utilized, an implementation MAY pad-out the current encapsulating packet in order to deliver an inner packet un-fragmented in the following outer packet. The benefit would be to avoid inner-packet fragmentation in the presence of a bursty offered load (non-bursty traffic will naturally not fragment). An implementation MAY also choose to allow for a minimum fragment size to be configured (e.g., as a percentage of the AGGFRAG_PAYLOAD payload size) to avoid fragmentation at the cost of tunnel bandwidth. The cost with these methods is complexity and added delay of inner traffic. The main advantage to avoiding fragmentation is to minimize inner packet loss in the presence of outer packet loss. When this is worthwhile (e.g., how much loss and what type of loss is required, given different inner traffic shapes and utilization, for this to make sense), and what values to use for the allowable/added delay may be worth researching, but is outside the scope of this document.¶
While use of padding to avoid fragmentation does not impact interoperability, used inappropriately it can reduce the effective throughput of a tunnel. Implementations implementing either of the above approaches will need to take care to not reduce the effective capacity, and overall utility, of the tunnel through the overuse of padding.¶
In order to support reporting of congestion control information (described later) on a non-AGGFRAG_PAYLOAD enabled SA, IP-TFS allows for the sending of an AGGFRAG_PAYLOAD payload with no data blocks (i.e., the ESP payload length is equal to the AGGFRAG_PAYLOAD header length). This special payload is called an empty payload.¶
[RFC4301] provides some direction on when and how to map various values from an inner IP header to the outer encapsulating header, namely the Don't-Fragment (DF) bit ([RFC0791] and [RFC8200]), the Differentiated Services (DS) field [RFC2474] and the Explicit Congestion Notification (ECN) field [RFC3168]. Unlike [RFC4301], IP-TFS may and often will be encapsulating more than one IP packet per ESP packet. To deal with this, these mappings are restricted further. In particular IP-TFS never maps the inner DF bit as it is unrelated to the IP-TFS tunnel functionality; IP-TFS never IP fragments the inner packets and the inner packets will not affect the fragmentation of the outer encapsulation packets. Likewise, the ECN value need not be mapped as any congestion related to the constant-send-rate IP-TFS tunnel is unrelated (by design!) to the inner traffic flow. Finally, by default the DS field SHOULD NOT be copied although an implementation MAY choose to allow for configuration to override this behavior. An implementation SHOULD also allow the DS value to be set by configuration.¶
It is worth noting that an implementation MAY still set the ECN value of inner packets based on the normal ECN specification ([RFC3168]).¶
[RFC4301] specifies how to modify the inner packet TTL ([RFC0791]).¶
Any errors (e.g., ICMP errors arriving back at the tunnel ingress due to tunnel traffic) should be handled the same as with non IP-TFS IPsec tunnels.¶
Unlike [RFC4301], there is normally no effective MTU (EMTU) on an IP-TFS tunnel as all IP packet sizes are properly transmitted without requiring IP fragmentation prior to tunnel ingress. That said, an implementation MAY allow for explicitly configuring an MTU for the tunnel.¶
If IP-TFS fragmentation has been disabled, then the tunnel's EMTU and behaviors are the same as normal IPsec tunnels ([RFC4301]).¶
It is not the intention of this specification to allow for mixed use of an AGGFRAG_PAYLOAD enabled SA. In other words, an SA that has AGGFRAG_PAYLOAD enabled MUST NOT have non-AGGFRAG_PAYLOAD payloads such as IP (IP protocol 4), TCP transport (IP protocol 6), or ESP pad packets (protocol 59) intermixed with non-empty AGGFRAG_PAYLOAD payloads. Empty AGGFRAG_PAYLOAD payloads (Section 2.2.4) are used to transmit congestion control information on non-IP-TFS enabled SAs, so intermixing is allowed in this specific case. While it's possible to envision making the algorithm work in the presence of sequence number skips in the AGGFRAG_PAYLOAD payload stream, the added complexity is not deemed worthwhile. Other IPsec uses can configure and use their own SAs.¶
Just as with normal IPsec/ESP tunnels, IP-TFS tunnels are unidirectional. Bidirectional IP-TFS functionality is achieved by setting up 2 IP-TFS tunnels, one in each direction.¶
An IP-TFS tunnel can operate in 2 modes: a non-congestion controlled mode and a congestion controlled mode.¶
In the non-congestion controlled mode IP-TFS sends fixed-sized packets at a constant rate. The packet send rate is constant and is not automatically adjusted regardless of any network congestion (e.g., packet loss).¶
For similar reasons as given in [RFC7510], the non-congestion controlled mode should only be used where the user has full administrative control over the path the tunnel will take. This is required so the user can guarantee the bandwidth and also be sure not to negatively affect network congestion [RFC2914]. In this case packet loss should be reported to the administrator (e.g., via syslog, YANG notification, SNMP traps, etc.) so that any failures due to a lack of bandwidth can be corrected.¶
With the congestion controlled mode, IP-TFS adapts to network congestion by lowering the packet send rate to accommodate the congestion, as well as raising the rate when congestion subsides. Since overhead is per packet, allowing for maximal fixed-size packets and varying the send rate minimizes transport overhead.¶
The output of the congestion control algorithm will adjust the rate at which the ingress sends packets. While this document does not require a specific congestion control algorithm, best current practice RECOMMENDS that the algorithm conform to [RFC5348]. Congestion control principles are documented in [RFC2914] as well. An example of an implementation of the [RFC5348] algorithm which matches the requirements of IP-TFS (i.e., designed for fixed-size packet and send rate varied based on congestion) is documented in [RFC4342].¶
The required inputs for the TCP friendly rate control algorithm described in [RFC5348] are the receiver's loss event rate and the sender's estimated round-trip time (RTT). These values are provided by IP-TFS using the congestion information header fields described in Section 3. In particular these values are sufficient to implement the algorithm described in [RFC5348].¶
At a minimum, the congestion information must be sent, from the receiver and from the sender, at least once per RTT. Prior to establishing an RTT the information SHOULD be sent constantly from the sender and the receiver so that an RTT estimate can be established. The lack of receiving this information over multiple consecutive RTT intervals should be considered a congestion event that causes the sender to adjust its sending rate lower. For example, [RFC4342] calls this the "no feedback timeout" and it is equal to 4 RTT intervals. When a "no feedback timeout" has occurred [RFC4342] halves the sending rate.¶
An implementation MAY choose to always include the congestion information in its IP-TFS payload header if sending on an IP-TFS enabled SA. Since IP-TFS normally will operate with a large packet size, the congestion information should represent a small portion of the available tunnel bandwidth. An implementation choosing to always send the data MAY also choose to only update the LossEventRate and RTT header field values it sends every RTT though.¶
When an implementation is choosing a congestion control algorithm (or a selection of algorithms) one should remember that IP-TFS is not providing for reliable delivery of IP traffic, and so per packet ACKs are not required and are not provided.¶
It's worth noting that the variable send-rate of a congestion controlled IP-TFS tunnel is not private; however, this send-rate is being driven by network congestion, and as long as the encapsulated (inner) traffic flow shape and timing are not directly affecting the (outer) network congestion, the variations in the tunnel rate will not weaken the provided inner traffic flow confidentiality.¶
In addition to congestion control, implementations MAY choose to define and implement circuit breakers [RFC8084] as a recovery method of last resort. Enabling circuit breakers is also a reason a user may wish to enable congestion information reports even when using the non-congestion controlled mode of operation. The definition of circuit breakers is outside the scope of this document.¶
In order to support the congestion control mode, the sender needs to know the loss event rate and also be able to approximate the RTT ([RFC5348]). In order to obtain these values the receiver sends congestion control information on its SA back to the sender. Thus, in order to support congestion control the receiver must have a paired SA back to the sender (this is always the case when the tunnel was created using IKEv2). If the SA back to the sender is a non-AGGFRAG_PAYLOAD enabled SA then an AGGFRAG_PAYLOAD empty payload (i.e., header only) is used to convey the information.¶
In order to calculate a loss event rate compatible with [RFC5348], the receiver needs to have a round-trip time estimate. Thus the sender communicates this estimate in the RTT header field. On startup this value will be zero as no RTT estimate is yet known.¶
In order for the sender to estimate its RTT value, the sender places a timestamp value in the TVal header field. On first receipt of this TVal, the receiver records the new TVal value along with the time it arrived locally; subsequent receipt of the same TVal MUST NOT update the recorded time. When the receiver sends its CC header it places this latest recorded value in the TEcho header field, along with 2 delay values, Echo Delay and Transmit Delay. The Echo Delay value is the time delta between the recorded arrival time of TVal and the current clock in microseconds. The second value, Transmit Delay, is the receiver's current transmission delay on the tunnel (i.e., the average time between sending packets on its half of the IP-TFS tunnel). When the sender receives back its TVal in the TEcho header field it calculates 2 RTT estimates. The first is the actual delay found by subtracting the TEcho value from its current clock and then subtracting Echo Delay as well. The second RTT estimate is found by adding the received Transmit Delay header value to the sender's own transmission delay (i.e., the average time between sending packets on its half of the IP-TFS tunnel). The larger of these 2 RTT estimates SHOULD be used as the RTT value. The two estimates are required to handle different combinations of faster or slower tunnel packet paths with faster or slower fixed tunnel rates. Choosing the larger of the two values guarantees that the RTT is never considered faster than the aggregate transmission delay based on the IP-TFS tunnel rate (the second estimate), as well as never being considered faster than the actual RTT along the tunnel packet path (the first estimate).¶
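The two RTT estimates described above can be sketched in a few lines of Python. The function and parameter names are hypothetical; the sketch assumes every value is expressed in microseconds on the sender's clock and ignores timestamp wrap.

```python
def estimate_rtt_us(now_us: int, techo_us: int, echo_delay_us: int,
                    peer_transmit_delay_us: int, own_transmit_delay_us: int) -> int:
    """Return the RTT estimate the sender would use, in microseconds."""
    # First estimate: measured delay along the tunnel packet path.
    path_rtt = now_us - techo_us - echo_delay_us
    # Second estimate: aggregate transmission delay of both tunnel halves.
    rate_rtt = peer_transmit_delay_us + own_transmit_delay_us
    # The larger value keeps the RTT from being considered faster than either.
    return max(path_rtt, rate_rtt)
```

On a fast path with slow tunnel rates the second estimate dominates, while on a slow path with fast tunnel rates the first one does.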
The receiver also calculates, and communicates in the LossEventRate header field, the loss event rate for use by the sender. This is slightly different from [RFC4342] which periodically sends all the loss interval data back to the sender so that it can do the calculation. See Appendix B for a suggested way to calculate the loss event rate value. Initially this value will be zero (indicating no loss) until enough data has been collected by the receiver to update it.¶
In addition to normal packet loss information, IP-TFS supports use of the ECN bits in the encapsulating IP header [RFC3168] for identifying congestion. If ECN use is enabled and a packet arrives at the egress endpoint with the Congestion Experienced (CE) value set, then the receiver considers that packet as being dropped, although it does not drop it. The receiver MUST set the E bit in any AGGFRAG_PAYLOAD payload header containing a LossEventRate value derived from a CE value being considered.¶
As noted in [RFC3168] the ECN bits are not protected by IPsec and thus may constitute a covert channel. For this reason ECN use SHOULD NOT be enabled by default.¶
IP-TFS is meant to be deployable with a minimal amount of configuration. All IP-TFS specific configuration should be able to be specified at the unidirectional tunnel ingress (sending) side. It is intended that non-IKEv2 operation is supported, at least, with local static configuration.¶
Bandwidth is a local configuration option. For non-congestion controlled mode the bandwidth SHOULD be configured. For congestion controlled mode one can configure the bandwidth or have no configuration and let congestion control discover the maximum bandwidth available. No standardized configuration method is required.¶
The fixed packet size to be used for the tunnel encapsulation packets MAY be configured manually or can be automatically determined using other methods such as PLMTUD ([RFC4821], [RFC8899]) or PMTUD ([RFC1191], [RFC8201]). As PMTUD is known to have issues, PLMTUD is considered the more robust option. No standardized configuration method is required.¶
Congestion control is a local configuration option. No standardized configuration method is required.¶
As mentioned previously IP-TFS tunnels utilize ESP payloads of type AGGFRAG_PAYLOAD.¶
When using IKEv2, a new "USE_AGGFRAG" Notification Message is used to enable use of the AGGFRAG_PAYLOAD payload on a child SA pair. The method used is similar to how USE_TRANSPORT_MODE is negotiated, as described in [RFC7296].¶
To request using the AGGFRAG_PAYLOAD payload on the Child SA pair, the initiator includes the USE_AGGFRAG notification in an SA payload requesting a new Child SA (either during the initial IKE_AUTH or during non-rekeying CREATE_CHILD_SA exchanges). If the request is accepted then the response MUST also include a notification of type USE_AGGFRAG. If the responder declines the request the child SA will be established without AGGFRAG_PAYLOAD payload use enabled. If this is unacceptable to the initiator, the initiator MUST delete the child SA.¶
The USE_AGGFRAG notification MUST NOT be sent, and MUST be ignored, during a CREATE_CHILD_SA rekeying exchange as it is not allowed to change use of the AGGFRAG_PAYLOAD payload type during rekeying. A new child SA due to re-keying inherits the use of AGGFRAG_PAYLOAD from the re-keyed child SA.¶
The USE_AGGFRAG notification contains a 1 octet payload of flags that specify any requirements from the sender of the message. If any requirement flags are not understood or cannot be supported by the receiver then the receiver should not enable use of AGGFRAG_PAYLOAD payload type (either by not responding with the USE_AGGFRAG notification, or in the case of the initiator, by deleting the child SA if the now established non-AGGFRAG_PAYLOAD using SA is unacceptable).¶
The notification type and payload flag values are defined in Section 6.1.4.¶
ESP Payload Type: 0x5¶
An IP-TFS payload is identified by the ESP payload type AGGFRAG_PAYLOAD which has the value 0x5. The first octet of this payload indicates the format of the remaining payload data.¶
 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+-+-+-
|   Sub-type    | ...
+-+-+-+-+-+-+-+-+-+-+-
This specification defines 2 payload sub-types. These payload formats are defined in the following sections.¶
The non-congestion control AGGFRAG_PAYLOAD payload is comprised of a 4 octet header followed by a variable amount of DataBlocks data as shown below.¶
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Sub-Type (0) |   Reserved    |          BlockOffset          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       DataBlocks ...
+-+-+-+-+-+-+-+-+-+-+-
DataBlocks data before the start of a new data block. BlockOffset can count past the end of the DataBlocks data in which case all the DataBlocks data belongs to the previous data block being re-assembled. If the BlockOffset extends into subsequent packets it continues to only count subsequent DataBlocks data (i.e., it does not count subsequent packets' non-DataBlocks octets).¶
The congestion control AGGFRAG_PAYLOAD payload is comprised of a 24 octet header followed by a variable amount of DataBlocks data as shown below.¶
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Sub-type (1) |  Reserved   |E|          BlockOffset          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         LossEventRate                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    RTT                    |   Echo Delay ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     ... Echo Delay   |             Transmit Delay              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             TVal                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             TEcho                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       DataBlocks ...
+-+-+-+-+-+-+-+-+-+-+-
LossEventRate.¶
1/LossEventRate.¶
0x3FFFFF it MUST be set to 0x3FFFFF.¶
TVal value which it is sending back in TEcho. If the value is equal to or larger than 0x1FFFFF it MUST be set to 0x1FFFFF.¶
0x1FFFFF it MUST be set to 0x1FFFFF.¶
TEcho field, along with an Echo Delay value of how long that echo took.¶
TVal field. The received TVal is placed in TEcho along with an Echo Delay value indicating how long it has been since receiving the TVal value.¶
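As an illustration of the 24 octet congestion control header, the following Python sketch packs and unpacks its fields. The function names are hypothetical. The bit widths used here are assumptions inferred from the 24 octet header size and the saturation values given above: a 22-bit RTT (0x3FFFFF), 21-bit Echo Delay and Transmit Delay fields (0x1FFFFF), a 16-bit BlockOffset, 32-bit LossEventRate, TVal and TEcho fields, and the E bit as the last bit of the second octet.

```python
import struct

RTT_MAX = 0x3FFFFF    # assumed 22-bit field
DELAY_MAX = 0x1FFFFF  # assumed 21-bit fields
CC_FORMAT = "!BBHIQII"  # 1 + 1 + 2 + 4 + 8 + 4 + 4 = 24 octets


def pack_cc_header(block_offset, loss_event_rate, rtt, echo_delay,
                   transmit_delay, tval, techo, ecn=False) -> bytes:
    """Build a sub-type 1 header, saturating the timing fields."""
    rtt = min(rtt, RTT_MAX)
    echo_delay = min(echo_delay, DELAY_MAX)
    transmit_delay = min(transmit_delay, DELAY_MAX)
    # RTT, Echo Delay and Transmit Delay share one 64-bit span.
    timing = (rtt << 42) | (echo_delay << 21) | transmit_delay
    flags = 0x01 if ecn else 0x00  # reserved bits are sent as zero
    return struct.pack(CC_FORMAT, 1, flags, block_offset,
                       loss_event_rate, timing, tval, techo)


def unpack_cc_header(data: bytes) -> dict:
    """Parse a sub-type 1 header from the start of an ESP payload."""
    (sub_type, flags, block_offset, loss_event_rate,
     timing, tval, techo) = struct.unpack_from(CC_FORMAT, data)
    if sub_type != 1:
        raise ValueError("not a congestion control format payload")
    return {
        "ecn": bool(flags & 0x01),
        "block_offset": block_offset,
        "loss_event_rate": loss_event_rate,
        "rtt": timing >> 42,
        "echo_delay": (timing >> 21) & DELAY_MAX,
        "transmit_delay": timing & DELAY_MAX,
        "tval": tval,
        "techo": techo,
    }
```

A header that is exactly 24 octets long with no DataBlocks data following it corresponds to the empty payload used to carry congestion information on a non-AGGFRAG_PAYLOAD enabled SA.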
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type  | IPv4, IPv6 or pad...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  0x4  |  IHL  | TypeOfService |          TotalLength          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Rest of the inner packet ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
These values are the actual values within the encapsulated IPv4 header. In other words, the start of this data block is the start of the encapsulated IP packet.¶
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  0x6  | TrafficClass  |               FlowLabel               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         PayloadLength         | Rest of the inner packet ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
These values are the actual values within the encapsulated IPv6 header. In other words, the start of this data block is the start of the encapsulated IP packet.¶
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  0x0  | Padding ...
+-+-+-+-+-+-+-+-+-+-+-
As discussed in Section 5.1 a notification message USE_AGGFRAG is used to negotiate use of the ESP AGGFRAG_PAYLOAD payload type.¶
The USE_AGGFRAG Notification Message Status Type is (TBD2).¶
The notification payload contains 1 octet of requirement flags. There are currently 2 requirement flags defined. This may be revised by later specifications.¶
+-+-+-+-+-+-+-+-+
|0|0|0|0|0|0|C|D|
+-+-+-+-+-+-+-+-+
Data Block). This value only applies to what the sender is capable of receiving; the sender MAY still send packet fragments unless similarly restricted by the receiver in its USE_AGGFRAG notification.¶
This document requests IANA create a registry called "AGGFRAG_PAYLOAD Sub-Type Registry" under a new category named "ESP AGGFRAG_PAYLOAD Parameters". The registration policy for this registry is "Standards Action" ([RFC8126] and [RFC7120]).¶
The initial content for this registry is as follows:¶
Sub-Type  Name                           Reference
--------------------------------------------------------
0         Non-Congestion Control Format  This document
1         Congestion Control Format      This document
3-255     Reserved
This document requests a status type USE_AGGFRAG be allocated from the "IKEv2 Notify Message Types - Status Types" registry.¶
This document describes a mechanism to add Traffic Flow Confidentiality to IP traffic. Use of this mechanism is expected to increase the security of the traffic being transported. Other than the additional security afforded by using this mechanism, IP-TFS utilizes the security protocols [RFC4303] and [RFC7296] and so their security considerations apply to IP-TFS as well.¶
As noted previously in Section 2.4.2, for TFC to be fully maintained the encapsulated traffic flow should not be affecting network congestion in a predictable way, and if it would be then non-congestion controlled mode use should be considered instead.¶
Below, an example inner IP packet flow within the encapsulating tunnel packet stream is shown. Notice how encapsulated IP packets can start and end anywhere, and more than one or less than one may occur in a single encapsulating packet.¶
The encapsulated IP packet flow (lengths include IP header and payload) is as follows: an 800 octet packet, an 800 octet packet, a 60 octet packet, a 240 octet packet, a 4000 octet packet.¶
The BlockOffset values in the 4 IP-TFS payload headers for this packet flow would thus be: 0, 100, 2900, 1400 respectively. The first encapsulating packet, ESP1, has a zero BlockOffset, which points at the IP data block immediately following the IP-TFS header. The BlockOffset of the following packet, ESP2, points inward 100 octets to the start of the 60 octet data block. The third encapsulating packet, ESP3, contains the middle portion of the 4000 octet data block, so the offset points past its end and into the fourth encapsulating packet. The offset of the fourth packet, ESP4, is 1400, pointing at the padding which follows the completion of the continued 4000 octet packet.¶
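The offsets in this example can be reproduced with a small simulation. The sketch below is illustrative only: the function name is hypothetical, and it assumes each encapsulating packet carries exactly 1500 octets of data blocks and padding after the IP-TFS header, a value inferred from the offsets quoted above rather than stated in the text.¶

```python
def block_offsets(inner_sizes, payload_size):
    """Return the BlockOffset of each outer payload for a back-to-back flow.

    BlockOffset is modelled as the number of octets, counted from the start
    of the outer payload, that still belong to an inner packet begun in an
    earlier outer packet.
    """
    offsets = []
    carry = 0                  # octets of an unfinished inner packet
    queue = list(inner_sizes)  # inner packets not yet started
    while carry > 0 or queue:
        offsets.append(carry)
        room = payload_size
        # Continue (and possibly finish) the inner packet in progress.
        used = min(carry, room)
        carry -= used
        room -= used
        # Start new inner packets while there is room left.
        while room > 0 and queue:
            size = queue.pop(0)
            used = min(size, room)
            carry = size - used
            room -= used
        # Any room still left at this point would be filled with padding.
    return offsets
```

With these assumptions, `block_offsets([800, 800, 60, 240, 4000], 1500)` returns `[0, 100, 2900, 1400]`, matching the four values given above.¶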
The current best practice indicates that congestion control SHOULD be done in a TCP friendly way. A TCP friendly congestion control algorithm is described in [RFC5348]. For this IP-TFS use case (as with [RFC4342]) the (fixed) packet size is used as the segment size for the algorithm. The main formula in the algorithm for the send rate is then as follows:¶
                          1
   X = -----------------------------------------------
       R * (sqrt(2*p/3) + 12*sqrt(3*p/8)*p*(1+32*p^2))¶
Where X is the send rate in packets per second, R is the round trip time estimate, and p is the loss event rate (the inverse of which is provided by the receiver).¶
In addition, the algorithm in [RFC5348] also uses an X_recv value (the receiver's receive rate). For IP-TFS one MAY set this value according to the sender's current tunnel send-rate (X).¶
The IP-TFS receiver, having the RTT estimate from the sender, can use the same method as described in [RFC5348] and [RFC4342] to collect the loss intervals and calculate the loss event rate value using the weighted average as indicated. The receiver communicates the inverse of this value back to the sender in the AGGFRAG_PAYLOAD payload header field LossEventRate.¶
The IP-TFS sender now has both the R and p values and can calculate the correct sending rate. If following [RFC5348], the sender SHOULD also use the slow start mechanism described therein when the IP-TFS SA is first established.¶
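The send-rate calculation can be sketched as follows. This is a minimal illustration of the formula only, not a full [RFC5348] implementation: slow start, X_recv limiting, and loss interval tracking are omitted, and the function names are hypothetical. The second helper assumes the LossEventRate field has already been decoded into a positive number equal to 1/p.¶

```python
import math


def tfrc_send_rate(rtt_seconds: float, loss_event_rate: float) -> float:
    """Send rate X in packets per second for round trip time R and loss rate p."""
    if rtt_seconds <= 0:
        raise ValueError("round trip time must be positive")
    if not 0 < loss_event_rate <= 1:
        raise ValueError("loss event rate must be in (0, 1]")
    p = loss_event_rate
    denominator = rtt_seconds * (
        math.sqrt(2 * p / 3)
        + 12 * math.sqrt(3 * p / 8) * p * (1 + 32 * p ** 2)
    )
    return 1.0 / denominator


def send_rate_from_inverse(rtt_seconds: float, inverse_loss_rate: float) -> float:
    """Same calculation, starting from the inverse of p sent by the receiver."""
    if inverse_loss_rate <= 0:
        # How a "no loss" report is encoded is outside the scope of this sketch.
        raise ValueError("inverse loss event rate must be positive")
    return tfrc_send_rate(rtt_seconds, 1.0 / inverse_loss_rate)
```

For example, a round trip time of 100 ms and a loss event rate of 1% give a rate of roughly 112 packets per second; because the packet size is fixed, multiplying by that size yields the tunnel bandwidth.¶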
The overhead of IP-TFS is 40 octets per outer packet. Therefore, the octet overhead per inner packet is 40 multiplied by the number of outer packets required to carry it (fractional allowed), which is the same as 40 divided by the number of inner packets that fit in one outer packet. The overhead as a percentage of inner packet size is a constant based on the Outer MTU size.¶
   OH = 40 / (Outer Payload Size / Inner Packet Size)
   OH % of Inner Packet Size = 100 * OH / Inner Packet Size
   OH % of Inner Packet Size = 4000 / Outer Payload Size¶
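The arithmetic can be checked with a short sketch. It reads the first formula as 40 divided by the number of inner packets per outer packet, that is, 40 * Inner Packet Size / Outer Payload Size; the function names are hypothetical and any payload size passed in is an illustrative value, not one taken from this document.¶

```python
IPTFS_OVERHEAD = 40  # octets of IP-TFS overhead per outer packet, per the text


def iptfs_overhead_octets(inner_packet_size: float, outer_payload_size: float) -> float:
    """Octets of overhead attributed to one inner packet (may be fractional)."""
    outer_packets_needed = inner_packet_size / outer_payload_size
    return IPTFS_OVERHEAD * outer_packets_needed


def iptfs_overhead_percent(outer_payload_size: float) -> float:
    """Overhead as a percentage of inner packet size; independent of that size."""
    return 100 * IPTFS_OVERHEAD / outer_payload_size
```

Because the inner packet size cancels out, `100 * iptfs_overhead_octets(size, payload) / size` equals `iptfs_overhead_percent(payload)` for any inner size, which is why the percentage is constant for a given outer packet size.¶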
The overhead per inner packet for constant-send-rate padded ESP (i.e., traditional IPsec TFC) is 36 octets plus any padding, unless fragmentation is required.¶
When fragmentation of the inner packet is required to fit in the outer IPsec packet, the overhead is the number of outer packets required to carry the fragmented inner packet times the sum of the inner IP overhead (20) and the outer packet overhead (36), minus the initial inner IP overhead, plus any required tail padding in the last encapsulation packet. The required tail padding is the number of required packets times the difference of the Outer Payload Size and the IP Overhead, minus the Inner Payload Size. So:¶
   Inner Payload Size = IP Packet Size - IP Overhead
   Outer Payload Size = MTU - IPsec Overhead

                 Inner Payload Size
   NF0 = ----------------------------------
          Outer Payload Size - IP Overhead

   NF = CEILING(NF0)

   OH = NF * (IP Overhead + IPsec Overhead)
        - IP Overhead
        + NF * (Outer Payload Size - IP Overhead)
        - Inner Payload Size

   OH = NF * (IPsec Overhead + Outer Payload Size)
        - (IP Overhead + Inner Payload Size)

   OH = NF * (IPsec Overhead + Outer Payload Size) - Inner Packet Size¶
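The derivation above can be expressed as a short function. This is a sketch under the values quoted in the text (20 octets of IP overhead, 36 octets of IPsec overhead); the function name is hypothetical, and clamping NF to at least 1 for a header-only inner packet is an assumption added here.¶

```python
import math

IP_OVERHEAD = 20     # inner IP header octets, per the text
IPSEC_OVERHEAD = 36  # outer packet overhead octets, per the text


def esp_pad_overhead(inner_packet_size: int, mtu: int) -> int:
    """Octets of overhead for one inner packet sent over padded ESP."""
    inner_payload = inner_packet_size - IP_OVERHEAD
    outer_payload = mtu - IPSEC_OVERHEAD
    # NF: number of outer packets (fragments) needed for the inner payload.
    nf = max(1, math.ceil(inner_payload / (outer_payload - IP_OVERHEAD)))
    # Final simplified form of OH from the derivation.
    return nf * (IPSEC_OVERHEAD + outer_payload) - inner_packet_size
```

Since IPsec Overhead plus Outer Payload Size is the MTU, the result is simply NF full-sized outer packets minus the inner packet size. For a 40 octet inner packet and a 1500 octet MTU this gives 1460 octets of overhead; for a 1500 octet inner packet, which needs two outer packets, it gives 1500.¶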
The following tables collect the overhead values for some common L3 MTU sizes in order to compare them. The first table is the number of octets of overhead for a given L3 MTU sized packet. The second table is the percentage of overhead in the same MTU sized packet.¶
Another way to compare the two solutions is to look at the amount of available bandwidth each solution provides. The following sections consider and compare the percentage of available bandwidth. To provide a well-understood baseline, values for normal (unencrypted) Ethernet and for normal ESP are also included.¶
In order to calculate the available bandwidth, the per-packet overhead is calculated first. The total overhead of Ethernet is 14+4 octets of header and CRC plus an additional 20 octets of framing (preamble, start, and inter-packet gap), for a total of 38 octets. Additionally, the minimum payload is 46 octets.¶
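As a worked example of the Ethernet baseline, the sketch below treats available bandwidth as the inner packet octets divided by the octets occupied on the wire. That ratio is an assumption about how such a percentage would be derived, and the names are hypothetical; the constants come from the paragraph above.¶

```python
ETH_HEADER_AND_CRC = 14 + 4  # octets of header and CRC
ETH_FRAMING = 20             # preamble, start, and inter-packet gap
ETH_OVERHEAD = ETH_HEADER_AND_CRC + ETH_FRAMING  # 38 octets in total
ETH_MIN_PAYLOAD = 46         # shorter payloads are padded up to this size


def ethernet_available_bandwidth(packet_size: int) -> float:
    """Fraction of link bandwidth that carries the packet itself."""
    if packet_size <= 0:
        raise ValueError("packet size must be positive")
    on_wire = max(packet_size, ETH_MIN_PAYLOAD) + ETH_OVERHEAD
    return packet_size / on_wire
```

Under this model a 40 octet packet occupies 84 octets on the wire (about 47.6% useful), while a 1500 octet packet occupies 1538 octets (about 97.5% useful), which shows why small packets are costly on native Ethernet.¶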
A sometimes unexpected result of using IP-TFS (or any packet aggregating tunnel) is that, for small to medium sized packets, the available bandwidth is actually greater than native Ethernet. This is due to the reduction in Ethernet framing overhead. This increased bandwidth is paid for with an increase in latency. This latency is the time to send the unrelated octets in the outer tunnel frame. The following table illustrates the latency for some common values on a 10G Ethernet link. The table also includes latency introduced by padding if using ESP with padding.¶
Notice that the latency values are very similar between the two solutions; however, whereas IP-TFS provides for constant high bandwidth, in some cases even exceeding native Ethernet, ESP with padding often greatly reduces available bandwidth.¶
We would like to thank Don Fedyk for help in reviewing and editing this work. We would also like to thank Valery Smyslov for reviews and suggestions for improvements as well as Joseph Touch for the transport area review and suggested improvements.¶
The following people made significant contributions to this document.¶
Lou Berger LabN Consulting, L.L.C. Email: lberger@labn.net¶