Internet DRAFT - draft-morton-tsvwg-interflow-intraflow-delays
Transport Working Group J. Morton
Internet-Draft
Intended status: Informational P. Heist
Expires: 18 November 2021 17 May 2021
Interflow vs Intraflow Delays
draft-morton-tsvwg-interflow-intraflow-delays-00
Abstract
Much current literature discusses queuing delays, and the effects of
different queue disciplines, active queue management algorithms, and
congestion control measures on these delays. This draft highlights
an important distinction between different types of delay, which may
be helpful to practitioners and theoreticians alike.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 18 November 2021.
Copyright Notice
Copyright (c) 2021 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Simplified BSD License text
as described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Simplified BSD License.
Morton & Heist Expires 18 November 2021 [Page 1]
Internet-Draft interintraflow May 2021
Table of Contents
1. Introduction
2. Baseline Path Delay (BPD) and Baseline Round-Trip Time (BRTT)
3. Between-Flow Induced Delay (BFID)
4. Within-Flow Induced Delay (WFID)
5. Latency Sensitivity of Traffic
6. Security Considerations
7. IANA Considerations
8. Informative References
Authors' Addresses
1. Introduction
Throughput, packet loss ratio, and latency are the three most
prominent performance characteristics of Internet paths. Of these,
throughput has always been the most heavily marketed to consumers,
possibly because it is the only metric from this group in which
bigger numbers are better. Packet loss is also closely managed by
network engineers, and is mostly kept to usefully low levels in
practice, probably because excessive packet loss tends to cripple the
throughput of typical congestion-controlled traffic. However, while
latency has great practical importance to many Internet applications,
it is rarely given the attention it needs for proper management.
One consequence of this neglect is the phenomenon of bufferbloat.
Any given Internet path has a natural baseline delay, which is a
consequence of the speed of information propagation in the physical
media, plus processing delays in network nodes that connect link
segments together, plus (for some link types) additional delays
associated with shared media negotiation. To this baseline, we must
add the delay caused by packets waiting in a queue behind other
packets, which occurs if the link is busy. If the queue is permitted
to grow too much, these additional queuing delays can become very
noticeable to the user, and may even affect the reliability of
Internet protocols.
This document does not discuss in detail the many and varied means of
controlling latency that are currently or might someday become
available. Instead, the characteristics of this delay are discussed,
including the distinction between "inter-flow induced delay" and
"intra-flow induced delay". Typically these two types of delay,
despite their similar names, have different effects and may be
controlled by different queue mechanisms. Simple queues, however, do
not attempt to distinguish them.
To make the two names easier to tell apart, the terms BFID
(Between-Flow Induced Delay) and WFID (Within-Flow Induced Delay)
will be used as synonyms for inter-flow and intra-flow induced delay,
respectively.
2. Baseline Path Delay (BPD) and Baseline Round-Trip Time (BRTT)
*Definition:* The delay on a one-way path or round-trip due entirely
to link characteristics and unavoidable processing delays.
For the avoidance of doubt, the word "unavoidable" in this definition
refers to the agency of the traffic traversing the path in question,
and not to that of the network operators or equipment manufacturers
involved.
The speed of light is a fundamental limitation on information
transmission velocity, and thus on the minimum latency of a
geographically long Internet path. On radio-based links, this limit
is approached closely; in optical fibre or copper wires, the
transmission velocity is somewhat slower. When avian carriers
[RFC1149] are involved, the transmission velocity necessarily falls
below the speed of sound. In practice, light in optical fibre covers
roughly 200 km per millisecond, so an allowance of one millisecond of
round-trip delay per 100 km of path length is usually appropriate.
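As a rough illustration of this rule of thumb, the following sketch
estimates the propagation component of BRTT from path length alone.
The function name and constant are hypothetical, chosen for this
example; processing and media-access delays are deliberately ignored.

```python
# Rule of thumb from the text: about 1 ms of round-trip delay per 100 km.
RTT_MS_PER_100_KM = 1.0


def baseline_rtt_ms(path_km: float) -> float:
    """Estimate the propagation-only round-trip time for a path length in km."""
    if path_km < 0:
        raise ValueError("path length must be non-negative")
    return path_km / 100.0 * RTT_MS_PER_100_KM


if __name__ == "__main__":
    # An 8000 km path gives an estimate of 80 ms round trip.
    print(baseline_rtt_ms(8000))  # 80.0
```

Real paths rarely follow the great-circle distance, so the estimate
is best read as a lower bound.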
When a packet is received by a network node, it must be directed into
a processing buffer for at least long enough to determine in which
direction it should be sent next. Since the necessary information is
typically in the packet header, this may sometimes be less time than
is necessary to receive the entire packet, in which case the head of
the packet may be sent onward while the tail is still being received.
In other cases, the node may receive the packet in whole before
making a processing decision, and may even aggregate the packet with
others for efficiency of dispatch. This efficiency in throughput or
power consumption may be achieved at the expense of processing delay.
Some link types have significant overhead associated with initiating
a transmission, and/or utilise a shared medium into which only one or
a small number of stations (out of a larger possible total) may
transmit simultaneously. Similar characteristics may also be
exhibited by power-saving measures on portable devices. These may
result in significant and/or variable delays in forwarding over these
links, which cannot be avoided by altering characteristics of the
traffic itself.
In practice, an Internet packet can be sent around the world in about
300 milliseconds with current technology. The round-trip latency
between Eastern Europe and Western North America is presently about
160 milliseconds. A "typical" Internet round-trip delay can be taken
to be 80 milliseconds, though more localised paths are significantly
quicker in this respect. Within a LAN or a datacentre, the baseline
delay will often be less than one millisecond.
Whenever two or more packets require sending over the same link
within the time required to send either one of them, link contention
exists and must be resolved. This generally involves either placing
packets into a queue or discarding them. These practices are not
within the definition of "baseline" delays, but influence "induced"
delays as below.
3. Between-Flow Induced Delay (BFID)
*Definition:* The delay which the presence and volume of one flow
induces in traffic belonging to another flow.
When packets are held in a queue awaiting delivery, the order in
which these packets are dequeued is significant for managing delay.
The most common strategy to date is to employ a simple FIFO queue.
This means that all traffic traversing the same link at about the
same time experiences the same amount of queue delay. It also means
that a single flow occupying a large part of the queue induces a
large delay to all other flows sharing that queue, even if without
the presence of that single flow there would be no need for queuing
at all. This is the essence of BFID.
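The effect can be put into numbers with a minimal sketch, assuming a
single FIFO drained at a fixed link rate. The function name and the
example figures are illustrative assumptions, not measurements.

```python
def fifo_bfid_ms(bytes_queued_ahead: int, link_bps: float) -> float:
    """Delay (ms) a newly arrived packet suffers from other flows' bytes
    already waiting ahead of it in a shared FIFO."""
    if link_bps <= 0:
        raise ValueError("link rate must be positive")
    return bytes_queued_ahead * 8 * 1000.0 / link_bps


if __name__ == "__main__":
    # One bulk flow has 100 full-size (1500 byte) packets queued on a
    # 10 Mbit/s link; a DNS query arriving now waits behind all of them.
    print(fifo_bfid_ms(100 * 1500, 10e6))  # 120.0
```

Without the bulk flow the query would find an empty queue and see no
induced delay at all.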
Large BFIDs can be avoided by discriminating flows with high queue
occupancy from those with little or no queue occupancy, and queuing
them separately. One effective method of doing so, that is, placing
every flow in its own FIFO and serving them in deficit-round-robin
order, is described in detail by [RFC8290]; this "flow-isolating"
mechanism reduces the maximum BFID to the serialisation time of one
full-size packet from each active flow, and can be implemented with
or without the use of Active Queue Management. It is also feasible
to merely categorise flows into queue occupancy bands and use a
separate FIFO only for each band; this renders the BFID experienced
by each flow proportionate to the BFID it produces.
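The bound on BFID under flow isolation can be illustrated with a
plain deficit-round-robin sketch. This is a simplified model, not the
scheduler specified in [RFC8290]: it omits the new/old flow lists,
hashing and AQM, and the names drr_order and bytes_before_first are
hypothetical.

```python
from collections import deque


def drr_order(queues, quantum=1500):
    """Return the dequeue order [(flow, size), ...] under deficit round robin.

    queues maps a flow name to a list of packet sizes in bytes, head first.
    """
    backlog = {flow: deque(pkts) for flow, pkts in queues.items()}
    deficit = {flow: 0 for flow in backlog}
    active = deque(flow for flow in backlog if backlog[flow])
    order = []
    while active:
        flow = active.popleft()
        deficit[flow] += quantum          # one quantum of credit per visit
        q = backlog[flow]
        while q and q[0] <= deficit[flow]:
            size = q.popleft()
            deficit[flow] -= size
            order.append((flow, size))
        if q:
            active.append(flow)           # still backlogged: go to the tail
        else:
            deficit[flow] = 0             # idle flows keep no credit
    return order


def bytes_before_first(order, flow):
    """Bytes of other traffic sent before the first packet of the given flow."""
    total = 0
    for name, size in order:
        if name == flow:
            return total
        total += size
    raise KeyError(flow)


if __name__ == "__main__":
    order = drr_order({"bulk": [1500] * 100, "dns": [100]})
    print(bytes_before_first(order, "dns"))  # 1500
```

In a single FIFO the same DNS packet would wait behind all 150000
bytes of the bulk flow; here it waits behind one full-size packet.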
BFID can also be reduced in a simple FIFO by implementing Active
Queue Management. This is because in a simple FIFO, BFID and WFID
have the same cause and extent, so reducing WFID also reduces BFID.
The extent to which BFID can be reduced by this method is limited
compared to dedicated methods, and a significant amount of delay
variation typically remains, but this is significantly better than
allowing a large, uncontrolled BFID to exist.
Capacity-seeking flows with little latency sensitivity are
particularly prone to produce BFID, while latency-sensitive flows
that typically use little capacity are particularly affected by
receiving BFID.
4. Within-Flow Induced Delay (WFID)
*Definition:* The delay which the presence and volume of one flow
induces in traffic belonging to itself.
Regardless of the order in which packets are delivered from a queue,
if more than one packet belonging to a given flow is held in a queue,
one of them induces delay to the other by occupying transmission
capacity ahead of it. In general this WFID is calculable as the
packet occupancy in the queue of that flow divided by the packet
delivery rate of that flow.
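A minimal sketch of this calculation follows, treating WFID as the
time needed to drain the flow's own queued packets at the rate at
which the flow is being served. The function and parameter names are
assumptions made for this example.

```python
def wfid_seconds(queued_packets: int, delivery_rate_pps: float) -> float:
    """Time (s) a flow's newest packet waits behind the flow's own backlog.

    queued_packets    -- packets of this flow currently held in the queue
    delivery_rate_pps -- rate at which this flow's packets leave the queue
    """
    if delivery_rate_pps <= 0:
        raise ValueError("delivery rate must be positive")
    return queued_packets / delivery_rate_pps


if __name__ == "__main__":
    # 50 packets queued, served at 1000 packets per second.
    print(wfid_seconds(50, 1000.0))
```

With these figures the flow delays itself by about 50 milliseconds.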
In congestion-controlled flows, one typical cause of WFID is that the
flow's congestion window exceeds the baseline Bandwidth-Delay Product
(BDP) of the flow's path, and the queue in question is the
controlling bottleneck defining the Bandwidth factor. This is a
natural result of capacity-seeking behaviour, where the congestion
window is increased continuously until some explicit signal of
capacity overload is detected. If the queue is large and does not
implement Active Queue Management, WFIDs of many seconds are easily
achieved and have been observed in practice.
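The relationship between congestion window, BDP and standing queue
can be sketched as below. The sketch assumes a single flow at the
bottleneck and a buffer large enough to hold the whole excess; the
function names and example figures are hypothetical.

```python
def bdp_bytes(bottleneck_bps: float, brtt_s: float) -> float:
    """Baseline Bandwidth-Delay Product of the path, in bytes."""
    return bottleneck_bps * brtt_s / 8


def standing_queue_delay_s(cwnd_bytes: float, bottleneck_bps: float,
                           brtt_s: float) -> float:
    """WFID (s) caused by the part of the congestion window exceeding the BDP."""
    excess = max(0.0, cwnd_bytes - bdp_bytes(bottleneck_bps, brtt_s))
    return excess * 8 / bottleneck_bps


if __name__ == "__main__":
    # 10 Mbit/s bottleneck, 80 ms BRTT: the BDP is about 100 kB.
    # A 1 MB congestion window leaves roughly 900 kB sitting in the queue.
    print(standing_queue_delay_s(1_000_000, 10e6, 0.080))
```

In this example the excess window adds roughly 0.72 seconds of queuing
delay on top of the 80 ms baseline.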
Another typical cause is that the sender emitted a short-term burst
of packets, which subsequently collects in one or more downstream
queues and is thereby spread out in time at the receiver. This cause
also applies to non-congestion-controlled protocols that can have
large datagram payloads. This form of WFID is usually harmless to
the flow causing it, except that large bursts can exceed the capacity
of a queue to absorb them, resulting in packet loss and the need for
retransmission.
In simple FIFOs, or where a flow-isolating mechanism is defeated by
hash collisions or information hiding, the presence of WFID also
implies the presence of an equal degree of BFID to any other flows
sharing that queue. This implies a responsibility to try to minimise
WFID, even when the flow causing it is not very sensitive to its
effects (as is typical of capacity-seeking protocols). Buffer sizing
guidelines (e.g. typical BDP / sqrt(flows)) are among the simplest
ways to limit WFID to tolerable levels.
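The sizing guideline above can be worked through as follows. The
function names are assumptions for illustration, and the choice of
"typical" RTT and expected flow count is left to the operator.

```python
import math


def buffer_bytes(bottleneck_bps: float, typical_rtt_s: float, flows: int) -> float:
    """Buffer size from the guideline: typical BDP divided by sqrt(flows)."""
    if flows < 1:
        raise ValueError("flow count must be at least 1")
    bdp = bottleneck_bps * typical_rtt_s / 8
    return bdp / math.sqrt(flows)


def max_queue_delay_ms(buffer_size_bytes: float, bottleneck_bps: float) -> float:
    """Worst-case queuing delay (ms) when the buffer is completely full."""
    return buffer_size_bytes * 8 * 1000.0 / bottleneck_bps


if __name__ == "__main__":
    size = buffer_bytes(1e9, 0.080, 100)   # 1 Gbit/s, 80 ms, 100 flows
    print(size, max_queue_delay_ms(size, 1e9))
```

For these inputs the buffer comes to about 1 MB, which caps the
queuing delay at about 8 milliseconds.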
Active Queue Management (AQM) is the primary means of effectively
controlling WFID without impairing the ability to absorb short-term
bursts of traffic, by sending congestion signals to flows
experiencing high queue occupancy. Early forms of AQM were only able
to generate congestion signals by artificially inducing packet loss.
ECN [RFC3168] introduced the ability to flag congestion on a packet
without dropping it. AQM may be used alone as in [RFC8289], or in
conjunction with flow-isolation mechanisms as in [RFC8290]. In the
latter case, both WFID and BFID are addressed individually by
natively appropriate mechanisms.
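The difference between loss-based and ECN-based congestion signals
can be sketched as a single decision. The codepoint values are those
of the two-bit ECN field defined in [RFC3168]; the function name and
return convention are hypothetical, and the decision of when to signal
is left to the AQM itself.

```python
# Two-bit ECN field codepoints from RFC 3168.
NOT_ECT = 0b00  # transport does not support ECN
ECT_1 = 0b01    # ECN-capable transport, codepoint ECT(1)
ECT_0 = 0b10    # ECN-capable transport, codepoint ECT(0)
CE = 0b11       # congestion experienced


def signal_congestion(ecn_field: int):
    """Apply a congestion signal to one packet.

    Returns ("drop", None) for packets that cannot carry a mark, or
    ("mark", CE) for ECN-capable packets, which are forwarded intact.
    """
    if ecn_field not in (NOT_ECT, ECT_1, ECT_0, CE):
        raise ValueError("ECN field is two bits wide")
    if ecn_field == NOT_ECT:
        return ("drop", None)
    return ("mark", CE)


if __name__ == "__main__":
    print(signal_congestion(NOT_ECT))  # ('drop', None)
    print(signal_congestion(ECT_0))    # ('mark', 3)
```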
Some flows fail to respond to congestion signals applied by an AQM.
If these flows cause high degrees of WFID, it is reasonable and
probably wise to include a backstop mechanism to prevent them from
completely dominating the queue, by artificially inducing enough
packet loss (without using the ECN "flag" mechanism) to materially
reduce that flow's queue occupancy. If possible, this "queue
protection" mechanism should be specific to the offending flow(s),
such that it mostly avoids dropping packets from appropriately
responsive or inoffensive flows. Without these features, an
unresponsive flow could seriously impair the quality of service of
other flows, either by producing a lot of BFID, or by causing an
overzealous AQM to drop the wrong packets.
5. Latency Sensitivity of Traffic
Some protocols and applications are more sensitive to latency, and
variations in delay, than others. Variations in delay are often
referred to as "jitter", which is the origin of the term "jitter
buffer" commonly used in some types of application.
If the response time for a DNS request exceeds 2 seconds, a timeout
occurs and the request may be retried or an error reported to the
application. Since DNS is a critical support protocol for many
Internet applications, the degree of BFID should be kept well below 2
seconds in all foreseeable cases. DNS timeouts are a significant
cause of user-visible application failure, often resulting in manual
retries and user frustration. If DNS stops working, "the Internet is
down".
Congestion-controlled reliable transports, such as TCP, can have
difficulty recovering from occasional packet loss efficiently if the
effective RTT is high, which can be caused by excessive WFID. The
recovery process may be visible to the user in the form of a "stall"
in the progress of a download or rendering of a Web page, since data
received beyond the lost packet(s) cannot be delivered to the
application until the lost packet's retransmission is successfully
received. The duration of the stall is proportional to the effective
RTT, so keeping WFID low can maintain reasonably smooth perceived
application performance even in the face of packet loss and recovery.
Implementing AQM with ECN can also eliminate packet loss entirely, if
the underlying path is sufficiently reliable.
NTP assumes that delay is approximately symmetric on each path. In
the case of BPD, that is usually true except in certain highly
asymmetric routing scenarios. The assumption is violated, however,
in the case where BFID persists for an extended period of time that
exceeds NTP's built-in filter against it. Even quite small degrees
of BFID can distort NTP synchronisation.
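The size of the distortion can be illustrated with the standard
four-timestamp offset calculation. The simulate_exchange helper and
the example delays are assumptions made for this sketch; NTP's
filtering and clock discipline are not modelled.

```python
def ntp_offset(t1: float, t2: float, t3: float, t4: float) -> float:
    """Clock offset estimate from one exchange, assuming symmetric delay.

    t1/t4 are client send/receive times, t2/t3 server receive/send times.
    """
    return ((t2 - t1) + (t3 - t4)) / 2.0


def simulate_exchange(true_offset: float, fwd_delay: float, rev_delay: float,
                      t1: float = 0.0, server_proc: float = 0.0):
    """Timestamps for one exchange; server clock = client clock + true_offset."""
    t2 = t1 + fwd_delay + true_offset
    t3 = t2 + server_proc
    t4 = t3 - true_offset + rev_delay
    return t1, t2, t3, t4


if __name__ == "__main__":
    # Clocks are actually in sync. The path has 40 ms BPD each way, but
    # the forward direction carries an extra 20 ms of BFID.
    stamps = simulate_exchange(0.0, 0.040 + 0.020, 0.040)
    print(ntp_offset(*stamps))
```

The estimate is wrong by half the asymmetry, here about 10
milliseconds, even though the true offset is zero.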
VoIP and videoconferencing protocols can usually tolerate a
surprisingly high BRTT, often more than the human users communicating
over them. To accommodate delay variations caused by inherent link
characteristics, BFID and WFID, they require jitter buffers. The
round-trip latency presented to the users is the sum of the BRTT and
the jitter buffers in both directions, so the jitter buffers are
tuned at runtime to be only as large as necessary to accommodate
observed delay variations. Since these protocols usually don't
produce much WFID, protecting them from BFID to the greatest extent
practical will noticeably improve perceived call quality.
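The sum described above can be sketched as follows. Sizing each
jitter buffer to the full observed spread of one-way delays is a
simplifying assumption for this example; real applications typically
use more elaborate estimators. All names here are hypothetical.

```python
def jitter_buffer_ms(one_way_delays_ms):
    """Smallest buffer (ms) that absorbs all observed delay variation."""
    if not one_way_delays_ms:
        raise ValueError("need at least one delay sample")
    return max(one_way_delays_ms) - min(one_way_delays_ms)


def perceived_rtt_ms(brtt_ms, fwd_delays_ms, rev_delays_ms):
    """Round-trip latency presented to users: BRTT plus both jitter buffers."""
    return (brtt_ms
            + jitter_buffer_ms(fwd_delays_ms)
            + jitter_buffer_ms(rev_delays_ms))


if __name__ == "__main__":
    # 80 ms BRTT; a 20 ms delay spike in the forward direction (for
    # example from BFID) and 3 ms of variation on the return path.
    print(perceived_rtt_ms(80, [40, 42, 45, 60], [40, 41, 43]))  # 103
```

Removing the single 60 ms sample from the forward direction would
shrink that jitter buffer from 20 ms to 5 ms.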
Multiplayer games are among the most latency-sensitive applications
visible to consumers. The effective RTT determines how quickly it is
possible for each player to perceive situations in the game and
transmit responses to them. In very fast-paced games, every
millisecond is considered a valuable competitive edge, and
experienced players become highly sensitive to even minor glitches
caused by network disturbances. In slower-paced games, there is
slightly more tolerance, but a significant "lag spike" at an
inopportune moment will still be noticed. Crucially, a defeat caused
by such a glitch is far more difficult for a player to accept than
one caused by their own mistakes or an opponent's genuinely superior
performance. Accordingly, this class of application requires
strictly minimising both BRTT and BFID, even at the expense of
throughput, and should not be routed over links with significant
inherent delay variation characteristics.
6. Security Considerations
This is an informational document and raises no security
considerations.
7. IANA Considerations
There are no IANA considerations.
8. Informative References
[RFC1149] Waitzman, D., "Standard for the transmission of IP
datagrams on avian carriers", RFC 1149,
DOI 10.17487/RFC1149, April 1990,
<https://www.rfc-editor.org/info/rfc1149>.
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP",
RFC 3168, DOI 10.17487/RFC3168, September 2001,
<https://www.rfc-editor.org/info/rfc3168>.
[RFC8289] Nichols, K., Jacobson, V., McGregor, A., Ed., and J.
Iyengar, Ed., "Controlled Delay Active Queue Management",
RFC 8289, DOI 10.17487/RFC8289, January 2018,
<https://www.rfc-editor.org/info/rfc8289>.
[RFC8290] Hoeiland-Joergensen, T., McKenney, P., Taht, D., Gettys,
J., and E. Dumazet, "The Flow Queue CoDel Packet Scheduler
and Active Queue Management Algorithm", RFC 8290,
DOI 10.17487/RFC8290, January 2018,
<https://www.rfc-editor.org/info/rfc8290>.
Authors' Addresses
Jonathan Morton
Kokkonranta 21
FI-31520 Pitkajarvi
Finland
Phone: +358 44 927 2377
Email: chromatix99@gmail.com
Peter G. Heist
Redacted
463 11 Liberec 30
Czech Republic
Email: pete@heistp.net