Network working group L. Dunbar
Internet Draft Huawei
Category: Informational
Expires: November 2019
August 30, 2018
Architectural View of E2E Latency and Gaps
draft-dunbar-e2e-latency-arch-view-and-gaps-02.txt
Abstract
Ultra-Low Latency is a highly desired property for many types of
services: for example, 5G MTC (Machine Type Communication) for V2V
requires an E2E latency of less than 2ms, AR/VR requires delay of
less than 5ms, and V2X requires less than 20ms.
This draft examines E2E latency from an architectural
perspective: how different OSI layers contribute to E2E latency,
how different domains (which can be different operators' domains
or administrative domains) contribute to E2E latency, and what
gaps remain in recent technology advancements for reducing
latency.
By studying the factors contributing to E2E latency from various
angles, the draft identifies gaps in recent technology
advancements for E2E services that traverse multiple domains and
involve multiple layers. The discussion may touch upon multiple
IETF areas.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current
Internet-Drafts is at
https://datatracker.ietf.org/drafts/current/.
Dunbar, et al Expires November 2019 [Page 1]
Internet-Draft E2E Over Internet Latency Taxonomy
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as "work
in progress."
This Internet-Draft will expire on February 23, 2019.
Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided
without warranty as described in the Simplified BSD License.
Table of Contents
1. Introduction
2. Terminology
3. AR/VR Use Case
4. Contributing Factors to E2E Latency
5. Application Layer Initiative in reducing E2E latency
5.1. Content Placement mechanisms need visibility to Network
6. Transport Layer Initiatives in reducing Latency and gaps
6.1. TCP Layer Latency Improvement Alone is not enough
6.2. LTE Latency Impact on TCP Performance
6.3. Low Latency via Multipath TCP Extension
7. Network and Link Layer Initiatives in reducing E2E Latency
8. Radio Channel Quality Impact to flows with High QoS
9. E2E Latency Contributed by multiple domains
10. Conclusion
11. Security Considerations
12. IANA Considerations
13. Acknowledgements
14. References
14.1. Normative References
14.2. Informative References
15. Appendix:
15.1. Example: multi-Segments Latency for services via Cellular Access
15.2. Latency contributed by multiple nodes
15.3. Latency through the Data Center that hosts S-GW & P-GW
Authors' Addresses
1. Introduction
Ultra-Low Latency is a highly desired property for many types of
services: for example, 5G MTC (Machine Type Communication) for V2V
requires an E2E latency of less than 2ms, AR/VR requires delay of
less than 5ms, and V2X requires less than 20ms.
This draft examines E2E latency from an architectural
perspective: how different OSI layers contribute to E2E latency,
how different domains (which can be different operators' domains
or administrative domains) contribute to E2E latency, and what
gaps remain in recent technology advancements for reducing
latency.
The primary purpose of studying E2E latency from an architectural
perspective is to help the IETF community identify potential work
areas for reducing the E2E latency of services over the Internet.
In recent years, the Internet industry has been exploring
technologies and innovations at all layers of the OSI stack to
reduce latency. At the upper (application) layer, more content is
distributed to edges closer to end points, and Mobile Edge
Computing (MEC) has made progress. At the transport layer, there
are the QUIC and L4S initiatives. At the network layer, there are
IETF DETNET, IP/MPLS Hardened Pipes (RFC 7625), latency-optimized
router designs, and BBF's Broadband Assured Services (BAS). At the
link layer, there are IEEE 802.1 TSN (Time Sensitive Networking)
and Flex Ethernet (OIF).
By studying the factors contributing to E2E latency from various
angles, the draft identifies gaps in recent technology
advancements for E2E services that traverse multiple domains and
involve multiple layers. The discussion may touch upon multiple
IETF areas.
2. Terminology
DA: Destination Address
DC: Data Center
E2E: End To End
GTP: GPRS Tunneling Protocol (GTP) is a group of IP-based
communications protocols used to carry general packet
radio service (GPRS) within GSM, UMTS and LTE networks.
In 3GPP architectures, GTP can be decomposed into
separate protocols, GTP-C, GTP-U and GTP'. GTP-C is
used for signaling. GTP-U is used for carrying user
data.
LTE: Long Term Evolution
TS: Tenant System
VM: Virtual Machines
VN: Virtual Network
3. AR/VR Use Case
The E2E delay of an AR/VR system is made up of the delays of
multiple subsystems:
- Tracking delay
- Application delay
- Rendering delay
- Display delay
For human beings not to feel dizzy when viewing AR/VR images, the
oculus delay should be less than 19.3ms, which includes display
delay, computing delay, transport delay, and sensing delay. This
means the "Network Delay" budget is at most 5ms.
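The budget reasoning above can be written out as follows. The
symbols are introduced here for illustration only; the figures
taken from the text are the 19.3ms total and the 5ms network
budget.

```latex
d_{\text{sensing}} + d_{\text{computing}} + d_{\text{display}} + d_{\text{network}} < 19.3\,\text{ms}
\;\Longrightarrow\;
d_{\text{network}} < 19.3\,\text{ms} - \bigl(d_{\text{sensing}} + d_{\text{computing}} + d_{\text{display}}\bigr)
```

A 5ms network budget therefore corresponds to the sensing,
computing, and display stages together taking about 14.3ms of the
19.3ms.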
4. Contributing Factors to E2E Latency
Internet data is packaged and transported in small pieces
(packets). The flow of these packets directly affects a user's
Internet experience. When packets arrive in a smooth and timely
manner, the user sees a continuous flow of data; if packets arrive
with large and variable delays between them, the user's
experience is degraded.
Key contributing factors to E2E latency:
- Generation: delay between physical event and availability of
data
- Transmission: signal propagation, initial signal encoding
- Processing: forwarding, encap/decap, NAT, encryption,
authentication, compression, error coding, signal translation
- Multiplexing: delays needed to support sharing, such as shared
channel acquisition, output queuing, and connection establishment
- Grouping: reduces the frequency of control information and
processing, e.g., packetization and message aggregation
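To make the categories concrete, the sketch below treats E2E
latency as the sum of the five categories above and reports which
one dominates. Modelling the total as a plain sum is a simplifying
assumption (in practice some contributions overlap), and the class
name and example values are hypothetical.

```python
from dataclasses import asdict, dataclass


@dataclass
class LatencyBreakdown:
    """Hypothetical per-category delays (microseconds) along one path.

    The fields mirror the five categories listed above.
    """

    generation: float = 0.0
    transmission: float = 0.0
    processing: float = 0.0
    multiplexing: float = 0.0
    grouping: float = 0.0

    def total(self) -> float:
        # Simplifying assumption: categories add up and do not overlap.
        return sum(asdict(self).values())

    def dominant(self) -> str:
        # Category that contributes the largest share of the delay.
        parts = asdict(self)
        return max(parts, key=parts.get)


# Illustrative numbers only: 5 ms of propagation, 12 ms of queuing.
path = LatencyBreakdown(generation=1000, transmission=5000,
                        processing=300, multiplexing=12000, grouping=700)
print(path.total(), path.dominant())
```

In this made-up example, multiplexing (queuing) rather than
propagation is the largest term, which is the kind of insight a
per-category breakdown is meant to expose.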
The 2013 ISOC Workshop [Latency-ISOC] on Internet Latency
concluded that:
o Bandwidth alone is not enough to reduce latency
o Bufferbloat is one of the main causes for high latency in
the Internet.
Figure 1 of the 2013 ISOC workshop report showed the download
timing of an apparently uncluttered example Web page
(ieeexplore.ieee.org), which actually comprised over one hundred
objects, transferred over 23 connections that needed 10 different
DNS look-ups. This further illustrates that reducing E2E latency
requires coordination and interaction across multiple layers.
5. Application Layer Initiative in reducing E2E latency
More and more E2E services over the Internet run between end
users/devices and applications hosted in data centers.
As most content today is distributed, E2E services usually do not
traverse the globe; more often than not, the network segments
that an E2E service traverses run from end users to regional data
centers. Distributing content to the edge has transformed the
pursuit of low latency from fighting against the speed of light
to optimizing communication between end users and their desired
content.
However, without awareness of the latency characteristics of
network segments, content distribution mechanisms and algorithms
might not achieve their intended optimal results.
5.1. Content Placement mechanisms need visibility to Network
To be added.
6. Transport Layer Initiatives in reducing Latency and gaps
IETF QUIC and L4S are among the initiatives for reducing E2E
latency at the transport layer.
IETF QUIC focuses on improvements at the end points. It does not
take into consideration the latency of the network path that the
data packets traverse.
IETF L4S uses AQM in network nodes to deliberately drop packets,
or send an indication to end points, when their queues exceed
certain thresholds. The goal is for the end nodes to reduce their
transmission rate when intermediate nodes' buffers are almost
full. This approach has the following issue:
As the network aggregates many flows from many different end
points, and most flows have variable data rates, an intermediate
network node port's buffer being almost full at one specific time
does not mean that the same amount of traffic will traverse the
same port a few microseconds later. If all end (source) points
reduce their transmission rate upon receiving the AQM indication
(or experiencing packet drops), traffic through the network can be
greatly reduced (i.e., leaving no queue in the buffer). Then all
end points can increase their rates, causing traffic pattern
oscillation and buffer congestion again.
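The oscillation argument can be illustrated with a deliberately
crude toy model. It assumes that every source receives the same
congestion signal at the same instant and reacts identically
(additive increase, multiplicative decrease); it is not a model of
the L4S specifications or of any particular AQM, and all names and
parameter values are hypothetical.

```python
def simulate_synchronized_sources(steps: int = 40, n_sources: int = 10,
                                  capacity: float = 100.0,
                                  threshold: float = 50.0,
                                  initial_rate: float = 5.0,
                                  increase: float = 1.0,
                                  decrease: float = 0.5) -> list[float]:
    """Return the shared queue length after each step (arbitrary units).

    All sources see the same signal in the same step: multiplicative
    decrease when the queue is above the threshold, additive increase
    otherwise.
    """
    rates = [initial_rate] * n_sources
    queue = 0.0
    history = []
    for _ in range(steps):
        # Queue grows by whatever arrives beyond the port capacity.
        queue = max(0.0, queue + sum(rates) - capacity)
        history.append(queue)
        if queue > threshold:
            # Synchronized back-off: every source cuts its rate at once.
            rates = [r * decrease for r in rates]
        else:
            rates = [r + increase for r in rates]
    return history


history = simulate_synchronized_sources()
print([round(q, 1) for q in history[:16]])
```

In this run the queue climbs past the threshold, drains to empty
after the synchronized back-off, and then builds up again, which is
the pattern the paragraph above describes.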
6.1. TCP Layer Latency Improvement Alone is not enough
The following example shows why optimizing the transport layer
alone is not enough. More details can be found at
https://www.w3.org/Protocols/HTTP/Performance/Pipeline.html.
Typical web pages today contain a HyperText Markup Language
(HTML) document and many embedded images. Twenty or more
embedded images are quite common. Each of these images is an
independent object in the Web, retrieved (or validated for
change) separately. The common behavior for a web client,
therefore, is to fetch the base HTML document, and then
immediately fetch the embedded objects, which are typically
located on the same server.
The large number of embedded objects represents a change from
the environment in which the Web transfer protocol, the
Hypertext Transfer Protocol (HTTP), was designed. As a result,
HTTP/1.0 handles multiple requests from the same server
inefficiently, creating a separate TCP connection for each
object.
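A back-of-the-envelope count of round trips shows why connection
handling, not just transport tuning, matters here. The sketch
assumes a strictly sequential client and ignores transmission
time, server processing, DNS, and the parallel connections real
browsers open; the function names are hypothetical.

```python
def http10_rtts(n_embedded: int) -> int:
    """Round trips for HTTP/1.0 with one new TCP connection per object.

    Each object (base HTML plus embedded objects) costs one RTT for the
    TCP handshake and one RTT for the request/response, one after another.
    """
    return 2 * (1 + n_embedded)


def pipelined_rtts(n_embedded: int) -> int:
    """Round trips over one persistent connection with pipelining.

    One RTT for the handshake, one for the base HTML, and one more in
    which all embedded-object requests are pipelined back to back.
    """
    return 2 + (1 if n_embedded > 0 else 0)


for n in (0, 20):
    print(n, http10_rtts(n), pipelined_rtts(n))
```

With the twenty embedded images mentioned above, this model gives
42 round trips for HTTP/1.0 versus 3 with pipelining, so reducing
the number of round trips can matter more than shaving time off
each one.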
6.2. LTE Latency Impact on TCP Performance
HTTP/TCP is the dominant application and transport layer
protocol suite used on the Internet today. According to HTTP
Archive (http://httparchive.org/trends.php), the typical size of
HTTP-based transactions over the Internet is in the range of a
few tens of Kbytes up to 1 Mbyte. In this size range, the TCP
slow start period is a significant part of the total transport
period of the packet stream.
During TCP slow start, TCP exponentially increases its congestion
window, i.e., the number of segments it brings into flight, until
it fully utilizes the throughput that LTE (Radio + EPC) can
offer. The incremental increases are based on TCP ACKs, which are
received after one round-trip delay in the LTE system. Thus,
during TCP slow start, performance is limited by the latency of
the radio network (LTE). Hence, improved latency in LTE can
improve the perceived data rate for TCP-based data transactions,
which in turn reduces the time it takes to complete a data
download or upload.
Although the achievable improvements in radio round-trip time are
rather small (on the order of milliseconds), the total increase
in perceived throughput and the delay savings when downloading an
item below 1MB are significant, because LTE latency improvements
add up over the round trips of TCP slow start [LTE-latency].
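The additive effect can be seen by counting slow-start rounds. The
sketch assumes an initial window of 10 segments (as in RFC 6928)
and a 1460-byte MSS, and ignores the handshake, loss, delayed ACKs,
and any bandwidth cap; the function names and RTT values are
illustrative.

```python
import math


def slow_start_rounds(size_bytes: int, mss: int = 1460,
                      initial_window: int = 10) -> int:
    """Round trips needed to deliver size_bytes while in slow start.

    Assumes the congestion window doubles every round trip, with no loss
    and no receiver-window or bottleneck limit (latency-limited regime).
    """
    segments = math.ceil(size_bytes / mss)
    rounds, window, sent = 0, initial_window, 0
    while sent < segments:
        sent += window   # one window of segments per round trip
        window *= 2      # ACKs returning after one RTT double the window
        rounds += 1
    return rounds


def transfer_time_ms(size_bytes: int, rtt_ms: float) -> float:
    # In the latency-limited regime, completion time is rounds x RTT.
    return slow_start_rounds(size_bytes) * rtt_ms


for rtt in (40.0, 30.0):
    print(rtt, transfer_time_ms(1_000_000, rtt))
```

Under these assumptions a 1MB object needs 7 round trips, so a
10ms cut in RTT saves about 70ms of transfer time rather than 10ms.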
6.3. Low Latency via Multipath TCP Extension
There is some research on how to use Multipath TCP to reduce E2E
latency, such as
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7510787. The
paper proposes an MPTCP extension that sends data redundantly
over multiple paths in the network, which essentially trades
bandwidth for latency. Integration into the MPTCP protocol
provides benefits such as transparent end-to-end connection
establishment, multipath-enabled congestion control, and the
prevention of head-of-line blocking. The paper claims that the
proposed Multipath TCP extension can halve the average round-trip
time and reduce its standard deviation by a factor of 19 for a
real-world mobile scenario in a stressed environment.
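The basic effect that redundant multipath transmission relies on
can be sketched as follows. This is not the extension from the
cited paper; it assumes independent path delays, and the delay
distribution, seed, and names are hypothetical.

```python
import random
import statistics


def redundant_delays(per_path_delays: list[list[float]]) -> list[float]:
    """Effective per-packet delay when every packet is sent on all paths.

    per_path_delays[p][i] is the delay of packet i on path p. The receiver
    keeps whichever copy arrives first, so the effective delay of packet i
    is the minimum across paths.
    """
    return [min(copies) for copies in zip(*per_path_delays)]


# Two hypothetical paths: 10 ms base delay plus a random extra delay (ms).
rng = random.Random(7)
path_a = [10 + rng.expovariate(1 / 20) for _ in range(1000)]
path_b = [10 + rng.expovariate(1 / 20) for _ in range(1000)]
combined = redundant_delays([path_a, path_b])

for name, d in (("path A", path_a), ("redundant", combined)):
    print(name, round(statistics.mean(d), 1), round(statistics.stdev(d), 1))
```

Every packet is transmitted once per path, so two paths double the
bandwidth used; that is the bandwidth-for-latency trade described
above.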
Researchers working on such topics should be invited to the
"Reducing latency over Internet Deep-Dive" workshop or a
cross-area BOF (to be organized by the IAB).
7. Network and Link Layer Initiatives in reducing E2E Latency
Several industry initiatives already exist for improving latency
at the link and network layers:
- Link layer: IEEE 802.1 TSN (Time Sensitive Networking) and
Flex Ethernet (OIF).
- Network layer: IETF DETNET and IP/MPLS Hardened Pipes
(RFC 7625).
Gaps:
IEEE 802.1 TSN (Time Sensitive Networking) requires stringent
time synchronization among all the nodes, which is suitable for
small-scoped networks but not for the Internet, because most
routers/switches in the network do not support synchronous
timing.
IP/MPLS hardened pipes can guarantee no congestion and no
buffering on all nodes along the path, and therefore ensure the
lowest latency along the path. The hardened pipe is ideal for
flows with steady bandwidth requirements.
But for applications that do not have a steady flow rate, the
hardened pipe requires reserving dedicated channels at the peak
rate, which, as with TDM, wastes bandwidth whenever application
traffic falls below the peak rate.
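The cost of peak-rate reservation is easy to quantify. The sketch
below computes the share of a peak-sized pipe that is left idle;
it assumes equally spaced rate samples, and the function name and
traffic values are hypothetical.

```python
def peak_reservation_idle_fraction(rate_samples_mbps: list[float]) -> float:
    """Average fraction of a peak-rate reservation that goes unused.

    Assumes the pipe is sized at the highest observed rate and that the
    samples are equally spaced measurements of the application's rate.
    """
    reserved = max(rate_samples_mbps)
    average = sum(rate_samples_mbps) / len(rate_samples_mbps)
    return 1 - average / reserved


# Hypothetical bursty application: peaks at 400 Mb/s, averages 200 Mb/s.
print(peak_reservation_idle_fraction([100, 400, 250, 50]))
# Hypothetical steady application: no idle share.
print(peak_reservation_idle_fraction([300, 300, 300]))
```

The bursty flow leaves half of its reservation idle on average,
while the steady flow leaves none, matching the distinction drawn
above.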
Traffic Engineering is one of the most commonly used methods to
reduce congestion at the network layer. However, it does not
completely prevent transient congestion. Depending on the tunnel
sizing, momentary traffic bursts may exceed the tunnel size,
causing congestion if there is not adequate headroom on the trunk
carrying the tunnel to absorb the burst. Congestion can also
occur when a link or node outage reroutes the tunnel onto a
secondary path that becomes overloaded.
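Both failure modes reduce to the same check on a trunk. The sketch
below uses hypothetical capacities and rates; real TE sizing works
with measured or predicted traffic rather than single numbers.

```python
def trunk_congested(capacity_gbps: float, tunnel_rates_gbps: list[float]) -> bool:
    """True if the instantaneous tunnel traffic on a trunk exceeds capacity."""
    return sum(tunnel_rates_gbps) > capacity_gbps


# Hypothetical 100G trunk carrying two tunnels sized at 40G (20G headroom).
print(trunk_congested(100, [40, 40]))  # both tunnels at their sized rate
print(trunk_congested(100, [65, 40]))  # one tunnel bursts 25G above its size
# A failure reroutes a 40G tunnel onto a secondary trunk already carrying 70G.
print(trunk_congested(100, [70, 40]))
```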
8. Radio Channel Quality Impact to flows with High QoS
QoS is one of the key methods employed by fixed IP networks to
reduce latency for some flows. However, in a radio network, if a
UE's channel condition is poor, the eNB may schedule more frames
to other UEs whose flows are marked with much lower QoS.
Many studies show how radio quality negatively impacts TCP
performance.
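One way this can happen is a scheduler that weights QoS priority
by channel quality. LTE MAC schedulers are implementation-specific,
so the rule below is a hypothetical illustration, not any vendor's
or 3GPP's algorithm; the class, weights, and rates are invented.

```python
from dataclasses import dataclass


@dataclass
class UeState:
    name: str
    qos_weight: float       # larger means higher QoS priority
    achievable_rate: float  # depends on current radio channel quality


def schedule_next(ues: list[UeState]) -> UeState:
    """Hypothetical channel-aware rule: serve the UE with the largest
    QoS weight x achievable rate in this scheduling interval."""
    return max(ues, key=lambda ue: ue.qos_weight * ue.achievable_rate)


# High-QoS UE on a poor channel vs. low-QoS UE on a good channel.
ues = [UeState("high-qos", qos_weight=4.0, achievable_rate=1.0),
       UeState("best-effort", qos_weight=1.0, achievable_rate=10.0)]
print(schedule_next(ues).name)
```

Under this rule the best-effort UE wins the interval despite its
lower QoS marking, because its channel supports ten times the rate.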
It would benefit the whole industry to hold a workshop that
brings together people and SDOs working on different layers of
Internet services to showcase their work and their pain points.
The IESG can make much more informed decisions on creating useful
initiatives when the community is aware of other work and
obstacles.
9. E2E Latency Contributed by multiple domains
All of the latency improvement initiatives at the link and network
layers have been within a single domain: IEEE 802.1 TSN (Time
Sensitive Networking) and Flex Ethernet (OIF) at the link layer,
and IETF DETNET and IP/MPLS Hardened Pipes (RFC 7625) at the
network layer.
But E2E services usually traverse more than one domain, which can
be administrative domains or multiple operators' networks.
Yet today, there is no interface between domains to:
- Inquire about the latency characteristics or capabilities of
another domain
- Negotiate or reserve latency capabilities from another domain
- Characterize latency using a standardized method
The IETF/IAB is an ideal organization to tackle these issues
because the IETF has the relevant expertise.
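No such interface exists today, which is the gap this section
identifies. Purely as an illustration of what inquiring about and
characterizing latency across domains could enable, the sketch
below assumes each domain can report a typical and a worst-case
transit latency; the record, field names, and values are
hypothetical and not part of any standard. The 5ms budget in the
example is the AR/VR network budget from Section 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyOffer:
    """Hypothetical answer a domain might return to a latency inquiry."""
    domain: str
    typical_ms: float
    worst_case_ms: float


def e2e_worst_case_ms(offers: list[LatencyOffer]) -> float:
    # Worst-case bounds of domains traversed in sequence add up.
    return sum(o.worst_case_ms for o in offers)


def fits_budget(offers: list[LatencyOffer], budget_ms: float) -> bool:
    return e2e_worst_case_ms(offers) <= budget_ms


path = [LatencyOffer("access-operator", 2.0, 4.0),
        LatencyOffer("transit-operator", 1.0, 3.0),
        LatencyOffer("dc-operator", 0.5, 1.0)]
print(e2e_worst_case_ms(path), fits_budget(path, 5.0))
```

Without a common way to express such figures, a service provider
cannot perform even this simple composition across operators.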
10. Conclusion
As end-to-end services traverse multiple types of network
segments and domains, and involve multiple layers, it is important
that technological improvements in each layer be based on more
informed decisions. This requires:
- Cross-domain coordination
- Cross-layer coordination
11. Security Considerations
As the trend is toward more encryption, it is becoming more
difficult for various network segments to detect application
sessions. Therefore, it is more important to create ways for
better coordination among different layers, for improved latency,
troubleshooting, restoration, etc.
12. IANA Considerations
This section gives IANA allocation and registry considerations.
13. Acknowledgements
Special thanks to Jari Arkko for encouraging the writing of this
draft. Many thanks also to Andy Malis, Jim Guichard, Spencer
Dawkins, and Donald Eastlake for their suggestions and comments on
this draft.
14. References
14.1. Normative References
14.2. Informative References
[LTE-latency] https://www.ericsson.com/research-blog/lte/lte-latency-improvement-gains/
[Latency-ISOC] Report of the 2013 ISOC-organized Latency over
Internet workshop.
15. Appendix:
15.1. Example: multi-Segments Latency for services via Cellular
Access
In a cellular network, there is User Plane latency and Control
Plane latency. The control plane deals with signaling and control
functions, while the user plane deals with actual user data
transmission.
User Plane latency can be measured as the time it takes for a
small IP packet to travel from the terminal through the network
to an Internet server and back. Control Plane latency is measured
as the time required for the UE (User Equipment) to transition
from idle state to active state.
User Plane latency is relevant to the performance of many
applications. This document mainly focuses on User Plane latency.
The following diagram depicts a logical path from an end user
(smart phone) application to the application controller hosted in
a data center via a 4G mobile network, which utilizes the Evolved
Packet Core (EPC).
+------+             +---------+
|DC    |             |   EPC   |             +----+
|Apps  |<----------->|P-GW/S-GW|<----------->| eNB|<-----> UE
|      |             +---------+   Mobile    +----+  Radio
+------+  Internet                Backhaul           Access
The Mobility Management Entity (MME) is responsible for
authentication of the mobile device. The MME retains location
information for each user and selects the Serving Gateway (S-GW)
for a UE at initial attach and at the time of intra-LTE handover
involving Core Network (CN) node relocation.
The Serving Gateway (S-GW) resides in the user plane, where it
forwards and routes packets to and from the eNodeB (eNB) and the
Packet Data Network Gateway (P-GW). The S-GW also serves as the
local mobility anchor for inter-eNodeB handover and mobility
between 3GPP networks.
P-GW (Packet Data Network Gateway) provides connectivity from the
UE to external packet data networks by being the point of exit
and entry of traffic for the UE. A UE may have simultaneous
connectivity with more than one P-GW for accessing multiple
Packet Data Networks. The P-GW performs policy enforcement,
packet filtering for each user, charging support, lawful
interception and packet screening. Another key role of the P-GW
is to act as the anchor for mobility between 3GPP and non-3GPP
technologies such as WiMAX and 3GPP2 (CDMA 1X and EvDO).
Very often, the P-GW and S-GW are co-located. Data traffic between
the eNB and S-GW is encapsulated in GTP-U.
The figure above shows that end-to-end services from/to the UE
traverse the following network segments:
- Radio Access Network (RAN)
- Mobile Backhaul network that connects eNBs to the S-GW
- Network within the DC that hosts the S-GW & P-GW
- Packet Data Network, which can be a dedicated VPN, the Internet,
or another data network
- Network within the DC that hosts the App
The RAN (Radio Access Network) lies between the UE (e.g., a smart
phone) and the eNB. The 3GPP TSG RAN group works on improving the
performance (including latency) of the Radio Access Network. Many
factors impact latency through the RAN.
The Mobile Backhaul Network connects eNBs to the S-GW/P-GW, with
data traffic encapsulated in the GTP protocol. One eNB can handle
hundreds of UEs, while one S-GW/P-GW can handle millions.
Therefore, the mobile backhaul network connects tens of thousands
of eNBs to an S-GW/P-GW, and the number of network nodes in the
mobile backhaul network can be very large. Consequently, any new
protocol improvement that reduces latency in this segment can play
a big part in reducing the overall latency of end-to-end services.
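The scaling argument above is an order-of-magnitude ratio. Taking
"hundreds" as about 10^2 and "millions" as 10^6 to 10^7, with
symbols introduced here only for illustration:

```latex
N_{\text{eNB per gateway}} \approx
\frac{N_{\text{UE per S-GW/P-GW}}}{N_{\text{UE per eNB}}}
\approx \frac{10^{6}\text{ to }10^{7}}{10^{2}}
= 10^{4}\text{ to }10^{5}
```

That is, roughly ten thousand to a hundred thousand eNBs behind a
single gateway.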
15.2. Latency contributed by multiple nodes
Because the transmission delay on physical links is fixed, the
variation in delay for data packets through the network is caused
by the network nodes along the path. When there is no congestion,
the latency across most routers and switches is very small, on the
order of ~20us (worst case ~40us). When congestion occurs within a
node, i.e., when buffers/queues are used to avoid dropping
packets, latency across a node can be on the order of
milliseconds. Recent improvements in router architecture have
greatly improved latency through a node. However, there are no
standard methods for routers to characterize and expose the
various latency characteristics of a network node.
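The gap between the uncongested and congested cases follows from
queueing delay = queued bits / line rate. The sketch below uses
the ~20us per-node forwarding figure from the text; the 10 Gb/s
line rate, the 1 MB queue, and the function names are
hypothetical.

```python
def queueing_delay_us(queued_bytes: int, link_rate_bps: float) -> float:
    """Time (us) a packet waits behind queued_bytes on an output port."""
    return queued_bytes * 8 * 1e6 / link_rate_bps


def path_latency_us(hops: list[tuple[float, int, float]]) -> float:
    """Sum of per-hop latency: fixed forwarding delay plus queueing delay.

    Each hop is (forwarding_us, queued_bytes, link_rate_bps). Link
    propagation is left out because the text treats it as fixed.
    """
    return sum(fwd + queueing_delay_us(q, rate) for fwd, q, rate in hops)


# Hypothetical 3-hop path at 10 Gb/s: two idle nodes, one with 1 MB queued.
hops = [(20.0, 0, 10e9), (20.0, 1_000_000, 10e9), (20.0, 0, 10e9)]
print(path_latency_us(hops))
```

A single node holding 1 MB at 10 Gb/s adds 800us, forty times the
uncongested forwarding delay, which is why queue state rather than
forwarding speed dominates delay variation.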
Data packets also traverse network functions, such as FW, DPI,
and IPS, whose latency varies depending on the depth of the
processing and the equipment performance.
15.3. Latency through the Data Center that hosts S-GW & P-GW
The S-GW and P-GW are hosted in data centers. There are typically
2~3 tiers of switches connecting the servers that host the S-GW &
P-GW to the external network, as depicted below:
                    +---------+
                    | Gateway |
                    +---------+
                   /           \
          +-------+            +------+
         +/------+|           +/-----+|
         | Aggr11|+  - - - -  |AggrN1|+
         +---+---+/           +---+--+/
            / \                  / \
           /   \                /   \
          /     \              /     \
      +---+     +---+      +---+     +---+
      |T11| ... |T1x|      |T21| ... |T2y|
      +---+     +---+      +---+     +---+
        |         |          |         |
      +-|-+     +-|-+      +-|-+     +-|-+   Servers
      |   | ... |SGW|      | S | ... | S |<-
      +---+     +---+      +---+     +---+
      |   | ... |PGW|      | S | ... | S |
      +---+     +---+      +---+     +---+
      |   | ... | S |      | S | ... | S |
      +---+     +---+      +---+     +---+
As distances within a data center are small, the transmission
delay within the data center is negligible. The majority of
latency within the data center is caused by switching within the
gateway routers, by traffic traversing middleboxes such as FW,
DPI, IPS, and value-added services, and by the top-of-rack and
aggregation switches.
If the S-GW and P-GW are hosted in a large data center, there
could also be latency contributed by encapsulation/decapsulation,
such as that specified by the NVO3 work.
Authors' Addresses
Linda Dunbar
Huawei Technologies
5430 Legacy Drive, Suite #175
Plano, TX 75024, USA
Phone: (469) 277 5840
Email: linda.dunbar@huawei.com