INTERNET-DRAFT                          M. Ghobadi
Intended Status: Standards Track        Microsoft Research
Expires: April 30, 2015                 H. Song
                                        R. Huang
                                        Huawei
                                        Y. Ganjali
                                        University of Toronto
                                        October 27, 2014
TCP Parameter Dynamic Control
draft-song-dclc-tcpdc-04
Abstract
Congestion control has been extensively studied for many years.
Today, the Transmission Control Protocol (TCP) is used in a wide
range of networks (LAN, WAN, data center, campus network, enterprise
network, etc.) as the de facto congestion control mechanism. Despite
its common usage, TCP operates in these networks with little
knowledge of the underlying network or traffic characteristics. As a
result, it is bound to continuously increase or decrease its
congestion window size in order to handle changes in the network or
traffic conditions. Thus, TCP frequently overshoots or undershoots
the ideal rate, making it a "Jack of all trades, master of none"
congestion control protocol. In light of the emerging popularity of
centrally controlled networks such as Software-Defined Networks
(SDNs), we propose a framework that takes advantage of the
information available at the central controller to improve TCP.
Specifically, in this document, we propose OpenTCP as a dynamic and
programmable TCP adaptation framework for centrally controlled
networks. OpenTCP gathers global information about the status of the
network and traffic conditions through the centralized controller,
and uses this information to adapt TCP. OpenTCP periodically sends
updates to end-hosts which, in turn, update their behaviour using a
simple kernel module.
This document describes a framework and message flows for centralized
congestion control parameter adaptation based on congestion control
policies and network status measurements, so that each end host in a
network can make better use of the available network resources. In
the rest of this document we use TCP as the standard congestion
control mechanism, but the same idea can be applied to other
congestion control protocols as well. A TCP Optimization Element and
a TCP Optimization Agent are introduced. Two message patterns are
defined: request/response and subscription/notification. This
mechanism can be used in network
Song&Huang Expires April 30, 2015 [Page 1]
INTERNET DRAFT TCP Parameter Dynamic Control October 27, 2014
service providers' networks, as well as in data center networks.
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
Copyright and License Notice
Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1 Introduction
2 Conventions Used in This Document
3 TCP Parameter Control Architecture
3.1 Guidance Level
3.2 Subscription Mode
3.3 Request/Response Mode
4 Messages
4.1 Explicit RR
4.1.1 TcpParReq
4.1.2 TcpParRes
4.2 Subscription/Notification
4.2.1 TcpParSub
4.2.2 Notification
4.3 Error Message
5 Security Considerations
6 Fairness and Stability
Acknowledgements
7 IANA Considerations
8 References
8.1 Normative References
Authors' Addresses
1 Introduction
(Plot omitted: link utilization (%), from 0 to 100, versus time
across Day 1 and Day 2.)
Figure 1 Link Utilization Rate during a Day
The Transmission Control Protocol (TCP) is used in a wide range of
networks as the congestion control mechanism. Measurements reveal
that 99.91% of traffic in Microsoft data centers is TCP, 10% of the
aggregate North America Internet traffic is YouTube over TCP, and
measurements from 10 major data centers including university,
enterprise, and cloud data centers show TCP as the dominant
congestion control protocol. TCP is a mature protocol and has been
extensively studied over a number of years. Hence, network operators
trust TCP as their congestion control mechanism to maximize the
bandwidth utilization of their network while keeping the network
stable.
Despite, and because of, its common usage, TCP operates in these
networks with little knowledge of the underlying network or traffic
characteristics. However, limiting TCP to a specific network and
taking advantage of the local characteristics of that network can
lead to major performance gains. For instance, DCTCP outperforms TCP
in data center networks, even though the results might not be
applicable in the Internet. With this mindset, one can adjust TCP
(the protocol itself and its parameters) to gain better performance
in specific networks (e.g. data centers). Moreover, even focusing on
a particular network, the effect of dynamic congestion control
adaptation to traffic patterns is not well understood in today's
networks. Such adaptation can potentially lead to major improvements,
as it provides another dimension that today's TCP does not explore.
Figure 1 depicts the aggregate link utilization of a core link in a
backbone service provider network in North America [Hotnets]. We can
see that the link utilization is low for a significant period (below
50% for 6-8 hours). A similar pattern is seen on all the links in
this network. In fact,
the presented link has the highest utilization and is considered to
be the bottleneck in this network. If the network operator aims at
minimizing flow completion times in this network, it makes sense to
increase TCP's initial congestion window size (init_cwnd) when the
network is not highly utilized (we focus on internal traffic in this
example). Ideally, the exact value of init_cwnd should be a function
of the network-wide state (here, the number of flow initiations in
the system) and how aggressively the operator wants the system to
behave (congestion control policy). The operator can define a policy
like the following: if link utilization is below 50%, init_cwnd
should be increased to 20 segments instead of the default value of
four segments. In other words, given the appropriate mechanisms the
operator could choose the right value for the initial congestion
window.
The forwarding capacity of networks has been growing rapidly. When
TCP was designed, routers and switches had low capacity and networks
were easily congested, so TCP was designed with a very small initial
congestion window. However, a small initial congestion window means
more round trips during slow start, so Google proposed increasing
init_cwnd, a change that was adopted in Linux 3.0. For example, for
1095 < MSS < 2190 bytes, RFC 3390 limits the initial window to 4380
bytes, i.e., an init_cwnd of 3 segments for the common MSS of 1460
bytes, whereas Linux 3.0 raised init_cwnd to 10 segments. However,
this is still a fixed number that does not take network variations
into account. In some areas of the world, network conditions are much
better than in others, and init_cwnd could be even larger to provide
better performance for applications inside such an area (when both
sender and receiver are inside that area).
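The arithmetic behind these numbers can be sketched as follows. The sketch assumes the RFC 3390 upper bound on the initial window, min(4*MSS, max(2*MSS, 4380 bytes)), rounded down to whole segments. It also assumes an idealized slow start in which cwnd doubles every round trip, with no loss, no delayed ACKs, and no receive-window limit. The function names are illustrative only.

```python
def rfc3390_initial_window_bytes(mss: int) -> int:
    """Upper bound on the initial window in bytes, per RFC 3390."""
    return min(4 * mss, max(2 * mss, 4380))


def initial_window_segments(mss: int) -> int:
    """Whole segments that fit in the RFC 3390 initial window."""
    return rfc3390_initial_window_bytes(mss) // mss


def slow_start_rounds(flow_segments: int, init_cwnd: int) -> int:
    """Round trips needed to send a flow in idealized slow start.

    Assumes cwnd doubles every RTT, with no losses, no delayed ACKs,
    and no receive-window or ssthresh limit.
    """
    sent, cwnd, rounds = 0, init_cwnd, 0
    while sent < flow_segments:
        sent += cwnd
        cwnd *= 2
        rounds += 1
    return rounds


if __name__ == "__main__":
    mss = 1460  # typical Ethernet MSS, inside 1095 < MSS < 2190
    print(initial_window_segments(mss))
    print(slow_start_rounds(30, 3), slow_start_rounds(30, 10))
```

In this idealized model, a 30-segment flow needs 4 round trips with an initial window of 3 segments and only 2 with an initial window of 10. That reduction is the effect a larger init_cwnd targets.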
Currently, network operators use various ad-hoc solutions as
temporary adjustments of TCP to fit their network and traffic. These
manual tweaks open the way for misconfiguration, make debugging and
troubleshooting difficult, and can result in substantial operational
overhead. Moreover, any change to the underlying assumptions about the
network or traffic requires rethinking the impact of various
parameters. It can also lead to ongoing efforts to manually adjust
TCP, because any proposed change must work under all conditions. A
system that measures the state and dynamics of the network and adapts
TCP's behaviour accordingly can address these problems.
This document addresses the need for a systematic way of adapting TCP
to network and traffic conditions. We propose OpenTCP as a framework
for dynamic adaptation of TCP based on network and traffic conditions
in centrally controlled networks. Figure 2 provides a schematic view
of how OpenTCP works. OpenTCP collects data on the underlying network
state (e.g. topology and routing information) as well as statistics
about network traffic (e.g. link utilization and traffic matrix).
Then, using this aggregated information and based on congestion
control policies defined by the network operator, OpenTCP determines
a specific set of adaptations for TCP.
At a high level, congestion control policies define which statistics
need to be collected, which high level performance metrics the
operator would like to optimize (e.g. minimize drops, maximize
utilization, or minimize flow completion times), and what the
constraints of the system are. OpenTCP periodically sends Congestion
Update Epistles or CUEs to the end-hosts which, in turn, update their
behaviour using a simple kernel module that can adapt TCP.
Consider the following simple example. Imagine a network where all
links have very low utilization (say below 50%) at all times. If the
network operator aims at minimizing flow completion times in this
network, it makes sense to increase the TCP initial congestion window
size, as suggested by Dukkipati et al. The exact value of the initial
congestion window will be a function of the number of flow
initiations in the system (network state), and how aggressively the
operator wants the system to behave (congestion control policy). For
a network where dropping a few packets is not a major problem, the
operator can define a policy like the following: if all link
utilizations are below 50%, the initial congestion window size can be
increased to 20 segments instead of the default value of four. If the
operator is more conservative, the window size can be set to a
smaller value (e.g. 5 segments), improving flow completion times with
smaller risk of causing packet drops. The operator can even leave it
to OpenTCP to dynamically choose the right value for the initial
congestion window size.
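A policy of this kind could be expressed as a small function. The sketch below is hypothetical: the names InitCwndPolicy and choose_init_cwnd are illustrative, and utilizations are given as fractions in [0, 1]. It uses the example values from the text: 20 segments for the aggressive policy, 5 for the conservative policy, and a default of four segments.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class InitCwndPolicy:
    """Hypothetical operator policy for the initial congestion window."""
    utilization_threshold: float  # e.g. 0.5 for "below 50%"
    boosted_init_cwnd: int        # segments to use when links are idle
    default_init_cwnd: int = 4    # default value cited in the text


AGGRESSIVE = InitCwndPolicy(utilization_threshold=0.5, boosted_init_cwnd=20)
CONSERVATIVE = InitCwndPolicy(utilization_threshold=0.5, boosted_init_cwnd=5)


def choose_init_cwnd(policy: InitCwndPolicy,
                     link_utilizations: Iterable[float]) -> int:
    """Return init_cwnd (segments) given per-link utilizations in [0, 1].

    The boosted value applies only if *all* links are below the
    threshold; with no measurements, the default is kept.
    """
    utils = list(link_utilizations)
    if utils and all(u < policy.utilization_threshold for u in utils):
        return policy.boosted_init_cwnd
    return policy.default_init_cwnd
```

Falling back to the default when no measurements are available is a design choice in this sketch. It keeps the system conservative when the controller lacks data.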
TCP timeout behavior can also be adapted to the network status. If a
timeout occurs while the utilization of the relevant network links is
below 50%, the cwnd can be kept unchanged instead of being reduced
sharply. This applies provided that the cwnd does not exceed the peak
buffer size and does not overflow the receiver's receive window, and
that the sending rate does not exceed the subscription rate (the
upload rate of the sender and the download rate of the receiver).
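The timeout rule above can be sketched as a decision function. This is an illustrative model only, not kernel code: windows are in bytes, rates in bits per second, and all names are hypothetical. When the rule does not apply, the sketch falls back to the standard response to a retransmission timeout in RFC 5681, which resets cwnd to one full-sized segment.

```python
from dataclasses import dataclass


@dataclass
class PathState:
    """Hypothetical snapshot of what the TOA knows when an RTO fires."""
    link_utilization: float       # fraction in [0, 1] on the path
    cwnd_bytes: int
    mss_bytes: int
    peak_buffer_bytes: int        # peak buffer size along the path
    receiver_window_bytes: int    # receiver's advertised window
    sending_rate_bps: float
    sender_uplink_bps: float      # sender's subscription (upload) rate
    receiver_downlink_bps: float  # receiver's subscription (download) rate


def cwnd_after_timeout(s: PathState,
                       utilization_threshold: float = 0.5) -> int:
    """Return the congestion window (bytes) to use after an RTO."""
    keep_window = (
        s.link_utilization < utilization_threshold
        and s.cwnd_bytes <= s.peak_buffer_bytes
        and s.cwnd_bytes <= s.receiver_window_bytes
        and s.sending_rate_bps <= min(s.sender_uplink_bps,
                                      s.receiver_downlink_bps)
    )
    if keep_window:
        return s.cwnd_bytes  # lightly loaded: keep the window unchanged
    return s.mss_bytes       # standard RTO response: one full-sized segment
```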
2 Conventions Used in This Document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [KEYWORDS].
This document also uses the following conventions.
TOE: TCP Optimization Element, which obtains network statistics from
network measurement entities, such as an OAM server, an NMS, or an
LMAP server, and provides the TCP optimization service to the TCP
Optimization Agent (TOA).
TOA: TCP Optimization Agent, which is deployed in the end host and
adjusts the TCP stack behavior according to the guidance from the
TOE. Note that one TOA can serve multiple applications.
3 TCP Parameter Control Architecture
                                  --------------
                  +-------+      /              \
                  |       |     |                |
                  |  TOE  +-----|    Internet    |
                  |       |      \              /
                  +---+---+       --------------
                  /   |   \
              /       |       \
          /           |           \
   +----------+  +----------+  +----------+
   |   +---+  |  |   +---+  |  |   +---+  |
   |   |TOA|  |  |   |TOA|  |  |   |TOA|  |
   |   +---+  |  |   +---+  |  |   +---+  |
   | End host |  | End host |  | End host |
   +----------+  +----------+  +----------+
Figure 2 OpenTCP Architecture
It is assumed that an existing method is available for the TOE to
obtain the routing information and the status of each link in a
network, for example, from a PCE server. The TOE then knows the
possible path for each communication, as well as the link utilization
rate, loss ratio, and other statistics of the links and the network.
The TOE considers the network utilization rate at different times of
the day and sets the TCP optimization parameters accordingly. For
example, from midnight to early morning, when network utilization is
very low, end hosts can use a larger init_cwnd size. The window size
can also be reduced much more slowly on a timeout or on receiving
duplicate ACKs.
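One hypothetical way for the TOE to turn a daily utilization profile into parameters is a per-hour lookup table. The sketch below assumes the TOE has measured an hourly average utilization profile, with values as fractions in [0, 1]. The 50% threshold and the init_cwnd values 4 and 20 are the example values used in Section 1. The "gentle_backoff" key and the function name are made up for illustration.

```python
from typing import Dict, List


def build_hourly_guidance(hourly_utilization: List[float],
                          threshold: float = 0.5,
                          low_load_init_cwnd: int = 20,
                          default_init_cwnd: int = 4) -> Dict[int, dict]:
    """Map each hour of the day (0-23) to hypothetical TCP guidance."""
    if len(hourly_utilization) != 24:
        raise ValueError("expected one utilization value per hour")
    guidance = {}
    for hour, util in enumerate(hourly_utilization):
        low_load = util < threshold
        guidance[hour] = {
            "initcwnd": low_load_init_cwnd if low_load else default_init_cwnd,
            # Hypothetical flag: back off more slowly on timeout or
            # duplicate ACKs when the network is lightly loaded.
            "gentle_backoff": low_load,
        }
    return guidance
```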
3.1 Guidance Level
The TOE can provide different types of guidance, corresponding to
different network levels.
The most common type is a set of TCP optimization parameters for the
whole administrative network domain. When the source and destination
end hosts are both inside the same administrative network domain, it
is suggested that they use the parameters provided by the TOE to
optimize TCP transport. The domain can be an intra-DC network, a LAN,
or an NSP network.
Another type is a set of TCP optimization parameters for a particular
link. For example, the TOE provides optimization parameters to end
hosts in two data centers that share a dedicated inter-DC link. When
the link is congested, the TOE suggests that the end hosts use a
smaller init_cwnd size and reduce the congestion window sharply on a
timeout or on duplicate ACKs. This type of service is only available
when the source and destination end hosts are located at the two ends
of a particular link.
When either of the communication endpoints is outside the
administrative boundaries, the recommended TCP optimization
parameters MUST NOT be used.
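A TOA could enforce the rule that guidance MUST NOT be used outside the administrative boundary with a simple prefix check. The sketch below uses Python's standard ipaddress module. The list of domain prefixes is a hypothetical configuration item, and the documentation prefixes from RFC 5737 and RFC 3849 stand in for real ones.

```python
import ipaddress
from typing import Iterable


def guidance_applicable(source: str, destination: str,
                        domain_prefixes: Iterable[str]) -> bool:
    """True only if both endpoints fall inside the administrative domain."""
    networks = [ipaddress.ip_network(p) for p in domain_prefixes]

    def inside(addr: str) -> bool:
        ip = ipaddress.ip_address(addr)
        # Networks of a different IP version never match.
        return any(ip.version == net.version and ip in net
                   for net in networks)

    return inside(source) and inside(destination)


# Hypothetical administrative domain (documentation prefixes).
DOMAIN = ["192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32"]
```

For example, traffic from 192.0.2.10 to 203.0.113.5 leaves this hypothetical domain, so the TOA would keep its default TCP parameters for that connection.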
3.2 Subscription Mode
A TOA can use subscription mode to obtain updated TCP optimization
parameters from the TOE. This is very useful for long-lived traffic,
as well as for end hosts that frequently open TCP connections. The
guidance level can be either the network level or the link level.
3.3 Request/Response Mode
A TOA can also use request/response mode to communicate with the TOE.
In each TCP optimization request, the TOA lists the IP addresses of
the two communicating end hosts and indicates the level of guidance.
The TOE then responds with the currently recommended parameters for
TCP transport.
4 Messages
A TOA uses HTTP, with an HTTP POST entity body containing JSON
objects, to request TCP parameter guidance from a TOE server.
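A TOA-side request could be assembled with Python's standard urllib.request module, as sketched below. The TOE URL is a placeholder, since this document does not define one. The JSON member names follow the TcpParReq object in Section 4.1.1, and the media types are those given in Sections 4.1.1 and 4.1.2. The sketch only builds the request; it does not send it.

```python
import json
import urllib.request

# Hypothetical TOE endpoint; this document does not define a URI.
TOE_URL = "http://toe.example.net/opentcp/rr"


def build_tcp_par_request(source: str,
                          destination: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying a TcpParReq object."""
    body = json.dumps({"source": source, "destination": destination})
    return urllib.request.Request(
        TOE_URL,
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/opentcp-rr+json",
            "Accept": "application/opentcp-rrparameters+json",
        },
        method="POST",
    )
```

A TOA would then call urllib.request.urlopen() on the returned object to send it. It would parse the TcpParRes body from the response.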
4.1 Explicit RR
Explicit request/response mode is mainly used to obtain TCP parameter
guidance for the path between two endpoints. If the path between the
two endpoints is a dedicated link, it is easier to give guidance that
takes into account the properties of both endpoints and the link
utilization status. When the path between the two endpoints is within
the administrative domain of the TOE but subject to change (for
example, because routers may reroute the traffic), the TOE should
give conservative guidance parameters.
4.1.1 TcpParReq
object {
TypedEndpointAddr: source;
TypedEndpointAddr: destination;
}TcpParReq;
Typed Endpoint Address: Typed Endpoint Addresses are encoded as
strings of the format 'AddressType:EndpointAddr', with the ':'
character as a separator. The type 'TypedEndpointAddr' is used to
indicate a string of this format. This document defines two values
for AddressType: 'ipv4' to refer to IPv4 addresses, and 'ipv6' to
refer to IPv6 addresses. The EndpointAddr component of
TypedEndpointAddr is also encoded as a string. The exact characters
and format depend on AddressType. This document defines EndpointAddr
when AddressType is 'ipv4' or 'ipv6'. IPv4 Endpoint Addresses are
encoded as specified by the 'IPv4address' rule in Section 3.2.2 of
[RFC3986]. IPv6 Endpoint Addresses are encoded as specified in
Section 4 of [RFC5952].
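A receiver of TcpParReq could parse a TypedEndpointAddr as sketched below. Only the first ':' can act as the separator, because IPv6 addresses themselves contain ':'. The sketch compares the input against the canonical text form produced by Python's ipaddress module. That form is assumed to match [RFC5952] for ordinary unicast IPv6 addresses and the [RFC3986] IPv4address rule for IPv4, but it may differ for special forms such as IPv4-mapped IPv6 addresses. The function name is illustrative.

```python
import ipaddress
from typing import Tuple, Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def parse_typed_endpoint_addr(value: str) -> Tuple[str, IPAddress]:
    """Parse 'AddressType:EndpointAddr' into (AddressType, address).

    Splits on the first ':' only, since IPv6 addresses contain ':'.
    Raises ValueError for unknown types or non-canonical text.
    """
    addr_type, sep, endpoint = value.partition(":")
    if not sep:
        raise ValueError("missing ':' separator")
    if addr_type == "ipv4":
        addr: IPAddress = ipaddress.IPv4Address(endpoint)
    elif addr_type == "ipv6":
        addr = ipaddress.IPv6Address(endpoint)
    else:
        raise ValueError(f"unsupported AddressType: {addr_type!r}")
    # Reject text that is valid but not in canonical form, e.g.
    # upper-case hex digits or an uncompressed run of zero fields.
    if str(addr) != endpoint:
        raise ValueError(f"non-canonical address text: {endpoint!r}")
    return addr_type, addr
```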
Upon receiving this request, the TOE should look up the subscription
rates, i.e., the uplink rate quota of the source and the downlink rate
quota of the destination, examine the current link utilization rate,
and then provide appropriate TCP parameter guidance.
The media type for explicit request is "application/opentcp-rr+json".
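The TOE-side processing described above might look like the following sketch. All of its logic is an assumption made for illustration, because this document does not specify how quotas and utilization map to parameters. The 50% threshold and the init_cwnd values 20 and 4 reuse the examples from Section 1. The initial window is then capped at the bandwidth-delay product of the tighter subscription rate, using a caller-supplied RTT. The output follows the TcpParRes structure in Section 4.1.2.

```python
import json


def handle_tcp_par_req(src_uplink_bps: float, dst_downlink_bps: float,
                       path_utilization: float, rtt_s: float,
                       mss_bytes: int = 1460) -> str:
    """Return a TcpParRes JSON body (hypothetical guidance logic).

    path_utilization is a fraction in [0, 1]; rtt_s is an assumed
    round-trip time for the path, in seconds.
    """
    init_cwnd = 20 if path_utilization < 0.5 else 4
    # Cap the initial window so that one window per RTT does not
    # exceed the tighter of the two subscription rate quotas.
    bottleneck_bps = min(src_uplink_bps, dst_downlink_bps)
    bdp_segments = int(bottleneck_bps * rtt_s / 8 // mss_bytes)
    init_cwnd = max(1, min(init_cwnd, bdp_segments))
    return json.dumps({"parameters": [{"initcwnd": init_cwnd}]})
```

With a 10 Mbit/s downlink quota and a 10 ms RTT, the bandwidth-delay product is 12500 bytes, or 8 full 1460-byte segments. In that case the sketch returns 8 rather than 20, even on an idle path.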
4.1.2 TcpParRes
object {
TcpPar: parameters<0...*>;
}TcpParRes;
object {
ParType -> ParValue;
}TcpPar;
ParType: A JSONString that defines the TCP parameter type. This
document defines "initcwnd", "threshold", "timeOut", and
"repeatedtimeouts" (this list is open for discussion).
ParValue: A JSONValue that defines the value for the corresponding
parameter type.
The media type for the explicit response is
"application/opentcp-rrparameters+json".
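On the TOA side, a TcpParRes body could be flattened into a simple parameter map, as sketched below. The function name is hypothetical. The sketch skips unknown ParTypes, and if a ParType appears twice, the later value wins. Both choices are assumptions, since this document does not yet say how such cases are handled.

```python
import json
from typing import Any, Dict

# ParType values defined by this document (still open for discussion).
KNOWN_PAR_TYPES = {"initcwnd", "threshold", "timeOut", "repeatedtimeouts"}


def parse_tcp_par_res(body: str) -> Dict[str, Any]:
    """Flatten a TcpParRes body into a {ParType: ParValue} dict."""
    doc = json.loads(body)
    merged: Dict[str, Any] = {}
    for tcp_par in doc.get("parameters", []):
        for par_type, par_value in tcp_par.items():
            if par_type in KNOWN_PAR_TYPES:
                merged[par_type] = par_value  # later entries override
    return merged
```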
4.2 Subscription/Notification
This method is mainly used to obtain TCP parameter guidance for the
administrative domain, but can also be used for long-lived traffic
flows. The notification indicates the conditions under which each set
of TCP parameters applies, and therefore when the TCP parameters
should be changed.
4.2.1 TcpParSub
object {
JSONString: subscription_id;
JSONValue: request_type;
[TypedEndpointAddr: source;]
[TypedEndpointAddr: destination;]
GuidanceLevel: level;
}TcpParSub;
subscription_id: a JSONString generated by the TOE to uniquely
identify a subscription. When a TOA sends a particular subscription to
the TOE for the first time, the subscription_id must be "null". Once
the TOA has obtained the subscription_id from the TOE, it has to
include that id in every subsequent subscription message for the same
link or network guidance information.
request_type: this document defines the value "0" for unsubscribing,
and "1" for both the initial subscription and subsequent polls that
check for updates.
TypedEndpointAddr: as defined in Section 4.1.1.
GuidanceLevel: A JSONString that defines the level of guidance. This
document defines the values "link" and "AS".
The destination address is optional. When the source end host
subscribes to TCP parameter guidance for the administrative domain, it
does not need to provide the destination address. However, when the
end host subscribes to guidance for a link, it has to provide the
destination address.
The media type for subscription is "application/opentcp-sub+json".
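A TOA could build TcpParSub bodies as sketched below, enforcing the rules stated in this section. The following encodings are assumptions, because this document leaves them open: subscription_id is sent as the string "null" on first use, request_type is sent as the strings "0" and "1", and the level is carried in a member named "level". The function name is hypothetical.

```python
import json
from typing import Optional

UNSUBSCRIBE, SUBSCRIBE_OR_POLL = "0", "1"


def build_tcp_par_sub(level: str, source: Optional[str] = None,
                      destination: Optional[str] = None,
                      subscription_id: Optional[str] = None,
                      request_type: str = SUBSCRIBE_OR_POLL) -> str:
    """Build a TcpParSub JSON body (hypothetical encoding)."""
    if level not in ("link", "AS"):
        raise ValueError("GuidanceLevel must be 'link' or 'AS'")
    if level == "link" and destination is None:
        raise ValueError("link-level subscriptions need a destination")
    if request_type not in (UNSUBSCRIBE, SUBSCRIBE_OR_POLL):
        raise ValueError("request_type must be '0' or '1'")
    if request_type == UNSUBSCRIBE and not subscription_id:
        raise ValueError("cannot unsubscribe without a subscription_id")
    msg = {
        # First subscription: "null"; afterwards, the id the TOE assigned.
        "subscription_id": subscription_id or "null",
        "request_type": request_type,
        "level": level,
    }
    if source is not None:
        msg["source"] = source
    if destination is not None:
        msg["destination"] = destination
    return json.dumps(msg)
```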
4.2.2 Notification
object {
JSONString: subscription_id;
[ConditionedTcpPar: cparameters<0...*>;]
}TcpParNotify;
object {
Condition: conditions<0...*>;
TcpPar: parameters<0...*>;
}ConditionedTcpPar;
subscription_id: a JSONString generated by the TOE to uniquely
identify a subscription.
Condition: A condition contains three entities separated by
whitespace. (1) A JSONString indicating the link or network status,
or a subscriber property; this document defines
"link-utilization-rate", "network-utilization-rate",
"source-uplink-sub-rate", and "destination-download-sub-rate".
(2) An operator: 'gt' for greater than, 'lt' for less than, 'ge' for
greater than or equal to, 'le' for less than or equal to, or 'eq' for
equal to. (3) A target JSONValue, which is a number against which the
status or property named in (1) is compared.
The media type for notification is "application/opentcp-notify+json".
The TCP parameter guidance is sent to the IP address and port from
which the subscription was made. When the guidance template changes,
the TOE sends an immediate notification to the relevant TOAs.
Note that the guidance is conditional. For example, it may state that
when network utilization is between 50% and 80%, a given set of
parameters is recommended. The TOA therefore also needs to learn about
changes in the relevant network status. Network or link status
notification is assumed to be provided by other protocols, but if
needed, this document can be extended to deliver the relevant status
as well. (Open issue)
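A TOA could evaluate a notification against its current view of the network status as sketched below. The sketch assumes that each Condition is a string such as "network-utilization-rate ge 50", with utilization expressed in percent. It applies the first ConditionedTcpPar whose conditions all hold; this first-match rule is an assumption, since this document does not define how to resolve overlapping entries. The example notification, including its init_cwnd values, is hypothetical.

```python
import operator
from typing import Any, Dict, Mapping

# Operators defined for a Condition in Section 4.2.2.
OPERATORS = {
    "gt": operator.gt, "lt": operator.lt, "ge": operator.ge,
    "le": operator.le, "eq": operator.eq,
}


def condition_holds(condition: str, status: Mapping[str, float]) -> bool:
    """Evaluate one '<name> <op> <value>' condition against the status."""
    name, op, target = condition.split()
    if op not in OPERATORS:
        raise ValueError(f"unknown operator {op!r}")
    # Raises KeyError if the TOA has no value for this status item.
    return OPERATORS[op](status[name], float(target))


def select_parameters(notification: Mapping[str, Any],
                      status: Mapping[str, float]) -> Dict[str, Any]:
    """Merge the TcpPar objects of the first ConditionedTcpPar whose
    conditions all hold; return {} if no entry matches."""
    for entry in notification.get("cparameters", []):
        if all(condition_holds(c, status)
               for c in entry.get("conditions", [])):
            merged: Dict[str, Any] = {}
            for tcp_par in entry.get("parameters", []):
                merged.update(tcp_par)
            return merged
    return {}


# Hypothetical notification encoding the "50% to 80%" example.
EXAMPLE_NOTIFICATION = {
    "subscription_id": "sub-42",
    "cparameters": [
        {"conditions": ["network-utilization-rate ge 50",
                        "network-utilization-rate lt 80"],
         "parameters": [{"initcwnd": 10}]},
        {"conditions": ["network-utilization-rate lt 50"],
         "parameters": [{"initcwnd": 20}]},
    ],
}
```

When no entry matches (here, at 80% utilization or above), the sketch returns an empty map. The TOA would then keep its default TCP parameters.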
4.3 Error Message
TBD.
5 Security Considerations
Dynamic control of TCP parameters can be used for attacks and can
cause serious problems to the network or to the applications.
If there are no proper mechanisms to monitor the network, dynamic
control could be used to maliciously change TCP parameters and cause
network congestion. In most environments, however, this risk is
limited because rate limits are enforced.
Dynamic control can also be used to attack end hosts. A mechanism to
protect against unauthorized modification of the guidance is
therefore needed.
6 Fairness and Stability
Fairness and stability are guaranteed as long as the changes in
parameters are TCP-friendly, that is, as long as the changes do not
cause TCP to deviate from its equilibrium throughput formula. The
network operator may also define a different fairness metric, such as
weighted fairness. For example, the operator might want to give search
queries a higher priority than background flows. In that case,
fairness between the two classes of flows is not meaningful. However,
fairness within one set of flows is guaranteed as long as all of them
use the same TCP parameters and follow TCP's algorithm to increase and
decrease their congestion window sizes.
In practical settings, we assume that the network operator has
expertise in defining the congestion control policies appropriate for
the network. To achieve pragmatic stability and fairness, a monitoring
system can be put in place to alert the operator of churn and
instabilities in the network. This monitoring component should alert
the controller whenever there is an oscillation between states. For
example, suppose the controller makes adjustments to TCP flows at time
t1 and those changes are reverted shortly afterwards, at time t1 + T.
This is a good indication of a potentially unstable condition. In this
case, the monitoring system should notify the operator to adjust the
congestion control policies, the stability constraints, or the overall
resources in the network. One simple stability metric is the number of
times a rule is applied and then reverted. The monitoring system can
measure such stability metrics and alert the operator.
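The apply-and-revert metric described above can be sketched as a small counter. This is a hypothetical helper, not part of any defined interface. The controller would call applied() and reverted() with timestamps in seconds. A revert counts as a flap only if it happens within the window T of the matching apply. An alert string is returned once a rule's flap count reaches a configured threshold.

```python
from collections import defaultdict
from typing import Dict, Optional


class StabilityMonitor:
    """Counts rules that are applied and then reverted within T seconds."""

    def __init__(self, window_s: float, alert_threshold: int):
        self.window_s = window_s
        self.alert_threshold = alert_threshold
        self._applied_at: Dict[str, float] = {}
        self.flaps: Dict[str, int] = defaultdict(int)

    def applied(self, rule_id: str, t: float) -> None:
        """Record that the controller applied rule_id at time t."""
        self._applied_at[rule_id] = t

    def reverted(self, rule_id: str, t: float) -> Optional[str]:
        """Record a revert; return an alert message if the rule flaps."""
        t1 = self._applied_at.pop(rule_id, None)
        if t1 is not None and t - t1 <= self.window_s:
            self.flaps[rule_id] += 1
            if self.flaps[rule_id] >= self.alert_threshold:
                return (f"rule {rule_id} reverted {self.flaps[rule_id]} "
                        f"times within {self.window_s}s of being applied")
        return None
```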
Acknowledgements
Lingli Deng has provided many valuable comments to this document.
7 IANA Considerations
TBD.
8 References
8.1 Normative References
[KEYWORDS] Bradner, S., "Key words for use in RFCs to
Indicate Requirement Levels", BCP 14, RFC 2119, March
1997.
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter,
"Uniform Resource Identifier (URI): Generic Syntax", STD
66, RFC 3986, January 2005.
[RFC5952] Kawamura, S. and M. Kawashima, "A
Recommendation for IPv6 Address Text Representation", RFC
5952, August 2010.
[Hotnets] Ghobadi, M., Yeganeh, S. H., and Y. Ganjali,
"Rethinking End-to-End Congestion Control in Software-
Defined Networks", Hotnets '12, October 29-30, 2012,
Seattle, WA, USA.
Authors' Addresses
Monia Ghobadi
Email: monia@cs.toronto.edu
Haibin Song
Email: haibin.song@huawei.com
Rachel Huang
Email: rachel.huang@huawei.com
Yashar Ganjali
Email: yganjali@cs.toronto.edu