Internet Engineering Task Force X. Wei
INTERNET-DRAFT L.Zhu
Intended Status: Standards Track Huawei Technologies
Expires: January 2, 2016 L.Deng
China Mobile
B.Briscoe
July 1, 2015
Tunnel Congestion Feedback
draft-wei-tsvwg-tunnel-congestion-feedback-04
Abstract
This document describes a mechanism to calculate congestion of a
tunnel segment based on RFC 6040 recommendations, and a feedback
protocol by which to send the measured congestion of the tunnel from
egress to ingress. A basic model for measuring tunnel congestion
and feedback is described, and a protocol for carrying the feedback
data is outlined.
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
Copyright and License Notice
Copyright (c) 2015 IETF Trust and the persons identified as the
Wei Expires January 2, 2016 [Page 1]
INTERNET DRAFT Tunnel Congestion Feedback July 1, 2015
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction
2. Conventions
3. Congestion Information Feedback Models
3.1 Direct Model
3.2 Centralized Model
4. Congestion Level Measurement
5. Congestion Information Delivery
5.1 IPFIX Extensions
5.1.1 ce-cePacketTotalCount
5.1.2 ect-nectPacketTotalCount
5.1.3 ce-nectPacketTotalCount
5.1.4 ce-ectPacketTotalCount
5.1.5 ect-ectPacketTotalCount
6. Congestion Management
7. Security
8. IANA Considerations
9. References
9.1 Normative References
9.2 Informative References
Authors' Addresses
1. Introduction
In IP networks, persistent congestion (also called congestion
collapse) causes transport throughput to drop and wastes network
resources, so appropriate congestion control mechanisms are critical
to keep the network out of a persistently congested state.
Currently, transport protocols such as TCP, SCTP, and DCCP have
built-in congestion control mechanisms, and even a single transport
protocol such as TCP may offer several congestion control mechanisms
to choose from. All these congestion control mechanisms are
implemented on the host side, and there are reasons why host-side
congestion control alone is not sufficient to keep the whole network
away from persistent congestion, e.g., (1) a protocol's congestion
control scheme might have internal design flaws; (2) a protocol
might be implemented improperly in software; (3) some transport
protocols do not provide congestion control at all.
In order to have better control over network congestion, it is
necessary for the network side to perform some kind of traffic
control. For example, ConEx [CONEX] provides a method for a network
operator to learn about the congestion contribution of traffic, and
congestion management actions can then be taken based on this
information.
Tunnels are widely deployed in various networks, including the public
Internet, datacenter networks, and enterprise networks. A tunnel
consists of an ingress, an egress and a set of interior routers. For
the tunnel scenario, a tunnel-based mechanism, different from ConEx,
is introduced for network traffic control to keep the network away
from persistent congestion; in this case, the tunnel ingress
implements a congestion management function to control the traffic
entering the tunnel.
In order to perform congestion management at the ingress, the ingress
must first obtain information about the congestion level inside the
tunnel. The ingress cannot infer this from locally visible traffic
rates, because that would require additional knowledge of downstream
capacity and topology, as well as of cross traffic that does not pass
through this ingress.
This document provides a mechanism for feeding back the congestion
level inside the tunnel to the ingress. Using this mechanism, the
egress feeds the tunnel congestion level information it collects back
to the ingress; after receiving the information, the ingress can
perform congestion management according to network management policy.
2. Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
3. Congestion Information Feedback Models
Depending on the specific network deployment, there are two feedback
models: the direct model and the centralized model.
3.1 Direct Model
                     Feedback
    |-----------------------------------------|
    |                                         |
    |                                         |
    |                                         V
+----------+           tunnel           +-----------+
|Egress    |============================|Ingress    |
|(Exporter)|                            |(Collector)|
+----------+                            +-----------+
             (a) Direct Feedback Model.
In the direct model, the egress feeds information directly to the
ingress. The egress collects network congestion level information
and feeds it back to the ingress for congestion management. The
ingress here acts both as the decision point that decides how to
perform congestion management and as the action point that implements
the congestion management decision.
3.2 Centralized Model
      Feedback    +-----------+
    ------------->|Controller |#####################
    |             |(Collector)|                    #
    |             +-----------+                    #
    |                                              #
+----------+             tunnel              +-----V-+
|Egress    |=================================|Ingress|
|(Exporter)|                                 +-------+
+----------+
            (b) Centralized Feedback Model
In some scenarios the ingress only takes the role of action point and
implements traffic control decisions made by another entity, called
the "controller" here.
In this model, after the egress collects network congestion level
information, it feeds the information back to the controller instead
of the ingress; the controller then makes the congestion management
decision and sends it to the ingress.
4. Congestion Level Measurement
This section describes how to measure the congestion level in a
tunnel.
There may be different approaches to packet loss detection for
different tunneling protocols. For instance, if the tunneling
protocol header contains a sequence number field, the egress can
easily detect packet loss through gaps in the sequence number space;
another approach is to compare the number of packets entering the
ingress with the number of packets arriving at the egress over the
same span of packets. This document focuses on the latter, which is
the more general approach.
If a router supports ECN, then once its queue length exceeds a
predefined threshold it marks ECN-capable (ECT) packets as CE and
drops Not-ECT packets, with a probability proportional to the queue
length; if the queue overflows, all packets are dropped. If a router
does not support ECN, then once its queue length exceeds a predefined
threshold it drops both ECT packets and Not-ECT packets with a
probability proportional to the queue length. It is assumed that all
routers in the tunnel support ECN.
Faked ECT is used at the ingress to defer packet loss to the egress.
The basic idea of faked ECT is that, when encapsulating packets, the
ingress first marks the tunnel outer header according to RFC 6040 and
then re-marks the outer header of Not-ECT packets as ECT. As a
result, there are three possible combinations of outer header ECN
field and inner header ECN field: CE|CE, ECT|N-ECT, and ECT|ECT (in
the form outer ECN|inner ECN).
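The faked ECT marking step can be sketched as follows. This is a
minimal illustration, not part of the specification: it assumes that
the ingress uses RFC 6040 normal mode (the inner ECN field is copied
to the outer header) and that the faked codepoint is ECT(0); the
function names are hypothetical.

```python
# ECN codepoints from RFC 3168 (two-bit field in the IP header).
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def outer_ecn_faked_ect(inner_ecn: int, fake_codepoint: int = ECT_0) -> int:
    """Return the outer-header ECN field for a packet entering the tunnel.

    Step 1: RFC 6040 normal mode copies the inner ECN field outward.
    Step 2: faked ECT rewrites an outer Not-ECT as ECT.
    Using ECT(0) as the faked codepoint is an assumption of this sketch.
    """
    outer = inner_ecn  # RFC 6040 normal-mode encapsulation
    if outer == NOT_ECT:
        outer = fake_codepoint  # faked ECT
    return outer


def combination(outer: int, inner: int) -> str:
    """Name the combination in the draft's outer|inner notation."""
    def name(ecn: int) -> str:
        if ecn == CE:
            return "CE"
        return "N-ECT" if ecn == NOT_ECT else "ECT"
    return f"{name(outer)}|{name(inner)}"


for inner in (NOT_ECT, ECT_1, ECT_0, CE):
    print(combination(outer_ecn_faked_ect(inner), inner))
```

Only the three combinations listed above can leave the ingress; the
combinations CE|N-ECT and CE|ECT appear only after an interior router
has marked the outer header.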
When all interior routers support ECN, the network congestion level
can be indicated by the ratio of CE-marked packets and by the ratio
of dropped packets; these two indicators are complementary. If the
congestion level in the tunnel is not too high, packets are marked as
CE instead of being dropped, and the congestion level is easy to
calculate from the ratio of CE-marked packets. If the congestion
level is so high that ECT packets are dropped, the packet loss ratio
can be calculated by comparing the total number of packets entering
the ingress with the total number of packets arriving at the egress
over the same span of packets. If packet loss is detected, it can be
assumed that severe congestion has occurred in the tunnel, because
loss is only ever a sign of serious congestion; the loss ratio
therefore does not need to be measured accurately.
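As an illustration of the two indicators, the sketch below derives a
CE-marking fraction and a loss count from the ingress and egress
packet counts for one span of packets. The formula for the marking
fraction is an assumption of this sketch (the document does not fix
one), and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EcnCounts:
    """Packet counts per outer|inner combination for one span of packets."""
    ce_ce: int = 0
    ect_nect: int = 0
    ce_nect: int = 0
    ce_ect: int = 0
    ect_ect: int = 0

    def total(self) -> int:
        return (self.ce_ce + self.ect_nect + self.ce_nect
                + self.ce_ect + self.ect_ect)


def congestion_estimate(ingress: EcnCounts, egress: EcnCounts):
    """Return (marking_fraction, lost_packets).

    Assumption: packets that left the ingress as ECT|N-ECT or ECT|ECT and
    arrive as CE|N-ECT or CE|ECT were marked inside the tunnel.
    """
    marked_in_tunnel = egress.ce_nect + egress.ce_ect
    arrived_unmarked = egress.ect_nect + egress.ect_ect
    markable = marked_in_tunnel + arrived_unmarked
    fraction = marked_in_tunnel / markable if markable else 0.0
    lost = ingress.total() - egress.total()
    return fraction, lost


sent = EcnCounts(ce_ce=10, ect_nect=400, ect_ect=590)
seen = EcnCounts(ce_ce=10, ect_nect=380, ce_nect=20, ce_ect=30, ect_ect=560)
fraction, lost = congestion_estimate(sent, seen)
print(f"marking fraction {fraction:.3f}, lost packets {lost}")
```

A non-zero loss count would be treated as a sign of severe
congestion, so only the marking fraction needs to be precise.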
The basic procedure of congestion level measurement is as follows:
    +-------+                  +------+
    |Ingress|                  |Egress|
    +-------+                  +------+
        |                         |
+----------------+                |
|cumulative count|                |
+----------------+                |
        |                         |
        | <node id-i, ECN counts> |
        |------------------------>|
        | <node id-e, ECN counts> |
        |<------------------------|
        |                         |
        |                         |
    (a) Direct model feedback procedure
+----------+    +-------+                  +------+
|Controller|    |Ingress|                  |Egress|
+----------+    +-------+                  +------+
     |              |                         |
     |      +----------------+                |
     |      |cumulative count|                |
     |      +----------------+                |
     |              |                         |
     |              | <node id-i, ECN counts> |
     |              |------------------------>|
     |              |                         |
     |                                        |
     |                                        |
     |        <node id-i, ECN counts>         |
     |        <node id-e, ECN counts>         |
     |<---------------------------------------|
     |                                        |
     |                                        |
     |                                        |
       (b) Centralized model feedback procedure
The ingress encapsulates packets and marks the outer header according
to faked ECT as described above. The ingress cumulatively counts
packets for the three types of ECN combination (CE|CE, ECT|N-ECT,
ECT|ECT) and regularly sends a message containing the cumulative
packet count of each type of ECN combination to the egress. The
egress cumulatively counts packets coming from the ingress for each
type of ECN combination (CE|CE, ECT|N-ECT, CE|N-ECT, CE|ECT,
ECT|ECT). When each message arrives, the egress adds its own packet
counts to the message and returns the whole message either to the
ingress or to a central controller.
Packets can be counted at the granularity of all traffic from the
ingress to the egress, to learn about the overall congestion status
of the path between the ingress and the egress; or at the granularity
of an individual customer's traffic or a specific set of flows, to
learn about their congestion contribution.
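The counting procedure described in this section can be sketched as
below. The message layout (a dictionary with a node identifier and
per-combination counts) and all names are assumptions made for
illustration; the document itself only requires that the egress add
its own counts to the message received from the ingress.

```python
from collections import Counter

NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11  # RFC 3168 codepoints


def combo(outer: int, inner: int) -> str:
    """Name a packet's ECN combination in the form outer|inner."""
    def name(ecn: int) -> str:
        if ecn == CE:
            return "CE"
        return "N-ECT" if ecn == NOT_ECT else "ECT"
    return f"{name(outer)}|{name(inner)}"


class TunnelEndpoint:
    """Hypothetical ingress or egress that keeps cumulative counters."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts = Counter()  # cumulative: never reset between reports

    def observe(self, outer: int, inner: int) -> None:
        self.counts[combo(outer, inner)] += 1

    def report(self) -> dict:
        return {"node_id": self.node_id, "counts": dict(self.counts)}


def egress_reply(ingress_message: dict, egress: TunnelEndpoint) -> dict:
    """Append the egress counts to the message received from the ingress."""
    return {"ingress": ingress_message, "egress": egress.report()}


ingress, egress = TunnelEndpoint("id-i"), TunnelEndpoint("id-e")
for inner in (NOT_ECT, ECT_0, ECT_0):
    ingress.observe(ECT_0, inner)   # outer header after faked ECT
egress.observe(ECT_0, NOT_ECT)      # arrived unmarked
egress.observe(CE, ECT_0)           # CE-marked inside the tunnel
egress.observe(ECT_0, ECT_0)        # arrived unmarked
reply = egress_reply(ingress.report(), egress)
print(reply)
```

Counting per customer or per set of flows would amount to keeping one
such endpoint object per traffic aggregate instead of one per tunnel.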
5. Congestion Information Delivery
As described above, the tunnel ingress needs to convey a message
containing the cumulative packet counts of each type of ECN
combination to the tunnel egress, and the tunnel egress needs to feed
a message containing the cumulative packet counts of each type of ECN
combination back to the ingress or to the central controller. This
section describes how the messages can be conveyed.
The message can travel along the same path as the network data
traffic, referred to as in-band signaling, or along a different path,
referred to as out-of-band signaling. Because an out-of-band scheme
needs an additional separate path, which might limit its actual
deployment, only the in-band scheme is discussed here.
Because the message is transmitted in band, the message packet might
be lost when the network is congested. To cope with lost message
packets, the packet counts are sent as cumulative counters, so if a
message is lost the next message recovers the missing information.
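The following short sketch shows why cumulative counters tolerate a
lost message: the receiver takes differences between the reports that
do arrive, and the difference across a gap simply covers a longer
span. The numbers and the function name are illustrative only.

```python
def deltas(received_reports):
    """Return per-interval increments from the cumulative values received."""
    previous = 0
    increments = []
    for value in received_reports:
        increments.append(value - previous)
        previous = value
    return increments


sent = [100, 250, 400, 700]   # cumulative counts in four successive messages
received = [100, 400, 700]    # the second message was lost in the tunnel

print(deltas(sent))
print(deltas(received))
```

Both sequences of increments sum to the same total, so the packets
counted in the lost message are accounted for by the next one.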
IPFIX [RFC7011] is selected as a candidate protocol. IPFIX prefers
SCTP as its transport, and SCTP allows partially reliable delivery
[RFC3758], which ensures that feedback messages are not blocked from
being sent when SCTP packets are lost due to network congestion.
When sending a message from the ingress to the egress, the ingress
acts as the IPFIX exporter and the egress acts as the IPFIX
collector; when sending a message from the egress to the ingress or
controller, the egress acts as the IPFIX exporter and the ingress or
controller acts as the IPFIX collector.
5.1 IPFIX Extensions
5.1.1 ce-cePacketTotalCount
Description: The total number of incoming packets with the CE|CE ECN
marking combination for this Flow at the Observation Point since the
Metering Process (re-)initialization for this Observation Point.
Abstract Data Type: unsigned64
Data Type Semantics: totalCounter
ElementId: TBD1
Status: current
Units: packets
5.1.2 ect-nectPacketTotalCount
Description: The total number of incoming packets with ECT|N-ECT ECN
marking combination for this Flow at the Observation Point since the
Metering Process (re-)initialization for this Observation Point.
Abstract Data Type: unsigned64
Data Type Semantics: totalCounter
ElementId: TBD2
Status: current
Units: packets
5.1.3 ce-nectPacketTotalCount
Description: The total number of incoming packets with CE|N-ECT ECN
marking combination for this Flow at the Observation Point since the
Metering Process (re-)initialization for this Observation Point.
Abstract Data Type: unsigned64
Data Type Semantics: totalCounter
ElementId: TBD3
Status: current
Units: packets
5.1.4 ce-ectPacketTotalCount
Description: The total number of incoming packets with the CE|ECT ECN
marking combination for this Flow at the Observation Point since the
Metering Process (re-)initialization for this Observation Point.
Abstract Data Type: unsigned64
Data Type Semantics: totalCounter
ElementId: TBD4
Status: current
Units: packets
5.1.5 ect-ectPacketTotalCount
Description: The total number of incoming packets with ECT|ECT ECN
marking combination for this Flow at the Observation Point since the
Metering Process (re-)initialization for this Observation Point.
Abstract Data Type: unsigned64
Data Type Semantics: totalCounter
ElementId: TBD5
Status: current
Units: packets
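As a rough illustration of how the five counters might be carried,
the sketch below packs them as one IPFIX Data Record inside a Data
Set. It relies only on the generic RFC 7011 encoding (a Set Header of
Set ID and Length, and unsigned64 values in network byte order). The
Template ID of 256, the field order, and the function names are
assumptions of this sketch; the Information Element identifiers are
still TBD and would appear in a separate Template Record, which is
not shown.

```python
import struct

# Assumed field order: the order of Sections 5.1.1 to 5.1.5.
FIELD_ORDER = ("ce_ce", "ect_nect", "ce_nect", "ce_ect", "ect_ect")


def encode_data_set(counts: dict, template_id: int = 256) -> bytes:
    """Encode one Data Record holding five unsigned64 counters."""
    record = struct.pack("!5Q", *(counts.get(name, 0) for name in FIELD_ORDER))
    # Set Header: Set ID (the Template ID for a Data Set) and total length.
    header = struct.pack("!HH", template_id, 4 + len(record))
    return header + record


def decode_data_set(data: bytes) -> dict:
    """Decode a Data Set produced by encode_data_set."""
    _set_id, length = struct.unpack("!HH", data[:4])
    values = struct.unpack("!5Q", data[4:length])
    return dict(zip(FIELD_ORDER, values))


counts = {"ce_ce": 10, "ect_nect": 380, "ce_nect": 20,
          "ce_ect": 30, "ect_ect": 560}
encoded = encode_data_set(counts)
print(len(encoded), decode_data_set(encoded) == counts)
```

A complete exporter would also need the IPFIX Message Header and the
Template Set, both of which are omitted here for brevity.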
6. Congestion Management
After the tunnel ingress (or controller) receives congestion level
information, congestion management actions can be taken based on it;
e.g., if the congestion level is higher than a predefined threshold,
action can be taken to reduce the congestion level.
Congestion management action must be delayed by more than a
worst-case global RTT; otherwise tunnel traffic management will not
give normal end-to-end congestion control enough time to do its job,
and the system could become unstable. The detailed description of
congestion management is out of the scope of this document; as
examples, congestion management mechanisms such as circuit breakers
[CB] and congestion policing [CP] could be applied.
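One possible reading of the threshold and delay rules is sketched
below: the ingress (or controller) acts only after the congestion
level has stayed above the threshold for longer than the worst-case
RTT. This interpretation, the class name, and the parameter values
are assumptions for illustration; the actual management action is out
of scope.

```python
class DelayedCongestionManager:
    """Hypothetical decision point with a hold-off of one worst-case RTT."""

    def __init__(self, threshold: float, worst_case_rtt: float):
        self.threshold = threshold
        self.hold_off = worst_case_rtt   # seconds
        self.above_since = None          # time the level first exceeded it

    def update(self, now: float, congestion_level: float) -> bool:
        """Return True when a management action should be taken."""
        if congestion_level <= self.threshold:
            self.above_since = None      # congestion cleared; reset the timer
            return False
        if self.above_since is None:
            self.above_since = now
        return (now - self.above_since) > self.hold_off


manager = DelayedCongestionManager(threshold=0.05, worst_case_rtt=0.5)
decisions = [manager.update(t, level)
             for t, level in [(0.0, 0.10), (0.3, 0.10), (0.6, 0.10), (0.7, 0.01)]]
print(decisions)
```

The hold-off gives end-to-end congestion control a chance to react
first; if the level drops below the threshold in the meantime, no
action is taken.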
7. Security
This document describes the tunnel congestion calculation and
feedback. For feeding back congestion, the security mechanisms of
IPFIX are expected to be sufficient. No additional security concerns
are expected.
8. IANA Considerations
This document defines a set of new IPFIX Information Elements (IEs).
New registry entries are needed for these IE identifiers: TBD1 to
TBD5.
9. References
9.1 Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997,
<http://www.rfc-editor.org/info/rfc2119>.
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP",
RFC 3168, September 2001,
<http://www.rfc-editor.org/info/rfc3168>.
[RFC3758] Stewart, R., Ramalho, M., Xie, Q., Tuexen, M., and P.
Conrad, "Stream Control Transmission Protocol (SCTP)
Partial Reliability Extension", RFC 3758, May 2004,
<http://www.rfc-editor.org/info/rfc3758>.
[RFC7011] Claise, B., Ed., Trammell, B., Ed., and P. Aitken,
"Specification of the IP Flow Information Export (IPFIX)
Protocol for the Exchange of Flow Information", STD 77,
RFC 7011, September 2013,
<http://www.rfc-editor.org/info/rfc7011>.
9.2 Informative References
[CONEX] Mathis, M. and B. Briscoe, "Congestion Exposure (ConEx)
Concepts, Abstract Mechanism and Requirements",
draft-ietf-conex-abstract-mech-13, October 24, 2014.
[CB] Fairhurst, G., "Network Transport Circuit Breakers",
draft-ietf-tsvwg-circuit-breaker-01, April 02, 2015.
[CP] Briscoe, B. and M. Sridharan, "Network Performance Isolation
in Data Centres using Congestion Policing",
draft-briscoe-conex-data-centre-02, February 14, 2014.
Authors' Addresses
Xinpeng Wei
Beiqing Rd. Z-park No.156, Haidian District,
Beijing, 100095, P. R. China
E-mail: weixinpeng@huawei.com
Zhu Lei
Beiqing Rd. Z-park No.156, Haidian District,
Beijing, 100095, P. R. China
E-mail: lei.zhu@huawei.com
Lingli Deng
Beijing, 100095, P. R. China
E-mail: denglingli@gmail.com
Bob Briscoe
B54/77, Adastral Park
Martlesham Heath
Ipswich IP5 3RE
UK