INTERNET-DRAFT Y. Yu
Intended Status: Standards Track Huawei Technologies
Expires: Mar 5, 2019 J. Wang
China Telecom
Sep 1, 2018
Packet Reordering in Geneve Overlay Network
draft-yu-nvo3-geneve-pkt-reordering-00
Abstract
Congestion is the killer of low latency and high throughput. Network
congestion occurs on the interconnection links of a data center when
traffic is poorly distributed. Load balancing technologies are used to
relieve network congestion. Packet spraying is a finer-grained load
balancing technology. When packets are sprayed, they may arrive at the
destination out of order. This document describes a reordering
protocol for the Geneve encapsulation network [1] using a newly
defined Geneve option.
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
<Yu, et al.> Expires <Mar 5, 2019> [Page 1]
INTERNET DRAFT <Reordering in Geneve Overlay Network> <Sep 1, 2018>
Copyright and License Notice
Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1 Introduction
2 Terminology
3 Abbreviations
4 Problem Statements & Requirements
5 Packet Reordering on Geneve
5.1 Packet Reordering Format
5.2 Packet Reordering Capability Discovery
6 Security Considerations
7 IANA Considerations
8 References
Authors' Addresses
1 Introduction
In many current data centers, network utilization is not as high as
it could be. For example, in some scenarios, the average network
utilization is about 20% and the peak utilization is about 45% [2].
With the improvement of end systems (or endpoints) and the deployment
of multiple services and high-volume traffic services (such as
streaming media, big data processing applications, and user-oriented
large-scale web applications), more and more network performance
problems appear. These problems are caused by traffic bursts and
traffic routing collisions. The traffic imbalance on the network
becomes increasingly prominent, which leads to underutilized network
bandwidth and degraded overall performance of network applications.
In order to fully utilize the available network bandwidth, traffic
flows entering the network are dispersed across multiple paths to
achieve load balancing. The finer the granularity of the load
balancing, the higher the utilization of available network bandwidth.
Current flow-based and flowlet-based [3] approaches are coarser-grained
than packet-based load balancing. When packets are sprayed, they may
arrive at the destination out of order because the links have
different latencies. This document describes how to extend the Geneve
header to support reordering for packet-based load balancing, referred
to as reordering in the Geneve encapsulation network.
2 Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
3 Abbreviations
GENEVE - Generic Network Virtualization Encapsulation
ECMP - Equal-cost multi-path routing
SDN - Software Defined Network
GFP - Geneve Forwarding Policy
4 Problem Statements & Requirements
The current general network topology in the data center is a
multi-rooted tree architecture, such as the typical Clos network. This
kind of network has multiple paths with bandwidth divided equally
across them, which provides good scalability and flexibility depending
on how the multiple paths are utilized. In order to fully utilize the
network bandwidth, traffic flows entering the network are dispersed
across the multiple paths to achieve load balancing. Currently, load
balancing is performed at one of the following granularities:
flow-based load balancing (such as ECMP), flowlet-based load balancing
(such as CONGA [3]), and packet-based load balancing (such as packet
spraying). The finer the granularity of load balancing, the more
effective the load balancing is and the higher the utilization of
network bandwidth can be.
Among the three approaches, packet-based load balancing is the most
effective because its granularity is the smallest. However, packets
belonging to the same flow may then be assigned to different paths.
When the forwarding delays of those paths differ, packets may arrive
at the receiver out of order. To detect out-of-order packets and
restore the correct order, the packets need to carry a sequence
number.
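As a minimal illustration of the problem (not part of the protocol),
the following Python sketch sprays packets round-robin over two paths
with assumed fixed one-way delays of 10 and 3 time units and reports
the order in which the receiver sees them. The function name
arrival_order and the delay values are hypothetical.

```python
def arrival_order(num_packets, path_delays, send_interval=1):
    """Spray packets round-robin over paths; return sequence numbers in arrival order."""
    arrivals = []
    for seq in range(num_packets):
        path = seq % len(path_delays)               # per-packet path choice
        arrival_time = seq * send_interval + path_delays[path]
        arrivals.append((arrival_time, seq))
    arrivals.sort()                                 # receiver sees earliest first
    return [seq for _, seq in arrivals]

print(arrival_order(6, path_delays=[10, 3]))        # [1, 3, 5, 0, 2, 4]
```

With equal path delays the same function returns packets in send
order, which is why the problem only appears when paths differ.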
5 Packet Reordering on Geneve
5.1 Packet Reordering Format
The Geneve header and the Geneve option have the following format [1]:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Ver|  Opt Len  |O|C|    Rsvd.  |          Protocol Type        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Virtual Network Identifier (VNI)       |    Reserved   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Variable Length Options                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Geneve Header

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Option Class         |      Type     |R|R|R| Length  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Variable Option Data                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Geneve Option
Option Class = To be assigned by IANA (TBA).
Type = TBA.
Length = 2 (8 bytes of option data, excluding the 4-byte option
header).
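The following Python sketch, shown only as an illustration, parses the
4-byte Geneve option header described above. It assumes network byte
order and treats the 5-bit Length field as a count of 4-byte words of
option data, excluding the option header, as in [1]. The function name
parse_option_header is hypothetical.

```python
import struct

def parse_option_header(buf, offset=0):
    """Return (option_class, type, option data length in bytes)."""
    option_class, opt_type, flags_len = struct.unpack_from("!HBB", buf, offset)
    length_words = flags_len & 0x1F      # low 5 bits; the top 3 bits are R|R|R
    return option_class, opt_type, length_words * 4

hdr = struct.pack("!HBB", 0x0102, 0x03, 0xE2)   # R bits set, Length = 2
print(parse_option_header(hdr))                 # (258, 3, 8)
```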
The proposed Packet Reordering option for Geneve has the following
format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Option Class = GFP      |      Type     |R|R|R| Length  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Flow Group ID                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Packet Reordering Format over Geneve

Option Class = Geneve Forwarding Policy (GFP) (suggested), to be
assigned by IANA (TBA).
Type = TBA.
Length = 2 (8 bytes).
Flow Group ID: described in Section 5.1.1.
Sequence Number: described in Section 5.1.2.
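A possible encoding of the Packet Reordering option is sketched below
in Python. Because the Option Class and Type are still TBA, the
constants GFP_OPTION_CLASS and REORDER_OPTION_TYPE are hypothetical
placeholders, not assigned values. The R bits are sent as zero.

```python
import struct

# Hypothetical placeholders: the real Option Class and Type are TBA (IANA).
GFP_OPTION_CLASS = 0xFFF0
REORDER_OPTION_TYPE = 0x01
REORDER_LENGTH_WORDS = 2                 # 8 bytes of option data

def encode_reorder_option(flow_group_id, seq):
    """4-byte option header + Flow Group ID + Sequence Number (12 bytes)."""
    return struct.pack("!HBBII", GFP_OPTION_CLASS, REORDER_OPTION_TYPE,
                       REORDER_LENGTH_WORDS & 0x1F,   # R|R|R bits left as zero
                       flow_group_id, seq)

def decode_reorder_option(buf):
    opt_class, opt_type, flags_len, fgid, seq = struct.unpack("!HBBII", buf[:12])
    if (opt_class, opt_type) != (GFP_OPTION_CLASS, REORDER_OPTION_TYPE):
        raise ValueError("not a Packet Reordering option")
    if flags_len & 0x1F != REORDER_LENGTH_WORDS:
        raise ValueError("unexpected option length")
    return fgid, seq

opt = encode_reorder_option(flow_group_id=7, seq=42)
print(len(opt), decode_reorder_option(opt))   # 12 (7, 42)
```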
5.1.1 Flow Group ID Field (4 Bytes)
The Flow Group ID field is a four-byte field. The Flow Group ID
identifies a group of flows within the same reorder sequence space
between a pair of src/dest nodes. The Flow Group ID may correspond to
an individual flow, some subset of flows, or even all flows between
the src/dest pair. How flows are mapped to Flow Group IDs is not
defined by this draft. The same Flow Group ID can be used by
different src/dest pairs (i.e., a Flow Group ID is only unique within
the context of a src/dest pair). A Flow Group is uniquely identified
by the 3-tuple of src IP, dest IP, and Flow Group ID. The source node
allocates sequence numbers in the order in which packets of the same
Flow Group are sent. The destination reorders the received packets of
a Flow Group according to their sequence numbers.
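The keying rule above can be sketched as a hypothetical receiver-side
table (the names FlowGroupKey and ReorderStateTable are illustrative)
that keeps one independent sequence space per (src IP, dest IP, Flow
Group ID) 3-tuple, so the same Flow Group ID used by two different
sources yields two separate entries. The initial value (2**32)-1
follows Section 5.1.2.

```python
from typing import Dict, NamedTuple

class FlowGroupKey(NamedTuple):
    src_ip: str
    dst_ip: str
    flow_group_id: int

INITIAL_LAST_SEQ = 2**32 - 1   # initial "last successfully decapsulated" value

class ReorderStateTable:
    """One independent reorder sequence space per Flow Group (sketch)."""

    def __init__(self) -> None:
        self.last_seq: Dict[FlowGroupKey, int] = {}

    def state_for(self, src_ip: str, dst_ip: str, flow_group_id: int) -> int:
        key = FlowGroupKey(src_ip, dst_ip, flow_group_id)
        return self.last_seq.setdefault(key, INITIAL_LAST_SEQ)

table = ReorderStateTable()
table.state_for("192.0.2.1", "192.0.2.9", 5)
table.state_for("192.0.2.2", "192.0.2.9", 5)   # same ID, different source
print(len(table.last_seq))                     # 2
```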
5.1.2 Sequence Number Field
The Sequence Number field is a four-byte field that closely follows
the definition of the Sequence Number in RFC 2890 [4]. The sequence
number value ranges from 0 to (2**32)-1. The first datagram is sent
with a sequence number of 0. The sequence number is thus a
monotonically increasing counter represented modulo 2**32. The
receiver maintains the sequence number value of the last successfully
decapsulated packet. This value should be initialized to (2**32)-1.
A packet is considered an out-of-sequence packet if its sequence
number is less than or equal to the sequence number of the last
successfully decapsulated packet. The sequence number of a received
packet is considered less than or equal to the last successfully
decapsulated sequence number if its value lies in the range of the
last successfully decapsulated sequence number and the preceding
2**31-1 values, inclusive.
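The modulo 2**32 comparison above can be expressed compactly. The
Python sketch below is an illustration under the assumption that
sequence numbers are plain non-negative integers below 2**32; the
helper names are hypothetical.

```python
MOD = 2**32

def is_in_sequence(seq, last):
    # Exactly one greater than the last decapsulated number, modulo 2**32.
    return seq == (last + 1) % MOD

def is_out_of_sequence(seq, last):
    # "Less than or equal to" last: seq is 'last' or one of the 2**31 - 1
    # values preceding it, i.e. at most 2**31 - 1 steps behind, modulo 2**32.
    return (last - seq) % MOD <= 2**31 - 1

def classify(seq, last):
    if is_in_sequence(seq, last):
        return "in-sequence"
    if is_out_of_sequence(seq, last):
        return "out-of-sequence"
    return "gap"

print([classify(0, MOD - 1), classify(3, 5), classify(9, 5), classify(MOD - 1, 2)])
# ['in-sequence', 'out-of-sequence', 'gap', 'out-of-sequence']
```

The last case shows a sequence number just before the wrap being
treated as older than 2, which a plain integer comparison would get
wrong.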
If the received packet is an in-sequence packet, it is successfully
decapsulated. An in-sequence packet is one with a sequence number
exactly 1 greater than (modulo 2**32) that of the last successfully
decapsulated packet. If the received packet is neither an in-sequence
nor an out-of-sequence packet, it indicates a sequence number gap. The
receiver may perform a small amount of buffering in an attempt to
recover the original sequence of transmitted packets. In this case,
the packet may be placed in a buffer sorted by sequence number. If
an in-sequence packet is received and successfully decapsulated, the
receiver should consult the head of this buffer to see if the next
in-sequence packet has already been received. If so, the receiver
should decapsulate it as well as any following in-sequence packets
present in the buffer. The "last successfully decapsulated sequence
number" should then be set to the sequence number of the last packet
decapsulated from the buffer.
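A minimal sketch of this buffering behavior for a single Flow Group
follows. It keeps buffered packets in a dictionary keyed by sequence
number rather than a sorted list, which is sufficient for checking
whether the next in-sequence packet is present. Dropping
out-of-sequence packets is an assumption made for the sketch; this
section does not state what happens to them. The class and method
names are hypothetical.

```python
MOD = 2**32

class ReorderBuffer:
    """Receive path for one Flow Group (illustrative sketch)."""

    def __init__(self):
        self.last = MOD - 1      # last successfully decapsulated sequence number
        self.pending = {}        # seq -> payload for packets received after a gap
        self.delivered = []      # stands in for handing packets to the upper layer

    def _decapsulate(self, seq, payload):
        self.delivered.append(payload)
        self.last = seq

    def receive(self, seq, payload):
        if seq == (self.last + 1) % MOD:
            # In-sequence: decapsulate, then drain any buffered successors.
            self._decapsulate(seq, payload)
            nxt = (self.last + 1) % MOD
            while nxt in self.pending:
                self._decapsulate(nxt, self.pending.pop(nxt))
                nxt = (self.last + 1) % MOD
        elif (self.last - seq) % MOD < 2**31:
            # Out-of-sequence (duplicate or late): dropped in this sketch.
            return
        else:
            # Sequence number gap: hold the packet until the gap is filled.
            self.pending[seq] = payload

rb = ReorderBuffer()
for s in [1, 0, 3, 2]:
    rb.receive(s, f"p{s}")
print(rb.delivered)   # ['p0', 'p1', 'p2', 'p3']
```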
Under no circumstances should a packet wait more than
OUTOFORDER_TIMER microseconds in the buffer. If a packet has been
waiting that long, the receiver MUST immediately traverse the buffer
in sorted order, decapsulating packets (and ignoring any sequence
number gaps) until there are no more packets in the buffer that have
been waiting longer than OUTOFORDER_TIMER microseconds. The "last
successfully decapsulated sequence number" should then be set to the
sequence number of the last packet so decapsulated.
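The timer-driven flush can be sketched as a standalone function.
Buffered packets are assumed to be stored as a dictionary mapping
sequence number to (arrival time, payload), with times and
OUTOFORDER_TIMER expressed in the same unit; the function and
parameter names are hypothetical. Sorting uses the distance from the
last decapsulated sequence number modulo 2**32 so that wrapped
sequence numbers are ordered correctly.

```python
MOD = 2**32

def flush_expired(pending, last, now, out_of_order_timer):
    """Flush a Flow Group's buffer after OUTOFORDER_TIMER expiry (sketch).

    pending maps seq -> (arrival_time, payload) and is modified in place.
    Returns (payloads decapsulated in order, new last sequence number).
    """
    # Sort by distance ahead of 'last' so that wrapped numbers order correctly.
    order = sorted(pending, key=lambda s: (s - last) % MOD)
    expired = [i for i, s in enumerate(order)
               if now - pending[s][0] > out_of_order_timer]
    if not expired:
        return [], last
    delivered = []
    # Decapsulate in order, ignoring gaps, up to and including the last
    # packet that has waited too long; younger packets behind it stay buffered.
    for seq in order[: expired[-1] + 1]:
        delivered.append(pending.pop(seq)[1])
        last = seq
    return delivered, last

buf = {5: (0, "p5"), 3: (90, "p3"), 9: (95, "p9")}
print(flush_expired(buf, last=0, now=100, out_of_order_timer=50))  # (['p3', 'p5'], 5)
```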
The receiver may place a limit on the number of packets in any
per-Flow-Group buffer (packets with the same src IP, dest IP, and Flow
Group ID belong to the same Flow Group). If a packet arrives that would
cause the receiver to place more than MAX_PERFLOW_BUFFER packets into a
given buffer, then the packet at the head of the buffer is immediately
decapsulated regardless of its sequence number, and the "last
successfully decapsulated sequence number" is set to its sequence
number. The newly arrived packet may then be placed in the buffer.
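The buffer limit can be sketched in the same style.
MAX_PERFLOW_BUFFER = 4 is an arbitrary illustrative value (the draft
does not fix one), deliver() stands in for decapsulation, and the head
of the buffer is taken to be the buffered packet closest after the
last decapsulated sequence number, modulo 2**32. The function name is
hypothetical.

```python
MOD = 2**32
MAX_PERFLOW_BUFFER = 4   # illustrative value only; the draft does not fix it

def buffer_with_limit(pending, last, seq, payload, deliver):
    """Buffer a gap packet, enforcing MAX_PERFLOW_BUFFER (sketch).

    pending maps seq -> payload and is modified in place; deliver() stands
    in for decapsulation. Returns the new last decapsulated sequence number.
    """
    if len(pending) + 1 > MAX_PERFLOW_BUFFER:
        # Head of the buffer = closest sequence number after 'last' (mod 2**32).
        head = min(pending, key=lambda s: (s - last) % MOD)
        deliver(pending.pop(head))     # decapsulated regardless of any gap
        last = head
    pending[seq] = payload
    return last

out = []
pending = {2: "p2", 3: "p3", 4: "p4", 5: "p5"}
last = buffer_with_limit(pending, 0, 7, "p7", out.append)
print(out, last, sorted(pending))   # ['p2'] 2 [3, 4, 5, 7]
```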
The received packets of flows in the same Flow Group share the same
reorder sequence space. The source allocates sequence numbers in the
order in which packets are sent. When the sequence number wraps past
(2**32)-1, the source continues allocating from 0.
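On the source side, sequence allocation with wraparound might look
like the following hypothetical Python sketch, which keeps one counter
per (dest IP, Flow Group ID) pair at a given source. The class and
method names are illustrative.

```python
from collections import defaultdict

MOD = 2**32

class SequenceAllocator:
    """Source-side sequence allocation (illustrative sketch).

    Keeps one counter per (dest IP, Flow Group ID); the first packet of a
    Flow Group gets 0 and the counter wraps from (2**32)-1 back to 0.
    """

    def __init__(self):
        self._next = defaultdict(int)

    def next_seq(self, dst_ip, flow_group_id):
        key = (dst_ip, flow_group_id)
        seq = self._next[key]
        self._next[key] = (seq + 1) % MOD
        return seq

alloc = SequenceAllocator()
print([alloc.next_seq("192.0.2.9", 1) for _ in range(3)])   # [0, 1, 2]
```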
5.2 Packet Reordering Capability Discovery
The reorder function at the destination requires certain resources.
For example, there is a reorder queue for each Group ID (Flow Group ID
plus the source IP address). For resource-constrained hardware such as
switch chips, the number of queues is limited. Therefore, it is
important that the source not exceed the capability of the destination
when assigning Group IDs, which requires that the source know the
capability of the destination. There are several possible solutions,
such as static configuration or direct signaling between the two ends.
Capability notifications need to be sent to the peer in the following
situations:
1. When the source communicates with the destination for the first
time.
2. When a packet is received from the peer for the first time.
3. When a capability notification is received from the source.
4. When the Group IDs used by the peer exceed the local capability.
In the above cases, the destination needs to notify the source of its
capability (the number of reorder queues assigned to that source).
When receiving a capability notification from the destination, the
source needs to adjust its Group ID allocation according to the
capability of the destination, to ensure that the number of Group IDs
does not exceed the number of reordering queues allocated to the
source.
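As one possible source-side policy (an assumption; this draft does
not define how flows are mapped to Group IDs), a source could hash
each flow onto the range of Group IDs permitted by the most recent
capability notification. The class name GroupIdAssigner and the use
of CRC-32 over the flow's 5-tuple are illustrative choices.

```python
import zlib

class GroupIdAssigner:
    """Source-side Group ID assignment bounded by the destination's capability.

    capability = number of reorder queues the destination assigned to this
    source (MAX GROUP ID); valid Group IDs are 0 .. capability - 1.
    """

    def __init__(self, capability):
        self.update_capability(capability)

    def update_capability(self, capability):
        # Called when a capability notification arrives from the destination.
        if capability < 1:
            raise ValueError("destination offers no reorder queues")
        self.capability = capability

    def group_id(self, flow_5tuple):
        # CRC-32 is deterministic across runs, unlike Python's salted hash().
        return zlib.crc32(repr(flow_5tuple).encode()) % self.capability

assigner = GroupIdAssigner(capability=8)
flow = ("192.0.2.1", "192.0.2.9", 6, 40000, 443)
print(assigner.group_id(flow))   # some value in 0..7
```

Shrinking the capability remaps some flows to different Group IDs,
and hence to different sequence spaces, which may itself cause
transient reordering; this draft does not specify how to handle that
transition.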
When the number of Group IDs exceeds the local capability, either of
the following two actions can be taken. Which action is selected is
not covered in this draft.
1. Discard the Geneve packets for Group IDs that exceed the local
capability.
2. Remove the Geneve encapsulation without performing reordering and
pass the packet to the higher-layer protocol. Higher-layer protocols
that can tolerate a certain degree of out-of-order delivery (such as
TCP) may still process the packets correctly.
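The two alternatives can be sketched as a simple dispatch at the
destination. The enum, callbacks, and return strings are hypothetical;
allocated_queues is the number of reorder queues assigned to the
packet's source, so valid Group IDs are 0 to allocated_queues - 1.

```python
from enum import Enum

class OverCapacityAction(Enum):
    DISCARD = 1   # action 1: discard the Geneve packet
    PASS_UP = 2   # action 2: decapsulate without reordering

def handle_packet(group_id, allocated_queues, action, payload, reorder, pass_up):
    """Dispatch one received packet at the destination (sketch).

    reorder(group_id, payload) and pass_up(payload) are caller callbacks.
    """
    if 0 <= group_id < allocated_queues:
        reorder(group_id, payload)
        return "reordered"
    if action is OverCapacityAction.DISCARD:
        return "discarded"
    pass_up(payload)
    return "passed-up"

upper = []
print(handle_packet(9, 8, OverCapacityAction.PASS_UP, "pkt",
                    reorder=lambda g, p: None, pass_up=upper.append))  # passed-up
```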
When a Group ID exceeds the local capability, the destination sends
a notification of its reordering capability to the source. To prevent
capability notifications from being sent too frequently, a
notification suppression mechanism is needed. When the destination
sends a capability notification to the source, it enters a suppression
period and does not send another capability notification to that
source until the suppression period ends. The suppression period is
longer than the RTT between the two nodes.
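A per-peer suppression timer of this kind might be sketched as
follows. The period is a caller-supplied value that, per the text
above, is expected to exceed the RTT between the two nodes; the class
and method names are hypothetical, and time is in any consistent unit.

```python
class NotificationSuppressor:
    """Per-peer suppression of capability notifications (sketch)."""

    def __init__(self, period):
        self.period = period                 # assumed to be longer than the RTT
        self._suppressed_until = {}          # peer -> end of suppression period

    def should_send(self, peer, now):
        if now < self._suppressed_until.get(peer, float("-inf")):
            return False                     # still inside the suppression period
        self._suppressed_until[peer] = now + self.period
        return True

s = NotificationSuppressor(period=3.0)
print([s.should_send("A", t) for t in (0.0, 1.0, 3.0)])   # [True, False, True]
```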
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Ver|  Opt Len  |O|C|    Rsvd.  |          Protocol Type        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Virtual Network Identifier (VNI)       |    Reserved   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Option Class = GFP      | Type=Capacity |R|R|R| Length  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          MAX GROUP ID                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Capability notification message format
Length = 1 (4 bytes).
MAX GROUP ID is a four-byte field. It indicates the maximum number of
Group IDs that the destination has assigned to the source. The Group
IDs allocated by the source must be limited to the range 0 to
(MAX GROUP ID - 1).
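The capability notification option and the resulting Group ID check
could be encoded as in the Python sketch below. The Option Class and
the Type=Capacity value are TBA, so GFP_OPTION_CLASS and
CAPACITY_OPTION_TYPE are hypothetical placeholders, and the function
names are illustrative.

```python
import struct

# Hypothetical placeholders: the GFP Option Class and Type=Capacity are TBA.
GFP_OPTION_CLASS = 0xFFF0
CAPACITY_OPTION_TYPE = 0x02

def encode_capability(max_group_id):
    """Option header (Length = 1, i.e. 4 bytes of data) + MAX GROUP ID."""
    return struct.pack("!HBBI", GFP_OPTION_CLASS, CAPACITY_OPTION_TYPE, 1, max_group_id)

def decode_capability(buf):
    opt_class, opt_type, flags_len, max_group_id = struct.unpack("!HBBI", buf[:8])
    if (opt_class, opt_type, flags_len & 0x1F) != (GFP_OPTION_CLASS, CAPACITY_OPTION_TYPE, 1):
        raise ValueError("not a capability notification option")
    return max_group_id

def group_id_allowed(group_id, max_group_id):
    # The source may only use Group IDs 0 .. MAX GROUP ID - 1.
    return 0 <= group_id < max_group_id

opt = encode_capability(16)
print(len(opt), decode_capability(opt), group_id_allowed(16, 16))   # 8 16 False
```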
6 Security Considerations
This document describes a Geneve option that introduces a Flow Group
ID and a Sequence Number for reordering packets. An attacker could
inject packets with arbitrary Sequence Numbers and thereby launch a
Denial of Service attack. This is a general security issue that is
described in the Geneve security requirements [5]. To protect against
such attacks, IPsec could be used to protect the Geneve header and the
tunneled payload. Any common Geneve security mechanism also applies to
this draft.
7 IANA Considerations
IANA is requested to allocate a Geneve "option class" number for
GFP (Geneve Forwarding Policy):
+---------------+-------------+---------------+
| Option Class | Description | Reference |
+---------------+-------------+---------------+
| x | GFP_ID | This document |
+---------------+-------------+---------------+
8 References
[1] J. Gross, Ed., I. Ganga, Ed., T. Sridhar, Ed., "Generic Network
Virtualization Encapsulation", [I-D.ietf-nvo3-geneve].
[2] Jiaxin Cao, et al., "Per-packet Load-balanced, Low-Latency Routing
for Clos-based Data Center Networks", CoNEXT '13.
[3] Mohammad Alizadeh, et al., "CONGA: Distributed Congestion-Aware
Load Balancing for Datacenters", SIGCOMM '14.
[4] G. Dommety, "Key and Sequence Number Extensions to GRE", RFC
2890, September 2000.
[5] D. Migault, S. Boutros, D. Wing, S. Krishnan, "Geneve Protocol
Security Requirement",
[I-D.draft-mglt-nvo3-geneve-security-requirements-03].
Authors' Addresses
Yolanda Yu
Huawei Technologies Co., Ltd.
Email: yolanda.yu@huawei.com
Jianglong Wang
China Telecom
Email: wangjl1.bri@chinatelecom.cn