Internet DRAFT - draft-eromenko-ipff-tcp64
draft-eromenko-ipff-tcp64
INTERNET-DRAFT
"TCP 64-bit extension: Modern Variation",
Alexey Eromenko, 2016-09-29,
<draft-eromenko-ipff-tcp64-04.txt>
expiration date: 2017-03-29
Intended status: Standards Track
A.Eromenko
September 2016
TCP 64-bit extension: "Modern Variation"
===========================================
for Internet Protocol - Five Fields
Abstract
This document attempts to modernize TCP protocol for new reality,
faster bandwidth, encryption-optimization and optional checksums,
which is required for Identity-Locator Network Protocol (ILNP)
compatibility.
This extension is backwards compatible with the original TCP
specification during session establishment, but not compatible
during the rest of the session nor with deployed middleboxes.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
Copyright Notice
Copyright (c) 2015 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
Introduction
1. TCP Header: "Classic variation"
2. TCP Header: "Modern variation" a.k.a TCP.64
2.1. TCP Header: "Modern variation without CRC"
3. Initiating a TCP.64 Session "Modern Variation"
4. TCP.64 "Modern Variation" establishment options
4.1. TCP.64 Session option: Modern Maximum Segment Size
4.2. TCP.64 Session option: Modern window scaling
4.3. TCP.64 Session option: Checksum ignored
Authors' Contacts
Introduction
TCP in IP-FF comes in several variations, of flavors.
The questions is:
Our operating systems and processors are 64-bit.
Why not make TCP 64-bit also ?
Well, I decided to define what TCP 64-bit should look like.
The session begins with the good-old, time-tested
"Classic variation", which looks familiar.
Just Port fields have moved to the "IP" layer.
...and only during SYN/ACK, it MAY be moved to a different variation.
1. TCP Header: "Classic variation"
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
4| Sequence Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
8| Acknowledgment Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
12| Data | |N|C|E|U|A|P|R|S|F| |
| Offset| |S|W|C|R|C|S|S|Y|I| Window |
| | | |R|E|G|K|H|T|N|N| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
16| Checksum | Urgent Pointer |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Options | Padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| data |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
(bytes)
Few differences between original TCP and TCP/IPFF "classic":
a. Ports have moved to IPFF layer.
b. Checksum needs to be computed using the new pseudo-header,
according to [IPFF] specification.
But using the old checksum algorithm (not CRC); This is to allow
TCP/IPFF to function fairly fast on old processors.
c. 999 field limit check:
During session establishment, both <SYN> and <ACK> phases,
TCP MUST check for 999 field limit for IP-FF.
If any field has higher value (between 1000 and 1023), connection
MUST be dropped, either silently or rejected via TCP Reset flag.
This check MUST be performed against both source and destination
addresses.
At a minimum, IPFF implementation MUST support "classic variation".
Other TCP variations are needed for full stack support.
For Hardware with links over 40 Gbit/s, supporting TCP.64 is
mandatory.
2. TCP Header: "Modern variation" a.k.a TCP.64
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
4| Data Offset | Window |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
8| |N|C|E| |A|P|R|S|F| |
| |S|W|C| |C|S|S|Y|I| |
| | |R|E| |K|H|T|N|N| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
12| |
+ Sequence Number +
16| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
20| |
+ Acknowledgment Number +
24| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
28| |
+ CRC64c Checksum +
32| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Options | Padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| data |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
(bytes)
64-bit what ?
SYN/ACK fields, CRC64 checksum, and 64-bit logical window size.
Physical window size is 24-bits.
Design note: Bloated for a good reason.
I realize the downside of making TCP bloated by a whopping
extra 16 bytes, but I also realize this is a necessary evil at speeds
over 100 Gigabits-per-Second.
If you're on a slow link, just don't advertise that you're TCP 64-bit
capable, and stay on the "classic variation".
64-bit Checksums:
16-bit checksums of "classic variation" may fail badly.
Today, only Data-Link layer checksum saves Internet from complete
breakdown, as those checksums are fairly strong 32-bit CRCs.
But strong CRC64c checksums is an adequate protection for future
huge amounts of unencrypted data.
Going encryption of-course renders checksums useless.
64-bit Sequence numbers and acknowledgements:
The problem with 32-bit SYN/ACK is TCP Reliability
Quote from RFC-7323:
"An especially serious kind of error may result from an accidental
reuse of TCP sequence numbers in data segments. TCP reliability
depends upon the existence of a bound on the lifetime of a segment:
the "Maximum Segment Lifetime" or MSL.
Duplication of sequence numbers might happen in either of two ways:
(1) Sequence number wrap-around on the current connection
A TCP sequence number contains 32 bits. At a high enough
transfer rate of large volumes of data (at least 4 GiB in the
same session), the 32-bit sequence space may be "wrapped"
(cycled) within the time that a segment is delayed in queues.
(2) Earlier incarnation of the connection
Suppose that a connection terminates, either by a proper close
sequence or due to a host crash, and the same connection (i.e.,
using the same pair of port numbers) is immediately reopened. A
delayed segment from the terminated connection could fall within
the current window for the new incarnation and be accepted as
valid.
Duplicates from earlier incarnations, case (2), are avoided by
enforcing the current fixed MSL of the TCP specification, as
explained in Section 5.8 and Appendix B. In addition, the
randomizing of ephemeral ports can also help to probabilistically
reduce the chances of duplicates from earlier connections. However,
case (1), avoiding the reuse of sequence numbers within the same
connection, requires an upper bound on MSL that depends upon the
transfer rate, and at high enough rates, a dedicated mechanism is
required.""
On a gigabit link, Sequence numbers are rotated every 17 seconds.
On a 100-gigabit link, this is well under a second.
TCP originally was never designed for such speeds.
This is dangerous, because packets from older rotation might get
stuck in queue, then released by a router, get through and corrupt
user data if sent to destination, or cause a TCP reset, if sent as
an ack to the sender.
And those old packets do have correct sequence number and correct
checksum.
PAWS, a system designed to prevent such issues by using timers, may
fail badly and produce bad errors with 32-bit Sequence number,
due to inaccurate timing issues.
Data Offset: 8 bits; not including the fixed header:
the first 32 or 28 bytes, for "Modern" and "Modern-no-CRC"
variations, respectively.
This allows far more options in to be sent in TCP.64 in fact,
1020 bytes in options, while TCPv4 provided only 40 bytes of options.
Compatibility Note: 64-bit TCP breaks compatibility with existing
"middleboxes".
64-bit Initial sequence number:
Must be copied from the lower 32-bits to the upper 32-bits,
if not specified during SYN.
The actual switching to "Modern variation" happens from ACK
packet (3rd handshake), assuming the "ACK" advertises such
capability.
2.1. TCP Header: "Modern variation without CRC" or TCP.64-NO-CRC
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
4| Data Offset | Window |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
8| |N|C|E| |A|P|R|S|F| |
| |S|W|C| |C|S|S|Y|I| |
| | |R|E| |K|H|T|N|N| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
12| |
+ Sequence Number +
16| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
20| |
+ Acknowledgment Number +
24| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Options | Padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| data |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
(bytes)
Same as above, but without the CRC64c checksum. This is useful for
encryption protocols, such as SSL/TLS, that provide their own
checksum, protection or authentication mechanisms.
Initially the system switches to "Modern variation", on SYN/ACK,
and with CRC64 field, but that field gets eliminated after 65535
bytes of data, which is where switching to
"Modern variation without CRC" occurs.
(i.e. packets that begin with relative SYN over 65535, are expected
by the receiver to have no CRC64 field)
In first 64 KiB, CRC is required to negotiate encryption parameters.
3. Initiating a TCP.64 Session "Modern Variation"
This is the same signaling as for initiating a normal TCP connection,
but the SYN, SYN/ACK, and ACK packets also carry the 64BIT_CAPABLE
option.
Client Server
------ ------
64BIT_CAPABLE option ->
[SYN flag]
TCP SYN phase
<- 64BIT_CAPABLE option
[ACK flag]
TCP ACK phase
64-bit TCP ->
[SYN flag]
TCP SYN64 phase
<- 64-bit TCP
[SYN+ACK flags]
TCP SYN-ACK64 phase
Design Note:
3-way handshake just became 4-way handshake. Why?
In order to transfer extra options.
Original TCP protocol allows only 40 bytes of options to be sent,
which is nowhere near enough for future requirements,
and session initialization.
TCP.64 allows to send 1020 bytes of options, over 25x times more.
4. TCP.64 "Modern Variation" establishment options
TCP "Modern Variation" Option:
+---------+---------+---------+---------+
| Kind=31 | Length=7|Variation|Upper ISN|
| | | | 32-bits |
+---------+---------+---------+---------+
1 1 1 4
Kind: 31 (To be determined by IANA)
Length:
3 = For flags (such as "Checksum ignored")
7 = For setting variation + upper 32-bits of initial sequence
number.
Variations or codes:
0 = Checksum ignored (useful for ILNP);
Takes effect from 1st packet. (initial SYN)
1 = Modern Variation (TCP.64-bit capable)
2 = Modern Variation, without CRC field; (useful for encryption)
Takes effect after 64 KiB of data.
Those options must be sent during the initial <SYN> phase.
When setting "Modern variation", a node must also set the upper
32-bits of initial sequence number.
4.1. TCP.64 Session option: Modern Maximum Segment Size
Kind: 2 Length: 6 bytes MSS: 32-bit.
+---------+---------+---------+
| Kind=2 |Length=6 | MSS |
+---------+---------+---------+
1 1 4
Modern Maximum Segment Size Option Data: 32 bits
If this option is present, then it communicates the maximum
receive segment size at the TCP which sends this segment.
This field must only be sent in the initial connection request
(i.e., in segments in <SYN64> phase). If this
option is not used, any segment size is allowed.
This option, if present, overrides the classic 16-bit MSS.
This feature exists to support "Jumbograms". (packets > 64KiB)
4.2. TCP.64 Session option: Modern window scaling
Kind: 3 Length: 3 bytes shift.cnt: valid value up to 38
+---------+---------+---------+
| Kind=3 |Length=3 |shift.cnt|
+---------+---------+---------+
1 1 1
This option is an offer, not a promise; both sides must send
Modern Window Scale options in their SYN64 segments to enable window
scaling in either direction. If window scaling is enabled,
then the TCP that sent this option will right-shift its true
receive-window values by 'shift.cnt' bits for transmission in
SEG.WND. The value 'shift.cnt' may be zero (offering to scale,
while applying a scale factor of 1 to the receive window).
This option may be sent in an initial <SYN64> segment (i.e., a
segment with the SYN bit on and the ACK bit off). It may also
be sent in a <SYN,ACK64> segment, but only if a Window Scale op-
tion was received in the initial <SYN64> segment. A Window Scale
option in a segment without a SYN bit should be ignored.
The Window field in a SYN64 (i.e., a <SYN64> or <SYN,ACK64>) segment
itself is never scaled.
Receiving this option during <SYN64> phase by both sides overrides
the classic "Window Scale Option" set during normal <SYN> phase,
if any.
Actually moving to "Modern variation" by itself invalidates this
legacy option.
* All windows are treated as 64-bit quantities for storage in
the connection control block and for local calculations.
This includes the send-window (SND.WND) and the receive-
window (RCV.WND) values, as well as the congestion window.
* Valid value up to 38. Anything higher gets this option
ignored.
Design note:
"Classic variation" TCP with it's 30-bit window (16-bit window+
14-bit window scaling) will work till about 100 Gbit links.
At 100 GBit links + 100 ms latency, you will have 10 GBit
of unacknowledged data, mid-air.
Over 100 gigs TCP performance will start to degrade.
But today (2015) already 1 TBit experimental transmitters exist,
so planning for the future, we need *much* larger window or scaling.
This is why 38-bit Window Scaling was invented for TCP.64-bit,
along with 24-bit Window.
4.3. TCP.64 Session option: Checksum ignored
This option lets the receiver to ignore TCP-supplied checksum.
Affects both classic 16-bit checksum as well as CRC.
Affects TCP session from the initial SYN packet.
This allows for Identity-Locator Network Protocol (ILNP), as
defined in RFC 6740-48, to function with TCP/IP networks.
In this case ILNP or other protocol should provide their own
checksums or error correction for both TCP and IP headers and data.
When used together with TCP.64-bit "modern variation",
the system always resets to "64-bit-NO-CRC" from the first packet.
(i.e. SYN64 phase)
Acknowledgements:
Influenced by the hard work of Robert Ullmann
"TP/IX: The Next Internet"
[RFC-1475]
and: "Identifier-Locator Network Protocol (ILNP)",
[RFC-6740] written by "Randall J. Atkinson" and "SN Bhatti".
and:
[RFC-2675]; "IPv6 Jumbograms" by David A. Borman,
Stephen E. Deering, and Robert M. Hinden
and partially derived from:
[RFC-1323] and [RFC-7323]; "TCP Extensions for High Performance"
by "D. Borman", "B. Braden", "V. Jacobson", "R. Scheffenegger, Ed."
And big thanks to DARPA for the original specification of
Transmission Control Protocol, as defined in [RFC-793] !
One of the greatest inventions of the Internet !
Authors' Contacts
Alexey Eromenko
Israel
Skype: Fenix_NBK_
EMail: al4321@gmail.com
Facebook: https://www.facebook.com/technologov
INTERNET-DRAFT
Alexey
expiration date: 2017-03-29