Network Working Group H. Robinson
Internet-Draft Stratus Technologies, Inc.
Intended status: Standards Track 26 May 2022
Expires: 27 November 2022
Multiple Core Performance Hint Option
draft-robinson-intarea-mcphint-00
Abstract
This standard defines a method for differentiating between unrelated
data streams when the source and destination ports are encrypted or
otherwise unavailable. This method may be used by hardware or software
to evenly distribute incoming workload between multiple CPU cores
and/or other processing elements.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 27 November 2022.
Copyright Notice
Copyright (c) 2022 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License.
Robinson Expires 27 November 2022 [Page 1]
Internet-Draft MCPHINT May 2022
Table of Contents
1. Introduction
1.1. Requirements Language
2. IPv4 Option Format
3. IPv6 Option Format
4. Differentiation Data
5. Forwarding
6. Tunneling
7. Parsing Input Datagrams
7.1. IPv4
7.2. IPv6
8. Future Considerations
9. Security Considerations
10. IANA Considerations
11. Appendix A - Design Considerations
11.1. IP Notification
11.2. Issues To Resolve
12. Normative References
Author's Address
1. Introduction
The Internet Protocol allows datagrams to be re-ordered. Protocols
that require datagrams to be delivered in order must retain
out-of-order datagrams until the preceding datagrams have been
received. While this works, out-of-order datagrams are highly
detrimental to network performance: at first, out-of-order packets
appear to be packet loss from the receiver's point of view. The
perceived packet loss can trigger unneeded retransmissions and delays
in TCP and in any other protocol that uses packet loss to implement
congestion control.
With the advent of 10 Gbit/s transmission speeds, a single CPU core
often cannot keep up with incoming data arriving at full line rate.
Hardware vendors have implemented mechanisms to distribute incoming
datagrams to multiple CPU cores. If they did this on a random or
round-robin basis, the differing latencies of the cores would re-order
datagrams, which can severely impact performance. Hardware solves this
problem by distributing the data deterministically between CPU cores:
a hash is computed over the source and destination IP addresses and
the source and destination port numbers, so every datagram of a given
stream is handled by the same core. Using just the source and
destination IP addresses is not sufficient, because all traffic
between a pair of hosts would then go to a single CPU core.
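As an illustration of the deterministic distribution described above,
the following C sketch uses the Toeplitz hash that Receive Side
Scaling (RSS) capable NICs commonly implement. The choice of Toeplitz,
the example key bytes, and the final modulo (real hardware typically
indexes an indirection table instead) are assumptions of this sketch,
not requirements of this standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Example 40-byte hash key (illustrative; real NICs use a configurable key). */
static const uint8_t rss_key[40] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2, 0x41, 0x67,
    0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0, 0xd0, 0xca, 0x2b, 0xcb,
    0xae, 0x7b, 0x30, 0xb4, 0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30,
    0xf2, 0x0c, 0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa};

/* Bit j of the key, counting from the most significant bit of rss_key[0]. */
static uint32_t key_bit(size_t j) {
    if (j / 8 >= sizeof rss_key) return 0;
    return (rss_key[j / 8] >> (7 - j % 8)) & 1u;
}

/* Toeplitz hash: for every set input bit k, XOR in key bits k..k+31. */
uint32_t toeplitz(const uint8_t *in, size_t len) {
    uint32_t window = ((uint32_t)rss_key[0] << 24) | ((uint32_t)rss_key[1] << 16) |
                      ((uint32_t)rss_key[2] << 8) | rss_key[3];
    uint32_t hash = 0;
    for (size_t k = 0; k < len * 8; k++) {
        if ((in[k / 8] >> (7 - k % 8)) & 1u)
            hash ^= window;
        window = (window << 1) | key_bit(k + 32);  /* slide the window one bit */
    }
    return hash;
}

/* Hash src address, dst address, src port, dst port and map the result to a
 * core. Addresses are in network byte order; ports are host-order integers
 * serialized big-endian, as on the wire. */
unsigned pick_core(const uint8_t src[4], const uint8_t dst[4],
                   uint16_t sport, uint16_t dport, unsigned ncores) {
    assert(ncores > 0);
    uint8_t tuple[12];
    memcpy(tuple, src, 4);
    memcpy(tuple + 4, dst, 4);
    tuple[8] = (uint8_t)(sport >> 8);
    tuple[9] = (uint8_t)sport;
    tuple[10] = (uint8_t)(dport >> 8);
    tuple[11] = (uint8_t)dport;
    return (unsigned)(toeplitz(tuple, sizeof tuple) % ncores);
}
```

Because the hash is a pure function of the four fields, every datagram
of one stream maps to the same core. If only the eight address bytes
can be read, as with ESP, all traffic between the two hosts hashes
identically and lands on a single core.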
A performance problem arises when handling IPsec traffic: the port
numbers are encrypted and can no longer be read by the hardware.
The same performance problem occurs with fragmented datagrams: the
port numbers are present only in the first fragment.
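The following sketch shows how a receiver might decide whether the
ports of an IPv4 datagram are usable as hash input. It assumes the
usual IPv4 header layout (IHL in the low nibble of byte 0, flags and
fragment offset in bytes 6-7, protocol in byte 9); the function name
is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { PROTO_TCP = 6, PROTO_UDP = 17, PROTO_ESP = 50, PROTO_AH = 51 };

/* Returns true if the TCP/UDP port numbers of the IPv4 datagram starting at
 * `hdr` (with `len` bytes available) can be read by the receiver. */
bool ipv4_ports_visible(const uint8_t *hdr, size_t len) {
    if (len < 20) return false;
    size_t ihl_bytes = (size_t)(hdr[0] & 0x0f) * 4;
    uint16_t frag_offset = (uint16_t)(((hdr[6] << 8) | hdr[7]) & 0x1fff);
    uint8_t proto = hdr[9];

    if (ihl_bytes < 20) return false;          /* malformed header */
    if (frag_offset != 0) return false;        /* non-initial fragment: no L4 header */
    /* ESP encrypts the ports; this sketch does not parse past an AH header. */
    if (proto == PROTO_ESP || proto == PROTO_AH) return false;
    if (proto != PROTO_TCP && proto != PROTO_UDP) return false;
    return len >= ihl_bytes + 4;               /* both 16-bit ports present */
}
```

Note that the first fragment (offset 0, More Fragments set) still
exposes the ports while the remaining fragments do not, so a
port-based hash can send fragments of one datagram to different cores.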
This standard defines IPv4 and IPv6 options that carry differentiation
data, which can be used to distribute incoming datagrams across
multiple CPU cores.
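How a receiver consumes the hint is not specified here. One plausible
policy, sketched below under that assumption, is to substitute the
differentiation data for the port pair in the hash input. The hint is
preferred over visible ports because it is identical in every
fragment and every datagram of a stream, whereas the ports are visible
in only some of them. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical summary of what a receiver extracted from one IPv4 datagram. */
struct rx_info {
    uint8_t src[4], dst[4];
    bool ports_visible;     /* TCP/UDP header readable in this datagram */
    uint16_t sport, dport;
    bool has_hint;          /* MCPHINT option present */
    uint16_t hint;          /* differentiation data */
};

/* Fill `key` with the hash input and return the number of bytes used:
 * addresses + hint, else addresses + ports, else addresses only. */
size_t build_flow_key(const struct rx_info *rx, uint8_t key[12]) {
    memcpy(key, rx->src, 4);
    memcpy(key + 4, rx->dst, 4);
    if (rx->has_hint) {                 /* stable across fragments and ESP */
        key[8] = (uint8_t)(rx->hint >> 8);
        key[9] = (uint8_t)rx->hint;
        return 10;
    }
    if (rx->ports_visible) {
        key[8] = (uint8_t)(rx->sport >> 8);
        key[9] = (uint8_t)rx->sport;
        key[10] = (uint8_t)(rx->dport >> 8);
        key[11] = (uint8_t)rx->dport;
        return 12;
    }
    return 8;                           /* address-only fallback */
}
```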
1.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in BCP 14, RFC 2119
[RFC2119].
2. IPv4 Option Format
A host transmitting an IPv4 datagram MAY add an MCPHINT option to the
IPv4 header under any of the following circumstances:
* The datagram contains an AH or ESP header.
* The datagram is fragmented.
* The datagram is to be transmitted beyond the current subnet and
the Don't Fragment (DF) bit is not set.
The MCPHINT option provides 2 bytes of differentiation data. If
present, the MCPHINT option MUST be the first option, at offset 20
from the beginning of the IPv4 header (immediately after the fixed
20-byte header).
The MCPHINT option MUST NOT be used with upper-layer protocols that
do not have unique identifiers beyond the IPv4 source and destination
addresses.
The MCPHINT option MUST NOT be added to ICMP datagrams.
The format of the IPv4 MCPHINT option is:
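The transmit-side conditions above can be combined into a single
predicate, as in this sketch. The struct is hypothetical, the
upper-layer identifier test is reduced to a caller-supplied flag, and
only the outer Protocol field is checked for ICMP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { PROTO_ICMP = 1, PROTO_ESP = 50, PROTO_AH = 51 };

/* Hypothetical description of an outgoing IPv4 datagram. */
struct tx_v4 {
    uint8_t proto;      /* IPv4 Protocol field */
    bool fragmented;    /* the host is fragmenting this datagram */
    bool off_subnet;    /* destination is beyond the current subnet */
    bool df;            /* Don't Fragment bit */
    bool ulp_has_ids;   /* upper layer has identifiers beyond the addresses */
};

/* Returns true if the Section 2 rules permit adding an MCPHINT option. */
bool mcphint_v4_allowed(const struct tx_v4 *d) {
    if (d->proto == PROTO_ICMP) return false;   /* never for ICMP */
    if (!d->ulp_has_ids) return false;          /* nothing to differentiate */
    bool ipsec = d->proto == PROTO_ESP || d->proto == PROTO_AH;
    return ipsec || d->fragmented || (d->off_subnet && !d->df);
}
```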
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |  Length = 4   |     Differentiation Data      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Type = TBD_IP4OPT_MCPHINT
Refer to RFC0791 [RFC0791] for more information about IP options.
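Encoding the option means writing four bytes directly after the fixed
20-byte header and raising IHL from 5 to 6. The sketch assumes a
header with no other options, uses 0x9E (copy bit 1, class 00, number
30) as a hypothetical stand-in for TBD_IP4OPT_MCPHINT, and leaves Total
Length and the header checksum to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical placeholder until IANA assigns TBD_IP4OPT_MCPHINT. */
#define TBD_IP4OPT_MCPHINT 0x9E   /* copy=1, class=00, number=30 */

/* Encode the 4-byte MCPHINT option; `diff` must be non-zero. */
void mcphint_v4_encode(uint8_t opt[4], uint16_t diff) {
    assert(diff != 0);
    opt[0] = TBD_IP4OPT_MCPHINT;
    opt[1] = 4;                       /* Length covers type, length and data */
    opt[2] = (uint8_t)(diff >> 8);    /* network byte order */
    opt[3] = (uint8_t)(diff & 0xff);
}

/* Turn an option-less IPv4 header in hdr[0..19] into a 24-byte header whose
 * first (and only) option is MCPHINT. Total Length and checksum are not
 * updated here. */
void mcphint_v4_add(uint8_t hdr[24], uint16_t diff) {
    hdr[0] = (uint8_t)((hdr[0] & 0xf0) | 6);   /* IHL = 6 words = 24 bytes */
    mcphint_v4_encode(hdr + 20, diff);
}
```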
If there is a mechanism by which an application can provide IPv4
options for transmission and that mechanism is used to provide an
MCPHINT option, the value provided by the application MUST be used.
The macro OPT_MCPHINT MAY be added to netinet/in.h, defined as
TBD_IP4OPT_MCPHINT.
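On stacks that offer the BSD-style IP_OPTIONS socket option, an
application-supplied hint could look like the sketch below, which the
stack would then be required to honor. The placeholder value for
OPT_MCPHINT and the helper names are hypothetical, and some stacks may
reject unrecognized option types from unprivileged callers, so a
failure from setsockopt() is treated as "no hint" rather than as fatal.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#ifndef OPT_MCPHINT
#define OPT_MCPHINT 0x9E   /* hypothetical placeholder for TBD_IP4OPT_MCPHINT */
#endif

/* Ask the stack to send MCPHINT with `diff` on every datagram of `fd`.
 * Returns 0 on success, or -1 with errno set if the stack refuses. */
int mcphint_set_sockopt(int fd, uint16_t diff) {
    assert(diff != 0);   /* zero is not allowed as differentiation data */
    const uint8_t opts[4] = {OPT_MCPHINT, 4, (uint8_t)(diff >> 8), (uint8_t)diff};
    return setsockopt(fd, IPPROTO_IP, IP_OPTIONS, opts, sizeof opts);
}

/* Open a UDP socket and request the hint. A refused hint is not fatal:
 * the socket is still returned and *hint_ok is set to 0. */
int mcphint_udp_socket(uint16_t diff, int *hint_ok) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    *hint_ok = 0;
    if (fd < 0) return -1;
    *hint_ok = (mcphint_set_sockopt(fd, diff) == 0);
    return fd;
}
```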
3. IPv6 Option Format
A host transmitting an IPv6 datagram MAY add an MCPHINT option under
any of the following circumstances:
* The datagram contains an AH or ESP header.
* The datagram will be fragmented.
The MCPHINT option MUST be added to a Destination Options header. The
MCPHINT option provides 2 bytes of differentiation data. The
Destination Options header is defined in Section 4.6 of RFC8200
[RFC8200].
If present, the MCPHINT option MUST be the first option in the first
Destination Options header. When that header immediately follows the
fixed 40-byte IPv6 header (the normal case), the option begins at
offset 42 from the beginning of the IPv6 header.
Note that RFC8200 [RFC8200] only treats a Destination Options header
as a per-fragment header when it is followed by a Routing header. If
one applies this hint to a packet containing an IPv6 Fragment header,
a Routing header must therefore be included. RFC8200 [RFC8200] states
that a node encountering a Routing header with an unrecognized Routing
Type and a "Segments Left" value of zero must ignore that header and
proceed to the next one; so, this is possible.
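The per-fragment header chain described above might be laid out as
follows in each fragment: a Destination Options header carrying
MCPHINT (the IPv6 header's Next Header is 60), a minimal 8-byte
Routing header with Segments Left set to zero (43), and the Fragment
header (44). The Routing Type is left to the caller because this
standard does not name one; 0x1E as the option type and the function
name are hypothetical placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TBD_IP6OPT_MCPHINT 0x1E   /* hypothetical placeholder (act=00, chg=0) */
enum { NH_ROUTING = 43, NH_FRAGMENT = 44, NH_DSTOPTS = 60 };

/* Write the extension headers that follow the 40-byte IPv6 header of one
 * fragment: DestOpts(MCPHINT) -> Routing(Segments Left 0) -> Fragment.
 * The IPv6 header's Next Header must be NH_DSTOPTS. Returns bytes written. */
size_t build_per_fragment_chain(uint8_t out[24], uint16_t diff, uint8_t rh_type,
                                uint8_t upper, uint16_t frag_off_units, int more,
                                uint32_t ident) {
    memset(out, 0, 24);
    /* Destination Options header, 8 bytes (Hdr Ext Len 0). */
    out[0] = NH_ROUTING;
    out[2] = TBD_IP6OPT_MCPHINT;
    out[3] = 2;                                    /* Opt Data Len */
    out[4] = (uint8_t)(diff >> 8);
    out[5] = (uint8_t)diff;
    out[6] = 1;                                    /* PadN with no data bytes */
    /* Routing header, 8 bytes: Segments Left 0, type-specific data zeroed. */
    out[8] = NH_FRAGMENT;
    out[10] = rh_type;
    out[11] = 0;
    /* Fragment header: 13-bit offset in 8-octet units, M flag in the low bit. */
    uint16_t off_flags = (uint16_t)((frag_off_units << 3) | (more ? 1 : 0));
    out[16] = upper;
    out[18] = (uint8_t)(off_flags >> 8);
    out[19] = (uint8_t)off_flags;
    out[20] = (uint8_t)(ident >> 24);
    out[21] = (uint8_t)(ident >> 16);
    out[22] = (uint8_t)(ident >> 8);
    out[23] = (uint8_t)ident;
    return 24;
}
```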
The MCPHINT option MUST NOT be used with upper-layer protocols that
do not have unique identifiers beyond the IPv6 source and destination
addresses.
The MCPHINT option MUST NOT be added to ICMPv6 datagrams.
The format of the IPv6 MCPHINT option is:
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      | Data Len = 2  |     Differentiation Data      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Type = TBD_IP6OPT_MCPHINT
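A Destination Options header carrying only MCPHINT consists of 2
header bytes plus 4 option bytes, padded to the required multiple of
8 bytes with a two-byte PadN option. Placed right after the 40-byte
IPv6 header, the option type therefore lands at offset 42. This sketch
uses 0x1E as a hypothetical placeholder for TBD_IP6OPT_MCPHINT.

```c
#include <assert.h>
#include <stdint.h>

#define TBD_IP6OPT_MCPHINT 0x1E   /* hypothetical placeholder (act=00, chg=0) */

/* Build an 8-byte Destination Options header whose first option is MCPHINT.
 * `next` is the Next Header value of whatever follows (e.g. 50 for ESP). */
void mcphint_v6_dstopts(uint8_t out[8], uint8_t next, uint16_t diff) {
    assert(diff != 0);                 /* differentiation data is never zero */
    out[0] = next;
    out[1] = 0;                        /* Hdr Ext Len: (0 + 1) * 8 bytes */
    out[2] = TBD_IP6OPT_MCPHINT;
    out[3] = 2;                        /* Opt Data Len */
    out[4] = (uint8_t)(diff >> 8);     /* network byte order */
    out[5] = (uint8_t)diff;
    out[6] = 1;                        /* PadN option ... */
    out[7] = 0;                        /* ... with zero data bytes */
}
```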
If there is a mechanism by which an application can provide
destination options for transmission and that mechanism is used to
provide an MCPHINT option, the value provided by the application MUST
be used.
The macro IP6OPT_MCPHINT MAY be added to netinet/ip6.h, defined as
TBD_IP6OPT_MCPHINT.
4. Differentiation Data
For both IPv4 and IPv6, there are two bytes of differentiation data.
The differentiation data MUST NOT be zero. The differentiation data
MUST be the same for all datagrams in a logical stream. The actual
value chosen for the differentiation data is left to the
implementation.
A preferable mechanism is to generate two bytes of random data
(regenerating them if the result is zero) when a socket is created
and to use that data for the life of the socket. The random data
could be regenerated every time a connection is specified for the
socket.
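Alternatively, XOR'ing the source and destination ports is an
acceptable method for generating the differentiation data. Note that
this yields zero when the two ports are equal, which the preceding
paragraph prohibits, so an implementation using this method needs a
different value for that case.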
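Both generation methods are sketched below. Reading /dev/urandom
assumes a Unix-like platform, and the rand() fallback is not
cryptographically strong. For equal ports, where XOR'ing yields the
forbidden value zero, the sketch falls back to the source port (or 1);
that choice is this sketch's own, not part of this standard.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Random, non-zero differentiation data, e.g. drawn once per socket. */
uint16_t mcphint_random_diff(void) {
    uint16_t v = 0;
    FILE *f = fopen("/dev/urandom", "rb");
    do {
        if (f == NULL || fread(&v, sizeof v, 1, f) != 1)
            v = (uint16_t)rand();       /* weak fallback */
    } while (v == 0);                   /* zero is not allowed */
    if (f != NULL) fclose(f);
    return v;
}

/* XOR of the two ports (any consistent byte order works). Equal ports would
 * give zero, so this sketch substitutes the source port, or 1 if that is 0. */
uint16_t mcphint_xor_ports(uint16_t sport, uint16_t dport) {
    uint16_t v = (uint16_t)(sport ^ dport);
    if (v == 0) v = sport != 0 ? sport : 1;
    return v;
}
```

Because XOR is symmetric, both directions of a connection derive the
same value.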
5. Forwarding
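No changes to forwarding are required: routers already pass
unrecognized IPv4 options through, and IPv6 Destination Options
headers are not examined by intermediate nodes.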
6. Tunneling
Tunneling implementations MAY copy the MCPHINT option from the
datagrams being tunneled to the outer headers.
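A tunnel endpoint that copies the option could do so as in this
sketch: it applies the IPv4 detection rule of Section 7.1 to the inner
header and hands the four option bytes to whatever builds the outer
header. The placeholder type value, the extra length-byte check, and
the function name are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TBD_IP4OPT_MCPHINT 0x9E   /* hypothetical placeholder value */

/* If the inner IPv4 header carries MCPHINT, copy its 4 option bytes into
 * `outer_opt` (to become the first option of the outer header) and return 1.
 * Returns 0 if there is nothing to copy. */
int mcphint_copy_to_outer(const uint8_t *inner, size_t len, uint8_t outer_opt[4]) {
    if (len < 24 || (inner[0] >> 4) != 4) return 0;     /* not IPv4 / too short */
    if ((inner[0] & 0x0f) < 6) return 0;                /* IHL too small */
    if (inner[20] != TBD_IP4OPT_MCPHINT) return 0;      /* not the first option */
    if (inner[21] != 4) return 0;                       /* defensive length check */
    memcpy(outer_opt, inner + 20, 4);
    return 1;
}
```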
7. Parsing Input Datagrams
7.1. IPv4
Refer to Section 3.1 of RFC0791 [RFC0791]. The input parsing
algorithm for detecting the presence of differentiation data is as
follows:
o IHL MUST be greater than or equal to 6 (a header of at least 24
bytes).
o The byte at offset 20 MUST be TBD_IP4OPT_MCPHINT.
If those checks pass, then the differentiation data can be found at
offset 22.
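The two checks above translate directly into a few lines of C. This
sketch adds a bounds check on the buffer, which a software parser
needs, and uses 0x9E as a hypothetical placeholder for
TBD_IP4OPT_MCPHINT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TBD_IP4OPT_MCPHINT 0x9E   /* hypothetical placeholder value */

/* Returns true and stores the differentiation data in *diff if the IPv4
 * header at `hdr` (with `len` bytes available) carries MCPHINT. */
bool mcphint_v4_parse(const uint8_t *hdr, size_t len, uint16_t *diff) {
    if (len < 24) return false;                        /* need bytes 20..23 */
    if ((hdr[0] & 0x0f) < 6) return false;             /* IHL >= 6 */
    if (hdr[20] != TBD_IP4OPT_MCPHINT) return false;   /* first option */
    *diff = (uint16_t)((hdr[22] << 8) | hdr[23]);      /* network byte order */
    return true;
}
```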
7.2. IPv6
Refer to Sections 3, 4.2 and 4.6 of RFC8200 [RFC8200]. The input
parsing algorithm for detecting the presence of differentiation data
is as follows:
o Next Header (offset 6) MUST be 60 (Destination Options).
o The byte at offset 42 MUST be TBD_IP6OPT_MCPHINT.
If those checks pass, then the differentiation data can be found at
offset 44.
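The IPv6 checks can be sketched the same way. Like the algorithm
above, this only finds the option when the Destination Options header
directly follows the fixed header; a packet with a Hop-by-Hop Options
header first is not matched. 0x1E is a hypothetical placeholder for
TBD_IP6OPT_MCPHINT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TBD_IP6OPT_MCPHINT 0x1E   /* hypothetical placeholder value */

/* Returns true and stores the differentiation data in *diff if the IPv6
 * packet at `pkt` (with `len` bytes available) carries MCPHINT. */
bool mcphint_v6_parse(const uint8_t *pkt, size_t len, uint16_t *diff) {
    if (len < 46) return false;                        /* need bytes 42..45 */
    if (pkt[6] != 60) return false;                    /* Next Header = DestOpts */
    if (pkt[42] != TBD_IP6OPT_MCPHINT) return false;   /* first option */
    *diff = (uint16_t)((pkt[44] << 8) | pkt[45]);      /* network byte order */
    return true;
}
```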
8. Future Considerations
A future revision of this standard could allow the differentiation
data to be longer, provided that the first two bytes are generated the
same way.
A future revision of this standard could add fields to this option.
9. Security Considerations
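The MCPHINT option provides some minimal insight into internal
network configurations that would not otherwise be discernible for
IPsec tunnels; for example, an observer can distinguish the separate
streams carried within a tunnel.
XOR'ing the port numbers to obtain differentiation data provides
slightly more information than using random data: an observer who
knows or guesses one of the ports (for example, a well-known service
port) can recover the other.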
The implementation MUST provide an administrative mechanism to disable
the use of MCPHINT options.
If an implementation supports both random generation of
differentiation data AND the XOR'ing ports method, there MUST be a
separate administrative mechanism for each method.
10. IANA Considerations
IANA is asked to assign a value for TBD_IP4OPT_MCPHINT in the "IP
Option Numbers" registry of the "Internet Protocol Version 4 (IPv4)
Parameters"
(https://www.iana.org/assignments/ip-parameters/ip-parameters.xhtml#ip-parameters-1).
Refer to RFC2780 [RFC2780] and RFC0791 [RFC0791].
The copy bit MUST be 1, so that the option is copied into every
fragment, and the option class bits MUST be 00 (control).
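The required bits translate into a mask on the option-type octet,
which RFC0791 divides into a 1-bit copied flag, a 2-bit class and a
5-bit number. The option number itself is unknown until IANA assigns
one; 30 below is only an example, and the helper names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Compose an RFC 791 option-type octet: copied flag, class, number. */
uint8_t ip4opt_type(unsigned copied, unsigned cls, unsigned number) {
    assert(copied <= 1 && cls <= 3 && number <= 31);
    return (uint8_t)((copied << 7) | (cls << 5) | number);
}

/* True if `type` has the bits required here: copy = 1, class = 00. */
int mcphint_v4_type_ok(uint8_t type) {
    return (type & 0xE0) == 0x80;
}
```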
IANA is asked to assign a value for TBD_IP6OPT_MCPHINT in the
"Destination Options and Hop-by-Hop Options" registry of the "Internet
Protocol Version 6 (IPv6) Parameters"
(https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml#ipv6-parameters-2).
Refer to RFC2780 [RFC2780] and RFC8200 [RFC8200].
The act bits MUST be 00 (skip over the option if it is not
recognized) and the chg bit MUST be 0 (the option data does not
change en route).
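The corresponding check for the IPv6 option type, whose top two bits
are the action and whose third bit is the change flag per RFC8200,
can be sketched the same way. The value 30 for the remaining bits is
only an example, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Compose an RFC 8200 option type: 2-bit action, 1-bit change flag, 5 bits. */
uint8_t ip6opt_type(unsigned act, unsigned chg, unsigned rest) {
    assert(act <= 3 && chg <= 1 && rest <= 31);
    return (uint8_t)((act << 6) | (chg << 5) | rest);
}

/* True if `type` has act = 00 (skip if unrecognized) and chg = 0. */
int mcphint_v6_type_ok(uint8_t type) {
    return (type & 0xE0) == 0;
}
```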
11. Appendix A - Design Considerations
The hint is carried in an option so that it can be added without
affecting implementations that do not support it.
Use with ICMP and ICMPv6 is prohibited because there is no reason to
optimize them and, given that correct IP-layer behavior depends on
their transmission, it is best to avoid anything that might interfere
with correct operation.
One should note that when using this option with IPsec, the same
security association is likely to be processed on multiple CPU cores.
This requires a good locking design to achieve the desired
performance improvement. It also requires much larger anti-replay
windows, because datagrams of one security association processed on
different cores can reach the replay check out of sequence-number
order.
11.1. IP Notification
Stratus has applied for a patent on this. Stratus intends to allow
use of the patent free of charge. I will be filing the appropriate
formal notification as soon as I figure out what it is and get it
signed by the appropriate management.
11.2. Issues To Resolve
My original writeup of this put the new IPv6 option in the Hop-by-Hop
Options header, because that header is always guaranteed to be a
per-fragment header. The option was moved to the Destination Options
header given the advice in Section 4.8 of RFC8200 [RFC8200]. Section
4.5 of RFC8200 [RFC8200] defines which headers are per-fragment
headers; for the headers discussed here, only the following
combinations are possible:
IPv6 Header
IPv6 Header, Hop-by-Hop Header
IPv6 Header, Destination Options Header, Routing Header
IPv6 Header, Hop-by-Hop Header, Dest Options Header, Routing Header
This implies that getting MCPHINTs into a fragmented datagram will
require the insertion of a null Routing header (one with "Segments
Left" set to zero) if one isn't present (which is the normal case).
So, was I misled by Section 4.8 of RFC8200 [RFC8200], and does this
option really belong in the Hop-by-Hop Options header?
I see that some other drafts have picked new values for option
numbers and instructed IANA to allocate specific numbers. I like this
idea. Can anyone recommend deprecated values that could be assigned
without getting into trouble?
12. Normative References
[RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791,
DOI 10.17487/RFC0791, September 1981,
<https://www.rfc-editor.org/info/rfc791>.
[RFC8200] Deering, S. and R. Hinden, "Internet Protocol, Version 6
(IPv6) Specification", STD 86, RFC 8200,
DOI 10.17487/RFC8200, July 2017,
<https://www.rfc-editor.org/info/rfc8200>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
[RFC2780] Bradner, S. and V. Paxson, "IANA Allocation Guidelines For
Values In the Internet Protocol and Related Headers",
BCP 37, RFC 2780, DOI 10.17487/RFC2780, March 2000,
<https://www.rfc-editor.org/info/rfc2780>.
Author's Address
Herb Robinson
Stratus Technologies, Inc.
5 Mill & Main Place, Suite 500
Maynard, Massachusetts 1004
United States of America
Email: Herbie.Robinson@stratus.com
URI: https://www.stratus.com/