Internet DRAFT - draft-robinson-intarea-mcphint

Network Working Group                                        H. Robinson
Internet-Draft                                Stratus Technologies, Inc.
Intended status: Standards Track                             26 May 2022
Expires: 27 November 2022


                 Multiple Core Performance Hint Option
                   draft-robinson-intarea-mcphint-00

Abstract

   This standard defines a method for differentiating between unrelated
   data streams when the source and destination ports are encrypted.
   This method MAY be used by hardware or software to evenly distribute
   incoming workload between multiple CPU cores and/or other processing
   elements.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 27 November 2022.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.




Robinson                Expires 27 November 2022                [Page 1]

Internet-Draft                   MCPHINT                        May 2022


Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
     1.1.  Requirements Language . . . . . . . . . . . . . . . . . .   3
   2.  IPv4 Option Format  . . . . . . . . . . . . . . . . . . . . .   3
   3.  IPv6 Option Format  . . . . . . . . . . . . . . . . . . . . .   4
   4.  Differentiation Data  . . . . . . . . . . . . . . . . . . . .   5
   5.  Forwarding  . . . . . . . . . . . . . . . . . . . . . . . . .   5
   6.  Tunneling . . . . . . . . . . . . . . . . . . . . . . . . . .   5
   7.  Parsing Input Datagrams . . . . . . . . . . . . . . . . . . .   5
     7.1.  IPv4  . . . . . . . . . . . . . . . . . . . . . . . . . .   5
     7.2.  IPv6  . . . . . . . . . . . . . . . . . . . . . . . . . .   6
   8.  Future Considerations . . . . . . . . . . . . . . . . . . . .   6
   9.  Security Considerations . . . . . . . . . . . . . . . . . . .   6
   10. IANA Considerations . . . . . . . . . . . . . . . . . . . . .   6
   11. Appendix A - Design Considerations  . . . . . . . . . . . . .   7
     11.1.  IP Notification  . . . . . . . . . . . . . . . . . . . .   7
     11.2.  Issues To Resolve  . . . . . . . . . . . . . . . . . . .   7
   12. Normative References  . . . . . . . . . . . . . . . . . . . .   8
   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .   8

1.  Introduction

   The Internet Protocol allows datagrams to be reordered.  Protocols
   that require ordered delivery must retain out-of-order datagrams
   until the preceding datagrams have been received.  While this works,
   out-of-order datagrams are highly detrimental to network
   performance: from the receiver's point of view, out-of-order packets
   at first appear to be packet loss.  The perceived packet loss can
   trigger unneeded retransmissions and delays in TCP and in any other
   protocol that uses packet loss to implement congestion control.

   With the advent of 10 Gbit/s transmission speeds, a single CPU core
   cannot keep up with incoming data arriving at full line speed.
   Hardware vendors have implemented mechanisms to distribute incoming
   datagrams to multiple CPU cores.  If they did this on a random or
   round-robin basis, the different latencies of the cores would result
   in datagram reordering, which can severely impact performance.
   Hardware solves this problem by distributing the data
   deterministically between CPU cores, using a hash of the source and
   destination IP addresses and the source and destination port
   numbers.  Using just the source and destination IP addresses is not
   sufficient, because the resulting traffic will often go to a single
   CPU core.
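
   The deterministic distribution described above can be sketched as
   follows.  This is a minimal illustration only: the function name
   select_core is hypothetical, and CRC-32 stands in for whatever hash
   a given piece of hardware actually implements.

```python
import ipaddress
import zlib


def select_core(src_ip, dst_ip, src_port, dst_port, num_cores):
    """Map a flow's addresses and ports to a core index.

    The same inputs always give the same core, so datagrams of one
    flow are never spread across cores and cannot be reordered by
    differing per-core latencies.
    """
    key = (
        ipaddress.ip_address(src_ip).packed
        + ipaddress.ip_address(dst_ip).packed
        + src_port.to_bytes(2, "big")
        + dst_port.to_bytes(2, "big")
    )
    # CRC-32 is a stand-in hash chosen for illustration only.
    return zlib.crc32(key) % num_cores
```

   When the port numbers are unreadable, only the two addresses remain
   as hash input, which is why all traffic between a pair of hosts
   tends to land on one core.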

   A performance problem arises when handling IPsec traffic: the port
   numbers are encrypted and can no longer be read by the hardware.

   The performance problem also occurs with fragmented datagrams: the
   port numbers are present only in the first fragment.

   This standard defines IPv4 and IPv6 options to provide
   differentiation that can be used to distribute incoming datagrams to
   multiple CPU cores.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119].

2.  IPv4 Option Format

   A host transmitting an IPv4 datagram MAY add an MCPHINT option to the
   IPv4 header under any of the following circumstances:

   *  The datagram contains an AH or ESP header.

   *  The datagram is fragmented.

   *  The datagram is to be transmitted beyond the current subnet and
      the don't fragment bit is not set.

   The MCPHINT option provides 2 bytes of differentiation data.  If
   present, the MCPHINT option MUST be the first option, at offset 20
   from the beginning of the IPv4 header.

   The MCPHINT option MUST NOT be used with upper layer protocols that
   do not have unique identifiers beyond the IPv4 source and destination
   addresses.

   The MCPHINT option MUST NOT be added to ICMP datagrams.

   The format of the IPv4 MCPHINT option is:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |   Type        |  Length = 4   |  Differentiation Data         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       Type = TBD_IP4OPT_MCPHINT

   Refer to RFC0791 [RFC0791] for more information about IP options.
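
   As a rough sketch, the four option octets shown above could be
   assembled as follows.  The type value is a hypothetical placeholder
   (TBD_IP4OPT_MCPHINT has not been assigned), and the function name
   build_ipv4_mcphint is invented for this example.

```python
import struct

# Hypothetical placeholder: TBD_IP4OPT_MCPHINT is not yet assigned.
# 0x9E was picked only because its copy bit is 1 and its class bits
# are 00, matching the request in the IANA Considerations section.
PLACEHOLDER_IP4OPT_MCPHINT = 0x9E
IP4OPT_MCPHINT_LEN = 4  # type + length + 2 bytes of data


def build_ipv4_mcphint(diff_data):
    """Return the 4-byte option: Type, Length = 4, Differentiation Data."""
    # The differentiation data is two bytes and MUST NOT be zero.
    if not 1 <= diff_data <= 0xFFFF:
        raise ValueError("differentiation data must be 1..0xFFFF")
    return struct.pack(
        "!BBH", PLACEHOLDER_IP4OPT_MCPHINT, IP4OPT_MCPHINT_LEN, diff_data
    )
```

   Because the option is exactly four bytes long, placing it first
   raises the IHL by one and requires no padding of its own.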

   If there is a mechanism by which an application can provide IPv4
   options for transmission and that mechanism is used to provide an
   MCPHINT option, the value provided by the application MUST be used.

   The macro OPT_MCPHINT MAY be added to netinet/in.h defined as
   TBD_IP4OPT_MCPHINT.

3.  IPv6 Option Format

   A host transmitting an IPv6 datagram MAY add an MCPHINT option under
   any of the following circumstances:

   *  The datagram contains an AH or ESP header.

   *  The datagram will be fragmented.

   The MCPHINT option MUST be added to a destination options header.
   The MCPHINT option provides 2 bytes of differentiation data.  The
   Destination options header is defined in section 4.6 of RFC8200
   [RFC8200].

   If present, the MCPHINT option MUST occur first in the first
   destination options header - normally at offset 42 from the beginning
   of the IPv6 header.

   Note that RFC8200 [RFC8200] requires a per-fragment destination
   options header to be followed by a routing header.  If this hint is
   applied to a packet containing an IPv6 fragment header, a routing
   header must also be included.  This is possible because RFC8200
   [RFC8200] explicitly states that a routing header with zero
   "Segments Left" is always ignored.

   The MCPHINT option MUST NOT be used with upper layer protocols that
   do not have unique identifiers beyond the IPv6 source and destination
   addresses.

   The MCPHINT option MUST NOT be added to ICMPv6 datagrams.

   The format of the IPv6 MCPHINT option is:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |   Type        |  Data Len = 2 |   Differentiation Data        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       Type = TBD_IP6OPT_MCPHINT
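
   A minimal sketch of a destination options header carrying only this
   option is given below.  The type value is a hypothetical placeholder
   (TBD_IP6OPT_MCPHINT has not been assigned), the function name is
   invented for this example, and the trailing PadN option is added
   because RFC8200 requires extension headers to be a multiple of 8
   octets long.

```python
import struct

# Hypothetical placeholder: TBD_IP6OPT_MCPHINT is not yet assigned.
# 0x1E was picked only because its act bits are 00 and its chg bit
# is 0, matching the request in the IANA Considerations section.
PLACEHOLDER_IP6OPT_MCPHINT = 0x1E


def build_ipv6_dest_opts(next_header, diff_data):
    """Return an 8-octet destination options header with MCPHINT first."""
    if not 0 <= next_header <= 0xFF:
        raise ValueError("next header must fit in one octet")
    # The differentiation data is two bytes and MUST NOT be zero.
    if not 1 <= diff_data <= 0xFFFF:
        raise ValueError("differentiation data must be 1..0xFFFF")
    # Hdr Ext Len = 0 means the header is exactly 8 octets long.
    fixed = struct.pack("!BB", next_header, 0)
    option = struct.pack("!BBH", PLACEHOLDER_IP6OPT_MCPHINT, 2, diff_data)
    pad_n = bytes([0x01, 0x00])  # PadN with no data: 2 octets of padding
    return fixed + option + pad_n
```

   With this layout the option type lands at offset 42 and the
   differentiation data at offset 44 from the start of the IPv6 header,
   provided the destination options header directly follows the 40-byte
   fixed header.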

   If there is a mechanism by which an application can provide
   destination options for transmission and that mechanism is used to
   provide an MCPHINT option, the value provided by the application MUST
   be used.

   The macro IP6OPT_MCPHINT MAY be added to netinet/ip6.h defined as
   TBD_IP6OPT_MCPHINT.

4.  Differentiation Data

   For both IPv4 and IPv6, there are two bytes of differentiation data.
   The differentiation data MUST NOT be zero.  The differentiation data
   MUST be the same for all datagrams in a logical stream.  The actual
   value chosen for differentiation data is left to the implementation.

   A preferable mechanism would be to generate two bytes of random data
   when a socket is created and to use that data for the life of the
   socket.  The random data could be updated every time a connection is
   specified.

   Alternatively, XORing the source and destination ports is an
   acceptable method for generating the differentiation data.
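
   Both methods can be sketched briefly.  The function names are
   hypothetical.  Note one assumption: equal source and destination
   ports XOR to zero, which is not a permitted value, and this document
   does not say how to handle that case; the sketch substitutes 0xFFFF
   purely for illustration.

```python
import secrets


def random_diff_data():
    """Pick a random non-zero 16-bit value, e.g. at socket creation."""
    # randbelow(0xFFFF) yields 0..0xFFFE, so adding 1 gives 1..0xFFFF.
    return secrets.randbelow(0xFFFF) + 1


def xor_diff_data(src_port, dst_port):
    """Derive the value by XORing the two 16-bit port numbers."""
    value = (src_port ^ dst_port) & 0xFFFF
    # Assumption (not specified by this document): replace a zero
    # result, which occurs when both ports are equal, with 0xFFFF.
    return value if value != 0 else 0xFFFF
```

   Either value would be computed once and reused for every datagram
   of the stream, since the data MUST be the same for all datagrams in
   a logical stream.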

5.  Forwarding

   Forwarding is already defined to pass through unknown options.

6.  Tunneling

   Tunneling implementations MAY copy the MCPHINT option from the
   datagrams being tunneled to the outer headers.

7.  Parsing Input Datagrams

7.1.  IPv4

   Refer to section 3.1 in RFC0791 [RFC0791].  The input parsing
   algorithm for detecting the presence of differentiation data is:

    o IHL MUST be greater than or equal to 6.
    o The byte at offset 20 MUST be TBD_IP4OPT_MCPHINT.

   If those checks pass, then the differentiation data can be found at
   offset 22.
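
   The two checks above translate into a few lines of code.  In this
   sketch the type value is a hypothetical placeholder
   (TBD_IP4OPT_MCPHINT has not been assigned), the function name is
   invented, and the buffer length check is an addition for safety that
   is not part of the algorithm as stated.

```python
# Hypothetical placeholder: TBD_IP4OPT_MCPHINT is not yet assigned.
PLACEHOLDER_IP4OPT_MCPHINT = 0x9E


def ipv4_diff_data(datagram):
    """Return the differentiation data, or None if MCPHINT is absent."""
    # Safety check (an addition): need the 20-byte header plus 4 bytes.
    if len(datagram) < 24:
        return None
    ihl = datagram[0] & 0x0F  # low nibble of the first header byte
    if ihl < 6:
        return None  # no room for any option
    if datagram[20] != PLACEHOLDER_IP4OPT_MCPHINT:
        return None  # first option is not MCPHINT
    return int.from_bytes(datagram[22:24], "big")
```

   Because the option is required to occur first, no option list walk
   is needed, which keeps the check cheap enough for hardware.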

7.2.  IPv6

   Refer to sections 3, 4.2 and 4.6 in RFC8200 [RFC8200].  The input
   parsing algorithm for detecting the presence of differentiation data
   is:

    o Next Header (offset 6) MUST be 60 (for destination options).
    o The byte at offset 42 MUST be TBD_IP6OPT_MCPHINT.

   If those checks pass, then the differentiation data can be found at
   offset 44.
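
   A corresponding sketch for IPv6 follows.  As before, the type value
   is a hypothetical placeholder (TBD_IP6OPT_MCPHINT has not been
   assigned), the function name is invented, and the buffer length
   check is an addition that is not part of the algorithm as stated.

```python
# Hypothetical placeholder: TBD_IP6OPT_MCPHINT is not yet assigned.
PLACEHOLDER_IP6OPT_MCPHINT = 0x1E
NEXT_HEADER_DEST_OPTS = 60


def ipv6_diff_data(datagram):
    """Return the differentiation data, or None if MCPHINT is absent."""
    # Safety check (an addition): 40-byte header plus 6 bytes needed.
    if len(datagram) < 46:
        return None
    if datagram[6] != NEXT_HEADER_DEST_OPTS:
        return None  # first extension header is not destination options
    if datagram[42] != PLACEHOLDER_IP6OPT_MCPHINT:
        return None  # first option is not MCPHINT
    return int.from_bytes(datagram[44:46], "big")
```

   Note that these fixed offsets only match when the destination
   options header immediately follows the IPv6 header.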

8.  Future Considerations

   A future revision of this standard could allow the differentiation
   data to be longer, as long as the first two bytes are generated the
   same way.

   A future revision of this standard could add fields to this option.

9.  Security Considerations

   The MCPHINT option provides some minimal insight into internal
   network configurations that wouldn't otherwise be discernible for
   IPsec tunnels.

   XORing the port numbers to obtain differentiation data reveals
   slightly more information than using random data.

   The implementation MUST provide an administrative mechanism to
   disable the use of MCPHINT options.

   If the implementation supports both random generation of
   differentiation data and the port XOR method, there MUST be separate
   administrative mechanisms for each method.

10.  IANA Considerations

   IANA is asked to assign a value for TBD_IP4OPT_MCPHINT under
   "Internet Protocol Version 4 (IPv4) Parameters", "IP Option Numbers"
   (https://www.iana.org/assignments/ip-parameters/ip-
   parameters.xhtml#ip-parameters-1).  Refer to RFC2780 [RFC2780] and
   RFC0791 [RFC0791].

   The Copy bit MUST be 1 and the class bits MUST be 00.
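
   For reference, the IPv4 option type octet defined in RFC0791
   [RFC0791] splits into a 1-bit copied flag, a 2-bit class and a 5-bit
   number.  The helper below is a hypothetical illustration of how a
   candidate value could be checked against the requirement above; the
   value 0x9E used in the note after it is only an example, not an
   assigned number.

```python
def decode_ipv4_option_type(value):
    """Split an IPv4 option type octet into (copied, class, number)."""
    copied = (value >> 7) & 0x1      # 1 = copied into all fragments
    opt_class = (value >> 5) & 0x3   # 0 = control
    number = value & 0x1F
    return copied, opt_class, number
```

   For example, decode_ipv4_option_type(0x9E) yields (1, 0, 30), which
   satisfies the copy and class requirements.  The copy bit matters
   here because an option that is not copied would be absent from every
   fragment after the first.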

   IANA is asked to assign a value for TBD_IP6OPT_MCPHINT under
   "Internet Protocol Version 6 (IPv6) Parameters", "Destination Options
   and Hop-by-Hop Options" (https://www.iana.org/assignments/ipv6-
   parameters/ipv6-parameters.xhtml#ipv6-parameters-2).  Refer to
   RFC2780 [RFC2780] and RFC8200 [RFC8200].

   The act bits MUST be 00 and the chg bit MUST be 0.
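
   In RFC8200 [RFC8200], the two highest-order bits of the option type
   (act) tell a node what to do with an unrecognized option, and the
   third-highest bit (chg) says whether the option data can change en
   route.  The hypothetical helper below shows the split; the value
   0x1E used in the note after it is only an example, not an assigned
   number.

```python
def decode_ipv6_option_type(value):
    """Split an IPv6 option type octet into (act, chg, rest)."""
    act = (value >> 6) & 0x3   # 00 = skip over the option if unknown
    chg = (value >> 5) & 0x1   # 0 = option data does not change en route
    rest = value & 0x1F
    return act, chg, rest
```

   For example, decode_ipv6_option_type(0x1E) yields (0, 0, 30).  With
   act bits of 00, nodes that do not implement MCPHINT skip the option
   and continue processing the header.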

11.  Appendix A - Design Considerations

   This is done as an option so it may be added without affecting
   implementations that don't implement it.

   Use with ICMP and ICMPv6 is prohibited because there is no reason to
   optimize them and, given that correct IP layer behavior depends on
   their transmission, it is best to avoid anything that might interfere
   with correct operation.

   One should note that when using this option with IPsec, the same
   security association is likely to be processed on multiple CPU cores.
   This requires a good locking design to achieve the desired
   performance improvement.  It also requires much larger replay
   windows.

11.1.  IP Notification

   Stratus has applied for a patent on this.  Stratus intends to allow
   use of the patent free of charge.  I will be filing the appropriate
   formal notification as soon as I figure out what it is and get it
   signed by the appropriate management.

11.2.  Issues To Resolve

   My original writeup of this put the new IPv6 option in the Hop-by-Hop
   header, because that is always guaranteed to be a per-fragment
   header.  The option was moved to the destination options header
   given the advice in section 4.8 of RFC8200 [RFC8200].  Section 4.5
   of RFC8200 [RFC8200] explicitly states that there are only the
   following combinations of per-fragment headers:

    IPv6 Header
    IPv6 Header, Hop-by-Hop Header
    IPv6 Header, Destination Options Header, Routing Header
    IPv6 Header, Hop-by-Hop Header, Dest Options Header, Routing Header

   This implies that getting MCPHINT options into fragmented datagrams
   will require the insertion of a null routing header if one isn't
   present (which is the normal case).

   So, I am wondering if I was misled by section 4.8 in RFC8200
   [RFC8200] and this option really belongs in the hop-by-hop header?

   I see that some other drafts have picked new values for option
   numbers and instructed the IANA to allocate specific numbers.  I like
   this idea.  Can anyone recommend deprecated values which could be
   assigned without getting into trouble?

12.  Normative References

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              DOI 10.17487/RFC0791, September 1981,
              <https://www.rfc-editor.org/info/rfc791>.

   [RFC8200]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", STD 86, RFC 8200,
              DOI 10.17487/RFC8200, July 2017,
              <https://www.rfc-editor.org/info/rfc8200>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC2780]  Bradner, S. and V. Paxson, "IANA Allocation Guidelines For
              Values In the Internet Protocol and Related Headers",
              BCP 37, RFC 2780, DOI 10.17487/RFC2780, March 2000,
              <https://www.rfc-editor.org/info/rfc2780>.

Author's Address

   Herb Robinson
   Stratus Technologies, Inc.
   5 Mill & Main Place, Suite 500
   Maynard, Massachusetts 1004
   United States of America
   Email: Herbie.Robinson@stratus.com
   URI:   https://www.stratus.com/