Network                                                      M. Rossberg
Internet-Draft                                                TU Ilmenau
Intended status: Informational                               S. Klassert
Expires: 18 August 2024                                          secunet
                                                             M. Pfeiffer
                                                              TU Ilmenau
                                                        15 February 2024


 Broadening the Scope of Encapsulating Security Payload (ESP) Protocol
         draft-mrossberg-ipsecme-multiple-sequence-counters-02

Abstract

   There are certain use cases where the Encapsulating Security Payload
   (ESP) protocol in its current form cannot reach its maximum potential
   regarding security, features and performance.  Although these
   scenarios are quite different, the shortcomings could be remedied by
   three measures: Introducing more fine-grained sub-child-SAs, adapting
   the ESP header and trailer format, and allowing parts of the
   transport layer header to be unencrypted.  These mechanisms are
   neither completely interdependent, nor are they entirely orthogonal,
   as the implementation of one measure does influence the integration
   of another.  Although an independent specification and implementation
   of these mechanisms is possible, it may be worthwhile to consider a
   combined solution to avoid a combinatorial explosion of optional
   features.

   Therefore, this document does not yet propose a specific change to
   ESP.  Instead, it explains the relevant scenarios, details possible
   modifications of the protocol, collects arguments for (and against)
   these changes, and discusses their implications.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."




Rossberg, et al.         Expires 18 August 2024                 [Page 1]

Internet-Draft         Broadening the Scope of ESP         February 2024


   This Internet-Draft will expire on 18 August 2024.

Copyright Notice

   Copyright (c) 2024 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
     1.1.  Requirements Language . . . . . . . . . . . . . . . . . .   3
   2.  Envisioned Scenarios  . . . . . . . . . . . . . . . . . . . .   3
     2.1.  Multicore Software Processing . . . . . . . . . . . . . .   4
     2.2.  Implementing QoS mechanisms . . . . . . . . . . . . . . .   4
     2.3.  Multipath . . . . . . . . . . . . . . . . . . . . . . . .   4
     2.4.  Multicast . . . . . . . . . . . . . . . . . . . . . . . .   4
     2.5.  High-Speed Links  . . . . . . . . . . . . . . . . . . . .   5
     2.6.  Software-Defined Networking (SDN) . . . . . . . . . . . .   5
   3.  Requirements  . . . . . . . . . . . . . . . . . . . . . . . .   5
   4.  Discussion of possible approaches . . . . . . . . . . . . . .   5
     4.1.  Disabling Replay Protection . . . . . . . . . . . . . . .   6
     4.2.  Using multiple IKE SAs  . . . . . . . . . . . . . . . . .   7
     4.3.  Using multiple (per-CPU) child SAs  . . . . . . . . . . .   8
     4.4.  Increasing anti-replay window sizes . . . . . . . . . . .   9
     4.5.  Using Sub-Child SAs . . . . . . . . . . . . . . . . . . .  11
     4.6.  Using an encryption offset  . . . . . . . . . . . . . . .  14
     4.7.  Moving the ESP header . . . . . . . . . . . . . . . . . .  15
     4.8.  Transmitting the sequence number entirely . . . . . . . .  15
     4.9.  Removing the trailer  . . . . . . . . . . . . . . . . . .  15
     4.10. Dropping the 4-byte alignment requirement . . . . . . . .  16
   5.  Remark on steering  . . . . . . . . . . . . . . . . . . . . .  16
   6.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  16
   7.  Security Considerations . . . . . . . . . . . . . . . . . . .  16
   8.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  16
     8.1.  Normative References  . . . . . . . . . . . . . . . . . .  17
     8.2.  Informative References  . . . . . . . . . . . . . . . . .  17
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  18

1.  Introduction

   This document does not (yet) describe an addition to IPsec.  Rather,
   it attempts to describe scenarios where ESP currently cannot be used
   optimally.  Afterwards, possible solutions for those scenarios are
   discussed and evaluated.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.  Envisioned Scenarios

   Especially for, but not limited to, intra-data-center traffic, there
   are several challenges when deploying IPsec.  In particular, these
   challenges originate in implementing one of the following techniques,
   or a combination thereof:

   1.  Multicore Software Processing

   2.  Quality-of-Service (QoS)

   3.  Multipath

   4.  Multicast

   5.  High-Speed Links

   6.  Software-Defined Networking (SDN)

   As these challenges are due to the same root causes and can therefore
   be solved by the same set of measures, they are addressed jointly in
   this document.  In particular, these root causes are:

   1.  The idea of an SA forming a single stream of packets that is
       generally in-order.

   2.  The concept of a flow as a five-tuple, including TCP and UDP
       port numbers that are invisible to the network equipment due to
       ESP's encryption.

   3.  The header/trailer format that does not always match current
       hardware's realities (4-byte sequence number, header/trailer
       split, and the 4-byte alignment requirement).

   Before discussing possible solutions, the following sections will
   elaborate how the techniques above collide with these root causes.

2.1.  Multicore Software Processing

   As IPsec is often processed in software, small-packet throughput
   significantly above 10 Gbit/s is currently only achievable when
   scaling to multiple CPU cores.  However, this scaling only works if
   cores do not have to synchronize tightly.  In
   particular, it is impossible to synchronize anti-replay windows and
   sequence counters efficiently, even when using atomic CPU
   instructions.  Detailed explanations may be found in
   [I-D.pwouters-ipsecme-multi-sa-performance].  Consequently, scaling
   over multiple cores leads to multiple packet streams, one per
   processing core.  These streams may advance independently, and thus
   introduce packet reordering.  This reordering contradicts the
   concept of an anti-replay window which does not allow for packets
   being too far out of order.  Consequently, packets might be dropped
   unpredictably.
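
   The effect can be illustrated with a minimal sketch of a sliding
   anti-replay window.  The names ReplayWindow and count_drops, the
   window size of 64, and the sequence number ranges are illustrative
   assumptions; the window logic is simplified compared to [RFC4303].

```python
class ReplayWindow:
    """Simplified sliding anti-replay window (illustrative only)."""

    def __init__(self, size=64):
        self.size = size
        self.top = 0      # highest sequence number accepted so far
        self.bitmap = 0   # bit i set means (top - i) was already accepted

    def accept(self, seq):
        """Return True if seq is fresh and still inside the window."""
        if seq > self.top:
            # The window advances; older entries fall out of the bitmap.
            shift = seq - self.top
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.top = seq
            return True
        offset = self.top - seq
        if offset >= self.size:
            return False  # too old: dropped although it is not a replay
        if self.bitmap & (1 << offset):
            return False  # duplicate
        self.bitmap |= 1 << offset
        return True


def count_drops(first_stream, second_stream, size=64):
    """Deliver first_stream, then second_stream; count drops in the latter."""
    window = ReplayWindow(size)
    for seq in first_stream:
        window.accept(seq)
    return sum(1 for seq in second_stream if not window.accept(seq))


# Hypothetical case: one core still holds sequence numbers 1..100 while
# another core has already sent 101..300.
dropped = count_drops(range(101, 301), range(1, 101))
```

   In this hypothetical run, every packet of the slower stream lies
   outside the 64-packet window and is rejected, although none of them
   is a replay.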

2.2.  Implementing QoS mechanisms

   Similarly, traffic may be categorized into different classes to
   provide quality of service.  QoS classes do not belong to the traffic
   selector of a Child SA.  So using different QoS classes for the same
   traffic selector will introduce reordering of packets within a child
   SA.  In contrast to multicore software processing, this type of
   packet reordering is intentional and not accidental.  The
   consequences, however, are comparable.

2.3.  Multipath

   A sender may also decide to send packets to a single receiver via
   multiple paths, e.g., by using multiple uplinks in an SD-WAN
   scenario.  Depending on the characteristics of the uplinks, this
   shows similarities to the multicore scenario (uplinks with relatively
   similar characteristics) or the QoS scenario (uplinks with rather
   different characteristics).

2.4.  Multicast

   A multicast scenario with only a single sender does not pose an
   issue, as the sender can simply increment its sequence counter.  Each
   receiver has a complete view of the traffic and can thus maintain its
   replay window as usual.  But as soon as there are multiple senders,
   they would need to coordinate their sequence number usage, which is
   even less efficiently implementable than in the multicore case.
   Finally, as currently only the lower half of the sequence number is
   actually transmitted in the packet, a receiver joining late cannot
   guess the respective upper half.  Therefore, replay protection is
   usually disabled in multicast scenarios.
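
   The late-joiner problem can be sketched as follows.  The function
   infer_full_seq is a hypothetical, simplified stand-in for the
   receiver-side reconstruction of the upper half (it picks the
   candidate closest to the highest value seen so far) and is not the
   exact procedure of [RFC4303].

```python
LOW_MASK = 0xFFFFFFFF


def infer_full_seq(low32, highest_seen):
    """Guess the 64-bit sequence number from its transmitted lower half.

    Simplified rule: among the values sharing the given lower 32 bits,
    choose the one closest to the highest sequence number seen so far.
    """
    base = highest_seen & ~LOW_MASK
    candidates = [base - (1 << 32) + low32,
                  base + low32,
                  base + (1 << 32) + low32]
    valid = [c for c in candidates if 0 <= c < (1 << 64)]
    return min(valid, key=lambda c: abs(c - highest_seen))


# Hypothetical sender state: the upper half has already advanced to 5.
true_seq = (5 << 32) + 11

# A receiver that has followed the stream reconstructs the value.
established = infer_full_seq(true_seq & LOW_MASK, (5 << 32) + 10)

# A late joiner has no history and starts from zero.
late_joiner = infer_full_seq(true_seq & LOW_MASK, 0)
```

   With these assumed values, the established receiver recovers the
   sender's sequence number, while the late joiner ends up with the
   lower half only and is off by 5 * 2^32.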

2.5.  High-Speed Links

   Increasing link speeds and basically constant packet sizes lead to
   higher and higher packet rates that need to be processed.  Multi-core
   processing as described in Section 2.1 can only partially compensate
   for this, as, e.g., a single TCP flow cannot be parallelized across
   multiple cores.  Thus, modern implementations must be highly
   optimized to cope with the high packet rates.  However, ESP's split
   header/trailer makes this unnecessarily complicated, as header and
   trailer end up in different cache lines.  Similarly, the alignment to
   a 4-byte boundary is too short for modern architectures.  Please
   refer to [PRGS20] for a more elaborate discussion.

2.6.  Software-Defined Networking (SDN)

   Software-defined networking often uses information from the transport
   header, e.g., the port numbers, for identifying flows, steering and
   microsegmentation.  This can currently not be combined with ESP, as
   this information is encrypted.  On the other hand, in many scenarios
   this encryption is intended precisely to avoid leaking flow
   information.  An adaptable approach could cater for both needs.

3.  Requirements

   Besides the obvious requirement of not impairing security, the
   following shall be considered:

   1.  Deterministic performance

   2.  Scalability

   3.  Robustness

   4.  Simple implementation

4.  Discussion of possible approaches

   There are several approaches to deal with the presence of multiple
   independent packet streams:

   1.  Disabling replay protection (Section 4.1)

   2.  Using multiple IKE SAs (Section 4.2)

   3.  Using multiple child SAs (Section 4.3)

   4.  Increasing anti-replay window sizes (Section 4.4)

   5.  Using sub-child SAs (Section 4.5)

   SDN could be enabled by:

   1.  Using an encryption offset (Section 4.6)

   2.  Moving the ESP header (Section 4.7)

   The ESP header/trailer format could be modernized by:

   1.  Transmitting the sequence number entirely (Section 4.8)

   2.  Removing the trailer by moving its fields to the header
       (Section 4.9)

   3.  Dropping the 4-byte alignment requirement (Section 4.10)

4.1.  Disabling Replay Protection

   A straightforward solution would be to simply disable replay
   protection.  For example, PSP was designed without replay protection
   (see [PSP]).

   Advantages:

   *  Trivially solves all the reordering and synchronization issues
      discussed previously.  Note: This may still violate existing RFCs,
      which require sequence numbers to be generated in order, but this
      violation should not have an impact.

   Disadvantages:

   *  The approach significantly lowers the level of security.  Although
      most upper layer protocols (e.g., TCP) provide protection from
      duplicated data, this cannot be assumed for the general case.
      Even if the duplicates are never delivered to a user application,
      they usually do trigger responses from the receivers' network
      stack, e.g., TCP RSTs or ICMP errors.  This in turn enables an
      attacker to trigger ciphertext generation, possibly facilitating
      subsequent attacks.  Such attacks have been used in practice
      against WiFi encryption in the early 2000s.

   *  It is unclear how an SA protecting multiple plaintext flows can be
      distributed to multiple cores on the receiver.  Receive-Side
      Scaling (RSS) or explicit steering rules need some indication
      which packets carry the same plaintext flow and thus need to be
      sent to the same core.  Otherwise, intra-flow reordering is
      introduced, which may severely disturb higher level protocols,
      e.g., TCP's congestion control or VoIP audio streams.  Thus,
      efficient multicore processing is not possible for the receiver.

   This approach may be acceptable for specific scenarios (e.g.,
   multicast), but not for the general case.  It is especially
   problematic for any multicore scenarios, as the status quo without
   parallelization provides replay protection.  This approach is
   therefore not discussed any further.

4.2.  Using multiple IKE SAs

   For some scenarios, it might be reasonable to set up multiple,
   separate IKE SAs.

   Advantages:

   *  As there are independent sequence numbers and anti-replay windows,
      there is no need to synchronize between multiple CPU cores or
      senders.

   *  Distinct SPIs allow RSS or explicit steering, and thus enable
      processing without reordering.

   *  No changes to existing standards required.

   Disadvantages:

   *  There is a time and communication overhead due to the negotiation
      of every IKE SA requiring network round trips, packet processing,
      asymmetric cryptography, etc.  The initial setup could be
      accelerated by a reactive instead of a proactive SA negotiation,
      i.e., delaying the setup of the SA for a specific core or QoS
      class until the first packet arrives on the core or with the
      respective QoS tag.  However, this is a highly debatable strategy,
      as it induces either drops or large delays for the initial packets
      of these flows.

   *  There is a state/memory overhead due to completely separate state
      of every SA, e.g., traffic selectors, keys, lifetimes.  To a large
      extent, these states will hold identical information.

   *  During operation, there is overhead due to the regular rekeying of
      each SA and, if enabled, dead peer detection.

   *  Additional effort to configure the required number of SAs must be
      made.  Furthermore, monitoring larger networks becomes more
      complex because multiple SAs now map to identical connections.

   *  The failure model is unspecified if a subset of the IKE SAs cannot
      be established.  For example, in the multicore scenario, this
      leads to packet loss or at least performance fluctuations on some
      plaintext flows, depending on the core they are processed on.
      Such situations historically have a bad track record, e.g.,
      partially loading websites with (non-persistent) HTTP; SIP-
      working-but-RTP-failing conditions in VoIP, etc.

   In summary, the main issue of this approach is scalability.  It may
   be appropriate for certain scenarios, where the total number of
   additional IKE SAs is low.  It is not suited for general usage in
   large deployments.  In particular, deploying several of the
   techniques described in Section 2 leads to a combinatorial explosion
   of the number of required SAs.  For example, if one intends to
   transport traffic with 8 QoS classes between two gateways with 32
   cores, there would be already 256 SAs solely between these two
   gateways.  Even if the data plane and IKE daemon can support such a
   setup, there may be too much complexity pushed into the operational
   domain.  Therefore, this approach is not generally applicable.
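
   The multiplication behind this example can be written down
   directly.  The helper required_sas and the third dimension (four
   uplinks) are hypothetical and only serve to show how quickly the
   count grows.

```python
from math import prod


def required_sas(**dimensions):
    """Number of SAs if every combination of the dimensions needs its own SA."""
    return prod(dimensions.values())


# The example from the text: 8 QoS classes and 32 cores.
two_techniques = required_sas(qos_classes=8, cores=32)

# Hypothetical extension: additionally spreading traffic over 4 uplinks.
three_techniques = required_sas(qos_classes=8, cores=32, uplinks=4)
```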

4.3.  Using multiple (per-CPU) child SAs

   This approach has been proposed recently as a draft
   [I-D.pwouters-ipsecme-multi-sa-performance].  The draft is restricted
   to the multicore scenario outlined in Section 2.1.  It is similar to
   establishing multiple IKE SAs, but avoids a significant portion of
   their overhead by restricting the multiple instantiations to child
   SAs.

   Advantages:

   *  There is significantly less overhead compared to setting up
      independent IKE SAs.

   *  As there are independent sequence numbers and anti-replay windows,
      there is no need to synchronize between multiple CPU cores or
      senders.

   *  Distinct SPIs allow RSS or explicit steering, and thus enable
      processing without reordering.

   *  The draft incurs only a small change in standards and existing
      source code, as multiple child SAs are already possible in IKEv2
      [RFC7296], and the draft simply adds a mechanism to negotiate them
      explicitly.

   Disadvantages:

   *  Due to the setup of child SAs via separate CREATE_CHILD_SA
      exchanges, there is still communications overhead, especially for
      larger numbers of SAs.  As for multiple IKE SAs, either a
      proactive or a reactive setup is possible, resulting in a longer
      establishment time or a less predictable runtime behavior,
      respectively.

   *  There is still some per-child-SA state overhead in the data plane.
      However, as the IKE daemon knows that those SAs are per-queue
      children of the same IKE SA, an optimized implementation might be
      able to reduce that overhead to a minimum.

   *  During operation, there is overhead traffic due to the regular
      rekeying.

   *  Similar to separate IKE SAs, there is the possibility of a
      partially working SA if some of the child SAs fail to set up.  It
      is not immediately clear what the correct reaction should be,
      especially in the scope of a large VPN deployment, compared to the
      all-or-nothing failure model when parallel child SAs are not used.

   Using multiple child SAs is a significant step forward for the
   multicore scenario.  It is a simple (in the positive sense),
   straightforward solution harvesting low-hanging fruit.  But this
   simplicity inherits some drawbacks from the multiple-IKE-SAs approach
   caused by the independence of the child SAs regarding setup, state,
   rekeying and failure.  These disadvantages get worse the more child
   SAs are required.  Therefore, the per-CPU child SAs approach is not
   an ideal fit to the other scenarios described in Section 2, or a
   combination of the scenarios.

4.4.  Increasing anti-replay window sizes

   This approach differs from the previous two as it does not attempt to
   create multiple replay windows, but to accommodate the traffic within
   a single anti-replay window.  This fits the QoS scenario depicted
   in Section 2.2 if any higher-prioritized traffic does not advance the
   anti-replay window too far for the lower-prioritized traffic.  The
   idea is not applicable to the multicore or multicast scenarios, as
   larger windows can only solve the problem of packets being reordered
   by the network, but do not allow unsynchronized sequence counters
   (as, e.g., [RFC4303] requires strict monotonicity).

   Advantages:

   *  No changes to standards are required, as the anti-replay window
      size is a local matter.

   *  The approach inherits the advantages of a single child SA, e.g.,
      there is no setup overhead, less state overhead than with multiple
      child SAs (only the larger replay windows) and no complex failure
      model.

   Disadvantages:

   *  Even in software implementations, the anti-replay windows cannot
      grow indefinitely large.  Especially in latency-sensitive
      deployments, i.e., where one would use QoS, achieving throughput
      above 10 Gbit/s depends on the ability to keep state in the CPU
      caches, even for a larger number of peers.

   *  Complex configuration: Choosing a correct value for the window
      size depends not only on the number of QoS classes, but also on
      the maximum divergence of sequence numbers, which in turn depends
      on the QoS configuration, the possible throughput and the traffic
      mix.

   As discussed previously, this approach is only suitable for the QoS
   and multipath scenarios.  A comparison with other mechanisms requires
   an estimation of the required window sizes.  The time low-priority
   packets may be delayed by shapers and queues depends on many
   parameters, e.g., the actual and admitted traffic rates, the sizes of
   admissible bursts, strict-priority scheduling, etc.

   An attempt to simplify the problem is to make windows large enough to
   admit packets that are delayed up to a certain time threshold T.
   Consider a packet being "stuck" in the network due to other packets
   being prioritized.  Those packets advance the replay window.  Let
   their Ethernet size be S and their throughput TP.  It makes sense for
   TP to be an interface speed, otherwise, the delayed packet would not
   be stuck.  We therefore end up with the following packet rates R:

                   +==========+=============+==========+
                   | S [byte] | TP [Gbit/s] | R [Mp/s] |
                   +==========+=============+==========+
                   |       64 |          10 |   14.881 |
                   +----------+-------------+----------+
                   |     1518 |          10 |    0.813 |
                   +----------+-------------+----------+
                   |       64 |         100 |  148.810 |
                   +----------+-------------+----------+
                   |     1518 |         100 |    8.127 |
                   +----------+-------------+----------+

                           Table 1: Packet rates

   For T = 100 ms, this would mean that the windows must, in the worst
   case, accommodate between 80,000 and 14.8 million packets.  It might
   be argued that the higher boundary is currently unrealistic, as it
   would require a 100 Gbit/s link to be saturated with small,
   prioritized packets.  On the other hand, 100 ms is the acceptable
   delay for VoIP, whereas for applications with low priority demands,
   it might make sense to deliver even older packets.
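
   The numbers in Table 1 and the resulting window sizes can be
   reproduced with a short calculation.  The sketch assumes 20 bytes of
   per-frame overhead on the wire (preamble, start-of-frame delimiter
   and inter-frame gap), which matches the values in the table; the
   function names are illustrative.

```python
# Assumption: 8 bytes preamble/SFD plus 12 bytes inter-frame gap per frame.
WIRE_OVERHEAD_BYTES = 20


def packet_rate(frame_bytes, link_bps):
    """Packets per second on a saturated link for a given Ethernet size S."""
    return link_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


def window_packets(frame_bytes, link_bps, delay_s):
    """Packets that may overtake a packet delayed by delay_s seconds (T)."""
    return int(packet_rate(frame_bytes, link_bps) * delay_s)


# Rows of Table 1, in Mp/s.
table = {
    (size, gbit): round(packet_rate(size, gbit * 1e9) / 1e6, 3)
    for size in (64, 1518)
    for gbit in (10, 100)
}

# Window sizes for T = 100 ms: best and worst case of the table.
lower_bound = window_packets(1518, 10e9, 0.1)
upper_bound = window_packets(64, 100e9, 0.1)
```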

4.5.  Using Sub-Child SAs

   The final possibility is standardizing a new approach that tries to
   combine the advantages of the approaches discussed previously.  In
   essence, it is the idea of allowing multiple sequence counters (and
   thus use multiple anti-replay windows) per child SA.  These sequence
   counters must allow incrementing independently of each other, making
   the approach applicable to all outlined scenarios.  It is also
   possible to think of the individual counter/window pairs as
   _sub-SAs_ within a child SA.

   First of all, receivers must be able to distinguish those sub-SAs.
   There are multiple possibilities to achieve this:

   *  Using the SPI: The SPI would be allocated per sub-SA, i.e., a
      range of SPIs would belong to a single child SA.  Therefore, it is
      possible to embed, e.g., the ID of the sending core in some bits
      of the SPI.

   *  Using the sequence number: Some bits of the sequence number would
      be used to indicate the sub-SA.  This approach reduces the
      available sequence numbers.  Note that the consequences depend on
      whether the traffic is distributed evenly among the individual
      sub-SAs (e.g., multicore scenario) or not (e.g., QoS scenario).

   *  Using an additional field: Of course, it is also possible to
      introduce a new field to the ESP header.  This can lead to a
      simpler design, but also constitutes the largest change to
      existing standards.  It was proposed by
      [I-D.ponchon-ipsecme-anti-replay-subspaces] in the later versions
      of the draft.
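
   The first two options can be sketched as simple bit operations.  The
   choice of five identifier bits, their placement in the least
   significant bits, and all function names are assumptions made for
   illustration; this document does not fix an encoding.

```python
SUB_SA_BITS = 5  # hypothetical: 2**5 = 32 sub-SAs per child SA


def encode_spi(base_spi, sub_sa_id, bits=SUB_SA_BITS):
    """Embed the sub-SA ID in the low bits of a 32-bit SPI range."""
    mask = (1 << bits) - 1
    if not 0 <= base_spi <= 0xFFFFFFFF or base_spi & mask:
        raise ValueError("base SPI must be 32 bits with the low bits clear")
    if not 0 <= sub_sa_id <= mask:
        raise ValueError("sub-SA ID does not fit into the reserved bits")
    return base_spi | sub_sa_id


def decode_spi(spi, bits=SUB_SA_BITS):
    """Split a received SPI into (base SPI of the child SA, sub-SA ID)."""
    mask = (1 << bits) - 1
    return spi & ~mask & 0xFFFFFFFF, spi & mask


def seq_numbers_per_sub_sa(bits=SUB_SA_BITS, seq_bits=64):
    """Sequence numbers left per sub-SA if the ID is taken from the counter."""
    return 1 << (seq_bits - bits)


# Hypothetical child SA owning the SPI range 0xC0FFEE00..0xC0FFEE1F.
spi_for_core_7 = encode_spi(0xC0FFEE00, 7)
```

   With the SPI variant, a receiver could steer on the full SPI value;
   with the sequence number variant, the sketch shows how each sub-SA's
   counter space shrinks by the number of reserved bits.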

   In any case, the approach necessitates some additional
   clarifications:

   *  The receiver may use the steering capabilities of its NIC to map
      ingress packets to its sub-SAs, e.g., to different queues, to
      allow for efficient multicore utilization.  This is especially
      important for the multicore scenario, as software redirects to
      other cores must be avoided for performance reasons.  The simplest
      case is the sub-SA being encoded in the SPI, as many NICs already
      provide features for matching on SPIs.  For the other two
      distinguishing mechanisms, flexible or raw matchers may be used.

   *  The setup and renewal of sub-SAs should happen in bulk, i.e.,
      there is only one exchange to set up the child SA.  This leads to
      reliable performance characteristics, as there is no on-demand
      sub-SA creation.  Furthermore, the failure model is very simple:
      The child SA with all its sub-SAs exists, or it does not.

   *  Only the sequence counters and anti-replay windows would be
      allocated per sub-SA.

   *  All other properties of the SA are per child SA, i.e., traffic
      selectors, mode, but also the key material.  Using the same key
      for all sub-SAs needs to be done with care to avoid effects on
      security (details will follow shortly).  However, if there were
      different keys, neither the scalability (bulk setup and rekeying)
      nor the predictable failure model would be possible.

   Using a single key for multiple sub-SAs has implications on security:

   *  It must be ensured that this approach cannot lead to reused IVs
      for counter modes.  For example, in the case of AES-GCM [RFC4106],
      this means either the salt must be different for each sub-SA, or
      the IV space must be partitioned accordingly.  Note that
      partitioning the IV space is not possible with implicit IV modes
      ([RFC8750]), as [RFC4303] requires sequence numbers to be
      initialized to zero.

   *  Hard limits for packet and byte counters must be scaled
      accordingly.  For example, if no more than 2^64 packets should be
      transmitted using a given key, and the child SA consists of 2^8
      sub-SAs, then every sub-SA must not be allowed to send more than
      2^56 packets, in case no fine-grained synchronization is possible.
      In case transmission happens on the same CPU core, overcommitting
      may be possible as long as the total number of packets or bytes is
      ensured to be never exceeded.

   *  Rekey limits must apply to all sub-SAs combined.  For example, if
      a child SA is configured to be rekeyed after transmission of X
      bytes or Y packets, then the rekey must be triggered if the sum of
      bytes or packets on all sub-SAs reaches X or Y.  For situations
      where overcommitting is not possible, we suggest referencing the
      sub-SA with the maximum number of bytes/packets already sent, say
      X'_max and Y'_max.  X'_max and Y'_max are multiplied by the number
      of sub-SAs, and if that value exceeds X or Y, a rekeying is
      initiated.

   *  In case SPIs or an explicit header field are used to encode
      sub-SAs, it may (theoretically) be possible to send more than 2^64
      packets using a single key.  This may pose a problem for ciphers
      such as AES-GCM.  In this case, a hard limit of at most 2^64
      packets MUST be enforced.
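
   The rules above can be sketched as follows.  All function names are
   hypothetical, and the IV layout (sub-SA ID in the most significant
   bits of an 8-byte IV) is only one conceivable way to partition the
   IV space, not a proposed format.

```python
def partitioned_iv(sub_sa_id, counter, sub_sa_bits=8):
    """Build an 8-byte IV whose top bits identify the sub-SA (assumption)."""
    counter_bits = 64 - sub_sa_bits
    if not 0 <= sub_sa_id < (1 << sub_sa_bits):
        raise ValueError("sub-SA ID out of range")
    if not 0 <= counter < (1 << counter_bits):
        raise ValueError("per-sub-SA counter exhausted")
    return ((sub_sa_id << counter_bits) | counter).to_bytes(8, "big")


def per_sub_sa_hard_limit(key_limit, num_sub_sas):
    """Static split of a per-key limit when counters cannot be synchronized."""
    return key_limit // num_sub_sas


def needs_rekey(sent_per_sub_sa, limit, can_sum=True):
    """Decide whether the child SA must be rekeyed.

    can_sum=True:  the sum over all sub-SAs reaches the limit (X or Y).
    can_sum=False: conservative estimate; the largest counter (X'_max or
                   Y'_max) times the number of sub-SAs exceeds the limit.
    """
    if can_sum:
        return sum(sent_per_sub_sa) >= limit
    return max(sent_per_sub_sa) * len(sent_per_sub_sa) > limit


# The example from the text: 2^64 packets per key, 2^8 sub-SAs.
limit_per_sub_sa = per_sub_sa_hard_limit(2 ** 64, 2 ** 8)
```

   The conservative variant may trigger a rekey earlier than strictly
   necessary, which is the price for not synchronizing the counters.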

   Advantages:

   *  Independent sequence numbers and anti-replay windows are
      available.

   *  The approach allows for RSS or explicit steering, especially if
      the SPI-encoding is used.

   *  Most scalable approach: The child SA setup requires exchanging,
      e.g., an SPI range but does not depend on the number of sub-SAs
      allocated.  Similarly, there is only an ID, sequence counters, and
      an anti-replay window to store per sub-SA.  The remainder of state
      can be shared.

   *  There is no rekeying overhead, as just a single Child SA needs to
      be rekeyed.

   *  Predictable performance characteristics due to the batched,
      proactive establishment.

   *  Clean failure model due to the all-or-nothing setup.

   Disadvantages:

   *  There are potential security implications, which must be discussed
      thoroughly, to avoid weakening security at any point.

   *  The change in the data plane may seem a bit more complex compared
      to per-CPU child SAs.  Nevertheless, fallback SAs as mentioned in
      [I-D.pwouters-ipsecme-multi-sa-performance] are avoided.

   Compared to setting up separate IKE or child SAs, it might be argued
   that the idea of sub-SAs keeps the complexity and overhead away from
   the VPN's operation.  Furthermore, storing an SPI, a 64-bit sequence
   number, and a replay window for 64 packets for 64 different QoS
   classes requires a total of 10240 bits.  This is significantly less
   than even the lower boundary established for the approach described
   in Section 4.4.  However, of the discussed alternatives, it is the
   most complex change to existing standard and implementation
   semantics.
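
   The state estimate can be checked with a few lines.  The sketch
   assumes a 32-bit SPI and one bitmap bit per packet in the replay
   window; the function name is illustrative.

```python
def sub_sa_state_bits(num_sub_sas, spi_bits=32, seq_bits=64,
                      window_packets=64):
    """Per-child-SA state for SPI, sequence counter and window bitmap."""
    return num_sub_sas * (spi_bits + seq_bits + window_packets)


# 64 QoS classes, each with its own sub-SA.
total_bits = sub_sa_state_bits(64)

# Lower bound from Section 4.4, assuming one bitmap bit per packet.
single_window_bits = 80_000
```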

4.6.  Using an encryption offset

   It would be possible to add an encryption offset to ESP, signalling
   that a number of bytes at the beginning of the packet are not
   encrypted.  Note that they can still be authenticated, e.g., as
   Additional Authenticated Data in modern AEAD modes.  This approach
   was chosen for [PSP].  Use cases for an encryption offset can be
   found in wrapped ESP [RFC5840].  There, an integrity-only ESP
   approach was chosen, but the addressed use cases are the same.  An
   encryption offset can be seen as an improvement over the
   integrity-only ESP approach, as it does not reveal the full payload
   data to the network.
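
   As a rough illustration of the idea, the following Python sketch
   splits a packet at a given offset.  The function name and the
   example packet are hypothetical; an actual implementation would
   pass the cleartext prefix to the AEAD as Additional Authenticated
   Data and only the remainder as plaintext.

```python
def split_at_encryption_offset(inner_packet: bytes, offset: int):
    """Split a packet into a cleartext prefix and the part to encrypt.

    Hypothetical helper: the prefix stays readable on the wire but can
    still be authenticated as AAD; the rest is encrypted as usual.
    """
    if not 0 <= offset <= len(inner_packet):
        raise ValueError("encryption offset outside of packet")
    return inner_packet[:offset], inner_packet[offset:]


# Illustrative packet: 8-byte UDP header (ports 8080 -> 53, length 16,
# checksum 0) followed by 8 payload bytes.
udp_header = bytes.fromhex("1f90003500100000")
payload = b"\x00" * 8
aad, to_encrypt = split_at_encryption_offset(udp_header + payload, 8)
```

   An offset of zero leaves the prefix empty, which corresponds to
   conventional ESP processing.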

   Advantages:

   *  Enables SDN use cases.

   *  Is optional (zero offset), and can even be applied to just a
      subset of the packets.

   Disadvantages:

   *  Significant change to IPsec's semantics and security guarantees.

   *  Intermediate devices need to implement ESP to parse the header.
      This is a significant issue for flow matching engines implemented
      in hardware.









4.7.  Moving the ESP header

   As a more drastic approach, one could insert the ESP header not
   between the network and transport layer headers, but, e.g., between
   the transport layer header and the payload.  Alternatively, the
   transport layer header could be "copied out".

   Advantages:

   *  Transparent for intermediate devices, i.e., no changes to their
      hardware or software necessary.

   Disadvantages:

   *  Significant change to IPsec's semantics due to the layering
      violation.

   *  The receiver needs to know where the ESP header can be found.
      This is only simple if all senders use the same logic; otherwise,
      a complex negotiation is required.

   *  Transport layer length and checksum fields must be adapted if they
      are checked by any device on the path.

4.8.  Transmitting the sequence number entirely

   Transmitting the entire sequence number makes processing easier, and
   enables receivers to join late in multicast scenarios.  In the later
   versions of [I-D.ponchon-ipsecme-anti-replay-subspaces], it was
   proposed to transmit a larger portion of the sequence number.

   Advantages:

   *  Relatively minor change.

   Disadvantages:

   *  Uses more bandwidth, although at least for counter-based cipher
      modes, this can be compensated by using implicit IVs (see
      [RFC8750]).
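
   To illustrate what "easier processing" refers to, the following
   Python sketch loosely follows the procedure in Appendix A of
   [RFC4303] for inferring the upper 32 bits of an extended sequence
   number from the transmitted lower 32 bits.  All names are
   illustrative.  Note that the inference depends on receiver state
   (the highest sequence number authenticated so far), which a
   late-joining receiver does not have.

```python
MASK32 = 0xFFFFFFFF


def infer_full_seq(seq_low: int, top: int, window: int) -> int:
    """Guess the 64-bit sequence number from its low 32 bits.

    top is the highest sequence number authenticated so far and window
    the anti-replay window size (illustrative names).
    """
    top_low, top_high = top & MASK32, top >> 32
    bottom_low = (top_low - window + 1) & MASK32
    if top_low >= window - 1:
        # The window lies within a single 2^32 subspace.
        seq_high = top_high if seq_low >= bottom_low else top_high + 1
    else:
        # The window spans two adjacent subspaces.
        seq_high = top_high - 1 if seq_low >= bottom_low else top_high
    return ((seq_high & MASK32) << 32) | seq_low
```

   If the entire sequence number is transmitted, this inference and
   the state it depends on are not needed to determine the sequence
   number of a received packet.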

4.9.  Removing the trailer

   The trailer could be removed by moving its fields to the header.
   Contrary to Ethernet, there appears to be no requirement for
   "cut-through" packet processing.

   Advantages:



   *  Software packet processing benefits from cache locality.

   *  Parsing is simpler as there is no variable-length payload in
      between.

   Disadvantages:

   *  Larger change to the existing packet layout.
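
   For comparison, the following Python sketch shows how the current
   trailer of [RFC4303] is located after decryption: the Pad Length
   and Next Header fields are the last two bytes, so the end of the
   packet must be read before the payload boundary is known.  The
   function name is illustrative and the ICV is assumed to have been
   removed already.

```python
def strip_esp_trailer(decrypted: bytes):
    """Return (payload, next_header) from a decrypted ESP payload.

    Layout assumed here, per RFC 4303: payload data, padding,
    Pad Length (1 byte), Next Header (1 byte).
    """
    if len(decrypted) < 2:
        raise ValueError("too short for an ESP trailer")
    pad_len, next_header = decrypted[-2], decrypted[-1]
    if pad_len > len(decrypted) - 2:
        raise ValueError("pad length exceeds payload")
    return decrypted[:len(decrypted) - 2 - pad_len], next_header


# Example: 5 payload bytes, 3 padding bytes, Next Header 6 (TCP).
example = b"hello" + b"\x01\x02\x03" + bytes([3, 6])
payload, next_header = strip_esp_trailer(example)
```

   With these fields in the header, the same information would be
   available before the variable-length payload is touched.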

4.10.  Dropping the 4-byte alignment requirement

   Dropping the 4-byte alignment probably does not warrant a change of
   ESP on its own.  However, when the ESP frame format is updated for
   other reasons, it is worth considering, as modern architectures and
   their SIMD instructions typically require larger alignment.  Note
   that removing the cryptographic padding (which is not required for
   all current AEAD modes) would allow even more simplification, but
   would also significantly limit cryptographic agility.

   Advantages:

   *  Relatively minor change.

   Disadvantages:

   *  Enables only minimal simplification of processing on its own.
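
   As a small illustration of the current rule, the following Python
   sketch computes the number of padding bytes needed so that payload,
   padding, Pad Length and Next Header together end on an alignment
   boundary.  It is a simplification that ignores any cipher block
   size requirement; the function name and the 16-byte example are
   illustrative only.

```python
def esp_pad_len(payload_len: int, align: int = 4) -> int:
    """Padding bytes so that payload + padding + 2 trailer bytes
    (Pad Length, Next Header) is a multiple of align."""
    return -(payload_len + 2) % align


for payload_len in (5, 6, 7, 8):
    total = payload_len + esp_pad_len(payload_len) + 2
    print(payload_len, "->", esp_pad_len(payload_len), "pad bytes,", total)
```

   Calling esp_pad_len(payload_len, 16) instead shows the padding that
   a hypothetical 16-byte alignment would require.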

5.  Remark on steering

   Please note: For any of the sub-child-SA approaches, it is essential
   for the receiver to steer traffic generated by a given CPU core of
   the sender to a specific CPU core that handles the incoming traffic.
   For example, if two CPU cores at the sender generate large amounts of
   traffic in one QoS class, it is not sufficient to only perform RSS on
   the child SAs or sub-child SAs, as this would not prevent the two
   streams from being mapped to the same receiver CPU.
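
   The problem can be sketched in Python as follows.  This is a toy
   model under loud assumptions: a simple modulo stands in for the
   hardware RSS hash (real NICs typically use, e.g., a Toeplitz hash),
   and the SPI values, core count and explicit mapping are all
   hypothetical.

```python
NUM_RX_CORES = 4


def rss_like_core(spi: int) -> int:
    """Stand-in for hash-based RSS: the core depends only on the SPI."""
    return spi % NUM_RX_CORES


def steered_core(sender_core: int) -> int:
    """Hypothetical explicit steering: sender core i -> receiver core i."""
    return sender_core % NUM_RX_CORES


# Two sender cores using two different (hypothetical) sub-SA SPIs.
spi_of_sender_core = {0: 0x1001, 1: 0x1005}

hashed = {core: rss_like_core(spi) for core, spi in spi_of_sender_core.items()}
steered = {core: steered_core(core) for core in spi_of_sender_core}
print("hash-based:", hashed)    # both streams land on the same core
print("steered:   ", steered)   # streams land on distinct cores
```

   In this toy model, both SPIs hash to the same receiver core, while
   the explicit mapping keeps the two streams apart.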

6.  IANA Considerations

   This memo includes no request to IANA.

7.  Security Considerations

   *TODO:* In its current state, this draft discusses multiple
   alternatives.  Please refer to Section 4 for a discussion including
   remarks on security.

8.  References



8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

   [RFC7296]  Kaufman, C., Hoffman, P., Nir, Y., Eronen, P., and T.
              Kivinen, "Internet Key Exchange Protocol Version 2
              (IKEv2)", STD 79, RFC 7296, DOI 10.17487/RFC7296, October
              2014, <https://www.rfc-editor.org/info/rfc7296>.

   [RFC4303]  Kent, S., "IP Encapsulating Security Payload (ESP)",
              RFC 4303, DOI 10.17487/RFC4303, December 2005,
              <https://www.rfc-editor.org/info/rfc4303>.

   [RFC4106]  Viega, J. and D. McGrew, "The Use of Galois/Counter Mode
              (GCM) in IPsec Encapsulating Security Payload (ESP)",
              RFC 4106, DOI 10.17487/RFC4106, June 2005,
              <https://www.rfc-editor.org/info/rfc4106>.

   [RFC8750]  Migault, D., Guggemos, T., and Y. Nir, "Implicit
              Initialization Vector (IV) for Counter-Based Ciphers in
              Encapsulating Security Payload (ESP)", RFC 8750,
              DOI 10.17487/RFC8750, March 2020,
              <https://www.rfc-editor.org/info/rfc8750>.

8.2.  Informative References

   [I-D.pwouters-ipsecme-multi-sa-performance]
              Antony, A., Brunner, T., Klassert, S., and P. Wouters,
              "IKEv2 support for per-queue Child SAs", Work in Progress,
              Internet-Draft, draft-pwouters-ipsecme-multi-sa-
              performance-05, 8 November 2022,
              <https://datatracker.ietf.org/doc/html/draft-pwouters-
              ipsecme-multi-sa-performance-05>.











   [I-D.ponchon-ipsecme-anti-replay-subspaces]
              Ponchon, P., Shaikh, M., Dernaika, H., Pfister, P., and G.
              Solignac, "IPsec and IKE anti-replay sequence number
              subspaces for traffic-engineered paths and multi-core
              processing", Work in Progress, Internet-Draft, draft-
              ponchon-ipsecme-anti-replay-subspaces-03, 23 October 2023,
              <https://datatracker.ietf.org/doc/html/draft-ponchon-
              ipsecme-anti-replay-subspaces-03>.

   [PRGS20]   Pfeiffer, M., Rossberg, M., Girlich, F., and G. Schaefer,
              "Vector Packet Encapsulation: The Case for a Scalable
              IPsec Encryption Protocol",
              <https://doi.org/10.1145/3407023.3407060>.

   [PSP]      "PSP Architecture Specification",
              <https://github.com/google/psp/blob/main/doc/
              PSP_Arch_Spec.pdf>.

   [RFC5840]  Grewal, K., Montenegro, G., and M. Bhatia, "Wrapped
              Encapsulating Security Payload (ESP) for Traffic
              Visibility", RFC 5840, DOI 10.17487/RFC5840, April 2010,
              <https://www.rfc-editor.org/info/rfc5840>.

Authors' Addresses

   Michael Rossberg
   Technische Universität Ilmenau
   Helmholtzplatz 5
   98693 Ilmenau
   Germany
   Email: michael.rossberg@tu-ilmenau.de


   Steffen Klassert
   secunet Security Networks AG
   Ammonstrasse 74
   01067 Dresden
   Germany
   Email: steffen.klassert@secunet.com


   Michael Pfeiffer
   Technische Universität Ilmenau
   Helmholtzplatz 5
   98693 Ilmenau
   Germany
   Email: michael.pfeiffer@tu-ilmenau.de



