PIM                                                       T. Eckert, Ed.
Internet-Draft                                Futurewei Technologies USA
Intended status: Experimental                                   M. Menth
Expires: 5 September 2024                                     S. Lindner
                                                 University of Tuebingen
                                                            4 March 2024


   Stateless Multicast Replication with Segment Routed Recursive Tree
                            Structures (RTS)
                   draft-eckert-pim-rts-forwarding-01

Abstract

   BIER provides stateless multicast in BIER domains using bitstrings to
   indicate receivers.  BIER-TE extends BIER with tree engineering
   capabilities.  Both suffer from scalability problems in large
   networks: because bitstrings are of limited size, BIER domains need
   to be subdivided using set identifiers, so that possibly many packets
   need to be sent to reach all receivers of a multicast group within a
   subdomain.

   This problem can be mitigated by encoding explicit multicast trees in
   packet headers with bitstrings that have only node-local
   significance.  A drawback of this method is that every hop on the
   path needs to be encoded, so that long paths consume a lot of header
   space.

   This document presents the idea of Segment Routed Recursive Tree
   Structures (RTS), a unifying approach in which a packet header
   representing a multicast distribution tree is constructed from a tree
   structure of vertices (so-called "Recursive Units") that support
   replication to their next-hop neighbors either via local bitstrings
   or via a sequence of next-hop neighbor identifiers called SIDs.

   RTS, like RBS, is intended to expand the applicability of deployment
   for stateless multicast replication beyond what BIER and BIER-TE
   support and expect: larger networks, less operational complexity, and
   utilization of more modern forwarding planes than those expected to
   be possible when BIER was designed (ca. 2010).

   This document only specifies the forwarding plane but discusses
   possible architectural options, which are primarily determined
   through the future definition/mapping to encapsulation headers and
   controller-plane functions.







Eckert, et al.          Expires 5 September 2024                [Page 1]

Internet-Draft                   pim-rts                      March 2024


Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 5 September 2024.

Copyright Notice

   Copyright (c) 2024 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
   2.  Overview  . . . . . . . . . . . . . . . . . . . . . . . . . .   4
     2.1.  From BIER to RTS  . . . . . . . . . . . . . . . . . . . .   4
       2.1.1.  Example topology and tree . . . . . . . . . . . . . .   4
       2.1.2.  IP Multicast  . . . . . . . . . . . . . . . . . . . .   4
       2.1.3.  BIER  . . . . . . . . . . . . . . . . . . . . . . . .   5
       2.1.4.  BIER-TE . . . . . . . . . . . . . . . . . . . . . . .   6
       2.1.5.  RTS . . . . . . . . . . . . . . . . . . . . . . . . .   6
       2.1.6.  Summary and Benefits of RTS . . . . . . . . . . . . .   8
   3.  Architecture  . . . . . . . . . . . . . . . . . . . . . . . .   9
   4.  Specification . . . . . . . . . . . . . . . . . . . . . . . .  10
     4.1.  RTS Encapsulation . . . . . . . . . . . . . . . . . . . .  10
     4.2.  RTS Addressing  . . . . . . . . . . . . . . . . . . . . .  11
     4.3.  RTS Header  . . . . . . . . . . . . . . . . . . . . . . .  12
       4.3.1.  RU-Params . . . . . . . . . . . . . . . . . . . . . .  12
       4.3.2.  RU-Params Parsing and Semantics . . . . . . . . . . .  14
       4.3.3.  Creating and Receiving copies . . . . . . . . . . . .  18
       4.3.4.  Creating copies because of RTS Header d=1 . . . . . .  18
       4.3.5.  Creating copies because of RTS Header b=1 . . . . . .  18
   5.  Discussion  . . . . . . . . . . . . . . . . . . . . . . . . .  19
     5.1.  Encoding Efficiency vs. decoding challenges . . . . . . .  19
     5.2.  Encapsulation considerations  . . . . . . . . . . . . . .  20
       5.2.1.  Comparison with BIER header and forwarding  . . . . .  20
       5.2.2.  Comparison with IPv6 extension headers  . . . . . . .  21
     5.3.  Encoding choices and complexity . . . . . . . . . . . . .  21
       5.3.1.  SID vs bitstrings in recursive trees  . . . . . . . .  21
       5.3.2.  Limited per-node functionality  . . . . . . . . . . .  22
     5.4.  Differences over prior Recursive BitString (RBS) encodings
           proposal  . . . . . . . . . . . . . . . . . . . . . . . .  23
   6.  Security considerations . . . . . . . . . . . . . . . . . . .  24
   7.  Acknowledgments . . . . . . . . . . . . . . . . . . . . . . .  24
   8.  Changelog . . . . . . . . . . . . . . . . . . . . . . . . . .  24
     8.1.  -00 . . . . . . . . . . . . . . . . . . . . . . . . . . .  24
     8.2.  -01 . . . . . . . . . . . . . . . . . . . . . . . . . . .  24
   9.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  25
     9.1.  Normative References  . . . . . . . . . . . . . . . . . .  25
     9.2.  Informative References  . . . . . . . . . . . . . . . . .  26
   Appendix A.  Evolution to RTS . . . . . . . . . . . . . . . . . .  28
     A.1.  Research work on BIER . . . . . . . . . . . . . . . . . .  29
     A.2.  Initial RBS from CGM2 . . . . . . . . . . . . . . . . . .  29
     A.3.  RBS scalability compared to BIER  . . . . . . . . . . . .  30
     A.4.  Discarding versus offset pointers . . . . . . . . . . . .  30
     A.5.  Encapsulations for IPv6-only networks . . . . . . . . . .  31
     A.6.  SEET and BEET . . . . . . . . . . . . . . . . . . . . . .  31
   Contributors  . . . . . . . . . . . . . . . . . . . . . . . . . .  31
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  32

1.  Introduction

   Please see Section 8.2 for a summary of changes over the prior
   version of this document.

   This draft expands on prior experimental work called "Recursive
   BitString Structure" (RBS) for stateless multicast replication with
   source routed data structures in the header of multicast data
   packets.  Its changes and enhancements over RBS result from
   further scalability analysis and further matching against different
   use cases.  Its proposed design also includes Proof of Concept work
   on high-speed, limited functionality programmable forwarding plane
   engines in the P4 programming language.

   RTS, like RBS, is intended to expand the applicability of deployment
   for stateless multicast replication beyond what BIER and BIER-TE
   support and expect: larger networks, less operational setup
   complexity, and utilization of more flexible programmable forwarding
   planes than those expected to be possible when BIER was designed (ca.
   2010).

   Unlike RBS, RTS does not limit itself to a design that is only based
   on the use of local bitstrings but instead offers both bitstring and
   SID based addressing inside the recursive tree structure to allow
   more scalability for a wider range of use cases.

2.  Overview

2.1.  From BIER to RTS

2.1.1.  Example topology and tree

             Src                         Src
              |                           ||
              R1                          R1
             /  \                       //  \\
            R2   R3                     R2   R3
           /  \ /  \                  //  \ /  \\
          R5  R6    R7                R5  R6    R7
         /  \ | \  /  \             // \\ | \ //  \\
       R8    R9  R10  R11          R8    R9  R10  R11
       |     |    |    |           ||    ||   ||   ||
      Rcv1 Rcv2  Rcv3 Rcv4        Rcv1 Rcv2  Rcv3 Rcv4

       Example Network            Example BIER-TE / RTS Tree,
         Topology               // and || indicate tree segments

                    Figure 1: Example topology and tree

   The following explanations use the example topology shown on the left
   of Figure 1 and the example tree shown on the right.

2.1.2.  IP Multicast

   Assume a multicast packet is originated by Src and needs to be
   replicated and forwarded to be received by Rcv1...Rcv4.  In IP
   Multicast with PIM multicast routing, routers R1...R11 will have
   so-called PIM multicast tree state, especially the intermediate
   routers R2...R7.  Whenever an IP Multicast router has multiple
   upstream routers to choose from, the path selection is based on
   routing RPF, so the routing protocol on R9 would need to route Src
   via R5, and R10 would need to route Src via R7 to arrive at the tree
   shown in the example.

2.1.3.  BIER

   In stateless multicast forwarding with Bit Index Explicit Replication
   (BIER), [RFC8279], a packet has a header with a bitstring, and each
   bit in the bitstring indicates one receiver side BIER router (BFER).

   [R8:5 R9:9 R10:11 R11:17] =

   000010001010000010000000000000000000000000

                      Figure 2: Example BIER bitstring

   In Figure 2, the term [Ri:bi...] (i=8,9,10,11; bi=5,9,11,17)
   indicates the routers "Ri" whose associated bit number "bi" is set in
   the bitstring.  In this example, the bitstring is assumed to be 42
   bits long.  The actual length of bitstring supported depends on the
   header, such as [RFC8296], and the implementation.  The assignment of
   routers to bits in this example is random.
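
   The bracket notation can be turned into the flat bitstring
   mechanically.  The following Python sketch is illustrative only: the
   helper name flat_bitstring is hypothetical and not part of BIER, and
   it assumes that bit 1 is the leftmost position, as the figure is
   drawn.

```python
def flat_bitstring(bits, length):
    """Render a bitstring of `length` positions with the given 1-based bits set.

    Assumption: bit 1 is the leftmost character, as drawn in Figure 2.
    """
    chosen = set(bits)
    if any(not 1 <= b <= length for b in chosen):
        raise ValueError("bit position outside the bitstring")
    return "".join("1" if pos in chosen else "0" for pos in range(1, length + 1))


# Bit assignment taken from the bracket notation of Figure 2.
bfer_bits = {"R8": 5, "R9": 9, "R10": 11, "R11": 17}
print(flat_bitstring(bfer_bits.values(), 42))
```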

   With BIER, there is no tree state in R2...R7, but the packet is
   forwarded from R2 across these routers based on those "destination"
   bits bi and information of the hop-by-hop IP routing protocol, e.g.,
   IS-IS or OSPF.  The intervening routers traversed therefore also
   depend solely on that routing protocol's routing table, and as in IP
   multicast, there is no guarantee that the intermediate hops shown in
   the example picture are chosen if, as shown, there are multiple equal
   cost paths (e.g., Src via R10->R6->R3 and R10->R7->R3).

   The header and hence bitstring size is a limiting factor for BIER and
   any source-routing.  When the network becomes larger, not all
   receiver side routers or all links in the topology can be expressed
   by this number of bits.  A network with 10,000 receivers for example
   would require at least 40 different bitstrings of 256 bits to
   represent all receiver routers with separate bits.  In addition, the
   packet header needs to indicate which of those 40 bitstrings is
   contained in the packet header.
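
   The figure of 40 bitstrings follows from a simple ceiling division.
   A minimal Python sketch of that arithmetic, with a hypothetical
   helper name and assuming each receiver router occupies exactly one
   bit:

```python
def bitstrings_needed(receiver_routers: int, bitstring_length: int) -> int:
    """Smallest number of bitstrings that gives every receiver router its own bit."""
    if receiver_routers < 0 or bitstring_length <= 0:
        raise ValueError("need a non-negative router count and a positive length")
    # Ceiling division using integers only.
    return -(-receiver_routers // bitstring_length)


# 10,000 receiver routers with 256-bit bitstrings, as in the text above.
print(bitstrings_needed(10_000, 256))
```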

   When receiver routers in close proximity in the topology are assigned
   to different bitstrings, the path to these receivers will need to
   carry multiple copies of the same packet payload, because each copy
   is required to carry a different bitstring.  In the worst case, even
   as few as 40 receivers may still require 40 separate copies, as if
   unicast was used, because each of the 40 bits is represented in a
   different bitstring.

2.1.4.  BIER-TE

   In BIER with Tree Engineering (BIER-TE), [RFC9262], the bits in the
   bitstring do not only indicate the receiver side routers, but also
   the intermediate links in the topology, hence allowing the tree to be
   explicitly "engineered" for purposes such as load-splitting or
   bandwidth guarantees on the tree.

   [R1R2:4 R2R5:10 R5R8:15 R5R9:16 R1R3:25 R3R7:32 R7R10:39 R7R11:42]

   000100000100001100000000100000010000001001

                    Figure 3: Example BIER-TE bitstring

   In Figure 3, the list of [RxRy:bi...] indicates the set of bits
   needed to describe the tree in Figure 1, using the same notation as
   in Figure 2.

   Each RxRy indicates one bit in the bitstring for the link Rx->Ry.
   The need to express every link in a topology as a separate bit makes
   scaling even more challenging and requires more bitstrings to
   represent a network than BIER does.  As a result of this
   representation, however, BIER-TE allows copies to be steered
   explicitly along the engineered path, something required for services
   that provide traffic engineering, or when non-equal-cost load
   splitting is required (without strict guarantees).

2.1.5.  RTS

   With Recursive Tree Structure (RTS) encoding, the concept of steered
   forwarding from BIER-TE is modified to actually encode the tree
   structure in the header as opposed to just one single "flat"
   bitstring out of a large number of such bitstrings (in a large
   network).  For the same tree as above, the structure in the header
   will logically look as follows.

   Syntax:
     RU  = SID { :[  NHi+ ] }
     NHi = SID
     SID = Ri

   Example tree with SID list on R1:
     R1 :[ R2 :[ R5 :[ R8   ,R9   ]], R3 :[R7 :[R10,  R11]]]

   Semantics:
     R1 replicates to neighbors R2, R3.
     R2 replicates to R5
     R3 replicates to R7
     ...

   Encoding structure:
     1 byte SID always followed by
     1 byte length of the recursive structure (":[" in example)
       If no recursive structure follows, length is 0.

   Example SID list serialization (decimal):

     R1 :[ R2 :[ R5 :[ R8   ,R9   ]], R3 :[ R7 :[R10,  R11 ]]]
      |  |  |  |  |  |  | |   | |      | |   | |   | |   | |
      v  v  v  v  v  v  v v   v v      v v   v v   v v   v v

      ..........SIDs according to above example..........
      |     |     |     |     |        |     |     |     |
     01 16 02 06 05 04 08 00 09 00    03 06 07 04 10 00 11 00
         |     |     |    |     |        |     |     |     |
         ......................Length fields................

   Tree with SID list on R2:
     R2 :[ R5 :[ R8   ,R9   ]]

                 Figure 4: Example RTS structure with SIDs

   In the simplified RTS tree representation in Figure 4,
   Rx:[NH1,... NHn] indicates that Rx needs to replicate the packet to
   NH1, NH2 up to NHn.  This [NH1,... NHn] list is called the SID-list.
   Each NH can again be a "recursive" structure Rx:[NH1',...NHn'], such
   as R5, or a leaf, such as R8, R9, R10, R11.

   A simplified RTS serialization of this structure for the packet
   header is also shown: Each router Ri is represented by an 8-bit SID
   i.  The length of the following SID list, :[NH1,...NHn], is also
   encoded in one byte.  If no SID list follows, it is 00.

   When a packet copy is made for a next-hop, only the relevant part of
   the structure is kept in the header as shown for R2.
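
   The simplified serialization of Figure 4 and the per-copy extraction
   can be sketched in a few lines of Python.  This is an illustration of
   the simplified 1-byte SID / 1-byte length layout only, not of the
   actual RTS encoding; the (sid, children) tuple representation and the
   function names are assumptions made for this sketch.

```python
def encode(tree):
    """Serialize (sid, [subtrees]) as: 1-byte SID, 1-byte length, sub-structures."""
    sid, children = tree
    body = b"".join(encode(child) for child in children)
    if not 0 <= sid <= 255 or len(body) > 255:
        raise ValueError("SID or recursive structure does not fit into one byte")
    return bytes([sid, len(body)]) + body


def next_hop_structures(ru):
    """Split the SID list of an encoded structure into one structure per next-hop."""
    body = ru[2:2 + ru[1]]
    parts, offset = [], 0
    while offset < len(body):
        end = offset + 2 + body[offset + 1]  # SID byte + length byte + sub-structure
        parts.append(body[offset:end])
        offset = end
    return parts


# Example tree of Figure 4, as seen on R1.
tree = (1, [(2, [(5, [(8, []), (9, [])])]),
            (3, [(7, [(10, []), (11, [])])])])
header = encode(tree)
print(" ".join(f"{b:02d}" for b in header))

# The copy sent towards R2 keeps only R2's part of the structure.
for part in next_hop_structures(header):
    print(" ".join(f"{b:02d}" for b in part))
```

   Note that a replicating node only has to read the SID and length byte
   of each next-hop structure to find its boundaries; it never parses
   the deeper levels.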

   Example tree with bitstrings on R1:
     BS1 :[ BS2 :[ BS5 :[ BS8,  BS9  ]], BS3  :[BS7 :[BS10, BS11]]]

   Example bitstring serialization (decimal):

      ....List of next-hops indicated by the BitStrings.........
      |       |    |       |     |        |      |       |     |
     R2,R3   R5   R8,R9   Rcv   Rcv      R7     R10,R11 Rcv   Rcv
      |       |    |       |     |        |      |       |     |
     06 16   02 06 05 04  01 00 01 00    02  06 06  04  01 00 01 00
         |       |     |      |     |         |      |      |     |
         ......................Length fields.......................

   Example tree with bitstrings on R2:
     BS2 :[ BS5 :[ BS8,  BS9  ]]

              Figure 5: Example RTS structure with bitstrings

   Instead of enumerating for each router the list of next-hop neighbors
   by their number (SID), RTS can also use a bitstring on each router,
   resulting in a potentially more compact encoding.  A scalability
   comparison of the two encoding options is discussed later in the
   document.  Unlike BIER/BIER-TE bitstrings, each of these bitstrings
   will be small, as it only needs to indicate the direct neighbors of
   the router for which the bitstring is intended.

   In Figure 5, the example tree is shown with this bitstring encoding,
   also simplified over the actual RTS encoding.  BSi indicates the
   bitstring for Ri as an 8-bit bitstring.  On R8, R9, R10, R11 this
   bitstring has bit 1 set, which indicates that these routers should
   receive ("Rcv") and decapsulate the packet.

2.1.6.  Summary and Benefits of RTS

   In BIER for large networks, even a small number of receivers may not
   fit into a single packet header, as in the aforementioned example of
   10,000 receiver routers with a bitstring size of 256.  BIER always
   requires processing the whole bitstring, bit-by-bit, so longer
   bitstrings may exceed the ability of routers to process them, even if
   the bitstring would fit into the processable packet header memory of
   the router.

   In BIER-TE, these problems are even more pronounced because the
   bitstrings now need to also carry bits for the intermediate node
   hops, which are necessary whenever the path for a packet needs to be
   explicitly predetermined such as in traffic engineering and global
   network capacity optimization through non-equal cost load-balancing,
   which in unicast is also a prime reason for deployment of Segment
   Routing.

   These scalability problems in BIER and BIER-TE can be reduced by
   intelligent allocation of bits to bitstrings, but this requires
   global coordination, and for best results good predictions of the
   most important required future multicast trees.

   In RTS, no such network wide intelligent assignment of addresses is
   required, and any combination of receiver routers can be put into a
   single packet header as long as the maximum size of the header is not
   exceeded (including of course the intermediate nodes along the path).

   Unlike BIER/BIER-TE, the RTS header can likely be larger than a
   BIER/BIER-TE bitstring on many platforms, because the router never
   needs to examine every bit in the header, but only the (local)
   bitstring or list of SIDs for this router itself.  For each copy to a
   neighbor, it then only needs to copy the recursive structure for that
   neighbor.  The only significant limit for RTS in processing is hence
   the maximum number of bytes in a header that can be addressed.

3.  Architecture

   This version of the document does not specify an architecture for
   RTS.

   The forwarding described in this document can allow different
   architectures, also depending on the encapsulation chosen.  The
   following high-level architectural considerations and possible goals/
   benefits apply:

   (A) If embedding RTS in an IP or IPv6 source-routing extension
   header, RTS can provide source-routing to eliminate stateful (IP)
   Multicast hop-by-hop tree building protocols such as PIM.  This can
   be particularly attractive in use cases that previously used
   end-to-end IP Multicast without a more complex P/PE architecture,
   such as enterprise, industrial and other non-SP networks.

   (B) The encoding of the RTS multicast tree in the packet header makes
   it natural to think about RTS providing a multicast "Segment Routing"
   architecture style service with stateless replication segments: Each
   recursive structure is an RTS segment.

   This too can be a very attractive type of architecture to support,
   especially for networks that already use MPLS or IPv6 Segment Routing
   for unicast.  Nevertheless, RTS can also be beneficial in SP networks
   not using unicast Segment Routing, and there are no dependencies for
   networks running RTS to also support unicast SR, other than sharing
   architecture concepts.

   (C) RTS naturally aligns with many goals and benefits of BIER and
   even more so BIER-TE, which it could most easily supersede for better
   scalability and ease of operations.

   In one possible option, the RTS header specified in this document
   could even replace the bitstring of the BIER [RFC8296] header,
   keeping all other aspects of BIER/BIER-TE reusable.  In such an
   option, the architectural aspects of RTS would be derived and
   simplified from [RFC9262], similar to details described in
   [I-D.eckert-bier-cgm2-rbs-01].

4.  Specification

4.1.  RTS Encapsulation

   +----------+--------+------------+
   | Encap    | RTS    | Next Proto |
   | Header(s)| Header | Payload    |
   +----------+--------+------------+

                        Figure 6: RTS encapsulation

   This document specifies the formatting and functionality of the
   "Recursive Tree Structure" (RTS) Header, which is assumed to be
   located in a packet between some Encap Header and some Next Proto /
   Payload.

   The RTS header contains only elements to support replication to next-
   hops, not any element for forwarding to next-hop.  This is left as a
   task for the Encap Header so that RTS can most easily be combined
   with potentially multiple alternative Encapsulation Header(s) for
   different types of network protocols or deployment use cases.  Common
   Encap Headers will also require an Encap Header specific description
   of the total length of the RTS Header.

   In a minimal (theoretical) example, RTS could be used on top of
   Ethernet with an ethertype of RTS+Payload, which indicates not only
   that an RTS Header follows, but also the type of the Next Proto
   Payload.

   See the encap discussions in Section 5.2 for considerations regarding
   BIER or IPv6 extension headers as Encap Headers.

4.2.  RTS Addressing

   Addresses of next-hops to which RTS can replicate are called RTS
   Segment IDentifiers (SIDs).  This is re-using the terminology
   established by [RFC8402] to be agnostic of the addressing of the
   routing underlay used for forwarding to next-hops and obtaining
   routing information for those routing underlay addresses.  Specifying
   an encapsulation for RTS requires specifying how to map RTS SIDs to
   the addresses used by that (unicast) forwarding mechanism.

   RTS SIDs are more accurately called RTS replication SIDs.  They are
   assigned to RTS nodes.  When a packet is directed to a particular RTS
   SID of an RTS node, that node then needs to process the RTS Header
   and perform replication according to it.

   Using the SR terminology does not mean that RTS is constrained to be
   used with forwarding planes for which (unicast) SR mappings exist
   (IPv6 and MPLS), but it means that for other forwarding planes,
   mappings need to be defined.  For example, when using RTS with
   [RFC8296] encapsulation, and hence BIER addressing, which relies on
   16-bit BFR-id addressing (especially the BFIR-id in the [RFC8296]
   header), RTS SIDs need to map to these BFR-ids.

   If instead RTS is to be deployed with (only) an IPv6 extension header
   as the Encap Header, then RTS SIDs need to be mapped to IPv6 SIDs.

   This document uses three types of RTS SIDs to support three types of
   encoding of next-hops in an RTS Header: Global, Local and Local
   bitstring RTS SIDs.

   All SIDs map to a unicast address or unicast SID of the node which
   the RTS SID addresses.  This unicast address or SID is used in an
   Encap Header when sending an RTS packet to that node.

   SIDs can be local or global in scope.  All nodes only have one RTS
   SID table, and it is purely a matter of the semantics assigned to a
   SID whether it is local or global (as it is in SR).

   There are two encoding lengths for SIDs, 10 and 18 bits.  It may
   hence be beneficial to fit all local SIDs into the lower 10 bits of
   the SID address table so they can use the so-called short SID
   encoding (10 bits).
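
   The choice between the two encoding lengths can be sketched as
   follows.  The sketch assumes that a SID is a plain unsigned index
   into the single SID table and that every value of the 10-bit and
   18-bit ranges is usable; the function name is hypothetical.

```python
SHORT_SID_BITS = 10
LONG_SID_BITS = 18


def sid_encoding_bits(sid: int) -> int:
    """Return the shortest SID encoding length (in bits) that can carry `sid`."""
    if sid < 0:
        raise ValueError("SID must be non-negative")
    if sid < (1 << SHORT_SID_BITS):
        return SHORT_SID_BITS
    if sid < (1 << LONG_SID_BITS):
        return LONG_SID_BITS
    raise ValueError("SID does not fit into the long (18-bit) encoding")


# SIDs kept in the lower 10 bits of the table can use the short encoding.
for sid in (7, 1023, 1024, 200_000):
    print(sid, sid_encoding_bits(sid))
```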

   The network is expected to make SID information available to the
   creators of RTS headers so they can create one or more RTS headers to
   achieve the desired replication tree(s) for a payload.  This
   includes:

   *  Global SIDs and the nodes they map to.

   *  Semantics of local SIDs on each node to optimize RTS headers by use
      of local SIDs.

   *  Capabilities of each node to understand which subset of the RTS
      syntax can be encoded in the so-called "Recursive Unit" for this
      node.

   *  For each node its "all-leaf-neighbors" list of global SIDs (see
      {#all-leaf-neighbors})

4.3.  RTS Header

   +--------------------------------------------+
   | RU0                                        |
   |+---------++--------++-------+     +-------+|
   ||RU-Params|| RU-NH1 ||RU-NH2 | ....|RU-NHn ||
   |+---------++--------++-------+     +-------+|
   |           |<------- RU-list (optional) -->||
   +--------------------------------------------+

                            Figure 7: RTS Header

   The RTS Header consists of a recursive structure of "Recursive Units"
   abbreviated "RU"s.  The whole header itself is called RU0.  Every RU
   is composed of RU-Params and an optional list of RUs for next-hops
   called the RU-list.

4.3.1.  RU-Params

 0 1 2 3 4 5                 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7   BSL * 8    rle(RULL)
+-+-+-+-+-+-+- ............ +- ........... -+- ........... -+- ....... -+- ...
|b|d|S|L|B|R| SID (3/10/18) | RULL (0/8)    |    BSL  | SD  | BitString | RU-List
+-+-+-+-+-+-+- ............ +- ........... -+- ........... -+- ....... -+- ...
 |<-Flags ->|

                         Figure 8: RU-Format

   Flags:

   b)roadcast: The node should broadcast copies of this packet to all
   direct leaf neighbors if set to 1, else not

   d)eliver: The node should receive a copy of the packet itself if set
   to 1, else not.

   S)ID: A SID is present if set to 1, else not.  If no SID is present,
   the flags are followed by pad bits so that the following field will
   start at a byte boundary.

   L)ength: The SID is "long" (18 bits) if set to 1, else "short" (10
   bits).

   B)itString: If set to 1, a BitString is present as well as the BSL/SD
   fields, else none of the three fields is present.

   R)U-List: If set to 1, an RU-List and its length field, the RULL, are
   present, else neither is present.
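
   The six flags can be decoded from the first byte of an RU as sketched
   below.  The sketch assumes that bit 0 of Figure 8 (the b flag) is the
   most significant bit of that byte; the names RuFlags, parse_flags and
   sid_field_bits are hypothetical.

```python
from typing import NamedTuple


class RuFlags(NamedTuple):
    b: bool  # broadcast to all direct leaf neighbors
    d: bool  # deliver a copy to this node
    S: bool  # a SID is present
    L: bool  # the SID is long (18 bits) rather than short (10 bits)
    B: bool  # BitString and BSL/SD fields are present
    R: bool  # RU-List and RULL are present


def parse_flags(first_byte: int) -> RuFlags:
    # Assumption: flag "b" is the most significant bit of the first byte.
    return RuFlags(*(bool(first_byte & (0x80 >> i)) for i in range(6)))


def sid_field_bits(flags: RuFlags) -> int:
    """Width of the SID field; without a SID, pad bits follow the flags instead."""
    if not flags.S:
        return 0
    return 18 if flags.L else 10


flags = parse_flags(0b00110000)  # S=1 and L=1: a long SID follows the flags
print(flags, sid_field_bits(flags))
```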

   SIDs:

   As described by the S)ID and L)ength flags, when present, the SID can
   be 10 or 18 bits long.  These are called "short" and "long" SIDs.
   They both address the same single SID table in the node.  Short SIDs
   are just an encoding optimization.  Any SID may be "local" or
   "global".  A SID is "global" when it points to the same node
   consistently.  It is "local" when each node has its own, potentially
   different meaning for it.  Typically, SIDs fitting into a short SID
   will preferably be local SIDs such as those pointing to direct
   (subnet) neighbors.

   BitString / BSL / SD:

   The BitString Length field (BSL) indicates the length of the
   BitString field in bytes.  It permits length of 0-32 bytes.  SD is a
   SubDomain, and allows up to 8 different BitStrings for each length.

   RU-List / RULL:

   RU-List Length (RULL) is the length of the RU-List field.  To allow
   RU-Lists longer than 255 bytes, RULL uses a simple encoding: RULL
   values <= 127 indicate the length of the RU-List in bytes, values >=
   128 indicate a length of (127 + (RULL - 127) * 4) bytes.  This allows
   RU-List lengths of up to 639 bytes (RULL = 255) at the expense of up
   to 3 padding bytes at the end of an RU-List, which are required when
   encoding RU-Lists with a length > 127 bytes.
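
   The RULL encoding can be sketched in Python as follows, taking the
   formula exactly as written above.  The function names are
   hypothetical.  With this formula, RULL = 128 decodes to 131 bytes and
   RULL = 255 decodes to 639 bytes.

```python
def decode_rull(rull: int) -> int:
    """Length of the RU-List in bytes for a given 8-bit RULL value."""
    if not 0 <= rull <= 255:
        raise ValueError("RULL is an 8-bit field")
    return rull if rull <= 127 else 127 + (rull - 127) * 4


def encode_rull(length: int) -> tuple:
    """Return (RULL, padding bytes) for an RU-List of `length` bytes."""
    if length < 0:
        raise ValueError("length must be non-negative")
    if length <= 127:
        return length, 0
    steps = -(-(length - 127) // 4)  # ceiling division by 4
    rull = 127 + steps
    if rull > 255:
        raise ValueError("RU-List too long for the RULL encoding")
    return rull, decode_rull(rull) - length


for length in (100, 128, 131, 300, 639):
    rull, padding = encode_rull(length)
    print(length, rull, padding)
```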

   Note: RULL is placed before BSL/SD and BitString so that the offset
   of RULL inside of RU is fixed and does not depend on the length of
   BitString (if present).  This is beneficial because both SID and RULL
   need to be parsed by prior-hop nodes when this RU is in an RU-list.
   See the following explanations for more details.

4.3.2.  RU-Params Parsing and Semantics

   An RU representing some specific nodeRU is parsed up to two times.
   The first time, the RU may be present in the RU-List of the prior-hop
   node to nodeRU, and the second time it is parsed by nodeRU itself.

4.3.2.1.  Bitstring replication to leaf nodes

   Figure 9 shows the simplest example with bitstrings, the RU for a
   node that should use a local BitString to replicate only to leaf
   nodes.

    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5  BSL * 8
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- ....... -+
   |0|0|0|0|1|0|0|0|    BSL  | SD  | BitString |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- ....... -+
    b d S L B R p p
    |<-Flags ->|

               Figure 9: BitString replication to leaf nodes

   We assume the node is just a transit node, so the packet is neither
   to be broadcast to all leaf neighbors nor received, hence b=0, d=0.

   The RU has no SID, so S=0, L=0, and hence the flags are followed by
   padding bits set to 0.

   B=1 indicates a bitstring follows and R=0 indicates that the RU has
   no RU-list, and hence no RULL.  If the node's packet parser requires
   any of the absent optional fields to be present, then those would
   have to be included and set to 0.

   Note that this RU can only be used if this is the root of the
   tree or the prior hop node was also replicating to this node via a
   BitString.  If the prior hop node was replicating via a SID-List,
   then this RU would include a SID, but that SID would be ignored,
   because it was only included for the benefit of the prior hop node.

   BSL and SD indicate the BitString and its length.
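
   The layout of Figure 9 can be illustrated with a minimal sketch.  It
   assumes that BSL is a 5-bit field counting the BitString in bytes and
   that SD is a 3-bit field, as the field widths and the "BSL * 8" label
   in the figure suggest.  The function name and its parameters are
   hypothetical and not part of this specification.

```python
def encode_bitstring_ru(bitstring: bytes, sd: int = 0,
                        b: bool = False, d: bool = False) -> bytes:
    """Build an RU that carries only a local BitString (no SID, no RU-List)."""
    bsl = len(bitstring)  # assumption: BSL counts the BitString in bytes
    if not 1 <= bsl <= 31:
        raise ValueError("BitString length must fit the 5-bit BSL field")
    if not 0 <= sd <= 7:
        raise ValueError("SD must fit the 3-bit field")
    # Flag bits from most to least significant: b d S L B R p p.
    # S=0, L=0 (no SID), B=1 (BitString follows), R=0 (no RU-List).
    flags = (int(b) << 7) | (int(d) << 6) | (1 << 3)
    return bytes([flags, (bsl << 3) | sd]) + bitstring


# Transit node (b=0, d=0) replicating with a 2-byte local BitString.
ru = encode_bitstring_ru(bytes([0b10100000, 0b00000001]))
print(ru.hex())
```

   The first byte carries the flags of Figure 9, the second byte carries
   BSL and SD, and the BitString follows unchanged.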

   When the node replicates to the bits in the BitString, it operates
   much like BIER-TE: each bit indicates an adjacency, most likely to a
   direct, subnet-local neighbor.  For every bit set, a copy is made.

   For the copy to a leaf neighbor, the RTS packet is rewritten to a
   simple form shown in Figure 10.




    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |b|d|0|0|0|0|0|0|
   +-+-+-+-+-+-+-+-+
    b d S L B R p p
    |<-Flags ->|

                           Figure 10: RU for leaf

   The whole RTS header consists of a single-byte RU indicating simply
   whether the receiving node should b)roadcast and/or d)eliver the
   packet.  Both bits are derived from the BIFT (Bit Index Forwarding
   Table) of the replicating node.
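
   The per-bit replication and the rewrite to the leaf RU of Figure 10
   can be sketched as follows.  The BiftEntry structure, the adjacency
   names and the bit numbering (bit 0 is the most significant bit of the
   first BitString byte) are assumptions made for illustration only.

```python
from typing import List, NamedTuple, Tuple


class BiftEntry(NamedTuple):
    adjacency: str  # hypothetical name of the neighbor behind this bit
    b: bool         # neighbor should broadcast to its own leaf neighbors
    d: bool         # neighbor should deliver the packet


def leaf_ru0(entry: BiftEntry) -> bytes:
    """Single-byte RU of Figure 10: only b and d can be set."""
    return bytes([(int(entry.b) << 7) | (int(entry.d) << 6)])


def replicate_to_leaves(bitstring: bytes,
                        bift: List[BiftEntry]) -> List[Tuple[str, bytes]]:
    """Return (adjacency, RU0) for every bit set in the BitString."""
    copies = []
    for index, entry in enumerate(bift):
        byte, bit = divmod(index, 8)
        # Assumption: bit 0 is the most significant bit of the first byte.
        if byte < len(bitstring) and bitstring[byte] & (0x80 >> bit):
            copies.append((entry.adjacency, leaf_ru0(entry)))
    return copies


bift = [BiftEntry("leaf-a", False, True),
        BiftEntry("leaf-b", False, True),
        BiftEntry("agg-c", True, True)]
copies = replicate_to_leaves(bytes([0b10100000]), bift)
```

   In this example, bits 0 and 2 are set, so one copy is created for
   "leaf-a" with d=1 and one for "agg-c" with b=1 and d=1.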

4.3.2.2.  SID-list replication to leaf nodes

   Replication to a list of SIDs on a node requires an RU on the node as
   shown in Figure 11.

                        1                   2
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ...........
   |0|0|0|0|0|1|0|0| RULL (8)      | RU-List
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ...........
    b d S L B R p p
    |<-Flags ->|

                Figure 11: RU-List replication to leaf nodes

   The SIDs of nodes to replicate to are always inside the neighbor's
   RU, hence SID-list replication means replication with an RU-List and
   hence also an RULL field.

   The b and d fields of the replicating node's RU itself determine only
   whether it should also broadcast and/or receive the packet, so their
   setting is irrelevant to the RU-List operation.  We assume in this
   example that both are 0.

   If the prior hop node was also replicating via an RU-List, this RU
   would also require a SID, but if this node is the root of the tree or
   the prior hop was replicating via a BitString, it does not require a
   SID.  We show this simpler case.  S=0, L=0 mean that instead of a
   SID, there are just two padding bits p before RULL.
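
   The RU of Figure 11 can be assembled as in the following sketch.  It
   assumes that RULL holds the length of the RU-List in bytes and only
   covers RU-Lists of up to 127 bytes; the encoding for longer RU-Lists
   is deliberately not sketched.  The function name is hypothetical.

```python
def encode_ru_with_ru_list(ru_list: bytes,
                           b: bool = False, d: bool = False) -> bytes:
    """Build an RU without SID that replicates via an RU-List."""
    if len(ru_list) > 127:
        raise ValueError("RU-Lists longer than 127 bytes are not sketched here")
    # Flag bits from most to least significant: b d S L B R p p.
    # S=0, L=0 (no SID, two padding bits), B=0 (no BitString), R=1.
    flags = (int(b) << 7) | (int(d) << 6) | (1 << 2)
    # Assumption: RULL is the RU-List length in bytes.
    return bytes([flags, len(ru_list)]) + ru_list


# Two 16-bit neighbor RUs (d=1, S=1, short SIDs 5 and 9).
neighbor_rus = bytes.fromhex("6005") + bytes.fromhex("6009")
ru = encode_ru_with_ru_list(neighbor_rus)
print(ru.hex())
```

   The resulting RU is the flags byte with only R set, followed by the
   RULL value 4 and the two neighbor RUs.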

   Assume all the neighbors to replicate to are direct neighbors, so we
   encode local SIDs that fit into short, 10-bit long SIDs.  As a
   result, the RU-list is a sequence of 16-bit long RUs, one for each
   neighbor to replicate to, as shown in Figure 12.  If a neighbor were
   a couple of hops away in the network topology, one would use a global
   SID, which would likely require the long SID encoding and hence 24
   bits for the RU.

                        1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|1|1|0|0|0| SID (10)          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    b d S L B R
    |<-Flags ->|

                      Figure 12: Leaf-node RU with SID
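
   The leaf-node RU of Figure 12 can be encoded as in the sketch below.
   The short form (six flag bits plus a 10-bit SID) follows the figure.
   The long form is an assumption: it is taken to be L=1 with an 18-bit
   SID so that the RU is 24 bits long.  The function name and defaults
   are hypothetical.

```python
def encode_sid_ru(sid: int, b: bool = False, d: bool = True) -> bytes:
    """Build a leaf RU that carries only flags and a SID."""
    if sid < 0:
        raise ValueError("SID must not be negative")
    if sid < (1 << 10):
        long_flag, sid_bits = 0, 10   # short SID, 16-bit RU (Figure 12)
    elif sid < (1 << 18):
        long_flag, sid_bits = 1, 18   # assumed long SID, 24-bit RU
    else:
        raise ValueError("SID does not fit the assumed long SID size")
    # Six flag bits from most to least significant: b d S L B R.
    # S=1 (SID present), B=0 (no BitString), R=0 (no RU-List).
    flags = (int(b) << 5) | (int(d) << 4) | (1 << 3) | (long_flag << 2)
    value = (flags << sid_bits) | sid
    return value.to_bytes((6 + sid_bits) // 8, "big")


short_ru = encode_sid_ru(5)       # local SID of a direct neighbor
long_ru = encode_sid_ru(70000)    # global SID of a remote node
print(short_ru.hex(), long_ru.hex())
```

   A node building an RU-List would concatenate such RUs, one for each
   neighbor to replicate to.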

   When the packet with the neighbor's RU reaches that neighbor, the RU
   is RU0, aka: the complete RTS header.

   In this encoding, unlike the encoding of the prior version of this
   document, the SID itself carries no other semantic than the node it
   is targeted for.

   When RTS is used (like BIER) as a standalone L3 network layer, this
   means that intermediate RTS capable, but not replicating nodes could
   simply unicast forward the packet to the node indicated by the RU0,
   similar to how loose hops operate in IPv6 with SRH.  But that only
   works when the SID of RU0 is a global SID.  So the replicating node
   may want to rewrite the SID in the RU to a global SID (including
   adjusting the length) when such a mode of operation is desirable.
   However, when operating RTS inside of IPv6, this loose-hop role would
   be served by an outer IPv6 header, and the RTS header would become
   another routing header.  The SID then no longer serves a purpose and
   can be replaced by a special value such as 0, or the RU can be
   rewritten to not include a SID but just 2 bits of padding.

   Likewise, if the SID used in the RU encoding is a local SID, then its
   semantic will likely be different on the receiving node, so if it is
   needed for loose hop routing, it would need to be rewritten to a
   global SID.  If used inside IPv6 or another network layer, local SIDs
   in RU0 should be removed or replaced by 0.

4.3.2.3.  RU with SID-List and BitString

   When would an RU require both a SID and a BitString?  This is exactly
   the case when the prior-hop node was replicating based on an RU-List,
   because this requires a SID in the next-hop RU, but that next-hop
   node itself should replicate with a BitString.  In this case, the RU
   for that node would include both a SID for the benefit of processing
   by the prior-hop node, and a BitString (including BSL/SD) for its own
   replication.

   As mentioned before, the SID itself has no relevance anymore after
   having been processed by the prior-hop replication engine, and might
   be removed in the packet copy to the target node.  But in the header
   as encoded for the root of the RTS tree, both SID and BitString would
   need to be in the RU.

4.3.2.4.  BitString replication to non-leaf neighbors

   When replicating to non-leaf neighbors via BitString, an RU-List
   parameter is necessary in the RU of the replicating node.  This
   raises the question of how the replicating node can know whether to
   replicate to a leaf or a non-leaf neighbor.

   There are two options for this:

   1.  In the BIFT of the replicating node, each bit has a flag
       indicating whether the node indicated by the bit is always a leaf
       node or potentially also a non-leaf node.  In this option, there
       needs to be an RU-list in the RU with an entry for each bit set
       in the BitString that the BIFT does not flag as always being a
       leaf node.  Each such RU-list entry can be a simple 8-bit RU
       without SID or BitString when that neighbor is addressed without
       the need for a larger RU.

   2.  There is no such flag in the BIFT.  In this case, all neighbors
       always require an RU, which can equally be as short as 8 bits.
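
   The two options can be illustrated with one sketch that pairs the
   bits set in the BitString with RU-List entries.  It assumes that the
   RU-List has already been split into individual RUs and that bit 0 is
   the most significant bit of the first BitString byte.  All names are
   hypothetical.  Option 2 corresponds to passing True for every bit.

```python
from typing import List, Optional, Tuple


def pair_bits_with_rus(bitstring: bytes,
                       may_be_non_leaf: List[bool],
                       ru_list: List[bytes]
                       ) -> List[Tuple[int, Optional[bytes]]]:
    """Return (bit index, RU or None) for every bit set in the BitString."""
    remaining = iter(ru_list)
    result: List[Tuple[int, Optional[bytes]]] = []
    for index, flagged in enumerate(may_be_non_leaf):
        byte, bit = divmod(index, 8)
        is_set = byte < len(bitstring) and bitstring[byte] & (0x80 >> bit)
        if not is_set:
            continue
        if not flagged:
            # Always-leaf neighbor: no RU-List entry is consumed.
            result.append((index, None))
            continue
        ru = next(remaining, None)
        if ru is None:
            raise ValueError("RU-List has fewer entries than flagged bits")
        result.append((index, ru))
    if next(remaining, None) is not None:
        raise ValueError("RU-List has more entries than flagged bits")
    return result


pairs = pair_bits_with_rus(bytes([0b11010000]),
                           [False, True, False, True],
                           [b"\x80", b"\x08\x08\x80"])
```

   Here bit 0 addresses an always-leaf neighbor and consumes no RU,
   while bits 1 and 3 each consume the next RU from the RU-List.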

4.3.2.5.  Functional subsets on individual nodes

   Less capable forwarding engines may support only a subset of the
   encoding and then need to be able to discover from the flag-bits
   whether the RU destined for it contains an unsupported option and
   then stop processing the packet and raise an error, e.g.: ICMP.

   One set of limitations that seems to be necessary to fit functionally
   limited forwarding engines could include all or a subset of the
   following:

   o Only one size of SID can be supported.  In this case, the long SID
   format should be supported, unless the type of device is known to
   always be leaf nodes in topologies, in which case only short SIDs may
   be supported.

   o The variability of including an RU-list cannot be supported well.
   In this case, the RU for such a node may need to always include an
   RULL field with a value of 0.



   o The node cannot support BitString replication for non-leaf
   neighbors.  In this case, an RU with a BitString cannot have the
   R-bit set and cannot include an RU-List.  For common cases where this
   would be an issue, workarounds may be devised.  For example, the node
   may have a local SID pointing back to itself, so that the node is
   encoded as 2 RUs: The first RU uses RU-List replication, and one of
   the RUs in its RU-List is for the local SID of the node itself,
   causing the packet to recirculate; that RU would then use a BitString
   (without RU-List).

   Because there are no "global" aspects to the parsing of RUs,
   limitations on one type of node have only limited impact on the
   network-wide ability to construct trees.
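
   How a limited forwarding engine might discover an unsupported option
   from the flag bits can be sketched as follows.  The Capabilities
   structure and the error strings are hypothetical, and the sketch
   assumes that L=1 selects the long SID format.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Capabilities:
    long_sid: bool = True                # assumed meaning of S=1, L=1
    bitstring_with_ru_list: bool = True  # BitString to non-leaf neighbors


def decode_flags(first_byte: int) -> Dict[str, bool]:
    """Map the six leading bits of an RU to the flag names b d S L B R."""
    return {name: bool(first_byte & (0x80 >> i))
            for i, name in enumerate("bdSLBR")}


def check_ru(first_byte: int, caps: Capabilities) -> Optional[str]:
    """Return an error text if the RU uses an unsupported option."""
    flags = decode_flags(first_byte)
    if flags["S"] and flags["L"] and not caps.long_sid:
        return "long SID not supported"
    if flags["B"] and flags["R"] and not caps.bitstring_with_ru_list:
        return "BitString with RU-List not supported"
    return None


limited = Capabilities(bitstring_with_ru_list=False)
error = check_ru(0b00001100, limited)   # B=1 and R=1
```

   A node receiving a non-empty result would stop processing the packet
   and raise an error as described above.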

4.3.3.  Creating and Receiving copies

   RTS relies on unicast forwarding procedures using the Encap Header(s)
   to receive packets and send copies of packets.  Every copy of a
   packet created, except for those that are for local reception by a
   node, is sent towards a unicast address/SID according to the RTS SID
   it addresses.

   The RU0 Flags are responsible for distinguishing the encoding of the
   following RU parameters, but they also provide the bits used for
   processing by so-called "leaves" of an RTS tree, where packets need
   to be delivered and/or broadcast to all "leaf" neighbors (where they
   are then delivered).

4.3.4.  Creating copies because of RTS Header d=1

   When d=1 is encountered in RU0, an (internal) copy of the packet is
   created in which the headers up to the RTS Header are disposed of
   according to the procedures specified for Encap Header(s) so that the
   Next Proto Payload after the RTS Header is processed.

4.3.5.  Creating copies because of RTS Header b=1

   When b=1 is set in RU0 flags, a list of unicast addresses/SIDs called
   the "all leaf neighbors" is used to create a separate copy of the
   packet for each element in that list.  Each RTS node MAY have such a
   list.

   For each packet copy made because of b=1, the RU0 for that neighbor
   is set with all Flags to 0 except for d=1.  The RU0 for each such
   neighbor is thus 8 bit long.  Typically, the "all-leaf-neighbors"
   list is (auto-)configured with the list of RTS L2 neighbors that are
   known to be leaves of the RTS domain.
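
   The copies created because of b=1 can be sketched as follows.  The
   sketch assumes that each copy is meant to be delivered by the leaf
   neighbor, so that its 8-bit RU0 has only d set.  Encap Header
   handling is omitted and all names are hypothetical.

```python
from typing import List, Tuple

# Flag bits b d S L B R p p with only d=1 (assumption: the leaf delivers).
LEAF_DELIVER_RU0 = bytes([0b01000000])


def broadcast_copies(payload: bytes,
                     all_leaf_neighbors: List[str]
                     ) -> List[Tuple[str, bytes]]:
    """Create one copy per entry of the "all leaf neighbors" list."""
    return [(neighbor, LEAF_DELIVER_RU0 + payload)
            for neighbor in all_leaf_neighbors]


copies = broadcast_copies(b"data", ["leaf-1", "leaf-2"])
```

   A node without an "all leaf neighbors" list would pass an empty list
   and create no copies.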




5.  Discussion

5.1.  Encoding Efficiency vs. decoding challenges

   One core challenge for RTS is encoding efficiency, aka: the size of
   the header required to address a particular number of leaves or,
   with a defined maximum header size, the maximum number of leaves that
   can be addressed.

   One core encoding trick to increase encoding efficiency is to not
   indicate the semantic of variable fields but derive encoding from
   prior field or state knowledge.

   One example for this is not to encode the length of a local bitstring
   but assume that this bitstring length is consistently known between
   the processing node and the node encoding the bitstring, such as
   ingress-PE, controller or sender application.

   Consistent understanding of such control plane information to
   correctly encode packet headers already exists as a requirement in
   other headers, specifically MPLS and SR-MPLS and SRv6/SRH.  The
   semantic of labels/SID is not necessarily global and static in nature
   but often also local and learned through control plane.  If that
   control plane information is inconsistent, then encoding nodes may
   create incorrect headers that will be incorrectly processed.

   For RTS, one additional challenge is when such consistent control
   plane information implies the length of fields, such as in the
   bitstring example.  To reduce the problem space to what has been
   accepted in unicast solutions such as MPLS/SR, it may hence be
   necessary to explicitly include lengths for all variable-length
   fields.  In the opinion of the authors, this is an open issue to
   investigate further.

   Instead of expanding the size of headers because of this issue, it
   may be possible and beneficial to look into control plane solutions
   that avoid this requirement.  This could be based on the following
   mechanisms:

   o Packets are forwarded only as L2 unicast with explicit addressing
   of destinations to prohibit hop-by-hop mis-delivery, such as using
   L2 Ethernet with globally unique Ethernet MAC destination addresses.
   This is what the solution currently assumes.








   o Mechanisms for the control plane to distribute the relevant
   information (such as lengths of bitstrings, semantics of SIDs) not
   only in an "eventual consistency" way, but in a "strict consistency"
   way.  Aka: all nodes in the domain that can be addressed by RTS trees
   are known to have consistent control plane information relevant to
   the consistent encoding/decoding.

   o Mapping this "strict consistency" encoding to a numeric "control
   plane version" value that can be carried as a new field in headers.

   Such an approach may not be sufficient to answer all questions, such
   as how to change control plane information upon planned topology
   changes, but it should suffice to ensure that whoever encodes a
   packet can add the "control plane version" field to the header, and
   any node potentially parsing this header will have either consistent
   information or not accept the indicated "control plane version".
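
   The effect of such a "control plane version" field can be sketched as
   follows.  The field itself, the Node class and the idea of keying the
   implied BitString length by version are assumptions made purely for
   illustration; no such field is defined in this document.

```python
from typing import Dict, Optional


class Node:
    """Holds, per control plane version, the implied BitString length."""

    def __init__(self) -> None:
        self.bitstring_bytes_by_version: Dict[int, int] = {}

    def install(self, version: int, bitstring_bytes: int) -> None:
        """Record the local BitString length valid for this version."""
        self.bitstring_bytes_by_version[version] = bitstring_bytes

    def parse_bitstring(self, version: int, ru: bytes) -> Optional[bytes]:
        """Return the BitString, or None if the version is not accepted."""
        length = self.bitstring_bytes_by_version.get(version)
        if length is None or len(ru) < length:
            return None
        return ru[:length]


node = Node()
node.install(7, 2)
accepted = node.parse_bitstring(7, b"\xa0\x01\xff")
rejected = node.parse_bitstring(8, b"\xa0\x01\xff")
```

   The node only derives the implicit field length for versions it has
   installed and rejects headers that indicate any other version.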

   Short of arriving at such a solution, the encoding will become longer
   due to the need of explicitly including fields for the semantic of
   following fields.  The encoding described has a range of cases where
   this option is already defined as an alternative.

   Because of these additional, not currently standard control plane
   requirements, this version of the document now includes all variable
   aspects explicitly in the encoding.

5.2.  Encapsulation considerations

5.2.1.  Comparison with BIER header and forwarding

   The RTS header is equivalent to the elements of a BIER/BIER-TE header
   required for BIER and BIER-TE replication.

   (SI, SD, BSL, Entropy, Bitstring)

   RTS currently does not specify an ECMP procedure to next-hop SIDs
   because it is part of the (unicast) forwarding to next-hops, not of
   RTS replication.

   Note that this is not the same set of header fields as [RFC8296],
   because that header contains more and different fields for additional
   functionality, which RTS would require to be in an Encap Header.

   For the same reason, the RTS Header also does not include the
   [RFC8296] fields TC/DSCP for QoS, OAM, Proto (for next proto
   identification) and BFIR-id.  Note that BFIR-id is not used by BIER
   forwarding either, but by BIER overlay-flow forwarding on BFIR and
   BFER.



   Constraining the RTS header to only the necessary fields was chosen
   to make it as easy as possible to combine it with any desirable
   encapsulation header.

   RTS could use [RFC8296] as an Encap Header and BIER/[RFC8296]
   forwarding procedures, replacing only BIER bitstring replication to
   next-hop functionality with RTS replication.

   In this case, the RTS Header could take the place of the bitstring
   field in the [RFC8296] header, using the next largest size allowed by
   BIER to fit the RTS header.  SI would be unused, and SD could be used
   to run RTS, BIER and even BIER-TE in parallel through different
   values of SD, and all BIER forwarding procedures including ECMP to
   next-hop SIDs could be used in conjunction with RTS replication.
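
   Choosing the [RFC8296] bitstring size for a given RTS header can be
   sketched as follows.  The list of lengths is the set of BitString
   lengths defined in [RFC8296]; the function name is hypothetical, and
   how the unused remainder of the field is filled is left open here.

```python
# BitString lengths in bits defined by RFC 8296 (BSL codes 1 to 7).
BIER_BITSTRING_LENGTHS = (64, 128, 256, 512, 1024, 2048, 4096)


def bier_bitstring_length_for(rts_header_bytes: int) -> int:
    """Return the smallest BIER BitString length that fits the RTS header."""
    needed_bits = rts_header_bytes * 8
    for length in BIER_BITSTRING_LENGTHS:
        if length >= needed_bits:
            return length
    raise ValueError("RTS header exceeds the largest BIER BitString length")


print(bier_bitstring_length_for(6))    # fits into 64 bits
print(bier_bitstring_length_for(9))    # needs the next size, 128 bits
```

   An RTS header longer than 512 bytes cannot be carried this way.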

5.2.2.  Comparison with IPv6 extension headers

   The RTS header could be used as a payload of an IPv6 extension
   header as similarly proposed for RBS in [I-D.eckert-msr6-rbs].  Note
   that the RTS header itself does not contain a simple length field
   that would allow a node to skip over it completely.  It is omitted
   because such functionality may not be required by all encapsulation
   headers / forwarding planes, or the format (unit) in which such a
   length is expected may differ between forwarding planes.  If
   required, such as when using the RTS header in an IPv6 extension
   header, then such a total-length field would have to be added to the
   Encap Header.

5.3.  Encoding choices and complexity

5.3.1.  SID vs bitstrings in recursive trees

   The use of SID-lists in the encoding is a natural fit when the target
   tree is one that does not require replication on many of the hops
   through which it passes, such as when doing non-equal-cost load-
   splitting, such as in capacity optimization in service provider
   networks.  In [RFC9262], Figure 2, such an example is called an
   "Overlay" (tree).  In the SID list, each SID can easily be global,
   making it possible for a next-hop to be anywhere in the network.
   While it is possible to also use global SIDs in a bitstring, the
   decision to include any global (remote) SID as a bit in a bitstring
   introduces additional encoding size cost for every tree, and not only
   the ones that would need this bit.  This is also the main issue of
   using such global SIDs in BIER-TE (where they are represented as
   forward_routed() adjacencies).






   When replicating to direct neighbors, SID lists may be efficient for
   sparse trees.  In the RTS encoding, up to 127 direct neighbors could
   be encoded in 8 bits for each SID, so it is easy to compare the
   encoding efficiency to that of a bitstring.  A router with 32
   neighbors (assume leaf neighbors for simplicity) requires 32 bits to
   represent all possible neighbors.  If 4 or fewer neighbors need to
   receive a copy, a SID-list encoding requires equal or fewer bytes to
   encode.
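
   The comparison above can be reproduced with a few lines of
   arithmetic.  The sketch uses the 8 bits per SID and one bit per
   neighbor from the text and ignores any flag or length overhead; the
   function names are hypothetical.

```python
def sid_list_bits(receivers: int, bits_per_sid: int = 8) -> int:
    """Size of a SID list addressing the given number of receivers."""
    return receivers * bits_per_sid


def bitstring_bits(neighbors: int) -> int:
    """Size of a bitstring with one bit per possible neighbor."""
    return neighbors


def break_even_receivers(neighbors: int, bits_per_sid: int = 8) -> int:
    """Largest receiver count for which the SID list is not longer."""
    return bitstring_bits(neighbors) // bits_per_sid


for receivers in (2, 4, 6):
    print(receivers, sid_list_bits(receivers), bitstring_bits(32))
print(break_even_receivers(32))
```

   For 32 neighbors the break-even point is 4 receivers: a SID list for
   4 receivers needs 32 bits, the same as the bitstring, and any larger
   receiver set is encoded more compactly by the bitstring.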

   Use of the broadcast option is equally possible with SID-list or
   bitstrings.  An initial scalability test with such an option was
   shown in slide 6 of [RBSatIETF115], but not included in any prior
   proposed encoding option; a better analysis of this option is subject
   to future work.

   The ability of the RTS encoding to mix SID lists for the initial
   part(s) of a tree with bitstrings for the leaves or tail parts of the
   tree specifically addresses the common understanding that multicast
   trees typically are sparse at the root and may also be more of the
   "overlay" type, but then tend to become denser towards the leaves.
   Even if there is the opportunity to create more replications within
   the first hops of a tree, that approach may often not result in the
   most cost-effective ("Steiner") trees.

5.3.2.  Limited per-node functionality

   The encoding proposed in this (version of the) draft is based on P4
   development of a reference proof of concept called SEET+. It does not
   exactly follow the encoding, but attempts to generalize it to support
   both more flexible and more constrained forwarding platforms.

   Because in a recursive tree an individual node only needs to parse
   the one part of the tree that is designated for it, this type of
   encoding allows forwarding engines of differing flexibility to be
   supported in the same network: The recursive units that are to be
   parsed by a less flexible node's forwarding engine simply cannot use
   all the possible options, but only those options possible on that
   forwarding engine.

   To then support such a mix of forwarding engines, the following
   architectural elements are necessary:

   1.  The control plane of every node needs to signal the subset of
       functionality it supports so that the place where the control
       plane constructs an RTS will know what limitations the node has.







   2.  The forwarding plane of nodes not supporting full functionality
       needs to be able to reliably recognize when the encoding utilizes
       a feature not supported, stop parsing/replicating the packet and
       raise an error (ICMP or similar) as necessary.

   For example, some type of forwarding engines could have the following
   set of limitations.

   The forwarding engine might not be able to process the recursive
   bitstring structure because there could be insufficient code space
   for both the recursive SID and the recursive bitstring option.  In
   this case, the limitation would be that a bitstring type RU to be
   processed by this node has to be a leaf RU that is not followed by a
   set of RUs for downstream neighbors.

   Consider such a node located in a distribution ring and serving as an
   aggregation node to a large number of leaf neighbors.  In this
   topological case, a local bitstring would be ideal to indicate to
   which of the leaf neighbors the packet has to be replicated.
   However, the packet would equally need to be forwarded to the next
   ring neighbor, and that ring neighbor would need its own RU, which
   would not be directly supportable on this type of node.

   To handle this situation, such a limited functionality node would
   assign a local SID to itself, and trees that pass through this node
   would need to first encode a SID type RU indicating the ring neighbor
   as well as the node itself.  The RU for this node's own SID would
   then be a local bitstring RU.  The node would then likely also
   process the packet twice through so-called recirculation.  In
   addition, this approach increases the size of the RTS encoding.

5.4.  Differences over prior Recursive BitString (RBS) encodings
      proposal

   The encoding for bitstrings proposed in this draft relies again on
   discarding of unnecessary RUs instead of using offset pointers in the
   header to allow parsing only the relevant RU.

   Discarding unnecessary RUs has the benefit that the total size of the
   header can be larger than if offset pointers were used.  Forwarding
   engines have a maximum amount of header that they can inspect.  With
   offset pointers, the furthest a node has to look into the RTS header
   is the actual size of the RTS header.  With discarding of unnecessary
   RUs, this maximum size for inspection can be significantly less than
   the maximum RTS header size.  Consider a root of a tree that has two
   neighbors to copy to, both with equal size RUs: this root of the tree
   only needs to inspect up to the beginning of the second RU (the SID
   or bitstring in it).



6.  Security considerations

   TBD

7.  Acknowledgments

   The local bitstrings part of this work is based on the design
   published by Sheng Jiang, Xu Bing, Yan Shen, Meng Rui, Wan Junjie and
   Wang Chuang {jiangsheng|bing.xu|yanshen|mengrui|wanjunjie2|wangchuang
   }@huawei.com, see [CGM2Design].  Many thanks to Bing Xu
   (bing.xu@huawei.com) for editorial work on the prior variation of
   that work [I-D.xu-msr6-rbs].

8.  Changelog

8.1.  -00

   This version was derived from merging the [SEET] and RBS proposals
   (called BEET in the [SEET] presentation) into a single encoding
   mechanism.  SEET is a proposal for per-tree-hop replication with SID-
   Lists, RBS is a proposal for replication with per-hop local
   BitStrings.  Both embody the same idea of Recursive Units to
   represent each hop in the tree, but the content of these recursive
   Units is different whether SID-Lists or BitStrings are used.

   Because the processing of recursive structures is directly impacted
   and limited by the capabilities of forwarding planes, and because by
   the time of writing this -00 draft, only separate SEET and separate
   RBS reference Proof Of Concept implementations existed, this version
   proposes to support only separate trees: a tree constructed only
   from SID-Lists, or a tree constructed only from local BitStrings.
   Each packet needs to choose which format to use.

8.2.  -01

   After the -00 version, reference Proof Of Concept work was done to
   investigate whether it was possible in a single forwarding plane to
   support trees that can have both SID-lists and local BitStrings.  The
   main reason for this was that supporting both options as ships in the
   night turned out to cost too much code space on limited forwarding
   plane engines.

   As a result, the reference limited forwarding plane engine could be
   made to support trees with a limited mix of SID-list replication and
   local BitStrings.  Local BitStrings could only be used on hops prior
   to leaves, i.e.: no further local BitString replication was possible
   (single stage).




   RTS -01 is now a proposal that generalizes on these PoC experiences
   by proposing a format for recursive units that allows implementing
   the limited functionality possible on limited capability forwarding
   engines, but that also allows implementing arbitrary mixing of
   per-hop use of SID-List and BitString replication on forwarding
   planes that are more capable.  The encoding is also chosen so that
   nodes can easily recognize if the packet embodies an encoding option
   not supported by them and reject the packet.

   In addition, the -00 version of RTS did embody a range of
   optimizations for shorter encoding length by avoiding packet header
   fields that contain information such as parameter length that could
   be assumed to be known by both the generator and consumer of the
   header element (RU).  This has been removed in this version of the
   proposal and replaced by integrating all variable length field
   information as well as other interpretation semantics explicitly
   into the header encoding.  This change is not necessarily the final
   conclusion of this issue, but the document lays out what other
   control plane requirements would first have to be built to be able to
   have more compact encodings, and those requirements are not commonly
   supported by the most widely used control planes.

9.  References

9.1.  Normative References

   [RFC6554]  Hui, J., Vasseur, JP., Culler, D., and V. Manral, "An IPv6
              Routing Header for Source Routes with the Routing Protocol
              for Low-Power and Lossy Networks (RPL)", RFC 6554,
              DOI 10.17487/RFC6554, March 2012,
              <https://www.rfc-editor.org/rfc/rfc6554>.

   [RFC8200]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", STD 86, RFC 8200,
              DOI 10.17487/RFC8200, July 2017,
              <https://www.rfc-editor.org/rfc/rfc8200>.

   [RFC8279]  Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A.,
              Przygienda, T., and S. Aldrin, "Multicast Using Bit Index
              Explicit Replication (BIER)", RFC 8279,
              DOI 10.17487/RFC8279, November 2017,
              <https://www.rfc-editor.org/rfc/rfc8279>.

   [RFC8296]  Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A.,
              Tantsura, J., Aldrin, S., and I. Meilik, "Encapsulation
              for Bit Index Explicit Replication (BIER) in MPLS and Non-
              MPLS Networks", RFC 8296, DOI 10.17487/RFC8296, January
              2018, <https://www.rfc-editor.org/rfc/rfc8296>.



   [RFC8402]  Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L.,
              Decraene, B., Litkowski, S., and R. Shakir, "Segment
              Routing Architecture", RFC 8402, DOI 10.17487/RFC8402,
              July 2018, <https://www.rfc-editor.org/rfc/rfc8402>.

   [RFC8754]  Filsfils, C., Ed., Dukes, D., Ed., Previdi, S., Leddy, J.,
              Matsushima, S., and D. Voyer, "IPv6 Segment Routing Header
              (SRH)", RFC 8754, DOI 10.17487/RFC8754, March 2020,
              <https://www.rfc-editor.org/rfc/rfc8754>.

   [RFC9262]  Eckert, T., Ed., Menth, M., and G. Cauchie, "Tree
              Engineering for Bit Index Explicit Replication (BIER-TE)",
              RFC 9262, DOI 10.17487/RFC9262, October 2022,
              <https://www.rfc-editor.org/rfc/rfc9262>.

9.2.  Informative References

   [CGM2Design]
              Jiang, S., Xu, B. (Robin)., Shen, Y., Rui, M., Junjie, W.,
              and W. Chuang, "Novel Multicast Protocol Proposal
              Introduction", 10 October 2021,
              <https://github.com/BingXu1112/CGMM/blob/main/Novel%20Mul
              ticast%20Protocol%20Proposal%20Introduction.pptx>.

   [CGM2report]
              "Carrier Grade Minimalist Multicast CENI Networking Test
              Report", 1 August 2022,
              <https://raw.githubusercontent.com/network2030/
              publications/main/CENI_Carrier_Grade_Minimalist_Multicast_
              Networking_Test_Report.pdf>.

   [I-D.eckert-bier-cgm2-rbs]
              Eckert, T. T. and B. Xu, "Carrier Grade Minimalist
              Multicast (CGM2) using Bit Index Explicit Replication
              (BIER) with Recursive BitString Structure (RBS)
              Addresses", Work in Progress, Internet-Draft, draft-
              eckert-bier-cgm2-rbs-01, 9 February 2022,
              <https://datatracker.ietf.org/doc/html/draft-eckert-bier-
              cgm2-rbs-01>.

   [I-D.eckert-bier-cgm2-rbs-00]
              Eckert, T. T., "Carrier Grade Minimalist Multicast (CGM2)
              using Bit Index Explicit Replication (BIER) with Recursive
              BitString Structure (RBS) Addresses", Work in Progress,
              Internet-Draft, draft-eckert-bier-cgm2-rbs-00, 25 October
              2021, <https://datatracker.ietf.org/doc/html/draft-eckert-
              bier-cgm2-rbs-00>.




   [I-D.eckert-bier-cgm2-rbs-01]
              Eckert, T. T. and B. Xu, "Carrier Grade Minimalist
              Multicast (CGM2) using Bit Index Explicit Replication
              (BIER) with Recursive BitString Structure (RBS)
              Addresses", Work in Progress, Internet-Draft, draft-
              eckert-bier-cgm2-rbs-01, 9 February 2022,
              <https://datatracker.ietf.org/doc/html/draft-eckert-bier-
              cgm2-rbs-01>.

   [I-D.eckert-bier-rbs]
              Eckert, T. T., Menth, M., Geng, X., Zheng, X., Meng, R.,
              and F. Li, "Recursive BitString Structure (RBS) Addresses
              for BIER and MSR6", Work in Progress, Internet-Draft,
              draft-eckert-bier-rbs-00, 24 October 2022,
              <https://datatracker.ietf.org/doc/html/draft-eckert-bier-
              rbs-00>.

   [I-D.eckert-msr6-rbs]
              Eckert, T. T., Geng, X., Zheng, X., Meng, R., and F. Li,
              "Recursive Bitstring Structure (RBS) for Multicast Source
              Routing over IPv6 (MSR6)", Work in Progress, Internet-
              Draft, draft-eckert-msr6-rbs-01, 24 October 2022,
              <https://datatracker.ietf.org/doc/html/draft-eckert-msr6-
              rbs-01>.

   [I-D.xu-msr6-rbs]
              Xu, B., Geng, X., and T. T. Eckert, "RBS(Recursive
              BitString Structure) for Multicast Source Routing over
              IPv6", Work in Progress, Internet-Draft, draft-xu-msr6-
              rbs-01, 30 March 2022,
              <https://datatracker.ietf.org/doc/html/draft-xu-msr6-rbs-
              01>.

   [Menth20h] Merling, D., Lindner, S., and M. Menth, "P4-Based
              Implementation of BIER and BIER-FRR for Scalable and
              Resilient Multicast", IEEE in "Journal of Network and
              Computer Applications" (JNCA), vol. 196, Nov. 2020,
              preprint <https://atlas.informatik.uni-
              tuebingen.de/~menth/papers/Menth20h.pdf>,
              DOI 10.1016/j.jnca.2020.102764,
              <https://doi.org/10.1016/j.jnca.2020.102764>.

   [Menth21]  Merling, D., Lindner, S., and M. Menth, "Hardware-based
              Evaluation of Scalable and Resilient Multicast with BIER
              in P4", IEEE in "IEEE Access",
              <https://ieeexplore.ieee.org/xpl/
              RecentIssue.jsp?punumber=6287639>, vol. 9, pp. 34500-
              34514, March 2021, <https://ieeexplore.ieee.org/stamp/
              stamp.jsp?tp=&arnumber=9361548>.

   [Menth23]  Merling, D., Stüber, T., and M. Menth, "Efficiency of BIER
              Multicast in Large Networks", IEEE accepted for "IEEE
              Transactions on Network and Service Management", preprint
              <https://atlas.cs.uni-tuebingen.de/~menth/papers/Menth21-
              Sub-5.pdf>.

   [Menth23f] Lindner, S., Merling, D., and M. Menth, "Learning
              Multicast Patterns for Efficient BIER Forwarding with P4",
              IEEE in "IEEE Transactions on Network and Service
              Management", vol. 20, no. 2, June 2023, preprint
              <https://atlas.cs.uni-tuebingen.de/~menth/papers/Menth22-
              Sub-2.pdf>.

   [RBSatIETF115]
              Eckert, T., Menth, M., Geng, X., Zheng, X., Meng, R., and
              F. Li, "RBS (Recursive BitString Structure) to improve
              scalability beyond BIER/BIER-TE, IETF115", November 2022,
              <https://datatracker.ietf.org/meeting/115/materials/
              slides-115-bier-recursive-bitstring-structure-rbs-beyond-
              bierbier-te-00>.

   [RFC791]   Postel, J., "Internet Protocol", STD 5, RFC 791,
              DOI 10.17487/RFC0791, September 1981,
              <https://www.rfc-editor.org/rfc/rfc791>.

   [SEET]     Lindner, S., Menth, M., and T. Eckert, "P4 Tofino
              Implementation Experiences with Advanced Stateless
              Multicast Source Routing", 1 November 2023,
              <https://datatracker.ietf.org/meeting/118/materials/
              slides-118-bier-02-ietf118-bier-p4-02.pdf>.

Appendix A.  Evolution to RTS

   The following review of the history of RBS explains key steps on the
   road towards RTS and how prior document work is included (or not) in
   this RTS work.

A.1.  Research work on BIER

   Initial experience with implementing BIER in P4 was gained through
   "P4-Based Implementation of BIER and BIER-FRR for Scalable and
   Resilient Multicast", [Menth20h].  This work showed that processing
   large BIER bitstrings requires significantly complex programming for
   efficient forwarding, as described in "Learning Multicast Patterns
   for Efficient BIER Forwarding with P4", [Menth23f].  Further
   evaluations were carried out in "Hardware-based Evaluation of
   Scalable and Resilient Multicast with BIER in P4", [Menth21] and
   "Efficiency of BIER Multicast in Large Networks", [Menth23].

A.2.  Initial RBS from CGM2

   The initial 2021 version, [I-D.eckert-bier-cgm2-rbs-00], introduces
   the concept of the Recursive BitString Structure (RBS).  The single
   bitstring in a source routing header for stateless multicast
   replication, as introduced by BIER and re-used by BIER-TE, is
   replaced by a recursive structure that represents each node of a
   multicast tree.  In each node, the list of neighbors to replicate to
   is represented by a bitstring.

   Routers processing this recursive structure do not need to process
   the whole structure.  Instead, they only need to examine their own
   local bitstring and replicate copies to each of the neighbors for
   which a bit is set in the bitstring for this node.  For each copy,
   the recursive structure is rewritten so that only the remaining
   subtree behind the neighbor remains in the packet header.  Because
   only a "local" (and hence short) bitstring has to be examined, RBS
   processing can arguably be simpler than that of BIER/BIER-TE.
   Because the parts of the tree structure that are no longer needed
   are discarded, there is also no need to change bits in the bitstring
   to avoid loops, as is done in BIER/BIER-TE.
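
   The per-hop processing described above can be sketched in Python as
   follows.  This is only an illustration under stated assumptions: the
   RecursiveUnit class, the replicate() and forward() functions, and the
   example topology are hypothetical and model the tree as an in-memory
   structure, not the packet header encoding of any RBS draft.  Bit i of
   a node's bitstring is assumed to select the i-th entry of that node's
   ordered neighbor list, and a unit with an empty bitstring is assumed
   to mark a receiver.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class RecursiveUnit:
    """Hypothetical in-memory model of one tree node (not a wire format)."""
    bitstring: int  # bit i set => replicate to local neighbor i
    # One sub-unit per set bit, in ascending bit order.
    subtrees: List["RecursiveUnit"] = field(default_factory=list)


def replicate(header: RecursiveUnit) -> List[Tuple[int, RecursiveUnit]]:
    """Examine only the local bitstring; return (neighbor, header) pairs."""
    copies = []
    remaining = iter(header.subtrees)
    bits, index = header.bitstring, 0
    while bits:
        if bits & 1:
            # The copy carries only the subtree behind this neighbor;
            # all other parts of the structure are discarded.
            copies.append((index, next(remaining)))
        bits >>= 1
        index += 1
    return copies


def forward(topology: Dict[str, List[str]], node: str,
            header: RecursiveUnit, delivered: List[str]) -> None:
    """Simulate hop-by-hop forwarding and record the receivers reached."""
    copies = replicate(header)
    if not copies:
        delivered.append(node)  # assumption: empty bitstring = receiver
        return
    for index, subtree in copies:
        forward(topology, topology[node][index], subtree, delivered)


# Hypothetical topology: node name -> ordered list of neighbors.
TOPOLOGY = {"A": ["B", "C", "D"], "B": ["E", "F"], "D": ["G"]}
LEAF = RecursiveUnit(0)
# Tree A -> B -> F and A -> D -> G.
TREE = RecursiveUnit(0b101, [RecursiveUnit(0b10, [LEAF]),
                             RecursiveUnit(0b1, [LEAF])])

reached: List[str] = []
forward(TOPOLOGY, "A", TREE, reached)
print(reached)
```

   In this sketch, node A never looks at the bitstrings of B or D; it
   only splits the structure and hands each neighbor its own subtree.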

   This initial version of the RBS encoding is based on a design
   originally called "Carrier Grade Minimalist Multicast" (CGM2), which
   started as a research project whose design is summarized in
   [CGM2Design].  A proof-of-concept implementation on a vendor
   high-speed router was done, as well as a proof-of-concept deployment
   in a wide-area research network, which was documented for the 2022
   Nanjing "6th Future Network Development Conference".  An English
   translation of the report can be found at [CGM2report].

A.3.  RBS scalability compared to BIER

   The 2022 [I-D.eckert-bier-cgm2-rbs-01] version of the document adds
   topology and testing information about a simulation comparing RBS
   with BIER performance in a dense, high-speed network topology.  It
   shows that the number of replications required to reach an
   increasing number of receivers grows more slowly with RBS than with
   BIER.  In BIER, another packet copy has to be sent from the source
   whenever receivers with a different Set Identifier (SI) need to be
   reached, whereas RBS only needs to create additional copies of a
   packet at the source when the RBS packet header size for one packet
   is exhausted.  The results of this simulation are shown on slide 6
   of [RBSatIETF115].
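
   The BIER side of this comparison can be illustrated with a short
   Python sketch.  It assumes a single sub-domain and the mapping of
   BFR-ids to Set Identifiers from the BIER architecture (RFC 8279),
   SI = (BFR-id - 1) / BitStringLength with integer division.  The
   function name, the default BitStringLength of 256 and the example
   BFR-ids are hypothetical and not taken from the simulation.

```python
from typing import Iterable


def bier_source_copies(receiver_bfr_ids: Iterable[int],
                       bitstring_length: int = 256) -> int:
    """Count the packets a BIER source sends for one multicast packet.

    Assumes a single sub-domain and the mapping
    SI = (BFR-id - 1) // BitStringLength; one packet is needed per
    distinct SI that contains at least one receiver.
    """
    set_ids = {(bfr_id - 1) // bitstring_length
               for bfr_id in receiver_bfr_ids}
    return len(set_ids)


# Receivers 1 and 256 share SI 0; 257 is in SI 1; 513 and 600 are in SI 2.
print(bier_source_copies([1, 256]))
print(bier_source_copies([1, 256, 257, 513, 600]))
```

   An RBS source, by contrast, sends a further copy only when the
   encoded tree no longer fits into the header of one packet.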

   While RBS with its explicit description of the whole multicast tree
   structure seems at first like (only) a replacement for BIER-TE,
   which does the same but encodes the tree in a "flat" BIER bitstring
   (and incurs more severe scalability limitations because of this),
   this simulation shows that the RBS approach may also compete with
   BIER itself.  This may initially look counter-intuitive because RBS
   encodes information that is not needed in the BIER encoding: the
   intermediate hops.

   The scalability analysis also assumes one novel encoding
   optimization: indicating replication to all leaf neighbors of a
   node.  This allows the RBS encoding to be compacted even further for
   dense trees, such as in applications like IPTV.  Note that this
   optimization was not included in any of the RBS proposal
   specifications, but it is included in this RTS specification.  This
   optimization leads to the actual reduction in packet copies sent for
   denser trees in the simulation results.

A.4.  Discarding versus offset pointers

   [I-D.eckert-bier-rbs] refocuses the work of the prior
   [I-D.eckert-bier-cgm2-rbs] on the forwarding plane aspects only,
   removing simulation results and architectural considerations beyond
   the forwarding plane.

   It also proposes an alternative to the encoding that was considered
   interesting at the time.  Instead of discarding unnecessary parts of
   the tree structure for every copy of a packet made along the tree,
   its forwarding machinery uses two offset pointers in the header to
   point to the relevant sub-structure for the next-hop, so that only a
   rewrite of these two pointers is needed.  This replicates the
   offset-rewrite used in unicast source-routing headers such as in IP,
   [RFC791], or IPv6, [RFC6554] and [RFC8754].

   Discussions about discarding versus changing offsets since then seem
   to indicate that changing offsets may be beneficial for forwarders
   that can save memory bandwidth by not having to rewrite complete
   packet headers, specifically systems without so-called scatter-
   gather I/O, whereas discarding of data is more beneficial when
   forwarders do have an equivalent of scatter-gather I/O, something
   which all modern high-speed routers seem to have, including the
   platform used for validation of the approach described in this
   document.
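
   The difference between the two approaches can be sketched in Python
   as follows.  The byte layout used here is purely hypothetical and
   chosen for brevity (one bitstring byte, then for every set bit a
   length byte followed by the child unit); it is not the encoding of
   [I-D.eckert-bier-rbs] or of this document.  The function names are
   likewise illustrative assumptions.

```python
from typing import Iterator, List, Tuple


def children(buf: bytes, start: int,
             end: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (neighbor_bit, child_start, child_end) for buf[start:end].

    Hypothetical encoding: one bitstring byte, then for every set bit
    a length byte followed by that many bytes of child unit.  An empty
    range is a leaf and yields nothing.
    """
    if start >= end:
        return
    bits, bit, pos = buf[start], 0, start + 1
    while bits:
        if bits & 1:
            length = buf[pos]
            yield bit, pos + 1, pos + 1 + length
            pos += 1 + length
        bits >>= 1
        bit += 1


def copies_by_discarding(header: bytes) -> List[Tuple[int, bytes]]:
    """Each copy gets a new, shorter header holding only its subtree."""
    return [(bit, header[s:e])
            for bit, s, e in children(header, 0, len(header))]


def copies_by_offsets(header: bytes, start: int,
                      end: int) -> List[Tuple[int, Tuple[int, int]]]:
    """The header bytes stay untouched; each copy gets two new offsets."""
    return [(bit, (s, e)) for bit, s, e in children(header, start, end)]


# Root replicates to neighbors 0 and 2; each of them has one leaf child.
HEADER = bytes([0b101, 2, 0b10, 0, 2, 0b1, 0])

print(copies_by_discarding(HEADER))
print(copies_by_offsets(HEADER, 0, len(HEADER)))
```

   With discarding, the header shrinks at every hop; with offsets, the
   header length stays constant and only the two pointers change.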

A.5.  Encapsulations for IPv6-only networks

   Whereas the initial RBS proposals either did not propose specific
   encapsulations for the RBS structure or discussed how to use RBS
   with the existing BIER encapsulation [RFC8296], the 2022
   [I-D.xu-msr6-rbs] describes the encapsulation of RBS into an IPv6
   extension header.  This supports a forwarding plane in which all
   packets on the wire are IPv6 packets and in which the destination
   IPv6 address of the outer IPv6 header is rewritten per RBS hop, as
   in pre-existing unicast IPv6 stateless source routing solutions
   ([RFC6554], [RFC8754]).

   This approach was based on the expressed preference of IPv6
   operators to have a common encapsulation of all packets on the wire
   for operational reasons ("IPv6 only network design") and to share a
   common source-routing mechanism operating on the principle of per-
   steering-hop IPv6 destination address rewrite.

   [I-D.eckert-msr6-rbs] extends this approach by adding the offset-
   pointer rewrite of [I-D.eckert-bier-rbs] to the extension header to
   avoid any change in length of the extension header.  It also adds
   another, RBS-independent field to the extension header: the IPv6
   multicast destination address.  Only this additional field would
   allow RBS with a single extension header to be a complete IPv6
   multicast source-routing solution.  BIER/BIER-TE or any
   encapsulation variations of RBS without such a header field would
   always need to carry a full IPv6 header as payload to provide
   end-to-end IPv6 multicast service to applications.

A.6.  SEET and BEET

   To validate concepts for recursive trees, a proof-of-concept
   validation on a high-speed reference platform was done in 2023 for
   both SID-list and local-bitstring based recursive trees.  In the
   research work, these are called SEET and BEET; see [SEET].  A paper
   is to follow.

Contributors






   Xuesong Geng
   Huawei
   China
   Email: gengxuesong@huawei.com


   Xiuli Zheng
   Huawei
   China
   Email: zhengxiuli@huawei.com


   Rui Meng
   Huawei
   China
   Email: mengrui@huawei.com


   Fengkai Li
   Huawei
   China
   Email: lifengkai@huawei.com


Authors' Addresses

   Toerless Eckert (editor)
   Futurewei Technologies USA
   2220 Central Expressway
   Santa Clara,  CA 95050
   United States of America
   Email: tte@cs.fau.de


   Michael Menth
   University of Tuebingen
   Germany
   Email: menth@uni-tuebingen.de


   Steffen Lindner
   University of Tuebingen
   Germany
   Email: steffen.lindner@uni-tuebingen.de






