tram                                                             W. Zeng
Internet-Draft                                                 Ant Group
Intended status: Informational                                9 May 2022
Expires: 10 November 2022


    TURN Cluster: Scale out TURN cluster by routable transaction id
                       draft-zeng-turn-cluster-03

Abstract

   The TURN protocol is designed to solve the connectivity problem of
   peer-to-peer communication when NAT devices exist, by allowing each
   peer to establish a data channel on a TURN server.  Because the use
   of TURN carries specific requirements (for example, RTP/RTCP
   connection pairs must be sent to the same TURN server), it is not
   easy to scale a single TURN server into a TURN cluster.  In
   addition, a TURN cluster needs to achieve good load balancing and to
   protect internal network information.  To meet these demands, this
   specification provides several standard means to implement a
   functional and secure TURN cluster, together with an overview and
   rationale of the cluster architecture.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 10 November 2022.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors.  All rights reserved.






Zeng                    Expires 10 November 2022                [Page 1]

Internet-Draft                TURN-Cluster                      May 2022


   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
     1.1.  Terminology . . . . . . . . . . . . . . . . . . . . . . .   4
     1.2.  Notation  . . . . . . . . . . . . . . . . . . . . . . . .   5
   2.  Overview of a TURN ICE Process  . . . . . . . . . . . . . . .   5
   3.  Architectural and Interactive Process . . . . . . . . . . . .   9
     3.1.  Overview of the Architecture  . . . . . . . . . . . . . .   9
     3.2.  Overview of interaction process . . . . . . . . . . . . .  11
       3.2.1.  ClientA Behavior  . . . . . . . . . . . . . . . . . .  12
       3.2.2.  ClientB Behavior  . . . . . . . . . . . . . . . . . .  13
       3.2.3.  TURN Cluster Behavior . . . . . . . . . . . . . . . .  13
   4.  Routing Mechanism . . . . . . . . . . . . . . . . . . . . . .  14
     4.1.  Server Generate ENCRYPTED-RELAYED-ADDRESS . . . . . . . .  15
       4.1.1.  Preparation Phase . . . . . . . . . . . . . . . . . .  15
       4.1.2.  Obfuscation Phase . . . . . . . . . . . . . . . . . .  16
       4.1.3.  Encryption Phase  . . . . . . . . . . . . . . . . . .  16
     4.2.  Generation of Routable Transaction ID . . . . . . . . . .  17
       4.2.1.  Arbitrary Mode  . . . . . . . . . . . . . . . . . . .  17
       4.2.2.  Specific Server Mode  . . . . . . . . . . . . . . . .  18
       4.2.3.  Specific Address Mode . . . . . . . . . . . . . . . .  18
       4.2.4.  Uniqueness of Transaction ID  . . . . . . . . . . . .  19
     4.3.  TURN LB Process Transaction ID  . . . . . . . . . . . . .  19
     4.4.  ENCRYPTED-PEER-ADDRESS  . . . . . . . . . . . . . . . . .  20
     4.5.  TLS Consideration . . . . . . . . . . . . . . . . . . . .  21
   5.  Security Consideration  . . . . . . . . . . . . . . . . . . .  21
     5.1.  DoS Against TURN Cluster  . . . . . . . . . . . . . . . .  22
     5.2.  DoS Against a Single TURN Server  . . . . . . . . . . . .  22
   6.  IANA Consideration  . . . . . . . . . . . . . . . . . . . . .  22
   7.  Contributors  . . . . . . . . . . . . . . . . . . . . . . . .  23
   8.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  23
     8.1.  Normative References  . . . . . . . . . . . . . . . . . .  23
     8.2.  Informative References  . . . . . . . . . . . . . . . . .  24
   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .  24








1.  Introduction

   Interactive Connectivity Establishment (ICE), described in [RFC8445],
   gives peers a standard way to exchange information and establish a
   data channel with each other.  During channel establishment, if a
   peer is located behind a NAT, it may be impossible for that peer to
   communicate directly with other peers.  [RFC8656] defines the TURN
   protocol to solve this problem by offering a standard way to
   establish a relayed channel between peers.

   TURN and ICE are widely used, and the most typical scenario is
   WebRTC (described in [RFC7478]).  Consider a WebRTC deployment with
   a large number of users: when most users need the relay service, a
   single TURN server becomes the bottleneck of the system.  Placing a
   network load balancer in front of a group of TURN servers, so that
   each request is forwarded to one member of the group, is an
   effective way to scale because it allows near-linear performance
   improvement.  However, TURN servers behind a simple network load
   balancer are not enough to build a fully functional cluster, since a
   TURN cluster must still meet these requirements:

   *  For RTP/RTCP connection pairs and TCP relaying, client requests
      with different source addresses must be forwarded to the same
      server.  A TURN cluster SHOULD satisfy this condition.

   *  The recommended ICE candidate priority formula assumes that all
      clients are connected to the same TURN server.  When clients are
      connected to different TURN servers in the cluster, the relayed
      channel may contain one more hop between TURN servers, and the
      formula becomes unreliable.  A TURN cluster SHOULD avoid this
      problem.

   *  A TURN cluster SHOULD achieve good load balancing for all members
      of the cluster.

   [TURN-Load-balance] gives some suggestions to solve these problems:
   (1) DNS-based load balancing; (2) using ALTERNATE-SERVER (defined in
   Section 10 of [RFC8489]) to redirect requests to the right server.
   However, DNS-based load balancing is unreliable and the ALTERNATE-
   SERVER mechanism is inefficient.  Moreover, these solutions are
   expensive and insecure, and are not suitable for large-scale
   deployment in Internet Data Center (IDC) environments, because they
   require each TURN server in the cluster to have its own public IP
   address and to expose a considerable number of ports to the outside
   network.  In general, a TURN cluster SHOULD meet the following
   requirements:

   *  Meet the basic requirements for the use of all TURN protocols,
      including the specific scenarios such as RTP/RTCP connection
      pairs.

   *  Easy to scale in/out the size of the cluster.

   *  The cluster SHOULD have a unified access portal, and the internal
      network information MUST be hidden.

   *  Easy to set up network security policies to defend against
      potential attacks.

   This specification provides an architecture and corresponding
   interaction process for easily building a TURN cluster that meets all
   of the above requirements.  Since TURN is commonly used with ICE,
   this specification introduces the related processes based on ICE for
   better illustration.  The remainder of this document is organized as
   follows: Section 2 briefly introduces how the relayed channel is
   established in the ICE process; Section 3 gives an overview of the
   architecture and the interaction process between the client and the
   TURN cluster; Section 4 introduces the generation and processing of
   routing messages, including: (1) how a TURN server transmits a
   routing message in a secure manner; (2) how a client generates a
   routable transaction ID from the routing message; (3) how the TURN
   cluster handles the transaction ID and the corresponding packet.

1.1.  Terminology

   Although this document is not an IETF Standards Track publication,
   it adopts the conventions for normative language to provide clarity
   of instructions to the implementer.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14 [RFC2119]
   [RFC8174] when, and only when, they appear in all capitals, as shown
   here.

   The following terms are used in this document:

   concat(x0, ..., xN): Concatenation of byte strings.  "concat(0x01,
   0x0203, 0x040506) = 0x010203040506".
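
   As an illustrative sketch only (the helper name and the use of
   Python are assumptions, not part of this specification), concat()
   maps directly onto joining byte strings in order:

```python
def concat(*parts: bytes) -> bytes:
    """Concatenate byte strings in the order given."""
    return b"".join(parts)


# Reproduces the example above: concat(0x01, 0x0203, 0x040506).
result = concat(bytes.fromhex("01"), bytes.fromhex("0203"), bytes.fromhex("040506"))
print(result.hex())  # 010203040506
```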









1.2.  Notation

   All wire formats will be depicted using the notation defined in
   Section 1.3 of [RFC9000].  There is one addition: the function len()
   refers to the length of a field which can serve as a limit on a
   different field, so that the lengths of two fields can be concisely
   defined as limited to a sum, for example:

   x(A..B) y(C..B-len(x))

   indicates that x can be of any length between A and B, and y can be
   of any length between C and B provided that (len(x) + len(y)) does
   not exceed B.
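
   The constraint above can be checked mechanically.  The following is
   a minimal sketch, assuming concrete field lengths are already known;
   the function name lengths_valid is hypothetical:

```python
def lengths_valid(len_x: int, len_y: int, a: int, b: int, c: int) -> bool:
    """Check the notation x(A..B) y(C..B-len(x)) for concrete lengths.

    x must be between A and B long, and y must be between C and
    B - len(x) long, so that len(x) + len(y) never exceeds B.
    """
    return a <= len_x <= b and c <= len_y <= b - len_x


print(lengths_valid(4, 8, a=1, b=16, c=2))   # x and y fit within B
print(lengths_valid(10, 8, a=1, b=16, c=2))  # y exceeds B - len(x)
```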

   The example below illustrates the basic framework:

   Example Structure {
     One-bit Field (1),
     7-bit Field with Fixed Value (7) = 61,
     Field with Variable-Length Integer (i),
     Arbitrary-Length Field (..),
     Variable-Length Field (8..24),
     Field With Minimum Length (16..),
     Field With Maximum Length (..128),
     [Optional Field (64)],
     Repeated Field (8) ...,
   }

                          Figure 1: Example Format

2.  Overview of a TURN ICE Process

   This section uses an example to illustrate how clients set up a
   relayed channel through ICE and TURN.  In the example, clientA and
   clientB are both behind symmetric NAT devices; their network topology
   is shown in the figure below:

                          +-------------+
                          | Turn Server |
                          +-------------+
                         10.11.252.43:3478
                             ^       ^
                             |       |
            +----------------+       +-----------------+
            |                                          |
   10.243.22.200:23768                        10.243.21.133:12371
    +---------------+                          +---------------+
    | Symmetric NAT |                          | Symmetric NAT |
    +---------------+                          +---------------+
            ^                                          ^
            |                                          |
   192.168.1.0:6677                          192.168.110.121:11202
       +---------+                                +---------+
       | clientA |                                | clientB |
       +---------+                                +---------+

                     Figure 2: Example network topology

   Although in this example the P2P data channel built on the STUN
   protocol cannot be used because of the symmetric NATs, this document
   does not omit the STUN process of ICE, so that readers can more
   clearly understand the whole ICE process.  A simplified TURN ICE
   relayed channel establishment process is depicted in Figure 3.

   clientA                 TURN server                clientB
     |                         |                         |
     |------STUN/TURN Req----->|                         |
     |                         |                         |
     |<-----STUN/TURN Resp-----|                         |
     |                         |                         |
     |--ClientA ICE Candidate Info---------------------->|
     |                         |                         |
     |                         |<-----STUN/TURN Req------|
     |                         |                         |
     |                         |------STUN/TURN Resp---->|
     |                         |                         |
     |<----------------------ClientB ICE Candidate Info--|
     |                         |                         |
     |<--Connectivity Checks-->|<--Connectivity Checks-->|
     |                         |                         |
     |<---------Data---------->|<--------Data----------->|
     |                         |                         |

              Figure 3: Example relayed channel establishment




   The behaviors shown in Figure 3 are explained as follows:

   STUN/TURN Req: The STUN requests sent by clientA/clientB to the TURN
   server, which SHOULD be Allocate requests (defined in Section 7 of
   [RFC8656]) or Binding requests (defined in Section 2 of [RFC8489]).

   STUN/TURN Resp: The STUN responses returned by the TURN server, which
   SHOULD include the following information: (1) XOR-RELAYED-ADDRESS
   (defined in Section 18.5 of [RFC8656]); (2) XOR-MAPPED-ADDRESS
   (defined in Section 14.2 of [RFC8489]).

   ClientA/ClientB ICE Candidate Info: The ICE candidate information
   (defined in Section 5.3 of [RFC8445]) gathered by a client, which the
   client sends to its peer through the signaling server (defined in
   [RFC8445]).

   Connectivity Checks: The connectivity check process defined in
   Section 2 of [RFC8445].  Taking clientA as an example, clientA first
   attempts to connect directly to clientB through clientB's XOR-MAPPED-
   ADDRESS.  Because clientA and clientB are both behind symmetric NAT
   devices, this attempt fails, and clientA then tries the relayed
   channels.  If clientA and clientB can each successfully bind to the
   XOR-RELAYED-ADDRESS of the peer, there are 3 available channels:

   *  srflxA2relayB: The channel from the server-reflexive address of
      clientA to the relayed address of clientB, shown below:

    XOR-RELAYED-ADDRESS   +-------------+
    allocated for clientB | Turn Server |  10.11.252.43:3478
    10.11.252.43:55555    +-------------+
            ^                                     ^
            |                                     |
            v                                     v
    +---------------+                     +---------------+
    | Symmetric NAT |                     | Symmetric NAT |
    +---------------+                     +---------------+
            ^                                     ^
            |                                     |
            v                                     v
       +---------+                           +---------+
       | clientA |                           | clientB |
       +---------+                           +---------+

              Figure 4: Established srflxA2relayB Data Channel

   *  relayA2srflxB: The channel from the relayed address of clientA to
      the server-reflexive address of clientB, shown below:

                      +-------------+  XOR-RELAYED-ADDRESS
    10.11.252.43:3478 | Turn Server |  allocated for clientA
                      +-------------+  10.11.252.43:55666
           ^                                  ^
           |                                  |
           v                                  v
   +---------------+                  +---------------+
   | Symmetric NAT |                  | Symmetric NAT |
   +---------------+                  +---------------+
           ^                                  ^
           |                                  |
           v                                  v
      +---------+                        +---------+
      | clientA |                        | clientB |
      +---------+                        +---------+

              Figure 5: Established relayA2srflxB Data Channel

   *  relayA2relayB: The channel from the relayed address of clientA to
      the relayed address of clientB, shown below:

    XOR-RELAYED-ADDRESS               XOR-RELAYED-ADDRESS
    allocated for clientA <-------->  allocated for clientB
    10.11.252.43:55666                10.11.252.43:55555
    +-------------+                    +-------------+
    | Turn Server |                    | Turn Server |
    +-------------+                    +-------------+
   10.11.252.43:3478                  10.11.252.43:3478
           ^                                  ^
           |                                  |
           v                                  v
   +---------------+                  +---------------+
   | Symmetric NAT |                  | Symmetric NAT |
   +---------------+                  +---------------+
           ^                                  ^
           |                                  |
           v                                  v
      +---------+                        +---------+
      | clientA |                        | clientB |
      +---------+                        +---------+

              Figure 6: Established relayA2relayB Data Channel

   ICE computes a priority for each of the 3 channels, and the channel
   that is finally selected depends on the results of that calculation.
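
   To make the calculation concrete, the sketch below applies the
   candidate priority formula and the candidate pair priority formula
   from [RFC8445] (Sections 5.1.2 and 6.1.2.3) to the 3 channels.  It
   assumes a local preference of 65535, component ID 1, the recommended
   type preferences, and that clientA is the controlling agent; the
   function and variable names are illustrative only:

```python
# Recommended type preferences from RFC 8445 (higher is preferred).
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(kind: str, local_pref: int = 65535, component: int = 1) -> int:
    """priority = 2^24 * type pref + 2^8 * local pref + (256 - component ID)."""
    return (TYPE_PREFERENCE[kind] << 24) + (local_pref << 8) + (256 - component)


def pair_priority(g: int, d: int) -> int:
    """g: controlling agent's candidate priority; d: controlled agent's."""
    return (min(g, d) << 32) + 2 * max(g, d) + (1 if g > d else 0)


srflx = candidate_priority("srflx")
relay = candidate_priority("relay")

# Assumption: clientA is the controlling agent, clientB the controlled one.
channels = {
    "srflxA2relayB": pair_priority(srflx, relay),
    "relayA2srflxB": pair_priority(relay, srflx),
    "relayA2relayB": pair_priority(relay, relay),
}

# List the channels from highest to lowest pair priority.
ranking = sorted(channels, key=channels.get, reverse=True)
for name in ranking:
    print(name, channels[name])
```

   Under these assumptions the relay-to-relay pair ranks last.  The
   formula has no term for an additional hop between TURN servers,
   which is why it is only meaningful when both peers use the same
   server.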






   For a client, using a TURN cluster SHOULD be like using a single
   TURN server, which means that the above 3 channels MUST still be
   established successfully through the TURN cluster.  Moreover, all
   requests from the peers of one P2P connection SHOULD be forwarded to
   the same TURN server in the cluster; otherwise the priority formula
   becomes unreliable because of the potential extra hop between TURN
   servers.

3.  Architectural and Interactive Process

   A single TURN server usually listens on a default port (e.g., 3478
   for UDP/TCP, 5349 for TLS) and allocates additional ports for client
   relay.  To remain compatible with existing TURN implementations, a
   TURN server in a cluster SHOULD work in a similar way.  In addition,
   the TURN server requires that all allocated ports be directly
   reachable by the client.  Since it is hard and insecure for a
   cluster to expose a large number of ports for each server, the TURN
   cluster described in this document provides all services on the
   default port and ensures the correct routing of packets through the
   routable transaction ID (described in Section 4.2).  This section
   describes the architecture of the TURN cluster and introduces the
   interaction process between the client and the cluster.

3.1.  Overview of the Architecture

   The structure of the TURN cluster is simple: a front-end load
   balancer, "TURN LB", acts as the gateway that forwards client
   requests to the corresponding TURN server, and the TURN server is the
   equipment that actually provides the service.  As described in
   Section 1 of [RFC8656], a client using TURN must have some way to
   communicate its relayed information to its peers and to learn each
   peer's relayed information.  Here we use the "signaling server"
   described in [RFC8445] to represent this component.  The network
   topology (including the internal architecture of the TURN cluster)
   is depicted in the figure below:

                     +------------------+
             +------>| signaling server |<-------+
             |       +------------------+        |
       +----------+                        +----------+
       | client A |                        | client B |
       +----------+                        +----------+
    10.243.22.200:23768                  10.69.127.39:32102
             |                                   |
             +-------------+       +-------------+
                           |       |
   +-----------------------|-------|-----------------------+
   | TURN cluster          |       |                       |
   |                       v       v                       |
   |                    10.11.252.43:3478                  |
   |                      +---------+                      |
   |                      | TURN LB |                      |
   |                      +---------+                      |
   |                       |       |                       |
   |        +--------------+       |                       |
   |        |                      |                       |
   |        v                 +----+                       |
   | 192.168.1.2:3478         |                            |
   | +-------------+          v            +-------------+ |
   | |TURN server 1| 192.168.1.2:61002  ...|TURN server n| |
   | +-------------+                       +-------------+ |
   +-------------------------------------------------------+

           Figure 7: Example Topology of Client and TURN Cluster

   The functions of each component are as follows:

   Client A/B: The peers of one P2P relay connection.

   Signaling server: A server through which clients exchange TURN
   information with their peers.  This specification does not cover its
   specific process and implementation; implementers can refer to the
   "signaling server" defined in [RFC8445].

   TURN LB: A device that performs two functions: (1) it balances load
   across all servers in the cluster; (2) it ensures that data from all
   peers of a P2P connection is routed to the appropriate TURN server.

   TURN server: The actual TURN service provider.

   The core of the architecture design is:

   *  Provide TURN services through a unified access portal.

   *  Use the TURN LB and the mechanism described in Section 4 to ensure
      that all packets are routed to the appropriate backend TURN
      server.

   *  Each TURN server in the cluster works like a single TURN server.
      The difference is that the TURN server MUST use ENCRYPTED-RELAYED-
      ADDRESS (defined in Section 4.1) instead of XOR-RELAYED-ADDRESS to
      transmit allocation information, in order to avoid exposing
      internal network information.  In addition, since the address
      information is encrypted in ENCRYPTED-RELAYED-ADDRESS and the
      client cannot extract it directly, the client MUST use ENCRYPTED-
      PEER-ADDRESS (defined in Section 4.4) instead of XOR-PEER-ADDRESS
      to specify the address information of the peer.

3.2.  Overview of interaction process

   Since a TURN server in the cluster MUST transmit allocation
   information through ENCRYPTED-RELAYED-ADDRESS to protect the
   cluster's internal network information, a client cannot obtain the
   allocated address directly, and srflxA2relayB and relayA2srflxB
   cannot be established in the usual way.  As depicted in Figure 7, all
   requests can only be sent to the unified access portal of the
   cluster.  To ensure that requests are forwarded correctly, a routing
   message MUST be carried in each request.  When the TURN LB receives a
   request, it MUST extract and parse the routing message and forward
   the request accordingly.  The overall interaction process is shown in
   the following figure; the address information comes from Figure 7,
   and ERA in the figure stands for ENCRYPTED-RELAYED-ADDRESS (defined
   in Section 4.1):

     clientA                     TURN cluster                  clientB
       |                             |                             |
       |----------TURN Req---------->|                             |
       |   (to 10.11.252.43:3478)    |                             |
       |                             |                             |
       |<---------TURN Resp----------|                             |
       |   (carry routing-info-A     |                             |
       |          in ERA)            |                             |
       |                             |                             |
       |--ClientA ICE Candidate Info------------------------------>|
       |                             |                             |
       |                             |           extract routing-info-A
       |                             |            from clientA's ERA
       |                             |                             |
       |                             |<---------TURN Req-----------|
       |                             |   (to 10.11.252.43:3478,    |
       |                             |    with routing-info-A)     |
       |                             |                             |
       |                             |----------TURN Resp--------->|
       |                             |    (carry routing-info-B    |
       |                             |           in ERA)           |
       |                             |                             |
       |<-----------------------------clientB ICE Candidate Info---|
       |                             |                             |
   extract routing-info-B            |                             |
    from clientB's ERA               |                             |
       |                             |                             |
       |<----Connectivity Checks---->|<----Connectivity Checks---->|
       |   (to 10.11.252.43:3478,    |   (to 10.11.252.43:3478,    |
       |    with routing-info-B)     |    with routing-info-A)     |
       |                             |                             |
       |<------------Data----------->|<-----------Data------------>|
       | (from/to 10.11.252.43:3478) | (from/to 10.11.252.43:3478) |

       Figure 8: Interaction Process Between Client and TURN Cluster

3.2.1.  ClientA Behavior

   When clientA starts an ICE process, it first sends a STUN/TURN
   request as usual.  Since clientA does not yet have any information
   about the server or clientB, clientA MUST use "Arbitrary-mode"
   (defined in Section 4.2) to generate the transaction ID for its
   requests.  After receiving the Allocate success response, clientA
   extracts the ENCRYPTED-RELAYED-ADDRESS from the response and sends it
   to clientB in its Candidate Information.

   Later, clientA receives Candidate Information from clientB, which
   includes clientB's ENCRYPTED-RELAYED-ADDRESS.  clientA MUST extract
   routing-info-B from it and start connectivity checks.  To establish
   the "srflxA2relayB" data channel, the Binding request of clientA
   SHOULD be sent to the relayed address that clientB obtained from the
   server, so clientA MUST use "Specific-address-mode" to generate the
   transaction ID for the Binding request.  To establish the
   "relayA2srflxB" and "relayA2relayB" data channels, the related
   requests SHOULD be sent to the TURN server that clientA accessed
   before, so clientA MUST use "Specific-server-mode" to generate the
   transaction ID for these requests.
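
   The mode selection rules for clientA described so far can be
   summarized as a lookup table.  This is a sketch under assumptions:
   the dictionary keys are hypothetical labels for what clientA is
   about to send, and only the three mode names come from this
   document:

```python
from enum import Enum


class Mode(Enum):
    ARBITRARY = "Arbitrary-mode"
    SPECIFIC_SERVER = "Specific-server-mode"
    SPECIFIC_ADDRESS = "Specific-address-mode"


# Keys are hypothetical labels, not protocol elements.
MODE_FOR_REQUEST = {
    # First request: no routing information is known yet.
    "initial-allocate": Mode.ARBITRARY,
    # Binding request aimed at clientB's relayed address.
    "srflxA2relayB": Mode.SPECIFIC_ADDRESS,
    # Requests aimed at the TURN server clientA accessed before.
    "relayA2srflxB": Mode.SPECIFIC_SERVER,
    "relayA2relayB": Mode.SPECIFIC_SERVER,
}


def select_mode(request_kind: str) -> Mode:
    """Return the transaction ID generation mode for a request kind."""
    return MODE_FOR_REQUEST[request_kind]


print(select_mode("srflxA2relayB").value)
```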

   Each of the 3 relayed data channels has its own way to transmit
   application data.  For "srflxA2relayB", clientA can simply send UDP
   datagrams to the unified access portal of the cluster, and the
   routing record left by the previous Binding request ensures that they
   are forwarded correctly.  For "relayA2srflxB" and "relayA2relayB",
   there are 2 mechanisms for clientA to send application data to
   clientB: (1) Send indications (defined in Section 11 of [RFC8656]);
   (2) binding a channel and sending ChannelData messages (defined in
   Section 12 of [RFC8656]).  Both mechanisms MUST use "Specific-server-
   mode" to generate the transaction ID of the indication or request,
   and the client MUST use ENCRYPTED-PEER-ADDRESS (described in
   Section 4.4) instead of XOR-PEER-ADDRESS to specify the address of
   the peer.  With the Channel mechanism, after a channel is
   successfully bound, later ChannelData messages are routed by the
   routing record left by the request that bound the channel.

3.2.2.  ClientB Behavior

   The behavior of clientB is similar to that of clientA.  The
   difference is that when clientB sends STUN/TURN requests for the
   first time, it already knows which server it should access from the
   routing-info-A provided by clientA, so clientB MUST use "Specific-
   server-mode" to generate the transaction ID for these requests.

3.2.3.  TURN Cluster Behavior

   A TURN cluster consists of 2 components: the TURN LB and the TURN
   servers.  The TURN LB forwards all packets to the right TURN server,
   and the TURN server is the actual TURN service provider.

3.2.3.1.  TURN LB Behavior

   The TURN LB forwards packets based on two elements:

   *  A self-maintained routing-map, whose key is concat(client source
      IP address, client source port) and whose value is concat(upstream
      TURN server IP address, upstream TURN server port).

   *  Routing information in transaction ID.

   When a packet arrives, a TURN LB SHOULD parse and process the packet
   as follows:

   *  The TURN LB first determines whether the packet is in STUN format.
      If so, the TURN LB extracts the transaction ID from the packet and
      processes the packet as described in Section 4.3.

   *  If the packet is not in STUN format, the TURN LB forms the key
      from the source IP address and port of the packet and looks up the
      upstream TURN server IP address and port in the routing-map.  If
      the lookup succeeds, the TURN LB forwards the packet directly to
      the upstream TURN server and refreshes the expiration time of the
      corresponding routing record.  If the lookup fails, the TURN LB
      silently drops the packet.

   Moreover, the TURN LB SHOULD NOT modify the source IP address and
   port of the packet, because a TURN cluster MAY still provide STUN
   service.
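
   The dispatch rule above can be sketched as follows.  This is a
   minimal illustration, not a normative implementation: the names
   is_stun, RoutingMap, dispatch and ROUTE_TTL are hypothetical, the
   record lifetime is an arbitrary assumption, and the STUN detection
   uses only the header checks from [RFC8489] (two leading zero bits,
   length multiple of 4, magic cookie).

```python
import struct
import time

MAGIC_COOKIE = 0x2112A442  # fixed STUN magic cookie from RFC 8489
ROUTE_TTL = 600.0          # hypothetical record lifetime, in seconds


def is_stun(packet: bytes) -> bool:
    """Return True if the packet passes the basic STUN header checks."""
    if len(packet) < 20:
        return False
    msg_type, msg_len, cookie = struct.unpack("!HHI", packet[:8])
    # Top two bits are zero, length is a multiple of 4, cookie matches.
    return (msg_type & 0xC000) == 0 and msg_len % 4 == 0 and cookie == MAGIC_COOKIE


class RoutingMap:
    """Maps (client IP, client port) to (server IP, server port) with expiry."""

    def __init__(self, ttl: float = ROUTE_TTL):
        self.ttl = ttl
        self.records = {}

    def put(self, client, server, now=None):
        now = time.monotonic() if now is None else now
        self.records[client] = (server, now + self.ttl)

    def lookup_and_refresh(self, client, now=None):
        now = time.monotonic() if now is None else now
        entry = self.records.get(client)
        if entry is None or entry[1] < now:
            self.records.pop(client, None)  # missing or expired record
            return None
        server, _ = entry
        self.records[client] = (server, now + self.ttl)  # refresh expiry
        return server


def dispatch(packet: bytes, client, routes: RoutingMap, now=None):
    """Return ("stun", transaction_id), ("forward", server) or ("drop", None)."""
    if is_stun(packet):
        # Bytes 8..20 of the STUN header carry the 96-bit transaction ID.
        return ("stun", packet[8:20])
    server = routes.lookup_and_refresh(client, now)
    if server is None:
        return ("drop", None)
    return ("forward", server)
```

   In this sketch a STUN packet is only classified; the transaction ID
   it returns would then be handled by the rules of Section 4.3.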

3.2.3.2.  TURN Server Behavior

   For most STUN/TURN messages, the TURN server processes them as
   defined in [RFC8656], but there are some special requirements for
   XOR-RELAYED-ADDRESS and XOR-PEER-ADDRESS.  Instead of transmitting
   allocation information by XOR-RELAYED-ADDRESS, the TURN server MUST
   use ENCRYPTED-RELAYED-ADDRESS, described in Section 4.1, to protect
   internal network information.  When the TURN server receives an
   ENCRYPTED-PEER-ADDRESS attribute, it MUST process it as described in
   Section 4.4.  In addition, since a TURN server in the cluster MAY
   also provide STUN service, it SHOULD avoid carrying any attributes
   (e.g., RESPONSE-ORIGIN, RESPONSE-PORT defined in [RFC5780]) that
   expose internal network information in the STUN response.

4.  Routing Mechanism

   This section defines the conventions by which the related components
   in Figure 7 securely generate and transmit routing information.  It
   describes: (1) how the TURN server generates
   ENCRYPTED-RELAYED-ADDRESS to securely carry routing information; (2)
   how the client generates a routable transaction ID from
   ENCRYPTED-RELAYED-ADDRESS and specifies the address of the peer by
   ENCRYPTED-PEER-ADDRESS; (3) how the TURN LB processes a routable
   transaction ID and forwards packets.

4.1.  Server Generate ENCRYPTED-RELAYED-ADDRESS

   ENCRYPTED-RELAYED-ADDRESS is a new STUN attribute defined in this
   specification, whose attribute type value is TBD1 (IANA is requested
   to assign TBD1 a value in the range 0x000e-0x000f).  The generation
   of ENCRYPTED-RELAYED-ADDRESS is divided into 3 phases: (1)
   Preparation phase; (2) Obfuscation phase; (3) Encryption phase.

4.1.1.  Preparation Phase

   The preparation phase is triggered when preparing to establish the
   cluster or when updating the members of the cluster.  In the
   preparation phase, the maintainer of the cluster generates a
   configuration and synchronizes it to the TURN LB and each TURN server
   inside the cluster.  The configuration consists of 4 parts: (1) a
   2-bit Configuration-ID, which uniquely identifies the configuration
   when the cluster configuration rotates; (2) an arbitrary nonnegative
   integer "divisor", which is used in the obfuscation calculation and
   MUST be larger than the number of TURN servers; (3) a set of
   "modulus" values, each of which uniquely identifies a server in the
   cluster; (4) a 16-byte "key", which is used in the encryption phase.
   The maintainer of the cluster MUST perform the following operations
   in the preparation phase:

   *  Select a configuration ID for the configuration.  The maintainer
      SHOULD ensure that there are no clients that are still using the
      configuration corresponding to the selected ID.

   *  Generate "divisor", "modulus" set and "key" defined in the
      configuration as required.

   *  If the cluster currently has a configuration in use, set its state
      to be "wait to be offline".

   *  Synchronize the new configuration ID, "divisor" and "key" to the
      TURN LB and each TURN server, then assign each TURN server its own
      "modulus", and synchronize the mapping between each "modulus" and
      TURN server IP address to the TURN LB.

   *  Set the state of the new configuration to "active".  Note that
      there MUST be only one configuration in the "active" state.  A
      TURN server MUST NOT generate a new ENCRYPTED-RELAYED-ADDRESS
      using an old configuration after receiving a new one.
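
   As an illustration of the four configuration parts, the sketch below
   builds one configuration.  ClusterConfig and new_config are
   hypothetical names, and the choice of sequential "modulus" values
   and of a random "divisor" just above the server count are
   assumptions made only for this example; the specification leaves
   these choices to the maintainer.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterConfig:
    config_id: int  # 2-bit Configuration-ID
    divisor: int    # larger than the number of TURN servers
    moduli: dict    # modulus -> TURN server IP address (kept by the TURN LB)
    key: bytes      # 16-byte key used in the encryption phase


def new_config(config_id: int, servers: list) -> ClusterConfig:
    """Generate a configuration for the given list of TURN server IPs."""
    if not 0 <= config_id <= 3:
        raise ValueError("Configuration-ID must fit in 2 bits")
    # Assumption: a random divisor above the server count, so every
    # modulus in range(len(servers)) stays strictly below the divisor.
    divisor = len(servers) + 1 + secrets.randbelow(1000)
    # Assumption: moduli are simply 0, 1, 2, ... in server order.
    moduli = {modulus: ip for modulus, ip in enumerate(servers)}
    return ClusterConfig(config_id, divisor, moduli, secrets.token_bytes(16))
```

   Keeping every "modulus" below the "divisor" is what allows a
   receiver to recover it later as a remainder.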

4.1.2.  Obfuscation Phase

   When a TURN server begins to generate ENCRYPTED-RELAYED-ADDRESS for
   an Allocate success response, it starts the Obfuscation phase.  In
   the Obfuscation phase, the TURN server uses the "divisor" and its
   "modulus" from the currently active configuration to generate the
   Obfuscated-address.  The structure of Obfuscated-address is depicted
   below:

   Obfuscated-address {
     Configuration-ID(2),
     Obfuscated-value(30)
   }

   Obfuscated-value is calculated by adding an arbitrary nonnegative
   integer multiple of the "divisor" to the server's "modulus", such
   that the result fits in 30 bits (that is, it is less than 2^30).
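
   A minimal sketch of this calculation is shown below.  The function
   name obfuscated_address is hypothetical, and the sketch assumes that
   the Configuration-ID occupies the two most significant bits of the
   32-bit Obfuscated-address and that the "modulus" is smaller than the
   "divisor".

```python
import secrets


def obfuscated_address(config_id: int, divisor: int, modulus: int) -> int:
    """Pack Configuration-ID (2 bits) and Obfuscated-value (30 bits)."""
    if not 0 <= config_id <= 3:
        raise ValueError("Configuration-ID must fit in 2 bits")
    if not 0 <= modulus < divisor:
        raise ValueError("modulus must be in [0, divisor)")
    # Largest multiple k for which k * divisor + modulus fits in 30 bits.
    max_k = ((1 << 30) - 1 - modulus) // divisor
    k = secrets.randbelow(max_k + 1)
    value = k * divisor + modulus
    return (config_id << 30) | value
```

   For example, with a "divisor" of 17 and a "modulus" of 5, any value
   of the form 17 * k + 5 below 2^30 is a valid Obfuscated-value for
   that server.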

4.1.3.  Encryption Phase

   After getting the Obfuscated-address, the TURN server starts the
   Encryption phase.  The server first left-pads the magic cookie with
   zeros to a 16-byte string and encrypts it with the "key" obtained in
   the preparation phase.  The encryption uses the AES-128-ECB cipher,
   and the encryption result is recorded as "mask".  The TURN server
   then generates ENCRYPTED-RELAYED-ADDRESS with the "mask".  The
   structure of ENCRYPTED-RELAYED-ADDRESS is shown below:

   ENCRYPTED-RELAYED-ADDRESS {
     Attribute-Type(8),
     Reserve-bit(2),
     Encoded-Check-bit(6),
     Encoded-Port(16),
     Encoded-Obfuscated-Address(32)
   }

   ENCRYPTED-RELAYED-ADDRESS has the following fields:

   Attribute-Type: IANA is requested to assign a value for it.

   Reserve-bit: A 2-bit value reserved for two special purposes.

   The Encoded-Check-bit, Encoded-Port and Encoded-Obfuscated-Address
   (which carries the Configuration-ID) are calculated by the functions
   defined below:

   Encoded-Check-bit = mask[0:6] ^ plaintext-check-bit
   Encoded-Port = mask[6:22] ^ allocate-port
   Encoded-Obfuscated-Address = mask[22:54] ^ Obfuscated-Address

   Here, plaintext-check-bit is a 6-bit value with all bits set to '1',
   and allocate-port is the 16-bit port value allocated by the TURN
   server.
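
   The XOR encoding can be sketched as below.  To keep the example free
   of any cryptographic library, the 16-byte "mask" is taken as an
   input; in a deployment it would be the AES-128-ECB output described
   above.  The names mask_bits and encode_fields are hypothetical, and
   the sketch assumes that mask[a:b] denotes bits a (inclusive) to b
   (exclusive) counted from the most significant bit of the mask.

```python
def mask_bits(mask: bytes, start: int, end: int) -> int:
    """Return bits [start, end) of a 16-byte mask, MSB first, as an integer."""
    if len(mask) != 16:
        raise ValueError("mask must be 16 bytes")
    as_int = int.from_bytes(mask, "big")
    width = end - start
    return (as_int >> (128 - end)) & ((1 << width) - 1)


def encode_fields(mask: bytes, allocate_port: int, obfuscated_address: int):
    """Return (Encoded-Check-bit, Encoded-Port, Encoded-Obfuscated-Address)."""
    plaintext_check_bit = 0b111111  # 6 bits, all set to '1'
    encoded_check = mask_bits(mask, 0, 6) ^ plaintext_check_bit
    encoded_port = mask_bits(mask, 6, 22) ^ allocate_port
    encoded_address = mask_bits(mask, 22, 54) ^ obfuscated_address
    return encoded_check, encoded_port, encoded_address
```

   Because XOR is its own inverse, applying the same mask slices to the
   encoded fields recovers the plaintext values.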

4.2.  Generation of Routable Transaction ID

   As described in [RFC8489], the transaction ID is a 96-bit identifier
   generated by the client to uniquely identify STUN transactions; it is
   normally chosen uniformly at random.  In practice, 96 bits is more
   than enough, so the transaction ID can be further structured so that
   it not only provides uniqueness but also securely carries some
   routing information and check information.  The structure of a
   Routable Transaction ID is shown below:

   Routable Transaction ID {
     Mode-bit (2),
     Routing-info (6..54),
     Random-bit (40..88),
   }

   The Mode-bit selects one of 3 routing modes, each with its own
   routing information.  The 3 modes are described below:

   *  Arbitrary mode: Corresponding request can be sent to the default
      port of any TURN server in the cluster.

   *  Specific-server-mode: Corresponding request MUST be sent to the
      default port of the specific TURN server.

   *  Specific-address-mode: Corresponding request MUST be sent to the
      specified port of the specific TURN server.

4.2.1.  Arbitrary Mode

   The typical scenario for "Arbitrary-mode" is when a client sends the
   first STUN/TURN request to the cluster at the beginning of the ICE
   process.  The client does not yet have any information about the TURN
   servers, so it SHOULD set the Mode-bit to "00", and the routing
   information of the transaction ID is just the 6-bit check-bit with
   all bits set to '1', as depicted below:

   Routing-info {
     Check-bit(6)
   }

   After that, the client generates an 88-bit random string as the
   Random-bit.
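
   A minimal sketch of an arbitrary mode transaction ID follows.  The
   function name is hypothetical, and the sketch assumes that the
   Mode-bit and Check-bit occupy the most significant byte of the
   96-bit transaction ID.

```python
import secrets


def arbitrary_mode_transaction_id() -> bytes:
    """Build a 96-bit ID: Mode-bit 00, Check-bit 111111, 88 random bits."""
    first_byte = (0b00 << 6) | 0b111111
    return bytes([first_byte]) + secrets.token_bytes(11)
```

   Every transaction ID built this way starts with the byte 0x3F,
   followed by 11 random bytes.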

4.2.2.  Specific Server Mode

   Specific-server-mode is suitable when the client has received
   ENCRYPTED-RELAYED-ADDRESS from a TURN server or peer, and expects to
   send a request to the TURN server corresponding to that
   ENCRYPTED-RELAYED-ADDRESS.  For example: (1) the client has
   established an RTP relay connection in a TURN server and wants to
   establish an RTP/RTCP connection pair in the same TURN server; (2)
   the client has received ENCRYPTED-RELAYED-ADDRESS from the peer's
   candidate information and expects to apply for a relay port in the
   same TURN server.  In this mode, the client MUST set the Mode-bit to
   "01", and the Routing-info structure is depicted below:

   Routing-info {
     Encoded-Check-bit(6),
     Encoded-Obfuscated-Address(32),
   }

   Encoded-Check-bit and Encoded-Obfuscated-Address here are obtained
   directly from ENCRYPTED-RELAYED-ADDRESS.  The remaining 56 bits of
   the transaction ID MUST be a cryptographically random value.

4.2.3.  Specific Address Mode

   In Specific-address-mode, the client MUST have received
   ENCRYPTED-RELAYED-ADDRESS and expects to send a request to the
   specific port of the specific TURN server.  A typical scenario is
   that the client has received ENCRYPTED-RELAYED-ADDRESS from the
   peer's candidate information and expects to send a Binding request to
   the address in ENCRYPTED-RELAYED-ADDRESS.  In this mode, the client
   MUST set the Mode-bit to "10", and the Routing-info structure is
   depicted below:

   Routing-info {
     Encoded-Check-bit(6),
     Encoded-Obfuscated-Address(32),
     Encoded-Port(16),
   }

   The client MUST set the Mode-bit to "10", extract Encoded-Check-bit,
   Encoded-Obfuscated-Address and Encoded-Port from
   ENCRYPTED-RELAYED-ADDRESS, and place them in the transaction ID.  It
   then generates a 40-bit random string to fill the rest of the
   transaction ID.
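
   The two specific modes can be sketched together as below.  The
   function and constant names are hypothetical.  The sketch assumes
   the fields are laid out most significant bit first, in the order
   given by the Routing-info structures, and that the encoded fields
   have already been copied from the received attribute.

```python
import secrets

MODE_SPECIFIC_SERVER = 0b01
MODE_SPECIFIC_ADDRESS = 0b10


def routable_transaction_id(mode, enc_check, enc_addr, enc_port=None) -> bytes:
    """Build a 96-bit routable transaction ID for the two specific modes."""
    if not 0 <= enc_check < (1 << 6) or not 0 <= enc_addr < (1 << 32):
        raise ValueError("encoded field out of range")
    if mode == MODE_SPECIFIC_SERVER:
        # Mode-bit(2) | Encoded-Check-bit(6) | Encoded-Obfuscated-Address(32)
        header = (mode << 38) | (enc_check << 32) | enc_addr
        header_bits = 40
    elif mode == MODE_SPECIFIC_ADDRESS:
        if enc_port is None or not 0 <= enc_port < (1 << 16):
            raise ValueError("specific address mode needs a 16-bit Encoded-Port")
        # ... | Encoded-Obfuscated-Address(32) | Encoded-Port(16)
        header = (mode << 54) | (enc_check << 48) | (enc_addr << 16) | enc_port
        header_bits = 56
    else:
        raise ValueError("unsupported mode")
    random_bits = 96 - header_bits  # 56 or 40 random bits
    value = (header << random_bits) | secrets.randbits(random_bits)
    return value.to_bytes(12, "big")
```

   The random tail is drawn from the operating system's secure random
   source, which matches the requirement that the remaining bits be
   cryptographically random.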

4.2.4.  Uniqueness of Transaction ID

   This section gives a simple analysis of the uniqueness of the
   routable transaction ID.  The routable transaction ID still depends
   on a large enough value range and random selection to ensure
   uniqueness.  The routing part of the transaction ID reduces its value
   range.  To avoid the value range being too small, this specification
   suggests the obfuscated way of encoding the address.  The value range
   of the transaction ID is then determined by two factors: the length
   of the random bits and the number of cluster machines N.  The value
   range of the routable transaction ID under the three modes is shown
   in the table below:

   +-------------------------+------------------+
   |           mode          |    value range   |
   +-------------------------+------------------+
   |     Arbitrary Mode      |     0 - 2^88     |
   +-------------------------+------------------+
   |   Specific Server Mode  |   0 - (2^88)/N   |
   +-------------------------+------------------+
   |  Specific Address Mode  |   0 - (2^72)/N   |
   +-------------------------+------------------+

   In a production environment, the number of machines in a TURN cluster
   is not particularly large, so the value range of arbitrary mode and
   specific server mode is sufficient for most scenarios.  As for
   specific address mode, only related peers use this mode to access the
   same address, so it works well without a particularly large value
   range.

4.3.  TURN LB Process Transaction ID

   When a TURN LB receives a TURN packet, it first extracts the first 2
   bits of the transaction ID.  If the first 2 bits are "11", the TURN
   LB drops the packet silently.  Otherwise, the TURN LB determines the
   mode of the request from the first 2 bits.  For arbitrary mode
   requests, the TURN LB checks whether the next 6 bits are all '1'.  If
   not, the TURN LB SHOULD drop the packet silently.  If so, the TURN LB
   forwards the packet to the default port of a backend TURN server,
   chosen according to each server's load condition.

   For specific server mode and specific address mode requests, the TURN
   LB first generates the "mask" as defined in the encryption phase of
   Section 4.1, and calculates plaintext-check-bit and
   Obfuscated-Address as below:

   plaintext-check-bit = mask[0:6] ^ Encoded-Check-bit
   Obfuscated-Address  = mask[22:54] ^ Encoded-Obfuscated-Address

   The TURN LB then checks whether all bits of plaintext-check-bit are
   '1'.  If the check fails, the TURN LB drops the packet silently.  If
   it succeeds, the TURN LB SHOULD perform the following sequence of
   steps:

   1.  Extract configuration ID and Obfuscated-value from Obfuscated-
       Address, and get the configuration corresponding to the
       configuration ID.

   2.  Express Obfuscated-value as an unsigned integer, and take the
       remainder of dividing it by the "divisor" to get the "modulus" of
       the request.

   3.  Use the "modulus" to get the TURN server IP address from the map
       maintained by the TURN LB.  If the "modulus" cannot be mapped to
       any TURN server, drop the packet silently.

   4.  If the TURN server selected in step 3 is offline because of
       configuration rotation, the TURN LB SHOULD send an error response
       to the client, setting the ERR_CODE to TBD3 (IANA is requested to
       assign a "4xx" error code for this value, to indicate that the
       request failed because of a configuration problem).

   5.  If the TURN server selected in step 3 is working, the TURN LB
       forwards the packet according to the mode.  For specific server
       mode, the TURN LB forwards the packet to the default port of the
       TURN server.  For specific address mode, the TURN LB forwards the
       packet to the specific-port of the TURN server.

   The specific-port in step 5 is calculated as below:

   allocate-port = mask[6:22] ^ Encoded-Port
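
   The processing in this section can be sketched as follows.  The
   names resolve, mask_bits, configs and DEFAULT_PORT are hypothetical.
   The sketch takes the 16-byte "mask" as an input, assumes bit ranges
   are counted from the most significant bit, treats an unknown
   configuration ID as a silent drop (an assumption; the text does not
   cover this case), and omits the offline-server error response of
   step 4.

```python
DEFAULT_PORT = 3478  # standard STUN/TURN port, used here as the default port


def mask_bits(mask: bytes, start: int, end: int) -> int:
    """Return bits [start, end) of a 16-byte mask, MSB first, as an integer."""
    as_int = int.from_bytes(mask, "big")
    return (as_int >> (128 - end)) & ((1 << (end - start)) - 1)


def resolve(transaction_id: bytes, mask: bytes, configs: dict):
    """Return ("drop", None), ("any", port) or ("server", (ip, port)).

    configs maps a configuration ID to (divisor, {modulus: server_ip}).
    """
    if len(transaction_id) != 12 or len(mask) != 16:
        raise ValueError("expected a 12-byte ID and a 16-byte mask")
    tid = int.from_bytes(transaction_id, "big")
    mode = tid >> 94
    check = (tid >> 88) & 0x3F
    if mode == 0b11:
        return ("drop", None)
    if mode == 0b00:
        # Arbitrary mode: plain check bits; server choice is load based.
        return ("any", DEFAULT_PORT) if check == 0x3F else ("drop", None)
    if (check ^ mask_bits(mask, 0, 6)) != 0x3F:
        return ("drop", None)
    obfuscated = ((tid >> 56) & 0xFFFFFFFF) ^ mask_bits(mask, 22, 54)
    config = configs.get(obfuscated >> 30)  # step 1: configuration ID
    if config is None:
        return ("drop", None)
    divisor, modulus_map = config
    modulus = (obfuscated & 0x3FFFFFFF) % divisor  # step 2: remainder
    server_ip = modulus_map.get(modulus)           # step 3: map lookup
    if server_ip is None:
        return ("drop", None)
    if mode == 0b01:
        return ("server", (server_ip, DEFAULT_PORT))
    port = ((tid >> 40) & 0xFFFF) ^ mask_bits(mask, 6, 22)
    return ("server", (server_ip, port))
```

   A caller would forward the packet to the returned address, or pick a
   backend by load when the result is ("any", port).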

4.4.  ENCRYPTED-PEER-ADDRESS

   ENCRYPTED-PEER-ADDRESS is a new STUN attribute defined in this
   specification, whose attribute type value is TBD2 (IANA is requested
   to assign TBD2 a value in the range 0x000e-0x000f).  Similar to
   XOR-PEER-ADDRESS, ENCRYPTED-PEER-ADDRESS is used to tell the server
   the address and port of the peer, but it applies to the scenario
   where the address and port of the peer are contained in an
   ENCRYPTED-RELAYED-ADDRESS.  ENCRYPTED-PEER-ADDRESS has the same
   structure as ENCRYPTED-RELAYED-ADDRESS, and IANA is requested to
   assign a type value for ENCRYPTED-PEER-ADDRESS.

   A TURN server MUST perform the following steps to process an
   ENCRYPTED-PEER-ADDRESS attribute:

   1.  Calculate plaintext-check-bit, allocate-port and Obfuscated-
       Address by the mask and formula defined in Section 4.1.

   2.  Check whether all bits of plaintext-check-bit are '1'.  If the
       check fails, the TURN server SHOULD drop the packet silently.

   3.  Extract the configuration ID and Obfuscated-value from
       Obfuscated-Address, and get the "divisor" and the server's own
       "modulus" by the configuration ID.

   4.  Express Obfuscated-value as an unsigned integer, and take the
       remainder of dividing it by the "divisor" to get the "modulus" of
       the request.  Check whether the "modulus" of the request is equal
       to the "modulus" of the server.  If not, the TURN server SHOULD
       send an error response to the client, setting the ERR_CODE to
       TBD4 (IANA is requested to assign a "4xx" error code for this
       value, to indicate that the request failed due to access to an
       inappropriate server).  If equal, the TURN server then sends the
       packet to the corresponding address.

   The check in step 4 is based on the following consideration: since
   the cluster provides the routing mechanisms, all peers of a relayed
   channel SHOULD be connected to the same server to avoid extra hops in
   the network.
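
   The server-side comparison in steps 3 and 4 can be sketched as
   below, starting from an already decoded Obfuscated-Address.  The
   name process_peer_address and the returned labels are hypothetical,
   and treating an unknown configuration ID like a mismatch is an
   assumption of this sketch.

```python
def process_peer_address(obfuscated_address: int, server_configs: dict) -> str:
    """Return "relay" if the peer address belongs to this server.

    server_configs maps a configuration ID to (divisor, own_modulus).
    """
    config = server_configs.get(obfuscated_address >> 30)
    if config is None:
        return "error-TBD4"  # assumption: unknown configuration is rejected
    divisor, own_modulus = config
    request_modulus = (obfuscated_address & 0x3FFFFFFF) % divisor
    return "relay" if request_modulus == own_modulus else "error-TBD4"
```

   A result of "relay" corresponds to sending the packet on to the peer
   address; "error-TBD4" corresponds to the error response of step 4.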

4.5.  TLS Consideration

   For most STUN/TURN requests, the TURN LB forwards them based on the
   transaction ID.  If these messages are transmitted over DTLS-over-UDP
   or TLS-over-TCP, the TURN LB cannot see the transaction ID directly.
   In these cases, the TURN LB MUST also act as a TLS offload device to
   obtain the plaintext transaction ID.

5.  Security Consideration

   This document describes an architectural framework for building
   large-scale TURN clusters.  Since an attacker cannot obtain network
   information of a TURN server inside the cluster, attacks based on
   source address forgery (e.g., TURN loop attack) can be effectively
   prevented.  However, a TURN cluster is still subject to most attacks
   against a single TURN server.  This section discusses possible
   attacks on a TURN cluster.  For the attacks discussed in Section 21
   of [RFC8656] that are not mentioned in this section, the relevant
   analysis remains valid for the TURN cluster.

5.1.  DoS Against TURN Cluster

   An attacker might generate a large number of legitimate allocation
   requests and flood the cluster with them, to exhaust the available
   ports of all TURN servers in the cluster.  Since all requests are
   legitimate, the attack cannot be prevented directly.  To mitigate
   this attack, the maintainer of the TURN cluster can set up custom
   address-based rules that limit the number of allocation requests from
   the same source address.
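
   One possible form of such an address-based rule is a sliding-window
   counter per source address, sketched below.  The class name
   AllocationLimiter and the limit and window parameters are
   hypothetical; suitable values depend on the deployment.

```python
import time
from collections import defaultdict, deque


class AllocationLimiter:
    """Allow at most max_requests allocations per source IP per window."""

    def __init__(self, max_requests: int, window: float):
        self.max_requests = max_requests
        self.window = window              # window length, in seconds
        self.history = defaultdict(deque)  # source IP -> request timestamps

    def allow(self, source_ip: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        recent = self.history[source_ip]
        # Forget requests that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True
```

   A TURN LB or TURN server using this sketch would reject or drop an
   Allocate request whenever allow() returns False for its source
   address.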

5.2.  DoS Against a Single TURN Server

   Since the routing information in the transaction ID is encrypted and
   checked, it is hard for an attacker to construct a large number of
   legitimate TURN requests to attack a single TURN server.  However,
   ChannelData messages are routed by address, so an attacker might
   obtain a ChannelData message and flood the corresponding channel with
   traffic.  This attack is mitigated by the recommendation that the
   server limit the amount of bandwidth it will relay for a given
   username, or by using (D)TLS to prevent forgery of legitimate
   ChannelData messages.

6.  IANA Consideration

   IANA is requested to assign the type values for the attributes
   ENCRYPTED-RELAYED-ADDRESS (defined in Section 4.1) and
   ENCRYPTED-PEER-ADDRESS (defined in Section 4.4).

   +----------------+---------------------------+-----------------+
   | attribute type |        description        |   reference     |
   +----------------+---------------------------+-----------------+
   |                |         value for         |                 |
   |      TBD1      | ENCRYPTED-RELAYED-ADDRESS,|    this RFC     |
   |                |    used to carry relayed  |                 |
   |                |       address safely      |                 |
   +----------------+---------------------------+-----------------+
   |                |         value for         |                 |
   |      TBD2      |  ENCRYPTED-PEER-ADDRESS,  |    this RFC     |
   |                | used to carry peer address|                 |
   |                |          safely           |                 |
   +----------------+---------------------------+-----------------+

   IANA is requested to assign error codes for TBD3 (defined in
   Section 4.3) and TBD4 (defined in Section 4.4), as depicted below:

   +----------+------------------------+-----------------+
   | err code |       description      |   reference     |
   +----------+------------------------+-----------------+
   |          | request failed due to  |                 |
   |   TBD3   | server configuration   |    this RFC     |
   |          | rotation               |                 |
   +----------+------------------------+-----------------+
   |          | request failed because |                 |
   |   TBD4   | the client accessed an |    this RFC     |
   |          | inappropriate server   |                 |
   +----------+------------------------+-----------------+

7.  Contributors

   The authors would like to thank
   HongQuan.Z(hongquan.zhq@antgroup.com), jim(jim.pj@alibaba-inc.com),
   Y.Chen(cy119846@antgroup.com), Han.X(han.xiao@antgroup.com),
   Bin.Y(yb261973@antgroup.com), XiaoKang.Q(xiaokang.qxk@antgroup.com),
   and LingTao.K(lingtao.klt@antgroup.com) for their contributions to
   this document.

8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC5780]  MacDonald, D. and B. Lowekamp, "NAT Behavior Discovery
              Using Session Traversal Utilities for NAT (STUN)",
              RFC 5780, DOI 10.17487/RFC5780, May 2010,
              <https://www.rfc-editor.org/info/rfc5780>.

   [RFC7478]  Holmberg, C., Hakansson, S., and G. Eriksson, "Web Real-
              Time Communication Use Cases and Requirements", RFC 7478,
              DOI 10.17487/RFC7478, March 2015,
              <https://www.rfc-editor.org/info/rfc7478>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

   [RFC8445]  Keranen, A., Holmberg, C., and J. Rosenberg, "Interactive
              Connectivity Establishment (ICE): A Protocol for Network
              Address Translator (NAT) Traversal", RFC 8445,
              DOI 10.17487/RFC8445, July 2018,
              <https://www.rfc-editor.org/info/rfc8445>.

   [RFC8489]  Petit-Huguenin, M., Salgueiro, G., Rosenberg, J., Wing,
              D., Mahy, R., and P. Matthews, "Session Traversal
              Utilities for NAT (STUN)", RFC 8489, DOI 10.17487/RFC8489,
              February 2020, <https://www.rfc-editor.org/info/rfc8489>.

   [RFC8656]  Reddy, T., Ed., Johnston, A., Ed., Matthews, P., and J.
              Rosenberg, "Traversal Using Relays around NAT (TURN):
              Relay Extensions to Session Traversal Utilities for NAT
              (STUN)", RFC 8656, DOI 10.17487/RFC8656, February 2020,
              <https://www.rfc-editor.org/info/rfc8656>.

   [RFC9000]  Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based
              Multiplexed and Secure Transport", RFC 9000,
              DOI 10.17487/RFC9000, May 2021,
              <https://www.rfc-editor.org/info/rfc9000>.

8.2.  Informative References

   [TURN-Load-balance]
              "TURN Performance and Load Balance", n.d.,
              <https://github.com/coturn/coturn/wiki/TURN-Performance-
              and-Load-Balance>.

Author's Address

   William Zeng
   Ant Group
   Email: william.zk@antfin.com
