Dispatch                                                    J. Rosenberg
Internet-Draft                                                     Five9
Intended status: Standards Track                             C. Jennings
Expires: 12 January 2023                                       A. Cooper
                                                                   Cisco
                                                             J. Peterson
                                                                 Neustar
                                                            11 July 2022


              Simple Protocol for Inviting Numbers (SPIN)
                    draft-rosenberg-dispatch-spin-00

Abstract

   This document introduces a framework and a protocol for facilitating
   voice, video and messaging interoperability between application
   providers.  This work is motivated by the recent passage of
   regulation in the European Union - the Digital Markets Act (DMA) -
   which, amongst many other provisions, requires that vendors of
   applications with a large number of users enable interoperability
   with applications made by other vendors.  While such interoperability
   is broadly present within the public switched telephone network, it
   is not yet commonplace between over-the-top applications, such as
   FaceTime, WhatsApp, and Facebook Messenger.  This document
   specifically defines the Simple Protocol for Inviting Numbers (SPIN)
   which is used to deliver invitations to mobile phone numbers that can
   bootstrap subsequent communications over the Internet.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 12 January 2023.






Rosenberg, et al.        Expires 12 January 2023                [Page 1]

Internet-Draft                    SPIN                         July 2022


Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Implications of no Standards Action
   3.  Affected Actors
   4.  SPIN Framework
   5.  SPIN Protocol Overview
   6.  SPINvitation Object Syntax
   7.  SPIN Protocol for Providing URIs
   8.  Mobile OS vendor API Recommendations
   9.  Specifications of Communications Protocols
     9.1.  Voice and Video
     9.2.  Messaging
   10. Security Considerations
   11. Normative References
   Authors' Addresses

1.  Introduction

   Voice, video and messaging are commonplace on the Internet today,
   enabled by two distinct classes of software.  The first class is
   provided by telecommunications carriers and makes heavy use of
   standards, such as the Session Initiation Protocol (SIP) [RFC3261].
   In this approach - which we call the telco model - there is
   interoperability between different telcos, but the set of features
   and functionality is limited by the rate of definition and adoption
   of standards, often measured in years or decades.  The second model -
   the app model - allows a single entity to offer an application,
   delivering both the server side software and its corresponding
   client-side software.  The client-side software is delivered either
   as a web application, or as a mobile application through a mobile
   operating system app store.  The app model has proven incredibly
   successful by any measure.  It trades off interoperability for
   innovation and velocity.

   The downside of the loss of interoperability is that entry into the
   marketplace by new providers is difficult.  Applications like
   WhatsApp, Facebook Messenger, and FaceTime have user bases numbering
   in the hundreds of millions to billions of users.  A new application
   cannot connect with these user bases, so its vendor must bootstrap
   its own network effects.

   This situation has recently drawn the attention of regulators, and
   was one of the motivations behind the Digital Markets Act (DMA) in
   the European Union.  Amongst its many provisions, it requires vendors
   of large communications platforms to enable interoperability with
   third party vendors.  It does not, of course, specify an actual set
   of protocols or technologies for enabling that interoperability.

   This document seeks to fill that void, by defining a framework - the
   SPIN Framework - for such interoperability.  This framework seeks to
   strike a balance between innovation and standardization, by
   identifying only those portions of the protocol stack that must be
   standardized in order to achieve end-to-end security for a minimum
   feature set between providers, and leaving everything else to APIs
   and protocols which each vendor can define on its own.

   This framework identifies the need for a new protocol to solve the
   identity mapping problem - the SPIN Protocol.  Specifically, how does
   an originating user of one application identify a target user who
   uses a different application, and then obtain an identifier for that
   target user that is valid in the target application?  Consider the
   following example.  Alice is a user of Facebook Messenger, and wishes
   to send a 1-1 chat message to her friend Bob.  Bob uses a different
   application for messaging - Signal for example - but this fact is not
   known to Alice.  Alice needs to somehow obtain a URI that can be used
   to send messages to the Signal application targeted at Bob.  This is
   the identity mapping problem, and is addressed by the SPIN protocol
   defined here.

2.  Implications of no Standards Action

   In theory the application interoperability envisioned in the DMA
   could be achieved entirely through the publication of vendor-specific
   APIs and without standardization.  However, this would yield a
   suboptimal outcome for both users and app developers, as supporting
   the matrix of pairwise communication flows between all of the
   affected voice, video, and messaging applications in the market via
   vendor-specific APIs will create a patchwork of inconsistent user
   experiences and likely lead to buggy implementations.  Using a
   minimal standardized framework to bootstrap cross-app communications
   will provide more consistency while leaving app developers freedom to
   continue to make their own design choices.

   Furthermore, the usage of a standards-based solution ensures that
   end-to-end encrypted messaging, voice, and video can happen between
   providers.  Without a standard, each vendor subject to the DMA will
   publish APIs for access to their services.  These APIs have
   traditionally provided access to messages, voice and video that are
   not protected by end-to-end (e2e) encryption.  While it is possible,
   in theory, that each application provider could amend their APIs to
   provide access to e2e encrypted content, doing so without an
   agreed-upon standard will almost certainly lead to third parties
   decrypting in the cloud to avoid implementing N variations in each
   client, one for each provider they interoperate with.

3.  Affected Actors

   The solution defined by the SPIN framework requires participation
   from multiple actors, and thus requires coordination through
   standards.  These actors are:

   *  Mobile OS Vendors: Most notably Apple and Google.  The framework
      requires them to implement new APIs in their operating systems,
      new user preference capabilities, and support for user identity
      through certificates.

   *  App Developers: App developers, such as Signal or Facebook
      Messenger, are required to utilize the APIs exposed by the mobile
      OSs, and also to implement the voice, video and/or chat protocols
      specified by the SPIN Framework.

   *  STIR/SHAKEN PA/CA: The SPIN framework suggests that it be possible
      for the mobile OS vendors to generate STIR certificates for the
      device.  This requires that these vendors be supported as valid
      CAs for STIR.

   Note that the SPIN Framework described here does not require any
   support or changes from the carriers themselves.  Note, however, the
   open issue discussed below: an alternative certification model in
   which the telcos perform delegation to the mobile OS vendors to
   install a certificate on the phone.

4.  SPIN Framework

   The framework for SPIN is shown in the figure below:

   +---------------+               +---------------+
   |               | Comm Protocol |               |
   |Originating Svc+---------------+Terminating Svc|
   |               |               |               |
   +-------+-------+               +-------+-------+
           |                               |
           |                               |
           |                               |
           |                               |
   +-------+-------+               +-------+-------+
   |               |               |               |
   |Originating App|               |Terminating App|
   |               |               |               |
   +-------+-------+               +-------+-------+
           |                               |
   +-------+-------+    +-----+    +-------+-------+
   |Originating OS +----+ SMS +----+Terminating OS |
   +---------------+    +-----+    +---------------+

   In the framework, we have two users - the originating and
   terminating.  The originating user wishes to send a message, make a
   video call, or make a voice call, to the terminating user.  A
   fundamental assumption of SPIN is that the originating and
   terminating users are both identifiable by telephone numbers on the
   Public Switched Telephone Network (PSTN), and that the terminating
   user can be reached via SMS.  The originating user knows the
   telephone number for the terminating user.  The originating user is
   using an app running on an operating system.  The operating system
   can be a mobile OS, such as iOS or Android.  The originating OS
   exposes APIs towards the application, which allow the originating app
   to request communication to a user with the specified number.  The
   originating app is associated with a service running on the Internet,
   and can connect to it for communications services.  There is a
   similar setup on the terminating side - the user has an application
   running on an operating system which can receive SMS messages, and
   their app is associated with a service reachable over the Internet.

   The role of the operating systems in this framework is to act as a
   trust anchor.  The OS is responsible for authenticating the
   applications and vetting their behaviors, as mobile OSs normally do.

   The goal of the SPIN protocol is to allow a user of the originating
   app to select a service (voice, video or messaging) and a phone
   number with which to communicate, and then receive a URI for the
   terminating service which can be used to perform that communication.
   The URIs correspond to the protocols used for that form of
   communication.

   Once the SPIN Protocol has run, the originating service has a
   protocol URI for the particular media type - voice, video or chat -
   and can initiate communication towards the terminating service.  The
   SPIN Framework recommends specific protocols for voice, video and
   chat.  For voice and video, the SPIN Framework suggests SIP
   [RFC3261], with [I-D.rosenberg-dispatch-cloudsip], [RFC8224] and the
   WebRTC media stack.  For messaging, it suggests creation of a new
   REST-based protocol for 1-1 messaging, including e2e encryption using
   STIR-based certificates, and features such as delivery and read
   receipts, emojis, stickers, reactions, threads, images, URLs,
   contacts, and so on, forming a baseline set of minimum viable 1-1
   messaging.  For the initial phase of SPIN, group communications would
   be out of scope.

   Though the framework is expressed in terms that align with mobile
   operating systems, the same framework can apply in other cases.  For
   example, the terminating service, app and OS can logically be a
   single entity, such as a Contact Center as a Service (CCaaS)
   provider.  In that setup, the SMS messages are delivered directly to
   the CCaaS provider, and there is no mobile operating system involved
   to receive them.

5.  SPIN Protocol Overview

   The behavior of the SPIN Protocol is best understood through a high
   level sequence diagram:

+-----------+         +---------+ +-----------+ +-----+  +---------+           +-----------+ +-----------+
| orig_app  |         | orig_os | | orig_svc  | | sms |  | term_os |           | term_app  | | term_svc  |
+-----------+         +---------+ +-----------+ +-----+  +---------+           +-----------+ +-----------+
      |                    |            |          |          |                      |             |
      |                    |            |          |          |             register |             |
      |                    |            |          |          |<---------------------|             |
      |                    |            |          |          |                      |             |
      | call {number}      |            |          |          |                      |             |
      |------------------->|            |          |          |                      |             |
      |                    |            |          |          |                      |             |
      |                    | inv        |          |          |                      |             |
      |                    |---------------------->|          |                      |             |
      |                    |            |          |          |                      |             |
      |                    |            |          | inv      |                      |             |
      |                    |            |          |--------->|                      |             |
      |                    |            |          |          | -------------\       |             |
      |                    |            |          |          |-| verify sig |       |             |
      |                    |            |          |          | |------------|       |             |
      |                    |            |          |          | ---------------\     |             |
      |                    |            |          |          |-| verify hndlr |     |             |
      |                    |            |          |          | |--------------|     |             |
      |                    |            |          |          |                      |             |
      |                    |            |          | send URI |                      |             |
      |                    |<---------------------------------|                      |             |
      |                    |            |          |          |                      |             |
      |                URI |            |          |          |                      |             |
      |<-------------------|            |          |          |                      |             |
      |                    |            |          |          |                      |             |
      | req passport       |            |          |          |                      |             |
      |------------------->|            |          |          |                      |             |
      |                    |            |          |          |                      |             |
      |           passport |            |          |          |                      |             |
      |<-------------------|            |          |          |                      |             |
      |                    |            |          |          |                      |             |
      | call               |            |          |          |                      |             |
      |-------------------------------->|          |          |                      |             |
      |                    |            |          |          |                      |             |
      |                    |            | INVITE   |          |                      |             |
      |                    |            |--------------------------------------------------------->|
      |                    |            |          |          |                      |             |



   On the terminating side, the terminating user at some point installs
   an application which is capable of handling communications for one or
   more media types (voice, video or messaging).  The application will
   register with the terminating OS, using APIs exposed in the OS, that
   it is capable of acting as a SPIN handler.  As part of the
   registration, the application provides the OS with a URI for the
   service it provides of that media type.  As discussed below, this can
   be a proprietary API, or can be a baseline standard protocol.  In the
   case of voice, that baseline standard is SIP, and in particular,
   cloud SIP [I-D.rosenberg-dispatch-cloudsip].

   Later on, a user in an originating application decides to place a
   call to a number.  The originating application does not have a user
   with that number as part of its own service, so it knows it needs to
   use SPIN to route the call.  It asks the operating system on the
   mobile phone to provide a URI for voice communications to the
   specified phone number.  The originating OS then prepares a
   SPINvitation object.  This is a JWT which contains several fields.
   The fields include the phone number of the originating user (which
   must be known and verified by the mobile OS), an HTTP URI that can be
   used by the terminating OS to send the results back, and the
   communications service that is requested.  This HTTP URI will
   normally contain an embedded Authorization header field that contains
   a short-lived token, valid to send the results back.  The originating
   OS then signs the JWT and sends an SMS (more likely, an MMS given the
   size of the signed object) to the target user's phone number.  The
   terminating OS receives the SMS/MMS, and notices that it contains a
   SPINvitation object, and thus should not be rendered to the user.
   Should the terminating user's OS not support this protocol, it will
   end up rendering the MMS.  The MMS includes some plain text, which
   can be rendered to the user, indicating that the caller wishes to
   speak with them, so that the human user can take some action (like a
   return voice call over the PSTN).
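
   As a rough illustration of the SPINvitation described above, the
   following Python sketch builds and verifies a compact JWT carrying
   the three fields named in the text.  The claim names ("orig", "svc",
   "cb"), the example URI, and the helper names are assumptions made for
   this sketch, since the object syntax is still to be defined.  The
   signature uses HMAC-SHA256 (the JWT "HS256" algorithm) with a shared
   key purely as a stand-in so that the sketch runs with only the
   standard library; a real deployment would presumably use an
   asymmetric signature tied to the originating OS's certificate.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used by JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Decode unpadded base64url text."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def make_spinvitation(orig_number, service, callback_uri, key, now=None):
    """Build a compact JWT carrying the three fields named in the text."""
    header = {"alg": "HS256", "typ": "JWT"}  # stand-in algorithm (assumption)
    claims = {
        "orig": orig_number,   # hypothetical claim name
        "svc": service,        # hypothetical claim name
        "cb": callback_uri,    # hypothetical claim name
        "iat": int(now if now is not None else time.time()),
    }
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    sig = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(sig)


def verify_spinvitation(token, key):
    """Return the claims if the signature checks out, else None."""
    try:
        header_b64, claims_b64, sig_b64 = token.split(".")
        expected = hmac.new(
            key, f"{header_b64}.{claims_b64}".encode("ascii"), hashlib.sha256
        ).digest()
        if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
            return None
        return json.loads(b64url_decode(claims_b64))
    except ValueError:
        # Wrong number of segments, bad base64, or bad JSON.
        return None


demo_key = b"demo-only-shared-secret"  # placeholder key for the sketch
token = make_spinvitation(
    "+12155551212", "voice", "https://orig-os.example/spin/cb?t=abc", demo_key
)
claims = verify_spinvitation(token, demo_key)
print(claims["svc"], len(token))
```

   With these example values the token is roughly 200 characters long,
   which is more than fits in a single 160-character SMS and is in line
   with the expectation that an MMS is the more likely carrier.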

   Assuming the terminating OS supports this protocol, the MMS is
   absorbed and decoded.  The signature is verified and then the
   requested communications service is extracted.  In this example use
   case, it is a voice call.  The terminating OS has an application that
   has registered itself as a handler for voice.  Note that the
   terminating user might have multiple applications on their OS which
   can act as handlers for voice.  In such a case, the mobile OS would
   offer the user a configuration setting to choose one as a default.
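
   The handler registration and default selection described above could
   be modeled on the terminating OS roughly as follows.  This is only a
   sketch: the class name, method names, app identifiers, and the rule
   that the first registered app is the default until the user chooses
   otherwise are all assumptions, since the draft deliberately leaves
   the OS APIs unspecified.

```python
from dataclasses import dataclass, field


@dataclass
class HandlerRegistry:
    """Hypothetical OS-side table of SPIN handlers, keyed by media type."""

    # media type -> {app id -> service URI template}
    handlers: dict = field(default_factory=dict)
    # media type -> app id currently acting as the default handler
    defaults: dict = field(default_factory=dict)

    def register(self, app_id: str, media_type: str, uri_template: str) -> None:
        """Record that app_id can handle media_type at uri_template."""
        self.handlers.setdefault(media_type, {})[app_id] = uri_template
        # Assumption: the first app to register stays the default until
        # the user picks a different one in the OS settings.
        self.defaults.setdefault(media_type, app_id)

    def set_default(self, media_type: str, app_id: str) -> None:
        """Apply the user's preference; the app must already be registered."""
        if app_id not in self.handlers.get(media_type, {}):
            raise KeyError(f"{app_id} is not registered for {media_type}")
        self.defaults[media_type] = app_id

    def resolve(self, media_type: str, number: str):
        """Return the default handler's URI for number, or None."""
        app_id = self.defaults.get(media_type)
        if app_id is None:
            return None
        return self.handlers[media_type][app_id].replace("{number}", number)


registry = HandlerRegistry()
registry.register("app-a", "voice", "sip:{number}@provider.com")
registry.register("app-b", "voice", "sip:{number}@other.example")
registry.set_default("voice", "app-b")
print(registry.resolve("voice", "12155551212"))
```

   Keeping the user's default separate from the set of registered
   handlers reflects the point that several apps may register for the
   same media type while only one is chosen to answer.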

   The app had previously registered itself as a handler and provided a
   SIP URI for the receipt of calls, something like
   sip:{number}@provider.com.  This URI is sent back to the originating
   OS.  Rather than sending this back via SMS/MMS, IP communications are
   used.  The invitation object contained an HTTP URI which can be used
   by the terminating OS to send the SIP URI.  The SPIN protocol defines
   the exact syntax and semantics of this HTTP POST operation.  This is
   received by the originating OS, which then informs the app that it
   was able to locate the user.  The originating OS provides the
   communications URI (in this case, a SIP URI for voice calls).
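
   The return path described above could look roughly like the
   following Python sketch, which prepares (but does not send) the HTTP
   POST from the terminating OS.  The JSON body with a single "uri"
   member, the example callback URI, and the function name are
   assumptions for illustration only; the actual syntax of this
   operation is left to the SPIN protocol definition.

```python
import json
import urllib.request


def build_uri_response(callback_uri: str, number: str, template: str):
    """Prepare the POST that carries the handler's URI back."""
    # Expand the handler's registered template for the called number.
    comm_uri = template.replace("{number}", number)
    # Hypothetical body format: the draft has not defined one yet.
    body = json.dumps({"uri": comm_uri}).encode("utf-8")
    return urllib.request.Request(
        callback_uri,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_uri_response(
    "https://orig-os.example/spin/cb?t=short-lived-token",
    "12155551212",
    "sip:{number}@provider.com",
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Sending it would be: urllib.request.urlopen(req, timeout=10)
```

   The callback URI is used exactly as received in the SPINvitation, so
   whatever short-lived credential the originating OS embedded in it is
   passed back unchanged.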

   Next, the originating app places a SIP call.  Because we are now
   dealing with inter-domain and inter-provider calls, secure caller ID
   is required.  SPIN requires that STIR passports (PASSporTs)
   [RFC8225] are included, sent using [RFC8224].  The originating OS is
   required to obtain a passport that is valid for the originating user.
   In this framework, this is done by virtue of the mobile OS having a
   certificate by which it can perform the signing operation directly.

   There are two ways in which the originating OS can obtain such a
   certificate.  In one approach, the mobile OS would perform SMS
   verification (again, invisibly, by absorbing the SMS it sends to
   itself), and add an additional check of comparing it against the
   mobile number the user claimed they owned during provisioning time
   of the device.  The mobile OS vendor would be a valid CA, and would
   then generate a certificate valid for that individual phone number.
   In an alternative model, the telco uses certificate delegation
   [RFC9060], and generates a certificate that is handed to the phone
   during device provisioning.  The latter approach is more secure in
   some ways (as it would no longer depend on SMS forward routability
   for authentication of a user), but is much harder to deploy.

   The originating app makes an API call into the OS to obtain the
   passport, which is then returned to the app.  The app uses its own
   app-specific protocols to communicate with its servers, and will send
   the passport and the terminating user's phone number to its service.
   Its service will then send a SIP INVITE to the target number,
   including the passport in the SIP Identity header field.  From there,
   the terminating service can alert its app using the mobile OS push
   techniques, and a call has been placed.
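
   To make the passport step more concrete, the following Python sketch
   assembles the header and claims of a PASSporT in the style of
   [RFC8225] and places the result in a SIP Identity header field as in
   [RFC8224].  The function names, telephone numbers, and certificate
   URL are assumptions, and the signing step is represented by a
   caller-supplied function because the ES256 signature would be
   produced inside the mobile OS; the placeholder signer used in the
   demonstration does not produce a valid signature.

```python
import base64
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def passport_signing_input(orig_tn, dest_tn, x5u, iat):
    """Return base64url(header) + "." + base64url(claims)."""
    header = {"alg": "ES256", "typ": "passport", "x5u": x5u}
    claims = {"dest": {"tn": [dest_tn]}, "iat": iat, "orig": {"tn": orig_tn}}
    # Serialize with sorted keys and no whitespace so the signed bytes
    # are deterministic.
    return ".".join(
        b64url(
            json.dumps(part, separators=(",", ":"), sort_keys=True).encode("utf-8")
        )
        for part in (header, claims)
    )


def build_identity_header(orig_tn, dest_tn, x5u, iat, sign):
    """Format a SIP Identity header field carrying the signed passport.

    sign is a hypothetical OS-provided callable: bytes in, signature out.
    """
    signing_input = passport_signing_input(orig_tn, dest_tn, x5u, iat)
    signature = sign(signing_input.encode("ascii"))
    return f"Identity: {signing_input}.{b64url(signature)};info=<{x5u}>"


def placeholder_sign(data: bytes) -> bytes:
    # NOT a real signature: a real OS would return an ES256 signature
    # computed with the key bound to the user's STIR certificate.
    return b"\x00" * 64


print(
    build_identity_header(
        "12155551212",
        "12155551213",
        "https://cert.example.org/passport.cer",
        1657500000,
        placeholder_sign,
    )
)
```

   Passing the signer in as a parameter mirrors the split described
   above, where the app asks the OS for the passport rather than
   holding the signing key itself.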

   The SPIN framework therefore consists of the following:

   1.  A standardized syntax for the SPINvitation object that can be
       sent via MMS

   2.  A standardized HTTP-based protocol for providing URIs for
       communications - the actual SPIN on the wire protocol

   3.  Recommendations for mobile OS vendors on APIs they should provide
       to enable SPIN, without actually specifying any details of what
       those APIs look like

   4.  Specifications for communications protocols needed for voice,
       video and messaging between app providers

6.  SPINvitation Object Syntax

   This will be a JWT that contains:

   *  The desired media type, one of an enumerated set

   *  An HTTP URI for a callback

   Details TBD.

7.  SPIN Protocol for Providing URIs

   To be filled in

8.  Mobile OS vendor API Recommendations

   To be filled in

9.  Specifications of Communications Protocols

   There are several ways in which the communications protocols could be
   specified.  On one extreme, the standard could leave this entirely up
   to the terminating provider to define its protocol or API and
   document it publicly.  It would then be the responsibility of the
   originating service to implement each of these APIs for every
   terminating provider it wishes to speak to.  On the other extreme, we
   can fully specify a protocol - most likely with reference to existing
   standards.

   SPIN tries to take a middle ground.  It allows terminating providers
   to choose whether their interface is proprietary or follows a
   minimum baseline protocol specified here.

9.1.  Voice and Video

   Because the communications are between providers that may not have
   previously had an established bilateral relationship, we want the
   communications to be possible without any kind of manual
   configuration.  For this reason, SPIN specifies that the default
   voice and video communications protocol is SIP [RFC3261], along with
   its extension for cloud SIP [I-D.rosenberg-dispatch-cloudsip], and
   it utilizes the media protocols standardized by WebRTC.  The usage of
   cloud SIP allows scalable, reliable, inter-provider SIP over the
   Internet, and the usage of the WebRTC media stack provides a well-
   defined baseline media stack that is already widely implemented.  The
   SIP messaging MUST utilize [RFC8224] to ensure secure user identity.
   Media between the originating and terminating service will be DTLS-
   SRTP by virtue of using WebRTC, and e2e media encryption is supported
   and bootstrapped using a certificate bound to the user's phone
   number.  The mobile OS would hold the STIR certificate, and allow
   the application to request a signature over the keying material for
   driving DTLS-SRTP.

   Details to be filled out.

9.2.  Messaging

   For messaging, 1-1 messaging will be supported in the initial
   specification.  All messages will be e2e encrypted, also using the
   STIR certificate.  A specification will be produced that defines a
   REST-based protocol for basic 1-1 messaging features, including read
   receipts, delivery notifications, typing indicators, images, videos,
   contact cards, and so on.  A baseline set of capabilities would be
   provided, along with an extensibility framework for future content
   that would allow users to pop out to a browser when they receive new
   content that their app does not yet support.

   Details TBD.

10.  Security Considerations

   The SPIN protocol defined here is meant to address the following
   threats:

   *  A malicious application that "steals" incoming calls or chats
      against user wishes.  To prevent this, this protocol enlists the
      mobile operating system as a trusted third party that governs
      dispatch of communication requests to the right application based
      on user preferences.

   *  A malicious application that spams target users with requests for
      communication.  This is mitigated by enlisting the aid of the
      mobile operating system on the terminating side to absorb SMS
      messages conforming to this specification, and not presenting them
      to the user.  Digital signatures are used over the content of the
      SMS messages, and the terminating OS can validate that it trusts
      the sender before taking further action.

   *  Intermediaries that eavesdrop on communications between app
      providers.  This is mitigated by using e2e encryption across
      messaging, voice and video, ensuring it can be retained when
      crossing provider boundaries.

11.  Normative References

   [I-D.ietf-mls-architecture]
              Beurdouche, B., Rescorla, E., Omara, E., Inguva, S., Kwon,
              A., and A. Duric, "The Messaging Layer Security (MLS)
              Architecture", Work in Progress, Internet-Draft, draft-
              ietf-mls-architecture-08, 16 June 2022,
              <https://www.ietf.org/archive/id/draft-ietf-mls-
              architecture-08.txt>.

   [I-D.rosenberg-dispatch-cloudsip]
              Rosenberg, J., Jennings, C., and T. Asveren, "SIP
              Extensions for High Availability and Load Balancing for
              Public Cloud", Work in Progress, Internet-Draft, draft-
              rosenberg-dispatch-cloudsip-00, 21 February 2021,
              <https://www.ietf.org/archive/id/draft-rosenberg-dispatch-
              cloudsip-00.txt>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol",
              RFC 3261, DOI 10.17487/RFC3261, June 2002,
              <https://www.rfc-editor.org/info/rfc3261>.

   [RFC8224]  Peterson, J., Jennings, C., Rescorla, E., and C. Wendt,
              "Authenticated Identity Management in the Session
              Initiation Protocol (SIP)", RFC 8224,
              DOI 10.17487/RFC8224, February 2018,
              <https://www.rfc-editor.org/info/rfc8224>.

   [RFC8225]  Wendt, C. and J. Peterson, "PASSporT: Personal Assertion
              Token", RFC 8225, DOI 10.17487/RFC8225, February 2018,
              <https://www.rfc-editor.org/info/rfc8225>.

   [RFC9060]  Peterson, J., "Secure Telephone Identity Revisited (STIR)
              Certificate Delegation", RFC 9060, DOI 10.17487/RFC9060,
              September 2021, <https://www.rfc-editor.org/info/rfc9060>.

Authors' Addresses

   Jonathan Rosenberg
   Five9
   Email: jdrosen@jdrosen.net


   Cullen Jennings
   Cisco
   Email: fluffy@cisco.com


   Alissa Cooper
   Cisco
   Email: alissa@cooperw.in


   Jon Peterson
   Neustar
   Email: jon.peterson@neustar.biz