
Network Working Group                                       J. Rosenberg
Internet-Draft                                              A. Siciliano
Intended status: Informational                                     Five9
Expires: 9 January 2024                                      8 July 2023


           Contact Center Use Cases and Requirements for VCON
                  draft-rosenberg-vcon-cc-usecases-00

Abstract

   This document outlines use cases and requirements for the exchange of
   VCONs (Virtual Conversation) within contact centers.  A VCON is a
   standardized format for the exchange of call recordings and call
   metadata.  Today, call recordings are exchanged between different
   systems within the contact center, often using proprietary file
   formats and proprietary APIs.  Using VCONs can reduce this
   integration complexity.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 9 January 2024.

Copyright Notice

   Copyright (c) 2023 IETF Trust and the persons identified as the
   document authors.  All rights reserved.











Rosenberg & Siciliano    Expires 9 January 2024                 [Page 1]

Internet-Draft            VCON CC Requirements                 July 2023


   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Types of Supporting Applications
     2.1.  Recording
     2.2.  Quality Management (QM)
     2.3.  Speech Analytics
   3.  PII and PCI Redaction
   4.  Omni Channel
   5.  Deployment Topologies
   6.  Required Meta-Data
   7.  Informative References
   Authors' Addresses

1.  Introduction

   Contact Centers (CC) are a capability provided by companies for the
   purpose of engaging with their customers.  They are staffed by human
   contact center agents, whose job is to handle interactions with
   customers.  Interactions include phone calls, emails, texts, and
   messages delivered through messaging vendors, such as Facebook
   Messenger and WhatsApp.  Interactions can be inbound, when the
   customer initiates the conversation (for example, by calling a
   toll-free number for the company), or outbound, such as when a
   company calls a customer to remind them about an upcoming
   appointment.

   Contact centers are implemented through software applications.
   These applications usually include web front ends used by agents,
   managers, supervisors, and other personas in the contact center.
   The front ends are supported by backend servers, which receive the
   interactions, queue them, distribute them to agents, and handle agent
   actions such as transfers and holds.  This functionality is sometimes
   referred to as the ACD, for Automatic Call Distribution.  It is also
   sometimes called the core, as it represents the primary application
   in the contact center.  Like much other software, the ACD was
   initially deployed on-premise, but has now largely migrated to
   cloud-based delivery.  Vendors that deliver the core as a cloud
   service are often called Contact Center as a Service (CCaaS) vendors.

   Within the contact center, there are numerous supporting applications
   that companies purchase and that need to plug in to the core.  These
   include recording, quality management (QM), and speech analytics
   (SA).  These applications operate by obtaining recordings, along with
   recording meta-data, from the core.  Today, these supporting
   applications use a variety of proprietary APIs to obtain recordings
   and their meta-data.  As a result, integrations vary from vendor to
   vendor, which leads to incompatibilities, security weaknesses, and
   lengthy integration timelines.

   Recently, the IETF has begun to explore the standardization of a file
   format for recordings and recording meta-data, called VCON (Virtual
   Conversation) [I-D.petrie-vcon].  This document is meant to provide
   input to the VCON effort by describing the use cases and requirements
   specifically within the contact center.

2.  Types of Supporting Applications

   In the contact center, there are several different types of
   applications which require consumption of recordings.  These
   typically go under the moniker of Workforce Optimization (WFO).  This
   section describes the main ones.

2.1.  Recording

   Call Recording applications receive call recordings from the core,
   and then provide long term storage, playback, and search
   functionality.  Recording storage is needed for archival purposes,
   and is often a requirement to meet compliance regulations in certain
   industries.

2.2.  Quality Management (QM)

   Quality Management (QM) applications are used by contact center
   managers to make sure agents are following guidelines for proper
   handling of calls.  Many consumers are familiar with the greeting
   played by voice response systems that says, "This call may be
   monitored for quality and training purposes."  That greeting refers
   specifically to QM applications.

   QM applications allow a user to play back recordings for a particular
   agent and then, based on those recordings, rate how the agent
   performed.  These ratings are made against a questionnaire that
   defines the rubric by which agents are scored.  This rubric will
   often include questions like, "Did the agent thank the customer for
   calling and ask how they can help?" or "Did the agent upsell the
   customer on the newest product?"  These scorecards are then shared
   with the agents and their managers (the supervisors), along with
   coaching and training materials for cases where the agent did not do
   well.  Originally, scoring was done entirely by humans, and as a
   consequence, only a handful of calls for each agent could be scored.
   These calls were often selected by random sampling.

   It is also common for QM applications to use speech recognition
   technology to transcribe calls into text.  This allows a call to be
   scored more quickly, and enables search functions for selection of
   specific calls that would be good candidates for scoring.

   Part of the agent role involves using corporate applications, such as
   ordering, billing, and shipping systems, to handle the customer
   inquiry.  To determine whether agents are using these tools
   correctly, contact centers commonly install desktop recording
   applications on agent computers.  These record the screen content as
   a video file.  Typically, the vendor of the QM software provides both
   the desktop screen recording application and the backend
   applications that receive and store the recording.  These recordings
   are then combined with the audio, email, or chat recordings that come
   from the core.  The following shows the flow of recordings in this
   use case:

   +--------+
   |Customer|
   +--------+
       ^ Real Time
       | Voice
       |
       V    Recording                    +--------+
     +----+ Transfer   +----+  Access    |Quality |
     |Core+----------->| QM +----------->|Manager |
     +----+            +----+            +--------+
       ^ Real Time       ^
       | Voice           |
       |                 | Desktop
       V                 | Recording
   +-------+             |
   | Agent |-------------+
   +-------+

                      Figure 1: QM Recording Exchanges

   In this flow, the customer calls into the contact center and is
   connected to the core.  Typically this is done through the Session
   Initiation Protocol (SIP) [RFC3261] and the Real-Time Transport
   Protocol (RTP) [RFC3550].  The call is delivered to the agent, also
   typically using SIP and RTP.  The core records the call, and at the
   end of the call, the recording is transferred to the QM system.
   During the call, the agent desktop is recorded, and this recording is
   also transferred to the QM system.  At a later time, the Quality
   Manager can log into the QM application and access the recording,
   including the audio, the transcript, and the desktop recording.

   In practice, there are many variations on this basic exchange.
   Sometimes, the ACD sends the audio portion of the call to the QM
   system using real-time streaming, in some cases using SIPREC
   [RFC7866].  This is then augmented with meta-data obtained through
   proprietary REST APIs.  In other cases, the audio is sent post-call,
   and the meta-data is similarly obtained using proprietary REST APIs.
   When transcription takes place, it is most often done by the QM
   system, but not always.  In some cases, a transcript is sent from the
   core to the QM system instead of, or in addition to, the audio
   recording.

   In a similar way, the transfer of the desktop recording from the
   agent's computer to the QM system can happen in real-time or post-
   call.  Post-call systems will often upload the recording in chunks,
   sometimes doing so after hours or when the agent is not on a call.

   A key consideration for this use case is the concept of recording
   stitching.

   In a typical call in the contact center, there are multiple segments,
   each of which represents a phase of the call.  There will be a
   segment that contains the customer's interaction with the voice
   response system, where no agents were present.  When the customer is
   connected to an agent, there will be a segment representing the
   portion of the call where the customer talks to the agent.  Each
   conference, transfer, or hold creates an additional segment.

   The process of assembling these segments into a complete recording
   is referred to as stitching.  Different stitches are needed depending
   on the use case.  In a QM use case, the quality manager is rating the
   agent, and thus what matters is the call as seen by that agent.  When
   a call is handled by multiple agents (a common occurrence in the
   contact center), a single call results in a separate stitched
   recording for each agent.  For example, a call transferred from one
   agent to a second results in two stitched recordings: one covering
   the customer's time with the first agent, and one covering the
   customer's time with the second agent.  This differs from the
   recording use case described in Section 2.1, where what matters is
   the entire call as seen by the customer.
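
   To make the difference between the two stitches concrete, the
   following is a minimal Python sketch.  It assumes a hypothetical
   segment record carrying start and end times, a segment kind, and the
   agents present; these names are illustrative and are not defined by
   VCON.  The customer view keeps every segment in time order, while the
   QM view groups segments by agent.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Segment:
    """Hypothetical segment record; not a field layout defined by VCON."""
    start: float                  # seconds from the start of the interaction
    end: float
    kind: str                     # e.g. "ivr", "agent", "hold", "conference"
    agents: Tuple[str, ...] = ()  # agents present during this segment


def stitch_customer_view(segments: List[Segment]) -> List[Segment]:
    """Recording use case: the entire call as seen by the customer."""
    return sorted(segments, key=lambda s: s.start)


def stitch_agent_views(segments: List[Segment]) -> Dict[str, List[Segment]]:
    """QM use case: one stitched recording per agent who handled the call.

    A conference segment with two agents appears in both agents' views;
    segments with no agent (such as the IVR phase) appear in none.
    """
    views: Dict[str, List[Segment]] = {}
    for seg in stitch_customer_view(segments):
        for agent in seg.agents:
            views.setdefault(agent, []).append(seg)
    return views
```

   For a call that moves from the voice response system to a first
   agent, through a brief conference, and on to a second agent, the
   customer view contains all four segments, while each agent's view
   contains only that agent's own segment plus the shared conference
   segment.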

2.3.  Speech Analytics

   Speech analytics applications provide graphs and dashboards on the
   content of conversations.  For voice calls, this includes metrics
   such as cross-talk, silence durations, and anger, which are computed
   directly from the voice.  Voice calls are often transcribed to text,
   and further analysis is performed on the text.  This might include
   customer sentiment, the frequency of common reasons for calling, and
   so on.  These applications also often provide discovery tools, such
   as word clouds and clustering.
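
   Cross-talk and silence can be derived from per-channel voice
   activity, since each participant typically has its own channel (see
   Section 6).  The sketch below assumes a hypothetical upstream voice
   activity detector that reports, for each of two channels, the
   intervals in which that party is speaking.  Cross-talk is the time
   both parties speak at once, and silence is the time neither does.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Overlapping parts of two sorted, merged interval lists."""
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start, end = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def total(intervals: List[Interval]) -> float:
    return sum(end - start for start, end in intervals)


def cross_talk_and_silence(customer: List[Interval], agent: List[Interval],
                           duration: float) -> Tuple[float, float]:
    """Return (cross-talk seconds, silence seconds) for a two-channel call.

    Assumes every speech interval lies within [0, duration].
    """
    c, a = merge(customer), merge(agent)
    cross_talk = total(intersect(c, a))
    silence = duration - total(merge(c + a))
    return cross_talk, silence
```

   With the customer speaking from 0 to 10 and from 20 to 30 seconds,
   and the agent from 8 to 18 and from 30 to 40 seconds of a 50 second
   call, cross_talk_and_silence returns (2, 12): the parties overlap
   from 8 to 10 seconds, and neither speaks from 18 to 20 or from 40 to
   50 seconds.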

   Speech analytics tools are often used to help companies decide which
   calls should be used for quality management, which is an improvement
   over purely random sampling.  They are also used to help companies
   improve their processes in the contact center by identifying areas
   where agents are inefficient.  For example, speech analytics can be
   used to determine that there has been a spike in customer refund
   requests, and that agents are taking too long to handle these types
   of calls.

   Architecturally, speech analytics applications look much like
   recording applications.  At the end of the call, a transcript is sent
   from the core platform to the speech analytics platform for
   processing, and the associated meta-data is then fetched.

3.  PII and PCI Redaction

   A common requirement in contact center use cases is the redaction of
   payment card information (PCI) and personally identifiable
   information (PII) from recordings and transcripts.  This happens in
   several ways.

   For payment cards, it is common for the agent to transfer the call to
   a dedicated voice response system whose job is to collect the credit
   card number and process it.  This way, the agent never hears this
   information.  Furthermore, the system can be configured to pause the
   recording so that this particular segment is not recorded.  For cases
   where the agent does collect the credit card information, it is
   common for systems to have a "pause recording" button that the agent
   can trigger manually to ensure that this content is not recorded.
   Another common solution is to instrument the website where credit
   card information is entered, so that when the agent places the mouse
   cursor in the payment form, the recording is paused.  It would be
   useful for the VCON to indicate that a particular section of the
   recording is absent for PCI reasons.
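
   One way to record the absent sections is to convert pause and resume
   events, whatever triggered them, into annotations marking the
   unrecorded intervals.  The following minimal Python sketch assumes a
   hypothetical event format of (time, action) pairs and hypothetical
   annotation keys ("start", "end", "reason"); these are not taken from
   the VCON format.

```python
from typing import Dict, List, Optional, Tuple


def absent_sections(events: List[Tuple[float, str]], call_end: float,
                    reason: str = "pci") -> List[Dict[str, object]]:
    """Turn pause/resume events into annotations for unrecorded sections.

    events: hypothetical (time_in_seconds, "pause" | "resume") pairs from
    any trigger, such as a transfer to a payment IVR, the agent's pause
    button, or focus on an instrumented payment form.  A pause that is
    never resumed runs to call_end.
    """
    sections: List[Dict[str, object]] = []
    paused_at: Optional[float] = None
    for time, action in sorted(events):
        if action == "pause" and paused_at is None:
            paused_at = time
        elif action == "resume" and paused_at is not None:
            sections.append({"start": paused_at, "end": time, "reason": reason})
            paused_at = None
        # Duplicate pauses and stray resumes are ignored.
    if paused_at is not None:
        sections.append({"start": paused_at, "end": call_end, "reason": reason})
    return sections
```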

   It is also common to request the removal of PII, such as first and
   last name, street address, email address, and phone numbers, from
   recordings and transcripts.  In such cases, it is desirable to
   clearly indicate in the transferred recording that this has happened,
   so that downstream analytics applications function properly.  Simply
   replacing a first name with "XXX" is likely to confuse a word cloud
   tool in a speech analytics application, making it treat "XXX" as a
   common word in the transcript.  At the same time, removing the PII
   entirely results in transcripts containing malformed sentences,
   which are harder for natural language understanding (NLU) tools to
   process.
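
   One compromise between "XXX" and outright removal is to replace each
   PII span with a typed placeholder and record what was redacted, so
   that sentences stay well formed and analytics tools can recognize and
   skip the placeholders.  The following Python sketch assumes spans come
   from an upstream PII detector (not shown) and uses a hypothetical
   bracketed token format such as "[NAME]"; it is an illustration, not a
   format defined by VCON.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical typed placeholder such as "[NAME]" or "[EMAIL]".
TOKEN = re.compile(r"\[[A-Z_]+\]")


def redact(text: str,
           spans: List[Tuple[int, int, str]]) -> Tuple[str, List[Dict]]:
    """Replace each (start, end, pii_type) span with a typed placeholder.

    Spans are character offsets from an upstream PII detector and must
    not overlap.  Returns the redacted text plus a record of what was
    removed, which could be carried alongside the transcript.
    """
    parts: List[str] = []
    record: List[Dict] = []
    pos = 0
    for start, end, pii_type in sorted(spans):
        parts.append(text[pos:start])
        parts.append(f"[{pii_type.upper()}]")
        record.append({"type": pii_type, "start": start, "end": end})
        pos = end
    parts.append(text[pos:])
    return "".join(parts), record


def word_counts(text: str) -> Counter:
    """Word-cloud style counts that skip redaction placeholders."""
    words = (w.strip(".,!?;:") for w in text.split())
    return Counter(w.lower() for w in words if w and not TOKEN.fullmatch(w))
```

   Because the placeholder is recognizable, the word cloud simply skips
   it, while an NLU tool still sees a grammatical sentence such as "Hi,
   this is [NAME] calling about my order."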

4.  Omni Channel

   In contact centers, the term "omni channel" refers to the use of
   non-voice communications with a customer.  Sometimes, this means an
   email exchange or a web chat from a widget on a web page.  In other
   cases, it can involve a combination of voice with these other
   technologies.  For example, a customer might call into the contact
   center, and the agent then uses SMS to send the customer links to
   information or to collect information from the customer.  In that
   case, the overall interaction is composed of a voice segment and an
   SMS segment combined together.

   In some cases, video is used in contact center applications.  Mostly,
   this is in support of the "see what I see" case, where the customer
   uses the camera on their mobile phone to show something to the
   contact center agent.  For example, a customer might show the agent a
   part that is broken and needs to be replaced, to help the agent
   identify which part to send.  In other cases, traditional
   person-to-person video is used for high-touch support or sales.

   Co-browsing is also used in contact center applications.  This is
   sometimes used in support situations where a customer is having
   trouble navigating a website.  The agent can take control of the
   browsing experience and get the customer where they need to be.
   This differs from the screen sharing use cases common in meetings.

   As it relates to recording, all of these additional channels need to
   be included in the VCON.

5.  Deployment Topologies

   As one might imagine, there are a variety of deployment topologies
   for these applications, mixing and matching on-premise vs cloud
   delivery.  The core platform can be delivered on-premise, or via
   cloud.  The supporting applications can also be delivered on-premise
   or via cloud.  In the cloud delivery model, they can be co-resident
   with the core application (meaning, the vendor of the core service
   also deploys and operates the supporting application), or be
   delivered via different cloud services.

6.  Required Meta-Data

   This section enumerates the meta-data which needs to be transferred
   from the core application to recording, QM and speech analytics
   applications.  This represents the information that is transferred
   today between the core and supporting applications.

   *  Interaction Type: This could be audio, screen recording, email,
      SMS, web chat, or potentially a combination of multiple types in
      omni-channel use cases.

   *  Interaction ID: A unique identifier for the interaction.  This can
      get complicated as calls are conferenced (merging) and transferred
      (splitting).

   *  File Type: The encoding and/or container format of the media.

   *  Media meta-data: Bitrate, channels (typically each participant is
      its own channel even in conferencing use cases), the participant
      that is the source of the channel, resolution for video and screen
      share.

   *  Start time

   *  Duration

   *  Direction of call - inbound or outbound

   *  Details for each participating party, which include

      -  Participant UUID: A unique identifier for the participant.  In
         a contact center, this is particularly important for the agent,
         and must be static across interactions to allow correlation
         with the actual agent configuration provisioned into the
         systems.

      -  First Name and Last Name: In cases where the agent information
         is not provisioned ahead of time, the recording itself can be
         used to push agent configuration from the core into the
         supporting application.  Basic identifying information is
         needed so that (for example) the manager scoring agents can
         know which agent the scoring is for.

      -  Participant Type: Is this the agent or the customer?  Is this
         the party that initiated the communications or received it?  Is
         this the party that initiated the transfer or conference, or
         the party that received it?

      -  Participant Info: In the case of a customer, additional
         information about the customer - such as phone number, email,
         address, and so on, is often desired in archival use cases.

   *  PII and PCI Information: Indications of whether data has already
      been redacted and, if so, what type of information was redacted.

   *  Skill: This is a core concept in contact centers, common across
      many vendors.  The skill refers to the general purpose of the
      call, which is then matched to the expertise of agents that can
      handle it.  In a simple case, a contact center might have a skill
      for sales, one for billing support, and one for technical support.
      Each of those three would have different sets of agents.  In an
      inbound call, the voice response system is used to determine the
      customer intent and thus derive the skill.  Skill information is
      needed in speech analytics applications, and is also useful for
      sorting/filtering in recording and QM use cases.  The skill name
      and skill ID are both desirable.  Note that, in transfer
      situations, a call can be transferred to a different skill.  Thus,
      the skill is a property of the segment and not the overall
      interaction.

   *  Campaign: This is another core concept in contact centers, common
      across vendors.  The campaign is a container for configuration,
      such as routing rules, voice response scripts, and so on.  A
      campaign is typically bound to a phone number or email address.
      When an interaction is received, it is immediately mapped to a
      campaign to determine how the interaction is processed.  Campaign
      name and campaign ID are required.  Like skills, it is possible
      for an interaction to be transferred between campaigns, and thus
      this is a property of the segment.

   *  Transfer Bit Flag: Whether or not the call was transferred.  This
      is useful for filtering purposes.

   *  Conference Bit Flag: Whether or not the call was conferenced.

   *  Number of conferences: The number of times this call had a
      conference during the call.  The higher this number, the more
      worrisome it is.  This is useful for determining which calls to
      listen to, for quality purposes.

   *  Number of transfers: Similar to number of conferences, the more
      frequently a call has been transferred, the more problematic it
      is.

   *  Number of holds: The number of times the dreaded call hold
      occurred.  The more it happens, the more concerning the call is.

   *  Hangup Party: Which party initiated the call hangup.

   *  Disposition: This is another core concept, common to most contact
      center software vendors.  At the end of an interaction, the agent
      selects a disposition code that indicates the outcome of the
      interaction.  An example disposition might be "Support, Issue
      Resolved" or "Sales Inquiry, Followup needed".  These dispositions
      are used for reporting purposes, and also drive automations.  For
      example, if a contact center agent selects the followup
      disposition code, this might trigger an email to be sent to the
      sales department asking them to contact the customer.  The
      disposition includes both a name and a unique ID.

   *  Dialing List: Outbound calls are typically made against a list
      that is imported into the system.  The name and ID of the list are
      useful for quality management use cases.
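
   The meta-data listed above can be pictured as a simple data
   structure.  The following Python sketch is a minimal illustration
   under stated assumptions: all field names are hypothetical and do not
   reflect the JSON layout of [I-D.petrie-vcon]; skill and campaign are
   modeled per segment, as described above; and each segment records the
   event that opened it, so that the transfer, conference, and hold
   counts and flags can be derived rather than stored separately.  File
   type and media meta-data are omitted for brevity.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class Participant:
    participant_uuid: str          # static across interactions for agents
    participant_type: str          # e.g. "agent" or "customer"
    first_name: Optional[str] = None
    last_name: Optional[str] = None
    info: Dict[str, str] = field(default_factory=dict)  # e.g. phone, email


@dataclass
class Segment:
    # Hypothetical values: "start", "transfer", "conference", "hold",
    # or "resume", naming the event that opened this segment.
    opened_by: str
    start_time: str                # ISO 8601 timestamp
    duration_s: float
    skill_id: Optional[str] = None  # skill and campaign are per segment
    skill_name: Optional[str] = None
    campaign_id: Optional[str] = None
    campaign_name: Optional[str] = None


@dataclass
class InteractionMetadata:
    interaction_id: str
    interaction_type: List[str]    # e.g. ["audio", "sms"] for omni channel
    direction: str                 # "inbound" or "outbound"
    start_time: str
    duration_s: float
    participants: List[Participant]
    segments: List[Segment]
    redactions: List[str] = field(default_factory=list)  # e.g. ["pci"]
    hangup_party: Optional[str] = None        # participant_uuid of the party
    disposition_id: Optional[str] = None
    disposition_name: Optional[str] = None
    dialing_list_id: Optional[str] = None     # outbound calls only
    dialing_list_name: Optional[str] = None

    def count(self, event: str) -> int:
        """Number of segments opened by the given event."""
        return sum(1 for seg in self.segments if seg.opened_by == event)

    def summary_flags(self) -> Dict[str, object]:
        """Transfer/conference bit flags and transfer/conference/hold counts."""
        transfers = self.count("transfer")
        conferences = self.count("conference")
        return {
            "transferred": transfers > 0,
            "conferenced": conferences > 0,
            "num_transfers": transfers,
            "num_conferences": conferences,
            "num_holds": self.count("hold"),
        }

    def to_json(self) -> str:
        """Serialize all fields plus the derived flags and counts."""
        data = asdict(self)
        data.update(self.summary_flags())
        return json.dumps(data)
```

   Deriving the counts from the segment list keeps them consistent with
   the segments themselves; a sender could equally transmit them as
   explicit fields.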

7.  Informative References

   [I-D.petrie-vcon]
              Petrie, D. and T. McCarthy-Howe, "The JSON format for vCon
              - Conversation Data Container", Work in Progress,
              Internet-Draft, draft-petrie-vcon-01, 13 March 2023,
              <https://datatracker.ietf.org/doc/html/draft-petrie-vcon-
              01>.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              DOI 10.17487/RFC3261, June 2002,
              <https://www.rfc-editor.org/info/rfc3261>.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
              July 2003, <https://www.rfc-editor.org/info/rfc3550>.

   [RFC7866]  Portman, L., Lum, H., Ed., Eckel, C., Johnston, A., and A.
              Hutton, "Session Recording Protocol", RFC 7866,
              DOI 10.17487/RFC7866, May 2016,
              <https://www.rfc-editor.org/info/rfc7866>.

Authors' Addresses

   Jonathan Rosenberg
   Five9
   Email: jdrosen@jdrosen.net


   Andrew Siciliano
   Five9
   Email: Andrew.Siciliano@five9.com
