SIPCORE D. Worley
Internet-Draft Ariadne
Intended status: Standards Track February 17, 2017
Expires: August 21, 2017

TBD: Happy Earballs: Success with Dual-Stack SIP
draft-worley-sip-happy-earballs-00

Abstract

TBD: The Session Initiation Protocol (SIP) supports multiple transports running both over IPv4 and IPv6 protocols. In more and more cases, a SIP user agent (UA) is connected to network interfaces with multiple address families. In these cases sending a message from a dual stack client to a dual stack server may suffer from the issues described in [RFC6555] ("Happy Eyeballs"): the UA attempts to send the message using IPv6, but IPv6 connectivity is not working to the server. This can cause significant delays in the process of sending the message to the server. This negatively affects the user's experience.

TBD: This document builds on [RFC6555] by modifying the procedures specified in [RFC3263] and related specifications to require that a client ensure that communication targets are accessible before sending messages to them, to allow a client to contact targets out of the order required by other specifications, and to require a client to properly distribute the message load among targets over time.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on August 21, 2017.

Copyright Notice

Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.


Table of Contents

1. Introduction

                            Earballs -- n., another word for ears.
                             Made famous by the animated American TV
                             spy comedy, "Archer".

                             "Ow, my earballs!" -- Cheryl Tunt, "Archer"

                             -- from "Urban Dictionary"

The Session Initiation Protocol (SIP) [RFC3261] and the documents that extend it provide support for both IPv4 and IPv6. However, this support has problems in environments that are characteristic of the transitional phase of migration from IPv4 to IPv6 networks. During this phase, many server and client implementations run on dual-stack hosts. In such environments, a dual-stack host will likely suffer greater connection delay, and by extension provide an inferior user experience, compared to an IPv4-only host. The difficulty stems from the reality that a device cannot predict whether apparent IPv6 connectivity to another device is usable; both devices may have IPv6 addresses and yet some transit network between the two may not transport IPv6. SIP requires a device that transmits a request to one destination address (e.g., the apparently usable IPv6 address) to wait for a response for a substantial period (usually 32 seconds) before transmitting the request to another destination address (the less-preferred IPv4 address). The result is that apparent IPv6 connectivity that is not functional can cause substantial delays in processing SIP requests. Especially when the requests are call setups (INVITE requests), this creates a very poor user experience.

TBD: The need to remedy this diminished performance of dual-stack hosts led to the development of the "Happy Eyeballs" [RFC6555] algorithm, which has since been implemented in many protocols and applications.

TBD: The concepts in this document are elaborated from those developed in [RFC6555], and so some background information in RFC 6555 is not repeated here. The reader is encouraged to read the available documentation regarding implementations of RFC 6555, as well as study Open Source implementations, in order to learn from the experience accumulated since the publication of RFC 6555 in 2012.

TBD: A SIP client uses DNS to find a server based on a SIP URI. This process is described in [RFC3263] and updated in [RFC7984]. Using this process, a list of "targets" is constructed, where each target consists of an IP address, a port number, and a protocol (e.g., TCP, UDP, TLS) by which to contact that address/port. The process proceeds by constructing a sequence of host names, possibly by looking up NAPTR and/or SRV DNS records, and then for each host name looking up DNS address records (for all address families supported by the client) to generate the list of IP addresses for targets that are derived from that host name. The addresses for each host name are ordered using the client's destination selection rules [RFC6724]. The sorted targets for all the host names are then concatenated into the sequence of targets to which the client will attempt to send the SIP message.
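
For illustration, the following Python sketch expands a SIP domain into a list of UDP targets via SRV and address lookups, assuming the dnspython library. It is a deliberately simplified rendering of the process just described: NAPTR processing, the [RFC6724] ordering of addresses, weight-based load balancing, and source-address selection are omitted, and the function name is illustrative.

    import dns.resolver

    def udp_targets(domain):
        """Very simplified target expansion for a UDP-only client."""
        targets = []
        srv = dns.resolver.resolve('_sip._udp.' + domain, 'SRV')
        # Order SRV records by priority (weight-based load balancing
        # is omitted here; see Appendix A).
        for rr in sorted(srv, key=lambda r: r.priority):
            host = str(rr.target).rstrip('.')
            for rdtype in ('AAAA', 'A'):  # all supported address families
                try:
                    for answer in dns.resolver.resolve(host, rdtype):
                        targets.append(('UDP', answer.address, rr.port))
                except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
                    pass
        return targets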

TBD: Under the baseline specifications, the client contacts the targets in order until one is contacted successfully. In order to contact a target, the client establishes a transport connection (if necessary), sends the message using the transport (possibly resending the message several times), and then (for requests) waits for a response (either provisional or final). The process ends successfully if the client receives a response. The process ends unsuccessfully if the client receives a permanent error from the transport layer or if a SIP timer (Timer B or Timer F in [RFC3261]) expires. Timeouts generally default to 32 seconds.

TBD: If the user has to wait for even one timeout, this will seriously degrade the user experience. Thus, it is desirable to minimize the number of times the client experiences a timeout when sending requests.

TBD: If the target list contains both IPv6 addresses and IPv4 addresses, this procedure can degrade the user's experience in common situations. Typically, this problem arises when the client has an IPv6 interface and the server's preferred address is an IPv6 address, but the transit networks between the client and server do not carry IPv6. This can cause the client to attempt to send a SIP request for 32 seconds before it times out that target and continues with an IPv4 target. This parallels a problem that was widely seen in web browsers and that was cured by specifying that web browsers should use a "Happy Eyeballs" algorithm [RFC6555] to determine the order in which to contact target addresses.

TBD: This document specifies an amendment to these procedures, by which the subsequences of targets derived from individual host names may be contacted in a different order than is specified by the destination selection rules. As in [RFC6555], the algorithm that the client uses is not specified by this document, but this document places requirements on the algorithm that improve the user's experience without unduly burdening the Internet infrastructure. By analogy with the name "Happy Eyeballs" for similar algorithms in web browsers, we label these algorithms "Happy Earballs" [UD].

2. Terminology

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

baseline:
the prior specifications of the behavior of a client for sending a message to a goal. The baseline specifications are modified by this document.
cache:
(verb) to temporarily store information regarding the response time of a target so as to accelerate future message transmission; (noun) the collection of information so stored
client:
the device that must send a message
flow:
a group of transmissions to a target which are considered related. For connection-oriented protocols, a flow is the data carried by a connection. For connectionless protocols, a flow is all messages sent to a particular target (5-tuple). For protocols with security associations, a flow is all messages sent within a particular security association.
goal:
the identification of a particular server. May be a URI, a TSAP, or information provided by the context of the message.
initial:
a target which has no target prioritized before it (considered relative to all targets in the target set, or some subset or rearrangement of the target set, depending on the context)
Limit(t):
a function converting one time value into another. If RTT(T1) > Limit(RTT(T0)), then target T1 responds "too slowly" relative to the response time of target T0, and T1 is considered non-responsive. Depends on two parameters, "m" and "f". Limit(infinity) is considered to be infinity. TBD: What is a better name for this?
normal:
a target which is not slow (relative to a particular goal)
NSAP:
"Network Service Access Point", the identification of a network interface, which comprises an address family and an address
probe:
a transport operation that attempts to determine if a target is responsive, without transmitting a message. Since a probe does not send a message, if the transmission fails, it does not commit the client to waiting a timeout period before sending the message to another target.
quick:
a target whose cached RTT() is less than Limit(0), and thus is never slow for any goal
RTT(T):
for a target T, the round-trip response time of T. There is a special value of "infinity" if T does not respond at all. Collectively, these values are called "RTT() values".
responsive:
a property of a target relative to the set of targets for a goal: the response time of the target is sufficiently short when compared with the response time of other targets. See TBD Section 5 for the complete definition.
send:
attempted transmission of (a possibly modified copy of) a message to a target. Contrasted with "successful send", which is when the message is received by the server or when the client detects that the message is received by the server. Does not include "probe" transport operations.
server:
the (conceptual) device to which a message is to be sent. May consist of multiple physical devices.
T1:
the value of that name used in the procedures of [RFC3261], which is commonly the round-trip time estimate of the relevant network, and defaults to 500ms
slow:
a target T0 which appears to be slow due to cached RTT() information, i.e., there is another target T1 of the goal for which RTT(T0) > Limit(RTT(T1)) for the cached RTT() values. This includes the case where RTT(T0) is infinity, i.e., T0 does not respond at all. Opposite of "normal".
target:
the complete specification of a transport to be used to send a message from the client to the server. A target is commonly conceptualized as "protocol/address/port" (which is a TSAP), but the target also includes the TSAP that will be used as the source of the communication, and so "5-tuple" is more accurate. In many cases, the source TSAP is determined by the destination TSAP, so it is not mentioned. Perforce, the transport protocol and address family of the source and destination TSAPs are the same.
timeout:
(noun) the period of time that a client must wait after sending a message before it is permitted to send the message to another target without having received positive indication of the failure of the first transmission. For sending requests, either Timer B or Timer F. (verb) the event when the timeout period has expired. [RFC3261].
TSAP:
"Transport Service Access Point", the identification of an endpoint of a transport flow, which usually comprises a transport protocol, an NSAP (or network address), and a port number
traffic:
the messages for a particular goal that are successfully sent to a particular target or set of targets; the number of such messages sent over a period of time; or the fraction of such messages relative to all messages for the goal

Note: While this document uses the term "dual-stack" based on RFC 6555 and earlier terminology, its scope includes contexts with more than two interfaces and with more than two address families.

3. Structure of This Document

This document modifies the procedures with which a client sends a message to a server. It assumes that the context of the message provides a "goal", which is the specification of the device or collection of devices which are the server, and that there are existing "baseline" specifications which translate the goal into a set of "targets", and in what order(s) the client may send (possibly modified copies of) the message to the targets, until one of the send operations is successful.

This document relaxes the requirements on the client regarding the order(s) in which the message is sent to the targets, that is, it permits additional orders, so that the client is less likely to have to wait for a timeout. On the whole, when network connectivity is imperfect, this allows clients to transmit the messages to servers more quickly than they would using the unmodified baseline specifications.

However, this document also places additional restrictions on the client's sending behavior to ensure that the overall traffic distribution among the targets converges over time to the distribution that would have resulted from obeying the baseline specifications.

Following that, this document discusses some consequences of the new requirements, including what new orders of targets are permitted, what behaviors minimize the time needed to successfully send a message, techniques for probing a target (that is, determining if it is responsive without sending the message, and thus possibly committing to waiting for a timeout period), and suitable approaches for caching information about targets.

This document also requires certain behaviors that ensure that the use of IPv6 is not disadvantaged in mixed IPv4/IPv6 networks. TBD: don't forget to write these requirements

This document also contains a number of miscellaneous requirements to optimize the behavior of clients.

3.1. Scope of Applicability

This document modifies any SIP target selection processes that are defined now or may be defined in the future, excepting those that explicitly exempt themselves.

This document does not affect communications specified to be carried only by a single WebSocket transport, as in those contexts there is only one transport target (the WebSocket connection), and hence there is no target selection process.

A client MUST NOT consider the set of the target URIs of a "forking" operation to be a single goal to which the processes of this document apply. Instead, the modifications MUST be applied to each of those URIs as separate goals. This is because the decision of whether to send a request to a later forking target may be affected by the SIP response to an earlier transmission. ([RFC3261] section 16) However, a forking proxy may, as part of its policy, apply some or all of these procedures to the entirety of a forking operation.

4. Baseline Procedures

The situation that this document addresses is when a SIP device is required to send a message (which may be either a request or a response). This document uses the term "client" for the device which must send the message. The client is given a "goal", which is the specification of the "server", which is the (possibly composite) device to which the message is to be sent. (Both of these usages are broader than the usage in [RFC3261].) The purpose of the client is to successfully send the message to the server.

(Note that in the case of a request, when the message is sent to a target, a Via header field will be added to the message, and that the added Via header field will be different for each target. This document considers all of these versions of the message to be copies of the original message to be sent.)

If the message is a request, the goal is usually the hostport part of the URI in either the first Route header field or the request-line. If the message is a response, the goal is specified by the first via-param in the first Via header field. If the message is to be sent to an outbound proxy as specified by a DHCP option ([RFC3319] or [RFC3361]), then the goal is the ordered list of addresses or domain names provided by the DHCP option. In other situations, the goal may be specified by other means.

Baseline specifications (e.g., [RFC3263], [RFC3319], [RFC3361], [RFC6724], [RFC7984]) prescribe the construction of a set of "targets" which are potential transport destinations to which the message can be sent. Which specifications apply is determined by the context of the message. Targets are commonly conceptualized as protocol/address/port combinations, but in general they are the pairs of source and destination TSAPs that provide the full specification of a transport flow.

For example, the sending of an initial REGISTER message can involve six steps of expanding the goal into a list of targets:

The process of deriving a set of targets from a goal can be conceptualized as constructing a tree, with the root node being the goal and the leaf nodes being the targets (whether or not an implementation constructs such a representation). Each non-leaf node is expanded into zero or more child nodes by the application of the appropriate baseline specification.

For a particular node, the relevant baseline specification may prescribe relationships between the traffic volume sent to the subsets of targets that are descended from its children. E.g., a standard may prescribe prioritization, such that if any target descended from a higher-priority child is responsive, no traffic should be sent to any target descended from a lower-priority child. (SRV records and DHCP options can specify prioritization.) Similarly, a standard may prescribe load balancing, such that if there are responsive targets descended from two children, the ratio of traffic to the two subsets of targets descended from the two children must be a particular (non-zero positive) number. (SRV records can specify load balancing.) Alternatively, a node may place no restrictions on the traffic to the subsets of targets descended from its children.

As always, the construction of the tree and the traffic restrictions incorporated into it may be modified by the local policy. In this document, we assume that all modifications are made to the tree that summarizes the requirements of the baseline specifications. This makes it easier to determine the interaction of local policy with the procedure modifications of this document. And this assumption does not limit the generality of what local policy may do, since the local policy can remove any ordering restrictions from the tree, thus permitting almost any behavior by the modified procedures.

The targets as generated by the specified processes MAY be subsetted by deleting any targets that the client cannot access, for reasons such as: the client does not implement the protocol, it does not have a network interface that supports the protocol, or it does not have a network interface that can communicate with the address. Removing these targets at an early stage of processing does not affect the on-the-wire behavior of either the baseline processes or the modified processes, since sends to such targets fail immediately.
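
As a sketch of this subsetting step, the following Python fragment removes targets whose protocol or address family the client cannot use; the tuple layout of a target and the capability sets are illustrative assumptions, not part of any baseline specification.

    import ipaddress

    def address_family(addr):
        # Returns 'IPv6' or 'IPv4' for a literal IP address.
        return 'IPv6' if ipaddress.ip_address(addr).version == 6 else 'IPv4'

    def prune_targets(targets, supported_protocols, usable_families):
        # A target is assumed to be a (protocol, address, port) tuple.
        return [t for t in targets
                if t[0] in supported_protocols
                and address_family(t[1]) in usable_families]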

What constitutes failure of a send depends on the situation, and may be a transport protocol failure, the absence of a timely 100 Trying response, or a 503 response ([RFC3261] section 21.5.4 and [RFC3263] section 4.3). For any particular message, either the overall sending process fails or the message is successfully sent to exactly one target.

In the worst-case situation, the process may require waiting for one or more transaction timeouts (e.g., Timer B or Timer F in [RFC3261]) before successfully transmitting the message to a target. As the timeouts are typically 32 seconds, such a wait severely impacts the user experience.

4.1. Target Ordering

The baseline specifications assume that the client will effectively generate an order in which to contact the targets, then the client will sequence through the list, sending the message to each target until one of the sends is deemed to be successful. (Each send may include retransmissions of the message.) This is because at any stage, the client's next action is determined only by the goal and whether sending to previous targets has failed -- the first target in the order is the target that the client will choose first (which depends only on the goal), the second target in the order is the target that the client will choose if and when the send to the first target fails (which depends only on the goal and the identity of the first target), etc.

(In mathematical terms, the target order is a total ordering of the targets that is compatible with the partial ordering of the targets specified by the traffic restrictions.)

4.1.1. Prioritization Node

The order will be compatible with the traffic restrictions imposed by the specifications on the targets. For example, if the children of a node are prioritized, all of the targets descended from a higher-priority child must precede all of the targets descended from a lower-priority child. Suppose the tree of targets has one interior node that specifies prioritization of three targets. This can result from these DNS records:

    _sip._udp.example.com.    SRV    1 1 5060 sip1.example.com
    _sip._udp.example.com.    SRV    2 1 5060 sip2.example.com
    _sip._udp.example.com.    SRV    3 1 5060 sip3.example.com
    sip1.example.com.         A      192.0.2.1
    sip2.example.com.         A      192.0.2.2
    sip3.example.com.         A      192.0.2.3

We show the tree with the targets from left to right from highest priority to lowest priority:

         |
    -priority--
    |    |    |
    A    B    C

We can then represent the traffic restrictions in a graph which shows a traffic restriction that requires sending to target A before sending to target B by a line joining A on the left to B on the right. We add fictitious "start" and "finish" nodes, represented by "*":

    *----A----B----C----*

There is only one allowed target order:

    A B C

4.1.2. Unordered Node

If the children of a node have no traffic restrictions, there is no collective relationship between the targets descended from its children, and the targets descended from different children may appear in any order, and can even be interleaved. We show the tree of a simple example:

    sip.example.com.         A      192.0.2.1
    sip.example.com.         A      192.0.2.2
    sip.example.com.         A      192.0.2.3
         |
    -unordered-
    |    |    |
    A    B    C

We then represent the lack of traffic restrictions by a graph which has no lines between targets:

      A
     / \
    *-B-*
     \ /
      C

There are six allowed target orders:

    A B C

    A C B

    B A C

    B C A

    C A B

    C B A

A prioritized pair of hosts each with an unordered pair of targets results in this tree:

              |
        ---priority---
        |            |
    unordered    unordered
    |       |    |       |
    A       B    C       D

with this graph, which we simplify by adding a fictitious node which must follow both A and B and must precede both C and D:

      A   C
     / \ / \
    *   *   *
     \ / \ /
      B   D

There are four allowed target orders:

    A B C D

    A B D C

    B A C D

    B A D C

An unordered pair of hosts each with a prioritized pair of targets results in this tree:

             |
       --unordered--
       |           |
    priority    priority
    |      |    |      |
    A      B    C      D

with this graph:

      A----B
     /      \
    *        *
     \      /
      C----D

There are six allowed target orders:

    A B C D

    A C B D

    C A B D

    A C D B

    C A D B

    C D A B

When the client is allowed to select any of the available source addresses for a send, each source address (combined with the destination address) generates a separate target. Of the targets, the one selected by the default source address selection rules is preferred, and the remainder are unordered. This results in a tree with this form:

         |
    --priority--
    |          |
    |      -unordered-
    |      |    |    |
    A      B    C    D

with this graph:

                B
               / \
    *----A----*-C-*
               \ /
                D

There are six allowed target orders:

    A B C D

    A B D C

    A C B D

    A C D B

    A D B C

    A D C B

4.1.3. Load-Balancing Node

If the children of a node are load-balanced, the subsets of targets descended from the children must be ordered in a suitable way for each instance of sending a message, so that some messages are sent to each target. The practical difficulty is to ensure that the right proportion of traffic is sent to the descendants of each child node without having to maintain long-term records of the amount of traffic that has been sent to each child's descendants.

A simple way to do this is to generate a new randomized ordering of the children for each new message to be processed. A randomization algorithm that achieves the correct traffic distribution is described in Appendix A. For each instance, once the children of the node are ordered, they are handled as described above for the children of a prioritized node.

5. Procedure Modifications

The following modifications are specified for all baseline specifications:

5.1. Permitted to Reorder Targets

A client MAY send the message to the targets in an order that is not permitted by the baseline specifications.

5.2. Must Preserve Traffic Distribution

To state the next requirement, we must define what it means to say that a target is "responsive". Intuitively, a target is responsive if its response time to a message is not "too much" longer than the response time of any other target for the goal. The responsiveness of a target is always defined relative to the set of targets for a particular goal; hence, a target may be responsive for one goal at the same time that it is not responsive for another goal.

We define RTT(T) to mean the round-trip response time of a target, the time it takes to receive confirmation of the receipt of a message sent to the target. If the target does not respond at all, we consider RTT(T) to be "infinity", which is larger than any number. Note that RTT(T) is a fact about reality at some instant, not a measure of the client's current knowledge about reality at some instant.

We define a function "Limit" that converts one time value into another: if RTT(T1) > Limit(RTT(T0)), then target T1 responds "too slowly" relative to the response time of target T0, and T1 is considered non-responsive. We define Limit(t) = m*t + f, where "m" and "f" are parameters defined below.

The parameter "m" limits the range of response times that we will allow among targets we consider responsive. We set m to be 2. (TBD: Is this a good choice?)

The parameter "f" is the length of time that we consider to be insignificant when comparing the response time of targets. We set f to be 2*T1, where T1 is the value of that name used in the procedures of [RFC3261], which is commonly the round-trip time estimate of the relevant network, and defaults to 500ms. (TBD: Is this a good choice?)

The result is that Limit(t) is "twice t, plus a little more to account for the inherent delays in the network".
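
The following Python sketch shows this comparison, assuming the parameter values proposed above (m = 2, f = 2*T1, T1 = 500 ms); the names are illustrative.

    T1 = 0.5       # seconds; default SIP T1 value [RFC3261]
    M = 2          # multiplicative slack between responsive targets
    F = 2 * T1     # additive slack for inherent network delay

    def limit(t):
        # Limit(t) = m*t + f; float('inf') propagates, so
        # Limit(infinity) is infinity as required.
        return M * t + F

    def too_slow(rtt_t1, rtt_t0):
        # True if target T1 responds too slowly relative to target T0.
        return rtt_t1 > limit(rtt_t0)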

A target T0 is defined to be "responsive" if

TBD: The following wording must be set so that the client can move non-responsive targets to any place in the order (or at least, any later place in the order) without violating this condition. It is not clear this has been accomplished yet. Later: I think we've accomplished this now.

We are now ready to specify the major new constraint on the client's behavior: The client's procedures MUST, over time, distribute the traffic for any particular goal among the responsive targets in the same proportions as are required by the baseline specifications. Specifically,

5.3. Address Family Preference

Unless overridden by user configuration or by network configuration: If the host has a policy of preferring one address family, the client MUST prefer it. If the host's policy is unknown or not obtainable, the client MUST prefer IPv6 over IPv4. This usually means that the client must give preference to IPv6 over IPv4.

This preference MUST have the following effect: Consider the "initial" targets, which are the targets which the baseline specifications do not prioritize after any other targets. The client must additionally prioritize the initial targets which are of the preferred address family before the other initial targets.

TBD: Is this sufficient? We don't require address family preference to affect non-initial targets. Alternatively, if the server has a lot of IPv6 addresses, none of which are responsive, the only way to quickly send to an IPv4 address is to send probes to all of the initial IPv6 addresses and one (formerly initial) IPv4 address. This is recommended in the probing heuristics, but might require a lot of probes.

5.4. Address Selection

Clients SHOULD provide a mechanism by which the address selection configuration [RFC6724] can be customized for the client independently of any other application.

Clients SHOULD implement the destination address selection mechanism specified in [RFC6724]. Note that this mechanism provides a priority order among the set of A/AAAA records for a single server host name, whereas [RFC3263] assumes that such sets of A/AAAA records are unordered.

Clients SHOULD implement rule 5.5 of section 5 of [RFC6724], preferring to use a source address with a prefix assigned by the selected next-hop. This requires that the IPv6 stack remembers which next-hops advertised which prefixes.

Clients SHOULD by default use the source address selection mechanism specified in [RFC6724], which chooses one source TSAP for any particular destination TSAP.

Clients SHOULD also be configurable to use an alternative mechanism, in which for any destination TSAP, targets are generated for each source TSAP that could possibly communicate with the destination TSAP, with the source TSAP selected by [RFC6724] prioritized over the other source TSAPs and the other source TSAPs being unordered among themselves.

The alternative policy is useful in situations where the source address selection table prioritizes an interface which does not forward SIP traffic to the destination address. (For example, when the source address selection table routes almost all destinations to an organizational VPN which has restricted connectivity.)

5.5. Vias

A client MUST provide Vias in requests that properly route from the server to the client, regardless of the presence of NATs in the transportation path. This is necessary even when the request is sent via a connection-oriented transport, because the connection may be terminated before the response is sent back to the client and the server may need to reestablish a connection. In general, the client SHOULD provide the "rport" parameter on the via-param. [RFC3581]

Additionally, to assist tracing and diagnosis, a client SHOULD provide the source TSAP that it used in the via-param. TBD: Is this too strict? Is it useful?

5.6. DNS Caching

The information a client uses to determine a target set must be up-to-date. In particular, DNS information MUST NOT be retained longer than the TTL as it was last retrieved from DNS, and information computed from DNS information MUST NOT be retained longer than the TTL of any DNS information used to compute it. TBD: Should we allow a client to cache target set computations somewhat longer than the TTL to minimize disruption and DNS traffic? Phone calls typically take 3 minutes, so we could allow 5 minutes grace and thus ensure that target sets rarely have to be recomputed during a call.
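
As an illustration of the strict TTL bounding required above, here is a minimal Python sketch of caching a computed target set; the caller is assumed to supply the smallest TTL among the DNS records used in the computation, and the cache layout, names, and monotonic time source are illustrative assumptions.

    import time

    _target_cache = {}      # goal -> (expiry, target_set)

    def cache_target_set(goal, target_set, min_ttl):
        _target_cache[goal] = (time.monotonic() + min_ttl, target_set)

    def cached_target_set(goal):
        entry = _target_cache.get(goal)
        if entry and time.monotonic() < entry[0]:
            return entry[1]
        _target_cache.pop(goal, None)   # expired or absent
        return None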

5.7. Unused Flows

Flows that are created as probes but not subsequently used (either to send the message or to maintain a SIP Outbound flow) SHOULD be terminated, even though they could -- in some cases -- be put to reasonable use. This includes flows that are connection-oriented protocols as well as non-connection-oriented flows with security associations. Minimizing the number of unused connections reduces the load on the server and on stateful middleboxes. Also, if the abandoned connection is IPv4, this reduces IPv4 address sharing contention.

5.8. Debugging and Troubleshooting

Happy Earballs is aimed at ensuring a reliable user experience regardless of connectivity problems affecting any single transport. However, this naturally means that applications employing these techniques are by default less useful for diagnosing issues with a particular address family. To assist in that regard, an implementation MAY provide a mechanism to disable its Happy Earballs behavior via a user setting, and to provide data useful for debugging (e.g., a log or a way to review current preferences).

6. Consequences of the New Requirements

In this section we explore some of the consequences of these requirements and describe possible approaches for designing clients that satisfy the modified requirements and provide shorter transmission latency.

A client may send the message to the targets in an order that is not permitted by the baseline specifications, but it may not omit any targets from its ordering. Thus, the client is required to send the message to all the targets before it may declare failure of the send process. TBD: Would it be better to 408 the message faster? For example, if the client has cached information which indicates that a target is unreachable, the client may move that target to the end of the order, but if sending to all other targets is unsuccessful, the client must send to that target before declaring failure.

A client may cache measured RTT() values for targets and use this information to optimize target orderings. Because a single target may appear in the target set for multiple goals, the client should cache RTT(T) for targets (rather than judgments of responsiveness), and then when sending to a goal use that value to determine whether the target is responsive relative to that particular goal.

A client may determine a target T1 to be "slow" (relative to a given goal) if its cached RTT(T1) is greater than Limit(RTT(T2)) for the cached RTT(T2) for some other target T2 in the goal's target set. A target that is not slow is "normal". Note that a target being slow is determined by the client via a combination of the information in the cache and the state of the network at the moments that the cached information was recorded. As it were, the client thinks a slow target is non-responsive, but the target may or may not actually be non-responsive at that moment, depending on whether the cached RTT() values agree with current reality.
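
The following Python sketch classifies a cached target as slow relative to a goal's target set, reusing the limit() function sketched in Section 5.2; the cache layout and names are illustrative assumptions.

    def is_slow(target, goal_targets, rtt_cache):
        # rtt_cache maps targets to cached RTT() values (possibly
        # float('inf') for targets that did not respond at all).
        if target not in rtt_cache:
            return False        # no cached information, not provably slow
        rtt = rtt_cache[target]
        return any(rtt > limit(rtt_cache[other])
                   for other in goal_targets
                   if other != target and other in rtt_cache)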

If there is an upper bound on the length of time that a client retains cached RTT() values, then the client may assume that any slow target is non-responsive, in that it may place the target after the normal targets in the order. For this reason, we assume in this document that the client puts an upper bound on the length of time that RTT() values remain in the cache, after which they are either deleted or replaced by values based on more recent observations of the target's behavior.

The upper bound on the lifetime of cache entries should be on the order of 10 minutes. (This parallels [RFC6555].)

The client may act on its cached RTT() values in this way because it will not violate the traffic distribution requirement: (1) If a target is responsive over a long period, the eventual deletion or refreshing of cached values from before that period ensures that the client will eventually see the target as normal. (2) If a target is not continuously responsive over a long period, the traffic distribution requirement places no restriction on whether the client sends traffic to it or not, and repositioning it after all normal targets does not affect the traffic distribution among the normal targets.

The length of time that different RTT() values are cached may differ from each other. When the state of a source address is changed, or the state of the interface it is assigned to changes, or when the network it is connected to is re-initialized, cached RTT() values for targets with that source address should be deleted. Interfaces can determine network re-initialization by a variety of mechanisms (e.g., [RFC4436] and [RFC6059]).

When a client processes a message, the ordering of targets that it sends to must be an ordering permitted by the baseline specifications, with the exception that slow targets may be moved after all normal targets. Note that any randomization of target groups to implement load balancing will be reflected among the normal targets in the client's ordering.

Since the client's goal is to deliver the message as quickly as possible, a client should always move slow targets to the end of the order, after the normal targets. Note that if a probe transmission during message processing discovers a target to be slow, the target can be moved at that time to after all normal targets.

A client obtains RTT(T) for a target T whenever it sends to the target. But it can also obtain RTT(T) by a probe, which is any transmission to T which requires a response but does not involve sending the message (and hence does not commit the client to possibly waiting for a timeout before sending to another target). A client may send probes to several targets simultaneously.

Probe operations include:

(Note that a probe using an OPTIONS request can be used with any protocol. If the OPTIONS reaches the target, the target is required to respond with either a 200 or 483 response [RFC3261] without forwarding it to another entity. Conveniently, a server can respond to such a request statelessly, so such requests are low-overhead. (Although the SIP Outbound keep-alive methods have even lower overhead.))

Similarly, if a client has a connection to a target T, and the connection has been idle for long enough, the client will not have a cached RTT(T) for T, reflecting the fact that the connection may have failed without the client's knowledge. The client can refresh the cached RTT(T) by performing a probe operation within the connection.

An established flow or connection to a target should be preferred over establishing a new flow or connection to that target for sending either a probe or a message. TBD: However, we want to broaden this to cover all flows that are to the same actual host, but how do we define that condition? Conversely, this must not override prioritization.

If the client initiates a probe of a target T, it may be able to decide that T is slow without waiting to determine the actual value of RTT(T), which may take as long as the timeout period, because the true value of RTT(T) is always at least as large as the elapsed time since the probe was sent, and the determination of slowness depends on whether RTT(T) exceeds Limit(Tfastest) (where Tfastest is the target with smallest RTT() value of any target in the target set).
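
As a sketch of this early determination, assuming the limit() function from Section 5.2 and an illustrative monotonic clock:

    import time

    def probe_already_proves_slow(probe_sent_at, fastest_cached_rtt):
        # The elapsed time since the probe was sent is a lower bound on
        # RTT(T); once it exceeds Limit(RTT(Tfastest)), T is slow even
        # though the probe has not yet (and may never) come back.
        elapsed = time.monotonic() - probe_sent_at
        return elapsed > limit(fastest_cached_rtt)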

If the client establishes a connection to a target without simultaneously sending the message, the connection establishment is a probe of the target, and after initiating the connection but before sending the message, if the probe reveals that the target is slow, the client may move that target later in the ordering, and turn its attention to another target.

In order to minimize the chance that the client must wait for a timeout before sending to another target, the client may send probes to targets, and the RTT() values revealed by those probes can change what target the client will send to next. Because of this, the client's procedures do not simply convert the tree of targets into an ordering of the targets, which the client then follows -- information discovered during the sequence of sends can affect the order in which targets are sent to.

To maintain maximum flexibility, the client represents the target tree and its traffic restrictions not as a single order but as the graph described above. At any time, the client can send probes to any or all of the targets (presumably, ones for which it does not have cached RTT() values). If there is no outstanding sent message (no message has been sent which has not timed out), the client can choose one target to send to: a target for which all targets connected to it to the left in the graph have already been sent to. The design space of Happy Earballs solutions is choosing which target should next be sent to and choosing when to send probes.

There is little value in sending a probe to a target unless all targets of higher priority (1) have been sent the message (and failed), (2) have been sent a probe, or (3) have a cached RTT() value.

A target T0 is "quick" if its cached RTT(T0) is less than Limit(0), that is, if T0 will be normal regardless of the RTT() values of any other target. If an allowed next target has a cached RTT() value, and it is "quick", then it is never slow for any goal.

If the client has reason to believe that it will soon be asked to send a message to a goal with a target T, and if the cached RTT(T) is likely to expire before then, it may decide to refresh the cached value by probing T.

Similarly, a client may be in a situation where it has advance notice that it is likely to need to send a message to a particular target, for instance, if the user of a UA begins dialing an outgoing call which will be routed through a particular outgoing proxy. In such a situation, the client should consider preemptively probing the target.

Note that the use of probes increases the non-message traffic to the targets, and thus has a cost. A client minimizes the expected transmission time by initially probing all of the targets, but that strategy maximizes the additional traffic. A client should weigh the tradeoff between improved user experience and increased traffic. In particular, the client should be aware of which messages require rapid service for good user experience (e.g., INVITE and BYE) and which do not (e.g., REGISTER and re-SUBSCRIBE).

A client should avoid sending to a target which does not have a cached RTT() value (unless it is the last remaining target), because the target might be non-responsive, forcing the client to wait for a timeout. Instead, the client should probe the target first.

7. Examples

In this section, we show some ways that clients can handle situations involving various combinations of targets with particular properties in order to provide a good user experience. In this section, we will annotate graphs by adding to targets attributes like "(cached)" (has RTT() cached), "(slow)", etc.

7.1. Two Unordered Targets, Both Cached

Suppose there are two unordered targets, both of which have cached RTT() values:

      A (cached)
     /          \
    *            *
     \          /
      B (cached)

As is shown by the graph, the client could send to either target first, because no target has a preceding target, and both have cached RTT() values. Optimally, the client will send to the target with the smallest RTT() value, which we will assume is A.

In the unlikely case that sending to A fails, it can be deleted from the graph to show the remaining possibilities:

    *            *
     \          /
      B (cached)

From this graph, the client must choose B to send to.

7.2. Two Unordered Targets, One Cached

Suppose there are two unordered targets, only one of which has a cached RTT() value:

      A (cached)
     /          \
    *            *
     \          /
          B

As is shown by the graph, the client is allowed to send to either target first, because no target has a preceding target. If RTT(A) is small enough, the client may choose to send to A immediately. But it might be worth the client's effort to send a probe to B, and if the probe returns quickly enough, the client may choose to send to B first.

7.3. Two Unordered Targets, Neither Cached

Suppose there are two unordered targets, neither of which have cached RTT() values:

      A
     / \
    *   *
     \ /
      B

As is shown by the graph, the client could send to either target first. But the client does not know whether either target is responsive, and thus sending to either of them risks waiting for a timeout. Instead, the client should send probes to both targets. When the first probe returns, the graph is changed to indicate that one target is cached:

      A (cached)
     /          \
    *            *
     \          /
          B

After this state change, the client will send to A.

In the unlikely case that sending to A fails, A is deleted from the graph, and the client can send to B without further delay (since B is the only remaining target).

This situation parallels the standard "Happy Eyeballs" situation in HTTP, where the client has two (or more) unordered addresses for the server, one IPv4 and one IPv6. The client requests connections with both addresses simultaneously, and the first connection that succeeds is used to send the HTTP request. [RFC6555]

7.4. Two Prioritized Targets, Both Cached

Suppose there are two prioritized targets, both of which have cached RTT() values:

    *----A (cached)----B (cached)----*

As is shown by the graph, the baseline specification is that the client must send to A first and then, if that fails, send to B. If the RTT() values make both targets normal, the client must follow that sequence.

However, it's possible that A is slow because RTT(A) > Limit(RTT(B)), in which case A can be moved after the normal targets (that is, B):

    *----B (cached)----*----A (cached) (slow)----*

At this point, there are no targets before B and B has a cached RTT(), so the client sends to B.

7.5. Two Prioritized Targets, the Second Cached

Suppose there are two prioritized targets, but only the second has a cached RTT() value:

    *----A----B (cached)----*

The client should send a probe to A. If the probe response is fast enough, the client is required to send to A. But after Limit(RTT(B)) elapses (or the relevant timeout), the client knows that A is slow and can move it to after the normal targets:

    *----B (cached)----*----A (cached) (slow)----*

At that point, the client sends to B.

7.6. Two Prioritized Targets, the First Cached

Suppose there are two prioritized targets, but only the first has a cached RTT() value:

    *----A (cached)----B ----*

If RTT(A) is <= Limit(0), the client must send to A (since it's quick), and so the client should send to A immediately. But if RTT(A) is large enough that there is a reasonable chance that RTT(B) is smaller than Limit(RTT(A)), which would make A slow, the client can send a probe to B first. If the probe response returns quickly enough, the client then knows that A is slow, can postpone it, and send to B. If the probe does not return quickly enough to make A slow, the client sends to A.

7.7. Three Targets

A more complex case could arise when the client must choose between three source addresses for a destination address. One, the address selected by the source address selection rules, is prioritized and the other two are unordered:

                B
               / \
    *----A----*   *
               \ /
                C

Combining the heuristics shown in the previous examples, we can see that the client should start by probing A, and unless it is being conservative regarding probes, it should simultaneously probe B and C. As the probe responses arrive, RTT() values are measured. If any of the targets are revealed as slow, they should be moved to the end of the order. Note that a target can be revealed as slow even if the probe has not yet returned, as the elapsed time since the probe was sent is a lower bound on RTT().

As the probe responses come in, the client watches to see when A (the only target that can be sent to first) becomes verified as reachable, which changes the graph to:

                         B
                        / \
    *----A (cached)----*   *
                        \ /
                         C

At that point, the client should send to A.

If A is revealed as slow, it is moved to the end of the order, leaving B and C available to be sent to. (If A is sent to but that fails, then A is removed from the graph entirely, leaving a similar graph but without A.)

      B
     / \
    *   *----A (cached)----*
     \ /
      C

When there are two targets that might be sent to, the client uses heuristics like the ones discussed in Section 7.1 to choose between them.

8. Heuristics

Generally, clients will operate on heuristics like the following. These heuristics operate on a dynamic data structure, a directed acyclic graph, which implements the graphs discussed above.

8.1. A Simplified Method

Instead of maintaining a directed acyclic graph to control the client's operation, the client can replace the graph with a sequence of sets of targets based on their "rank". The rank of a target is defined as:

Thus, a target is only prioritized after targets with lower ranks. As processing progresses, all targets in the lowest still-non-empty rank are initial, and all targets in higher ranks are non-initial.

For example, consider a server with two addresses, IPv6 and IPv4, with IPv6 prioritized via SRV records. Both addresses accept both TCP and UDP traffic:

    _sip._udp.example.com.    SRV    1 1 5060 sip1.example.com
    _sip._udp.example.com.    SRV    2 1 5060 sip2.example.com
    _sip._tcp.example.com.    SRV    1 1 5060 sip1.example.com
    _sip._tcp.example.com.    SRV    2 1 5060 sip2.example.com
    sip1.example.com.         AAAA   2001:DB8::1
    sip2.example.com.         A      192.0.2.1

The tree of targets is:

                                    |
                   -------------unordered-----------
                   |                               |
           -----priority-----              -----priority-----
           |                |              |                |
    TCP 2001:DB8::1  TCP 192.0.2.1  UDP 2001:DB8::1  UDP 192.0.2.1

The graph, annotating each target with its rank, is:

      (0) TCP 2001:DB8::1----(1) TCP 192.0.2.1
     /                                        \
    *                                          *
     \                                        /
      (0) UDP 2001:DB8::1----(1) UDP 192.0.2.1

Which can be turned into a list of lists as:

    rank 0: TCP 2001:DB8::1, UDP 2001:DB8::1

    rank 1: TCP 192.0.2.1, UDP 192.0.2.1

The rank representation is functionally equivalent to the following graph, which is the original graph with additional lines, showing that the rank representation constrains the client's behavior more than the original graph does:

      (0) TCP 2001:DB8::1   (1) TCP 192.0.2.1
     /                   \ /                 \
    *                     *                   *
     \                   / \                 /
      (0) UDP 2001:DB8::1   (1) UDP 192.0.2.1

The rank lists can be built without first constructing the graph by walking the target tree from left to right (highest priority to lowest priority), with each node passing downward MRdown, the minimum rank any descendant target is allowed, and each node passing upward MRup, the minimum rank allowed for any target prioritized after that node.

The root node's MRdown is 0.

For an unordered node:

For a prioritized node (with the children ordered by priority):

For a load-balancing node, the children are first prioritized randomly ([RFC2782] and Appendix A), then processed as for a prioritized node.

For a target node:

Here is the preceding example's tree, with each node annotated with its MRdown on the left of the node label and its MRup on the right of the node label:

                                    |
                   ---------(0) unordered (2)-------
                   |                               |
           -(0) priority (2)-              -(0) priority (2)-
           |                |              |                |
(0) TCP 2001:DB8::1 (1)     |   (0) UDP 2001:DB8::1 (1)     |
                            |                               |
                 (1) TCP 192.0.2.1 (2)           (1) UDP 192.0.2.1 (2)

Note that the target tree does not have to be explicitly constructed; it can be implicitly walked by a series of function calls, with the functions passing MRdown and MRup values between themselves, and each target being inserted into the rank list-of-lists as it is generated.
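
The following Python sketch walks such a tree and fills in the ranks, using the MRdown/MRup bookkeeping illustrated above. The node representation, and the handling of a load-balancing node by first ordering its children randomly and then treating it as a prioritized node, are assumptions made for illustration.

    class Node:
        def __init__(self, kind, children=(), value=None):
            self.kind = kind        # 'target', 'priority', or 'unordered'
            self.children = list(children)
            self.value = value      # the target itself, when kind == 'target'

    def assign_ranks(node, mrdown, ranks):
        """Return MRup for 'node'; 'ranks' maps rank -> list of targets."""
        if node.kind == 'target':
            ranks.setdefault(mrdown, []).append(node.value)
            return mrdown + 1
        if node.kind == 'priority':
            # Each child's MRdown is the previous child's MRup.
            # (A load-balancing node would first order its children
            # randomly, per Appendix A, then be handled the same way.)
            mr = mrdown
            for child in node.children:
                mr = assign_ranks(child, mr, ranks)
            return mr
        if node.kind == 'unordered':
            # Every child receives the same MRdown.
            return max(assign_ranks(child, mrdown, ranks)
                       for child in node.children)
        raise ValueError('unknown node kind: %r' % node.kind)

Applied to the example tree above, this yields the same two ranks:

    tree = Node('unordered', [
        Node('priority', [Node('target', value='TCP 2001:DB8::1'),
                          Node('target', value='TCP 192.0.2.1')]),
        Node('priority', [Node('target', value='UDP 2001:DB8::1'),
                          Node('target', value='UDP 192.0.2.1')]),
    ])
    ranks = {}
    assign_ranks(tree, 0, ranks)
    # ranks == {0: ['TCP 2001:DB8::1', 'UDP 2001:DB8::1'],
    #           1: ['TCP 192.0.2.1', 'UDP 192.0.2.1']}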

The address family preference rule (Section 5.3) can be implemented within the rank representation by first constructing the ranks based on the baseline specifications, and then splitting rank 0 into two ordered sub-ranks, 0.0 and 0.1, with sub-rank 0.0 containing all rank 0 targets of the preferred address family and sub-rank 0.1 containing all other rank 0 targets.
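
A small sketch of this splitting step, assuming each target carries an address family attribute (an illustrative representation):

    def split_rank0(rank0_targets, preferred_family='IPv6'):
        sub00 = [t for t in rank0_targets if t.family == preferred_family]
        sub01 = [t for t in rank0_targets if t.family != preferred_family]
        return sub00, sub01      # sub-ranks 0.0 and 0.1, in that order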

An example of address family preference processing is the ordinary case of two prioritized servers each with an IPv6 and IPv4 address:

    _sip._udp.example.com.    SRV    1 1 5060 sip1.example.com
    _sip._udp.example.com.    SRV    2 1 5060 sip2.example.com
    sip1.example.com.         AAAA   2001:DB8::1
    sip1.example.com.         A      192.0.2.1
    sip2.example.com.         AAAA   2001:DB8::2
    sip2.example.com.         A      192.0.2.2
                                       |
                      ----------(0) priority (2)--------
                      |                                |
             -(0) unordered (1)-              -(1) unordered (2)-
             |                 |              |                 |
    (0) 2001:DB8::1 (1)        |   (1) 2001:DB8::2 (2)          |
                               |                                |
                      (0) 192.0.2.1 (1)                (1) 192.0.2.2 (2)

After splitting rank 0 based on the address family preference, the ranks are:

    rank 0.0: 2001:DB8::1

    rank 0.1: 192.0.2.1

    rank 1: 2001:DB8::2, 192.0.2.2

9. Security Considerations

This document changes the order in which a client will send to targets but does not change the set of targets that it will send to. There are no known SIP systems whose security depends on the order in which a client sends to targets. Given that network connectivity is unreliable, it is unlikely that the security of any SIP system depends on the ordering of targets.

The specific security vulnerabilities, attacks and threat models of the various protocols mentioned in this document (SIP, DNS, SRV records, etc.) are well-documented in their respective specifications, and their effect on the security of SIP systems is unchanged.

10. IANA Considerations

This document does not require any actions by IANA.

11. History

Note to RFC Editor: Upon publication, remove this section.

11.1. Changes from draft-worley-sip-he-connection-01 to draft-worley-sip-happy-earballs-00

Complete overhaul.

Changed "EarBalls" to "Earballs".

11.2. Changes from draft-worley-sip-he-connection-00 to draft-worley-sip-he-connection-01

Minor changes.

Add note that WebSocket is out of scope, because there is only one possible transport in WebSocket.

11.3. Changes from draft-johansson-sip-he-connection-01 to draft-worley-sip-he-connection-00

This version has a different name for technical reasons. It is, in reality, the successor to draft-johansson-sip-he-connection-01.

Move Acknowledgments after References, as that is the style the Editor prefers.

Updated Security Considerations: This increment of the H.E. work does not make normative changes in existing SIP.

Copy a lot of text from RFC 6555, as this I-D is parallel to RFC 6555.

Changed "hostname" to "host name", as the latter form is more common in RFCs by a moderate margin.

Revised some of the introduction text to parallel the introduction of RFC 7984.

Changed name of algorithm to "Happy EarBalls", added reference to Urban Dictionary.

Many expansions of the discussion and revisions of the wording.

12. References

12.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.
[RFC2782] Gulbrandsen, A., Vixie, P. and L. Esibov, "A DNS RR for specifying the location of services (DNS SRV)", RFC 2782, DOI 10.17487/RFC2782, February 2000.
[RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M. and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, DOI 10.17487/RFC3261, June 2002.
[RFC3263] Rosenberg, J. and H. Schulzrinne, "Session Initiation Protocol (SIP): Locating SIP Servers", RFC 3263, DOI 10.17487/RFC3263, June 2002.
[RFC3581] Rosenberg, J. and H. Schulzrinne, "An Extension to the Session Initiation Protocol (SIP) for Symmetric Response Routing", RFC 3581, DOI 10.17487/RFC3581, August 2003.
[RFC6555] Wing, D. and A. Yourtchenko, "Happy Eyeballs: Success with Dual-Stack Hosts", RFC 6555, DOI 10.17487/RFC6555, April 2012.
[RFC6724] Thaler, D., Draves, R., Matsumoto, A. and T. Chown, "Default Address Selection for Internet Protocol Version 6 (IPv6)", RFC 6724, DOI 10.17487/RFC6724, September 2012.
[RFC7984] Johansson, O., Salgueiro, G., Gurbani, V. and D. Worley, "Locating Session Initiation Protocol (SIP) Servers in a Dual-Stack IP Network", RFC 7984, DOI 10.17487/RFC7984, September 2016.

12.2. Informative References

[I-D.johansson-sip-he-connection] Johansson, O., Salgueiro, G. and D. Worley, "Setting up a SIP (Session Initiation Protocol) connection in a dual stack network using connection oriented transports", Internet-Draft draft-johansson-sip-he-connection-01, October 2016.
[RFC3319] Schulzrinne, H. and B. Volz, "Dynamic Host Configuration Protocol (DHCPv6) Options for Session Initiation Protocol (SIP) Servers", RFC 3319, DOI 10.17487/RFC3319, July 2003.
[RFC3361] Schulzrinne, H., "Dynamic Host Configuration Protocol (DHCP-for-IPv4) Option for Session Initiation Protocol (SIP) Servers", RFC 3361, DOI 10.17487/RFC3361, August 2002.
[RFC4213] Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms for IPv6 Hosts and Routers", RFC 4213, DOI 10.17487/RFC4213, October 2005.
[RFC4436] Aboba, B., Carlson, J. and S. Cheshire, "Detecting Network Attachment in IPv4 (DNAv4)", RFC 4436, DOI 10.17487/RFC4436, March 2006.
[RFC5626] Jennings, C., Mahy, R. and F. Audet, "Managing Client-Initiated Connections in the Session Initiation Protocol (SIP)", RFC 5626, DOI 10.17487/RFC5626, October 2009.
[RFC6059] Krishnan, S. and G. Daley, "Simple Procedures for Detecting Network Attachment in IPv6", RFC 6059, DOI 10.17487/RFC6059, November 2010.
[UD] "The Jews Who Stole Christmas", "Urban Dictionary, entry 'Earballs'", December 2011.

Appendix A. Implementing Load Balancing

Load-balancing is specified by the "weight" field of DNS SRV records. The defining algorithm is specified in [RFC2782]. The same result can be obtained with a simpler algorithm: For each server, calculate a "score": If its weight is 0, its score is "infinity" (in practice, 100 suffices). If its weight is non-zero, its score is calculated by choosing a random number between 0 and 1, taking the negative of the logarithm of that number, and dividing the result by the weight. (Thus, the score is always a positive number.) (The resulting score has an exponential distribution whose parameter is the weight.) Then, sort the servers into order of increasing scores, so that the servers with the smallest scores are used first.
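
A Python rendering of this alternative algorithm follows; the record layout is an illustrative assumption.

    import math
    import random

    def weighted_order(records):
        """records: list of (weight, target) pairs from one SRV priority."""
        def score(weight):
            if weight == 0:
                return float('inf')          # weight-0 servers sort last
            # 1.0 - random.random() lies in (0, 1], so the logarithm is
            # defined; the score is exponentially distributed with
            # parameter equal to the weight.
            return -math.log(1.0 - random.random()) / weight
        return sorted(records, key=lambda rec: score(rec[0]))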

This alternative algorithm is analyzed and sample implementations are provided in the files in the directory sipXrouter/sipXtackLib/doc/developer/scores in the GitHub Sipfoundry project (https://github.com/sipfoundry/sipXrouter), among other repositories.

Acknowledgments

TBD:

The authors would like to acknowledge the support and contribution of the SIP Forum IPv6 Working Group. This document is based on a lot of tests and discussions at SIPit events, organized by the SIP Forum.

The foundation of this document is the work done by Olle Johansson and Gonzalo Salgueiro in earlier documents, including [I-D.johansson-sip-he-connection]. In turn, the foundation of that work is [RFC6555], whose authors are Dan Wing and Andrew Yourtchenko.

Scott O. Bradner suggested that the formula for determining responsiveness should contain a constant term.

Roman Shpount described the need for configuration to override the source address selection mechanism.

Tolga Asveren suggested requiring "rport".

Author's Address

Dale R. Worley Ariadne Internet Services 738 Main St. Waltham, MA 02451 US EMail: worley@ariadne.com