Internet DRAFT - draft-worley-sipcore-happy-earballs
draft-worley-sipcore-happy-earballs
SIPCORE D. Worley
Internet-Draft Ariadne
Intended status: Standards Track April 4, 2017
Expires: October 6, 2017
Happy Earballs: Success with Dual-Stack SIP
draft-worley-sipcore-happy-earballs-00
Abstract
TBD: The Session Initiation Protocol (SIP) supports multiple
transports running both over IPv4 and IPv6 protocols. In more and
more cases, a SIP user agent (UA) is connected to network interfaces
with multiple address families. In these cases sending a message
from a dual stack client to a dual stack server may suffer from the
issues described in [RFC6555] ("Happy Eyeballs"): the UA attempts to
send the message using IPv6, but IPv6 connectivity is not working to
the server. This can cause significant delays in the process of
sending the message to the server. This negatively affects the
user's experience.
TBD: This document builds on [RFC6555] by modifying the procedures
specified in [RFC3263] and related specifications to require that a
client ensure that communication targets are accessible before
sending messages to them, to allow a client to contact targets out of
the order required by other specifications, and to require a client
to properly distribute the message load among targets over time.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on October 6, 2017.
Worley Expires October 6, 2017 [Page 1]
Internet-Draft Happy Earballs April 2017
Copyright Notice
Copyright (c) 2017 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 5
3. Structure of This Document . . . . . . . . . . . . . . . . . 8
3.1. Scope of Applicability . . . . . . . . . . . . . . . . . 9
4. Baseline Procedures . . . . . . . . . . . . . . . . . . . . . 10
4.1. Target Ordering . . . . . . . . . . . . . . . . . . . . . 12
4.2. Priority Nodes . . . . . . . . . . . . . . . . . . . . . 13
4.3. Unordered Nodes . . . . . . . . . . . . . . . . . . . . . 14
4.4. Combinations of Prioritization and Unordered Nodes . . . 15
4.5. Load-Balancing Nodes . . . . . . . . . . . . . . . . . . 17
5. Primary Procedure Modifications . . . . . . . . . . . . . . . 18
5.1. Permitted to Reorder Targets . . . . . . . . . . . . . . 18
5.2. Must Preserve Traffic Distribution . . . . . . . . . . . 18
6. Additional Requirements . . . . . . . . . . . . . . . . . . . 19
6.1. Address Family Preference . . . . . . . . . . . . . . . . 19
6.2. Address Selection . . . . . . . . . . . . . . . . . . . . 20
6.3. Vias . . . . . . . . . . . . . . . . . . . . . . . . . . 21
6.4. DNS Caching . . . . . . . . . . . . . . . . . . . . . . . 21
6.5. Unused Flows . . . . . . . . . . . . . . . . . . . . . . 21
6.6. Debugging and Troubleshooting . . . . . . . . . . . . . . 22
7. Consequences of the New Requirements . . . . . . . . . . . . 22
8. Generic Algorithm . . . . . . . . . . . . . . . . . . . . . . 26
8.1. Policies and Implementor Latitude . . . . . . . . . . . . 27
8.2. Examples . . . . . . . . . . . . . . . . . . . . . . . . 28
8.2.1. Two Unordered Targets, Both Cached . . . . . . . . . 28
8.2.2. Two Unordered Targets, Both Cached, One Slow . . . . 29
8.2.3. Two Unordered Targets, One Cached . . . . . . . . . . 30
8.2.4. Two Unordered Targets, Neither Cached . . . . . . . . 31
8.2.5. Three Unordered Targets, One Cached . . . . . . . . . 31
8.2.6. Two Prioritized Targets, Both Cached . . . . . . . . 32
8.2.7. Two Prioritized Targets, the Second Cached . . . . . 33
Worley Expires October 6, 2017 [Page 2]
Internet-Draft Happy Earballs April 2017
8.2.8. Two Prioritized Targets, the First Cached . . . . . . 34
8.2.9. Two Prioritized Targets, Neither Cached . . . . . . . 34
8.2.10. Three Targets . . . . . . . . . . . . . . . . . . . . 35
9. Using a Simplified Data Structure . . . . . . . . . . . . . . 36
10. Security Considerations . . . . . . . . . . . . . . . . . . . 39
11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 39
12. History . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
12.1. Changes from draft-worley-sip-happy-earballs-01 to
draft-worley-sipcore-happy-earballs-00 . . . . . . . . . 40
12.2. Changes from draft-worley-sip-happy-earballs-00 to
draft-worley-sip-happy-earballs-01 . . . . . . . . . . . 40
12.3. Changes from draft-worley-sip-he-connection-01 to draft-
worley-sip-happy-earballs-00 . . . . . . . . . . . . . . 40
12.4. Changes from draft-worley-sip-he-connection-00 to draft-
worley-sip-he-connection-01 . . . . . . . . . . . . . . 40
12.5. Changes from draft-johansson-sip-he-connection-01 to
draft-worley-sip-he-connection-00 . . . . . . . . . . . 40
13. References . . . . . . . . . . . . . . . . . . . . . . . . . 41
13.1. Normative References . . . . . . . . . . . . . . . . . . 41
13.2. Informative References . . . . . . . . . . . . . . . . . 42
Appendix A. Implementing Load Balancing . . . . . . . . . . . . 43
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 43
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 44
1. Introduction
Earballs -- n., another word for ears.
Made famous by the animated American TV
spy comedy, "Archer".
"Ow, my earballs!" -- Cheryl Tunt, "Archer"
-- from "Urban Dictionary"
The Session Initiation Protocol (SIP) [RFC3261] and the documents
that extended it provide support for both IPv4 and IPv6. However,
this support has problems with environments that are characteristic
of the transitional migratory phase from IPv4 to IPv6 networks:
During this phase, many server and client implementations run on
dual-stack hosts. In such environments, a dual-stack host will
likely suffer greater connection delay, and by extension an inferior
user experience, than an IPv4-only host.
The difficulty stems from the reality that a device cannot predict
whether apparent IPv6 connectivity to another device is usable; both
devices may have IPv6 addresses and yet some transit network between
the two may not transport IPv6. SIP requires a device that transmits
a request to one destination address (e.g., the apparently useful
Worley Expires October 6, 2017 [Page 3]
Internet-Draft Happy Earballs April 2017
IPv6 address) to wait for a response for a substantial period
(usually 32 seconds) before transmitting the request to another
destination address (the less-preferred IPv4 address). The result is
that apparent IPv6 connectivity that is not functional can cause
substantial delays in processing SIP requests. Especially when the
requests are call setups (INVITE requests) this creates a very poor
user experience.
The overall procedure for transmitting a SIP message is as follows:
The specification of where a message is to be sent is called a
"goal"; in the common case, the goal is the host-port part of a SIP
URI. The SIP client uses specified algorithms (particularly
[RFC3263]) to make DNS lookups to determine a prioritized set of
"targets" (basically, destination IP address family, IP address,
transport protocol, and port). Under these specifications, the
client contacts the targets in order until one is contacted
successfully. In order to contact a target, the client establishes a
transport connection (if necessary), sends the message using the
transport (possibly resending the message several times), and then
(for requests) waits for a response (either provisional or final).
The process ends successfully if the client receives a response. The
process ends unsuccessfully if the client receives a permanent error
from the transport layer or if a SIP timer (Timer B or Timer F in
[RFC3261]) expires. Timeouts generally default to 32 seconds. If
the user has to wait for even one timeout, this will seriously
degrade the user experience. Thus, it is desirable to minimize the
number of times the client has timeouts when sending requests.
If the target list contains both IPv6 addresses and IPv4 addresses,
this procedure can degrade the user's experience in common
situations. Typically, this problem arises when the client has an
IPv6 interface, the server's preferred address is an IPv6 address,
but the transit networks between the client and server do not carry
IPv6. This can cause the client to attempt to send a SIP request to
the IPv6 target and then wait for a timeout before it continues with
an IPv4 target. This problem parallels a problem that was widely
seen in web browsers. The web browser problem was cured by
specifying that web browsers should use a "Happy Eyeballs" algorithm
[RFC6555] to determine the order in which to contact target
addresses.
This document builds on the concepts of [RFC6555] by amending these
procedures, so that targets derived from a goal may be contacted in a
different order than is specified by the previous specifications. By
analogy with the name "Happy Eyeballs" for similar algorithms in web
browsers, we label these algorithms "Happy Earballs" [UD].
Worley Expires October 6, 2017 [Page 4]
Internet-Draft Happy Earballs April 2017
Conceptually, the Happy Earballs algorithm is composed of a number of
interlocking components:
a. For a given message, the set of targets is organized into a
directed acyclic graph (or partial order) specifying which
targets must be contacted before other targets.
b. Targets which are unresponsive, or which respond too slowly
relative to other targets in the set, are moved to the end of the
order.
c. Despite this change, the relative proportions of traffic sent to
responsive targets remains as specified by the baseline
specifications.
d. The client caches the known response times of targets.
e. The client does not send the message to a target before knowing
the target's response time (unless this is the last target in the
set). If needed, the client sends a "probe" transmission (which
has no semantic effect) to determine the target's response time.
These components are assembled into an event-driven algorithm for
sending the message to targets and, when necessary, sending probes to
targets. Implementors have flexibility in choosing among targets
that are not prioritized, and choosing when to send probes to
targets. Implementors also have flexibility defining when one target
is considered "too slow" compared to another target. (TBD: Is that
true?) Nonetheless, all variants of this algorithm have the
properties of distributing traffic among responsive targets as
specified by the baseline specifications and not sending messages to
non-responsive targets (unless all responsive targets have failed).
This document does not address the similar problem for media (RTP and
RTCP) packets. Problems with media connectivity can be addressed
using ICE [RFC5245], ALTC [RFC6947], or probing using RTP packets.
2. Terminology
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
baseline: the prior specifications of the behavior of a client for
sending a message to a goal. The baseline specifications are
modified by this document.
Worley Expires October 6, 2017 [Page 5]
Internet-Draft Happy Earballs April 2017
cache: (verb) to store temporarily information regarding the
response time of a target so as to accelerate future message
transmission; (noun) the collection of information so stored
client: the device that must send a message
CRTT(T): for a target T, the cached round-trip response time of T
(if any is cached). There is a special value of "infinity" if T
did not respond at all. Collectively, these values are called
"CRTT() values". Contrasted with RTT(T), which is the actual
round-trip response time (at some given moment).
dual-stack: narrowly, a device with both IPv4 and IPv6 interfaces;
broadly, a device with more than one interface, including those
with interfaces that use two or more address families
flow: a group of transmissions to a target which are considered
related by a transport protocol. For connection-oriented
protocols, is the data carried by a connection. For
connectionless protocols, is all messages sent to a particular
target (5-tuple). For protocols with security associations, is
all messages sent within a particular security association.
goal: the identification of a particular server. E.g., a URI, a
TSAP, a via-para, or information provided by the context of the
message.
graph: a directed, acyclic graph whose nodes are a set of targets
and whose arcs are requirements that one target must be sent to
before another target is sent to. (Mathematically, the graph
defines a partial order on the targets.) Graph arcs are drawn
without arrowheads, and are to be read as the leftmost endpoint
must be sent to before the rightmost endpoint. A graph is always
relative to the sending of a particular message and may be
modified during the processing of the message. A graph may be
augmented with dummy nodes that do not represent targets in order
to reduce the number of arcs that must be drawn to express the
requirements. (Abstractly, dummy nodes may be considered targets
to which sends immediately fail.)
initial: a target which has no target prioritized before it
(considered relative to all targets in the target set, or some
subset or rearrangement of the target set, depending on the
context)
Limit(t): a function converting one time value into another. If
RTT(T1) > Limit(RTT(T0)), then target T1 responds "too slowly"
relative to the response time of target T0, and T1 is considered
Worley Expires October 6, 2017 [Page 6]
Internet-Draft Happy Earballs April 2017
non-responsive. Depends on two parameters, "m" and "f".
Limit(infinity) is considered to be infinity.
normal: a target which is not slow (relative to a particular goal)
NSAP: "Network Service Access Point", the identification of a
network interface, which comprises an address family and an
address
priority: the situation where the client must send the message to a
first target before it may send the message to a second target.
The prioritization of a set of targets necessarily forms a partial
order and can be represented by a directed acyclic graph.
probe: a transport operation that attempts to determine if a target
is responsive, without transmitting a message. Since a probe does
not send a message, if the transmission fails, it does not commit
the client to waiting a timeout period before sending the message
to another target.
quick: a target whose cached RTT() is less than Limit(0), and thus
is never slow for any goal
RTT(T): for a target T, the actual round-trip response time of T.
There is a special value of "infinity" if T does not respond at
all. Collectively, these values are called "RTT() values".
Contrasted with CRTT(T), which is the RTT(T) measurement recorded
in the cache.
responsive: a property of a target T0 relative to the set of targets
for a goal: the response time of the target T0 is sufficiently
short when compared with the response time of other targets;
RTT(T0) <= RTT(T1) for any T1 in the set of targets. See TBD
Section 5 for discussion.
send: (verb and noun) to attempt transmission of a (possibly
modified copy of) a message to a target. Contrasted with
"successfully send", which is when the message is received by the
server or when the client detects that the message is received by
the server. Does not include "probe" transport operations.
server: the (conceptual) device to which a message is to be sent.
May consist of multiple physical devices listening on multiple
network interfaces.
slow: a target T0 which appears to be "too slow" based on the cached
RTT() information, i.e., there is another target T1 of the goal
for which RTT(T0) > Limit(RTT(T1)) using the cached RTT() values.
Worley Expires October 6, 2017 [Page 7]
Internet-Draft Happy Earballs April 2017
This includes the case where RTT(T0) is infinity, i.e., T0 does
not respond at all. Opposite of "normal".
target: the complete specification of a transport to be used to send
a message from the client to the server. A target is commonly
conceptualized as "protocol/address/port" (which is a TSAP), but
the target also includes the TSAP that will be used as the source
of the communication, and so "5-tuple" is more accurate. In many
cases, the source TSAP is determined by the destination TSAP, so
it is not mentioned. Perforce, the transport protocol and address
family of the source and destination TSAPs are the same.
timeout: (noun) the period of time after sending a message which a
client must wait before it is permitted to send the message to
another target without receiving positive indication of the
failure of the first transmission. For sending requests, either
Timer B or Timer F [RFC3261]. Refers either to the abstract time
interval or a particular instance after a particular send
operation. (verb) the event when the timeout period has expired.
TSAP: "Transport Service Access Point", the identification of an
endpoint of a transport flow, which usually comprises a transport
protocol, an NSAP (or network address), and a port number
traffic: the messages for a particular goal that are successfully
sent to a particular target or set of targets; the number of such
messages sent over a period of time; or the fraction of such
messages relative to all messages for the goal
3. Structure of This Document
This document assumes that the context of the message provides a
"goal", which is the specification of the device or collection of
devices which are the server, and that there are existing "baseline"
specifications which translate the goal into a set of "targets", and
in what order(s) the client may send (possibly modified copies of)
the message to the targets, until one of the send operations is
successful.
Section 4 introduces a model of how the baseline specifications
operate: in every instance, they produce a partial order or directed
acyclic graph of targets specifying which targets must be sent to
before sending to which other targets. This section also describes
how "load-balanced" targets (SRV records with equal priority) are be
handled within this model.
Section 5describes the major modifications of the procedures for how
a client sends a message to a server.
Worley Expires October 6, 2017 [Page 8]
Internet-Draft Happy Earballs April 2017
This document relaxes the requirements on the client regarding the
order(s) in which the message may be sent to the targets, that is, it
permits additional orders, so that the client is less likely to have
to wait for a timeout. On the whole, when network connectivity is
imperfect, this allows clients to transmit the messages to servers
more quickly than they would using the unmodified baseline
specifications.
However, this document also places additional restrictions on the
client's sending behavior to ensure that the overall traffic
distribution among the targets converges over time to the
distribution that would have resulted from obeying the baseline
specifications.
Section 6 requires certain additional behaviors that ensure that the
use of IPv6 is not disadvantaged in mixed IPv4/IPv6 networks as well
as a number of miscellaneous requirements to optimize the behavior of
clients.
Section 7 discusses some consequences of the new requirements,
including what new orders of targets are permitted, what behaviors
minimize the time needed to successfully send a message, techniques
for probing a target (that is, determining if it is responsive
without sending the message, and thus possibly committing to waiting
for a timeout period), and suitable approaches for caching
information about targets.
These consequences, when assembled with the overarching goal of
minimizing the time to successfully send the message, produce a
generic algorithm, which is succinctly described in event-driven
terms in Section 8.
Section 8.2 applies the generic algorithm to a number of simple
cases, including how it replicates "Happy Eyeballs" behaviors in
analogous situations.
Section 9 constructs a simplified algorithm (which is a subset of the
generic algorithm) that replaces the directed acyclic graph with a
list of lists.
3.1. Scope of Applicability
This document modifies any SIP target selection processes that are
defined now or may be defined in the future, excepting those that
explicitly exempt themselves.
This document does not affect communications specified to be carried
only by a single WebSocket transport, as in those contexts there is
Worley Expires October 6, 2017 [Page 9]
Internet-Draft Happy Earballs April 2017
only one transport target (the WebSocket connection), and hence the
target selection process is trivial.
Similarly, this document does not affect sending a message when it is
specified to be carried by a particular existing connection, as when
a client sends a response to a request that it received via a
connection-oriented protocol, and that connection still exists.
A client MUST NOT consider the set of the target URIs of a "forking"
operation to be a single goal to which the processes of this document
apply. Instead, the modifications MUST be applied to each of those
URIs as separate goals. This is because the decision of whether to
send a request to a later forking target may be affected by the SIP
response to an earlier transmission ([RFC3261] section 16). A
forking proxy may, as part of its policy, apply some or all of these
procedures to a forking operation. This may be useful when sending a
request to a UAS that has registered multiple addresses for the same
AoR (as may be discerned by examining the +sip.instance media feature
tags in the registrations [RFC5626]). However, any such processing
is outside the scope of this document.
4. Baseline Procedures
The situation that this document addresses is when a SIP device is
required to send a message (which may be either a request or a
response). This document uses the term "client" for the device which
must send the message. The client is given a "goal", which is the
specification of the "server", which is the (possibly composite)
device to which the message is to be sent. (Both of these usages are
broader than the usage in [RFC3261].) The purpose of client is to
successfully send the message to the server.
(Note that in the case of a request, when the message is sent to a
target, a Via header field will be added to the message, and that the
added Via header field will be different for each target. If the
client is a UAC, it may effectively use the SIP signaling path as a
connectivity check for the media path by varying the SDP between
requests sent to different targets. This document considers all of
these versions of the message to be copies of the original message to
be sent.)
If the message is a request, the goal is usually the hostport part of
the URI in either the first Route header field or the request-line.
If the message is a response, the goal can be specified by the first
via-param in the first Via header field. If the message is to be
sent to an outbound proxy as specified by a DHCP option ([RFC3319] or
[RFC3361]), then the goal is the ordered list of addresses or domain
Worley Expires October 6, 2017 [Page 10]
Internet-Draft Happy Earballs April 2017
names provided by the DHCP option. In other situations, the goal may
be specified by other means.
Baseline specifications (e.g., [RFC3263], [RFC3319], [RFC3361],
[RFC6724], [RFC7984]) prescribe the construction of a set "targets"
which are potential transport destinations to which the message can
be sent. Which specifications apply is determined by the context of
the message. Targets are commonly conceptualized as
protocol/address/port combinations, but in general they are the pairs
of source and destination TSAPs that provide the full specification
of a transport flow.
For example, the sending of an initial REGISTER message can involve
six steps of expanding the goal into a list of targets:
Selecting one of the SIP domain names from the list provided by a
DHCP option [RFC3319] and [RFC3361].
NAPTR translation from a SIP domain name to a server domain name
[RFC3263].
Selecting a transport protocol (e.g., UDP or TCP), if there is no
transport parameter in the URI.
SRV translation of a server domain name to server interface names
[RFC2782].
A/AAAA translation of a server interface name to server addresses.
Selecting a source address to use to communicate with a server
address [RFC6724].
The process of deriving a set of targets from a goal can be
conceptualized as constructing a tree, with the root node being the
goal and the leaf nodes being the targets (whether or not an
implementation constructs such a representation). Each non-leaf node
is expanded into zero or more child nodes by the application of the
appropriate baseline specification.
For a particular node, the relevant baseline specification may
prescribe relationships between the traffic volume sent to the
subsets of targets that are descended from its children. E.g., a
standard may prescribe prioritization, such that if any target
descended from a higher-priority child is responsive, no traffic
should be sent to any target descended from a lower-priority child.
(SRV records and DHCP options can specify prioritization.)
Similarly, a standard may prescribe load balancing, such that if
there are responsive targets descended from two children, the ratio
Worley Expires October 6, 2017 [Page 11]
Internet-Draft Happy Earballs April 2017
of traffic to the two subsets targets descended from the two children
must be a particular (non-zero positive) number. (SRV records can
specify load balancing.) Alternatively, a node may place no
restrictions on the traffic to the subsets of targets descended from
its children.
As always, the construction of the tree and the traffic restrictions
incorporated into it may be modified by the local policy. In this
document, we assume that all modifications are made to the tree that
summarizes the requirements of the baseline specifications. This
makes it easier to determine the the interaction of local policy with
the procedure modifications of this document. And this assumption
does not limit the generality of what local policy may do, since the
local policy can remove any ordering restrictions from the tree, thus
permitting almost any behavior by the modified procedures.
The targets as generated by the baseline specifications MAY be
subsetted by deleting any targets that the client cannot access for
reasons such as the client does not implement the protocol, or it
does not have a network interface that supports the protocol, or it
does not have a network interface that can communicate with the
address. Removing these targets at an early stage of processing does
not affect the on-the-wire behavior of either the baseline processes
or the modified processes, since sends to such targets fail
immediately.
What constitutes failure of a send depends on the situation, and may
be a transport protocol failure, the absence of a timely 100 Trying
response, or a 503 response ([RFC3261] section 21.5.4 and [RFC3263]
section 4.3). For any particular message, either the message is
successfully sent to exactly one target or the overall sending
process fails.
In the worst-case situation, the process may require waiting for one
or more transaction timeouts (e.g., Timer B or Timer F in [RFC3261])
before successfully transmitting the message to a target. As the
timeouts are typically 32 seconds, such a wait severely impacts the
user experience.
4.1. Target Ordering
The baseline specifications assume that the client will effectively
generate an order in which to contact the targets, then the client
will sequence through the list, sending the message to each target
until one of the sends is deemed to be successful. (Each send may
include retransmissions of the message.) This is because at any
stage, the client's next action is determined only by the goal and
the fact that sending to the previous targets has failed -- the first
Worley Expires October 6, 2017 [Page 12]
Internet-Draft Happy Earballs April 2017
target in the order is the target that the client will choose first
(which depends only on the goal), the second target in the order is
the target that the client will choose if and when the send to the
first target fails (which depends only on the goal and the identity
of the first target), etc.
(In mathematical terms, the target order is a total ordering of the
targets that is compatible with the partial ordering of the targets
specified by the traffic restrictions.)
The order must be compatible with the traffic restrictions imposed by
the specifications on the targets, and the traffic restrictions are
coded into the nodes of the target-derivation tree. The following
subsections discuss how the target order reflects these constraints
for every non-leaf node type.
4.2. Priority Nodes
If a node specifies that its children are prioritized, all of the
targets descended from a higher-priority child must precede all of
the targets descended from a lower-priority child. Suppose the tree
of targets has one interior node that specifies prioritization of
three targets. This can result from these DNS records:
_sip._udp.example.com. SRV 1 1 5060 sip1.example.com
_sip._udp.example.com. SRV 2 1 5060 sip2.example.com
_sip._udp.example.com. SRV 3 1 5060 sip3.example.com
sip1.example.com. A 192.0.2.1
sip2.example.com. A 192.0.2.2
sip3.example.com. A 192.0.2.3
We show the tree with the targets from left to right from highest
priority to lowest priority:
|
-priority--
| | |
A B C
We can then represent the traffic restrictions in a (directed,
acyclic) graph. In our graphs, the arcs are to be read from left to
right, requiring sending to the left endpoint before sending to the
right endpoint. In this case, sending to A must be done before
sending to B, and sending to B before sending to C. We can add dummy
"start" and "finish" nodes, represented by "*", for convenience in
later analysis. We can imagine them to be targets to which sending
fails immediately.
Worley Expires October 6, 2017 [Page 13]
Internet-Draft Happy Earballs April 2017
*----A----B----C----*
There is only one allowed target order:
A B C
4.3. Unordered Nodes
If a node places no traffic restrictions on its children, there is no
collective relationship between the targets descended from its
children. The targets descended from different children may be
appear in any order, and can even be interleaved. We show the tree
of a simple example:
sip.example.com. A 192.0.2.1
sip.example.com. A 192.0.2.2
sip.example.com. A 192.0.2.3
|
-unordered-
| | |
A B C
We then represent the lack of traffic restrictions by a graph which
has no lines between targets. We add dummy start and finish nodes to
clarify that all three targets are part of one graph:
A
/ \
*-B-*
\ /
C
There are six allowed target orders:
A B C
A C B
B A C
B C A
C A B
C B A
Worley Expires October 6, 2017 [Page 14]
Internet-Draft Happy Earballs April 2017
4.4. Combinations of Prioritization and Unordered Nodes
A prioritized pair of hosts each with an unordered pair of targets
can be specified with these DNS records:
_sip._udp.example.com. SRV 1 1 5060 sip1.example.com
_sip._udp.example.com. SRV 2 1 5060 sip2.example.com
sip1.example.com. A 192.0.2.1
sip1.example.com. AAAA 2001:DB8::1
sip2.example.com. A 192.0.2.2
sip2.example.com. AAAA 2001:DB8::2
These records result in this tree:
|
---priority---
| |
unordered unordered
| | | |
A B C D
From this tree, we generate this graph. We can simplify the graph by
adding a dummy target which must follow both A and B and must precede
both C and D, in order to avoid writing four arcs A-C, A-D, B-C, and
B-D.
A C
/ \ / \
* * *
\ / \ /
B D
There are four allowed target orders:
A B C D
A B D C
B A C D
B A D C
Now consider the situation of a client that has two IPv4 interfaces,
192.0.2.129 and 192.0.2.193 (on two separate /26 subnets), with the
first prioritized before the second. If the client is configured to
attempt to use every interface for every destination address (rather
than being restricted by the source address selection rules
Section 6.2), every destination address generates two targets, one
Worley Expires October 6, 2017 [Page 15]
Internet-Draft Happy Earballs April 2017
with source address 192.0.2.129 and one with source address
192.0.2.193, with the first prioritized before the second.
In this situation, the following DNS records result in an unordered
pair of hosts, each with a prioritized pair of targets.
sip.example.com. A 192.0.2.1
sip.example.com. A 192.0.2.65
This situation results in this tree (where we now have to pay
attention to the source addresses in the targets):
|
-----------unordered-----------
| |
----priority---- ----priority----
| | | |
A B C D
192.0.2.129 192.0.2.193 192.0.2.129 192.0.2.193
192.0.2.1 192.0.2.1 192.0.2.65 192.0.2.65
and this graph:
A----B
/ \
* *
\ /
C----D
There are six allowed target orders:
A B C D
A C B D
C A B D
A C D B
C A D B
C D A B
When the client is allowed to select any of the available source
addresses for a send, each source address (combined with the
destination address) generates a separate target Section 6.2. Of the
targets, the one selected by the default source address selection
Worley Expires October 6, 2017 [Page 16]
Internet-Draft Happy Earballs April 2017
rules is preferred, and the remainder are unordered. If there is
only one destination address, it results in a tree with this form:
|
--priority--
| |
| -unordered-
| | | |
A B C D
with this graph:
B
/ \
*----A----*-C-*
\ /
D
There are eight allowed target orders:
A B C D
A B D C
A C B D
A C D B
A D B C
A D C B
4.5. Load-Balancing Nodes
The practical difficulty with load-balancing nodes is to ensure that
the right proportion of traffic is sent to the descendants of each
child node without having to maintain long-term records of the amount
of traffic that has been sent to each child's descendants. A simple
way to accomplish this is to generate a new priority ordering of the
children for each new message to be processed, with the orderings
randomized to provide the correct traffic distribution among the
children. A randomization algorithm that achieves the correct
traffic distribution is described in Appendix A.
With this method, load-balancing nodes are processed for each message
instance by generating a suitably randomized ordering of the
children, and then processing them in the same way as the children of
a priority node.
Worley Expires October 6, 2017 [Page 17]
Internet-Draft Happy Earballs April 2017
5. Primary Procedure Modifications
The following modifications are specified for all baseline
specifications:
5.1. Permitted to Reorder Targets
A client MAY send the message to the targets in an order that is not
permitted by the baseline specifications.
5.2. Must Preserve Traffic Distribution
To state the next requirement, we must define what it means to say
that a target is "responsive". Intuitively, a target is responsive
if its response time to a message is not "too much" longer than the
response time of any other target for the goal. Note that the
responsiveness of a target is always defined relative to the set of
targets for a particular goal; hence, a target may be responsive for
one goal at the same time that it is not responsive for another goal.
We define RTT(T) to mean the round-trip response time of a target,
the time it takes to receive confirmation of the receipt of a message
sent to the target. If the target does not respond at all, we
consider RTT(T) to be "infinity", which is larger than any number.
Note that RTT(T) is a fact about reality at some instant, not a
measure of the client's current knowledge about reality at some
instant.
We define a function "Limit" that converts one time value into
another: if RTT(T1) > Limit(RTT(T0)), then target T1 responds "too
slowly" relative to the response time of target T0, and T1 is
considered non-responsive. We define Limit(t) = m*t + f, where "m"
and "f" are parameters defined below.
The parameter "m" limits the range of response times that we will
allow among targets we consider responsive. We set m to be 2. (TBD:
Is this a good choice?)
The parameter "f" is the length of time that we consider to be
insignificant when comparing the response time of targets. We set f
to be 2*T1, where T1 is the value of that name used in the procedures
of [RFC3261]. That value is commonly the round-trip time estimate of
the relevant network, and defaults to 500ms. (TBD: Is this a good
choice?)
The result is that Limit(t) is "twice t, plus a little more to
account for the inherent delays in the network".
Worley Expires October 6, 2017 [Page 18]
Internet-Draft Happy Earballs April 2017
A target T0 is defined to be "responsive" if
its response time, RTT(T0), is less than the relevant timeout
(often Timer B or Timer F), and
for any other target T1 of the goal, RTT(T0) < Limit(RTT(T1)).
TBD: The following wording must be set so that the client can move
non-responsive targets to any place in the order (or at least, any
later place in the order) without violating this condition. It is
not clear this has been accomplished yet. Later: I think we've
accomplished this now.
We are now ready to specify the major new constraint on the client's
behavior: The client's procedures MUST, over time, distribute the
traffic for any particular goal among the responsive targets in the
same proportions as are required by the baseline specifications.
Specifically,
If the set of targets for a particular goal does not change over a
period of time,
for any two subsets of targets, both of which contain at least one
target which is responsive for the entire period of time, and
if the baseline specifications prescribe a proportion between the
traffic to the two subsets, then
the proportion between the traffic to the responsive subsets of
the two subsets MUST converge to the proportion specified by the
baseline specifications.
Note that this is a "God's-eye view" requirement; the client has no
direct way to determine whether it is satisfying the requirement
because it cannot measure RTT() for every possible target at every
moment. Nonetheless, the client's procedures must ensure that the
requirement is satisfied under all circumstances.
6. Additional Requirements
6.1. Address Family Preference
Unless overridden by user configuration or by network configuration:
If the host has a policy of preferring one address family, the client
MUST prefer it. If the host's policy is unknown or not obtainable,
the client MUST prefer IPv6 over IPv4. Thus, usually clients must
give preference to IPv6 over IPv4.
Worley Expires October 6, 2017 [Page 19]
Internet-Draft Happy Earballs April 2017
Such a preference MUST be implemented in the following way: Consider
the "initial" targets, which are the targets which the baseline
specifications do not prioritize after any other targets. The client
must additionally prioritize the initial targets which are of the
preferred address family before the other initial targets.
TBD: Is this sufficient? We don't require address family preference
to affect non-initial targets. Alternatively, if the server has a
lot of IPv6 addresses, none of which are responsive, the only way to
quickly send to an IPv4 address is to send probes to all of the
initial IPv6 addresses and one (almost-initial) IPv4 addresses. This
is recommended in the probing heuristics, but might require a lot of
probes. This parallels the advice in RFC 6555 section 5.4 "web site
operators with native IPv6 connectivity SHOULD NOT offer multiple
AAAA records".
6.2. Address Selection
Clients SHOULD provide a mechanism by which the address selection
configuration [RFC6724] can be customized for the client
independently of any other application.
Clients SHOULD implement the destination address selection mechanism
specified in [RFC6724]. Note that this mechanism provides a priority
order among the set of A/AAAA records for a single server host name,
whereas [RFC3263] assumes that such sets of A/AAAA records are
unordered.
Clients SHOULD implement rule 5.5 of section 5 of [RFC6724],
preferring to use a source address with a prefix assigned by the
selected next-hop. This requires that the IPv6 stack remembers which
next-hops advertised which prefixes.
Clients SHOULD by default use the source address selection mechanism
specified in [RFC6724], which chooses one source TSAP for any
particular destination TSAP.
Clients SHOULD also be configurable to use an alternative mechanism,
in which for any destination TSAP, targets are generated for each
source TSAP that could possibly communicate with the destination
TSAP, with the source TSAP selected by [RFC6724] prioritized over the
other source TSAPs and the other source TSAPs being unordered among
themselves.
The alternative policy is useful in situations where the source
address selection table prioritizes an interface which does not
forward SIP traffic to the destination address. (For an example,
when the source address selection table routes almost all
Worley Expires October 6, 2017 [Page 20]
Internet-Draft Happy Earballs April 2017
destinations to an organizational VPN which has restricted
connectivity.)
6.3. Vias
A client MUST provide a via-param in the first Via header field in
requests that properly routes from the server to the client,
regardless of the presence of NATs in the transportation path. This
is necessary even when the request is sent via a connection-oriented
protocol, because the connection may be terminated before the
response is sent back to the client and the server may need to
reestablish a connection. The client SHOULD provide the "rport"
parameter on the via-param [RFC3581].
Additionally, to assist tracing and diagnosis, a client SHOULD
provide the source TSAP that it used in the via-param. TBD: Is this
too strict? Is it actually useful?
6.4. DNS Caching
The information a client uses to determine a target set must be up-
to-date. In particular, the construction of a target set MUST NOT
utilize information from DNS whose TTL has expired. However, once a
target set has been constructed, it may be retained until the message
has either been sent or failed, without regard to the TTL of the
information used to construct it.
TBD: Should we allow a client to cache target set computations
somewhat longer than the TTL to minimize disruption and DNS traffic?
Phone calls typically take 3 minutes, so we could allow 5 minutes
grace and thus ensure that target sets rarely have to be recomputed
during a call.
6.5. Unused Flows
A flow that is created as a probe but not subsequently used (either
to send the message or to maintain a SIP Outbound flow) SHOULD be
terminated unless there is reason to believe that it will be used in
the near future. This includes flows that are connection-oriented
protocols as well as non-connection-oriented flows with security
associations. Minimizing the number of unused connections reduces
the load on the server and on stateful middleboxes. Also, if the
abandoned connection is IPv4, this reduces IPv4 address sharing
contention in any intermediate NATs.
Worley Expires October 6, 2017 [Page 21]
Internet-Draft Happy Earballs April 2017
6.6. Debugging and Troubleshooting
Happy Earballs is aimed at ensuring a reliable user experience
regardless of connectivity problems affecting any single transport.
However, this naturally means that applications employing these
techniques are by default less useful for diagnosing issues with a
particular address family or interface. To assist in that regard, an
implementation MAY provide a mechanism to disable its Happy Earballs
behavior via a user setting, and SHOULD provide data useful for
debugging (e.g., a log or way to review current preferences).
7. Consequences of the New Requirements
In this section we explore some of the consequences of these
requirements and describe possible approaches for designing clients
that satisfy the modified requirements and provide shorter
transmission latency.
A client may send the message to the targets in an order that is not
permitted by the baseline specifications, but it may not omit any
targets from its ordering. Thus, the client is required to send it
to all the targets before it may declare failure of the send process.
(TBD: Would it be better to 408 the message faster?) For example, if
the client has cached information which indicates that a target is
unreachable, the client may move that target to the end of the order,
but if sending to all other targets is unsuccessful, the client must
send to that target before declaring failure.
A client may cache measured RTT() values for targets and use this
information to optimize target orderings; the cached value of RTT(T)
for a target T we name CRTT(T). Because a single target may appear
in the target set for multiple goals, the client should cache RTT()
for targets (rather than judgments of responsiveness), and then when
sending to a goal use CRTT() to determine whether a target T is
responsive relative to the other targets of that particular goal.
A client may determine a target T1 to be "slow" (relative to a given
goal) if its CRTT(T1) is greater than Limit(CRTT(T2)) for some other
target T2 in the goal's target set. A target that is not slow is
labeled "normal". Note that a target being slow is determined by the
client via a combination of the information in the cache and the
state of the network at the moments that the cached information was
recorded. As it were, the client thinks a slow target is non-
responsive, but the target may or may not actually be non-responsive
at that moment, depending on whether the CRTT() values agree with
current reality, which is the RTT() values.
Worley Expires October 6, 2017 [Page 22]
Internet-Draft Happy Earballs April 2017
If there is be an upper bound on the length of time that a client
retains CRTT() values, then the client may assume that any slow
target is non-responsive, in that it may place the target after the
normal targets in the order. For this reason, the client is required
to have an upper bound on the length of time that CRTT() values
remain in the cache, after which they are either deleted or replaced
by values based on more recent observations of RTT().
The client may act on its CRTT() values in this way because in doing
so, it will not violate the traffic distribution requirement: If a
target is responsive over a long period, the eventual delete/refresh
of cached values from before that period ensures that the client will
eventually see the target as normal. (2) If a target is not
continuously responsive over a long period, the traffic distribution
requirement places no restriction on whether the client sends traffic
to it or not, and repositioning it after all normal targets does not
affect the traffic distribution among the normal targets.
Implementations have flexibility regarding the lifetime of cache
entries. The only absolute restriction is that as long as the
configuration is stable, there is a maximum lifetime that CRTT()
values are kept before being deleted or refreshed. The advice in
[RFC6555] suggests that the upper bound on the lifetime of cache
entries should on the order of 10 minutes.
The length of time that different CRTT() values are cached may differ
from each other. If the RTT() observation is from a periodic event
(e.g., registration or NAT binding refresh), the cache lifetime
should be similar to but longer than the period of the event, as the
event period is likely to be configured based on knowledge of the
interval over which the network is likely to remain stable.
When the state of a source address is changed, or the state of the
interface it is assigned to changes, or when the network it is
connected to is re-initialized, cached RTT() values for targets with
that source address should be deleted. Interfaces can determine
network re-initialization by a variety of mechanisms (e.g., [RFC4436]
and [RFC6059]).
When a client processes a message, the ordering of targets that it
sends to must be an ordering permitted by the baseline
specifications, with the exception that slow targets may be moved
after all normal targets.
Since the client's goal is to deliver the message as quickly as
possible, a client should always move slow targets to the end of the
order, after the normal targets. Note that if a probe transmission
Worley Expires October 6, 2017 [Page 23]
Internet-Draft Happy Earballs April 2017
during message processing discovers a target to be slow, the target
can be moved at that time to after all normal targets.
A client obtains RTT(T) for a target T whenever it sends to the
target. But it can also obtain RTT(T) by a probe, which is any
transmission to T which requires a response but does not involve
sending the message (and hence does not commit the client to possibly
waiting for a timeout before sending to another target). A client
may send probes to several targets simultaneously.
Probe operations include:
o establishing a transport connection to the target (without sending
the message as data on the connection)
o sending a keep-alive, as specified by the transport protocol
o sending a CR-LF-CR-LF keep-alive on a SIP Outbound flow [RFC5626]
o sending a STUN keep-alive message on a SIP Outbound flow [RFC5626]
o sending an OPTIONS request with "Max-Forwards: 0".
(Note that a probe using an OPTIONS request can be used with any
protocol. If the OPTIONS reaches the target, the target is required
to respond with either a 200 or 483 response [RFC3261], without
forwarding it to another entity, unless the target demands
authentication, which would be indicated by an immediate 401 or 407
response. Conveniently, a server can respond to such a request
statelessly, so such requests are low-overhead. (Although the SIP
Outbound keep-alive methods have even lower overhead.))
Similarly, if a client has a connection to a target T, and the
connection has been idle for long enough, the client will not have a
CRTT(T) for T, reflecting the fact that the connection may have
failed without the client's knowledge. The client can refresh
CRTT(T) by performing a suitable probe operation within the
connection.
The client should use a flow or connection that is established to a
target instead of establishing a new flow or connection to that
target for sending either a probe or a message.
If the client initiates a probe of a target T, it may be able to
decide that T is slow without waiting to determine the actual value
of RTT(T), which may take as long as the timeout period: the true
value of RTT(T) is always at least as large as the elapsed time since
the probe was sent, and the determination of slowness depends on
Worley Expires October 6, 2017 [Page 24]
Internet-Draft Happy Earballs April 2017
whether RTT(T) exceeds Limit(CRTT(Tfastest)) (where Tfastest is the
target with smallest CRTT() value of any target in the target set).
If the client establishes a connection to a target without
simultaneously sending the message, the connection establishment is a
probe of the target, and after initiating the connection but before
sending the message, if that probe reveals that the target is slow,
the client may move that target to the end of the ordering, and
utilize another target.
In order to minimize the chance that the client must wait for a
timeout before sending to another target, the client may send probes
to targets, and the RTT() values revealed by those probes can change
what target the client will send to next. Because of this, the
client's procedures do not simply convert the tree of targets into an
ordering of the targets, which the client then follows -- Information
discovered during the sequence of sends can affect the order targets
are sent to.
There is little value sending a probe to a target unless all targets
of higher priority (1) have been sent the message (and failed), (2)
have been sent a probe, or (3) have a CRTT() value.
A target T0 is "quick" if its CRTT(T0) is less than Limit(0), that
is, if T0 will be normal regardless of the CRTT() values of any other
target. If an allowed next target has a cached RTT() value, and it
is "quick", then it is never slow for any goal.
If the client has reason to believe that it will soon be asked to
send a message to a goal with a target T, and if CRTT(T) does not
exist or is likely to expire before then, the client may decide to
preemptively refresh the cached value by probing T.
Similarly, a client may be in a situation where it has advance notice
that it is likely to need to send a message to a particular target,
for instance, if the user of a UA begins dialing an outgoing call
which will be routed through a particular outgoing proxy. In such a
situation, the client should consider preemptively probing the
target.
Note that the use of probes increases the non-message traffic to the
targets, and thus has a cost. A client minimizes the expected
transmission time by initially probing all of the targets, but that
strategy maximizes the additional traffic. A client should weigh the
tradeoff between improved user experience and increased traffic. In
particular, the client should be aware of which messages require
rapid service for good user experience (e.g., INVITE and BYE) and
which do not (e.g., REGISTER and re-SUBSCRIBE).
Worley Expires October 6, 2017 [Page 25]
Internet-Draft Happy Earballs April 2017
A client should avoid sending to a target which does not have a
CRTT() value (unless it is the last remaining target), because the
target might be non-responsive forcing the client to wait for a
timeout. Instead, the client should probe the target before sending
the message to it.
8. Generic Algorithm
Based on these observations, we can construct an event-driven
algorithm that satisfies the revised requirements and avoids sending
the message to slow targets. This algorithm has a number of places
where the implementor has latitude.
The primary state of the algorithm is a graph whose nodes are the
targets which have not yet been sent to and failed.
o Each target is annotated with its CRTT() value, or the absence
thereof.
o Each target has an indicator of whether a probe has been sent to
it, and if so, when the probe was sent.
o The graph is initialized with the graph of targets as constructed
from the target construction tree.
o The initial graph is assumed for convenience to have a unique
rightmost (final) target named Fin, which may be a dummy. Targets
determined to be slow will be moved to after target Fin.
The algorithm uses some data derived from the graph.
o S, which is Limit(CRTT(Tfastest)), where Tfastest is the target
(that is or was in the graph) with the smallest CRTT(). Thus, any
target T with CRTT(T) > S is slow.
o The set of initial targets, targets which (at this time) have no
targets prioritized before them, and thus could be chosen as the
next target to send to.
The algorithm can be most easily described in an event-driven manner,
where particular situations trigger particular actions:
1. When a target T is slow, that is, CRTT(T) > S and it is before
Fin, it is removed from its position in the graph and moved to a
new position, after Fin. (Note that additional arcs may need to
be inserted where T used to be; if A is before T and T is before
B, A remains before B.) (Note that a target can be discovered
Worley Expires October 6, 2017 [Page 26]
Internet-Draft Happy Earballs April 2017
to be slow because a probe has been outstanding longer than S
without responding, even if S is less than the timeout period.)
2. When there is no outstanding send, the client may send to any
initial target whose CRTT() is known.
3. When there is no outstanding send and only one target in the
graph, the client may send to it.
4. When there is an initial dummy target, it may be removed from
the graph.
5. When a target T has no CRTT() and all targets that precede T
either have a CRTT() or have a probe outstanding, the client may
send a probe to T.
6. When a probe or a send succeeds, fails, or times out, the client
records CRTT() for that target.
7. When the startup of a protocol (e.g., TCP SYN) probes the target
without sending the message, the client should perform the
startup as a probe, and send the message as a separate action.
8. When a send to a target succeeds, sending to the goal has
succeeded (and the algorithm terminates).
9. When a send to a target fails, the target is removed from the
graph.
10. When the graph is empty (has no nodes), sending to the goal has
failed (and the algorithm terminates).
8.1. Policies and Implementor Latitude
The generic algorithm allows the implementor a considerable degree of
latitude in making the allowed choices.
o If multiple initial targets have a CRTT(), the client can choose
which to send to, e.g., the one with the smallest CRTT().
o The client chooses which targets to probe. The client should
balance the value of information to be obtained by probing targets
with the cost of doing so. The naive aggressive strategy is to
probe all initial targets that do not have a CRTT().
o When a probe is sent and a response is not received for a time
period, the client may consider it "lost" and send another probe.
Worley Expires October 6, 2017 [Page 27]
Internet-Draft Happy Earballs April 2017
This time period will likely be longer than the typical network
RTT but significantly less than the send timeout period.
o The client may choose to probe targets to exercise the maximum
diversity of network paths, covering the range of interfaces and
address families in the target set. E.g., a non-initial target
that provides diversity may be chosen to be probed (which may
require first probing all of its preceding targets).
o If all initial targets with CRTT() have "large" values, the client
may choose not to send to any of them, and instead send probes to
other targets which might respond faster. Note that in this
situation, probing non-initial targets might be useful; if such a
probe responds quickly, it might cause the initial targets to
become slow and to be moved to the end, causing the probed target
to become initial (and thus available for sending).
o The client may add prioritizations between targets which are not
required to have them due to the baseline specifications (as long
as they are consistent with the prior prioritizations, i.e., do
not specify both that A is before B and that B is before A).
Doing this judiciously may allow the algorithm to use a simpler
data structure for recording prioritization Section 9
8.2. Examples
This section shows examples of the generic algorithm (with naive
choices for the various "policies") in situations involving various
combinations of targets with particular properties.
8.2.1. Two Unordered Targets, Both Cached
Suppose there are two unordered targets, both of which have cached
RTT() values less than S.
A (cached)
/ \
* Fin
\ /
B (cached)
First, the initial dummy node is removed.
A (cached)
\
Fin
/
B (cached)
Worley Expires October 6, 2017 [Page 28]
Internet-Draft Happy Earballs April 2017
The client chooses the target with the smallest CRTT(), which we
assume is A, and sends to it.
If the send succeeds, the algorithm terminates successfully.
In the unlikely case that sending to A fails, it is deleted from the
graph. (Sending to A will likely succeed, because there is a current
cached record of successful communication.)
Fin
/
B (cached)
The client chooses B to send to.
If the send succeeds, the algorithm terminates successfully.
If sending to B fails, it is deleted from the graph.
Fin
The initial dummy node Fin is deleted from the graph. The graph
becomes empty, and the algorithm terminates unsuccessfully.
8.2.2. Two Unordered Targets, Both Cached, One Slow
Suppose there are two unordered targets, both of which have cached
RTT() values, but the value for target A is greater than S.
A (cached)
/ \
* Fin
\ /
B (cached)
First, the initial dummy node is removed.
A (cached)
\
Fin
/
B (cached)
Target A is examined and found to be slow, so it is moved after Fin.
Fin----A (slow)
/
B (cached)
Worley Expires October 6, 2017 [Page 29]
Internet-Draft Happy Earballs April 2017
The client chooses B to send to.
If the send succeeds, the algorithm terminates successfully.
If sending to B fails, it is deleted from the graph.
Fin----A (slow)
The initial dummy node is deleted from the graph.
A (slow)
The client sends to A.
8.2.3. Two Unordered Targets, One Cached
Suppose there are two unordered targets, only one of which has a
CRTT():
A (cached)
/ \
* Fin
\ /
----B-----
The initial dummy node is deleted from the graph.
A (cached)
\
Fin
/
B-----
Only target A is initial and has a CRTT(), so the client sends to it.
Since B is initial and has no CRTT(), the client sends a probe to it.
If the send to A succeeds, the algorithm terminates successfully.
If the send to A fails, it is deleted from the graph.
Fin
/
B-----
Since B is the only remaining (non-dummy) target, the client sends to
it.
Worley Expires October 6, 2017 [Page 30]
Internet-Draft Happy Earballs April 2017
8.2.4. Two Unordered Targets, Neither Cached
Suppose there are two unordered targets, neither of which have CRTT()
values:
A
/ \
* Fin
\ /
B
The initial dummy node is deleted from the graph.
A
\
Fin
/
B
The client cannot send to either target. The client sends probes to
both targets. When one of the probes returns, let us say it is the
probe to A, the graph indicates that one target is cached:
A (cached)
\
Fin
/
B-----
Target A now is initial and has a CRTT(), so the client sends to it.
In the unlikely case that sending to A fails, A is deleted from the
graph, and the client immediately sends to B because B is the only
remaining target.
This situation parallels the standard "Happy Eyeballs" situation in
HTTP, where the client has two (or more) unordered addresses for the
server, one IPv4 and one IPv6. The client requests connections with
both addresses simultaneously, and the first connection that succeeds
is used to send the HTTP request [RFC6555].
8.2.5. Three Unordered Targets, One Cached
Suppose there are three unordered targets, only one of which has a
CRTT():
Worley Expires October 6, 2017 [Page 31]
Internet-Draft Happy Earballs April 2017
A (cached)
/ \
*-----B------Fin
\ /
----C-----
The initial dummy node is deleted from the graph.
A (cached)
\
B------Fin
/
C-----
Only target A is initial and has a CRTT(), so the client sends to it.
Since B is initial and has no CRTT(), the client sends a probe to it.
Similarly for C.
If the send to A succeeds, the algorithm terminates successfully.
If the send to A fails, it is deleted from the graph.
B------Fin
/
C-----
The client waits until one of the probes (to B and C) either succeeds
or fails. Suppose the probe to B succeeds.
B (cached)------Fin
/
C--------------
Target B is now initial and has a CRTT(), so the client sends to it.
8.2.6. Two Prioritized Targets, Both Cached
Suppose there are two prioritized targets, both of which have CRTT():
*----A (cached)----B (cached)----Fin
If neither A nor B is slow, processing proceeds according to the
baseline specification: the client must send to A first, and then if
that fails, send to B. In detail:
The initial dummy node is deleted from the graph.
Worley Expires October 6, 2017 [Page 32]
Internet-Draft Happy Earballs April 2017
A (cached)----B (cached)----Fin
The client sends to A.
If sending to A succeeds, the algorithm terminates successfully. If
sending to A fails, A is removed from the graph.
B (cached)----Fin
Target B is now initial, and the client sends to B.
If sending to B succeeds, the algorithm terminates successfully. If
sending to B fails, B is removed from the graph, then Fin is removed
from the graph, and the algorithm terminates unsuccessfully.
However, as processing begins, it is possible that A is slow because
CRTT(A) > S = Limit(CRTT(B)). In that case A is moved after Fin.
B (cached)----Fin----A (slow)
Target B is now initial, and the client sends to B.
If sending to B succeeds, the algorithm terminates successfully. If
sending to B fails, B is removed from the graph, then Fin is removed
from the graph. Then A is initial, and the client sends to A.
8.2.7. Two Prioritized Targets, the Second Cached
Suppose there are two prioritized targets, but only the second has a
CRTT():
*----A----B (cached)----Fin
The initial dummy node is deleted from the graph.
A----B (cached)----Fin
The client sends a probe to A. If the probe response (the new
CRTT(A)) is less than S = Limit(CRTT(B), the the graph becomes:
A (cached)----B (cached)----Fin
The client then sends to A.
But if the probe remains outstanding longer than S, A becomes slow
and is moved to after Fin.
B (cached)----Fin----A (slow)
Worley Expires October 6, 2017 [Page 33]
Internet-Draft Happy Earballs April 2017
At that point, the client sends to B.
8.2.8. Two Prioritized Targets, the First Cached
Suppose there are two prioritized targets, but only the first has a
CRTT():
*----A (cached)----B ----Fin
The initial dummy node is deleted from the graph.
A (cached)----B----Fin
The client can send to A immediately. If CRTT(A) <= Limit(0), the
client must send to A immediately since A is quick, and cannot be
revealed to be slow by probing other targets.
But if RTT(A) is large enough that there is a reasonable chance that
RTT(B) is smaller than Limit(RTT(A)) (which would make A slow), the
client may choose to send a probe to B first. If the probe response
returns quickly enough, CRTT(A) > Limit(CRTT(B)), which makes A slow.
Then, A would be moved to after Fin, and the client would send to B.
At any point before the probe response arrives, the client can decide
to send to A. But if the probe does not return within
Limit(CRTT(B)), the client should immediately send to A, as a will
not be shown to be slow.
8.2.9. Two Prioritized Targets, Neither Cached
Suppose there are two prioritized targets, neither of which have
CRTT():
*----A----B----Fin
The initial dummy node is deleted from the graph.
A----B----Fin
The client must now send a probe to A before it can send the message
to A. But if A and B use different network paths (as they likely
will), it is probably advantageous to send probes to both A and B, in
case B is sufficiently more responsive than A to make A slow and
override the prioritization. If the client chooses to send both
probes:
If the response to the probe of A returns first, the client
immediately sends to A.
Worley Expires October 6, 2017 [Page 34]
Internet-Draft Happy Earballs April 2017
If the response to the probe of B returns, and the response to the
probe of A returns before Limit(CRTT(B)), the client immediately
sends to A.
If the response to the probe of B returns, and the response to the
probe of A does not returns before Limit(CRTT(B)), the client moves A
to after Fin, and then immediately sends to B.
8.2.10. Three Targets
A more complex case is when the client must choose between three
source addresses for a destination address. One, the address
selected by the usual rules rules, is prioritized and the other two
are unordered. Consider the example where the destination address is
192.0.2.1 and the client has interfaces with addresses 192.0.2.65,
192.0.2.129, and 192.0.2.197, with the first of those being the
interface on the same subnet as the default outbound gateway. Then
the graph is:
B
192.0.2.129/
192.0.2.1
A / \
*----192.0.2.65/----* Fin
192.0.2.1 \ C /
192.0.2.197/
192.0.2.1
Assuming that there are no CRTT() values, the client starts by
probing A. Unless it is being very conservative with probes, the
client also sends probes via B and C.
As the probe responses arrive, CRTT() values are recorded, which can
cause targets to be declared slow and relocated after Fin.
Eventually, some target will be both initial and have a CRTT(), and
the client sends to it.
If the probe response from A arrives before Limit(RTT(B)) and
Limit(RTT(C)), the graph is updated to:
B
/ \
A (cached)----* Fin
\ /
C
At that point, the client sends to A.
Worley Expires October 6, 2017 [Page 35]
Internet-Draft Happy Earballs April 2017
On the other hand, if a probe response arrives from B but one does
not arrive from A before Limit(CRTT(B)), then A becomes slow, and it
is moved to after Fin. Then the dummy target after A and before B
and C is deleted.
B (cached)
\
Fin----A (slow)
/
C---------
At this point, the client sends to B.
9. Using a Simplified Data Structure
Instead of maintaining a directed acyclic graph to control the
client's operation, the client can replace the graph with a sequence
of sets of targets based on their "rank". The rank of a target is
defined as:
The rank of an initial target is 0.
The rank of a non-initial target is 1 more than the highest rank
of any target that it is prioritized after.
In this scheme, a target is prioritized after all targets with lower
ranks. As processing progresses, all targets in the lowest non-empty
rank are initial, and all targets in higher ranks are non-initial.
This is compatible with the generic algorithm because it only adds to
the prioritization relationships, it does not delete any of them.
Thus, the behavior it causes is compatible with the Happy Earballs
requirements.
For example, consider a server with two addresses, IPv6 and IPv4,
with IPv6 prioritized via SRV records. Both addresses accept both
TCP and UDP traffic:
_sip._udp.example.com. SRV 1 1 5060 sip1.example.com
_sip._udp.example.com. SRV 2 1 5060 sip2.example.com
_sip._tcp.example.com. SRV 1 1 5060 sip1.example.com
_sip._tcp.example.com. SRV 2 1 5060 sip2.example.com
sip1.example.com. AAAA 2001:DB8::1
sip2.example.com. A 192.0.2.1
Suppose a request can be transported using either TCP or UDP, and the
situation does not specify a priority between them. The tree of
targets is then:
Worley Expires October 6, 2017 [Page 36]
Internet-Draft Happy Earballs April 2017
|
-------------unordered-----------
| |
-----priority----- -----priority-----
| | | |
TCP 2001:DB8::1 TCP 192.0.2.1 UDP 2001:DB8::1 UDP 192.0.2.1
The graph, annotating each target with its rank, is:
(0) TCP 2001:DB8::1----(1) TCP 192.0.2.1
/ \
* *
\ /
(0) UDP 2001:DB8::1----(1) UDP 192.0.2.1
Which can be turned into a list of lists as:
rank 0: TCP 2001:DB8::1, UDP 2001:DB8::1
rank 1: TCP 192.0.2.1, UDP 192.0.2.1
The rank representation is functionally equivalent to the following
graph, which is the original graph with additional lines, showing
that the rank representation constrains the client's behavior more
than the original graph does:
(0) TCP 2001:DB8::1 (1) TCP 192.0.2.1
/ \ / \
* * *
\ / \ /
(0) UDP 2001:DB8::1 (1) UDP 192.0.2.1
The rank lists can be built (without first constructing the graph) by
walking the target tree from top to bottom and left to right (highest
priority to lowest priority), with each node passing downward MRdown,
the minimum rank that any descendant target is allowed, and each node
passing upward MRup, the minimum rank allowed for any target
prioritized after that node.
o The root node's MRdown is 0.
o For an unordered node:
* Each child's MRdown is the node's MRdown.
* The node's MRup is the maximum of the node's MRdown and all of
its children's MRup's.
Worley Expires October 6, 2017 [Page 37]
Internet-Draft Happy Earballs April 2017
o For a priority node (with the children ordered by priority):
* The first child's MRdown is the node's MRdown.
* A later child's MRdown is the preceding child's MRup.
* The node's MRup is the final child's MRup, or if there are no
children, the node's MRdown.
o For a load-balancing node, the children are first prioritized
randomly ([RFC2782] and Appendix A), then processed as for a
prioritized node.
o For a target node:
* The target has rank MRdown.
* The node's MRup is MRdown + 1.
Here is the preceding example's tree, with each node annotated with
its MRdown on the left of the node label and its MRup on the right of
the node label:
|
---------(0) unordered (2)-------
| |
-(0) priority (2)- -(0) priority (2)-
| | | |
(0) TCP 2001:DB8::1 (1) | (0) UDP 2001:DB8::1 (1) |
| |
(1) TCP 192.0.2.1 (2) (1) UDP 192.0.2.1 (2)
Note that the target tree does not have to be explicitly constructed;
it can be implicitly walked by a series of recursive function calls,
with the functions passing MRdown and MRup values between themselves,
and each target being inserted into the rank list-of-lists as it is
generated.
The address family preference rule Section 6.1 can be implemented
within the rank representation by first constructing the ranks based
on the baseline specifications, and then splitting rank 0 into two
ordered sub-ranks, 0.0 and 0.1, with 0.0 containing all rank 0
targets of the preferred address family and rank 0.1 containing all
other rank 0 targets.
An example of address family preference processing is the ordinary
case of two prioritized servers each with an IPv6 and IPv4 address:
Worley Expires October 6, 2017 [Page 38]
Internet-Draft Happy Earballs April 2017
_sip._udp.example.com. SRV 1 1 5060 sip1.example.com
_sip._udp.example.com. SRV 2 1 5060 sip2.example.com
sip1.example.com. AAAA 2001:DB8::1
sip1.example.com. A 192.0.2.1
sip2.example.com. AAAA 2001:DB8::2
sip2.example.com. A 192.0.2.2
|
----------(0) priority (2)--------
| |
-(0) unordered (1)- -(1) unordered (2)-
| | | |
(0) 2001:DB8::1 (1) | (1) 2001:DB8::2 (2) |
| |
(0) 192.0.2.1 (1) (1) 192.0.2.2 (2)
After splitting rank 0 based on the address family preference, the
the ranks are:
rank 0.0: 2001:DB8::1
rank 0.1: 192.0.2.1
rank 1: 2001:DB8::2, 192.0.2.2
10. Security Considerations
This document changes the order in which a client will send to
targets but does not change the set of targets that it will send to.
There are no known SIP systems whose security depends on the order in
which a client sends to targets. Given that network connectivity is
unreliable, it is unlikely that the security of any SIP system
depends on the reliable ordering of targets.
The specific security vulnerabilities, attacks and threat models of
the various protocols mentioned in this document (SIP, DNS, SRV
records, etc.) are well-documented in their respective
specifications, and their effect on the security of SIP systems is
unchanged.
11. IANA Considerations
This document does not require any actions by IANA.
Worley Expires October 6, 2017 [Page 39]
Internet-Draft Happy Earballs April 2017
12. History
Note to RFC Editor: Upon publication, please remove this section.
12.1. Changes from draft-worley-sip-happy-earballs-01 to draft-worley-
sipcore-happy-earballs-00
Change name to draft-worley-sipcore-happy-earballs.
Minor fixes to the references.
12.2. Changes from draft-worley-sip-happy-earballs-00 to draft-worley-
sip-happy-earballs-01
Overhaul of the exposition.
12.3. Changes from draft-worley-sip-he-connection-01 to draft-worley-
sip-happy-earballs-00
Complete overhaul.
Changed "EarBalls" to "Earballs".
12.4. Changes from draft-worley-sip-he-connection-00 to draft-worley-
sip-he-connection-01
Minor changes.
Add note that WebSocket is out of scope, because there is only one
possible transport in WebSocket.
12.5. Changes from draft-johansson-sip-he-connection-01 to draft-
worley-sip-he-connection-00
This version has a different name for technical reasons. It is, in
reality, the successor to draft-johansson-sip-he-connection-01.
Move Acknowledgments after References, as that is the style the
Editor prefers.
Updated Security Considerations: This increment of the H.E. work does
not make normative changes in existing SIP.
Copy a lot of text from RFC 6555, as this I-D is parallel to RFC
6555.
Changed "hostname" to "host name", as the latter form is more common
in RFCs by a moderate margin.
Worley Expires October 6, 2017 [Page 40]
Internet-Draft Happy Earballs April 2017
Revised some of the introduction text to parallel the introduction of
RFC 7984.
Changed name of algorithm to "Happy EarBalls", added reference to
Urban Dictionary.
Many expansions of the discussion and revisions of the wording.
13. References
13.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<http://www.rfc-editor.org/info/rfc2119>.
[RFC2782] Gulbrandsen, A., Vixie, P., and L. Esibov, "A DNS RR for
specifying the location of services (DNS SRV)", RFC 2782,
DOI 10.17487/RFC2782, February 2000,
<http://www.rfc-editor.org/info/rfc2782>.
[RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
A., Peterson, J., Sparks, R., Handley, M., and E.
Schooler, "SIP: Session Initiation Protocol", RFC 3261,
DOI 10.17487/RFC3261, June 2002,
<http://www.rfc-editor.org/info/rfc3261>.
[RFC3263] Rosenberg, J. and H. Schulzrinne, "Session Initiation
Protocol (SIP): Locating SIP Servers", RFC 3263,
DOI 10.17487/RFC3263, June 2002,
<http://www.rfc-editor.org/info/rfc3263>.
[RFC3581] Rosenberg, J. and H. Schulzrinne, "An Extension to the
Session Initiation Protocol (SIP) for Symmetric Response
Routing", RFC 3581, DOI 10.17487/RFC3581, August 2003,
<http://www.rfc-editor.org/info/rfc3581>.
[RFC6555] Wing, D. and A. Yourtchenko, "Happy Eyeballs: Success with
Dual-Stack Hosts", RFC 6555, DOI 10.17487/RFC6555, April
2012, <http://www.rfc-editor.org/info/rfc6555>.
[RFC6724] Thaler, D., Ed., Draves, R., Matsumoto, A., and T. Chown,
"Default Address Selection for Internet Protocol Version 6
(IPv6)", RFC 6724, DOI 10.17487/RFC6724, September 2012,
<http://www.rfc-editor.org/info/rfc6724>.
Worley Expires October 6, 2017 [Page 41]
Internet-Draft Happy Earballs April 2017
[RFC7984] Johansson, O., Salgueiro, G., Gurbani, V., and D. Worley,
Ed., "Locating Session Initiation Protocol (SIP) Servers
in a Dual-Stack IP Network", RFC 7984,
DOI 10.17487/RFC7984, September 2016,
<http://www.rfc-editor.org/info/rfc7984>.
13.2. Informative References
[I-D.johansson-sip-he-connection]
Johansson, O., Salgueiro, G., and D. Worley, "Setting up a
SIP (Session Initiation Protocol) connection in a dual
stack network using connection oriented transports",
draft-johansson-sip-he-connection-01 (work in progress),
October 2016.
[RFC3319] Schulzrinne, H. and B. Volz, "Dynamic Host Configuration
Protocol (DHCPv6) Options for Session Initiation Protocol
(SIP) Servers", RFC 3319, DOI 10.17487/RFC3319, July 2003,
<http://www.rfc-editor.org/info/rfc3319>.
[RFC3361] Schulzrinne, H., "Dynamic Host Configuration Protocol
(DHCP-for-IPv4) Option for Session Initiation Protocol
(SIP) Servers", RFC 3361, DOI 10.17487/RFC3361, August
2002, <http://www.rfc-editor.org/info/rfc3361>.
[RFC4213] Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms
for IPv6 Hosts and Routers", RFC 4213,
DOI 10.17487/RFC4213, October 2005,
<http://www.rfc-editor.org/info/rfc4213>.
[RFC4436] Aboba, B., Carlson, J., and S. Cheshire, "Detecting
Network Attachment in IPv4 (DNAv4)", RFC 4436,
DOI 10.17487/RFC4436, March 2006,
<http://www.rfc-editor.org/info/rfc4436>.
[RFC5245] Rosenberg, J., "Interactive Connectivity Establishment
(ICE): A Protocol for Network Address Translator (NAT)
Traversal for Offer/Answer Protocols", RFC 5245,
DOI 10.17487/RFC5245, April 2010,
<http://www.rfc-editor.org/info/rfc5245>.
[RFC5626] Jennings, C., Ed., Mahy, R., Ed., and F. Audet, Ed.,
"Managing Client-Initiated Connections in the Session
Initiation Protocol (SIP)", RFC 5626,
DOI 10.17487/RFC5626, October 2009,
<http://www.rfc-editor.org/info/rfc5626>.
Worley Expires October 6, 2017 [Page 42]
Internet-Draft Happy Earballs April 2017
[RFC6059] Krishnan, S. and G. Daley, "Simple Procedures for
Detecting Network Attachment in IPv6", RFC 6059,
DOI 10.17487/RFC6059, November 2010,
<http://www.rfc-editor.org/info/rfc6059>.
[RFC6947] Boucadair, M., Kaplan, H., Gilman, R., and S.
Veikkolainen, "The Session Description Protocol (SDP)
Alternate Connectivity (ALTC) Attribute", RFC 6947,
DOI 10.17487/RFC6947, May 2013,
<http://www.rfc-editor.org/info/rfc6947>.
[UD] "The Jews Who Stole Christmas", , "Urban Dictionary, entry
'Earballs'", December 2011,
<http://www.urbandictionary.com/define.php?term=Earballs>.
Appendix A. Implementing Load Balancing
Load-balancing is specified by the "weight" field of DNS SRV records.
The defining algorithm is specified in [RFC2782]. The same result
can be obtained with a simpler algorithm: For each server, calculate
a "score": If its weight is 0, its score is "infinity" (in practice,
100 suffices). If its weight is non-zero, its score is calculated by
choosing a random number between 0 and 1, taking the negative of the
logarithm of that number, and dividing the result by the weight.
(Thus, the score is always a positive number.) (The resulting score
has an exponential distribution whose parameter is the weight.)
Then, sort the servers into order of increasing scores, so that the
servers with the smallest scores are used first.
This alternative algorithm is analyzed and sample implementations are
provided in the files in the directory
sipXrouter/sipXtackLib/doc/developer/scores in the GitHub Sipfoundry
project (https://github.com/sipfoundry/sipXrouter), among other
repositories.
Acknowledgments
The authors would like to acknowledge the support and contribution of
the SIP Forum IPv6 Working Group. This document is based on a lot of
tests and discussions at SIPit events, organized by the SIP Forum.
The foundation of this document is the work done by Olle Johansson
and Gonzalo Salgueiro in earlier documents, including
[I-D.johansson-sip-he-connection]. In turn, the foundation of that
work is [RFC6555], whose authors are Dan Wing and Andrew Yourtchenko.
Scott O. Bradner suggested that the formula for determining
responsiveness should contain a constant term.
Worley Expires October 6, 2017 [Page 43]
Internet-Draft Happy Earballs April 2017
Roman Shpount described the need for configuration to override the
source address selection mechanism.
Tolga Asveren suggested requiring "rport".
Author's Address
Dale R. Worley
Ariadne Internet Services
738 Main St.
Waltham, MA 02451
US
Email: worley@ariadne.com
Worley Expires October 6, 2017 [Page 44]