Internet DRAFT - draft-weinstein-bbn-pim-dm-large-lan
draft-weinstein-bbn-pim-dm-large-lan
Independent Submission J. Weinstein
Internet-Draft J. Keller
Intended status: Experimental BBN Technologies
Expires: December 6, 2013 June 4, 2013
PIM-DM Optimizations for Large LANs
draft-weinstein-bbn-pim-dm-large-lan-00
Abstract
Optimizations to the PIM-DM protocol [RFC3973] are proposed that
dramatically reduce the PIM-DM routing overhead on local area
networks containing large numbers of PIM routers. These
optimizations reduce the likelihood of two or more routers emitting
redundant JOIN or PRUNE messages onto the same LAN. While these
optimizations are of general applicability to large LANs, they were
motivated in particular by performance requirements for supporting
PIM-DM across LANs emulated by IP tunneling over large wireless
MANETs.
PIM-DM routers supporting these optimizations will interoperate with
standard PIM-DM as defined in RFC 3973, even on the same LAN.
However, to obtain maximum benefit from these optimizations, all
PIM-DM routers on the LAN SHOULD support them.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. This document may not be modified,
and derivative works of it may not be created, and it may not be
published except as an Internet-Draft.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on December 6, 2013.
Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as the
Weinstein & Keller Expires December 6, 2013 [Page 1]
Internet-Draft PIM-DM Optimizations for Large LANs June 2013
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Conventions used in this document . . . . . . . . . . . . . . 5
3. Optimized Redundant JOIN suppression . . . . . . . . . . . . . 5
3.1. Motivation . . . . . . . . . . . . . . . . . . . . . . . . 5
3.2. Modifications to PIM-DM Timers [RFC 3973 Sec. 4.8] . . . . 7
3.3. Other modifications . . . . . . . . . . . . . . . . . . . 8
4. Redundant PRUNE suppression . . . . . . . . . . . . . . . . . 8
4.1. Motivation . . . . . . . . . . . . . . . . . . . . . . . . 8
4.2. Modifications to the state-machine-specific timers
[RFC3973 Sec. 4.4.1] . . . . . . . . . . . . . . . . . . . 10
4.3. Modifications to the tabular description of the state
machine [RFC3973 Sec. 4.4.1] . . . . . . . . . . . . . . . 11
4.4. Modifications to Transitions from the Forwarding (F)
State [RFC3973 Sec. 4.4.1] . . . . . . . . . . . . . . . . 12
4.5. Modifications to Transitions from the Pruned (P) State
[RFC3973 Sec. 4.4.1] . . . . . . . . . . . . . . . . . . . 13
4.6. Modifications to PIM-DM Timers [RFC 3973 Sec. 4.8] . . . . 14
4.7. Other modifications . . . . . . . . . . . . . . . . . . . 15
5. Compatibility with RFC 3973 . . . . . . . . . . . . . . . . . 15
6. Security Considerations . . . . . . . . . . . . . . . . . . . 16
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 16
8. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 16
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 17
9.1. Normative References . . . . . . . . . . . . . . . . . . . 17
9.2. Informative References . . . . . . . . . . . . . . . . . . 17
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 18
Weinstein & Keller Expires December 6, 2013 [Page 2]
Internet-Draft PIM-DM Optimizations for Large LANs June 2013
1. Introduction
PIM-DM [RFC3973] is an IP multicast routing protocol optimized for
situations where group members (receivers) are fairly densely
distributed throughout the multicast distribution domain and there
are not too many sources per group. It is based upon the reversed-
shortest-path broadcast paradigm, in which multicast traffic from
each source is distributed along a tree rooted at that source and
consisting of the shortest paths from each potential destination in
the multicast distribution domain back to that source. It optimizes
this reversed-shortest-path broadcast tree by pruning unneeded
branches that do not lead to any actual group members. Pruning of
unneeded branches is accomplished through use of JOIN/PRUNE messages,
sent by downstream routers towards the source in a direction opposite
to that of data flow.
When large numbers of PIM-DM routers are attached to a single LAN,
PIM-DM as described in RFC 3973 becomes vulnerable to storms of
redundant JOIN/PRUNE messages on the LAN. These storms can arise
when more than one of the routers on the LAN lack downstream members
for a particular (S,G) pair, and each sends a separate PRUNE message
to the upstream router towards the source S. Each of these PRUNE
messages may, in turn, trigger an overriding JOIN from one or more
routers that do have downstream members for the (S,G) pair. Since IP
multicast forwarding decisions are made on a per-interface basis, not
a per-downstream-router basis, all except one of these PRUNE messages
is unnecessary. Likewise, all except one of the overriding JOIN
messages, if any, are redundant.
These excess JOIN/PRUNE messages are particularly problematic because
they tend to be emitted in a synchronous burst by all downstream
routers on the LAN when the first user data packet or STATE REFRESH
message arrives, potentially causing a temporary overload on the LAN.
Furthermore, these JOIN/PRUNE storms will repeat periodically if
STATE REFRESH messages are lost or otherwise unavailable. Since a
separate set of JOIN/PRUNE messages is used for each (source, group)
pair, the load from these redundant JOIN/PRUNE messages scales
approximately as the product [number_of_groups] * [number_of_sources
] * [number_of_routers_per_LAN] * [rate_of_prune_cycles]. As such,
this load becomes the key constraint limiting the number of sources
and groups that can be supported within a single PIM-DM multicast
domain.
The PIM-DM specification [RFC3973] includes a mechanism for avoiding
redundant overriding JOINs, but it is sub-optimal when the number of
PIM routers on a single LAN is large or propagation delays on the LAN
are non-negligible. This mechanism is built into the upstream state
machine [RFC3973 Sec. 4.4.1] through the Override Timer. When a
Weinstein & Keller Expires December 6, 2013 [Page 3]
Internet-Draft PIM-DM Optimizations for Large LANs June 2013
router in the Forwarding state or AckPending state overhears an
upstream PRUNE that must be overridden, it sets this timer to a
random value between 0 and Override_Interval. It then sends an
overriding JOIN to the upstream router if and only if this timer
expires before it overhears another overriding JOIN from some other
router on the same interface to the same upstream router for the same
(S,G) pair. We shall refer to this mechanism as redundant JOIN
suppression.
As will be explained in Section 3 of this document, the effectiveness
of redundant JOIN suppression is extremely sensitive to the algorithm
used to determine the setting for the Override Timer. RFC 3973
specifies that this timer is to be set to a random value between 0
and Override_Interval. It does not specify the probability
distribution to use, but most implementations probably use a linear
distribution. If the number of routers on the LAN is sufficiently
large compared to the inverse propagation delay, large numbers of
redundant messages may still be sent. The likelihood of this
occurring can be greatly reduced by employing a deterministic
algorithm for setting the Override_Timer that approximates an
exponential probability distribution. Section 3 of this document
specifies this algorithm.
PIM-DM as specified in RFC 3973 does not, however, include any
mechanism for avoiding redundant PRUNEs. Section 4 of this document
describes a proposed mechanism, which is very similar to the strategy
used for avoiding redundant JOINs. Instead of sending a PRUNE
message upstream as soon as a router receives an incoming data packet
from a LAN, the router may instead defer sending this PRUNE for a
short period of time. The router then sends this PRUNE if, and only
if, it does not overhear a PRUNE message from another router on the
same LAN to the same upstream router for the same (S,G) pair before
this timer expires. By using a deterministic algorithm for computing
this delay, the delay before the first such router emits a PRUNE can
be made effectively zero for the only case where the delay length
impacts overall protocol performance -- those situation where there
are no group members downstream of any router on the LAN. We shall
refer to this proposed mechanism as redundant PRUNE suppression.
Although the mechanisms described in this document should be
beneficial in any situation where there are a large number of PIM-DM
routers on a single LAN, these optimizations were motivated by the
need to support PIM-DM across LANs emulated by multicast-capable IP
tunneling over wireless mobile ad-hoc networks (MANETs)
[Weinstein_In_Progress_a]. LANs emulated in this way are motivated
by the desire to hide most mobility-related wireless network topology
changes from IP by handling such changes through a custom MANET
routing protocol at a lower layer. As the effectiveness of mobility
Weinstein & Keller Expires December 6, 2013 [Page 4]
Internet-Draft PIM-DM Optimizations for Large LANs June 2013
hiding is dependent upon including all mobile routers in the theater
of operation within a single emulated LAN, the number of PIM-DM
routers on such a LAN can become quite large. At the same time, the
capacity of such networks can be very limited, so that minimizing the
overhead from PIM-DM can be crucially important. Finally, the
propagation delays for traffic across such a network may be
significantly larger than those typical of high-capacity wired LANs,
reducing the efficacy of the redundant JOIN suppression mechanisms
specified by RFC 3973. A separate document will deal with
adaptations to PIM-DM necessitated by the multi-layer routing
structure itself [Weinstein_In_Progress_b].
2. Conventions used in this document
This document describes optimizations to PIM-DM with reference to its
current specification in RFC 3973. The terminology of that reference
is adopted in its entirety. Concepts and terms not defined herein
should be understood as having the meaning specified in RFC 3973.
Algorithms and protocol elements not discussed herein should be
understood to be identical to those specified in [RFC3973].
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC2119.
In this document, these words will appear with that interpretation
only when in ALL CAPS. Lower case uses of these words are not to be
interpreted as carrying RFC2119 significance.
3. Optimized Redundant JOIN suppression
3.1. Motivation
If a PIM-DM router with a non-null outgoing interface list (e.g.,
possible downstream members) for some (S,G) pair overhears a PRUNE on
its incoming interface from some other router for the same (S,G)
pair, then it must cancel that PRUNE by sending an overriding (S,G)
JOIN to the upstream router U towards S. If there is more than one
PIM-DM router with a non-null outgoing interface list for (S,G) with
its incoming interface on the same network, more than one such router
may send an overriding (S,G) JOIN to the upstream router U. All
except the first of these is unnecessary, as the receipt of even one
overriding JOIN is sufficient to prevent the upstream router from
pruning its outgoing interface onto the LAN.
The PIM-DM specification includes a mechanism for avoiding redundant
Weinstein & Keller Expires December 6, 2013 [Page 5]
Internet-Draft PIM-DM Optimizations for Large LANs June 2013
overriding JOINs, but it is sub-optimal when the number of PIM
routers on a single LAN is large and propagation delays on the LAN
are non-trivial. This mechanism is built into the upstream state
machine [RFC3973 Sec. 4.4.1] through the Override Timer. When a
router in the Forwarding state or AckPending state overhears an
upstream PRUNE that must be overridden, it sets this timer to a
random value between 0 and Override_Interval. It then sends an
overriding JOIN to the upstream router if and only if this timer
expires before it overhears another overriding JOIN from some other
router on the same interface to the same upstream router for the same
(S,G) pair. RFC 3973 does not specify the probability distribution
to use when setting the Override Timer for redundant JOIN
suppression, although most implementations probably use a linear
distribution.
If the number of routers on the LAN is sufficiently large compared to
the inverse propagation delay, redundant messages may still be sent.
The likelihood of this occurring can be greatly reduced by employing
a deterministic algorithm for setting the Override_Timer that
approximates an exponential distribution.
It is not difficult to estimate the expected number of redundant JOIN
messages for any given algorithm from the propagation delay, the
number of routers N on the LAN, and the number of routers on the LAN
with downstream group members. The worst case occurs when only one
router lacks downstream members for some source and group, and hence
sends a PRUNE message upstream. In this case, N-2 other routers will
overhear this PRUNE and schedule possible origination of an
overriding JOIN message. As PIM-DM is optimized for use in
situations where group members are densely distributed throughout the
multicast distribution domain, this worst-case scenario and/or near-
worst-case scenarios will be common.
Using a linear probability distribution for the Override Timer as
implied by RFC 3973, the expected delay before the first overriding
JOIN is sent by some router will be given by Override_Interval/(N-2).
However, it will typically take Propagation_Delay seconds for other
routers to overhear this overriding JOIN. In the meantime, (N-3) *
Propagation_Delay / Override_Interval other routers can be expected
to emit overriding JOINs of their own. If the Propagation Delay is
non-trivial, the number of overriding JOINs sent will increase
linearly with the number N of routers on the LAN, and can be quite
large if the number of routers N on the LAN is also large.
The number of redundant JOINs can be dramatically reduced by
employing a deterministic algorithm for picking the delay, which
mimics an exponential probability distribution. We begin by
considering an arbitrary index function J that assigns to each router
Weinstein & Keller Expires December 6, 2013 [Page 6]
Internet-Draft PIM-DM Optimizations for Large LANs June 2013
on the LAN (except the upstream router towards the source S) a unique
integer in the closed interval [0,N-2], where N is the total number
of PIM routers on the LAN (including the current router). Such a
function can be constructed, for example, by having each router count
the number of other routers (other than the upstream router) with
higher-addressed interfaces on the LAN, and using that count as the
value of J. Then, when the Override Timer is to be set, consider
setting it to the value
Override_timer = Override_interval*ln(1+J)/ln(N-1)
With this choice, the number of other routers that will also send a
JOIN during the interval of length propagation_delay before they
overhear this JOIN will be given by exp((ln(N)/
Override_Interval)*propagation_delay)) - 1 = N ** (propagation_delay/
override_interval). For propagation_delay < override_interval, as it
must be for redundant JOIN suppression to work, this grows less
rapidly than N and hence results in a much smaller number of
redundant JOIN messages when N is large. This formula also
guarantees that the Override Timer will never be set to a value
greater than Override_interval, thus ensuring that an overriding JOIN
is received by the upstream router before the upstream router prunes
its outgoing interface.
3.2. Modifications to PIM-DM Timers [RFC 3973 Sec. 4.8]
In order to implement the optimizations to redundant JOIN suppression
describe above, the subsection of RFC 3973 Sec. 4.8 entitled "Timer
Name: Upstream Override Timer (OT(S,G))" is modified to read as
follows:
Timer Name: Upstream Override Timer (OT(S,G))
+------------+----------------+----------------------------------------+
| Value Name | Value | Explanation |
+------------+----------------+----------------------------------------|
| t_override | OI(I)* | Randomized delay to prevent response |
| | ln(1+J)/ | implosion when sending a join message |
| | ln(N-1) | to override someone else's prune |
+------------+----------------+----------------------------------------+
t_override is a value between 0 and the incoming interface's
Override_Interval (OI(I)) computed from the formula t_override =
OI(I)*ln(1+J)/ln(N-1), where N is the total number of PIM routers
on incoming interface I (including this router), and J is a count
of the number of PIM routers on incoming interface I (other than
the upstream router towards source S) with addresses greater than
the address of this router. If all routers on a LAN are using the
LAN Prune Delay option, the Override_Interval (OI(I)) MUST be set
Weinstein & Keller Expires December 6, 2013 [Page 7]
Internet-Draft PIM-DM Optimizations for Large LANs June 2013
to the largest value on the LAN. Otherwise, the Override_Interval
(OI(I)) MUST be set to 2.5 seconds.
Note that, if N <= 2, no router will ever need to send a join
message to override someone else's prune, and so neither an
attempt to take the logarithm of a negative number nor an attempt
to divide by zero can ever occur in this computation.
3.3. Other modifications
No other sections of RFC 3973 need to be modified to support
optimized redundant JOIN suppression.
4. Redundant PRUNE suppression
4.1. Motivation
When a PIM-DM router with a null outgoing interface list for a source
S and group G receives a data packet for that source and group from
the upstream interface towards that source, and its Prune Limit timer
is not running, then it sends a PRUNE for that source and group to
the upstream router towards that source [RFC3973 Sec. 4.4.1].
Similarly, if a PIM-DM router receives a STATE REFRESH message for a
source S and group G from the upstream router towards that source,
the Prune Indicator bit in that STATE REFRESH message is clear, the
upstream interface is in the Pruned state, and its Prune Limit timer
is not running, then the router sends a PRUNE for that source and
group to the upstream router towards that source [RFC3973 Sec.
4.4.1].
Unfortunately, this behavior can lead to multiple redundant PRUNE
messages being sent if there is more than one router on a LAN with a
null outgoing interface list for the (S,G) pair. Each of these
routers will send its own PRUNE to the upstream router towards S. All
except the first of these, is unnecessary. Even worse, each of these
PRUNEs can trigger a storm of overriding JOIN messages.
These unnecessary PRUNEs can be avoided by employing a strategy
similar to that which is employed for avoiding redundant JOINs.
Instead of sending a PRUNE message upstream as soon as it receives an
incoming data packet from a LAN as specified in RFC 3973, the router
may instead defer sending this PRUNE for a short period of time. The
router then sends this PRUNE if, and only if, it does not overhear a
PRUNE message from another router on the same LAN to the same
upstream router for the same (S,G) pair before this timer expires.
However, in the case where no downstream router has any downstream
group members, it is absolutely essential that the time delay before
Weinstein & Keller Expires December 6, 2013 [Page 8]
Internet-Draft PIM-DM Optimizations for Large LANs June 2013
emission of the first PRUNE by some downstream router be kept to an
absolute minimum. Until that PRUNE is received by the upstream
router, unnecessary user traffic will continue to be admitted to the
LAN and, possibly, upstream networks as well, consuming unnecessary
resources. Forutunately, this time delay is relevant only if no
downstream router has any downstream members. Otherwise the PRUNE
will be overridden anyway by a JOIN from one of the downstream
routers with downstream members, and data will continue to flow.
By using a deterministic algorithm for computing this delay, the
delay before some router sends the first PRUNE can, as required, be
kept to zero for the situation where there are no group members
downstream of any router on the LAN. This cannot be guaranteed by
any probabilistic algorithm.
Since a deterministic algorithm is required, we consider again the
algorithm described in sec. 3.1. above, restated as:
Prune_deferral_timer = Prune_deferral_interval*ln(1+J)/ln(N-1)
As before, J is an index function that assigns to each router on the
LAN (except the upstream router towards the source S) a unique
integer in the interval [0,N-2], where N is the total number of PIM
routers on the LAN. Such a function can be constructed, for example,
by having each router count the number of other routers (other than
the upstream router) with higher-addressed interfaces on the LAN, and
using that count as the value of J.
Due to its deterministic character and the definition of the index
function J, this algorithm guarantees that there will always be some
router on the LAN for which J=0, so that Prune_deferral_timer is also
zero. Consequently, when every router (other than the upstream
router) lacks downstream group members, this algorithm guarantees
that some router will as required emit a PRUNE with zero delay.
Since the time delay before some router emits a first PRUNE is
irrelevant if one or more downstream routers have downstream group
members, the value of Prune_deferral_interval can safely be made
fairly large, thus minimizing the risk that redundant PRUNEs will be
sent. In fact, if there were no concern for the possibility that
PRUNEs might be lost, there would be no need for any router other
than the highest-addressed router to ever send a PRUNE. If it had
failed to send a PRUNE, then it could be concluded that it had
downstream group members, and hence would itself generate an
overriding JOIN if some other router then sent a PRUNE. Using an
algorithm that provides for other routers to send PRUNEs adds a
measure of redundancy, that helps protect against the possibility
that a PRUNE from the highest-addressed router might simply be lost.
Weinstein & Keller Expires December 6, 2013 [Page 9]
Internet-Draft PIM-DM Optimizations for Large LANs June 2013
The order in which a router sees an incoming (S,G) data packet, a
PRUNE emitted by some other router in response to that packet, and an
overriding JOIN from a third router, may vary depending upon
propagation delays in the network. In particular, one must also
address the situation wherein a router sees an overriding JOIN from
another before it sees either the PRUNE that triggered that
overriding JOIN or the data packet that triggered the PRUNE. This
can be handled through judicious use of the Prune Limit Timer.
Hence, to avoid unnecessary PRUNEs without increasing the time delay
before pruning can occur, the operation of the Upstream Prune, Join,
and Graft State Machine [RFC3973 sec. 4.4.1] SHOULD be modified as
described in this section.
4.2. Modifications to the state-machine-specific timers [RFC3973 Sec.
4.4.1]
In the description of the state-machine-specific timers [RFC3973 sec.
4.4.1], the following timer is added:
Prune Deferral Timer (PDT(S,G))
This timer is used to support redundant PRUNE suppression on a
LAN. If a data packet arrives on the upstream interface
towards S (RPF_Interface(S)) while the outgoing interface list
olist(S,G) is NULL and the prune limit timer PLT(S,G) is not
running, and redundant PRUNE suppression is enabled on that
interface, then this timer is set to a value t_deferral between
0 and Prune Deferral Interval. If redundant PRUNE suppression
is not enabled on that interface, then this timer is set to
zero (0) and expires immediately. This timer is also set if a
STATE REFRESH message is received with its Prune Indicator is
clear, and the upstream interface is in the Pruned state. If
an (S,G) PRUNE is seen on the upstream interface towards S
(RPF_Interface(S)) and is directed towards RPF'(S), then this
timer is cancelled. If this timer expires, a PRUNE is sent to
the upstream router towards S (RPF'(s)).
The computation` of t_deferral is described in Sec. 4.8.
Weinstein & Keller Expires December 6, 2013 [Page 10]
Internet-Draft PIM-DM Optimizations for Large LANs June 2013
4.3. Modifications to the tabular description of the state machine
[RFC3973 Sec. 4.4.1]
In the tabular description of the state machine [RFC3973 sec. 4.4.1],
the following entries are added:
+-------------------------------+--------------------------------------+
| | Previous State |
| +------------+------------+------------+
| Event | Forwarding | Pruned | AckPending |
+-------------------------------+------------+------------+------------+
| PDT(S,G) expires | N/A | ->P Send | N/A |
| | | Prune(S,G) | |
+-------------------------------+------------+------------+------------+
Weinstein & Keller Expires December 6, 2013 [Page 11]
Internet-Draft PIM-DM Optimizations for Large LANs June 2013
The following entries, identified by event and previous state, are
modified:
+-------------------------------+--------------------------------------+
| | Previous State |
| +------------+------------+------------+
| Event | Forwarding | Pruned | AckPending |
+-------------------------------+------------+------------+------------+
| Data packet arrives on | ->P Set | ->P Set | N/A |
| RPF_Interface(S) AND | PDT(S,G) | PDT(S,G) | |
| olist(S,G) == NULL AND |Set PLT(S,G)|Set PLT(S,G)| |
| PLT(S,G) not running | | | |
+-------------------------------+------------+------------+------------+
| State Refresh(S,G) received | ->F | ->P Set |->F Cancel |
| from RPF`(S) AND | | PDT(S,G) | GRT(S,G) |
| Prune Indicator == 0 AND | |Set PLT(S,G)| |
| PLT(S,G) not running | | | |
+-------------------------------+------------+------------+------------+
| See Join(S,G) to RPF'(S) | ->F Cancel | ->P Cancel |->AP Cancel |
| | OT(S,G) | PDT(S,G) | OT(S,G) |
| |Set PLT(S,G)|Set PLT(S,G)|Set PLT(S,G)|
+-------------------------------+------------+------------+------------+
| See Prune(S,G) | ->F Set | ->P Cancel |->AP Set |
| | OT(S,G) | PDT(S,G) | OT(S,G) |
| |Set PLT(S,G)|Set PLT(S,G)|Set PLT(S,G)|
+-------------------------------+------------+------------+------------+
| olist(S,G)->non-NULL | N/A | ->AP Send | N/A |
| | | Graft(S,G) | |
| | |Set GRT(S,G)| |
| | |Cancel | |
| | | PDT(S,G)| |
+-------------------------------+------------+------------+------------+
| RPF'(S) Changes AND | ->AP Send | ->AP Send |->AP Send |
| olist(S,G) != NULL | Graft(S,G) | Graft(S,G) | Graft(S,G) |
| |Set GRT(S,G)|Set GRT(S,G)|Set GRT(S,G)|
| | |Cancel | |
| | | PDT(S,G)| |
+-------------------------------+------------+------------+------------+
4.4. Modifications to Transitions from the Forwarding (F) State
[RFC3973 Sec. 4.4.1]
In the description of Transitions from the Forwarding (F) State [RFC
3973 Sec. 4.4.1.1], the following sections are modified as follows:
Data Packet arrives on RPF_Interface(S) AND olist(S,G) == NULL AND
S NOT directly connected
Weinstein & Keller Expires December 6, 2013 [Page 12]
Internet-Draft PIM-DM Optimizations for Large LANs June 2013
The Upstream(S,G) state machine MUST transition to the Pruned
(P) state, set the Prune Deferall Timer PDT(S,G) to t_deferall
seconds, and set PLT(S,G) to t_limit seconds.
See Join(S,G) to RPF'(S)
This event is only relevant if RPF_interface(S) is a shared
medium. This router sees another router on RPF_interface(S)
send a Join(S,G) to RPF'(S,G). If the OT(S,G) is running, then
it means that the router had scheduled a Join to override a
previously received Prune. Another router has responded more
quickly with a Join, so the local router SHOULD cancel its
OT(S,G), if it is running. The Upstream(S,G) state machine
remains in the Forwarding (F) state.
In addition, there is no sense in sending a PRUNE to RPF'(S) in
the near-term, as that PRUNE would only be overridden by
another JOIN. Hence, the upstream (S,G) state machine SHOULD
set the Prune Limit Timer PLT(S,G) to t_limit.
See Prune(S,G) AND S NOT directly connected
This event is only relevant if RPF_interface(S) is a shared
medium. This router sees another router on RPF_interface(S)
send a Prune(S,G). As this router is in Forwarding state, it
must override the Prune after a short random interval. If
OT(S,G) is not running, the router MUST set OT(S,G) to
t_override seconds. The Upstream(S,G) state machine remains in
Forwarding (F) state. The router MAY reset its PLT(S,G) to the
value in the Holdtime field of the received message if it is
greater than the current value of the PLT(S,G).
4.5. Modifications to Transitions from the Pruned (P) State [RFC3973
Sec. 4.4.1]
In the description of Transitions from the Pruned (P) State [RFC 3973
Sec. 4.4.1.1], the following sections are added:
Prune Deferral Timer PDT(S,G) expires
Send a PRUNE(S,G) to RPF'(S)
See Join(S,G) to RPF'(S)
A Join(S,G) is seen on RPF_interface(S) to RPF'(S). The
Upstream(S,G) state machine stays in the Pruned (P) state. If
the Prune Deferral Timer PDT(S,G) is set, it SHOULD be
cancelled.
In addition, there is no sense in sending another PRUNE to
RPF'(S) in the near-term, as that PRUNE would only be
overridden by another JOIN. Hence, the upstream (S,G) state
machine SHOULD set the Prune Limit Timer PLT(S,G) to t_limit.
In the description of Transitions from the Pruned (P) State [RFC 3973
Sec. 4.4.1.1], the following sections are modified as follows:
Weinstein & Keller Expires December 6, 2013 [Page 13]
Internet-Draft PIM-DM Optimizations for Large LANs June 2013
Data arrives on RPF_interface(S) AND PLT(S,G) not running AND S
NOT directly connected
Either another router on the LAN desires traffic from S
addressed to G or a previous Prune was lost. To prevent
generating a Prune(S,G) in response to every data packet, the
Prune Limit Timer (PLT(S,G)) is used. To prevent multiple
routers from generating Prune(S,G) messages, the Prune Deferral
Timer is used (PDT(S,G)). Once the PLT(S,G) expires, some
router needs to send another prune in response to a data packet
not received directly from the source. The Prune Deferral
Timer PDT(S,G) MUST be set to t_deferral, and the PLT(S,G) MUST
be set to t_limit.
State Refresh(S,G) Received from RPF'(S)
The Upstream(S,G) state machine remains in a Pruned state. If
the State Refresh has its Prune Indicator bit set to zero and
PLT(S,G) is not running, The Prune Deferral Timer PDT(S,G) MUST
be set to t_deferral, and the PLT(S,G) MUST be set to t_limit.
If the State Refresh has its Prune Indicator bit set to one,
the router MUST reset PLT(S,G) to t_limit.
See Prune(S,G) to RPF'(S)
A Prune(S,G) is seen on RPF_interface(S) to RPF'(S). The
Upstream(S,G) state machine stays in the Pruned (P) state. If
the Prune Deferral Timer PDT(S,G) is set, it SHOULD be
cancelled. The router MAY reset its PLT(S,G) to the value in
the Holdtime field of the received message if it is greater
than the current value of the PLT(S,G).
olist(S,G)->non-NULL AND S NOT directly connected
The set of interfaces defined by the olist(S,G) macro becomes
non-empty, indicating that traffic from S addressed to group G
must be forwarded. The Upstream(S,G) state machine MUST cancel
PLT(S,G), transition to the AckPending (AP) state and unicast a
Graft message to RPF'(S). The Graft Retry Timer (GRT(S,G))
MUST be set to Graft_Retry_Period. If the Prune Deferral timer
PDT(S,G) is set, it MUST be cancelled.
RPF'(S) Changes AND olist(S,G) == non-NULL AND S NOT directly
connected
Unicast routing or Assert state causes RPF'(S) to change,
including changes to RPF_Interface(S). The Upstream(S,G) state
machine MUST cancel PLT(S,G), cancel the Prune Deferral Timer
PDT(S,G), transition to the AckPending (AP)state, send a Graft
unicast to the new RPF'(S), and set the GraftRetry Timer
(GRT(S,G)) to Graft_Retry_Period.
4.6. Modifications to PIM-DM Timers [RFC 3973 Sec. 4.8]
A new timer must be added to the list of PIM-DM Timers at the head of
RFC 3973 Sec. 4.8 [RFC3973]:
Weinstein & Keller Expires December 6, 2013 [Page 14]
Internet-Draft PIM-DM Optimizations for Large LANs June 2013
Per (S,G) Pair:
...
(S,G) Prune Deferral Timer : PDT(S,G)
A new subsection must be added to RFC 3973 Sec. 4.8 [RFC3973]:
Timer Name: Prune Deferral Timer (PDT(S,G))
+------------+----------------+----------------------------------------+
| Value Name | Value | Explanation |
+------------+----------------+----------------------------------------|
| t_deferral | PDI(I)* | Randomized delay to prevent response |
| | ln(1+J)/ | implosion when sending a prune in |
| | ln(N-1) | response to incoming user traffic or |
| | | an incoming state refresh message |
+------------+----------------+----------------------------------------+
t_deferall is a value between 0 and the incoming interface's
Prune_Deferall_Interval (PDT(I)) computed from the formula
t_deferall = PDT(I)*ln(1+J)/ln(N-1), where N is the total number
of PIM routers on incoming interface I (including this router),
and J is a count of the number of PIM routers on incoming
interface I (other than the upstream router towards source S) with
addresses greater than the address of this router. The
Prune_Deferall_Interval SHOULD be set by configuration and SHOULD
have a value significantly greater than the expected propagation
delay (PD(I)) but significantly less than the Prune Limit Timer.
All routers on the same LAN SHOULD be assigned the same value of
Prune_Deferall_Interval. Recommended default: 10 seconds.
Note that, if N <= 1, there will be no router to send a prune on
the incoming interface I, and so an attempt to take the logarithm
of a negative number will never occur. If N = 2, then the only
possible value of J will be J = 0, so the formula given above will
result in a divide of zero by zero (ln(1) / ln(1) ). This should
be resolved by letting t_deferral = 0 in this case.
4.7. Other modifications
No other modifications to RFC 3973 are required.
5. Compatibility with RFC 3973
PIM-DM routers may implement the optimizations described herein on
certain interfaces, while implementing standard PIM-DM as described
in RFC 3973 on others. Hence, PIM-DM routers supporting the
optimization described herein SHOULD provide a per-interface
Weinstein & Keller Expires December 6, 2013 [Page 15]
Internet-Draft PIM-DM Optimizations for Large LANs June 2013
configuration mechanism for selectively enabling or disabling these
optimizations.
PIM-DM routers implementing the optimizations described herein will
interoperate with those implementing standard PIM-DM, as described in
RFC 3973, even if intermixed on the same LAN. However, to obtain the
full benefit of these optimizations, all PIM routers on the LAN
should implement these optimizations. Mixed LANs may exhibit
degraded performance, especially if several PIM-DM routers
implementing only standard PIM are assigned higher-numbered addresses
on the LAN than any of those implementing these optimizations.
6. Security Considerations
The optimizations described in this document should not have any
impact upon the security of the PIM-DM protocol or the lack thereof.
7. IANA Considerations
There are no IANA considerations.
8. Conclusions
This document has described several optimizations to the PIM-DM
protocol for LANs with large numbers of attached PIM-DM routers.
These optimizations more fully exploit the potential for suppressing
redundant JOIN/PRUNE messages inherent in the use of link-local
multicast to transport these messages. These optimizations provide
maximum benefit when the number of PIM routers attached to a single
LAN is large, and the propagation delay on the LAN is non-trivial but
much shorter than the LAN Prune Delay.
The optimized JOIN/PRUNE suppression described here will in most
cases scale much better to large LANs than possible alternatives in
which JOIN/PRUNE messages are unicast to the upstream router instead
(as proposed by RFC6559 for PIM-SM). The use of unicast obviates all
possibility of redundant JOIN/PRUNE suppression, resulting in an
implosion at the upstream router of N-1 unicast JOIN and PRUNE
messages. By contrast, the offered load from multicast JOIN/PRUNE
messages with the optimizations described here should scale roughly
as N ** (propagation_delay/override_interval), with propagation_delay
<< override_interval.
On a classical Ethernet the resource consumption from unicast and
multicast messages is the same. Consequently, the optimized JOIN/
Weinstein & Keller Expires December 6, 2013 [Page 16]
Internet-Draft PIM-DM Optimizations for Large LANs June 2013
PRUNE suppression described here will almost always be preferable to
use of unicast JOIN/PRUNE messages without redundant message
suppression.
For LANs emulated by multicast-capable IP tunneling across wireless
mobile ad-hoc networks (MANETs) the resource consumption depends upon
the lower-layer mechanisms used to implement multicast, the network
topology, the MAC protocol, the routing protocol, and the
opportunities for spatial reuse [BBN-TM-2031]. Nevertheless, as long
as link-local multicast is implemented at the lower layer via
reversed-shortest-path-forwarding or some other mechanism that is
more efficient than packet replication at the source, the optimized
JOIN/PRUNE suppression will still usually be preferable to use of
unicast JOIN/PRUNE messages without redundant message suppression.
However, exceptions can occur if propagation delay and/or packet loss
on the emulated LAN are too high to permit effective suppression of
redundant JOIN/PRUNE messages.
The authors have examined the behavior of the modified protocol in
LANs with up to 100 attached PIM routers, with propagation delays on
the order of a few tens of milliseconds. We have found that, with
the optimizations proposed herein, it is unusual for any redundant
JOINs or PRUNEs to be emitted. In most cases, only one PRUNE and, if
needed, one overriding JOIN is emitted onto the network in each prune
cycle.
9. References
9.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3973] Adams, A., Nicholas, J., and W. Siadak, "Protocol
Independent Multicast - Dense Mode (PIM-DM): Protocol
Specification (Revised)", RFC 3973, January 2005.
9.2. Informative References
[BBN-TM-2031]
Ramanathan, R., "An Approximate Model for Analyzing Real-
World Wireless Network Scalability", BBM Technical
Memo 2031, March 2012, <http://www.ir.bbn.com/~ramanath/
pdf/symptotics-techreport.pdf>.
[RFC6559] Farinacci, D., Wijnands, IJ., Venaas, S., and M.
Napierala, "A Reliable Transport Mechanism for PIM",
Weinstein & Keller Expires December 6, 2013 [Page 17]
Internet-Draft PIM-DM Optimizations for Large LANs June 2013
RFC 6559, March 2012.
[Weinstein_In_Progress_a]
Weinstein, J., "Mobility Hiding through IP Tunneling
Across Wireless MANETs", Work in Progress a.
[Weinstein_In_Progress_b]
Weinstein, J., "PIM-DM over Multi-Layer Networks such as
MANETs and VPLS emulated LANs", Work in Progress b.
Authors' Addresses
Joseph Weinstein
Raytheon BBN Technologies
10 Moulton Street
Cambridge, MA 02138
USA
Email: weinstein@bbn.com
Joseph Keller
Raytheon BBN Technologies
10 Moulton Street
Cambridge, MA 02138
USA
Email: jkeller@bbn.com
Weinstein & Keller Expires December 6, 2013 [Page 18]