Network Working Group | Yiqun Cai |
Internet-Draft | Liming Wei |
Intended status: Experimental Protocol | Heidi Ou |
Expires: January 09, 2012 | Cisco Systems, Inc. |
Vishal Arya | |
Sunil Jethwani | |
DIRECTV Inc. | |
July 08, 2011 |
Protocol Independent Multicast ECMP Assert
draft-hou-pim-ecmp-01.txt
A PIM router uses the RPF procedure to select an upstream interface and router to build forwarding state. When there are equal cost multiple paths (ECMP), existing implementations often use hash algorithms to select a path. Such algorithms do not allow traffic to be spread among the ECMP paths according to administrative metrics. This usually leads to inefficient or ineffective use of network resources. This document introduces the ECMP Assert, a mechanism to improve the RPF procedure over ECMP. It allows ECMP path selection to be based on administratively selected metrics, such as data transmission delays, path preferences and routing metrics.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 09, 2012.
Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
A PIM [RFC4601] router uses the RPF procedure to select an upstream interface and a PIM neighbor on that interface to build forwarding state. When there are equal cost multiple paths (ECMP) upstream, existing implementations often use hash algorithms to select a path. Such algorithms do not allow traffic to be spread among the ECMP paths according to administrative metrics. This usually leads to inefficient or ineffective use of network resources. This document introduces the ECMP Assert, a mechanism to improve the RPF procedure over ECMP. It allows ECMP path selection to be based on administratively selected metrics, such as data transmission delays, path preferences and routing metrics, or a combination of metrics.
ECMPs are frequently used in networks to provide redundancy and to increase available bandwidth. A PIM router selects a path in the ECMP based on its own implementation-specific choice. The selection is a local decision. One way is to choose the PIM neighbor with the highest IP address; another is to pick the PIM neighbor with the best hash value over the destination and source addresses.
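The two local selection methods mentioned above can be illustrated with a minimal Python sketch. It is not taken from any implementation: the function names are hypothetical, and the choice of SHA-256 with the neighbor address mixed into the hash input is an assumption made only to give each neighbor a comparable score.

```python
import hashlib
import ipaddress


def pick_highest_address(neighbors):
    """Return the neighbor with the numerically highest IP address."""
    return max(neighbors, key=lambda n: int(ipaddress.ip_address(n)))


def pick_by_hash(neighbors, source, group):
    """Return the neighbor with the largest hash score for this flow.

    Assumption: the score mixes the neighbor address with the source and
    group addresses; real implementations may hash differently.
    """
    def score(neighbor):
        data = f"{neighbor}|{source}|{group}".encode()
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    return max(neighbors, key=score)


neighbors = ["192.0.2.1", "192.0.2.9", "192.0.2.10"]
print(pick_highest_address(neighbors))  # 192.0.2.10
print(pick_by_hash(neighbors, "198.51.100.5", "232.1.1.1"))
```

Both functions depend only on local information, which is why two downstream routers cannot be steered toward the same or different upstream neighbors by an operator.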
While implementations supporting ECMP have been deployed widely, the existing RPF selection methods have weaknesses. The lack of administratively effective ways to allocate traffic over alternative paths is a major issue. For example, there is no straightforward way to tell two downstream routers to select either the same or different RPF neighbor routers for the same traffic flows.
With the ECMP Assert mechanism introduced here, the upstream routers use a new PIM ECMP Assert message to instruct the downstream routers on how to tie-break among the upstream neighbors. The PIM ECMP Assert message conveys the tie-break information based on metrics selected administratively.
The existing PIM Assert mechanism allows the upstream router to detect the existence of multiple forwarders for the same multicast flow onto the same downstream interface. The upstream router sends a PIM Assert message containing a routing metric for the downstream routers to use for tie-breaking among the multiple upstream forwarders on the same RPF interface.
With ECMP interfaces between the downstream and upstream routers, the PIM ECMP Assert mechanism works in a similar way, but extends the ability to resolve the selection of forwarders among different interfaces in the ECMP.
When a PIM router downstream of the ECMP interfaces creates a new (*,G) or (S,G) entry, it will populate the RPF interface and RPF neighbor information according to the rules specified by [RFC4601]. This router will send its initial joins to that RPF neighbor.
When the RPF neighbor router receives the join message and finds that the receiving interface is one of the ECMP interfaces, it will check if the same flow is already being forwarded out of another ECMP interface. If so, this RPF neighbor router will send a PIM ECMP Assert message onto the interface the join was received on. The PIM ECMP Assert message contains the address of the desired RPF neighbor, an interface ID [INTID], along with other parameters used as tie breakers. In essence, a PIM ECMP Assert message is sent by an upstream router to notify downstream routers to redirect PIM Joins to the new RPF neighbor via a different interface. When the downstream routers receive this message, they should trigger PIM Joins toward the new RPF neighbor specified in the packet.
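The upstream check described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the normative procedure: the class and field names are hypothetical, the interface that already forwards the flow is assumed to be the desired one, and rate limiting and the tie-breaker fields are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class UpstreamRouter:
    bundle: set  # names of the interfaces in one ECMP bundle
    forwarding: dict = field(default_factory=dict)  # flow -> forwarding interface

    def on_join(self, flow, rx_interface):
        """Return a description of the ECMP Assert to send, or None."""
        if rx_interface not in self.bundle:
            return None  # not an ECMP interface: ordinary PIM processing
        current = self.forwarding.get(flow)
        if current is None:
            # First join for this flow: start forwarding on this interface.
            self.forwarding[flow] = rx_interface
            return None
        if current == rx_interface:
            return None  # the join already arrived on the forwarding interface
        # The flow is already forwarded out of another ECMP interface, so
        # redirect the joiner by asserting on the interface the join came in on.
        return {"send_on": rx_interface, "flow": flow, "desired_interface": current}


router = UpstreamRouter(bundle={"eth0", "eth1"})
flow = ("*", "239.1.1.1")
router.on_join(flow, "eth0")         # no assert, eth0 becomes the forwarder
print(router.on_join(flow, "eth1"))  # assert sent on eth1, pointing at eth0
```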
This new message is named PIM ECMP Assert for the following reasons:
This new message functions in similar ways to the existing PIM Assert message, with the exception that the existing Assert message is used to select an upstream router within the same multi-access network (such as a LAN) while the new message is used to select both a network and an upstream router.
One advantage of this design is that the control messages are only sent when there is a need to "re-balance" the traffic. This reduces the amount of control traffic.
The use of ECMP Assert applies to shared trees or source trees built with procedures described in [RFC4601]. The use of ECMP Assert in "Protocol Independent Multicast - Dense Mode" [RFC3973] or in "Bidirectional Protocol Independent Multicast" [RFC5015] is not considered.
The enhancement described in this document applies to a number of scenarios. For example, it allows a network operator to use ECMP paths and to perform load splitting based on bandwidth. To do this, the downstream routers perform RPF selection with bandwidth instead of IP addresses as a tie breaker. The ECMP Assert mechanism ensures that all downstream routers select the desired network link and upstream router whenever possible. Another example is for a network operator to impose a transmission delay limit on certain links. The ECMP Assert mechanism provides a means for an upstream router to instruct a downstream router to choose a different RPF path.
This specification does not dictate the scope of applications of this mechanism.
An ECMP bundle is a set of PIM enabled interfaces on a router, where all interfaces belonging to the same bundle share the same routing metric. The ECMP paths reside between the upstream and downstream routers over the ECMP bundle.
There can be one or more ECMP bundles on any router, while one individual interface can only belong to a single bundle.
ECMP bundles are created on a router via configuration.
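The constraint that an interface can belong to only one bundle can be checked when the configuration is loaded. The following Python sketch assumes a hypothetical configuration shape (a mapping from bundle name to a list of interface names). The draft does not define any configuration syntax.

```python
def build_bundle_index(bundles):
    """Map each interface to its bundle, rejecting interfaces listed twice."""
    index = {}
    for bundle_name, interfaces in bundles.items():
        for ifname in interfaces:
            if ifname in index:
                raise ValueError(
                    f"interface {ifname} is listed in both "
                    f"{index[ifname]} and {bundle_name}"
                )
            index[ifname] = bundle_name
    return index


config = {"bundle-a": ["eth0", "eth1"], "bundle-b": ["eth2"]}
print(build_bundle_index(config))
```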
ECMP Asserts are sent by an upstream router in a rate limited fashion, under the following conditions,
In both cases, an ECMP Assert is sent to the non-desired interface. An outgoing interface is considered "non-desired" when,
An upstream router may choose not to send ECMP Asserts if it becomes aware that some of the downstream routers do not support the new message, or are unreachable via some links in the ECMP bundle.
When a downstream router receives an ECMP Assert, and detects that the desired RPF path from its upstream router's point of view is different from its current one, it should choose to prune from the current path and join to the new path. The exact order of such actions is implementation specific.
If a downstream router receives multiple ECMP Asserts sent by different upstream routers, it SHOULD use the Preference, Metric, or other fields as specified below, as the tie breakers to choose the most preferred RPF interface and neighbor.
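A downstream router comparing several ECMP Asserts could order them as in the Python sketch below. The comparison rule is an assumption made for illustration only (lower Preference wins, then lower Metric); the record type and its field names are hypothetical, and the authoritative rule is whatever the field definitions in this document specify.

```python
from typing import NamedTuple


class EcmpAssert(NamedTuple):
    neighbor: str      # desired RPF neighbor carried in the message
    interface_id: str  # interface ID carried in the message
    preference: int
    metric: int


def most_preferred(asserts):
    """Pick one assert, assuming lower Preference, then lower Metric, wins."""
    return min(asserts, key=lambda a: (a.preference, a.metric))


received = [
    EcmpAssert("192.0.2.1", "if-1", preference=10, metric=200),
    EcmpAssert("192.0.2.2", "if-2", preference=10, metric=100),
    EcmpAssert("192.0.2.3", "if-3", preference=20, metric=50),
]
print(most_preferred(received).neighbor)  # 192.0.2.2
```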
If an upstream router receives an ECMP Assert from another upstream router, it SHOULD NOT change its forwarding behavior even if the ECMP Assert makes it a less preferred RPF neighbor on the receiving interface.
During a transient network outage with a single link cut in an ECMP bundle, a downstream router may lose connection to its RPF neighbor and the normal ECMP Assert operation may be interrupted temporarily. In such an event, the following actions are recommended.
The downstream router may re-select a new RPF neighbor. Among all ECMP upstream routers, the one on the same LAN as the previous RPF neighbor is preferred.
If there is no upstream router reachable on the same LAN, the downstream router will select an RPF neighbor on a different LAN. Among all ECMP upstream routers, the one that served as RPF neighbor before the link failure is preferred. Such a router can be identified by the Router ID which is part of the Interface ID in the PIM ECMP Assert Hello option.
During normal ECMP Assert operations, when PIM Joins for the same (*,G) or (S,G) are received on a different LAN, an upstream router will send an ECMP Assert to prune the non-preferred LAN. Such ECMP Asserts during a partial network outage can be suppressed if the upstream router decides that the non-preferred PIM Join is from a router that is not reachable via the preferred LAN. This check can be performed by retrieving the downstream router's Router ID, using the source address in the PIM Join, and searching the neighbors on the preferred LAN for one with the same Router ID.
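The suppression check described above can be sketched in Python as follows. The function and parameter names are hypothetical, and the behavior when the joining router's Router ID is unknown (no suppression) is an assumption that the text does not cover.

```python
def should_suppress_assert(join_source, router_id_by_address, preferred_lan_router_ids):
    """Return True if the joiner is not a neighbor on the preferred LAN.

    join_source: source address of the PIM Join seen on the non-preferred LAN.
    router_id_by_address: neighbor address -> Router ID learned from Hellos.
    preferred_lan_router_ids: Router IDs of the neighbors on the preferred LAN.
    """
    router_id = router_id_by_address.get(join_source)
    if router_id is None:
        # Assumption: without a Router ID the check cannot be made, so the
        # router falls back to normal ECMP Assert behavior.
        return False
    return router_id not in preferred_lan_router_ids


router_ids = {"198.51.100.7": "10.0.0.7", "198.51.100.8": "10.0.0.8"}
preferred = {"10.0.0.8"}
print(should_suppress_assert("198.51.100.7", router_ids, preferred))  # True
print(should_suppress_assert("198.51.100.8", router_ids, preferred))  # False
```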
If a PIM router supports this draft, it MUST send the new Hello option ECMP-Assert-Supported TLV in its PIM Hello messages. A PIM router sends ECMP Asserts on an interface only when it detects that all neighbors have sent this Hello option. If a PIM router detects that any of its neighbors does not support this Hello option, it MUST NOT send ECMP Asserts; however, it SHOULD still process any ECMP Asserts received.
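The sending rule above amounts to an "all neighbors" test per interface. A minimal Python sketch follows, with a hypothetical function name and input shape. Treating an interface with no known neighbors as not eligible is an assumption.

```python
def may_send_ecmp_assert(neighbor_supports):
    """Decide whether ECMP Asserts may be sent on one interface.

    neighbor_supports maps each PIM neighbor on the interface to True if its
    Hello carried the ECMP-Assert-Supported option, else False.
    """
    # Assumption: with no neighbors there is nobody to redirect, so do not send.
    return bool(neighbor_supports) and all(neighbor_supports.values())


print(may_send_ecmp_assert({"192.0.2.1": True, "192.0.2.2": True}))   # True
print(may_send_ecmp_assert({"192.0.2.1": True, "192.0.2.2": False}))  # False
```

Receiving is unaffected by this test: a router that may not send ECMP Asserts still processes the ones it receives.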
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Type = TBD           |          Length = 0           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
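The Hello option shown above is a 16-bit Type followed by a 16-bit Length of zero, with no value. The Python sketch below encodes it and scans a sequence of Hello options for it. Because the Type is still TBD, the value 65001 is only a hypothetical placeholder, and both function names are illustrative.

```python
import struct

PLACEHOLDER_OPTION_TYPE = 65001  # hypothetical; the real Type is TBD


def encode_ecmp_assert_supported(option_type):
    """Encode the option as Type (16 bits) and Length = 0 (16 bits)."""
    return struct.pack("!HH", option_type, 0)


def has_ecmp_assert_supported(options, option_type):
    """Scan concatenated Hello option TLVs for the zero-length option."""
    offset = 0
    while offset + 4 <= len(options):
        otype, length = struct.unpack_from("!HH", options, offset)
        if otype == option_type and length == 0:
            return True
        offset += 4 + length  # skip the 4-byte header and the value
    return False


holdtime = struct.pack("!HHH", 1, 2, 105)  # Holdtime option from [RFC4601]
hello_options = holdtime + encode_ecmp_assert_supported(PLACEHOLDER_OPTION_TYPE)
print(has_ecmp_assert_supported(hello_options, PLACEHOLDER_OPTION_TYPE))  # True
```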
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |PIM Ver| Type  |   Reserved    |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Group Address (Encoded-Group format)              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Source Address (Encoded-Unicast format)            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Neighbor Address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+- ............ Interface ID ........... -+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Preference  |                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-- ... Metric ...   -+-+-+-+-+-+-+-+-+
   |                                                               |
   +- .. Metric .. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |
   +-+-+-+-+-+-+-+-+
A new PIM Type is required to be assigned to the ECMP Assert messages. According to [PIMREG], this document recommends 11 (0xB) as the new "PIM ECMP Assert Type".
Security of the ECMP Assert is only guaranteed by the security of the PIM packet, so the security considerations for PIM Assert packets as described in [RFC4601] apply here. Spoofed ECMP Assert packets may cause the downstream routers to send PIM Joins to an undesired upstream router, and trigger more ECMP Assert messages.
The authors would like to thank Apoorva Karan for helping with the original idea, and Eric Rosen, Isidor Kouvelas, Toerless Eckert and Stig Venaas for their review comments.
[RFC2119] | Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. |
[RFC4601] | Fenner, B., Handley, M., Holbrook, H. and I. Kouvelas, "Protocol Independent Multicast - Sparse Mode (PIM-SM): Protocol Specification (Revised)", RFC 4601, August 2006. |
[RFC3973] | Adams, A., Nicholas, J. and W. Siadak, "Protocol Independent Multicast - Dense Mode (PIM-DM): Protocol Specification (Revised)", RFC 3973, January 2005. |
[RFC5015] | Handley, M., Kouvelas, I., Speakman, T. and L. Vicisano, "Bidirectional Protocol Independent Multicast (BIDIR-PIM)", RFC 5015, October 2007. |
[INTID] | Gulrajani, S. and S. Venaas, "An Interface ID Hello Option for PIM", Internet-Draft draft-gulrajani-pim-hello-intid-01.txt. |
[PIMREG] | Venaas, S., "A Registry for PIM Message Types", Internet-Draft draft-ietf-pim-registry-04.txt. |