Internet DRAFT - draft-ravisingh-teas-rsvp-setup-retry
draft-ravisingh-teas-rsvp-setup-retry
TEAS Working Group Ravi Singh
Internet Draft Juniper Networks
Intended status: Best Current Practice Rob Shakir
British Telecom
Vishnu Pavan Beeram
Juniper Networks
Tarek Saad
Cisco Systems
Expires: January 2, 2016 July 2, 2015
RSVP Setup Retry - BCP
draft-ravisingh-teas-rsvp-setup-retry-01
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
This Internet-Draft will expire on January 2, 2016.
Copyright Notice
Copyright (c) 2015 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
Ravi Singh Expires January 2, 2016 [Page 1]
Internet-Draft RSVP Setup Retry July 2015
respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided without
warranty as described in the Simplified BSD License.
Abstract
This document discusses the best current practices associated with
the implementation of the RSVP setup-retry timer.
Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC-2119 [RFC2119].
Table of Contents
1. Introduction
2. Setup-Retry Timer
3. Possible ill-effects due to implementation choices
4. Causes of the above ill-effects
5. Solution to the implementation issues
6. Security Considerations
7. IANA Considerations
8. Normative References
9. Acknowledgments
10. Authors' Addresses
Contributors
1. Introduction
In an RSVP-TE network with a very large number of LSPs, link/node
failure(s) may produce a noticeable increase in RSVP-TE control
traffic. As a result, RSVP-TE messages might be delayed because they
are stuck in a queue that is overwhelmed with messages waiting to be
sent, or they might be lost altogether. For example, a Path message
that a transit router intends to send might be stuck in the output
queue toward the next hop. Alternatively, it might have been dropped
on the receive side due to queue overflow. The same could happen to
a Resv message in the reverse direction. Also, in the absence of
reliable delivery of PathErr messages [RFC2961], an error generated
at a transit or egress router for an LSP that is in the process of
being set up may never reach the ingress.
Lost/delayed RSVP-TE messages cause the following problems for an
ingress router:
- In the absence of an error indication, how is an ingress to know
that an LSP for which signaling was (re-)initiated and a Resv has
not yet been received, is ever going to come up?
- In the absence of any indication, what action should the ingress
take to support low-latency LSP-setup?
The above problems essentially boil down to one question: how long
should the ingress wait before giving up on its attempt to bring up
the LSP and taking some alternative course of action (e.g., trying to
bring up the LSP on an alternate path)? To mitigate this problem,
some implementations use a setup-retry timer mechanism. This
document discusses the issues associated with a particular
implementation of this timer and makes specific recommendations to
avoid these issues.
2. Setup-Retry Timer
The setup-retry timer is usually a configurable timer. In the
absence of an error indication, it fires when an LSP with a given
LSP-ID has not received a Resv in response to its Path within a
pre-configured duration after its first Path was sent.
Use of the setup-retry timer is based on the presumption that if
signaling for a given LSP has not been completed within an
"expected" duration, it is not going to be completed at all. The
intent in the use of this timer is to expeditiously take some
alternative course of action when an LSP has not yet completed its
signaling within an "expected" duration of time.
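The firing condition described in this section can be sketched as
follows. This is only an illustration under stated assumptions: the
names LspSetupState, setup_retry_expired, and setup_retry_interval
are hypothetical and do not come from any RSVP-TE implementation, and
times are plain seconds rather than a real timer wheel.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LspSetupState:
    """Hypothetical ingress-side record for one LSP instance being signaled."""
    tunnel_id: int
    lsp_id: int
    first_path_sent_at: float                  # time the first Path was sent
    resv_received_at: Optional[float] = None   # None until a Resv arrives
    error_received: bool = False               # an error indication was seen


def setup_retry_expired(state: LspSetupState, now: float,
                        setup_retry_interval: float) -> bool:
    """Return True when the setup-retry timer should be considered fired.

    The timer is only relevant while neither a Resv nor an error
    indication has arrived. It fires once the configured interval has
    elapsed since the first Path for this LSP-ID was sent.
    """
    if state.resv_received_at is not None or state.error_received:
        return False
    return (now - state.first_path_sent_at) >= setup_retry_interval
```

Measuring from the first Path (rather than from the latest refresh)
mirrors the definition above; an implementation might choose a
different reference point.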
3. Possible ill-effects due to implementation choices
As mentioned in the previous section, the intent in the use of this
timer is to take some alternative course of action when an LSP has
not yet completed its signaling within an "expected" duration of
time. One such course of action is for the ingress router to tear
down the path that was still being signaled by sending a PathTear,
run CSPF, and use the result of that CSPF to signal a brand-new path
for this tunnel with a different LSP-ID (typically the previous
LSP-ID incremented by 1). This section describes the problems caused
by such a course of action.
As mentioned in Section 1, in a network with a very large number of
RSVP-TE LSPs, link/node failure(s) may produce a noticeable increase
in the volume of RSVP-TE control traffic, which in turn might cause
a router either to drop RSVP-TE messages or to send them excessively
late.
As a result, the following problems can occur:
- LSP setup latency might be excessively high.
- Error messages that indicate failure in LSP setup might not make
it to the ingress router.
A mix of the above problems can cause the setup-retry timer for a
given LSP (at the ingress router) to fire repeatedly. The ingress
then gets stuck, for some or many LSPs, in the cycle illustrated
below:
--------------------------------------------------------------------
Ingress Timeline        | [Ingress]---[]---[]...[Transit]...[]---[]-
------------------------|
1. Trigger LSP setup    | Path
   :                    | TNL-ID=X
   :                    | LSP-ID=Y
   :                    | -------->
   <No Resv (X, Y)>     |          ------------> Path (X, Y)
   :                    |                        -------> --------->
   :                    |                                :
   :                    |                                :
2. Setup-Retry Timer    |                                :
   fires; Recompute     |                                :
   path;                |                                :
3. Trigger Teardown     | PathTear
                        | TNL-ID=X
                        | LSP-ID=Y
                        | -------->
                        |          ------------> PathTear (X, Y)
                        |                        -------> --------->
4. Trigger setup for new| Path
   instance of the LSP  | TNL-ID=X
   (same ERO)           | LSP-ID=Y+1
   :                    | -------->
   :                    |          ------------> Path (X, Y+1)
   :                    |                        -------> --------->
   :                    |                                Resv
   <No Resv (X, Y+1)>   |                                TNL-ID=X
   :                    |                                LSP-ID=Y
   :                    |                                <---------
   :                    |                                ResvError
   :                    |                                No Path
   :                    |                                --------->
5. Repeat loop through  |                                :
   2-4                  |                                :
--------------------------------------------------------------------
In the above illustration, notice how the transit router never gets
to completely process the "current" LSP-ID (see [RShakir] for more).
The implementation recommendations made in this document will help
avoid this snowball effect.
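The cycle in the illustration can be reproduced with a toy model. The
sketch below is not taken from any implementation. It assumes a
single, fixed end-to-end setup delay (the time from sending the first
Path for an LSP-ID to receiving its Resv) and ignores message loss;
the function name simulate and all of its parameters are
hypothetical.

```python
def simulate(setup_delay, retry_interval, resignal_on_expiry, horizon):
    """Toy model of an ingress waiting for a Resv.

    Returns (time the Resv arrived or None, LSP instances signaled).
    If resignal_on_expiry is True, every timer expiry tears down the
    current instance and starts a new LSP-ID, restarting the clock.
    """
    started_at = 0.0          # when the current LSP-ID was first signaled
    instances = 1             # number of LSP-IDs signaled so far
    expiry = retry_interval   # next time the setup-retry timer fires
    while True:
        resv_due = started_at + setup_delay
        if resv_due <= expiry:
            # The Resv arrives before the timer fires again.
            return (resv_due if resv_due <= horizon else None), instances
        if expiry > horizon:
            return None, instances
        if resignal_on_expiry:
            # PathTear for the old LSP-ID, Path for LSP-ID + 1.
            started_at = expiry
            instances += 1
        expiry += retry_interval


# Setup takes 50 s on a stressed network, but the timer is 30 s.
print(simulate(50, 30, resignal_on_expiry=True, horizon=300))
print(simulate(50, 30, resignal_on_expiry=False, horizon=300))
```

Under these assumptions, re-signaling on every expiry never brings
the LSP up within the horizon while generating a new LSP instance at
each expiry, whereas leaving the original LSP-ID in place lets the
Resv arrive after the setup delay.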
4. Causes of the above ill-effects
The implementation choices described in Section 3 increase the
control plane load on a network whose control plane is already under
stress. The extra load comes from unnecessarily doing the following
even when there is no change in the computed path:
- Sending PathTears causes excessive and unjustifiable work on those
downstream routers on the "previous ERO path" that had managed to
bring the LSP up. In other words, the slowness of a given transit
router should not penalize all the transit routers downstream of
it, as doing so just increases the overall network stress.
- Sending a Path for LSP-ID=Y+1 causes unnecessary work for all
routers on the ERO path, including those that were already running
slow and were the real cause of the Resv for LSP-ID=Y not having
been received in time by the ingress.
5. Solution to the implementation issues
To eliminate the causes listed in the previous section, and thus
the ill-effects themselves, this document makes the following
recommendations.
When the setup-retry timer fires:
If there is no change in the computed path (no error indication for
that LSP has been received via a PathErr or a TE update indicating a
failure),
- Do not send PathTear for LSPID=Y
- Just let the Path State get refreshed for LSPID=Y.
The recommended default behavior is to keep retrying until the path
changes or the user intervenes. Implementations MAY choose to
provide the user with an option to override this default behavior
and specify a policy to determine when to stop retrying.
Implementations SHOULD follow the recommendations listed in this
section to avoid getting stuck in an LSP signaling hysteresis.
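The recommendations in this section can be read as a small decision
procedure run each time the setup-retry timer fires. The following is
a minimal sketch, not a definitive implementation. The names
RetryAction and on_setup_retry_expiry are hypothetical, max_expiries
stands in for whatever user-configured stop policy an implementation
might offer, and the choice to re-signal when the path has changed is
an assumption based on the behavior described in Section 3 rather
than something this section specifies.

```python
from enum import Enum
from typing import Optional, Sequence


class RetryAction(Enum):
    KEEP_REFRESHING = "keep refreshing Path state for the current LSP-ID"
    RESIGNAL = "signal a new LSP instance on the recomputed path"
    STOP = "stop retrying (user-configured policy)"


def on_setup_retry_expiry(current_ero: Sequence[str],
                          recomputed_ero: Sequence[str],
                          error_indicated: bool,
                          expiries_so_far: int,
                          max_expiries: Optional[int] = None) -> RetryAction:
    """Decide what the ingress does when the setup-retry timer fires."""
    # Optional override of the default "retry until the path changes".
    if max_expiries is not None and expiries_so_far >= max_expiries:
        return RetryAction.STOP
    path_changed = list(recomputed_ero) != list(current_ero)
    if error_indicated or path_changed:
        # Assumption: only now is a tear-down and re-signal justified.
        return RetryAction.RESIGNAL
    # No change in the computed path: send no PathTear and no new
    # LSP-ID; the existing Path state simply keeps being refreshed.
    return RetryAction.KEEP_REFRESHING
```

With max_expiries left at None, the function follows the recommended
default of retrying until the path changes or the user intervenes.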
6. Security Considerations
This document does not introduce any new security concerns.
7. IANA Considerations
None.
8. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RShakir] Shakir, R., "The next spring forward", March 2014,
http://rob.sh/files/the-next-spring-forward_rjs120314.pdf
[RFC2961] Berger, L., "RSVP Refresh Overhead Reduction Extensions",
RFC 2961, April 2001.
9. Acknowledgments
The authors would like to thank Yakov Rekhter and Raveendra Torvi
for their inputs.
10. Authors' Addresses
Ravi Singh
Juniper Networks
Email: ravis@juniper.net
Rob Shakir
British Telecom
Email: rob.shakir@bt.com
Tarek Saad
Cisco Systems
Email: tsaad@cisco.com
Vishnu Pavan Beeram
Juniper Networks
Email: vbeeram@juniper.net
Contributors
Markus Jork
Juniper Networks
Email: mjork@juniper.net
Aman Kapoor
Juniper Networks
Email: amanka@juniper.net