Network Working Group M. Zhang
Internet-Draft LF. Zhang
Intended status: Informational YF. Ji
Expires: April 21, 2012 YB. Xu
BUPT
Y. Wang
CATR
October 19, 2011
Network Survivability Evaluation Metrics in Multi-domain Generalized
MPLS Networks
draft-zhangm-ccamp-reroute-02
Abstract
The ubiquity of the Internet, coupled with the increasing demand for
high-bandwidth, dedicated, large-scale networks, has driven the
development of multi-domain networks based on GMPLS. In such
large-scale networks, high-performance survivability is a significant
factor in resisting fault-induced service discontinuity and
interruption, and in reducing the resulting economic loss and social
impact. This document proposes a series of network survivability
evaluation metrics and methodologies that can be used to demonstrate
network survivability performance in single and multi-domain GMPLS
networks, more specifically, the network fault restoration
performance.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 21, 2012.
Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the
Zhang, et al. Expires April 21, 2012 [Page 1]
Internet-Draft Survivability Evaluation Metric October 2011
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction
1.1. Motivation
1.2. Terminology
2. Conventions Used in This Document
3. Overview of Network Survivability Evaluation Metrics
4. Network Survivability Evaluation Metrics
4.1. Fault Restoration Time Phases
4.2. Restoration Schemes and Scenarios
4.2.1. Fault types
4.2.2. Faults in single domain
4.2.3. Faults in multi-domain
4.2.3.1. Faults within a domain
4.2.3.2. Inter-domain faults
5. Methodologies
5.1. Fault restoration in single domain network
5.1.1. Reroute
5.1.2. Fast Reroute
5.2. Fault restoration within a domain in multi-domain network
5.2.1. Reroute
5.2.2. Fast Reroute
5.3. Inter-domain fault restoration in multi-domain network
6. Protocol Extension Requirements
7. Security Considerations
8. Acknowledgments
9. Normative References
Appendix A. Other Authors
Authors' Addresses
1. Introduction
1.1. Motivation
A Generalized Multi-Protocol Label Switching (GMPLS) network, which
combines optical technology in the core network with the
IP/Multi-Protocol Label Switching (MPLS) solution, is a promising
choice for the next-generation Internet architecture. The ubiquity
of the Internet, coupled with the increasing demand for
high-bandwidth, dedicated, large-scale networks, has driven the
development of multi-domain networks based on GMPLS.
Survivability is the capability of the network to maintain service
continuity in the presence of faults within the network; the affected
services can be switched over to free resources. Both intra-domain
and inter-domain faults occur in multi-domain GMPLS networks. In
such large-scale networks, high-performance survivability is
therefore a significant factor in resisting fault-induced service
interruption and in reducing the economic loss and social impact
caused by faults. Recovery time is a key measure of network
survivability performance and affects the evaluation of links and
services. A long recovery time can increase traffic delay, packet
loss, resource collisions, preemption, and service discontinuity, to
the point that the whole network cannot reach the level of
reliability required by the traffic. To reduce the fault restoration
time, the duration of every recovery phase needs to be known, which
requires a set of measurement methodologies. Measures can then be
taken to shorten individual phases and thereby reduce the overall
recovery time. Therefore, network survivability evaluation metrics
are necessary in multi-domain GMPLS networks.
This document proposes a series of network survivability evaluation
metrics and methodologies that can be used to demonstrate the network
survivability performance in single and multi-domain GMPLS networks,
more specifically, the network fault restoration performance. The
time of every fault restoration phase is measured precisely to
evaluate the whole network performance by proposed evaluation
metrics.
1.2. Terminology
LSP: Label Switched Path.
LSR: Label Switched Router.
QoS: Quality of Service.
PSL: Path Switch LSR.
ML: Merge LSR.
NMS: Network Management System.
RSVP: Resource Reservation Protocol.
2. Conventions Used in This Document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119]. In
addition, the reader is assumed to be familiar with the terminology
used in [RFC3945], [RFC3471], and [RFC3473], as well as in [RFC4427]
and [RFC4426].
3. Overview of Network Survivability Evaluation Metrics
There are two recovery mechanisms, protection and restoration; the
former is currently outside the scope of this document. The network
survivability evaluation metrics are used to measure the recovery
time precisely, which is a key factor throughout the fault recovery
process (i.e., fault detection, fault localization, fault
notification, fault recovery, and reversion). These phases define
the sequence of generic operations that need to be performed when a
failure occurs. The evaluation metrics take the time of every phase
into account and give specific measurement steps and methodologies.
4. Network Survivability Evaluation Metrics
High-performance network survivability has become a key issue in
meeting the increasing requirements for reliability and Quality of
Service (QoS) across the whole network. This section defines network
survivability evaluation metrics for single and multi-domain GMPLS
networks.
4.1. Fault Restoration Time Phases
This section gives several typical definitions of restoration times
and durations as shown in figure 1.
Phase 1: Fault detection.
Phase 2: Fault localization and isolation.
Phase 3: Fault notification.
Phase 4: Recovery.
Phase 5: Reversion.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Fault management|Backup path|Recovery|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| TDET TLOC TNOT|TBR TBS TBA TSW TCR |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 1: Failure Restoration Time Phases
A detailed analysis and specific definition is provided for each of
the restoration phases as identified in [RFC4427] and [RFC4428].
o Fault detection time TDET
Fault detection time is defined as the time between the occurrence of
a fault or degradation and its detection.
TDET depends on several factors: link propagation time, link
transmission time, node processing time, and node queuing time.
o Fault localization and isolation time TLOC
Fault localization and isolation time is defined as the time taken to
deliver the signal indication information from the fault node to the
PSL.
o Fault notification time TNOT
Fault notification time is defined as the time needed to inform the
node responsible for the switchover that a failure has occurred.
TNOT depends on failure notification delay and the notification
method used.
o Backup routing time TBR
Creating a new backup path involves routing (TBR) and signaling
(TBS). Backup routing time is defined as the time needed to compute
the route of the new backup path.
TBR depends on the routing method applied.
o Backup signaling time TBS
Backup signaling time is defined as the time that is required to
activate the backup path before the switchover.
TBS depends on the signaling method applied.
o Backup activation time TBA
Backup activation time is defined as the time between the
establishment of the backup path and the switchover of the traffic.
TBA depends on the backup path distance and signaling process.
o Switchover time TSW
Switchover time is defined as the time of switching the traffic from
the working path through which the traffic is flowing, to the
alternative/backup path.
TSW depends on the node technology.
o Restoration completion time TCR
Restoration completion time is defined as the time to complete the
fault recovery, i.e., the time it takes for the first packet to
arrive at the ML over the backup path.
TCR depends on the backup distance.
o The total restoration time
The total restoration time is defined as the sum of TDET, TNOT, TBR,
TBS, TBA, TSW, and TCR.
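As an illustration, the sum above can be written as a short Python
sketch. The class name PhaseTimes, the millisecond unit, and the
sample values are assumptions made for this example only; the terms
summed are exactly those listed in the definition above.

```python
from dataclasses import dataclass


@dataclass
class PhaseTimes:
    """Per-phase durations in milliseconds (symbols follow Section 4.1).

    The class and its field names are hypothetical; only the symbols
    TDET, TNOT, TBR, TBS, TBA, TSW and TCR come from the document.
    """
    tdet: float  # fault detection time
    tnot: float  # fault notification time
    tbr: float   # backup routing time
    tbs: float   # backup signaling time
    tba: float   # backup activation time
    tsw: float   # switchover time
    tcr: float   # restoration completion time

    def total(self) -> float:
        # Sum of the seven terms named in the total restoration time definition.
        return (self.tdet + self.tnot + self.tbr + self.tbs
                + self.tba + self.tsw + self.tcr)


# Made-up sample values, not measurements.
example = PhaseTimes(tdet=10.0, tnot=5.0, tbr=20.0, tbs=15.0,
                     tba=4.0, tsw=1.0, tcr=3.0)
print(example.total())
```

Keeping each phase as a separate field makes it easy to see which
phase dominates the total before trying to shorten it.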
4.2. Restoration Schemes and Scenarios
Restoration makes efficient use of network bandwidth when recovering
from faults. The restoration techniques considered here are reroute
and fast reroute. With restoration, no backup path is established
prior to the failure to protect the working path. Therefore,
restoration requires dynamic routing algorithms and bandwidth
allocation to establish a backup path on demand upon network failure.
Once the backup path has been set up, traffic is switched to it from
the working path.
4.2.1. Fault types
Faults in the optical network are classified into three types
according to the fault level: service faults, channel faults, and
fiber faults. This document considers only channel faults and fiber
faults.
Service fault: a service error introduced while messages are being
packaged.
Channel fault: a TE link fault in a particular wavelength channel,
caused for example by a transmitter or receiver failure; all services
carried on that channel are affected.
Fiber fault: a fault caused by a fiber cut or another external
factor; all services traversing the link are affected.
4.2.2. Faults in single domain
There are two restoration methods for faults in a single domain:
reroute and fast reroute. Fast reroute mainly provides local repair,
such as span restoration and segment restoration. The start node of
the span or segment acts as the PSL (Path Switch LSR) and is
responsible for backup path computation and traffic switching,
instead of the source node as in the reroute scheme.
o Reroute
In the single-domain scenario, detecting entities in the transport
plane detect the fault when a node or link failure occurs. Failure
localization/isolation is triggered immediately after failure
detection. The detecting node then sends the fault indication
signaling to the source node using either GMPLS-based signaling or
flooding. With flooding, the intermediate nodes of the affected
end-to-end LSP between the upstream node and the source node are
informed of the fault through a notification mechanism. With the
signaling-based technique, the detecting node sends fault indication
signaling, such as RSVP-TE, for each LSP affected by the failure
through a different notification mechanism.
After receiving the fault indication signaling, the source node
computes a backup path, using either routing algorithms or a route
pre-computation scheme, and then allocates the bandwidth. Path and
RESV signaling are responsible for path establishment and resource
reservation, respectively, for the new backup path. After that, the
traffic is switched from the working path to the backup path.
o Fast reroute
In the fast reroute scheme, failure localization/isolation is
triggered immediately after failure detection. The upstream node of
the failed link then sends the fault indication signaling to the span
or segment PSL using either GMPLS-based signaling or flooding. These
two notification methods are described under Reroute above. On
receiving the fault indication signaling, the PSL computes a new path
using routing algorithms and allocates the bandwidth for a backup
path that bypasses the failed part of the LSP. After that, the
traffic is switched from the working path to the backup path.
4.2.3. Faults in multi-domain
There are three types of faults in a multi-domain network: link or
node failure within a domain, failure of a link at a domain border,
and failure of a domain border node. Inter-domain and intra-domain
restoration mechanisms are independent of each other.
4.2.3.1. Faults within a domain
When an intra-domain failure occurs in a multi-domain network, the
intra-domain restoration mechanism is attempted first within the
domain; the restoration scheme is similar to that of a single domain.
The inter-domain restoration mechanism is triggered only if the
intra-domain restoration mechanism fails.
o Reroute
Detecting entities in the transport plane detect the fault when a
node or link failure occurs within a domain. Failure
localization/isolation is triggered immediately after failure
detection. The fault indication signaling is then sent to the source
node across intermediate domains using either GMPLS-based signaling
or flooding. After receiving the fault indication signaling, the
source node computes a new path using routing algorithms and
allocates the bandwidth. Path and RESV signaling are responsible for
path establishment and resource reservation, respectively, for the
backup path. After that, the traffic is switched from the working
path to the backup path.
o Fast reroute
In the same scenario as above, detecting entities in the transport
plane detect the fault when a node or link failure occurs within a
domain. Failure localization/isolation is triggered immediately
after failure detection. The fault
indication signaling is sent to the local or segment PSL by the
upstream node of the failed link using either GMPLS-based signaling
or flooding. These two notification methods are described in
Section 4.2.2. After receiving the fault indication signaling, the
Path Switch LSR (PSL) computes a new path using routing algorithms
and allocates the bandwidth to establish a backup path that bypasses
the failed part of the LSP. After that, the traffic is switched from
the working path to the backup path.
4.2.3.2. Inter-domain faults
Inter-domain faults comprise inter-domain link faults and domain
border node faults. Each domain has its own border node, and the
border nodes of two adjacent domains are connected by a TE link. The
TE link becomes invalid once a border node fails.
When an LSP traverses multiple domains and an inter-domain failure
occurs, failure detection and localization/isolation proceed as in
the single-domain case described in Section 4.2.2. If the failed TE
link is the only one between the two domains, the end-to-end reroute
restoration scheme is adopted: the upstream node sends the fault
indication signal to the source node along the LSP, and the source
node then computes another path and allocates resources, avoiding the
domain associated with the failed node or link. Otherwise, if there
is more than one link between the two domains, either the reroute or
the fast reroute scheme can be adopted. Path and RESV signaling are
responsible for path establishment and resource reservation,
respectively, between the PSL and the ML. After that, the traffic is
switched from the working path to the backup path.
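The selection rule in the preceding paragraph can be sketched in
Python as follows. The function name candidate_schemes and the use of
a plain link count as input are assumptions made for illustration;
the document does not define how the number of inter-domain links is
obtained.

```python
from typing import List


def candidate_schemes(inter_domain_links: int) -> List[str]:
    """Return the restoration schemes usable for an inter-domain fault.

    inter_domain_links is the number of TE links between the two
    domains, counting the failed one (hypothetical input).
    """
    if inter_domain_links < 1:
        raise ValueError("at least the failed link must be counted")
    if inter_domain_links == 1:
        # The failed link was the only one: end-to-end reroute from the source.
        return ["reroute"]
    # More than one link: either scheme may be adopted.
    return ["reroute", "fast reroute"]


print(candidate_schemes(1))
print(candidate_schemes(2))
```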
5. Methodologies
It is difficult to measure the detection time TDET, which depends on
the monitoring technique, and reversion is a normalization process.
Therefore, methodologies for measuring detection and reversion time
are outside the scope of this document.
5.1. Fault restoration in single domain network
This section gives measurement methods for two fault restoration
schemes in a single-domain network: end-to-end reroute and fast
reroute. As an example, it is assumed that there exists an LSP
(1-2-3-4) whose data flows from node 1 to node 4, as shown in
Figures 2 and 3. A link fault occurs between node 2 and node 3.
5.1.1. Reroute
Generally, when the failure occurs, the methodology proceeds as
follows:
o Node 3 sends a ChannelStatus message to node 2, indicating the
failure to the corresponding upstream node.
o Record the timestamp (T1) when the first bit of the ChannelStatus
message is sent to node 2 along the LSP.
o When node 2 receives the ChannelStatus message from node 3, it
returns a ChannelStatusAck message to node 3 and correlates the
failure locally. When node 2 correlates the failure and
verifies that the failure is clear, it has localized the failure
to the data link between node 3 and node 2. At that time, node 2
sends a ChannelStatus message to node 3 indicating that the
failure has been localized.
o Record the timestamp (T2) when the last bit of the ChannelStatus
message from node 2 is received by node 3.
o The fault localization delay is T2-T1.
o Node 2 sends the notification information (a PathErr message) to
the source node of the LSP (node 1) via the intermediate nodes.
Record the timestamp (T3) when the first bit of the PathErr
message is sent out.
o Record the timestamp (T4) when node 1 receives the last bit of
the PathErr message.
o The notification delay is T4-T3.
o Record the timestamp (T5) after node 1 receives the notification
information. Node 1, as the PSL, computes a new path using either
routing algorithms or a pre-computed scheme.
o PATH and RESV signaling are responsible for the path establishment
request and resource reservation, respectively, for a new backup
path. The traffic is then switched from the working path to the
backup path. Record the timestamp (T6) when the first packet of
traffic arrives at the ML (node 4) through the backup path.
o The recovery time is T6-T5.
o The total fault restoration time is T2+T4+T6-T1-T3-T5.
+-------+ +-------+ +-------+ +-------+
+ Node1 +----c-----+ Node2 +--- c ----+ Node3 +----c ---+ Node4 +
--+-------+----------+-------+----##----+-------+---------+-------+->
+ + + + + + + +
+-------+ +-------+ +-------+ +-------+
Figure 2: Reroute of fault in single domain (indicated by ## in the
figure)
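The arithmetic of the steps in Section 5.1.1 can be sketched in
Python as follows. The function name restoration_delays, the
dictionary keys, and the sample timestamps are assumptions made for
illustration; the sketch also assumes that all six timestamps are
expressed on a common, synchronized time base.

```python
from typing import Dict


def restoration_delays(t1: float, t2: float, t3: float,
                       t4: float, t5: float, t6: float) -> Dict[str, float]:
    """Compute the three measured delays and their sum from T1..T6."""
    intervals = {
        "localization": (t1, t2),   # T2-T1
        "notification": (t3, t4),   # T4-T3
        "recovery": (t5, t6),       # T6-T5
    }
    delays = {}
    for name, (start, end) in intervals.items():
        if end < start:
            # A negative interval suggests unsynchronized clocks or a bad record.
            raise ValueError(f"{name}: end timestamp precedes start timestamp")
        delays[name] = end - start
    # Equivalent to T2+T4+T6-T1-T3-T5.
    delays["total"] = sum(delays.values())
    return delays


# Made-up timestamps in milliseconds, not measurements.
result = restoration_delays(0.0, 2.0, 2.5, 6.5, 7.0, 47.0)
print(result)
```

Note that the total covers only the three measured intervals; any gap
between T2 and T3, or between T4 and T5, is not included in the sum.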
5.1.2. Fast Reroute
Generally, when the failure occurs, the methodology proceeds as
follows:
o The process of fault localization is similar to that of reroute
restoration in a single-domain network, which is described in
Section 5.1.1.
o The fault localization delay is also T2-T1.
o In contrast to the reroute scheme, the PathErr message is sent to
a PSL that depends on the fast reroute restoration scheme in use.
If span recovery is adopted, node 2 is the PSL and acts as the
ingress node of the backup path. If segment restoration is
implemented, another node acts as the PSL and ingress node of the
backup path.
o Record the timestamp (T3) when the first bit of the notification
information is sent out by node 2 to the PSL that is responsible
for switching over the traffic.
o Record the timestamp (T4) when the PSL receives the last bit of
the notification information.
o The notification delay is T4-T3.
o Record the timestamp (T5) after the PSL receives the PathErr
message. The PSL computes a new path using either routing
algorithms or a pre-computed scheme (e.g., 1-2-5-3-4).
o PATH and RESV signaling are responsible for the path establishment
request and resource reservation, respectively, for the backup
path. The traffic is then switched from the working path to the
backup path. Record the timestamp (T6) when the first packet of
traffic arrives at the ML (node 3) through the backup path.
o Recovery time is T6-T5.
o The total fault restoration time is T2+T4+T6-T1-T3-T5.
+-------+ +-------+ +-------+ +-------+
+ Node1 +-----c----+ Node2 +--- c ----+ Node3 +--- c ----+ Node4 +
-+-------+----------+-------+----##----+-------+----------+-------+->
+ + + +--| -+ + + +
+-------+ +-------+ | |+-------+ +-------+
| |
| |
| |-------|
\ +-------+ |
\ + Node5 +--|
+ +
+ +
+-------+
Figure 3: Fast reroute of fault in single domain (indicated by ## in
the figure)
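As an illustration of the path computation step performed by the PSL,
the following Python sketch runs a breadth-first search over the
topology of Figure 3 with the failed link removed. The document does
not prescribe a routing algorithm; breadth-first search, the function
name shortest_backup_path, and the link list (which assumes that
node 5 is attached to nodes 2 and 3) are choices made for this
example.

```python
from collections import deque
from typing import Dict, List, Optional, Sequence, Tuple


def shortest_backup_path(links: Sequence[Tuple[int, int]],
                         failed_link: Tuple[int, int],
                         psl: int, ml: int) -> Optional[List[int]]:
    """Fewest-hop path from PSL to ML that avoids the failed link."""
    failed = frozenset(failed_link)
    adjacency: Dict[int, List[int]] = {}
    for a, b in links:
        if frozenset((a, b)) == failed:
            continue  # exclude the failed link in both directions
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    previous: Dict[int, Optional[int]] = {psl: None}
    queue = deque([psl])
    while queue:
        node = queue.popleft()
        if node == ml:
            # Walk the predecessor chain back to the PSL.
            path = []
            current: Optional[int] = node
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # no backup path exists


# Assumed reading of Figure 3: chain 1-2-3-4 plus node 5 between 2 and 3.
FIGURE_3_LINKS = [(1, 2), (2, 3), (3, 4), (2, 5), (5, 3)]

span_path = shortest_backup_path(FIGURE_3_LINKS, (2, 3), psl=2, ml=3)
end_to_end_path = shortest_backup_path(FIGURE_3_LINKS, (2, 3), psl=1, ml=4)
print(span_path, end_to_end_path)
```

With node 2 as the PSL (span recovery) the sketch yields 2-5-3, and
with node 1 as the ingress it yields 1-2-5-3-4, matching the example
path given in the steps above.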
5.2. Fault restoration within a domain in multi-domain network
5.2.1. Reroute
Figure 4 shows the node connectivity: node 1 is in domain A, node 4
is in domain C, and nodes 2 and 3 are both in domain B. Generally,
when the failure occurs, the methodology proceeds as follows:
o Node 3 sends a ChannelStatus message to node 2, indicating the
failure to the corresponding upstream node.
o Record the timestamp (T1) when the first bit of the ChannelStatus
message is sent to node 2 along the LSP.
o When node 2 receives the ChannelStatus message from node 3, it
returns a ChannelStatusAck message back to node 3 and correlates
the failure locally. When Node 2 correlates the failure and
verifies that the failure is clear, it has localized the failure
to the data link between Node 3 and node 2. At that time, Node 2
sends a ChannelStatus message to Node 3 indicating that the
failure has been localized.
o Then record the timestamp (T2) when the last bit of ChannelStatus
message from node 2 is received by node 3.
o The fault localization delay is T2-T1.
o Node 2 sends the notification information to the source node of
the LSP(node 1) traversing intra-domain nodes and border nodes.
Notification time depends on whether the source and the
destination node are in the same domain or not. Then record the
timestamp (T3) when the first bit of notification information is
sent out.
o Record the timestamp (T4) when the node 1 receives the last bit of
the notification information.
o Notification delay is T4-T3.
o Record the timestamp (T5) after node 1 receives the PathErr
Message. As the PSL, node 1 finds a new path through either a
series of route algorithms or pre-computation scheme.
o PATH and RESV signaling are responsible for path establishment
request and resource reservation respectively for a new backup
path. Then the traffic is switched from a working path to the
backup path. Record the timestamp (T6) when the first packet of
traffic arrives at the destination node (node 4) through the
backup path.
o Recovery time is T6-T5.
o The total fault restoration time is T2+T4+T6-T1-T3-T5.
+-------+ | +-------+ +-------+ | +-------+
+ Node1 +----|c----+ Node2 +--- c ---+ Node3 +----|c ---+ Node4 +
--+-------+----|-----+-------+----##---+-------+----|-----+-------+->
+ + | + + + + | + +
+-------+ | +-------+ +-------+ | +-------+
Domain A | Domain B | Domain C
Figure 4: Reroute of fault within a domain in multi-domain
network(indicated by ## in the figure)
5.2.2. Fast Reroute
Figure 5 shows the node connectivity: node 1 is in domain A, node 4
is in domain C, and nodes 2, 3, and 5 are all in domain B.
Generally, when the failure occurs between nodes 2 and 3 in domain B,
the methodology proceeds as follows:
o The process of fault localization is similar to that of reroute
restoration in a single-domain network, which is described in
Section 5.1.1. The fault localization delay is also T2-T1.
o In contrast to the reroute scheme, the notification information is
sent to a PSL that depends on the fast reroute restoration scheme
in use. If span recovery is adopted, node 2 is the PSL and acts
as the ingress node of the restoration path. If segment recovery
is implemented, another node acts as the PSL and ingress node of
the restoration path.
o Record the timestamp (T3) when the first bit of the notification
information is sent out by node 2 to the PSL that is responsible
for switching over the traffic.
o Record the timestamp (T4) when the PSL receives the last bit of
the PathErr message.
o The notification delay is T4-T3.
o Record the timestamp (T5) after the PSL receives the PathErr
message. The PSL finds a new path using either routing algorithms
or a pre-computed scheme.
o PATH and RESV signaling are responsible for the path establishment
request and resource reservation, respectively, for a new backup
path. The traffic is then switched from the working path (2-3) to
the backup path (2-5-3). Record the timestamp (T6) when the first
packet of traffic arrives at the ML (node 3) through the backup
path.
o Recovery time is T6-T5.
o The total fault restoration time is T2+T4+T6-T1-T3-T5.
o If the intra-domain fast reroute mechanism fails, reroute
restoration is triggered whose methodology is illustrated in
section 5.2.1.
+-------+ | +-------+ +-------+ | +-------+
+ Node1 +-----|c---+ Node2 +--- c ---+ Node3 +--- |c ---+ Node4 +
--+-------+-----|----+-------+----##---+-------+----|-----+-------+->
+ + | + + | |+ + | + +
+-------+ | +-------+ | |+-------+ | +-------+
| | | |
| | | |
Domain A | | |-------| | Domain C
| \ +-------+ | |
| \ + Node5 +--| |
+ +
+ +
+-------+
Domain B
Figure 5: Fast reroute of fault within a domain in multi-domain
network(indicated by ## in the figure)
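The fallback described in the last step of Section 5.2.2 (fast
reroute first, then reroute if it fails) can be sketched in Python as
follows. The function name restore_with_fallback and the
representation of each scheme as a callable returning success or
failure are assumptions made for illustration; they do not correspond
to any protocol element defined in this document.

```python
from typing import Callable, Optional, Sequence, Tuple


def restore_with_fallback(
        schemes: Sequence[Tuple[str, Callable[[], bool]]]) -> Optional[str]:
    """Try each restoration scheme in order; return the first that succeeds."""
    for name, attempt in schemes:
        if attempt():
            return name
    return None  # every scheme failed


# Hypothetical outcome: intra-domain fast reroute fails, reroute succeeds.
used = restore_with_fallback([
    ("fast reroute", lambda: False),
    ("reroute", lambda: True),
])
print(used)
```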
5.3. Inter-domain fault restoration in multi-domain network
Figure 6 shows the node connectivity: node 1 is in domain A, node 4
is in domain C, and nodes 2, 3, and 5 are all in domain B.
Generally, when the failure occurs between domain B and domain C, the
methodology proceeds as follows:
o Node 4 sends a ChannelStatus message to node 3, indicating the
failure to the corresponding upstream node.
o Record the timestamp (T1) when the first bit of the ChannelStatus
message is sent to node 3 along the LSP.
o When node 3 receives the ChannelStatus message from node 4, it
returns a ChannelStatusAck message back to node 4 and correlates
the failure locally. When Node 3 correlates the failure and
verifies that the failure is clear, it has localized the failure
to the data link between Node 3 and node 4. At that time, Node 3
sends a ChannelStatus message to Node 4 indicating that the
failure has been localized.
o Record the timestamp (T2) when the last bit of ChannelStatus
message from node 3 is received by node 4.
o The fault localization delay is T2-T1.
o The notification delay is measured in the same way as for reroute
restoration within a domain in a multi-domain network, as
described in Section 5.2.1.
o The notification delay is T4-T3.
o Record the timestamp (T5) after node 1 receives the PathErr
message. Node 1, as the PSL, computes a new path using either
routing algorithms or a pre-computed scheme. If the failed link
is the only link between domain B and domain C, the backup path
should be chosen so that it bypasses the domain upstream of the
failed link.
o PATH and RESV signaling are responsible for path establishment
request and resource reservation respectively for a new backup
path. Then the traffic is switched from a working path to the
backup path. Record the timestamp (T6) when the first packet of
traffic arrives at the destination node (node 4) through the
backup path.
o Recovery time is T6-T5.
o The total fault restoration time is T2+T4+T6-T1-T3-T5.
| |
+-------+ | +-------+ +-------+ | +-------+
+ Node1 +-----|c----+ Node2 +--- c --+ Node3 +--- |c ---+ Node4 +
--+-------+-----|-----+-------+--------+-------+----|##---+-------+->
+ + | + +--| -+ + | + +
+-------+ | +-------+ | |+-------+ | +-------+
| | | | Domain C
| | | |
Domain A | | |--------| |
| \ +-------+ | |
| \ + Node5 +-| |
+ +
+ +
+-------+
Domain B
Figure 6: Inter-domain fault in multi-domain network (indicated by ##
in the figure)
6. Protocol Extension Requirements
It is assumed that the clocks of all control nodes are synchronized
during the measurement. The control plane reports the recorded times
to the NMS (Network Management System), which is responsible for
computing the sum of the fault restoration phase durations. LMP and
RSVP extensions are required in order to record the precise start and
end times of every restoration phase.
When measuring fault localization, a detecting entity that detects a
fault sends alarm information to its upstream neighbor node through
LMP signaling in the control plane. It is necessary to extend LMP by
adding a FAULT_TIMESTAMP object to the ChannelStatus message. The
FAULT_TIMESTAMP object can be used to record the times at which the
signaling is sent and received, in order to measure the fault
localization time precisely. Then, during fault notification, the
fault indication signal is delivered to the PSL through the RSVP
PathErr message. SEND_ERR_TIMESTAMP and RECEIVE_ERR_TIMESTAMP
objects are added to the PathErr message to record the times at which
the notification signal is sent by the upstream node next to the
fault and received by the PSL, respectively.
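As a rough illustration of how the NMS might use the proposed
objects, the Python sketch below derives the notification delay from
reported timestamp values. This document does not define an encoding
for the objects or a reporting interface, so the TimestampReport
record, its fields, and the function notification_delay are all
hypothetical; only the object names come from the text above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TimestampReport:
    """Hypothetical record reported by a control node to the NMS."""
    object_name: str  # e.g. "SEND_ERR_TIMESTAMP"
    value: float      # seconds on the common, synchronized time base


def notification_delay(reports: Iterable[TimestampReport]) -> float:
    """Return T4-T3 from the two PathErr timestamp objects."""
    by_name = {report.object_name: report.value for report in reports}
    sent = by_name["SEND_ERR_TIMESTAMP"]         # T3, upstream node next to the fault
    received = by_name["RECEIVE_ERR_TIMESTAMP"]  # T4, recorded at the PSL
    return received - sent


# Made-up values, not measurements.
delay = notification_delay([
    TimestampReport("SEND_ERR_TIMESTAMP", 2.5),
    TimestampReport("RECEIVE_ERR_TIMESTAMP", 6.5),
])
print(delay)
```

Because the two timestamps are recorded on different nodes, the
result is only meaningful under the clock synchronization assumption
stated at the start of this section.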
7. Security Considerations
As this document is solely for the purpose of providing metric
methodology and describes neither a protocol nor a protocol
implementation, there are no security considerations associated with
this document.
8. Acknowledgments
We wish to thank Jiuyu Xie, Yongli Zhao and Shengwei Meng for their
comments and help.
The RFC text was produced using Marshall Rose's xml2rfc tool.
9. Normative References
[RFC3473] Berger, L., "Generalized Multi-Protocol Label Switching
(GMPLS) Signaling Resource Reservation Protocol-Traffic
Engineering (RSVP-TE) Extensions", RFC 3473, January 2003.
[RFC4090] Pan, P., Swallow, G., and A. Atlas, "Fast Reroute
Extensions to RSVP-TE for LSP Tunnels", RFC 4090,
May 2005.
[RFC4204] Lang, J., "Link Management Protocol (LMP)",
RFC 4204, October 2005.
[RFC4426] Lang, J., Rajagopalan, B., and D. Papadimitriou,
"Generalized Multi-Protocol Label Switching (GMPLS) Recovery
Functional Specification", RFC 4426, March 2006.
[RFC4427] Mannie, E. and D. Papadimitriou, "Recovery (Protection and
Restoration) Terminology for Generalized Multi-Protocol
Label Switching (GMPLS)", RFC 4427, March 2006.
[RFC4428] Papadimitriou, D. and E. Mannie, "Analysis of Generalized
Multi-Protocol Label Switching (GMPLS)-based Recovery
Mechanisms (including Protection and Restoration)",
RFC 4428, March 2006.
Appendix A. Other Authors
1. Haiyi Zhang
MIIT
No.52 Hua Yuan Bei Lu,Haidian District
Beijing 100083
P.R.China
Phone: +861062300100
Email: Zhanghaiyi@mail.ritt.com.cn
Authors' Addresses
Min Zhang
BUPT
No.10,Xitucheng Road,Haidian District
Beijing 100876
P.R.China
Phone: +8613910621756
Email: mzhang@bupt.edu.cn
URI: http://www.bupt.edu.cn
Lifang Zhang
BUPT
No.10,Xitucheng Road,Haidian District
Beijing 100876
P.R.China
Phone: +8615210889041
Email: capricorn7111@hotmail.com
URI: http://www.bupt.edu.cn/
Yuefeng Ji
BUPT
No.10,Xitucheng Road,Haidian District
Beijing 100876
P.R.China
Phone: +8613701131345
Email: jyf@bupt.edu.cn
URI: http://www.bupt.edu.cn/
Yunbin Xu
BUPT
No.52 Hua Yuan Bei Lu,Haidian District
Beijing 100083
P.R.China
Phone: +8613681485428
Email: xuyunbin@mail.ritt.com.cn
URI: http://www.catr.cn/
Yu Wang
CATR
No.52 Hua Yuan Bei Lu,Haidian District
Beijing 100083
P.R.China
Phone: +8613651161646
Email: wangyu@mail.ritt.com.cn
URI: http://www.catr.cn/