Internet DRAFT - draft-li-bfd-somfdl
Network Working Group
Internet Draft Li Yiwei
Intended Status: Informational XJTU
Zhao Jihong
XJTU
Wang Li
XJTU
Expires: March 2012 September 2011
Service-Oriented Mechanism for Fault Detection and Localization
draft-li-bfd-somfdl-00.txt
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79. This document may contain material
from IETF Documents or IETF Contributions published or made publicly
available before November, 2008. The person(s) controlling the
copyright in some of this material may not have granted the IETF
Trust the right to allow modifications of such material outside the
IETF Standards Process. Without obtaining an adequate license from
the person(s) controlling the copyright in such materials, this
document may not be modified outside the IETF Standards Process, and
derivative works of it may not be created outside the IETF Standards
Process, except to format it for publication as an RFC or to
translate it into languages other than English. This document is an
Internet-Draft and is in full conformance with all provisions of
Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire in March 2012.
Copyright Notice
Li, Zhao, Wang Expires - March 2012 [Page 1]
SOMFDL September 2011
Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document.
Abstract
This document describes a Service-Oriented Mechanism for Fault
Detection and Localization (SOMFDL). The mechanism introduces the
concepts of "Chord Supplementary" and "QoS Trigger". The real-time
QoS level of the forwarding path is monitored, and the network
launches the mechanism only when the QoS level no longer satisfies
the requirements of the service being transmitted, at which point
fault detection and localization are carried out. The supporting
protocol design in this document is intended to ensure the
operability of the mechanism. With this mechanism, a network can
accomplish fault detection and localization on a time scale of
milliseconds while sending only a few datagrams.
Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
Table of Contents
1. Introduction...................................................3
1.1 Terminology................................................4
2. Application Scenarios..........................................4
3. Specific Operation Process.....................................5
3.1 QoS Trigger................................................5
3.2 Fault Index................................................7
3.3 Fault Confirm..............................................7
3.4 Fault Alarm................................................8
3.5 Fault Localization.........................................8
4. Design of Datagram.............................................9
Security Considerations..........................................11
IANA Considerations..............................................11
References.......................................................12
Acknowledgments..................................................13
Author's Addresses...............................................13
1. Introduction
With the development of telecommunication technology and the rise of
users' QoS expectations, services have become the main driving force
behind the development of telecommunication networks. In this
context, new services such as VoIP, IPTV and 3G services emerge
continually. However, unexpected events may occur while service data
is being transmitted and cause a serious decline in network
performance, so that the QoS of the service can no longer be
guaranteed. Such a condition is called a service fault. Service
faults include interruption of services and congestion of services.
The former is usually caused by misconfiguration of software or
hardware parameters, while the latter is mainly caused by an
inappropriate routing policy.
It is therefore necessary to introduce a service-oriented fault
detection and localization mechanism into IP networks. Such a
mechanism helps ensure reliable transmission of service data,
improves the survivability of the network, and provides good fault
detection and localization capability.
Although the existing fault detection methods, such as the Slow-Hello
mechanism and the BFD mechanism, are widely applied, they still
cannot complete fault detection within a second. In addition, these
mechanisms can cause network jitter.
The existing fault localization methods, for example the Loop Back
mechanism, cannot combine the localization information they collect,
which means that the network cannot locate the fault accurately at a
node or a link. Moreover, none of the existing fault detection and
localization methods is service-oriented, so they cannot satisfy the
needs of the new service control network.
To address these problems, this document proposes the
Service-Oriented Mechanism for Fault Detection and Localization. The
mechanism applies two new techniques, "Chord Supplementary" and "QoS
Trigger", which give it the advantages listed below.
a. The mechanism realizes service-oriented fault detection and
localization.
b. The mechanism is triggered only when the network is in error,
which reduces the total number of datagrams and therefore lowers the
likelihood of network jitter.
c. The mechanism lets the source node and the destination node of
the service forwarding path work collaboratively, share the fault
localization information, and locate the fault accurately.
This mechanism can shorten the time scale of fault detection and
localization to milliseconds.
1.1 Terminology
This document assumes the terminology defined in [RFC2753]. For
convenience, the definition of a few key terms is repeated here:
SOMFDL: Service-Oriented Mechanism for Fault Detection and
Localization, the mechanism described in this document.
Chord Supplementary: The use of nodes that are not on the service
forwarding path to guarantee connectivity between the source node and
the destination node.
QoS Trigger: As soon as the destination node detects a decline in
QoS, it generates a QoS Trigger datagram and sends it to the source
node through the "Chord". Starting the mechanism in this way lowers
the total number of datagrams and reduces network jitter.
Explicit Routing: With MPLS, an explicit route can be obtained, which
means that the service provider (the source node) knows all the nodes
on the service forwarding path. All packets of a single service are
transmitted along this path.
2. Application Scenarios
Networks are evolving so that a wide variety of services are carried
over IP, and most new services are built on IP technology. The
service-oriented fault detection and localization mechanism described
in this document takes these trends into account.
The mechanism can be applied in various IP-based networks, including
traditional IP networks, MPLS networks and new overlay networks.
In these network environments, the mechanism can reduce the overhead
of detecting and locating faults, diagnose the fault type accurately
at lower cost, estimate the cause of a service fault, and locate the
fault at a specific site. In short, the mechanism is intended to
satisfy the fault detection and localization requirements of
different kinds of services.
3. Specific Operation Process
The working process of SOMFDL consists of the procedures listed
below:
a. QoS Trigger: The network constructs the end-to-end service
forwarding path and completes Chord Supplementary. The destination
node then monitors the real-time QoS. As soon as the QoS declines,
the destination node informs the source node and triggers the
mechanism.
b. Fault Index: The source node searches for the EndNode along the
downstream path, while the destination node does the same along the
upstream path.
c. Fault Confirm: Each EndNode found in the Fault Index procedure
periodically sends a confirm datagram to its unreachable neighbor
node in order to determine which kind of fault has happened,
congestion or interrupt.
d. Fault Alarm: Each EndNode generates a congestion or interrupt
alarm datagram and sends it to the source node.
e. Fault Localization: Each EndNode generates a fault localization
datagram and sends it to the source node. The source node then
analyzes the datagrams and determines the cause and location of the
service fault.
The following subsections describe each procedure in detail.
3.1 QoS Trigger
When the source node receives a service request from the destination
node, it combines the network constraints with the expected QoS level
of the service and finds the best service forwarding path from the
source node to the destination node. At the same time, the chord
between the source node and the destination node is supplemented. The
destination node then monitors the real-time QoS level of the
service. As soon as the QoS drops suddenly, the destination node
immediately generates a QoS Trigger datagram and sends it to the
source node through the chord. When the source node receives the QoS
Trigger datagram, the mechanism is triggered.
The services mentioned above are of many types, including common
Internet services, IPTV, VoIP, online games, 3G services, and so on.
The techniques used to build the best forwarding path differ from
service to service, but they are not the focus of this mechanism.
The QoS indexes of a service mainly consist of bandwidth, delay,
jitter (variation of delay) and packet loss rate.
The real-time QoS level is usually monitored as follows. First, the
source node divides service data into different levels. The six
high-order bits of the ToS field in the IPv4 header (the Traffic
Class field in IPv6) carry the DSCP, which provides 64 different
values that can be used to mark the priority of different service
data. If all nodes in the network topology are configured to classify
data based on the DSCP tag, they can easily identify the service data
packets from the source node and monitor the real-time QoS level of
the marked service data. For example, when several end-to-end service
forwarding paths in the network share a node (or a link), the shared
node (or the two nodes connected by the shared link) can distinguish
the service forwarding paths by examining the DSCP value, source IP
address and destination IP address in the service data packets. Thus,
the QoS of different services can be monitored in time.
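As an illustration of the classification step described above, the
following Python sketch extracts the DSCP from the ToS (or Traffic
Class) byte and builds the (DSCP, source address, destination address)
key by which a shared node could tell service forwarding paths apart.
The names FlowKey, dscp_from_tos and flow_key are hypothetical and are
not defined by this document; the only fact relied on is that the DSCP
occupies the six high-order bits of the byte.

```python
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical key identifying one service forwarding path at a node."""
    dscp: int
    src: str
    dst: str


def dscp_from_tos(tos: int) -> int:
    """Return the 6-bit DSCP held in the high-order bits of the ToS byte."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("ToS / Traffic Class must fit in one byte")
    return tos >> 2  # drop the two low-order (ECN) bits


def flow_key(tos: int, src: str, dst: str) -> FlowKey:
    """Build the key used to separate service flows sharing a node."""
    return FlowKey(dscp_from_tos(tos), src, dst)


# Two services between the same endpoints, marked with different DSCPs,
# map to different keys and can therefore be monitored separately.
voip = flow_key(0xB8, "192.0.2.1", "198.51.100.7")   # DSCP 46
video = flow_key(0x88, "192.0.2.1", "198.51.100.7")  # DSCP 34
```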
QoS diagnosis information is carried by the QoS Trigger datagram. The
information identifies one or more QoS indexes that have suddenly
degraded, such as a sudden rise in packet loss rate.
The process of Chord Supplementary is as follows. After the source
node has combined the network constraints with the expected QoS level
of the service, the best service forwarding path is built; it is
denoted by Path. All paths that can transmit data between the source
node and the destination node constitute the set A. All paths in A
other than Path are called chords, and together they are denoted by
B. The paths in B may overlap to some extent. Path is an element of
A, and B is a subset of A.
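The relationship between A, Path and B can be written down directly.
The sketch below is only an illustration under the assumption that
each path is represented as a sequence of node names; the function
name chord_set and the example topology are hypothetical.

```python
def chord_set(all_paths, service_path):
    """Return B, the set of all paths in A other than the service path."""
    a = {tuple(p) for p in all_paths}   # the set A
    path = tuple(service_path)          # the element Path
    if path not in a:
        raise ValueError("Path must be an element of A")
    return a - {path}                   # B = A \ {Path}


# Hypothetical topology: three usable paths from S to D.
A = [["S", "R1", "R2", "D"], ["S", "R3", "D"], ["S", "R4", "R2", "D"]]
B = chord_set(A, ["S", "R1", "R2", "D"])
```

Here two chords remain in B, and they may share nodes with each other
or with Path (R2 in this example), as the text above allows.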
Through the chord B, the source node periodically sends Multi_hop
Alive datagrams. As long as the source node receives response
datagrams from the opposite end, connectivity between the source node
and the destination node is good. Here the mechanism is concerned
only with the connectivity of the chord; the intermediate nodes that
the datagrams traverse are not considered. Since B is a set of
several connected paths between the source node and the destination
node, the probability that all of them break down at the same time is
very low, so the connectivity of the chord can be well guaranteed.
3.2 Fault Index
In this section, the source node generates a Fault Index datagram and
sends it downstream hop by hop in order to find the EndNode on the
forwarding path. When a downstream node receives the Fault Index
datagram, it replies to the upstream node with a response datagram.
It also adds its health information to the Fault Index datagram and
sends the datagram to the next hop. This process continues until a
node fails to receive a response datagram from its next hop within
the index time. Meanwhile, the destination node also generates a
Fault Index datagram and sends it upstream hop by hop, operating in
the same way as the source node.
In other words, the source node and the destination node both
generate Fault Index datagrams and send them toward the fault along
the service forwarding path and its reverse path, respectively.
The node that does not receive the response datagram is the last node
that the index datagram can reach. It is called the (Upstream or
Downstream) Reachable EndNode.
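The hop-by-hop search can be sketched as a small simulation. This is
a minimal illustration, not a protocol implementation: the function
reachable_end_node and the callback responds(a, b), which stands for
"node b answers node a's Fault Index datagram within the index time",
are assumptions introduced here.

```python
def reachable_end_node(path, responds):
    """Walk the path hop by hop and return the Reachable EndNode.

    Returns None when every hop answers, i.e. no fault is found
    in this direction.
    """
    current = path[0]
    for next_hop in path[1:]:
        if not responds(current, next_hop):
            return current      # last node the index datagram reached
        current = next_hop      # the datagram is forwarded to the next hop
    return None


path = ["S", "R1", "R2", "R3", "D"]

# Node fault at R2: R2 neither answers nor forwards.
node_down = lambda a, b: "R2" not in (a, b)
upstream = reachable_end_node(path, node_down)          # search from S
downstream = reachable_end_node(path[::-1], node_down)  # search from D

# Link fault between R1 and R2: only that hop fails.
link_down = lambda a, b: {a, b} != {"R1", "R2"}
up_link = reachable_end_node(path, link_down)
down_link = reachable_end_node(path[::-1], link_down)
```

In the node-fault case the two EndNodes (R1 and R3) are not neighbors;
in the link-fault case they are (R1 and R2).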
The index time is (1+0.1*N)*t. Here, t is the interval between the
time when a node sends an index datagram to its neighbor and the time
when it receives the response datagram from that neighbor. N is an
integer greater than or equal to zero; its value is selected with the
QoS diagnosis information in the QoS Trigger datagram taken into
account.
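As a worked example of the formula, the sketch below evaluates the
index time for a few values of N. The function name index_time and
the choice of milliseconds as the unit of t are assumptions made for
illustration only.

```python
def index_time(t: float, n: int) -> float:
    """Index time (1 + 0.1*N) * t, in the same unit as t."""
    if n < 0:
        raise ValueError("N must be an integer greater than or equal to zero")
    return (1 + 0.1 * n) * t


# With a measured interval t of 10 ms (assumed unit):
#   N = 0  -> the node waits exactly t
#   N = 5  -> the node waits 1.5 * t
#   N = 10 -> the node waits 2 * t
waits = {n: index_time(10.0, n) for n in (0, 5, 10)}
```

A larger N makes each node wait longer before declaring its next hop
unreachable, at the cost of a slower search.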
3.3 Fault Confirm
In this section, the Upstream Reachable EndNode generates Fault
Confirm datagrams and sends them periodically to its unreachable next
hop. If the next-hop node receives a datagram, it immediately replies
with a response datagram. If the Reachable EndNode receives any
response datagram, the decline of QoS is caused by service
congestion, and the fault type is service congestion. Conversely, if
the Reachable EndNode does not receive any response datagram, the
decline of QoS is caused by a service interrupt, and the fault type
is service interrupt. The Downstream Reachable EndNode carries out
this procedure in the same way.
The confirm time is N*t. The value of N is selected with the expected
QoS level of this kind of service taken into account.
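The decision rule of the Fault Confirm procedure can be sketched as
follows. The names FaultType, confirm_time and classify_fault are
hypothetical; the input is assumed to be one boolean per Fault Confirm
datagram sent during the confirm time, true when a response came back.

```python
from enum import Enum


class FaultType(Enum):
    CONGESTION = "service congestion"
    INTERRUPT = "service interrupt"


def confirm_time(t: float, n: int) -> float:
    """Confirm time N * t, in the same unit as t."""
    return n * t


def classify_fault(responses) -> FaultType:
    """Any response within the confirm time means congestion;
    no response at all means interrupt."""
    return FaultType.CONGESTION if any(responses) else FaultType.INTERRUPT


# Four confirm datagrams were sent; one response got through.
congested = classify_fault([False, False, True, False])
# Four confirm datagrams were sent; none was answered.
interrupted = classify_fault([False, False, False, False])
```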
Service congestion can have various causes, for example several links
sending datagrams to a single link at the same time, a high-speed
link transmitting data to a low-speed link, non-critical services
preempting critical services, or service overflow.
A service interrupt can be caused by a node fault or a link fault.
After the procedure described above, the Reachable EndNodes are
renamed Fault Localization Nodes. They are located on either side of
the fault, immediately adjacent to it. The upstream one is called the
Upstream Fault Localization Node, and the other is called the
Downstream Fault Localization Node.
3.4 Fault Alarm
The Upstream Fault Localization Node generates a Fault Alarm datagram
containing the fault type (service congestion or service interrupt)
and sends it to the source node. When the source node receives the
Fault Alarm datagram, it learns the fault type. The Downstream Fault
Localization Node takes no action in this section.
3.5 Fault Localization
After the fault type has been determined, the Upstream and Downstream
Fault Localization Nodes both generate Fault Localization datagrams
and send them to the source node. Along the way, each normal node
that a datagram traverses adds its health information to the
datagram. After the source node has received the Fault Localization
datagrams from both directions, it computes and estimates where the
fault has happened.
The localization datagram generated by the Upstream Fault
Localization Node is sent to the source node along the reverse of the
service forwarding path. The one generated by the Downstream Fault
Localization Node is sent to the destination node along the service
forwarding path, and the destination node then forwards it to the
source node through the chord.
The source node locates the fault as follows.
The source node compares the localization information (mainly the
health information of the nodes) with the explicit routing
information. If the localization information lacks the ID of a node,
the service fault has happened at that node. If the localization
information is complete, the link between the Upstream and Downstream
Fault Localization Nodes is congested or interrupted. That is, if the
Upstream and Downstream Fault Localization Nodes are neighbors, the
fault must be on the link between them; if they are not neighbors,
the fault must be caused by the node between them.
The source node can XOR the localization information with the
explicit route to obtain the ID of the faulty node.
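The comparison performed by the source node can be sketched as below.
This is an illustration under stated assumptions, not the normative
procedure: node IDs are assumed to be distinct non-zero integers, the
reported IDs are assumed to be the union of the IDs carried by the
two Fault Localization datagrams, and the function name locate_fault
and its return values are hypothetical. The case of more than one
missing node is not covered by this document and is only reported
as such.

```python
from functools import reduce
from operator import xor


def locate_fault(explicit_route, reported_ids, upstream, downstream):
    """Compare reported node IDs with the explicit route.

    Returns ("link", (upstream, downstream)) when no node is missing,
    ("node", id) when exactly one node is missing, and
    ("unknown", missing_ids) otherwise.
    """
    reported = set(reported_ids)
    missing = [n for n in explicit_route if n not in reported]
    if not missing:
        return ("link", (upstream, downstream))
    if len(missing) == 1:
        # XOR of all route IDs with all reported IDs cancels every ID
        # that appears in both, leaving the ID of the missing node.
        fault_id = reduce(xor, explicit_route, 0) ^ reduce(xor, reported, 0)
        return ("node", fault_id)
    return ("unknown", tuple(missing))


route = [1, 2, 3, 4, 5]  # hypothetical node IDs along the explicit route
node_case = locate_fault(route, [1, 2, 4, 5], upstream=2, downstream=4)
link_case = locate_fault(route, [1, 2, 3, 4, 5], upstream=2, downstream=3)
```

The XOR shortcut only identifies a single missing node; a set
difference, as computed for the missing list, is the more general
check.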
4. Design of Datagram
The format of the datagram is shown in the figure below.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Vers |M|Q|S|C|A|L|E|  QD   |C/I|       N       |    Length     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               Session Identification Code (ID)                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               Echo Receive Time Interval (ETI)                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             Location Diagnostic Information (LDI)             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            Reserve                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure: Datagram Format
The function of each field is listed below.
Vers field: Version. This 3-bit field indicates the version of the
datagram. For example, for the first version the field is set to 001.
M field: Multi-Hop-Alive. This field is 1 bit long. When the source
node starts to transmit data to the destination node, the field is
set to 1 to indicate a Multi_hop Alive datagram.
Q field: QoS Trigger. This field is 1 bit long. When the destination
node discovers a sudden fall of QoS, the field is set to 1 to
indicate a QoS Trigger datagram. This means that the mechanism is in
the QoS Trigger section.
S field: Search. This field is 1 bit long. When the source node
receives the QoS Trigger datagram from the destination node, the
field is set to 1 to indicate a Fault Index datagram. This means that
the mechanism is in the Fault Index section.
C field: Confirm. This field is 1 bit long. When the Reachable
EndNodes have been found, the field is set to 1 to indicate a Fault
Confirm datagram. This means that the mechanism is in the Fault
Confirm section.
A field: Alarm. This field is 1 bit long. When the fault type has
been determined, the field is set to 1 to indicate a Fault Alarm
datagram. This means that the mechanism is in the Fault Alarm
section: the Upstream Fault Localization Node sends the alarm to the
source node so that the source node is informed of the fault type.
L field: Location. This field is 1 bit long. When the fault type has
been determined, the field is set to 1 to indicate a Fault
Localization datagram. This means that the mechanism is in the Fault
Localization section.
E field: Echo. This field is 1 bit long. When a response datagram is
needed, the field is set to 1. Otherwise, it is set to 0.
QD field: QoS Diagnostic. This field is 4 bits long. From the
low-order bit to the high-order bit, the bits represent delay, packet
loss rate, jitter and bandwidth, respectively. For example, 0001
indicates delay, 0010 indicates packet loss rate, and 0111 indicates
that delay, packet loss rate and jitter have degraded at the same
time. Note that this field must be used together with the Q field to
indicate which performance index has degraded.
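The bit assignment of the QD field maps naturally onto a flag type.
The sketch below is an illustration only; the class name QD and the
helper degraded_indexes are hypothetical, while the bit values follow
the low-order to high-order assignment given above.

```python
from enum import IntFlag


class QD(IntFlag):
    """QD bits, from the low-order bit to the high-order bit."""
    DELAY = 0b0001
    PACKET_LOSS = 0b0010
    JITTER = 0b0100
    BANDWIDTH = 0b1000


def degraded_indexes(qd_value: int):
    """List the names of the QoS indexes flagged in a 4-bit QD value."""
    value = QD(qd_value & 0xF)
    return [flag.name for flag in QD if flag in value]


# Delay, packet loss rate and jitter degrade at the same time.
combined = QD.DELAY | QD.PACKET_LOSS | QD.JITTER
names = degraded_indexes(0b0111)
```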
C/I field: Service Congestion/Interruption. This field is 2 bits
long. It also needs to be used with the Q field to describe the fault
type. A value of 01 means that the fault type is service congestion.
A value of 10 means that the fault type is service interrupt. Any
other value means that the service has no fault.
N field: This 8-bit field carries the multiplier N. When it is used
together with the S field, it selects the fault index time,
(1+0.1*N)*t, so that N counts multiples of 0.1*t. When it is used
together with the C field, it selects the fault confirm time, N*t, so
that N counts multiples of t.
Length field: This 8-bit field indicates the length of the fault
detection and localization datagram.
ID field: Session Identification Code. This field is 32 bits long. It
carries a unique non-zero value generated by the source node and is
used to distinguish different sessions.
ETI field: Echo Receive Time Interval. This field is 32 bits long. It
represents the interval between the time when a node sends an index
datagram to its neighbor and the time when it receives the response
datagram from that neighbor.
LDI field: Location Diagnostic Information. This field is 32 bits
long. It stores the IDs of the Upstream and Downstream Fault
Localization Nodes as well as the health information of the nodes
that the datagrams traverse.
SDP field: Service Data Patch. This field stores service data when
index and confirm datagrams are sent. Its length varies from service
to service.
Reserve field: This 32-bit field is reserved for future use.
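To make the field layout concrete, the following Python sketch packs
and unpacks the fixed part of the datagram. Several points are
assumptions that this document does not specify: fields are taken in
the order in which they are described, most significant bit first, in
network byte order; the variable-length SDP field is omitted; and the
Length field is assumed to count bytes, giving 20 for the fixed part.
The class Header and its attribute names are hypothetical.

```python
import struct
from dataclasses import dataclass

FLAG_NAMES = ("M", "Q", "S", "C", "A", "L", "E")  # order of the flag bits


@dataclass
class Header:
    vers: int = 1
    flags: frozenset = frozenset()   # subset of FLAG_NAMES
    qd: int = 0                      # 4 bits
    ci: int = 0                      # 2 bits: 01 congestion, 10 interrupt
    n: int = 0                       # 8 bits
    length: int = 20                 # 8 bits, assumed to be in bytes
    session_id: int = 0
    eti: int = 0
    ldi: int = 0
    reserve: int = 0

    def pack(self) -> bytes:
        word = (self.vers & 0x7) << 29
        for i, name in enumerate(FLAG_NAMES):
            if name in self.flags:
                word |= 1 << (28 - i)        # M is bit 28, E is bit 22
        word |= (self.qd & 0xF) << 18
        word |= (self.ci & 0x3) << 16
        word |= (self.n & 0xFF) << 8
        word |= self.length & 0xFF
        return struct.pack("!5I", word, self.session_id, self.eti,
                           self.ldi, self.reserve)

    @classmethod
    def unpack(cls, data: bytes) -> "Header":
        word, sid, eti, ldi, reserve = struct.unpack("!5I", data[:20])
        flags = frozenset(name for i, name in enumerate(FLAG_NAMES)
                          if (word >> (28 - i)) & 1)
        return cls(word >> 29, flags, (word >> 18) & 0xF,
                   (word >> 16) & 0x3, (word >> 8) & 0xFF, word & 0xFF,
                   sid, eti, ldi, reserve)


# A QoS Trigger datagram reporting a rise in packet loss rate.
trigger = Header(flags=frozenset({"Q"}), qd=0b0010, n=5,
                 session_id=0x1234)
raw = trigger.pack()
decoded = Header.unpack(raw)
```

The first 32-bit word of this example is 0x28080514: version 001,
only the Q flag set, QD 0010, C/I 00, N 5 and Length 20.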
Security Considerations
Because SOMFDL can be used in IP-based networks, especially MPLS
networks, it may be tied to the stability of the network
infrastructure (such as routing protocols). However, the effects of
an attack on a SOMFDL session are expected to be less serious than
those of an attack on a BFD session, because a SOMFDL session starts
only when it is triggered, whereas BFD sends detection datagrams all
the time.
In addition, Chord Supplementary makes the mechanism more reliable.
The "Chord" is a set of paths, and it is hard for an attacker to
damage all the extra paths between the source node and the
destination node. The connectivity of the detection path can
therefore be maintained, which reduces the exposure of the mechanism
to attack.
IANA Considerations
This document defines the format of the SOMFDL datagram. The fields
of the datagram should be administered by IANA. The purpose of each
field is summarized below.
The "M" field indicates the connectivity of the chord.
The "Q", "S", "C", "A", "L" and "E" fields indicate the different
types of datagram sent by the network elements in the different
working sections of SOMFDL.
The "QD" field indicates which QoS parameters have degraded.
The "C/I" field indicates the fault type, that is, whether the fault
is congestion or interrupt.
The "ID" field distinguishes the sessions of different types of
services. The "ETI" field carries the time interval that controls the
length of each working section of SOMFDL.
The "LDI" field stores the localization information added by the
network nodes on the service forwarding path. When the source node
has obtained the complete information from the "LDI" field, it
computes the place where the fault has happened.
The "Reserve" field could be used for authentication and other
necessary functions.
References
Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3031] Rosen, E., Viswanathan, A. and Callon, R., "Multiprotocol
Label Switching Architecture", RFC 3031, January 2001.
[RFC3212] Jamoussi, B., Andersson, L. and Callon, R., et al.,
"Constraint-Based LSP Setup using LDP", RFC 3212, January
2002.
[RFC5880] Katz, D. and Ward, D., "Bidirectional Forwarding
Detection", RFC 5880, June 2010.
[RFC5882] Katz, D. and Ward, D., "Generic Application of
Bidirectional Forwarding Detection (BFD)", RFC 5882, June
2010.
[RFC5883] Katz, D. and Ward, D., "Bidirectional Forwarding
Detection (BFD) for Multihop Paths", RFC 5883, June 2010.
[RFC5884] Aggarwal, R., Kompella, K., Nadeau, T. and Swallow, G.,
"Bidirectional Forwarding Detection (BFD) for MPLS Label
Switched Paths (LSPs)", RFC 5884, June 2010.
Informative References
[RFC2026] Bradner, S., "The Internet Standards Process -- Revision
3", BCP 9, RFC 2026, October 1996.
[RFC2753] Yavatkar, R., Pendarakis, D. and Guerin, R., "A Framework
for Policy-based Admission Control", RFC 2753, January
2000.
[RFC3945] Mannie, E., "Generalized Multi-Protocol Label Switching
(GMPLS) Architecture", RFC 3945, October 2004.
[RFC5881] Katz, D. and Ward, D., "Bidirectional Forwarding Detection
(BFD) for IPv4 and IPv6 (Single Hop)", RFC 5881, June 2010.
Acknowledgments
This document was written to provide a service-oriented mechanism for
fault detection and localization.
Demand mode was inspired by draft-ietf-bfd-multihop-09.txt and
draft-ietf-bfd-base-11.txt, both of which were submitted by D. Katz
and D. Ward.
The authors would also like to thank the writers of the RFCs relevant
to MPLS.
Author's Addresses
Li Yiwei
Xi'an Jiaotong University
No.28, Xianning West Road, Xi'an, Shaanxi, P.R. China
Email: leeeve@stu.xjtu.edu.cn
Zhao Jihong
Xi'an Jiaotong University
No.28, Xianning West Road, Xi'an, Shaanxi, P.R. China
Email: eeleeg@gmail.com
Wang Li
Xi'an Jiaotong University
No.28, Xianning West Road, Xi'an, Shaanxi, P.R. China
Email: wanglee513@gmail.com