Internet DRAFT - draft-yuan-hiprg-failure-detection-recovery
draft-yuan-hiprg-failure-detection-recovery
HIPRG Tao Yuan
Internet Draft Bo Hu
Intended status: Experimental Shanzhi Chen
Expires: July 2, 2012 Qinqin Chu
Zhangfeng Hu
Wenjun Li
BUPT, China
January 2, 2012
HIP-based Failure Detection and Recovery for Multihoming
draft-yuan-hiprg-failure-detection-recovery-03.txt
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
This Internet-Draft will expire on July 2, 2012.
Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents carefully,
as they describe your rights and restrictions with respect to this
document.
Yuan, et al. Expires January 5, 2012 [Page 1]
Internet-Draft HIP Failure Detection and Recovery January, 2012
Abstract
Fault tolerance is one of greatest feature of multihomed host. This
document proposes a failure detection and communication recovery
mechanism for Host Identity Protocol. It can be applied to HIP by
using new defined HIP parameters. A HIP-enabled multihomed host can
switch to alternative locator if current link breaks down by adopting
this mechanism.
Yuan, et al. Expires January 5, 2012 [Page 2]
Internet-Draft HIP Failure Detection and Recovery January, 2012
Table of Contents
1. Introduction................................................4
2. Alternative Proposals........................................4
2.1. Reap4hip...............................................4
3. HIP locator management.......................................5
3.1. Locator States.........................................5
3.2. Locators of Multihomed Host.............................6
4. Failure Detection and Communication Recovery for HIP..........7
4.1. Initialization Module...................................7
4.2. Link-Monitor Module.....................................8
4.3. Link-Keep Module........................................9
4.4. Recovery Module.........................................9
5. Packet Formats.............................................12
5.1. New HIP Parameter......................................12
5.1.1. KEEPALIVE_TIMEOUT.................................12
5.2. NOTIFICATION Parameter.................................12
5.2.1. KEEPALIVE........................................12
5.2.2. INVALD_LOCATOR....................................13
6. Security Considerations.....................................13
7. IANA Considerations........................................13
8. References.................................................13
8.1. Normative References...................................13
8.2. Informative References.................................13
Authors' Addresses............................................14
Yuan, et al. Expires January 5, 2012 [Page 3]
Internet-Draft HIP Failure Detection and Recovery January, 2012
1. Introduction
Multihomed host has multiple locators available for fault tolerance.
This means it can use an alternative locator if current link breaks
down without re-establishing connection with correspondent peers. To
achieve such failure tolerance capability, host needs a failure
detection mechanism to decide when to process such a locator
switching, and how to recover communication.
RFC5206 describes HIP support for multihoming. It mainly focused on
how to set up multiple connections between multihomed host and
correspondent peers, but left multihoming support in advanced
scenarios for further study. We propose a failure detection and
recovery extension for HIP in this document. It can be implemented
easily without too many modifications to HIP protocol.
2. Alternative Proposals
2.1. Reap4hip
Reap4hip is a solution which considers fault tolerance capability in
multihomed HIP hosts. In order to support such configurations,
reap4hip introduced REAP protocol which described in RFC5534 for HIP,
and thus updated the behavior for HIP multihomed hosts currently
defined.
The REAP protocol provides failure detection and locator pair
exploration for the Shim6 protocol. Reap4hip applied REAP's features
to HIP. It created a REAP instance to communicate with the ESP and
HIP layers. For the on-going source/destination locator pair,
rep4hip sets up two timers to detect communicating condition. Besides,
a new message named Keepalive message is introduced to maintain the
reachability of a locator. Once the path breaks down, hosts try all
possible locator pairs constantly until a new available locator pair
is found.
To implement reap4hip, a new message named Probe message is defined
for HIP. It is used to probe new locator pair for two communicating
hosts after currently used locator pair become invalid. The probe
process would try possible locator pairs by sending Probe message
constantly until at least one working locator pair is found. Since
the Probe message format is huge, the probe process may negatively
impact the performance of both host and network. Besides, Probe
message and Keepalive message is not a HIP parameter, it may not be
applied to existing HIP code easily.
Yuan, et al. Expires January 5, 2012 [Page 4]
Internet-Draft HIP Failure Detection and Recovery January, 2012
3. HIP locator management
3.1. Locator States
A HIP locator can have three kinds of status. The status is used to
track the reachability of a locator. They are: UNVERIFIED, ACTIVE and
DEPRECATED. UNVERYFIED locator can not be used unless the
reachability of it has been verified. ACTIVE indicates that the
reachability of the locator has been verified and the locator has not
been deprecated. DEPRECATED indicates an expired locator. As
described in RFC5206, the following state changes are allowed:
If current status of a locator is UNVERIFIED, the status can be
changed to ACTIVE if the reachability procedure completes
successfully, or can be changed to DEPRECATED if the lifetime expires.
If current status of a locator is ACTIVE, the status can be changed
to UNVERIFIED if there has been no traffic on the address for some
time, or can be changed to DEPRECATED if the lifetime expires.
If current status of a locator is DEPRECATED, the status can be
changed to UNVERIFIED if the host receives a new lifetime for the
locator.
A host may receive a LOCATOR parameter in R1, I2, UPDATE and NOTIFY
packets. For each locator listed in the LOCATOR parameter, the host
should check whether it is already bound to the SPI indicated. If the
locator is already bound, its lifetime is updated. If the status of
it is DEPRECATED, the status is changed to UNVERIFIED. If the locator
is not already bound, the locator is added, and its status is set to
UNVERIFIED. Mark all locators that were not listed in the LOCATOR
parameter as DEPRECATED. Furthermore, if the LOCATOR parameter
contains a new Preferred locator, the host should initiate a change
of the Preferred locator.
A host must verify the reachability of an UNVERIFIED locator. The
status of a newly learned locator MUST initially be set to UNVERIFIED
unless the new locator is advertised in a R1 packet as a new
Preferred locator. A host may also want to verify the reachability
of an ACTIVE locator again after some time, in which case it would
set the status of the locator to UNVERIFIED and reinitiate address
verification.
Yuan, et al. Expires January 5, 2012 [Page 5]
Internet-Draft HIP Failure Detection and Recovery January, 2012
3.2. Locators of Multihomed Host
Before failure detection operates, the two communicating hosts must
be aware of the set of locators which can be used. As described in
RFC5206, HIP multihomed hosts can convey multiple locators by sending
UPDATE message, and also by exchanging locators in R1 and I2 during
the HIP base exchange.
This locator set contains at least one pair of locators (the
currently used locators) in status of ACTIVE, and may contain several
UNVERIFIED or DEPRECATED locators. Along with the normal
communication, host must verify the reachability of all UNVERIFIED
locators. After the address verification procedure, additional ACTIVE
locators are available. Due to such a locator management mechanism in
HIP, in most case, the locator set of two communicating hosts
contains more than one pair of ACTIVE locators in multihoming
scenario
Furthermore, multihomed host must declare one locator from locator
set as Preferred locator on which it prefers to receive data. By
default, the locators used in the HIP base exchange are the Preferred
locators. A host can change the Preferred outgoing locator for
different reasons, e.g., because receiving a LOCATOR parameter that
has the "P" bit set.
This document considers a HIP multihoming scenario in which the two
communicating hosts have exchanged locator information, and indicated
Preferred locators. As shown in Figure 1, the function of failure
detection and communication recovery is added at the MH block. Once
currently used preferred locators become unreachable, hosts should
choose another locator as Preferred locator to maintain communication.
Yuan, et al. Expires January 5, 2012 [Page 6]
Internet-Draft HIP Failure Detection and Recovery January, 2012
+-------+
| TCP | (sockets bound to HITs)
+-------+
|
+-------+
---------->| ESP | {HIT_s, HIT_d} <-> SPI
| +-------+
| |
+-------+ +-------+
| MH |------>| HIP | {HIT_s, HIT_d, SPI} <-> {IP_s, IP_d, SPI}
+-------+ +-------+
|
+-------+
| IP |
+-------+
Figure 1: Architecture for HIP Mobility and Multi-homing (MH)
4. Failure Detection and Communication Recovery for HIP
The failure detection and communication recovery mechanism divides
HIP multihomed host into two kinds: D (Detection) host and K
(Keepalive) host. D host is responsible for monitoring current used
link, and maintains a Send Timer. When D host sends a payload packet,
it turns on the Send Timer. When receiving a payload packet, D host
turns off the Send Timer. K host is the corresponding host of D host.
Its main task is to keep the reachability of current used locator. If
no other packet has been sent since the payload packet has been
received, K host transmits a HIP NOTIFY message with KEEPALIVE
parameter to D host. K host maintains a Keepalive Timer to control
when to send the NOTIFY message. In addition, another procedure of
failure detection and communication recovery can be added in the
reverse direction when needed.
The failure detection and communication recovery mechanism consists
of four functional modules, which are: initialization module, link-
monitor module, link-keep module, and recovery module. These modules
are added at the MH block. The details of each module are described
as follows.
4.1. Initialization Module
Both D host and K host runs initialization module firstly when D host
triggers failure detection and communication recovery scheme. The
initialization module makes D host communicate the timeout value of
Yuan, et al. Expires January 5, 2012 [Page 7]
Internet-Draft HIP Failure Detection and Recovery January, 2012
its Send Timer to the K host as a KEEPALIVE_TIMEOUT parameter in the
I2, R1, or UPDATE messages.
For K host, upon receive packet that contains KEEPALIVE_TIMEOUT
parameter, initialization module will made it set its Keepalive Timer.
The timeout value of Keepalive Timer is set to the value of
KEEPALIVE_TIMEOUT parameter.
The initialization module is responsible for the initial work of
failure detection and communication recovery. After the work of
initialization module, D host and K host have reached an agreement on
how quickly NOTIFY message will be send and how rapidly detection of
failure will take place. The time value of Send Timer can be modified
in real time in order to adapt to unusual situations. The
communication of the two hosts is capable of fault tolerance by
initialization module's work.
4.2. Link-Monitor Module
In HIP, monitor currently used link aims at providing a mechanism to
detect the path between one pair of Preferred locators. If failure
takes place, a new pair of locators should be chosen as Preferred
locators. D host runs link-monitor module to detect the potential
failure. D host has four states under the control of link-monitor
module. The behavior of D host is described as follows:
Wait state. After the work of initialization module or having
received packet from K host, D host is in Wait state. It is the
initial state of D host. D host is waiting for the arrival of upper
packet. Whenever outgoing payload packets are generated, D host enter
Send Packet state.
Send Packet state. D host in Send Packet state is prepared for
sending packet to K host. It will turn on Send Timer, and then send
out the packet. After that, D host enter Detect state. Detect state.
While in Detect state, D host is monitoring current used link, and
the Send Timer is on. If D host received packet from K host in Detect
state, it will turn off Send Timer, and back to Wait state. Otherwise,
after Send Timer expires, if no return traffic from K host, D host
will assume current used link has broken down, and enter Failure
state to explore a new pair of locators.
Failure state. After the expiration of Send Timer, D host enter
Failure state. It will turn off Send Timer, and get ready for
communication recovery. Then, the recovery module will take in charge
of D host.
Yuan, et al. Expires January 5, 2012 [Page 8]
Internet-Draft HIP Failure Detection and Recovery January, 2012
4.3. Link-Keep Module
K host runs Link-Keep Module to keep the reachability of current used
locator. K host has five states under the control of Link-Keep Module.
The behavior of K host is described as follows:
Wait state. After the work of initialization module, having sent out
packet to D host or the expiration of Keepalive Timer, K host is in
Wait state. It is the initial state of K host. K host is waiting for
the arrival of packet from upper layer protocol or D host. Whenever
outgoing payload packets are generated, D host enter Send Packet
state to send out the packet. Whenever incoming payload packets are
received, K host enter Receive Packet state to bring incoming packets
into the proper process.
Send Packet state. K host in Send Packet state is prepared for
sending packet to D host. At this time, if Keepalive Timer is on, K
host will turn it off, and then send out the packet, back to Wait
state.
Receive Packet state. While incoming payload packets are received, K
host is in Receive Packet state. If the received packet is a HIP
NOTIFY packet with INVALD_LOCATOR parameter, K host will enter
Failure state. If it is a payload packet, K host will enter Keep
state.
Keep state. K host maintains the reachability of current used locator
pair in Keep state. If K host in Keep state is going to send packet,
it will turn off Keepalive Timer, and enter Send Packet state.
Otherwise, K host should turn on Keepalive Timer, and send HIP NOTIFY
packet with KEEPALIVE parameter to keep the reachability of current
used locator pair. This process is under the control of Keepalive
Timer. Usually, K host sends out HIP NOTIFY packet with KEEPALIVE
parameter at the frequency of every one-third to one-half of the time
value of Keepalive Timer until timer expires. After Keepalive Timer
expires, K host back to Wait state.
Failure state. K host is in Failure state for having received a HIP
NOTIFY packet with INVALD_LOCATOR parameter. It will turn off Send
Timer, and get ready for communication recovery. Then, the recovery
module will take in charge of K host.
4.4. Recovery Module
After the failure is detected, D host runs recovery module to start a
process to choose new Preferred locator pair to recover communication.
Yuan, et al. Expires January 5, 2012 [Page 9]
Internet-Draft HIP Failure Detection and Recovery January, 2012
There are two work modes that D host can choose to recovery, which
are Active mode and Passive mode.
A. Active mode
If D host is in active mode, it sends a LOCATOR parameter to K host
in an UPDATE packet as described in RFC5206. In this packet, a new
Preferred locator is set. If no responses are obtained during a
period of time, D host will assume the testing locator does not work.
Then, it may enter passive mode or choose another locator as
Preferred locator, and re-send UPDATE packet. The locator selection
order could be a local policy. This process may continue for a period
of time until available locator pair is found. If D host have tried
all its locators, and could not find any available locator, it must
enter passive mode. The behavior of D host in active mode while under
the control of Recovery Module is described as follows:
Choose Address state. It is the initial state of Active mode. D host
will choose a new address as Preferred locator to send UPDATE packet
to K host. Then it enters Wait Response state. D host can switch to
Passive mode on this state.
Wait Response state. After sending out the first UPDATE packet, D
host is in Wait Response state. If no response received from K host,
D host will back to Choose Address state. Otherwise, once receiving
the replied UPDATE packet from K host, it will enter HIP Update state.
HIP Update state. In this state, D host will finish the address
change with K host as defined in HIP protocol. Then, the
communication can be recovered.
For K host under the control of Recovery Module, after receiving the
UPDATE packet, it will send back an UPDATE packet as described in
RFC5206 to D host. Then, the two hosts continue their communication
using the new Preferred locator pair.
B. Passive mode
If D host is in passive mode, it selects one ACTIVE locator from the
locator set which has been exchanged with K host before. If there is
no ACTIVE locator can be used, D host must enter active mode. D host
sends a NOTIFY packet to K host by using the selected ACTIVE locator.
In this NOTIFY packet, the Notify Message Type is set to
INVALID_LOCATOR, which notifies errors in the communication path. The
NOTIFY packet may fail to reach K host. In such a case, D host may
Yuan, et al. Expires January 5, 2012 [Page 10]
Internet-Draft HIP Failure Detection and Recovery January, 2012
enter active mode or try another ACTIVE locator if no responses are
obtained during a period of time. After trying all ACTIVE locators,
if no available locator is found, D host must enter active mode. The
behavior of D host in passive mode while under the control of
Recovery Module is described as follows:
Choose K host's Address state It is the initial state of passive mode.
D host will choose a new address of K host to send HIP NOTIFY packet
with INVALD_LOCATOR parameter. Then it enters Wait Response state. D
host can switch to Passive mode when in this state.
Wait Response state. After sending out the HIP NOTIFY packet with
INVALD_LOCATOR parameter, D host is in Wait Response state. If no
response received from K host, D host will back to Choose K host's
Address state. Otherwise, once receiving the replied UPDATE packet
from K host, it will enter HIP Update state.
HIP Update: D host and K host will finish the address change as
defined in HIP protocol. Then, the communication can be recovered.
For K host under the control of Recovery Module, after receiving the
NOTIFY packet, it is aware of the failure of previously used path,
and then it enters active mode. By default, K host may use the
source/destination locator pair of the NOTIFY packet as new Preferred
locator to initiate the active mode. Additionally, K host is also
allowed to choose Preferred locator according to local policy. D
host firstly enters active mode or passive mode is a local policy.
The main difference in the two modes is the selection of preferential
changing side of two communicating hosts. No matter which combination
is chosen, the communication can always be recovered.
This mechanism is also suitable for multi-point network: When a D
host detects link failures and then finish recovery, it can send a
broadcast message about the status of the locators of both two ends
(if we run active mode link recovery, the D host will send a
broadcast message about the status of the D host's own locators to
the network; if we run passive mode link recovery, the D host will
send a broadcast message about the status of the K host's locators to
the network;), The connections of the network determine the
specific scope of broadcast. With this mechanism, the other hosts of
the network can select the right locator to set up a link without
having to wait for the timer to send out, and reducing
the signaling overhead.
Yuan, et al. Expires January 5, 2012 [Page 11]
Internet-Draft HIP Failure Detection and Recovery January, 2012
5. Packet Formats
5.1. New HIP Parameter
5.1.1. KEEPALIVE_TIMEOUT
This parameter is used to indicate how quickly KEEPALIVE NOTIFY
message will be send. The value of KEEPALIVE_TIMEOUT is set to the
value of Send Timeout of the host. KEEPALIVE_TIMEOUT can be sent in
the I2, R1, or UPDATE messages. After receiving a packet with
KEEPALIVE_TIMEOUT, the peer maps this value to set its Keepalive
Timer.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type = 10 | Length = 4 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Reserved | Keepalive Timeout |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2: KEEPALIVE_TIMEOUT Parameter Format
The fields are exactly the same as defined in RFC5534.
Type: This field identifies the parameter and MUST be set to 10
(Keepalive Timeout).
Length: 4.
Reserved: 16-bit field reserved for future use. Set to zero upon
transmit and MUST be ignored upon receipt.
Keepalive Timeout: Value in seconds corresponding to suggested
Keepalive Timeout value for the peer.
5.2. NOTIFICATION Parameter
5.2.1. KEEPALIVE
This document defines a new kind of NOTIFICATION parameter named
KEEPALIVE. It is sent if there is no outgoing packet to inform
corresponding peer that current path is OK. Types of NOTIFY message
in the range 0-16383 are intended for reporting errors and in the
range 16384-65535 for other status information. Thus, the type value
of KEEPALIVE should range among 16384-65535.
Yuan, et al. Expires January 5, 2012 [Page 12]
Internet-Draft HIP Failure Detection and Recovery January, 2012
5.2.2. INVALD_LOCATOR
This document defines a new kind of NOTIFICATION parameter named
INVALD_LOCATOR. Host in passive mode sends it to corresponding peer
to indicate an error in the path. The type value of INVALD_LOCATOR
should range among 0-16383.
6. Security Considerations
To be done.
7. IANA Considerations
This document defines a KEEPALIVE Notify Message Type and a
INVALD_LOCATOR Notify Message Type for HIP.
8. References
8.1. Normative References
[RFC5206] P. Nikander, T. Henderson, C. Vogt, J.Arkko, "End-Host
Mobility and Multihoming with the Host Identity Protocol",
RFC 5206, April 2008.
[RFC5201] R.Moskowitz, P. Nikander, P. Jokela, T. Henderson, "Host
Identity Protocol", RFC 5201, April 2008.
8.2. Informative References
[REAP4HIP] A. de la Oliva, M. Bagnulo, "Fault tolerance
configurations for HIP multihoming", <draft-oliva-hiprg
reap4hip-00>, July, 2007.
[RFC5534] J. Arkko, I. van Beijnum, "Failure Detection and Locator
Exploration Protocol for IPv6 Multihoming", RFC 5534, June,
2009
[RFC5533] E. Nordmark. and M. Bagnulo, "Shim6: Level 3 Multihoming
Shim Protocol for IPv6", RFC 5533, June 2009
[BFD] D. Katz, D. Ward, "Bidirectional Forwarding Detection",
<draftietf-bfd-base-11(work in progress)>, January, 2010
Yuan, et al. Expires January 5, 2012 [Page 13]
Internet-Draft HIP Failure Detection and Recovery January, 2012
Authors' Addresses
Tao Yuan
State Key Laboratory of Networking and Switching,
Beijing University of Posts and Telecommunications
Beijing, P.R.China, 100876
Email: alberty2004@gmail.com
Bo Hu
State Key Laboratory of Networking and Switching,
Beijing University of Posts and Telecommunications
Beijing, P.R.China, 100876
Email: bj.hubo@gmail.com
Shanzhi Chen
State Key Laboratory of Networking and Switching,
Beijing University of Posts and Telecommunications
Beijing, P.R.China, 100876
Qinqin Chu
State Key Laboratory of Networking and Switching,
Beijing University of Posts and Telecommunications
Beijing, P.R.China, 100876
Email: qinqinchu@gmail.com
Zhangfeng Hu
State Key Laboratory of Networking and Switching,
Beijing University of Posts and Telecommunications
Beijing, P.R.China, 100876
Email: zfhu2001@gmail.com
Wenjun Li
State Key Laboratory of Networking and Switching,
Beijing University of Posts and Telecommunications
Beijing, P.R.China, 100876
liwenjunee@yahoo.cn
Yuan, et al. Expires January 5, 2012 [Page 14]