Network Working Group | L. Zhang |
Internet-Draft | W. Jia |
Intended status: Standards Track | BUPT University |
Expires: March 16, 2015 | Y. Chen |
| S. Yan |
| Tsinghua University |
| September 12, 2014 |
Detection of Primary Server Failure in DHCPv6 Failover
draft-zhang-dhc-dhcpv6-failure-detection-00
In a DHCPv6 failover scenario, an automatic failure detection capability may be desirable. This document describes a detection method with which the secondary server can detect a link failure between the primary server and clients.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on March 16, 2015.
Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
[RFC7031] describes the requirements for DHCPv6 failover. In a DHCPv6 failover scenario, two DHCPv6 servers, acting as partners, serve the clients in the domain. One server acts as the primary server and is responsible for answering clients' requests. The other acts as the secondary server, which is expected to become responsive if the primary server fails.
[I-D.ietf-dhc-dhcpv6-failover-design] defines the notion of an auto-partner-down capability, with which a server can automatically enter the PARTNER-DOWN state without operator intervention. It also proposes a timer-based CONTACT message mechanism, with which a server can detect that its partner is unreachable. However, this mechanism is not sufficient, because it cannot handle the situation in which the connection between the primary and secondary servers is intact but the primary server is unreachable for clients.
This document describes a method for the secondary server to detect such a situation. It also considers the potential conflict between a responsive secondary server and the primary server.
In some cases the primary server is unreachable for clients (likely all of them), even though it is still responsive. A physical link failure between the primary server and clients may lead to such a situation. In this situation, the secondary server should be able to detect the failure. It should become responsive if it is allowed to do so (i.e., if the auto-partner-down capability is enabled).
The CONTACT-message-based mechanism described in Section 8.4 of [I-D.ietf-dhc-dhcpv6-failover-design] may not be sufficient for the secondary server to detect such a primary server failure, because the connection between the primary and secondary servers is not interrupted.
Assume that the secondary server is reachable by clients while the primary server is not (at least for some of the clients). In this case, the affected clients may keep sending SOLICIT or REQUEST messages.
When the secondary server receives SOLICIT or REQUEST messages but does not see any BNDUPD message coming from the primary server for a period of TBD1, it may consider the primary server unresponsive.
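The timer-based check above can be illustrated with a minimal sketch. It is not a normative algorithm: the class and method names are hypothetical, the value used for TBD1 is a placeholder, and measuring the period from the first client message seen since the last BNDUPD is an assumption, since this document does not yet define where the period starts.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder only: this document leaves the period (TBD1) unspecified.
TBD1_SECONDS = 30.0


@dataclass
class BndupdSilenceDetector:
    """Hypothetical helper run by the secondary server."""

    silence_period: float = TBD1_SECONDS
    # Time of the first SOLICIT/REQUEST seen since the last BNDUPD (assumption).
    first_unanswered: Optional[float] = None

    def on_bndupd(self, now: float) -> None:
        # A BNDUPD from the primary shows it is still serving clients,
        # so the silence period starts over.
        self.first_unanswered = None

    def on_client_message(self, msg_type: str, now: float) -> bool:
        """Return True if the primary may be considered unresponsive."""
        if msg_type not in ("SOLICIT", "REQUEST"):
            return False
        if self.first_unanswered is None:
            self.first_unanswered = now
        return now - self.first_unanswered >= self.silence_period
```

A server implementation would call on_client_message() for each client message it receives and on_bndupd() for each BNDUPD it receives from its partner; a True result is only a hint that the detection condition has been met.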
An alternative method is that, if the secondary server receives SOLICIT messages from the same client TBD2 times, it may consider that a primary server failure has occurred. This method requires the secondary server to maintain state (or to leverage the lease database) to record the times of receipt of SOLICIT messages from each client.
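The counting method can be sketched in a similar way. Again this is only an illustration under stated assumptions: the names are hypothetical, the value used for TBD2 is a placeholder, clients are keyed by DUID, and clearing a client's record once the primary is known to have served it is an assumption that this document does not specify.

```python
from collections import defaultdict
from typing import DefaultDict, List

# Placeholder only: this document leaves the count (TBD2) unspecified.
TBD2_COUNT = 3


class RepeatedSolicitDetector:
    """Hypothetical per-client SOLICIT bookkeeping on the secondary server."""

    def __init__(self, threshold: int = TBD2_COUNT) -> None:
        self.threshold = threshold
        # Client DUID -> receipt times of SOLICIT messages from that client.
        self.solicit_times: DefaultDict[bytes, List[float]] = defaultdict(list)

    def on_solicit(self, client_duid: bytes, now: float) -> bool:
        """Record a SOLICIT; return True once the client reaches the threshold."""
        times = self.solicit_times[client_duid]
        times.append(now)
        return len(times) >= self.threshold

    def on_client_served(self, client_duid: bytes) -> None:
        # Assumption: evidence that the primary served this client
        # (for example a BNDUPD for its binding) clears the record.
        self.solicit_times.pop(client_duid, None)
```

Keeping the receipt times rather than a bare counter leaves room for ageing out old entries, which this sketch does not attempt.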
Once a primary server failure is detected, the secondary server may move to the PARTNER-DOWN state and start to serve as a responsive server.
The detection method described in this document is likely to lead to a situation in which both the primary server (which is most likely in the NORMAL state) and the secondary server (which is expected to move to the PARTNER-DOWN state) are responsive, at least to some clients. According to Section 9.4.2 of [I-D.ietf-dhc-dhcpv6-failover-design], however, the secondary server cannot enter the PARTNER-DOWN state if the primary server is in the NORMAL state.
The most important reason is that the primary server cannot detect this failure on its own, and thus stays in the NORMAL state and prevents the secondary server from becoming responsive.
A possible solution is for the primary server to perform a failure detection process of its own. Once such a failure is detected, the primary server should move to a state in which it does not prevent the secondary server from serving as a responsive server (e.g., the RECOVER-DONE state).
A malicious client can mount a form of DoS attack by flooding the network with SOLICIT messages, thus pushing the secondary server into the PARTNER-DOWN state while the primary server is actually responsive to the other clients.
Further security considerations are TBD.
This document does not include an IANA request.
[I-D.ietf-dhc-dhcpv6-failover-design] | Mrugalski, T. and K. Kinnear, "DHCPv6 Failover Design", Internet-Draft draft-ietf-dhc-dhcpv6-failover-design-04, September 2013. |
[RFC2119] | Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. |
[RFC7031] | Mrugalski, T. and K. Kinnear, "DHCPv6 Failover Requirements", RFC 7031, September 2013. |
[RFC6853] | Brzozowski, J., Tremblay, J., Chen, J. and T. Mrugalski, "DHCPv6 Redundancy Deployment Considerations", BCP 180, RFC 6853, February 2013. |