Network Working Group | L. Zhang |
Internet-Draft | W. Wang |
Intended status: Informational | BUPT University |
Expires: November 7, 2015 | Y. Chen |
| Tsinghua University |
| L. Sun |
| BUPT University |
| May 6, 2015 |
Detection of Primary Server Failure in DHCPv6 Failover
draft-zhang-dhc-dhcpv6-failure-detection-01
In DHCPv6 failover or other multi-server deployment scenarios, an automatic failure detection capability may be desirable. This document describes a method by which the secondary server can detect a link failure between the primary server and clients. This document does not define any protocol details.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on November 7, 2015.
Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
[RFC7031] describes the requirements of DHCPv6 failover. In the DHCPv6 failover scenario, two DHCPv6 servers, acting as partners, serve the clients in the domain. One server should act as the primary server, which is responsible for answering clients' requests. The other should be the secondary server, which is expected to become responsive if the primary server fails.
Popular failover implementations generally allow one server to detect its partner's failure. This can be achieved through various mechanisms, such as timer-based solutions. However, such failure detection methods are not sufficient, because they do not cover the situation in which the connection between the primary and secondary servers is normal while the link between the primary server and clients is down. Under these circumstances, it would be desirable for the secondary server to detect such a failure automatically and take over responsibility for providing DHCPv6 services.
This document describes a method for the secondary server to detect such a failure between the primary server and clients. The potential preference conflict between the responsive secondary server and the primary server is also considered. Note that this method can also be applied to the scenario described in [RFC6853] and to other deployments with multiple DHCPv6 servers.
[RFC3315] allows multiple servers to work in one domain for high availability and other benefits. One of the main purposes of multi-server DHCPv6 deployment and failover is to handle server failures. Server failures can be divided into two categories: the first is a failure between the primary server and the secondary server; the second is a failure between the primary server and clients. Existing failover implementations have focused more on the former and have already come up with several automatic detection methods.
A common scenario of the second kind is a (physical) link failure between the primary server and clients. Such a link failure may not harm the primary server itself, but it can make the primary server unreachable for clients. If the secondary server is not able to detect such a failure, it will assume everything is in order and will not provide DHCPv6 service for redundancy.
The failure detection method described in this document is based on the following assumptions.
Based on the assumptions above, if the primary server is not reachable for a client, the client may keep sending SOLICIT or REQUEST messages (if stateless DHCPv6 is used, the client may keep sending INFORMATION-REQUEST messages).
The secondary server should maintain a local counter. The counter is incremented each time the secondary server receives a duplicated message (e.g., a SOLICIT message) from the same client. A threshold value should also be set on the secondary server. If the count exceeds the threshold and the secondary server cannot find anything wrong with the primary server, it concludes that there is a failure between the primary server and clients. The threshold value may differ between deployments, so the specific threshold value and the implementation of the counter are out of scope of this document.
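The counter-and-threshold check described above can be sketched as follows. This is only an illustration under stated assumptions, not a specified implementation: the class name, the use of a client DUID string as the key, the rule that a message counts as "duplicated" when the same client has already sent that message type, and the boolean primary_looks_healthy (standing in for whatever partner health check the deployment already has) are all hypothetical choices made for this example.

```python
class DuplicateCounter:
    """Hypothetical per-client duplicate-message counter for a secondary server."""

    def __init__(self, threshold):
        # Deployment-specific value; the draft leaves the number out of scope.
        self.threshold = threshold
        self.counts = {}   # client DUID -> number of duplicated messages seen
        self.seen = set()  # (client DUID, message type) pairs already observed

    def observe(self, client_duid, msg_type):
        """Record one received message and return the client's duplicate count."""
        key = (client_duid, msg_type)
        if key in self.seen:
            # Assumption: a repeat of the same message type from the same
            # client is treated as a duplicated message.
            self.counts[client_duid] = self.counts.get(client_duid, 0) + 1
        else:
            self.seen.add(key)
        return self.counts.get(client_duid, 0)

    def link_failure_suspected(self, client_duid, primary_looks_healthy):
        """True if the count exceeds the threshold while the primary seems fine."""
        over_threshold = self.counts.get(client_duid, 0) > self.threshold
        return primary_looks_healthy and over_threshold


counter = DuplicateCounter(threshold=3)
for _ in range(5):
    counter.observe("duid-a", "SOLICIT")  # 1 original + 4 duplicates
suspected = counter.link_failure_suspected("duid-a", primary_looks_healthy=True)
```

If the existing partner check already reports the primary as failed, the method returns False, because that case is handled by the ordinary failover detection rather than by this counter.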
The detection method described in this document is likely to lead to a situation in which both the primary server and the secondary server are responsive, at least for those clients whose link to the primary server is not down. The reason is that the primary server cannot detect that there is a failure between itself and some of the clients, so it continues to provide DHCPv6 service, which may conflict with the secondary server. As a result, some clients may receive responses from both servers and be unable to decide which one to use.
One possible solution is that every time the secondary server decides to take over responsibility for providing DHCPv6 service, it should inform the primary server. The notification should be sent regardless of whether the primary server is available, since the purpose is to make sure that two servers are not offering service at the same time.
Once the primary server failure is detected and the notification process is finished, the secondary server may either start to act as a responsive server or only report the condition and do nothing else.
A malicious client can mount a kind of DoS attack by flooding the network with SOLICIT messages, thereby causing the secondary server to become responsive while the primary server is actually still responsive to the other clients.
Further security considerations are TBD.
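The order of operations after detection (notify the primary first, then either serve or only report) might look like the sketch below. Since this document defines no protocol details, everything here is hypothetical: the Action enum, the notify_primary callable, the use of OSError to model an unreachable primary, and the decision to proceed after a failed notification attempt are assumptions made purely for illustration.

```python
from enum import Enum


class Action(Enum):
    SERVE = "serve"              # secondary starts answering clients
    REPORT_ONLY = "report_only"  # secondary only reports the condition


def handle_detected_failure(notify_primary, policy, log):
    """Run the post-detection steps; return True if the secondary now serves."""
    # The notification is attempted whether or not the primary is reachable.
    try:
        notify_primary()
        log.append("primary notified")
    except OSError:
        # Assumption: an unreachable primary does not block the next step.
        log.append("notification attempted; primary unreachable")

    if policy is Action.SERVE:
        log.append("secondary now answering clients")
        return True
    log.append("failure reported; secondary stays passive")
    return False


events = []
serving = handle_detected_failure(lambda: None, Action.SERVE, events)
```

Passing the notification step in as a callable keeps the sketch independent of any particular server-to-server channel, which is left open here.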
This document does not include an IANA request.
[RFC2119] | Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. |
[RFC3315] | Droms, R., Bound, J., Volz, B., Lemon, T., Perkins, C. and M. Carney, "Dynamic Host Configuration Protocol for IPv6 (DHCPv6)", RFC 3315, July 2003. |
[RFC7031] | Mrugalski, T. and K. Kinnear, "DHCPv6 Failover Requirements", RFC 7031, September 2013. |
[RFC6853] | Brzozowski, J., Tremblay, J., Chen, J. and T. Mrugalski, "DHCPv6 Redundancy Deployment Considerations", BCP 180, RFC 6853, February 2013. |