Network Working Group                                      S. Vinapamula
Internet-Draft                                          Juniper Networks
Intended status: Informational                              S. Sivakumar
Expires: September 5, 2015                                 Cisco Systems
                                                            M. Boucadair
                                                          France Telecom
                                                                T. Reddy
                                                                   Cisco
                                                           March 4, 2015
Application-Initiated Flow High Availability Awareness through PCP
draft-vinapamula-flow-ha-09
This document specifies a mechanism for a host to signal, via the Port Control Protocol (PCP), which connections should be protected against network failures. The network then selects these connections for the high availability mechanisms enabled at the network side.
This approach assumes that applications and users know better which of their connections are sensitive than any heuristic enabled at the network side to guess which connections should be check-pointed.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 5, 2015.
Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
The risk of Internet service disruption is critical in service provider and enterprise networking environments. Such a risk is often mitigated by introducing active/backup systems. This not only helps to minimize the risk of service disruption, but also facilitates maintenance operations (e.g., hitless S/W or H/W upgrades).
In addition, the nature of some connections leads some of the network functions invoked when the connection is established to create and maintain connection-specific states. For an active/backup failover to preserve such connections after a network failure, these states need to be check-pointed to the backup system.
Heuristics based on the protocol, mapping lifetime, etc., are used in the network to elect which connections need to be check-pointed (e.g., by means of HA techniques). This document advocates for an application-initiated approach that would allow applications/users to signal to the network which of their connections are critical.
This document specifies how PCP can be extended to signal which connection should be check-pointed for HA (High Availability). This document does not make any assumption on the PCP-controlled device that will process the PCP-formatted signaling information from PCP clients. These devices are likely to be flow-aware.
The proposed approach is aligned with networking trends that advocate open network APIs for interacting with applications and services (e.g., [RFC7149]). The policy decision-making process at the network side is enriched with information signaled by applications, for instance by means of PCP.
Regardless of the selected technology or design (e.g., HA-based designs), reliably securing connections is expensive in terms of memory, CPU, and other resources. Moreover, check-pointing may not be required for all connections because not all connections are critical. The challenge is then to identify which connections to check-point.
Typically, this is addressed by identifying long-lived connections and check-pointing to the backup, for service continuity, only the states of connections that have lived long enough.
However, check-pointing long-lived connections raises the following issues:
This proposal is based on the assumption that an application or user is the best judge of which of its connections are critical.
An application or user may explicitly identify the connections that need to be check-pointed by means of a PCP client, using the CHECKPOINT_REQUIRED option as described in Figure 1.
The entry to be backed up is indicated by the content of a MAP or PEER message.
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |Option Code=TBA|  Reserved     |         Option Length         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Option Name: CHECKPOINT_REQUIRED
    Number: <TBA>
    Purpose: Indicate if an entry needs to be check-pointed.
    Valid for Opcodes: MAP, PEER
    Length: 0.
    May appear in: request, response.
    Maximum occurrences: 1.
Figure 1: CHECKPOINT_REQUIRED PCP Option
The description of the fields is as follows:
An application or user can take advantage of this PCP option to explicitly indicate which of the connections need to be check-pointed and should not be disrupted. The processing of this option by the PCP server will then yield the check-pointing of the corresponding states by the relevant devices or functions dynamically controlled by the PCP server.
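For illustration only, the option layout in Figure 1 can be sketched in Python as follows. The option code value 0xC0 is a hypothetical placeholder (the real code is still TBA), and the function names are assumptions of this sketch rather than part of any PCP library. The 8-bit code, 8-bit reserved field, 16-bit length, and 4-octet padding of option data follow the generic PCP option format of [RFC6887].

```python
import struct

# Hypothetical placeholder: the draft leaves the option code as "TBA".
# 0xC0 (192) is picked only because it sits in the 192-223 range.
CHECKPOINT_REQUIRED_CODE = 0xC0


def encode_checkpoint_required(code: int = CHECKPOINT_REQUIRED_CODE) -> bytes:
    """Build the 4-octet option: code (8 bits), reserved (8), length (16) = 0."""
    return struct.pack("!BBH", code, 0, 0)


def has_checkpoint_required(options: bytes,
                            code: int = CHECKPOINT_REQUIRED_CODE) -> bool:
    """Walk a buffer of concatenated PCP options and look for the option."""
    offset = 0
    while offset + 4 <= len(options):
        opt_code, _reserved, opt_len = struct.unpack_from("!BBH", options, offset)
        if opt_code == code:
            return True
        # Skip the header plus the option data, padded to a multiple of 4 octets.
        offset += 4 + ((opt_len + 3) & ~3)
    return False
```

A PCP client would append the bytes returned by encode_checkpoint_required() after the opcode-specific part of a MAP or PEER request, and could run has_checkpoint_required() over the options of the response.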
Communication between application/user and PCP client is implementation-specific.
Support of the CHECKPOINT_REQUIRED option by PCP servers and PCP clients is optional. This option (Code TBA; see Figure 1) may be included in a PCP MAP/PEER request to indicate a connection is to be protected against network failures.
There is a risk that every PCP client may wish to check-point every connection, which can potentially overload the system. Administrators SHOULD restrict, on a per-PCP-client basis, the number of connections that can be elected to be backed up and the rate of check-pointing.
The PCP client includes a CHECKPOINT_REQUIRED option in a MAP or PEER request to signal that the corresponding mapping is to be protected.
If the PCP client does not receive a CHECKPOINT_REQUIRED option in response to a PCP request that enclosed the CHECKPOINT_REQUIRED option, this means that either the PCP server does not support the option, or the PCP server is configured to ignore the option or the PCP server cannot satisfy the request expressed in this option (e.g., because of a lack of resources).
If the CHECKPOINT_REQUIRED option is not included in the PCP client request, the PCP server MUST NOT include the CHECKPOINT_REQUIRED option in the associated response.
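The client-side reading of a response described above can be sketched as follows. This is a minimal illustration; the function name and the choice to raise an exception on an unsolicited option are assumptions of this sketch, not behavior mandated by this document.

```python
def checkpoint_granted(request_had_option: bool, response_has_option: bool) -> bool:
    """Tell whether the mapping is protected, from the client's point of view.

    The option is echoed only when the server honored it. Its absence means
    the server does not support it, ignores it, or lacks resources.
    """
    if response_has_option and not request_had_option:
        # The server MUST NOT add the option on its own; treating this as an
        # error is an assumption of this sketch.
        raise ValueError("unsolicited CHECKPOINT_REQUIRED in response")
    return request_had_option and response_has_option
```

The client cannot tell the three "not granted" cases apart from the response alone, so the sketch collapses them into a single False result.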
When the PCP server receives a CHECKPOINT_REQUIRED option, the PCP server checks whether it can honor this request, depending on whether resources are available for check-pointing. If there are no resources available for check-pointing, but there are resources available to honor the MAP/PEER request, a response is sent back to the PCP client without the CHECKPOINT_REQUIRED option (i.e., the request is processed as any MAP/PEER request that does not convey a CHECKPOINT_REQUIRED option). If check-pointing resources are still available and the quota for this PCP client is not reached, the PCP server tags the corresponding entry as eligible for the HA mechanism and sends back the CHECKPOINT_REQUIRED option in the positive answer to the PCP client.
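The server-side decision above can be sketched as follows, assuming the MAP/PEER request itself has already been accepted. The CheckpointPolicy class, its capacity and per_client_quota knobs, and the dictionary used as a response are all hypothetical names introduced for this illustration. Treating an exhausted quota the same way as exhausted resources is also an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class CheckpointPolicy:
    capacity: int            # total entries that may be check-pointed (assumed knob)
    per_client_quota: int    # entries allowed per PCP client (assumed knob)
    in_use: int = 0
    per_client: dict = field(default_factory=dict)

    def try_reserve(self, client: str) -> bool:
        """Reserve one check-pointing slot, or refuse without side effects."""
        used = self.per_client.get(client, 0)
        if self.in_use >= self.capacity or used >= self.per_client_quota:
            return False
        self.in_use += 1
        self.per_client[client] = used + 1
        return True


def process_request(policy: CheckpointPolicy, client: str, has_option: bool) -> dict:
    """Build the positive response for an already accepted MAP/PEER request."""
    # The option is echoed only if it was requested and a slot was reserved;
    # otherwise the request is handled like one without the option.
    tagged = has_option and policy.try_reserve(client)
    return {"result": "SUCCESS", "checkpoint_required": tagged}
```

With capacity=2 and per_client_quota=1, a second protected mapping from the same client is still created but comes back without the option, as does any request arriving once both slots are taken.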
To update the check-pointing behavior of a mapping maintained by the PCP server, the PCP client generates a PCP MAP/PEER renewal request that includes a CHECKPOINT_REQUIRED option to indicate that this mapping has to be check-pointed, or that omits the CHECKPOINT_REQUIRED option to indicate that this mapping does not need to be check-pointed anymore. Upon receipt of the PCP request, the PCP server performs the same operations used to validate a MAP/PEER request that updates an existing mapping. If the validation checks pass, the PCP server updates the check-point flag associated with that mapping accordingly (i.e., it is set if a CHECKPOINT_REQUIRED option was included in the update request, or it is cleared if no CHECKPOINT_REQUIRED option was included), and the PCP server returns the response to the PCP client accordingly.
What information to check-point and how to check-point it are out of scope of this document and are left to implementations. Likewise, the decision by a user or application to request check-pointing in a PCP request may be automatic, semi-automatic, or made by a human; this behavior is left to application implementations. In the case of managed CPEs, a service provider may influence which connections are check-pointed.
For honored requests, it is RECOMMENDED to check-point the state to the backup before a response is sent to the PCP client.
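The renewal behavior can be sketched as follows. The mapping table is modeled as a plain dictionary and the validated argument stands in for the usual MAP/PEER validation checks; both are assumptions of this illustration, as is the function name.

```python
def apply_renewal(mappings: dict, key: tuple, has_option: bool,
                  validated: bool = True) -> bool:
    """Set or clear the check-point flag of an existing mapping on renewal.

    Returns the resulting flag, i.e. whether the response would carry the
    CHECKPOINT_REQUIRED option.
    """
    entry = mappings[key]
    if validated:
        # Option present: flag set. Option absent: flag cleared.
        entry["checkpoint"] = has_option
    return entry["checkpoint"]


# Example: a mapping keyed by (protocol, internal address, internal port).
table = {("tcp", "192.0.2.10", 5060): {"checkpoint": False}}
apply_renewal(table, ("tcp", "192.0.2.10", 5060), has_option=True)
```

The sketch does not re-check resources or quotas on renewal; a real server would presumably apply the same resource checks as for a new request before setting the flag.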
Below are provided some examples for illustration purposes:
In order to avoid that every application includes a CHECKPOINT_REQUIRED option in its PCP requests, the following items are assumed:
PCP-related security considerations are discussed in [RFC6887].
An attacker who observes PCP messages can use the CHECKPOINT_REQUIRED option to identify critical flows. This issue is mitigated if the network on which the PCP messages are sent is fully trusted. Means to defend against attackers who can intercept packets between the PCP server and the PCP client should be enabled. In some deployments, access control lists (ACLs) can be installed on the PCP client, the PCP server, and the network between them, so that only communications between trusted PCP elements are allowed. If the networking environment between the PCP client and the PCP server is not secure, means to protect the content of PCP messages against exposure (e.g., DTLS [RFC6347]) are recommended.
A network device can always override the end-user signalling, i.e., what is signaled by the PCP client, if the instructions are conflicting with the network policies.
The following PCP Option Code is to be allocated in the "Specification Required" range (64-95, 192-223); the registry is maintained at http://www.iana.org/assignments/pcp-parameters:
Thanks to Reinaldo Penno, Stuart Cheshire, Dave Thaler, Prashanth Patil, and Christian Jacquenet for their comments.
[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC6887]  Wing, D., Cheshire, S., Boucadair, M., Penno, R., and P. Selkirk, "Port Control Protocol (PCP)", RFC 6887, April 2013.
[RFC6347]  Rescorla, E. and N. Modadugu, "Datagram Transport Layer Security Version 1.2", RFC 6347, January 2012.
[RFC7149]  Boucadair, M. and C. Jacquenet, "Software-Defined Networking: A Perspective from within a Service Provider Environment", RFC 7149, March 2014.
It was tempting to include additional fields in the option but this would lead to a more complex design that is not justified, e.g.,: