Network Working Group S. Vinapamula
Internet-Draft Juniper Networks
Intended status: Standards Track S. Sivakumar
Expires: December 15, 2014 Cisco Systems
June 13, 2014

Flow high availability through PCP
draft-vinapamula-flow-ha-01

Abstract

This document describes a mechanism for a host to signal various network functions' High Availability (HA) module to checkpoint interested connections through PCP.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on December 15, 2014.

Copyright Notice

Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.


Table of Contents

1. Introduction

Internet service continuity is critical in service provider environment. To achieve this, most service providers have active-backup systems.

For some of the network functions, a state would be maintained for every connection for processing subsequent packets of that connection. For service continuity of those connections on backup when active fail, that corresponding state had to be checkpointed on the backup.

NAPT is one such network function, where a state is maintained for every connection.

This document describes some of the existing issues in the way checkpointing is done today, and tries to address them.

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

3. Issues with the existing implementations

In a high availability (HA) deployment, it is expensive in terms of memory, CPU and other resources to checkpoint all connections state. Also checkpointing may not be required for all connections as all connections may not be critical. But, this leaves a challenge to identify what connections to checkpoint.

Typically, this is addressed by identifying long lived connections and checkpointing state of only those connections that lived long enough, to the backup for service continuity.

However, following are the issues with that approach:

  1. A connection which could potentially be long-lived would face disruption in service on failure of active system, before it had not lived long enough for it to be checkpointed.
  2. A connection may not be long lived but critical like shorter phone conversations.
  3. Similarly not every long lived connection need to be critical, say a free-service connection of a hosted service need not be checkpointed while a paid-service connection has to be checkpointed.

4. Proposed Solution

This proposal is based on the assumption that an application or user is the best judge to decide which of its connections' are critical.

An application or user MUST indicate that one or more of its connections is critical and disruption is not desired. This will trigger checkpointing of state to the backup.

An application/user may indicate the desire for checkpoint through PCP client, and PCP client MUST mark bit zero of "Reserved" bits in the PCP request header as below. Here after in the document, this bit is referred as HA bit.

All other bits in the "Reserved" field MUST be marked zero on transmissions and MUST be ignored on reception.

	0                   1                   2                   3
	0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	|  Version = 2  |R|   Opcode    |         Reserved             1|
	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	|                 Requested Lifetime (32 bits)                  |
	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	|                                                               |
	|            PCP Client's IP Address (128 bits)                 |
	|                                                               |
	|                                                               |
	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	:                                                               :
	:             (optional) Opcode-specific information            :
	:                                                               :
	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	:                                                               :
	:             (optional) PCP Options                            :
	:                                                               :
	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

PCP request header with HA bit being the zero bit in the "Reserved" field.

PCP server MAY honor this request depending on whether resources are available for checkpointing. If there are no resources available for checkpointing, but there are resources available to honor request say PCP MAP/PEER request, request is honored and there is no error returned.

What information to checkpoint and how to checkpoint is out of scope of this document, and is left for implementation.

It is RECOMMENDED to checkpoint state on backup for honored requests before a response is sent to the PCP client.

Communication between application/user and PCP client is implementation specific.

5. Some Usage Examples

  1. Disruption in a phone connection is not desired. Application that is initiating a phone connection MUST mark connection HA bit in the header, while initiating a PCP request for checkpointing.
  2. Similarly disruption in media streaming is not desired. A user hosting a media service, MUST mark HA bit in the header while initiating a mapping request, and MAY mark connection associated with that mapping, depending on whether the connection is from a paid subscriber or from a free subscriber through a PEER request. So checkpointing mapping doesn't result in auto checkpointing of connections, as it gives flexibility to the end user to pick specific connections only to checkpoint.

6. Signaling HA for other network function

In conjunction with NAT, other network functions that MAY maintain state for each conneciton such as Stateful Firewall, IPSec, Load balancing etc..., MAY register to PCP server, and MAY be triggered for checkpointing respective state of that connection.

7. Security Considerations

A network function device can always override the end user signalling if the administrator specified rules are not in policy with the end user signaling.

There is a risk that every client may wish to checkpoint every connection, which can potentially load the system. Admin may restrict number of connections and the rate of checkpointing on per PCP client.

8. IANA Considerations

This document does not require any action from IANA.

9. Normative references

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC6887] Wing, D., Cheshire, S., Boucadair, M., Penno, R. and P. Selkirk, "Port Control Protocol (PCP)", RFC 6887, April 2013.

Authors' Addresses

Suresh Vinapamula Juniper Networks 1194 North Mathilda Avenue Sunnyvale, CA 94089 USA Phone: +1 408 936 5441 EMail: sureshk@juniper.net
Senthil Sivakumar Cisco Systems 7100-8 Kit Creek Road Research Triangle Park, NC 27760 USA Phone: +1 919 392 5158 EMail: ssenthil@cisco.com