Operations and Management Area Working Group Q. Wu
Internet-Draft M. Wexler
Intended status: Standards Track Huawei
Expires: December 7, 2014 P. Jain
Nuage Networks
June 5, 2014

Problem Statement and Architecture for Transport Independent OAM in the multiple layer network
draft-ww-opsawg-multi-layer-oam-00.txt

Abstract

Operations, Administration, and Maintenance (OAM) mechanisms [RFC6291] are basic building blocks for every communication layer and technology. In current practice, many technologies and layers have their own OAM protocols, with little or no re-use of software and hardware among them. As a result, vendors and operators waste considerable effort across the whole OAM life-cycle whenever a new technology is introduced, and integrating OAM across multiple technologies is extremely difficult. In many cases it is desirable to have a generic OAM that covers heterogeneous networking technologies. An example of this generic approach is the Bidirectional Forwarding Detection [BFD] mechanism, which offers a way to monitor, troubleshoot, and maintain the network and services in support of multi-layer OAM, independent of media, data protocols, and routing protocols. Generic OAM tools can be deployed over various encapsulating protocols and on various media types.

An example of an environment in which a generic and integrated OAM protocol would be valuable is Service Function Chaining (SFC). A service function chain is composed of a series of service functions that can act in different layers but together provide an end-to-end chain or path from a source to a destination in a given order [I.D-ietf-sfc-problem-statement]. In a service function chaining environment, it is necessary to provide end-to-end OAM across some or all entities, involving many layers. OAM information should be exchanged between service functions in different layers while using various encapsulating protocols. In some cases OAM should cross different administrative and/or maintenance domains.

This document sets out the problem statement and architecture for generic OAM in Service Layer Routing. It covers at least the basic OAM functions and information necessary to monitor and maintain the network: Connectivity Verification (CV), Path Verification and Continuity Checks (CC), Path Discovery / Fault Localization, and Performance Monitoring.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on December 7, 2014.

Copyright Notice

Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.


1. Introduction

Operations, Administration, and Maintenance (OAM) mechanisms [RFC6291] are basic building blocks for every communication layer and technology. The basic concepts of OAM and its functional roles in monitoring and diagnosing the behavior of telecommunications networks have long been studied at Layer 1, Layer 2, and Layer 3. Certain OAM functions are used in many management applications for (i) defect and failure detection, (ii) reporting the defect/failure information, (iii) defect/failure localization, (iv) performance monitoring, and (v) service recovery.

In current practice, many technologies and layers have their own OAM protocols, with little or no re-use of software and hardware among them. As a result, vendors and operators waste considerable effort across the whole OAM life-cycle whenever a new technology is introduced, and integrating OAM across multiple technologies is extremely difficult. In networks with more than one technology, maintenance and troubleshooting are done per technology and per layer, so the operational process can be very cumbersome. In many cases it is desirable to have a generic OAM that covers heterogeneous networking technologies. Generic OAM tools should be deployable over various encapsulating protocols and on various media types. An example of this generic approach is the Bidirectional Forwarding Detection [BFD] mechanism, which offers a way to monitor, troubleshoot, and maintain the network and services in support of multi-layer OAM, independent of media, data protocols, and routing protocols.

An example of an environment in which a generic and integrated OAM protocol would be valuable is Service Function Chaining (SFC). A service function chain is composed of a series of service functions that can act in different layers but together provide an end-to-end chain or path from a source to a destination in a given order [I.D-ietf-sfc-problem-statement]. In a service function chaining environment, it is necessary to provide end-to-end OAM across some or all entities, involving many layers. OAM information should be exchanged between service functions in different layers while using various encapsulating protocols. In some cases OAM should cross different administrative and/or maintenance domains.

This document sets out the problem statement and architecture for generic OAM in the multi-layer network. It outlines the problems caused by the variety of existing OAM protocols and their impact on the introduction of new technologies. The scope of this document covers at least the basic OAM functions and information necessary to monitor and diagnose the network: Connectivity Verification (CV), Path Verification and Continuity Checks (CC), Path Discovery / Fault Localization, and Performance Monitoring.

1.1. What is Generic OAM in the multi-layer network?

In a multi-layer network, generic OAM is the ability to exchange OAM information across layers between nodes along the forwarding path, and to gather that information and provide it to the management application through a unified interface. OAM information includes OAM configuration and operational data abstracted from various network technologies, protocols, and layers.

2. Terminology

2.1. Standards Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

3. Overview of Use Cases

3.1. Fault localization in multi-layer network

A user who wishes to issue a Ping or Traceroute command, or to initiate session monitoring, should be able to do so in the same manner regardless of the underlying protocol or technology. Consider a scenario in which an IP ping from device A to device B fails. Between devices A and B there are IEEE 802.1 bridges a, b, and c. Assume that a, b, and c use [8021Q] CFM. Upon detecting the IP-layer ping failure, the user may wish to "go down" to the Ethernet layer and issue the corresponding fault verification (LBM) and fault isolation (LTM) tools, using the same API.
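The scenario above can be illustrated with a minimal sketch of a layer-independent connectivity check. Everything in it is an assumption made for illustration: the `GenericOam` class, the `OamResult` record, and the simulated probes are hypothetical and do not correspond to any existing API; real probes would send ICMP Echo or CFM LBM messages instead of consulting a table.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class OamResult:
    layer: str       # e.g. "ip" or "ethernet"
    tool: str        # layer-specific tool that was actually used
    target: str
    reachable: bool


class GenericOam:
    """Hypothetical layer-independent front end for connectivity checks."""

    def __init__(self) -> None:
        self._probes: Dict[str, Tuple[str, Callable[[str], bool]]] = {}

    def register(self, layer: str, tool: str,
                 probe: Callable[[str], bool]) -> None:
        self._probes[layer] = (tool, probe)

    def verify(self, layer: str, target: str) -> OamResult:
        tool, probe = self._probes[layer]
        return OamResult(layer, tool, target, probe(target))


# Simulated network state (an assumption for this example): bridge "c" is
# down, so device "B" is unreachable at the IP layer as well.
DOWN = {"c", "B"}

oam = GenericOam()
oam.register("ip", "ICMP Echo", lambda target: target not in DOWN)
oam.register("ethernet", "CFM LBM", lambda target: target not in DOWN)

ip_result = oam.verify("ip", "B")
eth_results: List[OamResult] = []
if not ip_result.reachable:
    # Same call, lower layer: check each bridge on the path in turn.
    eth_results = [oam.verify("ethernet", bridge) for bridge in ("a", "b", "c")]

first_fault = next((r.target for r in eth_results if not r.reachable), None)
```

The caller uses one `verify` call for both layers; only the registered probe differs, which is the property the use case asks for.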

3.2. Multi-layer OAM in support of service function chain

In a service function chain, service packets are steered through a set of service nodes distributed in the network. When a service packet enters the network, OAM information needs to be imposed on the packet by the ingress node and passed through the network along the same route through the service nodes. When several service functions (SFs) are co-located in the same service node, the packet is processed by all SFs in that node: once the packet is successfully handled by one SF, it is forwarded to the next SF in the same service node. When the packet leaves the network, the OAM information needs to be stripped from the packet. To provide a unified view, this OAM information needs to be gathered from the various layers, which use different encapsulation and tunneling techniques, and then abstracted and provided to the management application via the unified management interface.
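The impose, traverse, and strip steps described above can be sketched as follows. This is an illustrative model only: the `Packet` type, its `oam` field, and the node and SF names are assumptions, and no particular encapsulation is implied.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Packet:
    payload: bytes
    # Hypothetical OAM trace carried with the packet; None means "no OAM".
    oam: Optional[List[str]] = None


def ingress(pkt: Packet) -> Packet:
    """Impose (empty) OAM information when the packet enters the network."""
    pkt.oam = []
    return pkt


def service_node(pkt: Packet, node: str, sfs: List[str]) -> Packet:
    """Pass the packet through every co-located SF, in order."""
    for sf in sfs:
        if pkt.oam is not None:
            pkt.oam.append(f"{node}/{sf}")
    return pkt


def egress(pkt: Packet) -> Tuple[Packet, List[str]]:
    """Strip the OAM information and hand it to the management side."""
    trace = pkt.oam or []
    pkt.oam = None
    return pkt, trace


pkt = ingress(Packet(b"user data"))
pkt = service_node(pkt, "sn1", ["firewall", "nat"])
pkt = service_node(pkt, "sn2", ["dpi"])
pkt, trace = egress(pkt)
```

After `egress`, the packet carries only its original payload, while `trace` holds the ordered list of SFs visited and could be exported through a management interface.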

4. Problem Statement

OAM mechanisms are usually oriented to a single network technology or a single layer. Each technology or layer has its own best-suited OAM tools. Some provide rich functionality in one protocol, while others provide each function with a different protocol, and each technology is developed independently. As a result, there is little or no re-use of software and hardware among OAM protocols, and integration of OAM across multiple technologies is extremely difficult. Vendors and operators waste considerable effort across the whole OAM life-cycle when a new technology is introduced: (1) Design and development: every new protocol requires investment in the design and development of the data, control, and management planes. In some cases, even adding a single OAM function requires this whole life-cycle. (2) Operation and maintenance: operations staff need to be trained for every new technology or feature. Together, these lead to slow time-to-market and wasted time and effort for any new technology and/or OAM function.

Specifically, in a service function chaining environment, every function may operate in a different layer and may use different encapsulation and tunneling techniques. When virtualization-related technologies are taken into account, the number of encapsulation and tunneling options is very high. Still, end-to-end service OAM mechanisms and information exchange between functions should be provided in order to operate and maintain the network as a whole. This requires a generic tool-set that can provide all standard tools in the context of multi-technology, multi-layer, physical and virtual environments.

An interesting aspect of this problem is how the OAM information at different layers is made available to, and learnt by, the management application via the unified management interface. For example, in a multi-layer network, OAM information needs to be imposed on the packet, injected into the network, and finally abstracted from the various layers and provided to the management application.

4.1. Use of Existing Protocol Mechanisms

OAM information relies on the network technology at each layer and may currently be exchanged at each network layer in a domain by using various encapsulation technologies at Layer 2 and Layer 3. OAM information may be gathered and exported from a domain (for example, northbound) using SNMP, I2RS, or NETCONF/YANG.

It is desirable that a solution to the problem described in this document does not require the implementation of a new, network-wide protocol or introduce a shim layer to carry OAM information. Instead, it would be advantageous to make use of an existing protocol or functionality that is commonly implemented on routers and is currently deployed. This has many benefits in network stability, time to deployment, and operator training.

It is recognized, however, that existing protocols or functionalities are unlikely to be immediately suitable to this problem space without some protocol extensions. Extending protocols must be done with care and with consideration for the stability of existing deployments. In extreme cases, where a technology lacks a function even though similar mechanisms exist in other technologies, a new protocol can be preferable to a messy hack of an existing protocol.

4.2. Strong Technology dependency

OAM protocols rely heavily on the specific technology they are associated with. The addressing scheme is a good example of an issue for which being non-generic carries a high price. IPv4 Ping and IPv6 Ping differ in the addressing scheme as well as in the ICMP type field, yet they provide the same OAM functionality.
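As a small illustration of this dependency, the sketch below builds the 8-byte echo request header for either address family. The type values are those defined for ICMPv4 (Echo Request 8, Echo Reply 0) and ICMPv6 (Echo Request 128, Echo Reply 129); the function name and the choice to leave the checksum at zero are assumptions made for brevity, so the output is not a sendable packet.

```python
import ipaddress
import struct

# (Echo Request type, Echo Reply type) per IP version.
ECHO_TYPES = {4: (8, 0), 6: (128, 129)}


def echo_request_header(dest: str, ident: int, seq: int) -> bytes:
    """Return an echo request header for an IPv4 or IPv6 destination.

    Layout: type (1 byte), code (1), checksum (2), identifier (2),
    sequence number (2). The checksum is left as zero in this sketch.
    """
    version = ipaddress.ip_address(dest).version
    request_type, _reply_type = ECHO_TYPES[version]
    return struct.pack("!BBHHH", request_type, 0, 0, ident, seq)


v4_header = echo_request_header("192.0.2.1", ident=1, seq=1)
v6_header = echo_request_header("2001:db8::1", ident=1, seq=1)
```

The two headers have the same layout and serve the same function; only the first byte differs, yet tools and implementations are typically duplicated per address family.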

4.3. Weakness of cross-layer OAM

Troubleshooting is cumbersome due to protocol variety and the lack of multi-layer OAM. Usually OAM messages should not cross layer boundaries. Each of the service, network, and transport layers possesses its own well-discernible, native OAM stream. In addition, OAM messages should not be leaked outside of a management domain within a layer, where a management domain is governed by a single business organization. In networks with more than one technology, maintenance and troubleshooting are done per technology and per layer.

This can, in some cases, make it easier to understand in which technology an operation is performed or a fault is located. In some cases, when OAM at one layer fails, it would be desirable to drop down to another layer and issue the corresponding OAM command using the same API, provided that OAM at multiple layers is supported. In most cases, however, switching tools and layers within the same operational process is cumbersome and does not serve the main goal, which is to find the location of the root cause. It would be very helpful to have a generic, end-to-end mechanism that can, for example, ping an IPv4 host from an IPv6 source, or a single tool to troubleshoot a combined IP, MPLS, Ethernet, GRE, and VXLAN network.

4.4. Lack of OAM above Layer 3

The Layer 2 and Layer 3 OAM protocols are rich in functionality, well defined, standardized, and heavily used. In recent years, much work has been done on maintenance domains and levels in order to better handle issues that cross technology, vendor, and operator domains, and to provide smooth interoperability and domain separation.

These mechanisms are not defined for technologies above Layer 3. Therefore, no standard exists as a reference for OAM in the SFC environment: when service packets are steered through a set of service nodes distributed in the network, each service node works at a different layer above Layer 3.

4.5. Issues of Abstraction

In a multi-layer network, OAM functions are enabled at different layers, and OAM information needs to be gathered from each of them. Without multi-layer OAM in place, it is hard for a management application to understand what the information at each layer stands for. One possible solution is to abstract the OAM information shared across layers, i.e., to use the same tool or API to activate the OAM functions at different layers and retrieve the results.

The trick to this multi-layer problem is to abstract in a way that retains as much useful information as possible while removing the data that is not needed. An important part of this trick is a clear understanding of what information is actually needed.
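One way to picture such an abstraction is a common status record into which layer-specific results are mapped. The sketch below is only an illustration under stated assumptions: the `AbstractOamStatus` record and the input field names (`peer`, `state`, `remote_mep`, `rdi`, `ccm_timeout`) are hypothetical and are not taken from any data model.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class AbstractOamStatus:
    """Hypothetical layer-independent view given to a management application."""
    layer: str
    endpoint: str
    up: bool


def from_bfd(raw: Dict[str, Any]) -> AbstractOamStatus:
    # Keep only what the application needs: who the peer is and whether
    # the session is up. Timers and discriminators are dropped.
    return AbstractOamStatus("ip", raw["peer"], raw["state"] == "Up")


def from_cfm(raw: Dict[str, Any]) -> AbstractOamStatus:
    # A remote MEP is treated as up when no defect is flagged.
    healthy = not raw["rdi"] and not raw["ccm_timeout"]
    return AbstractOamStatus("ethernet", raw["remote_mep"], healthy)


statuses = [
    from_bfd({"peer": "192.0.2.2", "state": "Up", "detect_mult": 3}),
    from_cfm({"remote_mep": "mep-7", "rdi": False, "ccm_timeout": True}),
]
faulty_layers = [s.layer for s in statuses if not s.up]
```

The application sees one record type regardless of the source layer. Which fields to retain is exactly the design question raised above.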

4.6. Issue of OAM information gathering from Service Function

When service packets are steered through a set of service nodes distributed in the network, each service node works at a different layer above Layer 3 and may have several SFs co-located with it. When an OAM mechanism is applied, OAM packets need to be exchanged between these service nodes or service functions at different layers. When a service function involved in the SFC does not support OAM (e.g., the SF is an SFC-unaware service function), the service node should be responsible for monitoring and diagnosing the service function and for checking service availability of that service function. It is desirable to allow a service function to register with its service node. Then either the service function reports its status to the service node, or the service node performs a live check on the service function.

In addition, service functions usually do not have Layer 2 or Layer 3 switching/routing capability and are therefore not aware of any OAM function at Layer 2 or Layer 3. Also, when there are no OAM functions at the service layers above Layer 3, it is hard to identify the layer that can be used to gather OAM information in the event of a fault or a degradation of performance. For example, suppose a data packet is transmitted from one service function to another, the two service functions are embedded in two different service nodes, and the packet is either lost between the two service functions or discarded by one of them. How can the fault between them be detected, and how can the problem be isolated to a particular layer?
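The registration and the two monitoring options described above can be sketched as follows. The `ServiceNode` class and its method names are hypothetical; the live check is shown as a plain callable, whereas a real node would use whatever probing mechanism the SF allows.

```python
from typing import Callable, Dict, Optional


class ServiceNode:
    """Hypothetical service node that tracks availability of its local SFs."""

    def __init__(self) -> None:
        self._live_checks: Dict[str, Optional[Callable[[], bool]]] = {}
        self._reported: Dict[str, bool] = {}

    def register(self, sf_name: str,
                 live_check: Optional[Callable[[], bool]] = None) -> None:
        # An OAM-capable SF registers without a live check and reports its
        # own status; for an SFC-unaware SF the node supplies a live check.
        self._live_checks[sf_name] = live_check

    def report(self, sf_name: str, healthy: bool) -> None:
        if sf_name not in self._live_checks:
            raise KeyError(f"unregistered service function: {sf_name}")
        self._reported[sf_name] = healthy

    def is_available(self, sf_name: str) -> bool:
        live_check = self._live_checks[sf_name]
        if live_check is not None:
            return live_check()                    # node-driven live check
        # SF-driven report; an SF that never reported counts as unavailable.
        return self._reported.get(sf_name, False)


node = ServiceNode()
node.register("firewall")                          # reports its own status
node.register("legacy-nat", live_check=lambda: True)
node.report("firewall", healthy=False)
availability = {name: node.is_available(name)
                for name in ("firewall", "legacy-nat")}
```

Either path yields the same per-SF availability view at the service node, which can then be exposed to OAM at the service layer.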

5. Existing Work

The following list discusses related IETF work and is provided for reference. It is not exhaustive; rather, it provides an overview of a few initiatives tackling the pain points of OAM.

  1. Important work in [I-D.tissa-netmod-oam] creates a unified YANG data model for OAM that is based on the IEEE CFM model. This model can also be used for IP OAM functionality. That work focuses on the management plane of OAM and should be complemented by accompanying data-plane and/or control-plane work. It may also require some extensions to address a wider variety of functions and technologies.
  2. Several efforts in recent years have tried to address new technologies using existing mechanisms. [I-D.jain-nvo3-overlay-oam] and the MPLS-TP OAM documents are examples of such efforts.

6. Architectural Consideration

6.1. Basic Components

6.1.1. Interconnect OAM at different layers

6.1.2. Interconnect OAM at the same shim layer above layer 3

6.2. OAM Functions in Data Plane

6.2.1. Continuity Check

This type of mechanism checks that the monitored layer and/or entity is alive and provides connectivity from specific point(s) to other point(s). Examples are BFD and ETH CC.
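A continuity check of this kind typically declares a failure when no check message has arrived within some multiple of the expected transmission interval. The sketch below shows that logic in a protocol-independent form; the class name, the default multiplier of 3, and the explicit `now` argument are assumptions chosen for illustration.

```python
from typing import Optional


class ContinuityMonitor:
    """Declare the monitored entity down after a run of missed CC messages."""

    def __init__(self, interval_s: float, detect_mult: int = 3) -> None:
        self.detection_time_s = interval_s * detect_mult
        self._last_rx: Optional[float] = None

    def on_message(self, now: float) -> None:
        """Record the arrival time of a continuity check message."""
        self._last_rx = now

    def is_up(self, now: float) -> bool:
        if self._last_rx is None:
            return False                 # nothing received yet
        return (now - self._last_rx) < self.detection_time_s


monitor = ContinuityMonitor(interval_s=1.0, detect_mult=3)
monitor.on_message(now=0.0)
up_at_2_5 = monitor.is_up(now=2.5)       # still inside the detection time
up_at_3_5 = monitor.is_up(now=3.5)       # detection time exceeded
```

Passing `now` explicitly keeps the sketch deterministic; a deployment would read a monotonic clock instead.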

6.2.2. Connectivity Verification

This function verifies that the actual connection is consistent with the required connection and that no misconnection has occurred. Examples are IP Ping, VCCV, and ETH loopback.

6.2.3. Path Discovery

This function is used to discover the path that a specific service traverses in the network. Examples are LSP Trace, IP Traceroute, and Ethernet Trace.

6.2.4. Performance measurement

A function that monitors the performance parameters of a network entity. Such parameters include delay, delay variation, loss, and the availability of services and classes of service. Examples are TWAMP/OWAMP and Y.1731.
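As an illustration of the parameters listed above, the sketch below derives loss ratio, mean delay, and a simple delay-variation figure from a series of probe results. The function name and the definition of delay variation used here (mean absolute difference between consecutive received delays) are simplifying assumptions, not the definitions used by TWAMP, OWAMP, or Y.1731.

```python
from typing import Dict, List, Optional


def summarize_probes(delays_ms: List[Optional[float]]) -> Dict[str, Optional[float]]:
    """Summarize probe results; None marks a probe that was lost."""
    if not delays_ms:
        raise ValueError("at least one probe is required")
    received = [d for d in delays_ms if d is not None]
    loss_ratio = 1 - len(received) / len(delays_ms)
    if not received:
        return {"loss_ratio": loss_ratio, "mean_delay_ms": None,
                "delay_variation_ms": None}
    mean_delay = sum(received) / len(received)
    # Simplified delay variation: average jump between consecutive samples.
    jumps = [abs(b - a) for a, b in zip(received, received[1:])]
    variation = sum(jumps) / len(jumps) if jumps else 0.0
    return {"loss_ratio": loss_ratio, "mean_delay_ms": mean_delay,
            "delay_variation_ms": variation}


summary = summarize_probes([10.0, 12.0, None, 11.0])
```

The same summary could be produced from measurements taken at any layer, which is what makes these parameters candidates for a generic OAM interface.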

6.2.5. Protection Switching

A function that is used to signal protection switching states and commands. Examples are ETH APS messages.

6.2.6. Alarm/defect indication

A function that is used to indicate that a failure occurred downstream or upstream within a connection/service. It is also used to trigger fast protection or to suppress alarms. Examples are ETH AIS and ETH RDI.

6.2.7. Maintenance commands

A function that is used to signal a maintenance state or command within a connection/service. An example is ETH Lockout.

6.3. OAM in Management plane

Management systems play an important role in configuring or provisioning OAM functionality consistently across all devices in the network, and in automating the monitoring and troubleshooting of network faults. However, OAM is not provisioning. In general, provisioning is used to configure the network to provide new services, whereas OAM is used to keep the network in a state in which it can support already existing services.

There are two phases to OAM provisioning. The first phase is the network provisioning phase, which sets up Maintenance Domains (MD) and Maintenance Intermediate Points (MIP) and enables basic OAM functionality (e.g., Connectivity Fault Management (CFM)) on the devices.

The second phase is the service activation phase, which enables the origination of ping and trace packets and configures continuity-check and cross-check functionality.

The different OAM tools may be used in one of two basic types of activation:

7. Building on Existing Protocols

8. Scoping Future Work

9. Manageability Considerations

10. Security Considerations

Security considerations are not addressed in this problem-statement-only document. Given the scope of OAM and its implications for the data and control planes, security considerations are clearly important and will be addressed in the specific protocol and deployment documents.

11. Summary

This document highlights problems associated with OAM in packet technologies today. It details the problem scope and identifies the main OAM functions that should be addressed, based on the functions currently in use.

12. Acknowledgements

The authors would like to thank Dan Romascanu and Tissa Senevirathne for their valuable reviews and suggestions on this document.

13. References

13.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

13.2. Informative References

[I-D.ietf-nsc-problem-statement] Quinn, P., Guichard, J. and S. Surendra, "Network Service Chaining Problem Statement", ID draft-quinn-nsc-problem-statement, August 2013.
[I-D.jain-nvo3-overlay-oam] Jain, P., "Generic Overlay OAM and Datapath Failure Detection", ID draft-jain-nvo3-overlay-oam-01, February 2014.
[I-D.tissa-netmod-oam] Senevirathne, T., Finn, N., Kumar, D. and S. Salam, "YANG Data Model for Operations Administration and Maintenance (OAM)", ID draft-tissa-netmod-oam-00, March 2014.

Authors' Addresses

Qin Wu
Huawei
101 Software Avenue, Yuhua District
Nanjing, Jiangsu 210012
China
EMail: bill.wu@huawei.com

Mishael Wexler
Huawei
EMail: mishael.wexler@huawei.com

Pradeep Jain
Nuage Networks
755 Ravendale Drive
Mountain View, CA 94043
USA
EMail: pradeep@nuagenetworks.net