Operations and Management Area Working Group | Q. Wu |
Internet-Draft | M. Wexler |
Intended status: Informational | Huawei |
Expires: January 5, 2015 | M. Boucadair |
France Telecom | |
S. Aldrin | |
Huawei USA | |
G. Mirsky | |
Ericsson | |
P. Jain | |
Nuage Networks | |
July 4, 2014 |
Problem Statement and Architecture for Transport-Independent Multiple Layer OAM
draft-ww-opsawg-multi-layer-oam-02.txt
Operations, Administration, and Maintenance (OAM) mechanisms are critical building blocks in network operations that are used for service assurance, fulfillment, or service diagnosis, troubleshooting, and repair. The current practice is that many technologies rely on their own OAM protocols that are exclusive to a given layer. There is little consolidation of OAM in either data plane or management plane nor well-documented inter-layer OAM operations. Vendors and Operators dedicate significant resources and effort through the whole OAM life-cycle each time when a new technology is (to be) introduced. This is even exacerbated when dealing with integration of OAM across multiple technologies.
This document describes the problem space and defines an architecture for the generic and integrated OAM with a focus of multi-layer and cross-layer considerations.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 5, 2015.
Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Operations, Administration, and Maintenance (OAM) mechanisms being understood and used in context of RFC 6291 [RFC6291] are critical building blocks in network operations that are used for service assurance, fulfillment, or service diagnosis, troubleshooting, and repair. The key foundations of OAM and its functional roles in monitoring and diagnosing the behavior of networks have been studied at OSI layers 1, 2 and 3 since a while. As a reminder, OAM functions are used in many management applications for various objectives such as (i) failure detection, (ii) reporting the defect/ failure information, (iii) defect/failure localization, (iv) performance monitoring, and (v) service recovery.
The current practice that consists in enabling OAM techniques for each layer has shown its limits; this is a need for cross-layer and inter-layer OAM considerations [RFC7276]. This need for inter-layer OAM is motivated by the need to achieve: network optimization, efficient enforcement of TE (Traffic Engineering) techniques including ensuring path diversity at distinct layers or computing completely disjoint paths at several layers, fine-grain tweaking, ease of root cause analysis, ability to maintain a network-wise visibility in addition to layer-specific one, etc.
It is worth to mention also that there are two restrictions for multi-layer structure as discussed in [RFC7276]:
Moreover, there is little consolidation of OAM in either data plane or management plane. Vendors and operators dedicate a lot resources and effort through the whole OAM life-cycle each time a new technology is (to be) introduced. Integration of OAM across multiple technologies in either data plane or management plane is extremely difficult to achieve.
When operating networks with more than one technology, maintenance and troubleshooting are achieved per technology and per layer, operation process can be very cumbersome since OAM is not defined to cross layer boundaries. Another challenge is presented by use of different technologies and corresponding OAM on the same layer of adjacent network domains. Interworking between different OAM often not defined and are left to proprietary solutions. In many cases when keeping network complexity down and simplifying OAM is needed, it is desirable to have a generic and integrated OAM to cover heterogeneous networking technologies.
This document defines the problem space and describes an architecture for the generic and integrated OAM in the multi-layer and multi-domain networks. In particular, it outlines the problems encountered with existing OAM protocols and their impact on introduction of new technologies (see Section 3).
This document covers the following:
This document defines the following terms:
OAM mechanisms are usually oriented toward a single network technology or a single layer. Each technology or layer has its best suited OAM tools. Some of them providing rich functionality rely on the capabilities of one protocol, while the others provide each function with a different protocol; In the current situation, there is little, or no re-use, of software and hardware for each OAM protocol.
Integration of OAM across multiple technologies is extremely difficult. Vendors and operators waste a lot through the whole OAM life-cycle when a new technology is introduced:
Specifically, in Service Function Chaining environment, every Service Function may operate at a different layer and may use different encapsulation and tunneling techniques. When taking into account virtualization related technologies, the number of encapsulation and tunneling options increase even more. Still, end-to-end service OAM mechanisms and information exchanges between Service Functions should be provided to operate and maintain the network as a whole. This requires a generic toolkit that can provide all necessary tools in context of multi-technology, multi-layer, physical and virtual environments.
A particular problem is how OAM information at different layer is made available to a management application for use and learnt via the unified management interface. For example, in the case of an multi- layer network, OAM information needs to be imposed to the packet and injected into the network and at last abstracted from various layers and expose them to the management application.
OAM information resides at each layer and may currently be exchanged at each network layer in a domain by using various encapsulation technologies at the Layer 2 & Layer 3 levels. OAM information may be gathered and exported from a domain (for example, northbound) using SNMP [RFC3411]or NETCONF/YANG [RFC6241].
It is desirable that a solution to the problem described in this document does not require the implementation of a new, network-wide protocol or introduce a shim layer to carry OAM information. Instead, it would be advantageous to make use of an existing protocols or functionalities that are commonly implemented and are currently deployed in operational networks. This has many benefits in network stability, time to deployment, and operator training.
It is recognized, however, that existing protocols or functionalities are unlikely to be immediately suitable to this problem space without some protocol extensions. Extending protocols must be done with care and with consideration for the stability of existing deployments. In extreme cases, when there is a lack of functionality, although similar mechanisms exist in other technologies, a new protocol can be preferable to a "messy" hack of an existing protocol.
OAM protocols are relying heavily on the specific network technology they are associated with. For example, ICMP, LSP Ping are using different network technologies but provide the same OAM functionality, i.e., Path Discovery. Another example is BFD,LSP Ping are using different network technologies but provide the same functionality, i.e., Continuity Verification. Figure 1 shows common OAM functionalities shared by various existing IETF OAM protocols.
|--------+------------+--------------+--------------+------------+ | |Continuity | Connectivity| Path | Performance| | | Check | Verification| Discovery | Measurement| +--------+------------+--------------+--------------+------------+ | | | | |-Delay | | ICMP | Echo(Ping) | | Traceroute |-Loss rough | | | | | | measurement| +--------+------------+--------------+--------------+------------+ | | BFD | | | | | BFD | Control | BFD Control | | | | | /Echo | | | | +--------+------------+--------------+--------------+------------+ | LSP | | | | - Delay | | Ping | | Ping | Traceroute | - Packet | | | | | | Loss | +--------+------------+--------------+--------------+------------+ | | | | | -OWAMP | | IPPM | | | | -TWAMP | | | | | | | |--------+------------+--------------+--------------+------------+ | MPLS-TP| | | | | | OAM | CC | CV | Traceroute | -Delay | | |(use of BFD)|(use of BFD) | | -Packet | | | | or LSP Ping) | | Loss | +--------+------------+--------------+--------------+------------+
Figure 1: Examples of IETF OAM tools
Troubleshooting is cumbersome due to protocol variety and lack of multi-layer OAM. Usually OAM messages should not cross layer boundaries. Each of the service, network and transport layers possesses its well-discernible and native OAM stream. In addition, OAM messages should not be leaked outside of a management domain within a layer, where a management domain is governed by a single business organization. When having networks with more than one technology, maintenance and troubleshooting are done per technology and layer.
These rules could in some cases ease the understanding in which technology the operation is done or fault is located. In some cases, when one layer OAM fails, it may be desirable to drop down to the another layer OAM and issue the corresponding OAM command, using the same APIs, if OAM in multiple layers can be supported. However, in most cases switching tools and layers in the same operation process is cumbersome and not serving the main idea - to find the root cause location. It would be very helpful to have a generic mechanisms that is end to end basis, allow management application interact with data plane OAM and can ping IPv4 host by an IPv6 source or having one tool to troubleshoot combined IP, MPLS, Ethernet, GRE and VXLAN network.
In Service Function Chaining environment, it is necessary to provide end-to-end OAM across certain or all entities and involving many layers. Inter-layer OAM considerations are key in an SFC context because problems may occur at the network layer or at the service chaining layer.
The Layer 2/3 OAM protocols are quite rich in their functionality, well defined, standardized and heavily used. In the last years a lot of work was conducted to consider maintenance domains and levels in order to better handle the issues of technology re-use, smooth interoperability and interworking between domains.
The above mechanisms are not defined for the technologies above Layer 3. Therefore, in the SFC environment where a Service Function Chaining is composed by a set of Service Functions, but providing an end-to-end chain or path from a source to destination in a given order [I.D-ietf-sfc-problem-statement], no standard exists as a reference for OAM since when the service packets is steered through a set of service nodes distributed in the network, each service node may act at different layers above layer 3.
In multi-layer network, OAM functions are enabled at different layers and various OAM information needs to be gathered from various layers. Without multi-layer OAM in place, it is hard for management applications to understand what information at different layers stands for. One possible solution to these issues is to abstract the OAM information shared across layers, i.e., using the same tool or API to activate the OAM functions at different layers and retrieve the results.
The challenge is to abstract in a way that retains as much useful information as possible while filtering the data that is not needed to be leaked to other layers. An important part of this effort is a clear understanding of what information is actually needed.
In SFC, the service packets are steered through a set of service nodes (virtual or physical) hosting the service function distributed in the network. In the NVO3 network, the data packet may also traverse a set of overlay nodes distributed in the network. Overlay technologies or other tunneling technologies can be used to stitch these service nodes or overlay node in order to form end to end path.
When any overlay Segment or segment of service chain in the network fails to deliver user traffic, there is a need to provide a tool that would enable users to detect such failures at different layer using various encapsulation protocols and locate faults in the specific part of the network, and a mechanism to isolate these faults. It may also be desirable to test the data path before mapping user traffic to the Overlay Segment or segment of service chain. When multiple layer OAMs are used in the different parts of the network; how these layers OAM interwork at the boundary of each part of network is also a serious issue.
When the service packets are steered through a set of Service Nodes (virtual or physical) hosting the Service Function distributed in the network, each Service Node may work at different layer above layer 3 and may embed several SFs. When OAM mechanism is applied, it is necessary to allow OAM packets to be exchanged:
When Service Functions that are part of the SFC-enabled domain do support the OAM capability (e.g., an SFC-unaware Service Function) and Service Node has OAM capability, Service Nodes may be responsible for monitoring and diagnosing and reporting service availability of these Service Functions. It is more desirable to allow Service Functions register with a Service Node. Either Service Functions report status to the Service Node or the Service Node performs liveness check of the Service Function.
In addition, some Service Functions may not have Layer 2-3 switching/routing capability and therefore are not aware of any OAM function at Layer 2-3. Also when there are no OAM functions at service Layers above layer 3, it is hard to identify the layer that can be used to gather OAM information when it comes to a fault situation or degradation of performance. For example, when a data packet is transmitted from SFC ingress node (i.e.,Classifier) and traverse a set of Service Nodes that host Service Function,the data packet may be discarded either at the SFC ingress node, one specific Service Node or one specific Service Function. Also the data packet may be lost between SFC ingress and one Service Node, or between two Service Nodes, or between one Service Node and one Service Function, how to detect the fault between them and how to isolate problem to that layer?
Editor's Note: Section 3.6.1 is too specific. This text can be presented as an example to illustrate a problem not a problem per se or moved to a use case draft.
Figure 2 shows the reference architecture for Layering OAM. This reference architecture assumes that
In this architecture, three layers are defined:
In the M1 layer, a typical network can be partitioned into several domains. Each domain has at least two MEPs and none or several MIPs. One domain can contain one or more maintenance associations (MAs). MEP is a maintenance functional entity that is implemented into a Network Element at the maintenance domain boundary and can send and receive OAM packets. MIP is a maintenance functional entity that is implemented into a Network Element in the maintenance domain and can forward OAM packets and respond OAM packets only when triggered by a specific OAM function (e.g., Path Discovery or Connectivity Verification). MEPs and MIPs can exist in the same maintenance domain and belong to different MAs. They can also exist at different layers and use various encapsulating protocols.
The M2 contains the interface which management entity uses to manage individual network devices. In this document, we further require management entities to use this interface as uniform interface (API and or UI) to gather OAM information from MEP and MIP in the network devices(either physical or virtual entity) and execute transactions or operations on MEP and MIP across domains, layers and vendors. Protocols that can be used to manipulate the configuration of a network device include SNMP [RFC1157], Command Line Interfaces, NETCONF [RFC6241], and other protocols.
On the M3 layer, there is a uniform interface (API and/or UI) that covers all the managed devices and can execute network-wide transactions. This layer allows applications and operators to execute configuration, monitoring and action tasks across multiple network devices, from a mix of domains, layers, vendors. Still the abstraction level is that of the network elements themselves, so whatever configuration, status, actions and notifications they provide, that is what you get here, but without having to worry about the location and the protocol to reach the device.
Service +-------------------+ ---- Plane | Customer | ^ Layer +-------------------+ | | +-------------+ +-------------+ V | Management | | Management | ---Management | Entity | | Entity | ^ Plane +-------------+ +-------------+ | Layer | | | |----------------------------+ +---------------------+ | |Maintenance Domain 1 | |Maintenance Domain 2 | | | | | | | | | | | | NE| NE NE NE| | NE NE | V +-----+ +-----+ +-----+ +------+ +-----+ +--+--+ ---- | MEP +----+ MIP +--| MIP +----| MEP +-----| MIP +-----+ MEP | +-----+ +-----+ +-----+ ++----++ +-----+ +-----+ Data Plane | | | Layer | | | +----------------------------+ +---------------------+
Figure 2: Architecture for Layering OAM in the management plane
An example of service-specific that depicts OAM layers can be found in [RFC4176] (L3VPN case).
The following discuss related IETF work and is provided for reference. This section is not exhaustive, rather it provides an overview of few initiatives focusing on the pain-points of OAM:
Many OAM functions may require protocol extensions or new protocol development to meet the transport requirements. In the existing OAM tools, Some of them providing rich functionality in one protocol,the other providing each function with a different protocol and each technology is developed independently.
To consolidate OAM in the data plane, the OAM in multi-layer Environment is expect to support the following common OAM functions used in OAM-related standards. These functions are used as building blocks in the data plane OAM standards described in this document.
This type of mechanisms check that the monitored layer and/or entity are alive and providing path from specific point(s) to other point(s). Some examples are IP Ping, BFD [RFC5880] and ETH CC.
Verifying that the actual connection is consistent with the required connection and no mis-connection occurred. Some examples are IP Ping, and ETH loopback.
Used to discover the path that specific service traverses in the network. Some examples are LSP Traceroute, IP Traceroute and ETH-LT/linktrace.
A function that monitors the performance parameters of a network entity. Such parameters could be Delay, Delay-variation, loss, availability of services and class of services. Examples are TWAMP[RFC5357]/ OWAMP[RFC4656] and Y.1731,MPLS Loss and Delay Measurement [RFC6374].
A function that is used to signal protection switching states and commands. Examples are ETH APS messages and MPLS-TP Protection Switching Coordination OAM [RFC6378].
A function that is used to indicate that a failure occurred downstream or upstream within a connection/service. Used also to trigger fast protection or to suppress alarms. Examples are ETH AIS and ETH RDI, MPLS-TP RDI [RFC6428].
A function that is used to signal a maintenance state or command within a connection/service. Examples can be ETH Lockout.
Management systems play an important role in configuring or provisioning OAM functionality consistently across all devices in the network, and for automating the monitoring and troubleshooting of network faults. However OAM is not provision. In general, provisioning is used to configure the network to provide new services, whereas OAM is used to keep the network in a state that it can support already existing services.
As we know each layer has its own OAM protocols. OAM can be used at different levels of hierarchy in the network to form a multi-layer OAM solution [RFC7276]. To support multi-layer OAM covering various heterogeneous transport technologies, the OAM in the management needs to be consolidated as follows:
The different OAM tools may be used in one of two basic types of activation:
This section includes a set of candidate items for activities to be conducted within IETF.
These objectives are not frozen; further discussion is required to target key issues and scope the work to be conducted within IETF accordingly.
Candidate investigation items are listed below:
Security considerations are not addressed in this problem statement only document. Given the scope of OAM, and the implications on data and control planes, security considerations are clearly important and will be addressed in the specific protocol and deployment documents.
The authors would like to thank Romascanu, Dan, Tom Taylor, Tissa Senevirathne,Huub van Helvoort, Yuji Tochio for their valuable reviews and suggestions.
[RFC2119] | Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", March 1997. |
[RFC6291] | Andersson, L., Helvoort, H., Bonica, R., Romascanu, D. and S. Mansfield, "Guidelines for the Use of the "OAM" Acronym in the IETF", RFC 6291, June 2011. |
[RFC7276] | Mizrahi, T. and N. Sprecher, "An Overview of Operations, Administration, and Maintenance (OAM) Tools", RFC 7276, June 2014. |