Routing Area Working Group                                     G. Mirsky
Internet-Draft                                                  Ericsson
Intended status: Informational                               E. Nordmark
Expires: January 9, 2017                                 Arista Networks
                                                            C. Pignataro
                                                                N. Kumar
                                                                D. Kumar
                                                     Cisco Systems, Inc.
                                                                 M. Chen
                                                                   Y. Li
                                                     Huawei Technologies
                                                                D. Mozes
                                              Mellanox Technologies Ltd.
                                                           S. Pallagatti
                                                             I. Bagdonas
                                                            July 8, 2016
Operations, Administration and Maintenance (OAM) for Overlay Networks: Gap Analysis
draft-ooamdt-rtgwg-oam-gap-analysis-02
This document provides an overview of Operations, Administration, and Maintenance (OAM) for overlay networks. The OAM toolset includes a set of fault management and performance monitoring capabilities (operating in the data plane) that comply with the Overlay OAM Requirements. Insufficient functional coverage of existing OAM protocols is also noted in this document. The protocol definitions for each of the Overlay OAM tools are to be defined in separate documents.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 9, 2017.
Copyright (c) 2016 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
The Operations, Administration, and Maintenance (OAM) toolset provides methods for fault management and performance monitoring in each layer of the network, improving the ability of each layer to support services with guaranteed and strict Service Level Agreements (SLAs) while reducing operational costs.
[RFC7276] provided a detailed analysis of OAM protocols. Since its completion, several new protocols that define data plane encapsulations have been introduced. This presents both a need to re-evaluate the existing set of OAM tools and an opportunity to build it into a set of tools that can be used and re-used for different data plane protocols.
[I-D.ooamdt-rtgwg-ooam-requirement] defines the set of requirements for OAM in Overlay networks. The OAM solution for Overlay networks, developed by the design team, has two objectives:
The term "Overlay OAM" is used in this document interchangeably with the longer version "set of OAM protocols, methods and tools for Overlay networks".
AIS Alarm Indication Signal
BFD Bidirectional Forwarding Detection
BIER Bit-Indexed Explicit Replication
CC Continuity Check
CV Connectivity Verification
FM Fault Management
G-ACh Generic Associated Channel
Geneve Generic Network Virtualization Encapsulation
GUE Generic UDP Encapsulation
MPLS Multiprotocol Label Switching
NTP Network Time Protocol
NVO3 Network Virtualization Overlays
OAM Operations, Administration, and Maintenance
OWAMP One-Way Active Measurement Protocol
PM Performance Measurement
PTP Precision Time Protocol
SFC Service Function Chaining
SFP Service Function Path
SLA Service Level Agreement
TWAMP Two-Way Active Measurement Protocol
VxLAN Virtual eXtensible Local Area Network
VxLAN-GPE Generic Protocol Extension for VxLAN
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
The BIER working group has some WG documents on OAM which are discussed further in this document.
The NVO3 encapsulations (Geneve [I-D.ietf-nvo3-geneve], GUE [I-D.ietf-nvo3-gue], and GPE [I-D.ietf-nvo3-vxlan-gpe]) all have some notion of an OAM bit or flag. In Geneve, the bit is defined not to apply to intermediate (underlay) routers, and setting it does not affect the ECMP hash. The other proposals do not place such succinct constraints on their OAM bit/flag.
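As an illustration of how an NV Edge might test such a flag, the following minimal sketch decodes the fixed Geneve header and reports the O bit. It assumes the base header layout of [I-D.ietf-nvo3-geneve] (2-bit Ver, 6-bit Opt Len, the O and C flags followed by 6 reserved bits, a 16-bit Protocol Type, a 24-bit VNI, and 8 reserved bits). The function name and the sample values are illustrative only and are not defined by any specification.

```python
import struct

GENEVE_BASE_LEN = 8  # fixed Geneve header length in bytes (assumed layout)

def parse_geneve_base(header: bytes) -> dict:
    """Decode the fixed 8-byte Geneve header (hypothetical helper)."""
    if len(header) < GENEVE_BASE_LEN:
        raise ValueError("truncated Geneve header")
    first, flags, proto, vni_rsvd = struct.unpack("!BBHI", header[:GENEVE_BASE_LEN])
    return {
        "version": first >> 6,
        "opt_len_words": first & 0x3F,   # option length in 4-byte words
        "oam": bool(flags & 0x80),       # O bit: OAM rather than user payload
        "critical": bool(flags & 0x40),  # C bit: critical options present
        "protocol": proto,
        "vni": vni_rsvd >> 8,            # upper 24 bits; low 8 bits are reserved
    }

# Sample header: version 0, no options, O bit set, Ethernet payload, VNI 5000.
sample = struct.pack("!BBHI", 0, 0x80, 0x6558, 5000 << 8)
info = parse_geneve_base(sample)
```

Because the flag sits in the overlay header, an underlay router that forwards on the outer headers never needs to inspect it, which is consistent with the Geneve constraint described above.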
There are currently no NVO3 working group OAM protocol specifications. The OAM documents that have been discussed are individual drafts such as [I-D.ashwood-nvo3-oam-requirements], [I-D.nordmark-nvo3-transcending-traceroute], [I-D.pang-nvo3-vxlan-path-detection], [I-D.saum-nvo3-pmtud-over-vxlan], and [I-D.singh-nvo3-vxlan-router-alert].
TBD
It is expected that the encapsulation of an overlay network uses one of the methods discussed in [I-D.ietf-rtgwg-dt-encap] to distinctly identify the payload as an OAM, i.e., non-user, packet. In turn, all Overlay OAM protocols share the common Overlay OAM Header. The format and processing of the header are outside the scope of this document and will be presented in the solution document.
The Fault Management functions of the OAM toolset are enabled by protocols that perform proactive and on-demand defect detection and failure localization.
Bidirectional Forwarding Detection (BFD) was designed as a proactive Continuity Check protocol. [RFC6428] defined an extension to support Connectivity Verification in MPLS-TP networks. The following BFD specifications can be used in overlay networks:
Bit-Indexed Explicit Replication (BIER) provides multicast service. For that, BFD over multipoint networks [I-D.ietf-bfd-multipoint] and [I-D.ietf-bfd-multipoint-active-tail] are the most suitable of the BFD family. Figure 1 presents the IP/UDP format of BFD over BIER in an MPLS network.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Label Stack Element                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Label Stack Element                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            BIER-MPLS label            |     |1|               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1|  Ver  |  Len  |                Entropy                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   BitString (first 32 bits)                   ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~                                                               ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~                   BitString (last 32 bits)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |OAM|     Reserved      | Proto |            BFIR-id            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~                           IP Header                           ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Source Port          |    Destination Port (3784)    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~                      BFD control packet                       ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 1: BFD over BIER with IP/UDP format
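The BIER header portion of Figure 1 can be sketched in code as follows. This is a minimal illustration, not a normative encoding: it assumes field widths of 4/4/4/20 bits for the first word and 2/10/4/16 bits for the last word, assumes that a Len value of n denotes a BitString of 2^(n+5) bits, and uses a placeholder Proto code point. The function and constant names are hypothetical.

```python
import struct

PROTO_IPV4 = 4  # placeholder code point; take the real value from the BIER encapsulation spec

def pack_bier_header(ver, len_code, entropy, bitstring, oam, proto, bfir_id):
    """Pack the BIER header fields of Figure 1 (MPLS label stack not included)."""
    # Assumed Len encoding: BitString length is 2**(len_code + 5) bits.
    expected = (1 << (len_code + 5)) // 8
    if len(bitstring) != expected:
        raise ValueError(f"BitString must be {expected} bytes for Len={len_code}")
    # First word: nibble 0101, Ver, Len, 20-bit Entropy.
    first = (0b0101 << 28) | ((ver & 0xF) << 24) | ((len_code & 0xF) << 20) | (entropy & 0xFFFFF)
    # Last word: OAM (2 bits), Reserved (10 bits, zero), Proto (4 bits), BFIR-id (16 bits).
    last = ((oam & 0x3) << 30) | ((proto & 0xF) << 16) | (bfir_id & 0xFFFF)
    return struct.pack("!I", first) + bitstring + struct.pack("!I", last)

# 64-bit BitString with bits 1 and 3 set, addressed from BFIR-id 7.
bitstring = (0b101).to_bytes(8, "big")
header = pack_bier_header(0, 1, 0x12345, bitstring, 0, PROTO_IPV4, 7)
```

In the IP/UDP format the resulting bytes would be followed by the IP header, the UDP header with destination port 3784, and the BFD control packet.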
The Proto field MUST be set to the IPv4 or IPv6 value. Note that the IP Destination address in Figure 1 must follow Section 7 of [RFC5884], i.e., "the destination IP address MUST be randomly chosen from the 127/8 range for IPv4 and from the 0:0:0:0:0:FFFF:7F00/104 range for IPv6." BFD packets in the reverse direction of the BFD session will be transmitted over the IP network to the IP address mapped to the BFIR-id, with the destination UDP port number set to the source UDP port number of the received BFD packet.
The IP/UDP format adds overhead, particularly in the case of the IPv6 address family. Thus an option that avoids extra headers for OAM is attractive. Figure 2 presents the G-ACh format of BFD over BIER in an MPLS network. The Proto field of the BIER header MUST be set to the OAM value. The BFD control packet follows the BIER OAM header as defined in [I-D.kumarzheng-bier-ping]. According to Section 3.1 of [I-D.kumarzheng-bier-ping], Ver is set to 1; a BFD control packet over multipoint, without or with active tail, is identified accordingly in the Message Type field. The Proto field in the BIER OAM header "is used to define if there is any data packet immediately following the OAM payload".
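The address selection rule quoted from [RFC5884] can be illustrated with a short sketch. The helper names and the BFIR-id to address table are hypothetical; only the two address ranges and the port handling come from the text above.

```python
import ipaddress
import random

IPV4_RANGE = ipaddress.ip_network("127.0.0.0/8")
IPV6_RANGE = ipaddress.ip_network("::ffff:127.0.0.0/104")  # 0:0:0:0:0:FFFF:7F00/104

def pick_bfd_destination(ipv6=False, rng=random):
    """Randomly choose the destination address for a forward-direction BFD packet."""
    net = IPV6_RANGE if ipv6 else IPV4_RANGE
    return net[rng.randrange(net.num_addresses)]

def reverse_direction_target(bfir_to_ip, bfir_id, received_src_port):
    """Destination (address, UDP port) for a reverse-direction BFD packet.

    bfir_to_ip is a hypothetical local table mapping BFIR-id to an IP address.
    """
    return bfir_to_ip[bfir_id], received_src_port

dst4 = pick_bfd_destination()
dst6 = pick_bfd_destination(ipv6=True)
target = reverse_direction_target({7: ipaddress.ip_address("192.0.2.1")}, 7, 49152)
```

Both ranges contain 2^24 addresses, so the same random offset logic applies to either address family.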
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Label Stack Element                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Label Stack Element                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            BIER-MPLS label            |     |1|               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1|  Ver  |  Len  |                Entropy                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   BitString (first 32 bits)                   ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~                                                               ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~                   BitString (last 32 bits)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |OAM|     Reserved      | Proto |            BFIR-id            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Ver  | Message Type  | Proto |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~                      BFD control packet                       ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2: BFD over BIER with G-ACh format
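To make the G-ACh style framing concrete, the sketch below packs a BIER OAM header and prepends it to an opaque BFD control packet. It assumes the header is one 32-bit word with a 4-bit Ver, 8-bit Message Type, 4-bit Proto, and 16-bit Reserved field, and it uses a made-up Message Type value, since the code points are assigned in [I-D.kumarzheng-bier-ping] and not in this document.

```python
import struct

BIER_OAM_VERSION = 1            # Ver is set to 1, as stated in the text
MSG_TYPE_BFD_MULTIPOINT = 0x10  # hypothetical placeholder, not an assigned value

def pack_bier_oam_header(message_type, proto=0):
    """Pack the 4-byte BIER OAM header (assumed 4/8/4/16-bit field widths)."""
    word = (BIER_OAM_VERSION << 28) | ((message_type & 0xFF) << 20) | ((proto & 0xF) << 16)
    return struct.pack("!I", word)

def frame_bfd(bfd_control_packet: bytes) -> bytes:
    """Prepend the OAM header; Proto 0 here means no data packet follows (assumption)."""
    return pack_bier_oam_header(MSG_TYPE_BFD_MULTIPOINT) + bfd_control_packet

# A 24-byte stand-in for a BFD control packet without authentication.
framed = frame_bfd(bytes(24))
```

Under these assumptions the OAM header costs 4 bytes, compared with 28 bytes for an IPv4 header without options plus UDP, or 48 bytes for a basic IPv6 header plus UDP.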
There is currently no WG document on proactive CC/CV. The individual requirements document [I-D.ashwood-nvo3-oam-requirements] covers this and there is a related proposal for BFD over VXLAN in [I-D.spallagatti-bfd-vxlan].
On-demand Continuity Check and Connectivity Verification protocols include:
[I-D.kumarzheng-bier-ping] defines the format of the Echo Request/Reply control packet and a set of TLVs that can be used to perform failure detection and isolation in a BIER domain over an MPLS network.
There is currently no WG document for on-demand CC/CV.
Individual documents exist for tracing, such as [I-D.pang-nvo3-vxlan-path-detection] and [I-D.nordmark-nvo3-transcending-traceroute].
There is currently no WG document on Alarm Indication Signal.
The individual draft [I-D.nordmark-nvo3-transcending-traceroute] suggests reusing ICMP errors for defect indications.
These protocols may be considered for Overlay Performance Measurement (PM) OAM:
Requirements for PM OAM in overlay networks are listed in Section 4.2 of [I-D.ooamdt-rtgwg-ooam-requirement]. Two sets of performance measurement protocols have been developed at the IETF so far:
[RFC6374] can be used as the foundation of active PM OAM in overlay networks. A YANG data model [RFC6020] of packet loss and delay measurement based on [RFC6374] can improve control and increase the operational value of active performance measurement in overlay networks.
Currently there is no draft related to active PM OAM in the WG.
Performance management has been discussed in NVO3 but there is currently no draft in the WG.
[I-D.mirsky-bier-pmmm-oam] describes how the Marking Method can be used in a BIER domain over MPLS networks.
Marking has been discussed in NVO3 sessions, but there is no draft in the working group.
The Generic Protocol Extension for VXLAN [I-D.ietf-nvo3-vxlan-gpe], Generic Network Virtualization Encapsulation [I-D.ietf-nvo3-geneve], and Generic UDP Encapsulation [I-D.ietf-nvo3-gue] are just some examples of the new encapsulations that support network virtualization. NVO3 PM would be used to probe the NV Edge to NV Edge tunnels and NV Edge entity status in a DC network. The main requirement for Performance Management is to support measurement of frame loss, delay, and delay variation between two NV Edge devices that support the same VNI within a given NVO3 domain, on a per-VNI basis. The Alternate Marking Method (AMM) [I-D.ietf-ippm-alt-mark] enables calculation of these metrics, but it places requirements on the overlay encapsulation if AMM is to behave in the network as passive OAM per the definition in [RFC7799].
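As a rough illustration of per-VNI delay metrics, the sketch below groups one-way delay samples by VNI and derives delay variation as the difference between consecutive delays. It assumes that the two NV Edge devices have synchronized clocks (for example via NTP or PTP) and that timestamps are integer microseconds; the function names and the sample data are invented for the example.

```python
from collections import defaultdict

def per_vni_delays(samples):
    """Group one-way delays (microseconds) by VNI.

    samples: iterable of (vni, tx_us, rx_us); clocks are assumed synchronized.
    """
    delays = defaultdict(list)
    for vni, tx_us, rx_us in samples:
        delays[vni].append(rx_us - tx_us)
    return dict(delays)

def delay_variation(delays):
    """Difference between each one-way delay and the one before it."""
    return [later - earlier for earlier, later in zip(delays, delays[1:])]

samples = [(5000, 0, 150), (5000, 1000, 1180), (6000, 500, 620), (5000, 2000, 2140)]
by_vni = per_vni_delays(samples)
variation = delay_variation(by_vni[5000])
```

Keeping the samples keyed by VNI reflects the requirement that the metrics be reported per VNI rather than per tunnel.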
In the SFC architecture, the SF, SFF, Classifier, and NSH Proxy Agent are the elements that can incorporate measurement agent functionality to support SFC performance measurement. The required OAM Performance Measurement, as described in [I-D.ietf-sfc-oam-framework], highlights the capability to monitor an SF and SFF or a set of SFs/SFFs, for both SFC-aware and SFC-unaware SFs; to monitor an SFP (and RSP) that comprises a set of SFs that may be ordered or unordered; to monitor the operation of the Classifiers; and to monitor the SFC as a whole.
Performance measurement includes measurement of packet loss, delay, and delay variation, and could be performed by the marking method proposed in [I-D.ietf-ippm-alt-mark]. For the marking method to behave as passive OAM, as defined in [RFC7799], the overlay network encapsulation should allocate a field, preferably two bits long, whose value does not affect how a packet is treated by the overlay network.
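The loss calculation behind the marking method can be sketched as follows. The example assumes that the sender flips one marking bit at a fixed block period and that an upstream and a downstream measurement point each report a packet count per block; the function names, the block period, and the counter values are illustrative assumptions rather than part of [I-D.ietf-ippm-alt-mark].

```python
def mark_for(time_s, block_period_s):
    """Value of the loss-marking bit at a given time: it alternates every block."""
    return int(time_s // block_period_s) % 2

def block_loss(tx_counts, rx_counts):
    """Per-block packet loss from counters taken at two measurement points."""
    if len(tx_counts) != len(rx_counts):
        raise ValueError("counter lists must cover the same blocks")
    losses = []
    for tx, rx in zip(tx_counts, rx_counts):
        if rx > tx:
            raise ValueError("downstream count exceeds upstream count")
        losses.append(tx - rx)
    return losses

# Counters for three consecutive blocks at the ingress and egress NV Edge.
tx_counts = [1000, 1000, 998]
rx_counts = [1000, 997, 998]
losses = block_loss(tx_counts, rx_counts)
```

Because the counters are derived from user packets carrying the mark, no test traffic is injected; this is what allows the method to behave as passive OAM when the marking field does not influence forwarding.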
Excessive use of the in-band OAM channel may affect user flows and thus change network behavior. For example, if an operator uses passive measurement, exporting a massive amount of data over the OAM channel may affect the network. A management channel appears to be the better choice in such a case. It may traverse the same nodes and links but may not require the same QoS. The LMAP Reference Model [RFC7594], with its Controller, Measurement Agent, and Data Collector, can serve as a reference.
[I-D.lapukhov-dataplane-probe] proposes a transport-independent generic telemetry probe structure.
This document does not require any IANA actions. This section may be removed.
TBD
[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.