Network Working Group | H. Song, Ed. |
Internet-Draft | T. Zhou |
Intended status: Informational | Z. Li |
Expires: September 19, 2018 | Huawei |
March 18, 2018 |
Toward a Network Telemetry Framework
draft-song-ntf-01
This document suggests the necessity for an architectural framework to address network telemetry and articulates the categories and components of such a framework. The requirements, challenges, existing solutions, and future directions are discussed for each category of the framework. The framework for network telemetry helps to set some common ground for the collection of related works and put future developments into perspective.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 19, 2018.
Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
An intent-driven automated network is the logical next step for network evolution, aiming to reduce human labor, make the most efficient use of network resources, and provide better services more aligned with customer requirements. Tools based on machine learning technologies and big data analytics are powerful for fault detection and isolation, identification of anomalies to normal behaviors, patterns, and policy violation detection. Some tools can even predict future events based on historical data. The observation and inference from collected network data can help guide network policy updates for planning, intrusion prevention, optimization, and self-healing. A closed control loop is therefore achieved.
Specifically, we have identified a few key network OAM use cases that network operators need the most. All these use cases involve data extracted from the network data plane and sometimes from the network control plane and management plane:
These use cases show that the conventional OAM techniques are not enough for the following reasons:
For a long time, network OAM applications have relied upon protocols such as SNMP to monitor the network. SNMP can only provide limited information about the network. Since SNMP is poll-based, it incurs low data rate and high processing overhead. Such drawbacks make SNMP unsuitable for today's automatic network applications.
Network telemetry has emerged as a mainstream technical term to refer to the newer technologies of data collection and consumption in the intent-driven network (IDN) paradigm, distinguishing itself from the conventional technologies for network OAM. It is expected that network telemetry can provide the necessary network visibility for automatic network OAM, address the shortcomings of conventional technologies, and allow for the emergence of new technologies.
Although the network telemetry technologies continue to evolve, several defining characteristics of network telemetry have been well accepted:
In addition, we believe the ideal network telemetry solution should also support the following features:
Big data analytics and machine-learning based AI technologies are applied to network OAM and rely on abundant data from networks. Single-sourced, static data acquisition cannot meet these data requirements. It is desirable to have a framework that integrates multiple telemetry approaches from different layers and allows flexible combinations for different applications. The framework will benefit application development for the following reasons.
So far, some telemetry-related work has been done within the IETF. However, this work is fragmented and scattered across different working groups. The lack of coherence makes it difficult to assemble a comprehensive network telemetry system and causes repetitive and redundant work.
A formal network telemetry framework is needed for constructing a working system. The framework should cover the concepts and components from the standardization perspective. This document clarifies the layers on which the telemetry is exerted and decomposes the telemetry system into a set of distinct components that the existing and future work can easily map to.
Telemetry can be applied on the data plane, the control plane, and the management plane in a network, as shown in Figure 1.
    +------------------------------+
    |                              |
    |       OAM Applications       |
    |                              |
    +------------------------------+
         ^      ^          ^
         |      |          |
         V      |          V
    +-----------|---+--------------+
    |           |   |              |
    | Control Pl|ane|              |
    | Telemetry | <--->            |
    |           |   |              |
    |      ^    V   |  Management  |
    +------|--------+  Plane       |
    |      V        |  Telemetry   |
    |               |              |
    | Data Plane  <--->            |
    | Telemetry     |              |
    |               |              |
    +---------------+--------------+
Figure 1: Layer Category of the Network Telemetry Framework
Note that the interaction with OAM applications can be indirect. For example, in the management plane telemetry, the management plane may need to acquire data from the data plane. On the other hand, an OAM application may involve more than one plane simultaneously. For example, an SLA compliance application may require both the data plane telemetry and the control plane telemetry.
At each plane, the telemetry can be further partitioned into five distinct components:
    +------------------------------+
    |                              |
    |        Data Analysis         |
    |                              |
    +------------------------------+
            |             ^
            |             |
            V             |
    +---------------+--------------+
    |               |              |
    | Data          | Data         |
    | Subscription  | Export       |
    |               |              |
    +---------------+--------------+
    |                              |
    |       Data Generation        |
    |                              |
    +------------------------------+
    |                              |
    |         Data Source          |
    |                              |
    +------------------------------+
Figure 2: Components in the Network Telemetry Framework
Since most existing standard-related work belongs to the first four components, in the remainder of the document, we focus on these components only.
The following table provides a non-exhaustive list of existing work (mainly published in the IETF, with an emphasis on the latest technologies) and shows its position in the framework.
    +-----------+--------------+---------------+--------------+
    |           | Management   | Control       | Data         |
    |           | Plane        | Plane         | Plane        |
    +-----------+--------------+---------------+--------------+
    |           | YANG Data    | Control Proto.| Flow/Packet  |
    | Data      | Store        | Network State | Statistics   |
    | Source    |              |               | States       |
    |           |              |               |              |
    +-----------+--------------+---------------+--------------+
    |           | gRPC         | NETCONF/YANG  | NETCONF/YANG |
    | Data      | YANG PUSH    | BGP           | YANG FSM     |
    | Subscribe |              |               |              |
    |           |              |               |              |
    +-----------+--------------+---------------+--------------+
    |           | Soft DNP     | Soft DNP      | In-situ OAM  |
    | Data      |              |               | IPFPM        |
    | Generation|              |               | Hard DNP     |
    |           |              |               |              |
    +-----------+--------------+---------------+--------------+
    |           | gRPC         | BMP           | IPFIX        |
    | Data      | YANG PUSH    |               | UDP          |
    | Export    | UDP          |               |              |
    |           |              |               |              |
    +-----------+--------------+---------------+--------------+
Figure 3: Existing Work
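The positions in Figure 3 can also be read as a lookup keyed by component and plane. The following is a minimal sketch of that reading, not part of the framework itself: the names `FRAMEWORK` and `positions` are hypothetical, and the Data Source row is left out because its cells hold descriptive phrases rather than named technologies.

```python
from typing import Dict, List, Tuple

# Cells copied from the Data Subscribe, Data Generation, and Data Export
# rows of Figure 3. FRAMEWORK is a hypothetical name used for illustration.
FRAMEWORK: Dict[Tuple[str, str], List[str]] = {
    ("Data Subscribe", "Management Plane"): ["gRPC", "YANG PUSH"],
    ("Data Subscribe", "Control Plane"): ["NETCONF/YANG", "BGP"],
    ("Data Subscribe", "Data Plane"): ["NETCONF/YANG", "YANG FSM"],
    ("Data Generation", "Management Plane"): ["Soft DNP"],
    ("Data Generation", "Control Plane"): ["Soft DNP"],
    ("Data Generation", "Data Plane"): ["In-situ OAM", "IPFPM", "Hard DNP"],
    ("Data Export", "Management Plane"): ["gRPC", "YANG PUSH", "UDP"],
    ("Data Export", "Control Plane"): ["BMP"],
    ("Data Export", "Data Plane"): ["IPFIX", "UDP"],
}


def positions(technology: str) -> List[Tuple[str, str]]:
    """Return every (component, plane) cell that lists the given technology."""
    return [cell for cell, entries in FRAMEWORK.items() if technology in entries]


udp_cells = positions("UDP")
print(udp_cells)
```

A lookup of this kind makes it easy to see that one technology can occupy several cells, which is the sense in which existing work maps onto the framework.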
The management plane of the network element interacts with the Network Management System (NMS), and provides information such as performance data, network logging data, network warning and defects data, and network statistics and state data. Some legacy protocols are widely used for the management plane, such as SNMP and Syslog, but these protocols do not meet the requirements of the automatic network OAM applications.
New management plane telemetry protocols should consider the following requirements:
NETCONF is a popular network management protocol that is also recommended by the IETF. Although it can be used for data collection, NETCONF is primarily suited to configuration. YANG Push extends NETCONF and enables subscriber applications to request a continuous, customized stream of updates from a YANG datastore. Providing such visibility into changes made to YANG configuration and operational objects enables new capabilities based on the remote mirroring of configuration and operational state. Moreover, a distributed data collection mechanism over a UDP-based publication channel improves the efficiency of NETCONF-based telemetry.
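The subscription idea can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the YANG Push protocol: `Datastore`, `Subscription`, and the example paths are hypothetical, the subscription filter is a plain string prefix, and updates are delivered by a direct callback instead of over NETCONF.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Subscription:
    """A subscriber's interest in one subtree of the datastore (hypothetical)."""
    path_prefix: str
    on_update: Callable[[str, object], None]


@dataclass
class Datastore:
    """Toy stand-in for a YANG datastore; it does not speak NETCONF."""
    values: Dict[str, object] = field(default_factory=dict)
    subscriptions: List[Subscription] = field(default_factory=list)

    def subscribe(self, path_prefix: str,
                  on_update: Callable[[str, object], None]) -> Subscription:
        sub = Subscription(path_prefix, on_update)
        self.subscriptions.append(sub)
        return sub

    def set(self, path: str, value: object) -> None:
        # Push an update only when the stored value actually changes.
        if path in self.values and self.values[path] == value:
            return
        self.values[path] = value
        for sub in self.subscriptions:
            # Simplified filter: plain string-prefix match on the path.
            if path.startswith(sub.path_prefix):
                sub.on_update(path, value)


received = []
store = Datastore()
store.subscribe("/interfaces/eth0/", lambda p, v: received.append((p, v)))
store.set("/interfaces/eth0/oper-status", "up")
store.set("/interfaces/eth0/oper-status", "up")    # unchanged, nothing pushed
store.set("/interfaces/eth1/oper-status", "down")  # outside the subscription
store.set("/interfaces/eth0/oper-status", "down")
print(received)
```

The subscriber never polls: it registers interest once and then receives only the changes that fall inside its subtree.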
gRPC Network Management Interface (gNMI) is a network management protocol based on the gRPC Remote Procedure Call (RPC) framework. A single gRPC service definition covers both configuration and telemetry. gRPC is an open source microservice communication framework based on HTTP/2. It provides a number of capabilities that make it well suited for network telemetry, including:
The control plane runs the routing protocols (e.g., BGP, OSPF, and IS-IS) to calculate the routing table for a network device. Control plane telemetry monitors the routing protocols to ensure they are working properly.
BGP Monitoring Protocol (BMP) is used to monitor BGP sessions and is intended to provide a convenient interface for obtaining route views. The data is collected from the Adjacency-RIB-In routing tables, which are the pre-policy tables, meaning that the routes in these tables have not been filtered or modified by routing policies. The monitoring station can therefore receive all routes, not just the active routes.
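The difference between the pre-policy view and the routes a router keeps can be sketched as follows. This is an illustrative model only: it does not implement BMP messages or BGP best-path selection, and `Route`, `receive`, and the example import policy are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, NamedTuple, Tuple


class Route(NamedTuple):
    prefix: str
    as_path: Tuple[int, ...]


def receive(updates: Iterable[Route],
            import_policy: Callable[[Route], bool],
            monitor: List[Route]) -> Dict[str, Route]:
    """Mirror every received route to the monitor, then apply import policy."""
    accepted: Dict[str, Route] = {}
    for route in updates:
        monitor.append(route)           # pre-policy copy for the monitoring station
        if import_policy(route):        # the router itself keeps only accepted routes
            accepted[route.prefix] = route
    return accepted


updates = [
    Route("192.0.2.0/24", (64500,)),
    Route("198.51.100.0/24", (64500, 64501, 64502)),
    Route("203.0.113.0/24", (64500, 64501)),
]
monitor_view: List[Route] = []
# Assumed policy for the example: reject AS paths longer than two hops.
accepted = receive(updates, lambda r: len(r.as_path) <= 2, monitor_view)
print(len(monitor_view), sorted(accepted))
```

The monitoring station sees all three routes, including the one the import policy rejects, while the router retains two.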
An effective data plane telemetry system relies on the data that the network device can expose. The data's quality, quantity, and timeliness must meet some stringent requirements. This poses challenges for the network data plane devices where the firsthand data originate.
Hardware based Dynamic Network Probe (DNP) provides a programmable means to customize the data that an application collects from the data plane. A direct benefit of DNP is the reduction of the exported data. A full DNP solution covers several components including data source, data subscription, and data generation. The data subscription needs to define the custom data which can be composed and derived from the raw data sources. The data generation takes advantage of the moderate in-network computing to produce the desired data.
While DNP can introduce unforeseeable flexibility to the data plane telemetry, it also faces some challenges. It requires a flexible data plane that can be dynamically reprogrammed at runtime. The programming API is yet to be defined.
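The data reduction that DNP aims for can be sketched as a probe that derives one value from many raw samples on the device and exports only that value. Since the programming API is yet to be defined, everything here is an assumption made for illustration: the `Probe` class, its `selector` and `reducer` fields, and the `queue_depth` sample field are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Probe:
    """Hypothetical probe: a selector plus a running reduction over raw samples."""
    name: str
    selector: Callable[[Dict[str, int]], bool]
    reducer: Callable[[int, Dict[str, int]], int]
    value: int = 0

    def observe(self, sample: Dict[str, int]) -> None:
        # Runs where the raw data originates; nothing is exported here.
        if self.selector(sample):
            self.value = self.reducer(self.value, sample)

    def export(self) -> Dict[str, object]:
        # Only the derived value leaves the device.
        return {"probe": self.name, "value": self.value}


raw_samples = [{"queue_depth": depth} for depth in (3, 12, 20, 1)]
probe = Probe(
    name="deep-queue-count",
    selector=lambda s: s["queue_depth"] > 10,
    reducer=lambda count, _sample: count + 1,
)
for sample in raw_samples:
    probe.observe(sample)
report = probe.export()
print(report)
```

Four raw samples are reduced to a single exported record. In this model the selector and reducer are what an application would customize.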
Traffic on a network can be seen as a set of flows passing through network elements. IP Flow Information Export (IPFIX) provides a means of transmitting traffic flow information for administrative or other purposes. A typical IPFIX-enabled system includes a pool of Metering Processes that collect data packets at one or more Observation Points, optionally filter them, and aggregate information about these packets. An Exporter then gathers each of the Observation Points together into an Observation Domain and sends this information via the IPFIX protocol to a Collector.
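The metering and export roles can be illustrated with a short sketch. It models only the aggregation of packets into flow records: it does not produce the IPFIX wire format (there are no templates or message encoding), and `Packet`, `meter`, `export`, and the three-field flow key are hypothetical choices made for the example.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, NamedTuple, Optional, Tuple


class Packet(NamedTuple):
    src: str
    dst: str
    proto: int
    length: int


FlowKey = Tuple[str, str, int]


def meter(packets: Iterable[Packet],
          keep: Optional[Callable[[Packet], bool]] = None
          ) -> Dict[FlowKey, Dict[str, int]]:
    """Optionally filter packets, then aggregate per-flow packet and octet counts."""
    flows: Dict[FlowKey, Dict[str, int]] = defaultdict(
        lambda: {"packets": 0, "octets": 0})
    for pkt in packets:
        if keep is not None and not keep(pkt):
            continue
        counters = flows[(pkt.src, pkt.dst, pkt.proto)]
        counters["packets"] += 1
        counters["octets"] += pkt.length
    return dict(flows)


def export(flows: Dict[FlowKey, Dict[str, int]]) -> List[Dict[str, object]]:
    """Flatten the flow table into records a collector could consume."""
    return [{"src": key[0], "dst": key[1], "proto": key[2], **counters}
            for key, counters in flows.items()]


observed = [
    Packet("10.0.0.1", "10.0.0.2", 6, 1500),
    Packet("10.0.0.1", "10.0.0.2", 6, 40),
    Packet("10.0.0.3", "10.0.0.2", 17, 200),
]
# Assumed filter for the example: keep TCP (protocol number 6) only.
records = export(meter(observed, keep=lambda p: p.proto == 6))
print(records)
```

The two TCP packets collapse into one flow record and the UDP packet is filtered out, so the collector receives per-flow summaries instead of individual packets.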
Traditional passive and active monitoring and measurement techniques are either inaccurate or resource-consuming. It is preferable to directly acquire data associated with a flow's packets when the packets pass through a network. In-situ OAM (iOAM), a data generation technique, embeds a new instruction header in user packets, and the instruction directs the network nodes to add the requested data to the packets. Thus, at the end of the path, the packet's experience on the entire forwarding path can be collected. Such firsthand data is invaluable to many network OAM applications.
However, iOAM also faces some challenges. The issues on performance impact, security, scalability and overhead limits, encapsulation difficulties in some protocols, and cross-domain deployment need to be addressed.
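The in-situ collection idea, in which each node on the path adds the requested data to the packet, can be sketched as follows. This is a simplified model under stated assumptions, not the iOAM encapsulation: `Packet`, `Node`, and the field names `queue_depth` and `hop_latency_us` are hypothetical, and the trace is a Python list instead of a packet header.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Packet:
    payload: bytes
    requested: List[str]  # the instruction: which fields each node should add
    trace: List[Dict[str, object]] = field(default_factory=list)


@dataclass
class Node:
    node_id: str
    local_data: Dict[str, object]

    def forward(self, pkt: Packet) -> Packet:
        # Add only the requested fields that this node can supply.
        record: Dict[str, object] = {"node": self.node_id}
        for name in pkt.requested:
            if name in self.local_data:
                record[name] = self.local_data[name]
        pkt.trace.append(record)
        return pkt


path = [
    Node("A", {"queue_depth": 4, "hop_latency_us": 12}),
    Node("B", {"queue_depth": 9}),
    Node("C", {"queue_depth": 0, "hop_latency_us": 7}),
]
pkt = Packet(payload=b"user data", requested=["queue_depth", "hop_latency_us"])
for node in path:
    pkt = node.forward(pkt)
print(pkt.trace)
```

The trace grows by one record per hop, which also makes the overhead concern concrete: the data carried by the packet scales with path length and with the number of requested fields.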
TBD
This document includes no request to IANA.
The other contributors of this document are listed as follows.
TBD.
[RFC1157] | Case, J., Fedor, M., Schoffstall, M. and J. Davin, "Simple Network Management Protocol (SNMP)", RFC 1157, DOI 10.17487/RFC1157, May 1990. |
[RFC2119] | Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997. |
[RFC6241] | Enns, R., Bjorklund, M., Schoenwaelder, J. and A. Bierman, "Network Configuration Protocol (NETCONF)", RFC 6241, DOI 10.17487/RFC6241, June 2011. |
[RFC7011] | Claise, B., Trammell, B. and P. Aitken, "Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of Flow Information", STD 77, RFC 7011, DOI 10.17487/RFC7011, September 2013. |
[RFC7540] | Belshe, M., Peon, R. and M. Thomson, "Hypertext Transfer Protocol Version 2 (HTTP/2)", RFC 7540, DOI 10.17487/RFC7540, May 2015. |
[RFC7854] | Scudder, J., Fernando, R. and S. Stuart, "BGP Monitoring Protocol (BMP)", RFC 7854, DOI 10.17487/RFC7854, June 2016. |
[I-D.brockners-inband-oam-requirements] | Brockners, F., Bhandari, S., Dara, S., Pignataro, C., Gredler, H., Leddy, J., Youell, S., Mozes, D., Mizrahi, T., <>, P. and r. remy@barefootnetworks.com, "Requirements for In-situ OAM", Internet-Draft draft-brockners-inband-oam-requirements-03, March 2017. |
[I-D.ietf-netconf-udp-pub-channel] | Zheng, G., Zhou, T. and A. Clemm, "UDP based Publication Channel for Streaming Telemetry", Internet-Draft draft-ietf-netconf-udp-pub-channel-02, March 2018. |
[I-D.ietf-netconf-yang-push] | Clemm, A., Voit, E., Prieto, A., Tripathy, A., Nilsen-Nygaard, E., Bierman, A. and B. Lengyel, "YANG Datastore Subscription", Internet-Draft draft-ietf-netconf-yang-push-15, February 2018. |
[I-D.kumar-rtgwg-grpc-protocol] | Kumar, A., Kolhe, J., Ghemawat, S. and L. Ryan, "gRPC Protocol", Internet-Draft draft-kumar-rtgwg-grpc-protocol-00, July 2016. |
[I-D.openconfig-rtgwg-gnmi-spec] | Shakir, R., Shaikh, A., Borman, P., Hines, M., Lebsack, C. and C. Morrow, "gRPC Network Management Interface (gNMI)", Internet-Draft draft-openconfig-rtgwg-gnmi-spec-01, March 2018. |
[I-D.song-opsawg-dnp4iq] | Song, H. and J. Gong, "Requirements for Interactive Query with Dynamic Network Probes", Internet-Draft draft-song-opsawg-dnp4iq-01, June 2017. |
[I-D.zhou-netconf-multi-stream-originators] | Zhou, T., Zheng, G., Voit, E., Clemm, A. and A. Bierman, "Subscription to Multiple Stream Originators", Internet-Draft draft-zhou-netconf-multi-stream-originators-01, November 2017. |