OPSAWG                                                      H. Song, Ed.
Internet-Draft                                                 Futurewei
Intended status: Informational                                    F. Qin
Expires: April 10, 2020                                     China Mobile
                                                       P. Martinez-Julia
                                                                    NICT
                                                            L. Ciavaglia
                                                                   Nokia
                                                                 A. Wang
                                                           China Telecom
                                                         October 8, 2019
Network Telemetry Framework
draft-ietf-opsawg-ntf-02
Network telemetry is the technology for gaining network insight and facilitating efficient and automated network management. It engages various techniques for remote data collection, correlation, and consumption. This document provides an architectural framework for network telemetry, motivated by the network operation challenges and requirements. As evidenced by some key characteristics and industry practices, network telemetry covers technologies and protocols beyond the conventional network Operations, Administration, and Management (OAM). It promises better flexibility, scalability, accuracy, coverage, and performance and allows automated control loops to suit both today's and tomorrow's network operation. This document clarifies the terminologies and classifies the modules and components of a network telemetry system from several different perspectives. To the best of our knowledge, this document is the first such effort for network telemetry in industry standards organizations. The framework and taxonomy help to set a common ground for the collection of related work and provide guidance for future technique and standard developments.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 10, 2020.
Copyright (c) 2019 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Network visibility is the ability of management tools to see the state and behavior of a network. It is essential for successful network operation. Network telemetry is the process of measuring, correlating, recording, and distributing information about the behavior of a network. Network telemetry has been considered as an ideal means to gain sufficient network visibility with better flexibility, scalability, accuracy, coverage, and performance than some conventional network Operations, Administration, and Management (OAM) techniques.
However, the term "network telemetry" still lacks a solid and unambiguous definition, and its scope and coverage cause confusion and misunderstandings. It is beneficial to clarify the concept and provide a clear architectural framework for network telemetry, so that we can articulate the technical field and better align the related techniques and standards work.
To fulfill such an undertaking, we first discuss some key characteristics of network telemetry that set it clearly apart from conventional network OAM and show that some conventional OAM technologies can be considered a subset of the network telemetry technologies. We then provide an architectural framework for network telemetry from three different perspectives. We show how network telemetry can meet current and future network operation requirements, and the challenges each telemetry module is facing. Based on the distinction of modules and function components, we can easily map the existing and emerging techniques and protocols into the framework. Finally, we outline a roadmap for the evolution of the network telemetry system and discuss the potential security concerns for network telemetry.
The purpose of the framework and taxonomy is to set a common ground for the collection of related work and provide guidance for future technique and standard developments. To the best of our knowledge, this document is the first such effort for network telemetry in industry standards organizations.
The term "big data" describes extremely large data sets that can be analyzed computationally to reveal patterns, trends, and associations. The network is undoubtedly a source of big data because of its scale and all the traffic that goes through it. It is easy to see that network OAM can benefit from network big data.
Today one can easily access advanced big data analytics capability through a plethora of commercial and open source platforms (e.g., Apache Hadoop), tools (e.g., Apache Spark), and techniques (e.g., machine learning). Thanks to the advance of computing and storage technologies, network big data analytics gives network operators an unprecedented opportunity to gain network insights and move towards network autonomy. Some operators have started to explore the application of Artificial Intelligence (AI) to make sense of network data. Software tools can use the network data to detect and react to network faults, anomalies, and policy violations, as well as to predict future events. In turn, network policy updates for planning, intrusion prevention, optimization, and self-healing may be applied.
It is conceivable that an intent-driven autonomic network is the logical next step for network evolution following Software Defined Network (SDN), aiming to reduce (or even eliminate) human labor, make the most efficient usage of network resources, and provide better services more aligned with customer requirements. Although it takes time to reach the ultimate goal, the journey has started nevertheless.
However, while data processing capability has improved and applications are hungry for more data, networks lag behind in extracting and translating network data into useful and actionable information. The system bottleneck is shifting from data consumption to data supply. Both the number of network nodes and the traffic bandwidth keep increasing at a fast pace. Network configuration and policy change on much shorter time scales than ever before. More subtle events and fine-grained data from all network planes need to be captured and exported in real time. In a nutshell, it is a challenge to get enough high-quality data out of the network in an efficient, timely, and flexible manner. Therefore, we need to examine the existing network technologies and protocols and identify any potential technique and standard gaps based on real network and device architectures.
In the remainder of this section, we first discuss several key use cases for today's and future network operations. Next, we show why the current network OAM techniques and protocols are insufficient for these use cases. The discussion underlines the need for new methods, techniques, and protocols, which we may group under an umbrella term: network telemetry.
These use cases are essential for network operations. While the list is by no means exhaustive, it is enough to highlight the requirements for data velocity, variety, volume, and veracity in networks.
For a long time, network operators have relied upon SNMP, Command-Line Interface (CLI), or Syslog to monitor the network. Some other OAM techniques as described in [RFC7276] are also used to facilitate network troubleshooting. These conventional techniques are not sufficient to support the above use cases for the following reasons:
Before further discussion, we list some key terminology and acronyms used in this document. We make an intentional distinction between network telemetry and network OAM.
Network telemetry has emerged as a mainstream technical term to refer to the newer data collection and consumption techniques, distinguishing itself from the conventional techniques for network OAM. The representative techniques and protocols include IPFIX and gRPC. Network telemetry allows separate entities to acquire data from network devices so that data can be visualized and analyzed to support network monitoring and operation. Network telemetry overlaps with the conventional network OAM and has a wider scope than it. It is expected that network telemetry can provide the necessary network insight for autonomous networks and address the shortcomings of conventional OAM techniques.
One difference between network telemetry and network OAM is that network telemetry assumes machines, rather than human operators, as data consumers. Hence, network telemetry can directly trigger automated network operation, while the conventional OAM tools usually help human operators to monitor and diagnose the networks and guide manual network operations. This difference leads to very different techniques.
Although the network telemetry techniques are just emerging and subject to continuous evolution, several characteristics of network telemetry have been well accepted (Note that network telemetry is intended to be an umbrella term covering a wide spectrum of techniques, so the following characteristics are not expected to be held by every specific technique):
In addition, an ideal network telemetry solution may also have the following features or properties:
It is worth noting that, no matter how sophisticated a network telemetry system is, it should not be intrusive to the network; that is, it should avoid the pitfall of the "observer effect". It should neither change the network behavior nor affect the forwarding performance.
Although in many cases a network telemetry system is akin to the SDN architecture, it is important to understand that network telemetry does not imply the need for any centralized data processing and analytics engine. Telemetry data producers and consumers can work perfectly well in a distributed or peer-to-peer fashion instead.
Big data analytics and machine-learning based AI technologies are applied for network operation automation, relying on abundant data from networks. The single-sourced and static data acquisition cannot meet the data requirements. It is desirable to have a framework that integrates multiple telemetry approaches from different layers. This allows flexible combinations for different applications. The framework would benefit application development for the following reasons:
A telemetry framework collects together all of the telemetry-related work from different sources and working groups within the IETF. This makes it possible to assemble a comprehensive network telemetry system and to avoid repetitious or redundant work. The framework should cover the concepts and components from the standardization perspective. This document clarifies the layered modules on which the telemetry is exerted and decomposes the telemetry system into a set of distinct components that the existing and future work can easily map to.
Network telemetry techniques can be classified from multiple dimensions. In this document, we provide three unique perspectives: data acquiring mechanisms, data objects, and function components.
Broadly speaking, network data can be acquired through subscription (push) and query (poll). A subscriber may request that data be delivered whenever it is ready. Subscription follows either a Publish-Subscription (Pub-Sub) mode or a Subscription-Publish (Sub-Pub) mode. In the Pub-Sub mode, pre-defined data are published and multiple qualified subscribers can subscribe to the data. In the Sub-Pub mode, a subscriber designates what data are of interest and requests that the network devices deliver the data when they are available.
In contrast, a querier expects immediate feedback from network devices. It is usually used in a more interactive environment. The queried data may be directly extracted from some specific data source, or synthesized and processed from raw data.
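The difference between subscription and query can be illustrated with a minimal sketch. The TelemetrySource class, its method names, and the data path used below are hypothetical and exist only for illustration; they do not correspond to any protocol or API named in this document.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Callback = Callable[[str, object], None]


class TelemetrySource:
    """Hypothetical device model supporting subscription (push) and query (poll)."""

    def __init__(self) -> None:
        self._state: Dict[str, object] = {}
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, path: str, callback: Callback) -> None:
        """Sub-Pub style: the subscriber designates the data of interest."""
        self._subscribers[path].append(callback)

    def update(self, path: str, value: object) -> None:
        """Store new data and push it to every subscriber of that path."""
        self._state[path] = value
        for callback in self._subscribers[path]:
            callback(path, value)

    def query(self, path: str) -> object:
        """Query style: return the current value immediately (None if unknown)."""
        return self._state.get(path)


device = TelemetrySource()
pushed = []  # data delivered by subscription, in arrival order
device.subscribe("if/eth0/rx_bytes", lambda p, v: pushed.append((p, v)))
device.update("if/eth0/rx_bytes", 100)
device.update("if/eth0/rx_bytes", 250)
polled = device.query("if/eth0/rx_bytes")  # immediate answer on request
```

In this sketch the subscriber receives every update without asking again, while the querier sees only the value that is current at the moment of the request.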
There are four types of data from network devices:
The above data types are not mutually exclusive. For example, event-triggered data can be simple or complex, and streaming data can be event triggered. The relationships of these data types are illustrated in Figure 1.
+--------------------------+
| +----------------------+ |
| | +-----------------+  | |
| | | +-------------+ |  | |
| | | | Simple Data | |  | |
| | | +-------------+ |  | |
| | |   Complex Data  |  | |
| | +-----------------+  | |
| | Event-triggered Data | |
| +----------------------+ |
|      Streaming Data      |
+--------------------------+
Figure 1: Data Type Relationship
Subscription usually deals with event-triggered data and streaming data, and query usually deals with simple data and complex data. It is easy to see that conventional OAM techniques are mostly about querying simple data only. While these techniques are still useful, advanced network telemetry techniques pay more attention to the other three data types, and prefer event/streaming data subscription and complex data query over simple data query.
Telemetry can be applied on the forwarding plane, the control plane, and the management plane in a network, as well as other sources out of the network, as shown in Figure 2. Therefore, we categorize the network telemetry into four distinct modules with each having its own interface to Network Operation Applications.
+------------------------------+
|                              |
|       Network Operation      |<-------+
|          Applications        |        |
|                              |        |
+------------------------------+        |
     ^      ^           ^               |
     |      |           |               |
     V      |           V               V
+-----------|---+--------------+  +-----------+
|           |   |              |  |           |
| Control Pl|ane|              |  | External  |
| Telemetry | <--->            |  | Data and  |
|           |   |              |  | Event     |
|      ^    V   |  Management  |  | Telemetry |
+------|--------+  Plane       |  |           |
|      V        |  Telemetry   |  +-----------+
| Forwarding    |              |
| Plane       <--->            |
| Telemetry     |              |
|               |              |
+---------------+--------------+
Figure 2: Modules in Layer Category of NTF
The rationale of this partition lies in the different telemetry data objects which result in different data source and export locations. Such differences have profound implications on in-network data programming and processing capability, data encoding and transport protocol, and data bandwidth and latency.
We summarize the major differences of the four modules in the following table. They are mainly compared from six aspects: data object, data export location, data model, data encoding, telemetry protocol, and transport method. Data object is the target and source of each module. Because the data source varies, the data export location varies. Because each data export location has different capability, the proper data model, encoding, and transport method cannot be kept the same. As a result, the suitable telemetry protocol for each module can be different. Some representative techniques are shown in some table blocks to highlight the technical diversity of these modules. One cannot expect to use a universal protocol to cover all the network telemetry requirements.
+---------+--------------+--------------+--------------+-----------+
| Module  | Control      | Management   | Forwarding   | External  |
|         | Plane        | Plane        | Plane        | Data      |
+---------+--------------+--------------+--------------+-----------+
|Object   | control      | config. &    | flow & packet| terminal, |
|         | protocol &   | operation    | QoS, traffic | social &  |
|         | signaling,   | state, MIB   | stat., buffer| environ-  |
|         | RIB, ACL     |              | & queue stat.| mental    |
+---------+--------------+--------------+--------------+-----------+
|Export   | main control | main control | fwding chip  | various   |
|Location | CPU,         | CPU          | or linecard  |           |
|         | linecard CPU |              | CPU; main    |           |
|         | or fwding    |              | control CPU  |           |
|         | chip         |              | unlikely     |           |
+---------+--------------+--------------+--------------+-----------+
|Data     | YANG,        | MIB, syslog, | template,    | YANG      |
|Model    | custom       | YANG,        | YANG,        |           |
|         |              | custom       | custom       |           |
+---------+--------------+--------------+--------------+-----------+
|Data     | GPB, JSON,   | GPB, JSON,   | plain        | GPB, JSON |
|Encoding | XML, plain   | XML          |              | XML, plain|
+---------+--------------+--------------+--------------+-----------+
|Protocol | gRPC,NETCONF,| gRPC,NETCONF,| IPFIX, mirror| gRPC      |
|         | IPFIX,mirror |              |              |           |
+---------+--------------+--------------+--------------+-----------+
|Transport| HTTP, TCP,   | HTTP, TCP    | UDP          | HTTP,TCP  |
|         | UDP          |              |              | UDP       |
+---------+--------------+--------------+--------------+-----------+
Figure 3: Comparison of the Data Object Modules
Note that the interaction with the network operation applications can be indirect. For example, in the management plane telemetry, the management plane may need to acquire data from the data plane. Some of the operational states, such as the interface status and statistics, can only be derived from the data plane. As another example, the control plane telemetry may need to access the Forwarding Information Base (FIB) in the data plane. On the other hand, an application may involve more than one plane simultaneously. For example, an SLA compliance application may require both the data plane telemetry and the control plane telemetry.
The management plane of network elements interacts with the Network Management System (NMS), and provides information such as performance data, network logging data, network warning and defects data, and network statistics and state data. Some legacy protocols, such as SNMP and Syslog, are widely used for the management plane. However, these protocols are insufficient to meet the requirements of the future automated network operation applications.
New management plane telemetry protocols should consider the following requirements:
The control plane telemetry refers to the health condition monitoring of different network protocols, which covers Layer 2 to Layer 7. Keeping track of the running status of these protocols is beneficial for detecting, localizing, and even predicting various network issues, as well as network optimization, in real-time and in fine granularity.
One of the most challenging problems for the control plane telemetry is how to correlate the E2E Key Performance Indicators (KPI) to a specific layer's KPIs. For example, an IPTV user may describe their User Experience (UE) by the video fluency and definition. Then, in case of an unusually poor UE KPI or a service disconnection, it is non-trivial work to delimit and localize the issue to the responsible protocol layer (e.g., the Transport Layer or the Network Layer), the responsible protocol (e.g., IS-IS or BGP at the Network Layer), and finally the responsible device(s) with specific reasons.
Traditional OAM-based approaches for control plane KPI measurement include Ping (L3), Traceroute (L3), Y.1731 (L2), and so on. One common issue with these methods is that they only measure the KPIs instead of reflecting the actual running status of these protocols, making them less effective or efficient for control plane troubleshooting and network optimization. An example of control plane telemetry is the BGP Monitoring Protocol (BMP), which is currently used to monitor BGP routes and enables rich applications such as BGP peer analysis, AS analysis, prefix analysis, and security analysis. However, the monitoring of other layers and protocols, and cross-layer and cross-protocol KPI correlation, are still in their infancy (e.g., IGP monitoring is missing) and require substantial further research.
An effective data plane telemetry system relies on the data that the network device can expose. The data's quality, quantity, and timeliness must meet some stringent requirements. This raises some challenges for the network data plane devices where the firsthand data originates.
The industry has agreed that the data plane programmability is essential to support network telemetry. Newer data plane chips are all equipped with advanced telemetry features and provide flexibility to support customized telemetry functions.
There can be multiple possible dimensions to classify the data plane telemetry techniques.
Events that occur outside the boundaries of the network system are another important source of telemetry information. Correlating both internal telemetry data and external events with the requirements of network systems, as presented in "Exploiting External Event Detectors to Anticipate Resource Requirements for the Elastic Adaptation of SDN/NFV Systems", provides a strategic and functional advantage to management operations.
As with other sources of telemetry information, the data and events must meet strict requirements, especially in terms of timeliness, which is essential to properly incorporate external event information to management cycles. Thus, the specific challenges are described as follows:
Organizing together both internal and external telemetry information will be key for the general exploitation of the management possibilities of current and future network systems, as reflected in the incorporation of cognitive capabilities to new hardware and software (virtual) elements.
At each plane, the telemetry can be further partitioned into five distinct components:
+----------------------------------------+
|                                        |
|    Data Query, Analysis, & Storage     |
|                                        |
+----------------------------------------+
          |                  ^
          |                  |
          V                  |
+---------------------+------------------+
| Data Configuration  |                  |
| & Subscription      | Data Encoding    |
| (model, template,   | & Export         |
| & program)          |                  |
+---------------------+------------------+
|                                        |
|           Data Generation              |
|           & Processing                 |
|                                        |
+----------------------------------------+
|                                        |
|        Data Object and Source          |
|                                        |
+----------------------------------------+
Figure 4: Components in the Network Telemetry Framework
The following two tables provide a non-exhaustive list of existing work (mainly published in the IETF, with an emphasis on the latest technologies) and show its position in the framework. Details about the work mentioned can be found in Appendix A.
+-----------------+---------------+----------------+
|                 | Query         | Subscription   |
|                 |               |                |
+-----------------+---------------+----------------+
| Simple Data     | SNMP, NETCONF,|                |
|                 | YANG, BMP,    |                |
|                 | IOAM, PBT,gRPC|                |
+-----------------+---------------+----------------+
| Complex Data    | DNP, YANG FSM |                |
|                 | gRPC, NETCONF |                |
+-----------------+---------------+----------------+
| Event-triggered |               | gRPC, NETCONF, |
| Data            |               | YANG PUSH, DNP |
|                 |               | IOAM, PBT,     |
|                 |               | YANG FSM       |
+-----------------+---------------+----------------+
| Streaming Data  |               | gRPC, NETCONF, |
|                 |               | IOAM, PBT, DNP |
|                 |               | IPFIX, IPFPM   |
+-----------------+---------------+----------------+
Figure 5: Existing Work Mapping I
+--------------+---------------+----------------+---------------+
|              | Management    | Control        | Forwarding    |
|              | Plane         | Plane          | Plane         |
+--------------+---------------+----------------+---------------+
| data Config. | gRPC, NETCONF,| NETCONF/YANG   | NETCONF/YANG, |
| & subscrib.  | YANG PUSH     |                | YANG FSM      |
+--------------+---------------+----------------+---------------+
| data gen. &  | DNP,          | DNP,           | IOAM,         |
| processing   | YANG          | YANG           | PBT, IPFPM,   |
|              |               |                | DNP           |
+--------------+---------------+----------------+---------------+
| data         | gRPC, NETCONF | BMP, NETCONF   | IPFIX         |
| export       | YANG PUSH     |                |               |
+--------------+---------------+----------------+---------------+
Figure 6: Existing Work Mapping II
As the network is evolving towards the automated operation, network telemetry also undergoes several levels of evolution.
While most of the existing technologies belong to level 0 and level 1, with the help of a clearly defined network telemetry framework, we can assemble the technologies to support level 2 and make solid steps towards level 3.
Given that this document proposes a framework for network telemetry and that the telemetry mechanisms discussed are distinct (in both message frequency and traffic amount) from conventional network OAM concepts, we must also consider that various new security considerations may arise. A number of techniques already exist for securing the data plane, the control plane, and the management plane in a network, but it is important to consider whether any new threat vectors are enabled by the use of network telemetry procedures and mechanisms.
Security considerations for networks that use telemetry methods may include:
Some of the security considerations highlighted above may be minimized or negated with policy management of network telemetry. In a network telemetry deployment it would be advantageous to separate telemetry capabilities into different classes of policies, i.e., Role Based Access Control and Event-Condition-Action policies. Also, potential conflicts between network telemetry mechanisms must be detected accurately and resolved quickly to avoid unnecessary network telemetry traffic propagation escalating into an unintended or intended denial of service attack.
Further discussion and development of this section will be required, and it is expected that this security section, and subsequent policy section will be developed further.
This document includes no request to IANA.
The other contributors of this document are listed as follows.
We would like to thank Randy Presuhn, Joe Clarke, Victor Liu, James Guichard, Uri Blumenthal, Giuseppe Fioccola, Yunan Gu, Parviz Yegani, Young Lee, Alexander Clemm, Qin Wu, and many others who have provided helpful comments and suggestions to improve this document.
In this non-normative appendix, we provide an overview of some existing techniques and standard proposals for each network telemetry module.
NETCONF is a popular network management protocol, which is also recommended by the IETF. Although it can be used for data collection, its strength lies in configuration. YANG Push extends NETCONF and enables subscriber applications to request a continuous, customized stream of updates from a YANG datastore. Providing such visibility into changes made upon YANG configuration and operational objects enables new capabilities based on the remote mirroring of configuration and operational state. Moreover, a distributed data collection mechanism using a UDP-based publication channel provides enhanced efficiency for NETCONF-based telemetry.
gRPC Network Management Interface (gNMI) is a network management protocol based on the gRPC RPC (Remote Procedure Call) framework. With a single gRPC service definition, both configuration and telemetry can be covered. gRPC is an HTTP/2 based open source micro service communication framework. It provides a number of capabilities which are well-suited for network telemetry, including:
BGP Monitoring Protocol (BMP) is used to monitor BGP sessions and intended to provide a convenient interface for obtaining route views.
The BGP routing information is collected from the monitored device(s) to the BMP monitoring station by setting up the BMP TCP session. The BGP peers are monitored by the BMP Peer Up and Peer Down Notifications. The BGP routes (including Adjacency_RIB_In, Adjacency_RIB_out, and Local_Rib) are encapsulated in the BMP Route Monitoring Message and the BMP Route Mirroring Message, in the form of both an initial table dump and real-time route updates. In addition, BGP statistics are reported through the BMP Stats Report Message, which can be either timer triggered or event driven. More BMP extensions can be explored to enrich the applications of BGP monitoring.
The Alternate Marking method is efficient for performing packet loss, delay, and jitter measurements in both IP and overlay networks, as presented in IPFPM and [I-D.fioccola-ippm-multipoint-alt-mark].
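To make the message types mentioned above more concrete, the following sketch decodes the fixed common header that precedes every BMP message. The layout (1-octet version, 4-octet message length, 1-octet message type) and the type codes reflect our reading of the BMP specification (RFC 7854) and should be checked against it; the function name and the sample bytes are illustrative assumptions.

```python
import struct

# Message type codes as we understand them from RFC 7854 (verify before use).
BMP_MSG_TYPES = {
    0: "Route Monitoring",
    1: "Statistics Report",
    2: "Peer Down Notification",
    3: "Peer Up Notification",
    4: "Initiation",
    5: "Termination",
    6: "Route Mirroring",
}


def parse_bmp_common_header(data: bytes) -> dict:
    """Decode the 6-octet BMP common header from the start of `data`."""
    if len(data) < 6:
        raise ValueError("BMP common header needs at least 6 octets")
    # Network byte order: version (1 octet), length (4 octets), type (1 octet).
    version, length, msg_type = struct.unpack("!BIB", data[:6])
    return {
        "version": version,
        "length": length,  # total message length, header included
        "type": BMP_MSG_TYPES.get(msg_type, "Unknown"),
    }


# Hypothetical sample: a version 3 header announcing an Initiation message.
sample = struct.pack("!BIB", 3, 6, 4)
header = parse_bmp_common_header(sample)
```

A monitoring station would read this header from the BMP TCP session first, then use the length field to read the rest of the message and dispatch on the type.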
This technique can be applied to point-to-point and multipoint-to-multipoint flows. Alternate Marking creates batches of packets by alternating the value of 1 bit (or a label) of the packet header. These batches of packets are unambiguously recognized over the network and the comparison of packet counters for each batch allows the packet loss calculation. The same idea can be applied to delay measurement by selecting ad hoc packets with a marking bit dedicated for delay measurements.
Alternate Marking method needs two counters each marking period for each flow under monitor. For instance, by considering n measurement points and m monitored flows, the order of magnitude of the packet counters for each time interval is n*m*2 (1 per color).
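The batch-based loss calculation and the counter estimate can be sketched as follows. This is a simplified illustration, not the procedure defined in IPFPM: it assumes in-order delivery, that no entire batch is lost, and that batches are delimited by packet count rather than by time. All function names are hypothetical.

```python
def mark(num_packets: int, batch_size: int) -> list:
    """Assign a marking bit to each packet, flipping it every batch_size packets."""
    return [(i // batch_size) % 2 for i in range(num_packets)]


def batch_counts(colors: list) -> list:
    """Count packets in each run of consecutive same-color packets."""
    counts = []
    previous = None
    for color in colors:
        if color != previous:  # the color flipped: a new batch starts
            counts.append(0)
            previous = color
        counts[-1] += 1
    return counts


def loss_per_batch(sent_colors: list, received_colors: list) -> list:
    """Compare per-batch counters of two measurement points."""
    tx = batch_counts(sent_colors)
    rx = batch_counts(received_colors)
    return [t - r for t, r in zip(tx, rx)]


def counters_needed(n_points: int, m_flows: int) -> int:
    """Order of magnitude of counters per interval: one per color."""
    return n_points * m_flows * 2


sent = mark(12, batch_size=4)  # three batches of four packets
# Drop packets 5, 9, and 10 between the two measurement points.
received = [c for i, c in enumerate(sent) if i not in (5, 9, 10)]
losses = loss_per_batch(sent, received)
```

Because each batch is identified by its color, the two measurement points can compare counters without exchanging per-packet information.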
Since networks offer rich sets of network performance measurement data (e.g., packet counters), traditional approaches run into limitations. One reason is that the bottleneck lies in the generation and export of the data and in the amount of data that can reasonably be collected from the network. In addition, management tasks related to determining and configuring which data to generate lead to significant deployment challenges.
Multipoint Alternate Marking approach, described in [I-D.fioccola-ippm-multipoint-alt-mark], aims to resolve this issue and makes the performance monitoring more flexible in case a detailed analysis is not needed.
An application orchestrates network performance measurement tasks across the network to optimize monitoring, and it can calibrate how much detail is obtained from the network by configuring measurement points coarsely or finely.
Using Alternate Marking, it is possible to monitor a multipoint network without examining it in depth by using network clustering (clusters are subnetworks, i.e., portions of the entire network, that preserve the same property as the entire network). If packet loss occurs or the delay is too high, the filtering criteria can be made more specific in order to perform a detailed analysis using a different combination of clusters, down to a per-flow measurement as described in IPFPM.
In summary, an application can configure end-to-end network monitoring. If the network does not experience issues, this approximate monitoring is good enough and is very cheap in terms of network resources. However, in case of problems, the application becomes aware of the issues from this approximate monitoring and, in order to localize the portion of the network that has issues, configures the measurement points more exhaustively, so that a new, detailed monitoring is performed. After the problem has been detected and resolved, the initial approximate monitoring can be used again.
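The coarse-to-fine procedure summarized above can be sketched as a two-step check. This is only an illustration under simplifying assumptions: the cluster-level counters are modeled as the sum of per-flow counters, whereas a real deployment would read them from cluster input and output measurement points. The function name, the data layout, and the cluster and flow names are hypothetical.

```python
def localize_loss(clusters: dict, threshold: int = 0) -> dict:
    """Return per-flow loss only for clusters whose aggregate loss exceeds threshold.

    clusters maps a cluster name to {flow name: (packets_in, packets_out)}.
    """
    suspects = {}
    for name, flows in clusters.items():
        # Step 1: cheap, approximate monitoring at cluster granularity.
        total_in = sum(tx for tx, _ in flows.values())
        total_out = sum(rx for _, rx in flows.values())
        if total_in - total_out <= threshold:
            continue  # cluster looks healthy, no detailed analysis needed
        # Step 2: detailed per-flow monitoring, only inside the suspect cluster.
        suspects[name] = {
            flow: tx - rx for flow, (tx, rx) in flows.items() if tx - rx > threshold
        }
    return suspects


clusters = {
    "east": {"f1": (100, 100), "f2": (50, 50)},
    "west": {"f3": (80, 80), "f4": (40, 37)},
}
report = localize_loss(clusters)
```

Only the cluster that shows loss is examined per flow, which keeps the measurement cost low while the network is healthy.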
Hardware-based Dynamic Network Probe (DNP) provides a programmable means to customize the data that an application collects from the data plane. A direct benefit of DNP is the reduction of the exported data. A full DNP solution covers several components including data source, data subscription, and data generation. The data subscription needs to define the complex data which can be composed and derived from the raw data sources. The data generation takes advantage of the moderate in-network computing to produce the desired data.
While DNP can introduce unprecedented flexibility to the data plane telemetry, it also faces some challenges. It requires a flexible data plane that can be dynamically reprogrammed at run-time. The programming API is yet to be defined.
Traffic on a network can be seen as a set of flows passing through network elements. IP Flow Information Export (IPFIX) provides a means of transmitting traffic flow information for administrative or other purposes. A typical IPFIX-enabled system includes a pool of Metering Processes that collect data packets at one or more Observation Points, optionally filter them, and aggregate information about these packets. An Exporter then gathers each of the Observation Points together into an Observation Domain and sends this information via the IPFIX protocol to a Collector.
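The roles of the Metering Process and the Exporter can be sketched as follows. The flow key (a 5-tuple), the packet dictionaries, and the function names are assumptions made for illustration. The 16-octet message header layout (version 10, length, export time, sequence number, Observation Domain ID) follows our reading of the IPFIX protocol specification (RFC 7011) and should be verified against it; template and data sets are omitted.

```python
import struct
from collections import defaultdict


def meter(packets: list) -> dict:
    """Metering Process sketch: aggregate packets into per-flow counters."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for pkt in packets:
        key = (pkt["src"], pkt["dst"], pkt["proto"], pkt["sport"], pkt["dport"])
        flows[key]["packets"] += 1
        flows[key]["bytes"] += pkt["length"]
    return dict(flows)


def ipfix_message_header(payload_len: int, export_time: int,
                         sequence: int, domain_id: int) -> bytes:
    """Exporter sketch: build only the fixed IPFIX message header."""
    total_len = 16 + payload_len  # the length field covers header plus sets
    return struct.pack("!HHIII", 10, total_len, export_time, sequence, domain_id)


packets = [
    {"src": "10.0.0.1", "dst": "10.0.0.2", "proto": 6,
     "sport": 1234, "dport": 80, "length": 100},
    {"src": "10.0.0.1", "dst": "10.0.0.2", "proto": 6,
     "sport": 1234, "dport": 80, "length": 60},
    {"src": "10.0.0.3", "dst": "10.0.0.2", "proto": 17,
     "sport": 5000, "dport": 53, "length": 40},
]
flows = meter(packets)
header = ipfix_message_header(payload_len=0, export_time=0, sequence=0, domain_id=1)
```

Aggregating at the Observation Point means the Collector receives one record per flow rather than one per packet.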
Traditional passive and active monitoring and measurement techniques are either inaccurate or resource-consuming. It is preferable to directly acquire data associated with a flow's packets when the packets pass through a network. In-situ OAM (IOAM), a data generation technique, embeds a new instruction header into user packets, and the instruction directs the network nodes to add the requested data to the packets. Thus, at the path end, the packet's experience gained on the entire forwarding path can be collected. Such firsthand data is invaluable to many network OAM applications.
However, IOAM also faces some challenges. Issues of performance impact, security, scalability and overhead limits, encapsulation difficulties in some protocols, and cross-domain deployment need to be addressed.
Postcard-Based Telemetry (PBT) is an alternative to IOAM. PBT directly exports data at each node through an independent packet. PBT solves several issues of IOAM. It can also help to identify the packet drop location in case a packet is dropped on its forwarding path.
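The contrast between carrying data in the packet (IOAM) and exporting it per node (PBT) can be sketched with a toy forwarding loop. The node names, the queue_depth field, and the forward function are hypothetical; real IOAM and PBT encodings are not modeled.

```python
def forward(path: list, drop_before: str = None):
    """Walk a packet along path and return (in-packet trace, exported postcards).

    path is a list of (node_id, queue_depth) tuples. If drop_before names a
    node, the packet is dropped just before reaching it.
    """
    trace = []      # IOAM style: data accumulated inside the packet
    postcards = []  # PBT style: data exported independently by each node
    for node_id, queue_depth in path:
        if node_id == drop_before:
            # The packet and its embedded trace are lost; postcards survive.
            return None, postcards
        record = {"node": node_id, "queue_depth": queue_depth}
        trace.append(record)
        postcards.append(record)
    return trace, postcards


path = [("A", 3), ("B", 7), ("C", 1)]
full_trace, full_postcards = forward(path)
lost_trace, partial_postcards = forward(path, drop_before="C")
last_seen = partial_postcards[-1]["node"]
```

When the packet is dropped, the in-packet trace never reaches the path end, while the postcards already exported show the last node that handled the packet.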
To ensure that the information provided by external event detectors and used by the network management solutions is meaningful for management purposes, the network telemetry framework must ensure that such detectors (sources) are easily connected to the management solutions (sinks). This requires specifying a simple taxonomy of detectors and matching it to the connectors and/or interfaces required to connect them.
Once detectors are classified in such a taxonomy, their definitions are enlarged with the qualities and other aspects used to handle them and represented in the ontology and information model (e.g., YANG). Therefore, differentiating several types of detectors as potential sources of external events is essential for the integrity of the management framework. We thus differentiate the following source types of external events:
Additional detector types can be added to the system, but they will generally be the result of composing the properties offered by these main classes. In any case, future revisions of the network telemetry framework will include the required types that cover new circumstances and that cannot be obtained by composition.
To allow external event detectors to be properly integrated with other management solutions, both elements must expose interfaces and protocols that suit their particular objective. Since external event detectors will focus on providing their information to their main consumers, which generally will not be limited to the network management solutions, the framework must include the definition of the connectors required to ensure that the interconnection between detectors (sources) and their consumers within the management systems (sinks) is effective.
In some situations, the interconnection between the external event detectors and the management system is via the management plane. For those situations there will be a special connector that provides the typical interfaces found in most other elements connected to the management plane. For instance, the interfaces will comply with a specific information model (YANG) and a specific telemetry protocol, such as NETCONF, SNMP, or gRPC.