IPPM                                                        H. Song, Ed.
Internet-Draft                                                 Futurewei
Intended status: Standards Track                                 T. Zhou
Expires: December 14, 2019                                         Z. Li
                                                                  Huawei
                                                                 J. Shin
                                                              SK Telecom
                                                                  K. Lee
                                                                   LG U+
                                                           June 12, 2019
Postcard-based On-Path Flow Data Telemetry
draft-song-ippm-postcard-based-telemetry-04
Postcard-Based Telemetry (PBT) allows network OAM applications to collect telemetry data about any user packet. Unlike the E2E and trace modes of in-situ OAM (IOAM), PBT does not require user packets to carry any telemetry data; instead, network nodes export the telemetry data directly to a collector through separate OAM packets called postcards. Two variations of PBT, PBT-I and PBT-M, are described. PBT-I requires inserting an instruction header into user packets to guide the data collection. PBT-I is designed as another mode of IOAM, Per-Hop Postcard (PHP), to complement the existing operational modes of IOAM. PBT-M only marks the user packets or configures a flow filter to invoke the data collection. PBT-M also complements IOAM and addresses several of its implementation and deployment challenges.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on December 14, 2019.
Copyright (c) 2019 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
In order to gain detailed data plane visibility to support effective network OAM, it is important to be able to examine the trace of user packets along their forwarding paths. Such on-path flow data reflect the state and status of each user packet's real-time experience and provide valuable information for network monitoring, measurement, and diagnosis.
The telemetry data include, but are not limited to, the detailed forwarding path, the timestamp/latency at each network node, and, in case of packet drop, the drop location and reason. The emerging programmable data plane devices allow user-defined data collection [I-D.song-opsawg-dnp4iq] or conditional data collection based on trigger events. Such on-path flow data come from and describe the live user traffic, and they complement the data acquired through other passive and active OAM mechanisms such as IPFIX and ICMP.
In-band Network Telemetry (INT) was designed to cater to this need (note that although the term INT is widely used, "in-band" here does not comply with the IETF's definition; "on-path" or "in-situ" may be more accurate terms). In-situ OAM (IOAM) represents the related standardization effort. In essence, INT augments user packets with instructions that tell each network node on their forwarding paths what data to collect. The requested data are inserted into and travel along with the user packets. Some end nodes are responsible for stripping off the data trace and exporting it to a data collector for processing.
While the concept is simple and straightforward, INT faces several technical challenges:
The above issues are inherent to INT-based solutions. Nevertheless, the on-path data acquired by INT are valuable for network operators. Therefore, alternative approaches that can collect the same data while avoiding or mitigating the above issues are desired. This document provides a new approach named Postcard-Based Telemetry (PBT) with two implementation variations, each having its own trade-offs and addressing some or all of the above issues. The basic idea of PBT is simple: at each node, instead of inserting the collected data into the user packets, the data are directly exported through dedicated OAM packets. This "postcard" approach is in contrast to the "passport stamps" approach adopted by INT [DOI_10.1145_2342441.2342453]. The OAM packets, or postcards, can be generated by the node's slow path and transported in band or out of band, independent of the original user packets.
This section describes the first variation of PBT. PBT-M aims to address all the challenges of INT listed above and introduce some new benefits. We first list all the design requirements of PBT-M.
In light of the above discussion, the sketch of the proposed solution, PBT-M, is as follows. A user packet whose path-associated data need to be collected is marked at the path head node. At each PBT-aware node, if the mark is detected, a postcard (i.e., the dedicated OAM packet triggered by a marked user packet) is generated and sent to a collector. The postcard contains the data requested by the management plane. The requested data are configured by the management plane through data set templates (as in IPFIX). Once the collector receives all the postcards for a single user packet, it can infer the packet's forwarding path and analyze the data set. The path end node is configured to unmark the packets, restoring them to their original format, if necessary.
The overall architecture of PBT-M is depicted in Figure 1.
                     +------------+        +-----------+
                     | Network    |        | Telemetry |
                     | Management |(-------| Data      |
                     |            |        | Collector |
                     +-----:------+        +-----------+
                           :                     ^
                           :configurations       |postcards (OAM pkts)
                           :                     |
            ...............:.....................|........
            :             :               :      |       :
            :   +---------:---+-----------:---+--+-------:---+
            :   |         :   |           :   |          :   |
            V   |         V   |           V   |          V   |
         +------+-+     +-----+--+     +------+-+     +------+-+
usr pkts | Head   |     | Path   |     | Path   |     | End    |
    ====>| Node   |====>| Node   |====>| Node   |====>| Node   |====>
         |        |     | A      |     | B      |     |        |
         +--------+     +--------+     +--------+     +--------+
        gen postcards  gen postcards  gen postcards  gen postcards
        mark usr pkts                               unmark usr pkts
Figure 1: Architecture of PBT-M
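The PBT-M workflow sketched above can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions: the class and function names (Packet, Postcard, Collector, forward) are hypothetical, the mark is modeled as a single boolean, and each postcard carries only the node ID and the TTL seen on arrival.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    flow_id: str
    ttl: int
    marked: bool = False  # hypothetical stand-in for the PBT-M mark bit


@dataclass
class Postcard:
    flow_id: str
    node_id: str
    ttl: int  # TTL observed when the packet arrived at the node


class Collector:
    """Hypothetical collector that simply stores the postcards it receives."""

    def __init__(self) -> None:
        self.postcards: List[Postcard] = []

    def receive(self, card: Postcard) -> None:
        self.postcards.append(card)


def forward(packet: Packet, path: List[str], collector: Collector,
            mark: bool) -> Packet:
    """Carry one packet along `path` (first entry: head node, last: end node)."""
    for index, node_id in enumerate(path):
        if index == 0 and mark:
            packet.marked = True   # head node marks the user packet
        if packet.marked:
            # every PBT-aware node exports a postcard for a marked packet
            collector.receive(Postcard(packet.flow_id, node_id, packet.ttl))
        if index == len(path) - 1:
            packet.marked = False  # end node restores the original format
        packet.ttl -= 1            # normal forwarding decrements the TTL
    return packet
```

In this sketch the user packet itself never grows: the only per-packet state is the mark, and all telemetry leaves the nodes as separate postcards.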
Although PBT-M solves the issues of INT, it introduces a few new challenges.
To address the above challenges, we propose several design details of PBT-M.
To trigger the path-associated data collection, a single bit from some header field is usually sufficient. When no such bit is available, other packet marking techniques are needed. We discuss three possible application scenarios.
By default, all PBT-aware nodes are configured to react to the marked packets by exporting some basic data such as node ID and TTL before a data set template for that flow is configured. This way, the management plane can learn the flow path dynamically.
If the management plane wants to collect the path-associated data for some flow, it configures the head node(s) with a probability or time interval for the flow packet marking. When the first marked packet is forwarded in the network, the PBT-aware nodes will export the basic data to the collector. Hence, the flow path is identified. If other types of data need to be collected, the management plane can further configure the data set template to the target nodes on the flow's path. The PBT-aware nodes would collect and export data accordingly if the packet is marked and a data set template is present.
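The head-node marking policy and the default-versus-template export behavior described above could look roughly as follows. This is a hedged sketch, not a specified mechanism: FlowMarker, export_fields, BASIC_FIELDS, and the dictionary-based node state are all hypothetical names chosen for illustration.

```python
import random
from typing import Dict, Optional, Tuple


class FlowMarker:
    """Decides, for one flow, whether the head node marks the next packet.

    Exactly one of `probability` or `interval_s` must be configured,
    mirroring the "probability or time interval" choice in the text.
    """

    def __init__(self, probability: Optional[float] = None,
                 interval_s: Optional[float] = None,
                 rng: Optional[random.Random] = None) -> None:
        if (probability is None) == (interval_s is None):
            raise ValueError("configure exactly one of probability or interval_s")
        self.probability = probability
        self.interval_s = interval_s
        self.rng = rng or random.Random()
        self.last_marked_at: Optional[float] = None

    def should_mark(self, now_s: float) -> bool:
        if self.probability is not None:
            return self.rng.random() < self.probability
        if (self.last_marked_at is None
                or now_s - self.last_marked_at >= self.interval_s):
            self.last_marked_at = now_s
            return True
        return False


# Assumed default export set: the "basic data such as node ID and TTL".
BASIC_FIELDS: Tuple[str, ...] = ("node_id", "ttl")


def export_fields(node_state: Dict[str, object],
                  template: Optional[Tuple[str, ...]]) -> Dict[str, object]:
    """Data a node exports for a marked packet: the configured data set
    template if one is present, otherwise the basic fields."""
    wanted = template if template is not None else BASIC_FIELDS
    return {name: node_state[name] for name in wanted}
```

With an interval of one second, for example, packets arriving at 0.0 s, 0.4 s, and 1.0 s would be marked, skipped, and marked respectively.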
If, for any reason, the flow path changes, the new path nodes can be learned immediately by the collector, so the management plane controller can be informed to configure the new path nodes. The outdated configuration can be automatically timed out or explicitly revoked by the management plane controller.
The collector needs to correlate all the postcards generated for a single user packet. Once this is done, the TTL (or the timestamp, if the network time is synchronized) can be used to infer the flow's forwarding path. The key issue is therefore how to correlate all the postcards that belong to the same user packet.
The first possible approach is to include the flow ID plus the user packet ID in the OAM packets. The flow ID can be the 5-tuple of the user traffic (source and destination IP addresses, source and destination ports, and protocol). The user packet ID can be some unique information pertaining to a user packet (e.g., the sequence number of a TCP segment).
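As an illustration of this first approach, the following sketch groups postcards by (flow ID, packet ID) and orders each group by descending TTL to recover the path. The Postcard layout and the correlate function are assumptions made for the example; the draft does not define them.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

# Assumed flow ID: (src IP, dst IP, src port, dst port, protocol)
FlowId = Tuple[str, str, int, int, int]


class Postcard(NamedTuple):
    flow_id: FlowId
    packet_id: int  # e.g., a TCP sequence number
    node_id: str
    ttl: int        # TTL of the user packet as seen at this node


def correlate(postcards: List[Postcard]) -> Dict[Tuple[FlowId, int], List[str]]:
    """Return, per user packet, the node IDs in forwarding order."""
    groups: Dict[Tuple[FlowId, int], List[Postcard]] = defaultdict(list)
    for card in postcards:
        groups[(card.flow_id, card.packet_id)].append(card)
    # The TTL decreases hop by hop, so descending TTL gives the path order.
    return {
        key: [c.node_id for c in sorted(cards, key=lambda c: c.ttl, reverse=True)]
        for key, cards in groups.items()
    }
```

Postcards may reach the collector in any order; the sort on TTL is what restores the hop sequence.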
If the packet marking interval is large enough, then the flow ID itself is enough to identify the user packet. That is, we can assume all the exported OAM packets for the same flow during a short period of time belong to the same user packet.
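A time-window variant of the correlation, which relies on the flow ID alone, might be sketched as below. The window length, the use of the collector's arrival time, and all names are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Postcard(NamedTuple):
    flow_id: str
    node_id: str
    arrival_s: float  # time at which the collector received the postcard


def group_by_window(postcards: List[Postcard],
                    window_s: float) -> Dict[str, List[List[Postcard]]]:
    """Group each flow's postcards into per-packet sets.

    A postcard joins the current group of its flow if it arrives within
    `window_s` of that group's first postcard; otherwise it starts a new
    group, which is assumed to belong to the next marked user packet.
    """
    groups: Dict[str, List[List[Postcard]]] = defaultdict(list)
    for card in sorted(postcards, key=lambda c: c.arrival_s):
        flow_groups = groups[card.flow_id]
        if flow_groups and card.arrival_s - flow_groups[-1][0].arrival_s <= window_s:
            flow_groups[-1].append(card)
        else:
            flow_groups.append([card])
    return dict(groups)
```

The window must be shorter than the marking interval and longer than the spread of postcard arrival times for one packet, otherwise groups merge or split incorrectly.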
Alternatively, if the network time is synchronized, the flow ID plus the timestamp at each node can also be used to infer which user packet a postcard belongs to. However, errors may occur under some circumstances. For example, if two consecutive user packets from the same flow are both marked but one postcard exported from a node is lost, it is difficult for the collector to decide which user packet the remaining postcard belongs to. In many cases, such rare errors have no catastrophic consequences and are therefore tolerable.
It is possible to avoid marking user packets while still allowing on-path flow data collection. We could simply configure an Access Control List (ACL) to select the set of target flows. This approach has two potential issues: (1) Since the packet forwarding path is unknown in advance, one needs to configure all the nodes in a network to filter the flows and capture the complete data set. This wastes precious ACL resources and is not scalable. (2) If a node cannot collect data for all the filtered packets of a flow, it needs to determine independently which packets to sample, so the collector may not receive the full set of postcards for the same user packet.
Nevertheless, since this approach does not require touching the user packets at all, it has unique merits: (1) users can freely choose any nodes as vantage points for data collection; (2) there is no need to worry about "modified" user packets leaking out of the PBT domain; (3) it has minimal impact on the forwarding of user traffic.
No data plane standard is required to support this mode, except the postcard format.
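For the filter-based mode, a node-local flow filter can be pictured as an ACL-like entry over the 5-tuple. The sketch below is illustrative only: FiveTuple and FlowFilter are hypothetical types, and real ACLs would typically also support prefixes and port ranges.

```python
from typing import NamedTuple, Optional


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


class FlowFilter(NamedTuple):
    """One ACL-like entry; a field left as None matches any value."""
    src_ip: Optional[str] = None
    dst_ip: Optional[str] = None
    src_port: Optional[int] = None
    dst_port: Optional[int] = None
    protocol: Optional[int] = None

    def matches(self, pkt: FiveTuple) -> bool:
        # Compare field by field; the user packet itself is never modified.
        return all(want is None or want == got for want, got in zip(self, pkt))
```

A node that installs such an entry would generate a postcard for each matching packet (or for a sample of them) without any mark in the packet.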
Since PBT-M has some challenges as listed in Section 2.3, this section describes another variation of PBT, which essentially compromises some of the design requirements listed in Section 2.1, yet retains most of the benefits of PBT.
PBT-I can be seen as a trade-off between INT and PBT-M. PBT-I needs to add a fixed-length instruction header to user packets for OAM data collection. However, the collected data are exported through dedicated postcards. On the one hand, PBT-I violates Req. 1 in Section 2.1 and makes it harder to meet Req. 2. On the other hand, the overhead of the instruction header is fixed, so user packets do not inflate with path length or telemetry data quantity. We also introduce an optimization to mitigate the impact on Req. 2. In return, PBT-I addresses all the challenges of PBT-M:
The sketch of the proposed solution, PBT-I, is as follows. If the path-associated data need to be collected for a user packet, a PHP header is inserted into the packet at the path head node. At each PBT-aware node, if a PHP header is detected, a postcard is generated and sent to a collector. Once the collector receives all the postcards for a single user packet, it can combine and analyze the data set. The path end node is configured to remove the PHP header.
The overall architecture of PBT-I is depicted in Figure 2. Note that the figure omits the controller, which configures the nodes with the necessary functions (e.g., head node encapsulation) and information (e.g., the IP address of the data collector).
                             +-----------+
                             | Telemetry |
                             | Data      |
                             | Collector |
                             +-----------+
                                   ^
                                   |postcards (OAM pkts)
                                   |
                                   |
                                   |
             +--------------+------+-------+--------------+
             |              |              |              |
             |              |              |              |
         +---+----+     +---+----+     +---+----+     +---+----+
usr pkts | Head   |     | Path   |     | Path   |     | End    |
    ====>| Node   |====>| Node   |====>| Node   |====>| Node   |====>
         |        |     | A      |     | B      |     |        |
         +--------+     +--------+     +--------+     +--------+
       insert PHP Hdr                               remove PHP Hdr
        gen postcards  gen postcards  gen postcards  gen postcards
Figure 2: Architecture of PBT-I
The proposed format of the PHP header is shown in Figure 3.
 0             0 0             1 1             2 2             3
 0             7 8             5 6             3 4             1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Namespace ID          |     Flags     |   Hop Count   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                IOAM-Trace-Type                |   Reserved    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Flow ID (optional)                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  Sequence Number (Optional)                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 3: PHP Header Format
Both the Flow ID and Sequence Number fields are optional. These two fields are either both present or both absent. Therefore, the PHP header length is either 8 bytes or 16 bytes, which is indicated by the upper-layer encapsulation header. Making these two fields optional caters to the need to minimize the header overhead when postcard correlation can still be achieved without the help of the Flow ID and Sequence Number.
Postcards can use the same data export format as that used by IOAM. [I-D.spiegel-ippm-ioam-rawexport] proposes a raw format that can be interpreted by IPFIX.
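The 8-byte and 16-byte header variants can be illustrated with a small encoder and decoder. This is a sketch under loudly stated assumptions: the field widths (Namespace ID 16 bits, Flags 8 bits, Hop Count 8 bits, IOAM-Trace-Type 24 bits, Reserved 8 bits, Flow ID and Sequence Number 32 bits each) are inferred from the byte-boundary ruler of the figure and the stated header lengths, not taken from a normative field description, and the names PhpHeader, pack_php, and unpack_php are hypothetical.

```python
import struct
from typing import NamedTuple, Optional


class PhpHeader(NamedTuple):
    namespace_id: int               # assumed 16 bits
    flags: int                      # assumed 8 bits
    hop_count: int                  # assumed 8 bits
    trace_type: int                 # assumed 24 bits, followed by 8 reserved bits
    flow_id: Optional[int] = None   # assumed 32 bits when present
    sequence: Optional[int] = None  # assumed 32 bits when present


def pack_php(h: PhpHeader) -> bytes:
    """Encode the header in network byte order: 8 bytes, or 16 with the
    optional fields."""
    if (h.flow_id is None) != (h.sequence is None):
        raise ValueError("Flow ID and Sequence Number must be both present or both absent")
    if not 0 <= h.trace_type < (1 << 24):
        raise ValueError("trace_type does not fit in 24 bits")
    # The low 8 bits of the second word are the Reserved field, sent as zero.
    fixed = struct.pack("!HBBI", h.namespace_id, h.flags, h.hop_count,
                        h.trace_type << 8)
    if h.flow_id is None:
        return fixed
    return fixed + struct.pack("!II", h.flow_id, h.sequence)


def unpack_php(data: bytes) -> PhpHeader:
    """Decode a header whose length (8 or 16 bytes) is already known from
    the upper-layer encapsulation."""
    if len(data) not in (8, 16):
        raise ValueError("PHP header must be 8 or 16 bytes long")
    namespace_id, flags, hop_count, word = struct.unpack("!HBBI", data[:8])
    if len(data) == 8:
        return PhpHeader(namespace_id, flags, hop_count, word >> 8)
    flow_id, sequence = struct.unpack("!II", data[8:])
    return PhpHeader(namespace_id, flags, hop_count, word >> 8, flow_id, sequence)
```

Because the header itself carries no length field in this sketch, the decoder depends entirely on the caller passing exactly the header bytes.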
Several security issues need to be considered.
An IOAM Mode ID for PHP needs to be assigned.
TBD.
We thank Barak Gafni, Mickey Spiegel, Frank Brockners, Zhenqiang Li, and others for reviewing and commenting on earlier versions of this draft.
[DOI_10.1145_2342441.2342453] Handigol, N., Heller, B., Jeyakumar, V., Mazières, D. and N. McKeown, "Where is the debugger for my software-defined network?", Proceedings of the first workshop on Hot topics in software defined networks - HotSDN '12, DOI 10.1145/2342441.2342453, 2012.
[I-D.brockners-inband-oam-requirements] Brockners, F., Bhandari, S., Dara, S., Pignataro, C., Gredler, H., Leddy, J., Youell, S., Mozes, D., Mizrahi, T., Lapukhov, P. and R. Chang, "Requirements for In-situ OAM", Internet-Draft draft-brockners-inband-oam-requirements-03, March 2017.
[I-D.brockners-inband-oam-transport] Brockners, F., Bhandari, S., Govindan, V., Pignataro, C., Gredler, H., Leddy, J., Youell, S., Mizrahi, T., Mozes, D., Lapukhov, P. and R. Chang, "Encapsulations for In-situ OAM Data", Internet-Draft draft-brockners-inband-oam-transport-05, July 2017.
[I-D.brockners-ippm-ioam-geneve] Brockners, F., Bhandari, S., Govindan, V., Pignataro, C., Gredler, H., Leddy, J., Youell, S., Mizrahi, T., Mozes, D., Lapukhov, P. and R. Chang, "Geneve encapsulation for In-situ OAM Data", Internet-Draft draft-brockners-ippm-ioam-geneve-01, June 2018.
[I-D.bryant-mpls-synonymous-flow-labels] Bryant, S., Swallow, G., Sivabalan, S., Mirsky, G., Chen, M. and Z. Li, "RFC6374 Synonymous Flow Labels", Internet-Draft draft-bryant-mpls-synonymous-flow-labels-01, July 2015.
[I-D.clemm-netconf-push-smart-filters-ps] Clemm, A., Voit, E., Liu, X., Bryskin, I., Zhou, T., Zheng, G. and H. Birkholz, "Smart filters for Push Updates - Problem Statement", Internet-Draft draft-clemm-netconf-push-smart-filters-ps-00, October 2017.
[I-D.ietf-ippm-alt-mark] Fioccola, G., Capello, A., Cociglio, M., Castaldelli, L., Chen, M., Zheng, L., Mirsky, G. and T. Mizrahi, "Alternate Marking method for passive and hybrid performance monitoring", Internet-Draft draft-ietf-ippm-alt-mark-14, December 2017.
[I-D.ietf-ippm-ioam-data] Brockners, F., Bhandari, S., Pignataro, C., Gredler, H., Leddy, J., Youell, S., Mizrahi, T., Mozes, D., Lapukhov, P., Chang, R. and D. Bernier, "Data Fields for In-situ OAM", Internet-Draft draft-ietf-ippm-ioam-data-00, September 2017.
[I-D.ietf-netconf-udp-pub-channel] Zheng, G., Zhou, T. and A. Clemm, "UDP based Publication Channel for Streaming Telemetry", Internet-Draft draft-ietf-netconf-udp-pub-channel-01, November 2017.
[I-D.ietf-netconf-yang-push] Clemm, A., Voit, E., Prieto, A., Tripathy, A., Nilsen-Nygaard, E., Bierman, A. and B. Lengyel, "YANG Datastore Subscription", Internet-Draft draft-ietf-netconf-yang-push-12, December 2017.
[I-D.ietf-sfc-ioam-nsh] Brockners, F., Bhandari, S., Govindan, V., Pignataro, C., Gredler, H., Leddy, J., Youell, S., Mizrahi, T., Mozes, D., Lapukhov, P. and R. Chang, "NSH Encapsulation for In-situ OAM Data", Internet-Draft draft-ietf-sfc-ioam-nsh-00, May 2018.
[I-D.ietf-sfc-nsh] Quinn, P., Elzur, U. and C. Pignataro, "Network Service Header (NSH)", Internet-Draft draft-ietf-sfc-nsh-28, November 2017.
[I-D.sambo-netmod-yang-fsm] Sambo, N., Castoldi, P., Fioccola, G., Cugini, F., Song, H. and T. Zhou, "YANG model for finite state machine", Internet-Draft draft-sambo-netmod-yang-fsm-00, October 2017.
[I-D.song-ippm-ioam-data-extension] Song, H. and T. Zhou, "In-situ OAM Data Type Extension", Internet-Draft draft-song-ippm-ioam-data-extension-00, October 2017.
[I-D.song-ippm-ioam-tunnel-mode] Song, H., Li, Z., Zhou, T. and Z. Wang, "In-situ OAM Processing in Tunnels", Internet-Draft draft-song-ippm-ioam-tunnel-mode-00, June 2018.
[I-D.song-mpls-extension-header] Song, H., Li, Z., Zhou, T. and L. Andersson, "MPLS Extension Header", Internet-Draft draft-song-mpls-extension-header-01, August 2018.
[I-D.song-opsawg-dnp4iq] Song, H. and J. Gong, "Requirements for Interactive Query with Dynamic Network Probes", Internet-Draft draft-song-opsawg-dnp4iq-01, June 2017.
[I-D.spiegel-ippm-ioam-rawexport] Spiegel, M., Brockners, F., Bhandari, S. and R. Sivakolundu, "In-situ OAM raw data export with IPFIX", Internet-Draft draft-spiegel-ippm-ioam-rawexport-01, October 2018.
[I-D.talwar-rtgwg-grpc-use-cases] Specification, g., Kolhe, J., Shaikh, A. and J. George, "Use cases for gRPC in network management", Internet-Draft draft-talwar-rtgwg-grpc-use-cases-01, January 2017.
[I-D.weis-ippm-ioam-gre] Weis, B., Brockners, F., Hill, C., Bhandari, S., Govindan, V., Pignataro, C., Gredler, H., Leddy, J., Youell, S., Mizrahi, T., Kfir, A., Gafni, B., Lapukhov, P. and M. Spiegel, "GRE Encapsulation for In-situ OAM Data", Internet-Draft draft-weis-ippm-ioam-gre-00, March 2018.
[RFC2925] White, K., "Definitions of Managed Objects for Remote Ping, Traceroute, and Lookup Operations", RFC 2925, DOI 10.17487/RFC2925, September 2000.
[RFC6241] Enns, R., Bjorklund, M., Schoenwaelder, J. and A. Bierman, "Network Configuration Protocol (NETCONF)", RFC 6241, DOI 10.17487/RFC6241, June 2011.
[RFC7011] Claise, B., Trammell, B. and P. Aitken, "Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of Flow Information", STD 77, RFC 7011, DOI 10.17487/RFC7011, September 2013.