Internet-Draft Tele Methods Analog Measurement March 2024
Janz & King Expires 4 September 2024 [Page]
Workgroup:
Internet Research Task Force
Internet-Draft:
draft-janzking-nmrg-telemetry-instrumentation-00
Published:
Intended Status:
Informational
Expires:
Authors:
C. Janz
Huawei Canada
D. King
Lancaster University

Telemetry Methodologies for Analog Measurement Instrumentation

Abstract

Evolution toward network operations automation requires systems encompassing software-based analytics and decision-making. Network-based instrumentation provides crucial data for these components and processes. However, the proliferation of such instrumentation, and the need to migrate the data it generates from the physical network to "off-the-network" software, pose challenges. In particular, analog measurement instrumentation, which generates time-continuous real-number data, may generate significant data volumes.

Methodologies for handling analog measurement instrumentation data will need to be identified and discussed, informed in part by consideration of requirements for the operation of network digital twins, which may be important software-realm consumers of such data.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 4 September 2024.

1. Introduction

Existing studies for network telemetry typically deal with packet-oriented measurements for generating packet traffic, path, discard, latency and other data [RFC7799], [OPSAWG-IFIT-FRAMEWORK]. However, some networking equipment and network operations scenarios feature or use more physically-oriented measurement instrumentation that generates data of a different character. Here, the particularities of data generated by such "analog" instrumentation are examined, and telemetry methodologies suitable for such data are considered. This consideration is informed by the requirements of specific use cases, including network digital twins.

Optical networks, which are increasingly rich in analog instrumentation, are used as a specific example here. But the telemetry methodologies discussed may apply to instrumentation and telemetry intersecting a wide variety of networks and their related operational software, for example, in support of digital twins that provide modeling of radio-based transmission, thermal characteristics or energy consumption.

This document presents telemetry methodologies tailored for analog measurement instruments, aiming to enhance data accuracy, transmission efficiency, and real-time monitoring capabilities for network digital twins. The findings underscore the potential of these methodologies to form best practice for telemetry supporting network digital twins that rely on analog measurement instruments. The document provides a state-of-the-art summary, including gaps and possible areas for further research.

2. Terminology

Resource:
Any feature, including connectivity, buffers, compute, storage, and content delivery that forms part of or can be accessed through a network. Resources may be shared between users, applications, and clients, or they may be dedicated for use by a unique customer.
Infrastructure Resources:
The hardware and software for hosting and connecting SFs. These resources may include computing hardware, storage capacity, network resources (e.g., links and switching/routing devices enabling network connectivity), and physical assets for radio access.

3. Background

Photonic networks, which transmit data through light signals via fiber optic cables, are fundamental to telecommunications, internet services, data center operations, and many other critical aspects of modern digital infrastructure. A range of measurement instruments are routinely used in the deployment and maintenance of these networks. Key examples include:

The concept of network slicing is a key capability to serve a customer with a wide variety of different service needs expressed as SLOs/SLEs in terms of, e.g., latency, reliability, capacity, and service function-specific capabilities.

This section outlines the key capabilities required to realize network slicing in a TE-enabled IETF technology network.

These instruments play a critical role in the characterization, deployment, optimization, and troubleshooting of optical networks. But their use tends to be restricted to specific operational phases, requires manual operation, and is generally not compatible with application to operating facilities. The term "instrumentation" refers more properly to "embedded" capability that is both operable on active infrastructure and capable of continuous measurement operation. Such instrumentation is a necessary foundation for telemetry.

4. Optical Network Measurement Instrumentation

Optical network instrumentation has typically focused on detecting transmission performance degradation, through measurement of error correction rates in FEC engines, counting of errored OTN frames, etc. Such measurements are typically executed on network elements through time-interval-based counting. The resulting counts may be forwarded to or collected by software on a subscription or polling basis. The data consists of series of integer numbers, or series of time stamp-integer number couplets.

In recent years, however, the nature and scope of optical network instrumentation have broadened and deepened [JIANG]. The idea has been to instrument the optical network more richly to support more effective operations management, including through software-based analytics and modeling. Implicated network operations include network and connection planning and configuration, network and connection fault management (fault and impairment detection, classification, localization, preemption, correction), and others.

The optical network is a high-performance analog transmission network, so, unsurprisingly, much of this new instrumentation is analog; that is, it produces time-continuous real-number data or data sets. Examples include optical loss, optical power (total, channel peak, etc.), optical spectra (narrow-band-filtered power measured at a series of center wavelengths), differential group delay (DGD), polarization mode dispersion (PMD), polarization dependent loss (PDL), Stokes vector components reflecting state of polarization (SOP), linear optical signal-to-noise ratio (OSNR) and generalized optical signal-to-noise ratio (GSNR). Many of these measurements are synthesized by coherent receivers across the network, while some may be synthesized by in-span elements such as amplifiers and ROADMs.

5. Telemetry Use Cases

One application of this data in the software realm is with optical network digital twins (NDTs), used for transmission performance modeling [JANZ], [NMRG-PODTS]. Such NDTs constitute an important class of analytical engine supporting optical network and service planning and other operations, and they rely heavily on data from network instrumentation to enable accurate modeling of optical transmission performance on targeted variations of the actual network and service configuration, state and condition. A default expectation would be that all instrumentation measurements are reflected continuously in the software realm for use by optical NDTs. However, at best only an approximation to this can be achieved (e.g., only a series of sampled measurements may in fact be streamed from the network), so the imperative is to find efficient ways to support sufficiently accurate approximations. This imperative grows more compelling as the scale of the network and the richness of embedded instrumentation increase.

A second example application lies in the fault management domain, wherein analysis of rich data, concentrated around the time of a detected evolution in transmission conditions, may be used to classify and localize the origin of the observed evolution [HAHN]. Transient evolutions of transmission performance are commonplace on optical networks and have myriad causes, including extrinsic causes such as lightning strikes, earthworks and construction, weather, road and rail traffic, fires, etc., as well as intrinsic causes including continuous or discrete deteriorations to equipment or fibre plant. Detection, classification, and localization of transmission performance evolutions permit assessment of the likelihood, expected severity, and rate of further deterioration, and planning of timely and cost-effective corrective interventions where indicated. However, successful analysis may depend on the availability in software of richer data sets than may be supported by continuous streaming or required by other applications.

6. Analog Measurement Requirements

[RFC9232] provides a framework for considering concepts, constructs and developments in network telemetry. Many of the methods and mechanisms it discusses or suggests are invoked here.

6.1. Sampling

An analog-to-digital conversion process typically converts analog signals into digital data that can be transmitted, stored, and processed more efficiently. This often involves sampling the signal at a certain rate and quantizing the amplitude into digital values. The "mirroring" (transmission for replication at a different place) of continuous-time real-number data, generated by in-network instrumentation, begins with sampling and representing measured values by a scalar or vector of finite-decimal-place numbers. Neither sampling at fixed intervals, nor fixed time alignment or offset among measurement points in the network or between such points and the off-network software realm, can generally be assumed. It is therefore useful for instrumentation to generate, as primary data, a series of couplets or vectors consisting of sample time stamps and corresponding measured data values.
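As a minimal sketch of the primary data described above, the following Python fragment produces time stamp-value couplets without assuming fixed sampling intervals. The `Sample` structure, the `read_instrument` callable, and the use of decimal rounding as the quantizer are illustrative assumptions, not something this document specifies.

```python
import time
from typing import Callable, List, NamedTuple


class Sample(NamedTuple):
    """One time stamp-value couplet (hypothetical structure)."""
    timestamp: float  # seconds, taken from the instrument's own clock
    value: float      # quantized measured value, e.g., optical power in dBm


def quantize(value: float, decimals: int) -> float:
    """Represent a real-valued measurement with a finite number of decimal places."""
    return round(value, decimals)


def take_sample(read_instrument: Callable[[], float],
                decimals: int = 2,
                clock: Callable[[], float] = time.time) -> Sample:
    """Read the instrument once and stamp the quantized value with the local time."""
    return Sample(timestamp=clock(), value=quantize(read_instrument(), decimals))


# Usage: samples need not be evenly spaced, because each carries its own time stamp.
readings = iter([-3.14159, -3.20001])
ticks = iter([100.0, 100.7])
series: List[Sample] = [
    take_sample(lambda: next(readings), clock=lambda: next(ticks)) for _ in range(2)
]
```

Carrying the time stamp with every value is what allows the receiving software to tolerate irregular sampling and unaligned clocks among measurement points.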

6.2. Time Precision

Inadequate sampling frequency and quantization error are both potential sources of error in the (literal or effective) "reconstruction" of the original time-continuous measurement in the software realm. It is possible that sampling frequencies might be varied in response to evolving temporal characteristics of measured parameters; this is one strategy for data reduction (and one reason why sampling may not occur at fixed-period intervals).
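One way to vary the sampling frequency with the temporal behavior of a measured parameter is sketched below. The rule used here (choose the next interval so that the value is expected to move by about `target_step` between samples) and all parameter names and defaults are assumptions made for illustration only.

```python
def next_interval(prev_value: float, curr_value: float, elapsed: float,
                  target_step: float = 0.1,
                  min_interval: float = 0.01,
                  max_interval: float = 10.0) -> float:
    """Pick the next sampling interval, in seconds, from the last two samples.

    The interval shrinks when the value changes quickly and grows when it is
    quiescent, always staying within [min_interval, max_interval].
    """
    if elapsed <= 0.0:
        raise ValueError("elapsed time between samples must be positive")
    rate = abs(curr_value - prev_value) / elapsed  # observed change per second
    if rate == 0.0:
        return max_interval                         # no change: sample slowly
    return min(max_interval, max(min_interval, target_step / rate))
```

A slowly drifting optical power reading would then be sampled near `max_interval`, while a fast transient would drive the interval toward `min_interval`.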

Requirements on the precision of reconstructed data, its time basis, and the alignment in time of different reconstructed measurements are determined by the operational role played by the analytical functions that consume the data. Some operations of interest, such as network and service planning or fault and impairment management, may impose only relatively relaxed requirements on time synchronization among measurement instruments, and between those instruments and the software realm. Other applications, e.g., those concerning operations tending toward closed-loop control, may require tighter temporal data alignment among different measurement sources. These considerations have implications for the source and synchronization of the clocks producing time stamps. In general, however, requirements on clock synchronization and precision are far from those required for bit-level operations; i.e., they are generally more like "network time" than "digital time".

Similarly, requirements on the absolute or relative (i.e. among different measurement instruments) precision of reconstructed measured data values may be application-dependent. In many cases, relative precision, or precision consistency, may be more important than absolute precision.

6.3. Reduction and Other Pre-Processing

With telemetric data volume a primary potential challenge, methods for reducing data volume associated with analog measurement instrumentation are of evident interest. Signals may also be filtered to remove noise and unwanted frequencies to improve the data quality.

6.4. Compression

Data compression is an obvious candidate methodology for bandwidth reduction. Methods for lossless compression of series of numerical data have been widely studied, e.g. [RATANAWORABHAN].

Obviously, such compression must be implemented as a "pre-processing" function executed by the telemetric instrumentation itself, or some proxy to it. Similarly, decompression must be implemented as a "post-processing" function within the software realm. Where time stamps are uncompressed, depending on the compression methodology employed, it may be possible to support selective decompression of data, e.g., only on selected time intervals. This might allow for application-driven "as-required" post-processing (decompression) of more limited volumes of telemetric data.

The compressibility of time-based data depends on its evolution in data-entropic terms, resulting in streamed data flows of varying volume or rate. The effective transmission and reception rates of data samples thus may vary and differ at any point from the rate of data generation. This is another reason why data samples may require time stamps.
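The idea of keeping time stamps uncompressed so that only selected intervals need be decompressed can be sketched as follows. This sketch uses Python's general-purpose `zlib` (DEFLATE) as a stand-in lossless compressor; it is not the floating-point-specific method of [RATANAWORABHAN], and the block layout and function names are assumptions for illustration.

```python
import struct
import zlib
from typing import List, Tuple

Sample = Tuple[float, float]        # (time stamp, value)
Block = Tuple[List[float], bytes]   # (uncompressed time stamps, compressed values)


def compress_block(samples: List[Sample]) -> Block:
    """Pre-processing: compress the values of one block, leave time stamps as-is."""
    stamps = [t for t, _ in samples]
    values = [v for _, v in samples]
    raw = struct.pack(f"<{len(values)}d", *values)  # IEEE 754 doubles, bit-exact
    return stamps, zlib.compress(raw)


def decompress_block(block: Block) -> List[Sample]:
    """Post-processing: recover the exact couplets of one block."""
    stamps, payload = block
    values = struct.unpack(f"<{len(stamps)}d", zlib.decompress(payload))
    return list(zip(stamps, values))


def samples_between(blocks: List[Block], start: float, end: float) -> List[Sample]:
    """Decompress only those blocks whose time span overlaps [start, end]."""
    out: List[Sample] = []
    for block in blocks:
        stamps = block[0]
        if stamps and stamps[0] <= end and stamps[-1] >= start:
            out.extend(s for s in decompress_block(block) if start <= s[0] <= end)
    return out
```

Because the compressed size of each block depends on the entropy of its values, equal-duration blocks generally produce payloads of different sizes, which is the rate variation noted above.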

Other forms of effective data reduction through pre-processing may also be useful, or preferred:

  • Thresholding: Data samples are transmitted only if and when a measured value, or a derivative of the measured value, crosses a threshold. Possible examples include: a) exceeding some absolute or proportional variation from the last transmitted sample value; b) exceeding a previously observed and transmitted maximum or minimum value; or, c) exceeding some time rate-of-change of the measured value.

Post-processing of threshold-driven data may or may not be required by applications. For example, an application may generate a scenario for behavioral analysis by an NDT that requires the "current" data from network instrumentation. To whatever precision is effectively reflected in the details of the operating thresholding mechanisms, that data is simply the most recently transmitted sample from network measurement instruments. Another application, however, perhaps one dealing with fault or impairment management, might require a regular and continuous time series presentation of measured data. In that case, e.g. interpolation or other post-processing of received data samples might be needed.
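A minimal sketch of thresholding and of the two post-processing cases just described is given below. It implements only variant a) with an absolute variation; the function names are hypothetical, and linear interpolation is just one possible reconstruction method.

```python
from typing import Iterable, Iterator, List, Tuple

Sample = Tuple[float, float]  # (time stamp, value)


def deadband(samples: Iterable[Sample], max_delta: float) -> Iterator[Sample]:
    """Pre-processing: forward a sample only when its value differs from the
    last transmitted value by more than `max_delta`."""
    last_sent = None
    for stamp, value in samples:
        if last_sent is None or abs(value - last_sent) > max_delta:
            last_sent = value
            yield stamp, value


def current_value(received: List[Sample]) -> float:
    """'Current' data is simply the most recently transmitted sample."""
    return received[-1][1]


def interpolate(received: List[Sample], at: float) -> float:
    """Post-processing: linearly interpolate between received samples.

    Assumes strictly increasing time stamps.
    """
    for (t0, v0), (t1, v1) in zip(received, received[1:]):
        if t0 <= at <= t1:
            return v0 + (v1 - v0) * (at - t0) / (t1 - t0)
    raise ValueError("time outside the received range")


measured = [(0.0, 10.0), (1.0, 10.05), (2.0, 10.5), (3.0, 10.52), (4.0, 11.5)]
sent = list(deadband(measured, max_delta=0.25))  # three of five samples are sent
```

In this example the interpolated value at time 3.0 is 11.0 while the measured value was 10.52, which illustrates that the precision of any reconstruction is bounded by the thresholding mechanism in operation.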

Other kinds of pre-processing may also be of interest, including normalization of data, frequency domain conversion, and computation of statistics.
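Two of these pre-processing types can be sketched with the Python standard library alone. The choice of statistics and of min-max normalization is an assumption for illustration; frequency domain conversion is omitted from the sketch.

```python
import statistics
from typing import Dict, List


def summarize(values: List[float]) -> Dict[str, float]:
    """Reduce a window of samples to a few summary statistics."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.pstdev(values),  # population standard deviation
    }


def normalize(values: List[float]) -> List[float]:
    """Rescale a window of samples to the range [0, 1] (min-max normalization)."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # constant window: no spread to rescale
    return [(v - low) / (high - low) for v in values]
```

Transmitting one such summary per window in place of every sample trades temporal detail for a fixed, small data volume.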

  • Triggering: An extension or variation of thresholding, triggering may refer to, e.g. the transmission of a series of samples - from a defined set of measurement instruments, over a defined period of time and at defined time intervals - on crossing of a particular threshold (i.e., that threshold crossing "triggers" the transmission of the defined data series). Triggering of this kind may be useful in e.g. fault and impairment management. The detection by instrumentation of some pre-defined circumstance or occurrence - e.g. observation of an unusually large or rapid change in an optical power level or channel SOP - would trigger the transmission of a pre-defined, "rich" set of data covering a time interval around the triggering observation. That data could then be subjected to various forms of "forensic" analysis in software to support detection, classification or localization of transmission performance-impacting events. Required pre-processing includes processing of triggers, and the sliding storage of instrumentation data sample values sufficient to cover the targeted data capture time "window" as well as trigger processing and transmission intervals.
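The sliding storage and trigger processing described for triggering might look like the following sketch. The class name, the step-change trigger condition, and the sample-count window limits are hypothetical choices made for illustration.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

Sample = Tuple[float, float]  # (time stamp, value)


class TriggeredCapture:
    """Keep a sliding window of samples and emit a batch around a trigger."""

    def __init__(self, threshold: float, pre: int, post: int) -> None:
        self.threshold = threshold   # step between samples that fires the trigger
        self.post = post             # samples to capture after the trigger
        # Sliding storage: `pre` earlier samples plus the triggering one.
        self.history: Deque[Sample] = deque(maxlen=pre + 1)
        self.batch: Optional[List[Sample]] = None
        self.remaining = 0

    def push(self, sample: Sample) -> Optional[List[Sample]]:
        """Feed one sample; return the complete batch when the window closes."""
        previous = self.history[-1] if self.history else None
        self.history.append(sample)
        if self.batch is not None:                 # a capture is in progress
            self.batch.append(sample)
            self.remaining -= 1
        elif previous is not None and abs(sample[1] - previous[1]) > self.threshold:
            self.batch = list(self.history)        # pre-trigger samples + trigger
            self.remaining = self.post
        if self.batch is not None and self.remaining == 0:
            done, self.batch = self.batch, None
            return done
        return None


capture = TriggeredCapture(threshold=1.0, pre=2, post=1)
stream = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.2), (3.0, 5.0), (4.0, 5.1), (5.0, 5.2)]
batches = [b for b in (capture.push(s) for s in stream) if b is not None]
```

Here the jump at time 3.0 fires the trigger, and a single batch covering times 1.0 through 4.0 is produced for transmission.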

6.5. Programmable Streaming

As discussed in [RFC9232], in-network pre-processing of telemetry data may usefully be "programmed" by telemetry clients (i.e., software applications that are consumers of instrumentation data), including dynamically or variably. The range and nature of software applications and their data requirements may vary among systems, may evolve with time within any given system - based on experience and learning (automated or not) or with the deployment of new capabilities - and may also vary as a function of available instrumentation capabilities on a given network, which themselves may evolve.
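As an illustration of client-programmable pre-processing, the sketch below models a per-stream "program" that a telemetry client could replace at run time. The `PreProcessing` structure, its field names, and its three modes are hypothetical and are not drawn from [RFC9232].

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class PreProcessing:
    """Hypothetical client-programmable settings for one instrument stream."""
    mode: str = "periodic"             # "periodic", "threshold" or "trigger"
    interval_s: float = 1.0            # sampling interval for periodic mode
    threshold: Optional[float] = None  # deadband or trigger level, if used

    def validate(self) -> None:
        if self.mode not in ("periodic", "threshold", "trigger"):
            raise ValueError(f"unknown mode: {self.mode}")
        if self.mode != "periodic" and self.threshold is None:
            raise ValueError(f"{self.mode} mode needs a threshold")


def reprogram(current: PreProcessing, **changes) -> PreProcessing:
    """Return an updated program; the running one is untouched if validation fails."""
    updated = replace(current, **changes)
    updated.validate()
    return updated


running = PreProcessing()
running = reprogram(running, mode="threshold", threshold=0.25)
```

Validating a new program before it replaces the running one means a malformed client request cannot leave an instrument stream in an undefined state.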

6.6. Streaming versus Polling

Streaming - i.e., subscription-based push - is, as identified in [RFC9232] and other works, and as suggested by the discussion above, expected to be the principal, if not exclusive, operational modality for telemetry, including analog instrumentation telemetry. Software clients consume data generated by the network, and having identified which data they require and from where within the network, use subscriptions to place themselves in a position to receive it, on an ongoing basis, without continuing operational steps.

Triggered transmission of "batched" data is aligned with a streaming paradigm, as the telemetry server (i.e., instrumentation) must detect the trigger conditions and react by capturing and transmitting data to subscribing clients.
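The subscription-based push pattern can be sketched with an in-process stand-in for the telemetry server. The class and method names are hypothetical, and the network transport and subscription protocol are deliberately left out of scope.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple

Sample = Tuple[float, float]                # (time stamp, value)
Handler = Callable[[str, Sample], None]     # client callback


class TelemetryServer:
    """Toy stand-in for an instrumentation-side telemetry server."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, instrument: str, handler: Handler) -> None:
        """Register a client once; no further client steps are needed."""
        self._subscribers[instrument].append(handler)

    def unsubscribe(self, instrument: str, handler: Handler) -> None:
        self._subscribers[instrument].remove(handler)

    def publish(self, instrument: str, sample: Sample) -> int:
        """Push a sample to every subscriber; return how many received it."""
        handlers = list(self._subscribers.get(instrument, []))
        for handler in handlers:
            handler(instrument, sample)
        return len(handlers)


received: List[Tuple[str, Sample]] = []


def on_sample(name: str, sample: Sample) -> None:
    received.append((name, sample))


server = TelemetryServer()
server.subscribe("amp-1/output-power", on_sample)   # hypothetical instrument name
delivered = server.publish("amp-1/output-power", (100.0, -3.14))
```

The same `publish` path would serve both continuously streamed samples and triggered batches, since in each case it is the server side that decides when data is sent.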

It is worth considering, however, whether polling can or should be completely dispensed with, or whether it might retain some utility in some cases or circumstances.

The discussion so far supports a view that the data needs of NDTs can be satisfied by, and in fact are probably best served by, streaming. However, polling could be used if NDT-based analyses are required relatively infrequently, do not require very rapid execution, and do not draw arbitrarily on historical data. Polling might also be useful as a complementary mechanism to streaming. For example, to reduce data transmission and handling volumes, an NDT might choose to unsubscribe from telemetry that it has observed to change little with time. However, for particularly critical analyses, the NDT might want to ensure that all available telemetry data is up-to-date, by polling the unsubscribed instrumentation. Further, if certain kinds of data compression are used, decompression processes can enter into errored regimes, e.g., through transmission loss of telemetry data. Periodic polling may be useful to "re-set" absolute data values in such cases. In fact, as suggested in [RFC7799], the possibility of transmission loss of streamed telemetry packets, a concern particularly if unreliable transport paradigms such as UDP are used, may provide a general reason to enable polling as a "failsafe" mechanism.
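The "re-set" role of polling can be illustrated with a simple difference-coded stream, used here as an assumed example of a compression scheme in which a lost packet corrupts all later values. The sequence numbers and the decoder interface are hypothetical.

```python
from typing import Optional


class DeltaDecoder:
    """Rebuild absolute values from a stream of sequence-numbered differences."""

    def __init__(self, initial: float) -> None:
        self.value = initial
        self.next_seq = 0
        self.in_error = False

    def on_delta(self, seq: int, delta: float) -> Optional[float]:
        """Apply a streamed difference; flag an error if a packet was lost."""
        if self.in_error or seq != self.next_seq:
            self.in_error = True   # a gap makes every later value unreliable
            return None
        self.value += delta
        self.next_seq += 1
        return self.value

    def on_poll(self, absolute: float, next_seq: int) -> None:
        """A polled absolute value re-sets the decoder and clears the error."""
        self.value = absolute
        self.next_seq = next_seq
        self.in_error = False


decoder = DeltaDecoder(initial=10.0)
decoder.on_delta(0, 0.5)        # value is now 10.5
decoder.on_delta(2, 0.25)       # packet 1 was lost: decoder enters an errored regime
decoder.on_poll(11.25, 4)       # polled absolute value restores a known state
```

Without the poll, the decoder in this sketch would remain in error indefinitely, since streamed differences alone cannot recover the absolute value.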

6.7. Communication Protocols

Communication protocols facilitate the reliable data exchange between telemetry devices and control systems. Depending on the method, streaming and/or polling, various messaging protocols exist to provide efficient delivery of instrumentation data.

6.8. Data Models

A complete framework for analog instrumentation telemetry might require data models supporting:

  • Identification of instrumentation-equipped and telemetry-capable network equipment, the latter's available instrumentation, its available pre-processing, and what aspects of available pre-processing are programmable;
  • Subscription to streaming from specific instrumentation;
  • Programming (or re-programming) of pre-processing on specific subscriptions and instrumentation, including type of pre-processing, applicable thresholds or triggers, and definition of trigger-associated data sets (included data and start/stop interval limits vs. triggering events);
  • Transmission of applicable time stamp-data value couplets, vectors or batches.
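The four modeling needs listed above can be pictured with the following Python data classes. This is only a sketch of the information content; all class and field names are hypothetical, and an actual framework would more likely express such models in a data modeling language such as YANG.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Instrument:
    """Capability advertisement for one embedded instrument."""
    instrument_id: str
    quantity: str                                             # e.g., "optical-power"
    unit: str                                                 # e.g., "dBm"
    pre_processing: List[str] = field(default_factory=list)   # available types
    programmable: List[str] = field(default_factory=list)     # types clients may set


@dataclass
class TriggerDataSet:
    """Data to capture around a triggering event."""
    instrument_ids: List[str]
    start_offset_s: float   # negative values start before the triggering event
    stop_offset_s: float


@dataclass
class Subscription:
    """A client's subscription to streaming from one instrument."""
    client_id: str
    instrument_id: str
    pre_processing: Optional[str] = None
    threshold: Optional[float] = None
    trigger_data_set: Optional[TriggerDataSet] = None


@dataclass
class Batch:
    """Transmitted time stamp-value couplets for one instrument."""
    instrument_id: str
    samples: List[Tuple[float, float]]


def can_program(instrument: Instrument, subscription: Subscription) -> bool:
    """Check that the requested pre-processing is programmable on the instrument."""
    return (subscription.pre_processing is None
            or subscription.pre_processing in instrument.programmable)
```

Separating what pre-processing is available from what is programmable lets a client discover fixed-function instrumentation without attempting to re-program it.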

7. IANA Considerations

This document makes no requests for action by IANA.

8. Operational Considerations

Operational considerations for Optical Network Measurement Instrumentation involve a range of factors to ensure accurate, reliable, and efficient performance of the optical networks. These considerations are critical for deploying, maintaining, and troubleshooting fiber optic systems. Key operational considerations include:

A future version of this document will expand on the topics above and increase the scope of operational considerations.

9. Security Considerations

The security implications of optical network telemetry are critical, given the increasing reliance on optical networks for data transmission in various sectors. Ensuring the security and integrity of these networks, and of the telemetry instrumentation used to measure and maintain them, is paramount to prevent unauthorized access, data breaches, potential service disruptions, and their use as threat vectors and attack surfaces.

Key security considerations include:

A future version of this document will expand on the topics above and increase the scope of security considerations.

10. Acknowledgements

Thanks to the participants in the Network Digital Twin discussions of the Network Management Research Group, which provided further input into this work.

This work is supported by the UK Department for Science, Innovation and Technology under the Future Open Networks Research Challenge project TUDOR (Towards Ubiquitous 3D Open Resilient Network). The views expressed are those of the authors and do not necessarily represent the project.

11. References

11.1. Normative References

11.2. Informative References

[HAHN]
Optical Fiber Communications, "On the Spatial Resolution of Location-Resolved Performance Monitoring by Correlation Method".
[JANZ]
IEEE/IFIP Network Operations and Management Symposium, Workshop of Technologies for Network Twins, "Digital Twin for the Optical Network: Key Technologies and Enabled Automation Applications", <https://ieeexplore.ieee.org/document/9789844>.
[JIANG]
Journal of Lightwave Technology, vol. 40, no. 10, pp. 3128-3136, "Progresses of Pilot Tone Based Optical Performance Monitoring in Coherent Systems", <https://opg.optica.org/jlt/abstract.cfm?uri=jlt-40-10-3128>.
[NMRG-PODTS]
IETF, "Performance-Oriented Digital Twins for Packet and Optical Networks", <https://datatracker.ietf.org/doc/draft-paillisse-nmrg-performance-digital-twin/02>.
[OPSAWG-IFIT-FRAMEWORK]
IETF, "Framework for In-Situ Flow Information Telemetry", <https://datatracker.ietf.org/doc/html/draft-song-opsawg-ifit-framework-21>.
[RATANAWORABHAN]
Data Compression Conference, "Fast Lossless Compression of Scientific Floating-Point Data".
[RFC7799]
Morton, A., "Active and Passive Metrics and Methods (with Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799, <https://www.rfc-editor.org/info/rfc7799>.
[RFC9232]
Song, H., Qin, F., Martinez-Julia, P., Ciavaglia, L., and A. Wang, "Network Telemetry Framework", RFC 9232, DOI 10.17487/RFC9232, <https://www.rfc-editor.org/info/rfc9232>.

Authors' Addresses

Chris Janz
Huawei Canada
Daniel King
Lancaster University