Internet Research Task Force C. Janz
Internet-Draft Huawei Canada
Intended status: Informational D. King
Expires: 4 September 2024 Lancaster University
3 March 2024
Telemetry Methodologies for Analog Measurement Instrumentation
draft-janzking-nmrg-telemetry-instrumentation-01
Abstract
Evolution toward network operations automation requires systems
encompassing software-based analytics and decision-making. Network-
based instrumentation provides crucial data for these components and
processes. However, the proliferation of such instrumentation, and
the need to move the data it generates from the physical network to
"off-the-network" software, pose challenges. In particular, analog
measurement instrumentation, which produces time-continuous
real-number data, may generate significant data volumes.
Methodologies for handling analog measurement instrumentation data
will need to be identified and discussed, informed in part by
consideration of requirements for the operation of network digital
twins, which may be important software-realm consumers of such data.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 4 September 2024.
Copyright Notice
Copyright (c) 2024 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3
3. Background . . . . . . . . . . . . . . . . . . . . . . . . . 3
4. Optical Network Measurement Instrumentation . . . . . . . . . 5
5. Telemetry Use Cases . . . . . . . . . . . . . . . . . . . . . 5
6. Analog Measurement Requirements . . . . . . . . . . . . . . . 6
6.1. Sampling . . . . . . . . . . . . . . . . . . . . . . . . 6
6.2. Time Precision . . . . . . . . . . . . . . . . . . . . . 7
6.3. Reduction and Other Pre-Processing . . . . . . . . . . . 7
6.4. Compression . . . . . . . . . . . . . . . . . . . . . . . 7
6.5. Programmable Streaming . . . . . . . . . . . . . . . . . 9
6.6. Streaming versus Polling . . . . . . . . . . . . . . . . 9
6.7. Communication Protocols . . . . . . . . . . . . . . . . . 10
6.8. Data Models . . . . . . . . . . . . . . . . . . . . . . . 10
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11
8. Operational Considerations . . . . . . . . . . . . . . . . . 11
9. Security Considerations . . . . . . . . . . . . . . . . . . . 11
10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 12
11. References . . . . . . . . . . . . . . . . . . . . . . . . . 12
11.1. Normative References . . . . . . . . . . . . . . . . . . 12
11.2. Informative References . . . . . . . . . . . . . . . . . 12
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 13
1. Introduction
Existing studies of network telemetry typically deal with packet-
oriented measurements that generate packet traffic, path, discard,
latency, and other data [RFC7799], [OPSAWG-IFIT-FRAMEWORK]. However,
some networking equipment and network operations scenarios feature or
use more physically-oriented measurement instrumentation that
generates data of a different character. Here, the particularities
of data generated by such "analog" instrumentation are examined, and
telemetry methodologies suitable for such data are considered. This
consideration is informed by the requirements of specific use cases,
including network digital twins.
Optical networks, which are increasingly rich in analog
instrumentation, are used as a specific example here. However, the
telemetry methodologies discussed may apply to instrumentation and
telemetry spanning a wide variety of networks and their related
operational software, for example, in support of digital twins that
provide modeling of radio-based transmission, thermal characteristics
or energy consumption.
This document presents telemetry methodologies tailored for analog
measurement instrumentation, aiming to enhance data accuracy,
transmission efficiency, and real-time monitoring capabilities for
network digital twins. The findings underscore the potential of
these methodologies to inform best practices for telemetry in
network digital twins that rely on analog measurement
instrumentation. The document provides a state-of-the-art summary,
including gaps and possible areas for further research.
2. Terminology
Network Digital Twin: A Network Digital Twin is a virtual replica of
a physical network system that allows for the simulation,
monitoring, and analysis of the network's behavior under various
conditions without impacting the actual network.
Network Measurement Instrumentation: Network Measurement
Instrumentation refers to the tools, techniques, and systems used
to collect, monitor, and analyze data about the performance and
behavior of a network. This instrumentation is crucial for
understanding how well the network is functioning, identifying
problems, and making informed decisions to optimize network
performance and reliability.
3. Background
Photonic networks, which transmit data through light signals via
fiber optic cables, are fundamental to telecommunications, internet
services, data center operations, and many other critical aspects of
modern digital infrastructure. A range of measurement instruments
are routinely used in the deployment and maintenance of these
networks. Key examples include:
* Optical Time Domain Reflectometers (OTDRs): These devices are used
to test the integrity of fiber optic cables by sending a series of
light pulses into the fiber and measuring the light that is
scattered or reflected back. OTDRs can detect and locate faults,
splices, and bends in fiber optic cables, and are crucial for both
installation and troubleshooting;
* Optical Spectrum Analyzers (OSAs): OSAs measure the power spectrum
of optical devices to analyze the wavelength or frequency
distribution of light. They are vital for characterizing the
performance of components like lasers and optical amplifiers
within the network;
* Optical Power Meters and Light Sources: Used in tandem, these
instruments measure the loss or attenuation in optical fibers and
verify the power levels to ensure that signals are transmitted
with sufficient strength without exceeding the damage threshold of
the network components;
* Network Analyzers and Bit Error Rate Testers (BERTs): These tools
assess the overall performance of the optical network by analyzing
parameters such as signal integrity, bit error rates, and network
latency. They help in ensuring that the network can reliably
handle the intended data loads;
* Wavelength Division Multiplexing (WDM) Analyzers: WDM technology
combines multiple optical carrier signals on a single optical
fiber by using different wavelengths. WDM analyzers are
specialized tools for testing and maintaining these systems,
ensuring that each channel is transmitted efficiently without
interference;
* Dispersion Analyzers: These are used to measure chromatic and
polarization mode dispersion in fiber optic cables, which can
affect the quality and speed of data transmission. Managing
dispersion is crucial for long-distance and high-data-rate optical
communications.
These instruments play a critical role in the characterization,
deployment, optimization, and troubleshooting of optical networks.
However, their use tends to be restricted to specific operational
phases, requires manual operation, and is generally not suited to
application on in-service facilities. The term instrumentation
refers more properly to "embedded" capability that is both operable
on active infrastructure and capable of continuous measurement
operation. Such instrumentation is a necessary foundation for
telemetry.
4. Optical Network Measurement Instrumentation
Optical network instrumentation has typically focused on detecting
transmission performance degradation, through measurement of error
correction rates in forward error correction (FEC) engines, counting
of errored Optical Transport Network (OTN) frames, etc.
Such measurements are typically executed on network elements through
time-interval-based counting. The resulting counts may be forwarded
to or collected by software on a subscription or polling basis. The
data consists of series of integer numbers, or series of time stamp-
integer number couplets.
In recent years, however, the nature and scope of optical network
instrumentation has broadened and deepened [JIANG]. The idea has
been to instrument the optical network more richly to support more
effective operations management, including using software-based
analytics and modeling. Implicated network operations include
network and connection planning and configuration, network and
connection fault management (fault and impairment detection,
classification, localization, preemption, correction), and others.
The optical network is a high-performance analog transmission
network, so, unsurprisingly, much of this new instrumentation is
analog; that is, it produces time-continuous real-number data or data
sets. Examples include optical loss, optical power (total, channel
peak, etc.), optical spectra (narrow-band-filtered power measured at
a series of center wavelengths), differential group delay (DGD),
polarization mode dispersion (PMD), polarization dependent loss
(PDL), Stokes vector components reflecting state of polarization
(SOP), linear optical signal-to-noise ratio (OSNR) and generalized
optical signal-to-noise ratio (GSNR). Many of these measurements are
synthesized by coherent receivers across the network, while some may
be synthesized by in-span elements such as amplifiers and
reconfigurable optical add/drop multiplexers (ROADMs).
5. Telemetry Use Cases
One application of this data in the software realm is with optical
network digital twins (NDTs), used for transmission performance
modeling [JANZ], [NMRG-PODTS]. Such NDTs constitute an important
class of analytical engine supporting optical network and service
planning and other operations, and they rely heavily on data from
network instrumentation to enable accurate modeling of optical
transmission performance on targeted variations of the actual network
and service configuration, state and condition. A default
expectation would be that all instrumentation measurements are
reflected continuously in the software realm for use by optical NDTs.
However, at best only an approximation to this can be achieved (e.g.,
only a series of sampled measurements may in fact be streamed from
the network), so the imperative is to find efficient ways to support
sufficiently accurate approximations. This imperative grows more
compelling as the scale of the network and the richness of its
embedded instrumentation grow.
A second example application lies in the fault management domain,
wherein analysis of rich data, concentrated around the time of a
detected evolution in transmission conditions, may be used to
classify and localize the origin of the observed evolution [HAHN].
Transient evolutions of transmission performance are commonplace on
optical networks and have myriad causes, including extrinsic causes
such as lightning strikes, earthworks and construction, weather, road
and rail traffic, fires, etc., as well as intrinsic causes including
continuous or discrete deteriorations to equipment or fibre plant.
Detection, classification, and localization of transmission
performance evolutions permit assessment of the likelihood, expected
severity, and rate of further deterioration, and planning of timely
and cost-effective corrective interventions where indicated.
However, successful analysis may depend on the availability, in
software, of richer data sets than may be supported by continuous
streaming or required by other applications.
6. Analog Measurement Requirements
[RFC9232] provides a framework for considering concepts, constructs
and developments in network telemetry. Many of the methods and
mechanisms it discusses or suggests are invoked here.
6.1. Sampling
An analog-to-digital conversion process typically converts analog
signals into digital data that can be transmitted, stored, and
processed more efficiently. This often involves sampling the signal
at a certain rate and quantizing the amplitude into digital values.
The "mirroring" (transmission for replication at a different place)
of continuous-time real number data, generated by in-network
instrumentation, begins with sampling and representing measured
values by a scalar or vector of finite-precision numbers. Since
neither sampling at fixed intervals, nor fixed time alignment or
offset among measurement points in the network or between such
points and the off-network software realm, can generally be assumed,
it is useful for instrumentation to generate, as primary data, a
series of couplets or vectors consisting of sample time stamps and
corresponding measured data values.
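
As a purely illustrative (non-normative) sketch, the following
Python fragment shows one way an instrument-side agent might emit
such time stamp/value couplets; the read-out callable and its
parameters are hypothetical stand-ins, not an existing interface.

   import time

   def sample_couplets(read_value, period_s=1.0, count=10):
       # Emit (timestamp, value) couplets from an analog read-out.
       # 'read_value' is any callable returning the current measured
       # value, e.g. a channel optical power in dBm.  Time stamps
       # are taken from the local clock (see Section 6.2).
       couplets = []
       for _ in range(count):
           couplets.append((time.time(), read_value()))
           time.sleep(period_s)
       return couplets

   # Hypothetical usage with a stand-in read-out function:
   # couplets = sample_couplets(read_channel_power_dbm, period_s=0.5)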
6.2. Time Precision
Inadequate sampling frequency and quantization error are both
potential sources of error in the - literal or effective -
"reconstruction" of the original time-continuous measurement in the
software realm. It is possible that sampling frequencies might be
varied in response to evolving temporal characteristics of measured
parameters; this is one strategy for data reduction (and one reason
why sampling may not occur at fixed-period intervals).
Requirements on the precision of reconstructed data, its time basis,
and the alignment in time of different reconstructed measurements
are determined by the operational role played by the analytical
functions that consume the data. Some operations of interest, such
as network and service planning or fault and impairment management,
may impose only relatively relaxed requirements on time
synchronization among measurement instruments, and between those
instruments and the software realm. Other applications, e.g., those
concerning operations tending toward closed loop control, may require
tighter temporal data alignment among different measurement sources.
These considerations have implications in terms of source and
synchronization of clocks producing time stamps; but in general,
requirements on clock synchronization and precision are far from
those required for bit-level operations: i.e. they are generally more
like "network time" than "digital time".
Similarly, requirements on the absolute or relative (i.e. among
different measurement instruments) precision of reconstructed
measured data values may be application-dependent. In many cases,
relative precision, or precision consistency, may be more important
than absolute precision.
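
As a non-normative illustration of relative alignment, the sketch
below interpolates two independently sampled measurement series onto
a common time base in the software realm; it assumes linear
interpolation is adequate, which is itself an application-dependent
assumption.

   import numpy as np

   def align_series(ts_a, vals_a, ts_b, vals_b, common_ts):
       # Interpolate two (timestamp, value) series onto a shared
       # time base so they can be compared sample-for-sample.
       a = np.interp(common_ts, ts_a, vals_a)
       b = np.interp(common_ts, ts_b, vals_b)
       return a, b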
6.3. Reduction and Other Pre-Processing
With telemetric data volume a primary potential challenge, methods
for reducing data volume associated with analog measurement
instrumentation are of evident interest. Signals may also be
filtered to remove noise and unwanted frequencies to improve the data
quality.
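
A minimal, non-normative example of such filtering is a sliding-
window average applied at or near the instrument before samples are
streamed:

   from collections import deque

   def moving_average(samples, window=5):
       # Smooth a sequence of measured values with a sliding-window
       # mean to suppress high-frequency noise before transmission.
       buf, out = deque(maxlen=window), []
       for v in samples:
           buf.append(v)
           out.append(sum(buf) / len(buf))
       return out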
6.4. Compression
Data compression is an obvious candidate methodology for bandwidth
reduction. Methods for lossless compression of series of numerical
data have been widely studied, e.g. [RATANAWORABHAN].
Obviously, such compression must be implemented as a "pre-processing"
function executed by the telemetric instrumentation itself, or some
proxy to it. Similarly, decompression must be implemented as a
"post-processing" function within the software realm. Where time
stamps are uncompressed, depending on the compression methodology
employed, it may be possible to support selective decompression of
data, e.g., only on selected time intervals. This might allow for
application-driven "as-required" post-processing (decompression) of
more limited volumes of telemetric data.
The compressibility of time-based data depends on its evolution in
information (entropy) terms, so compressed, streamed data flows vary
in volume and rate. The effective transmission and reception rates
of data samples may thus vary, and may differ at any point from the
rate of data generation. This is another reason why data samples may
require time stamps.
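
As a simple, non-normative illustration of the pre- and post-
processing involved, the sketch below delta-encodes time stamp/value
couplets before transmission; practical implementations would
typically use stronger floating-point compressors such as those
surveyed in [RATANAWORABHAN].

   def delta_encode(couplets):
       # Replace each (timestamp, value) couplet, after the first,
       # with its difference from the previous couplet.  Slowly
       # varying measurements then compress well under a generic
       # entropy coder.
       out = [couplets[0]]
       for (t0, v0), (t1, v1) in zip(couplets, couplets[1:]):
           out.append((t1 - t0, v1 - v0))
       return out

   def delta_decode(encoded):
       # Invert delta_encode.  Note that a single lost delta corrupts
       # all subsequent reconstructed values (see Section 6.6).
       out = [encoded[0]]
       for dt, dv in encoded[1:]:
           t, v = out[-1]
           out.append((t + dt, v + dv))
       return out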
Other forms of effective data reduction through pre-processing may
also be useful, or preferred:
* Thresholding: Data samples are transmitted only if and when a
measured value, or a derivative of the measured value, crosses a
threshold. Possible examples include: a) exceeding some absolute
or proportional variation from the last transmitted sample value;
b) exceeding a previously observed and transmitted maximum or
minimum value; or, c) exceeding some time rate-of-change of the
measured value.
Post-processing of threshold-driven data may or may not be required
by applications. For example, an application may generate a scenario
for behavioral analysis by an NDT that requires the "current" data
from network instrumentation. To within the precision implied by the
operating thresholding mechanism, that data is simply the most
recently transmitted sample from each network measurement
instrument. Another application, however, perhaps one
dealing with fault or impairment management, might require a regular
and continuous time series presentation of measured data. In that
case, e.g. interpolation or other post-processing of received data
samples might be needed.
Other kinds of pre-processing may also be of interest, including
normalization of data, frequency-domain conversion, and computation
of statistics.
* Triggering: An extension or variation of thresholding, triggering
may refer to, e.g. the transmission of a series of samples - from
a defined set of measurement instruments, over a defined period of
time and at defined time intervals - on crossing of a particular
threshold (i.e., that threshold crossing "triggers" the
transmission of the defined data series). Triggering of this kind
may be useful in e.g. fault and impairment management. The
detection by instrumentation of some pre-defined circumstance or
occurrence - e.g. observation of an unusually large or rapid
change in an optical power level or channel SOP - would trigger
the transmission of a pre-defined, "rich" set of data covering a
time interval around the triggering observation. That data could
then be subjected to various forms of "forensic" analysis in
software to support detection, classification or localization of
transmission performance-impacting events. Required pre-processing
includes trigger evaluation and a sliding store of instrumentation
data samples deep enough to cover the targeted data capture "window"
as well as the trigger processing and transmission intervals; a
non-normative sketch combining thresholding and triggering is given
below.
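
The following Python sketch is a non-normative illustration of how
thresholding and triggering might be combined in instrument-side
pre-processing; the class and parameter names are hypothetical.

   from collections import deque

   class ThresholdTrigger:
       # Forwards a sample only when it departs from the last
       # transmitted value by more than 'delta'; on such a crossing,
       # it also emits a batch drawn from a sliding window of recent
       # samples ("triggered" transmission of rich data).

       def __init__(self, delta, window=100):
           self.delta = delta
           self.last_sent = None
           self.history = deque(maxlen=window)  # sliding storage

       def ingest(self, timestamp, value):
           self.history.append((timestamp, value))
           if (self.last_sent is None
                   or abs(value - self.last_sent) > self.delta):
               self.last_sent = value
               return {"sample": (timestamp, value),
                       "batch": list(self.history)}
           return None  # below threshold: nothing transmitted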
6.5. Programmable Streaming
As discussed in [RFC9232], in-network pre-processing of telemetry
data may usefully be "programmed" by telemetry clients (i.e.,
software applications that are consumers of instrumentation data),
including dynamically or variably. The range and nature of software
applications and their data requirements may vary among systems, may
evolve with time within any given system - based on experience and
learning (automated or not) or with the deployment of new
capabilities - and may also vary as a function of available
instrumentation capabilities on a given network, which themselves may
evolve.
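
By way of illustration only, a client might (re)program instrument-
side pre-processing by attaching a small configuration structure to
its subscription; the field names below are hypothetical and are not
drawn from any existing data model.

   # Hypothetical subscription carrying programmable pre-processing.
   subscription = {
       "instrument": "roadm-7/port-3/channel-power",
       "encoding": "timestamp-value-couplets",
       "pre_processing": {
           "type": "threshold-trigger",
           "delta_db": 0.5,           # forward on +/- 0.5 dB change
           "trigger_window_s": 30,    # batch 30 s around a trigger
           "sample_period_ms": 100,
       },
   }

A client could later modify only the "pre_processing" member, for
example to retune thresholds as its analytical needs evolve.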
6.6. Streaming versus Polling
Streaming - i.e., subscription-based push - is, as identified in
[RFC9232] and other works, and as suggested by the discussion above,
expected to be the principal, if not exclusive, operational modality
for telemetry, including analog instrumentation telemetry. Software
clients consume data generated by the network, and having identified
which data they require and from where within the network, use
subscriptions to place themselves in a position to receive it, on an
ongoing basis, without continuing operational steps.
Triggered transmission of "batched" data is aligned with a streaming
paradigm, as the telemetry server (i.e., instrumentation) must detect
the trigger conditions and react by capturing and transmitting data
to subscribing clients.
It is worth considering, however, whether polling can or should be
completely dispensed with, or whether it might retain some utility in
some cases or circumstances.
The discussion so far supports a view that the data needs of NDTs can
be satisfied, and in fact probably are best served by, streaming.
However, polling could be used if NDT-based analyses are required
relatively infrequently, do not require very rapid execution, and do
not draw arbitrarily on historical data. Polling might also be
useful as a complementary mechanism to streaming. For example, to
reduce data transmission and handling volumes, an NDT might choose
to unsubscribe from telemetry that it has observed to change little
over time. However, for particularly critical analyses, the NDT
might want to ensure that all available telemetry data is up to date
by polling the unsubscribed instrumentation. Further, if certain
kinds of data compression are used, decompression processes can
enter errored regimes, e.g., through transmission loss of telemetry
data. Periodic polling may be useful to "re-set" absolute data
values in such cases.
In fact, as suggested in [RFC7799], the possibility of transmission
loss of streamed telemetry packets, a concern particularly if
unreliable transport paradigms such as UDP are used, may provide a
general reason to enable polling as a "failsafe" mechanism.
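
A non-normative sketch of polling used as a failsafe alongside
delta-compressed streaming (cf. the decoding example in Section 6.4)
might look as follows; poll_absolute_value is a hypothetical
stand-in for whatever retrieval operation the management protocol
provides.

   def reconcile(last_reconstructed, poll_absolute_value, tolerance):
       # Periodically compare the locally reconstructed value with an
       # authoritative polled reading, and re-seed the decoder state
       # if lost deltas have let the two drift apart.
       actual = poll_absolute_value()
       if abs(actual - last_reconstructed) > tolerance:
           return actual  # "re-set" to the polled absolute value
       return last_reconstructed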
6.7. Communication Protocols
Communication protocols facilitate reliable data exchange between
telemetry devices and control systems. Depending on the method used,
streaming and/or polling, various messaging protocols exist to
provide efficient delivery of instrumentation data.
6.8. Data Models
A complete framework for analog instrumentation telemetry might
require data models supporting the following (an illustrative,
non-normative sketch follows this list):
* Identification of instrumentation-equipped and telemetry-capable
network equipment, the latter's available instrumentation, its
available pre-processing, and what aspects of available pre-
processing are programmable;
* Subscription to streaming from specific instrumentation;
* Programming (or re-programming) of pre-processing on specific
subscriptions and instrumentation, including type of pre-
processing, applicable thresholds or triggers, and definition of
trigger-associated data sets (included data and start/stop
interval limits vs. triggering events);
* Transmission of applicable time stamp-data value couplets, vectors
or batches.
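
This document does not define such data models. As a purely
illustrative, non-normative sketch of the kind of information they
might carry, consider the following Python data classes; all names
are hypothetical.

   from dataclasses import dataclass
   from typing import List, Optional

   @dataclass
   class InstrumentCapability:
       instrument_id: str          # e.g. "amp-12/total-power"
       measurement_type: str       # e.g. "optical-power-dbm"
       pre_processing: List[str]   # e.g. ["threshold", "trigger"]
       programmable: bool = True

   @dataclass
   class Subscription:
       instrument_id: str
       pre_processing_type: Optional[str] = None
       threshold: Optional[float] = None
       trigger_window_s: Optional[float] = None
       batch_sample_period_ms: Optional[float] = None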
7. IANA Considerations
This document makes no requests for action by IANA.
8. Operational Considerations
Operational considerations for Optical Network Measurement
Instrumentation involve a range of factors to ensure accurate,
reliable, and efficient performance of the optical networks. These
considerations are critical for deploying, maintaining, and
troubleshooting fiber optic systems. Key operational considerations
include:
* Calibration and Signal Integrity
* Dynamic Range and Sensitivity
* Resolution and Accuracy
* Scalability
* Bandwidth and storage of instrumentation data
Future versions of this document will expand on the topics above and
broaden the scope of operational considerations.
9. Security Considerations
The security implications of optical network telemetry are critical,
given the increasing reliance on optical networks for data
transmission in various sectors. Ensuring the security and integrity
of these networks and the telemetry instrumentation used to measure
and maintain them is paramount to prevent unauthorized access, data
breaches, potential service disruptions, and use as possible threat
vectors and attack surfaces.
Key security considerations include:
* Encryption of sensitive telemetry data
* Secure configuration and management of telemetry functions
* Network monitoring and anomaly detection
* Secure data handling and storage
Future versions of this document will expand on the topics above and
broaden the scope of security considerations.
10. Acknowledgements
Thanks to the Network Digital Twin discussions in the Network
Management Research Group (NMRG), which provided further input into
this work.

This work is supported by the UK Department for Science, Innovation
and Technology under the Future Open Networks Research Challenge
project TUDOR (Towards Ubiquitous 3D Open Resilient Network). The
views expressed are those of the authors and do not necessarily
represent those of the project.
11. References
11.1. Normative References
11.2. Informative References
[HAHN] Optical Fiber Communications, "On the Spatial Resolution
of Location-Resolved Performance Monitoring by Correlation
Method", 1 March 2023.
[JANZ] IEEE/IFIP Network Operations and Management Symposium,
Workshop of Technologies for Network Twins, "Digital Twin
for the Optical Network: Key Technologies and Enabled
Automation Applications", 1 April 2022,
<https://ieeexplore.ieee.org/document/9789844>.
[JIANG] Journal of Lightwave Technology, vol. 40, No. 10, pp.
3128-3136, "Progresses of Pilot Tone Based Optical
Performance Monitoring in Coherent Systems", 1 October
2023, <https://opg.optica.org/jlt/abstract.cfm?uri=jlt-
40-10-3128>.
[NMRG-PODTS]
IETF, "Performance-Oriented Digital Twins for Packet and
Optical Networks", 1 October 2023,
<https://datatracker.ietf.org/doc/draft-paillisse-nmrg-
performance-digital-twin/02>.
[OPSAWG-IFIT-FRAMEWORK]
IETF, "Framework for In-Situ Flow Information Telemetry",
1 October 2023, <https://datatracker.ietf.org/doc/html/
draft-song-opsawg-ifit-framework-21>.
[RATANAWORABHAN]
Data Compression Conference, "Fast Lossless Compression of
Scientific Floating-Point Data", 1 May 2006.
[RFC7799] Morton, A., "Active and Passive Metrics and Methods (with
Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799,
May 2016, <https://www.rfc-editor.org/info/rfc7799>.
[RFC9232] Song, H., Qin, F., Martinez-Julia, P., Ciavaglia, L., and
A. Wang, "Network Telemetry Framework", RFC 9232,
DOI 10.17487/RFC9232, May 2022,
<https://www.rfc-editor.org/info/rfc9232>.
Authors' Addresses
Chris Janz
Huawei Canada
Email: christopher.janz@huawei.com
Daniel King
Lancaster University
Email: d.king@lancaster.ac.uk