Network Working Group                                         K. Larsson
Internet-Draft                                          Deutsche Telekom
Intended status: Standards Track                         18 October 2023
Expires: 20 April 2024


               Mapping YANG Data to Label-Set Time Series
                      draft-kll-yang-label-tsdb-00

Abstract

   This document proposes a standardized approach for representing
   YANG-modeled configuration and state data for storage in Time Series
   Databases (TSDBs) that identify time series using a label-set.  It
   outlines procedures for translating YANG data representations to fit
   within the label-centric structures of TSDBs and vice versa.  This
   mapping ensures clear and efficient storage and querying of
   YANG-modeled data in TSDBs.

Discussion Venues

   This note is to be removed before publishing as an RFC.

   Source for this draft and an issue tracker can be found at
   https://github.com/plajjan/draft-kll-yang-label-tsdb.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 20 April 2024.

Copyright Notice

   Copyright (c) 2023 IETF Trust and the persons identified as the
   document authors.  All rights reserved.




Larsson                   Expires 20 April 2024                 [Page 1]

Internet-Draft               yang-label-tsdb                October 2023


   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Specification of the Mapping Procedure
     2.1.  Example: Packet Counters in IETF Interfaces Model
     2.2.  Mapping values
     2.3.  Choice
     2.4.  Host / device name
   3.  Querying YANG modeled time series data
     3.1.  Basic Queries
     3.2.  Filtering by Labels
     3.3.  Time-based Queries
     3.4.  Aggregations
     3.5.  Combining Filters
     3.6.  Querying Enumeration Types
   4.  Requirements on time series databases
     4.1.  Support for String Values
     4.2.  Sufficient Path Length
     4.3.  High Cardinality
   5.  Normative References
   Author's Address

1.  Introduction

   The aim of this document is to define rules for representing
   configuration and state data defined using the YANG data modeling
   language [RFC7950] as time series using a label-centric model.

   The majority of modern Time Series Databases (TSDBs) employ a label-
   centric model.  In this structure, time series are identified by a
   set of labels, each consisting of a key-value pair.  These labels
   facilitate efficient querying, aggregation, and filtering of data
   over time intervals.  Such a model contrasts with the hierarchical
   nature of YANG-modeled data.  The challenge, therefore, lies in
   ensuring that YANG-defined data, with its inherent structure and
   depth, can be seamlessly integrated into the flat, label-based
   structure of most contemporary TSDBs.

   This document seeks to bridge this structural gap, laying out rules
   and guidelines to ensure that YANG-modeled configuration and state
   data can be effectively stored, queried, and analyzed within label-
   centric TSDBs.

2.  Specification of the Mapping Procedure

   Instances of YANG data nodes are mapped to metrics.  Only nodes that
   carry a value are mapped.  This includes leafs and presence
   containers.  The hierarchical path to a value, including non-presence
   containers and lists, forms the name of the metric.  The path is
   formed by joining the names of the YANG data nodes with an underscore
   (_).  Special symbols in node names, e.g. the hyphen (-), are replaced
   with _.

   List keys are mapped into labels.  The path to the list key is
   transformed in the same way as the primary name of the metric.
   Compound keys have each key part as a separate label.
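   The rules above can be sketched in Python as follows.  This is a
   minimal illustration, not a reference implementation: it assumes an
   instance identifier without module prefixes whose list predicates
   have the form [key='value'] with single-quoted values that contain no
   quote characters, and the names map_instance_id and sanitize are
   hypothetical.

```python
import re

# One path step: "/" + node name + zero or more [key='value'] predicates.
# Assumption: key values are single-quoted and contain no "'" characters,
# but they may contain "/" (e.g. an interface named "ge-0/0/1").
STEP_RE = re.compile(r"/([^/\[\]]+)((?:\[[^=\]]+='[^']*'\])*)")
PRED_RE = re.compile(r"\[([^=\]]+)='([^']*)'\]")


def sanitize(name: str) -> str:
    """Replace special symbols such as '-' in a node name with '_'."""
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


def map_instance_id(instance_id: str) -> tuple[str, dict[str, str]]:
    """Return (metric name, key labels) for a YANG instance identifier."""
    components: list[str] = []
    labels: dict[str, str] = {}
    pos = 0
    while pos < len(instance_id):
        step = STEP_RE.match(instance_id, pos)
        if step is None:
            raise ValueError(f"unsupported path syntax at offset {pos}")
        components.append(sanitize(step.group(1)))
        # Each part of a (possibly compound) list key becomes its own
        # label, named by the path to the list followed by the key name.
        for key, value in PRED_RE.findall(step.group(2)):
            labels["_".join(components + [sanitize(key.strip())])] = value
        pos = step.end()
    return "_".join(components), labels


metric, labels = map_instance_id(
    "/interfaces/interface[name='eth0']/statistics/in-unicast-pkts")
print(metric)  # interfaces_interface_statistics_in_unicast_pkts
print(labels)  # {'interfaces_interface_name': 'eth0'}
```

   Matching the path step by step, rather than splitting on "/", keeps
   key values such as "ge-0/0/1" intact.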

2.1.  Example: Packet Counters in IETF Interfaces Model

   Consider the in-unicast-pkts leaf from the IETF interfaces model that
   captures the number of incoming unicast packets on an interface:

   Original YANG instance-identifier:

      /interfaces/interface[name='eth0']/statistics/in-unicast-pkts

   Following the mapping rules defined:

   1.  The path components, including containers and list names, are
       transformed into the metric name by joining the node names with
       _.  Special symbols, e.g. -, are replaced with _.

   Resulting Metric Name:
   interfaces_interface_statistics_in_unicast_pkts

   2.  The list key "predicate", which in this case is the interface
       name (eth0), is extracted and stored as a separate label.  The
       label key represents the complete path to the key.

   Resulting Label: interfaces_interface_name = eth0

   3.  The leaf value, which represents the actual packet counter,
       remains unchanged and is directly mapped to the value in the time
       series database.

   For instance, if the packet counter reads 5,432,100 packets:

   Value: 5432100

   4.  As part of the standard labels, a server identification string is
       also included.  A typical choice of identifier is the hostname.
       For this example, assume the device name is router-01:

   Label: host = router-01

   Final Mapping in the TSDB:

   *  Metric: interfaces_interface_statistics_in_unicast_pkts

   *  Value: 5432100

   *  Labels:

      -  host = router-01

      -  interfaces_interface_name = eth0
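   The final mapping above could, for example, be serialized in the
   Prometheus text exposition format.  Since this document specifies a
   conceptual representation rather than a concrete interface, the
   choice of format, the sorted label order, and the helper names below
   are assumptions made for illustration only.

```python
def escape_label_value(value: str) -> str:
    """Escape a label value as the Prometheus text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_exposition(metric: str, labels: dict[str, str], value: float) -> str:
    """Render one sample as a line in the Prometheus text format."""
    # Labels are sorted by name so that the output is deterministic.
    pairs = ",".join(f'{key}="{escape_label_value(val)}"'
                     for key, val in sorted(labels.items()))
    return f"{metric}{{{pairs}}} {value}"


print(to_exposition(
    "interfaces_interface_statistics_in_unicast_pkts",
    {"host": "router-01", "interfaces_interface_name": "eth0"},
    5432100,
))
# interfaces_interface_statistics_in_unicast_pkts{host="router-01",interfaces_interface_name="eth0"} 5432100
```

   Metric and label names produced by the mapping consist only of
   letters, digits, and underscores, so only label values need escaping.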

2.2.  Mapping values

   Leaf values are mapped based on their intrinsic type:

   *  All integer types are mapped to integers and retain their native
      representation.

      -  Some implementations only support floats for numeric values, in
         which case integer values larger than 2^53 may not be
         represented exactly.

   *  decimal64 values are mapped to floats; the value should be rounded
      to the nearest representable floating-point value so as to
      minimize the loss of information.

   *  Enumeration types are mapped using their string representation.

   *  String types remain unchanged.
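   As an illustration of the value rules above for one concrete target,
   the following sketch formats leaf values as InfluxDB line protocol
   field values, where integers carry an "i" suffix, floats are written
   as plain numbers, and strings are double-quoted.  The helper name is
   hypothetical, and rejecting integers outside the signed 64-bit range
   (which would affect very large uint64 counters) is a simplification
   of this sketch.

```python
from decimal import Decimal
from typing import Union

INTEGER_TYPES = {"int8", "int16", "int32", "int64",
                 "uint8", "uint16", "uint32", "uint64"}
# Enumerations are assumed to arrive as their string name, e.g. "up".
STRING_TYPES = {"string", "enumeration"}


def field_value(yang_type: str, value: Union[int, str, Decimal]) -> str:
    """Format a YANG leaf value as an InfluxDB line protocol field value."""
    if yang_type in INTEGER_TYPES:
        number = int(value)
        # Integer fields take an "i" suffix; this sketch keeps them
        # within the signed 64-bit range and rejects anything larger.
        if not -(2**63) <= number < 2**63:
            raise ValueError(f"{number} is outside the signed 64-bit range")
        return f"{number}i"
    if yang_type == "decimal64":
        # Converting a Decimal to float rounds to the nearest double.
        return repr(float(Decimal(str(value))))
    if yang_type in STRING_TYPES:
        # String field values escape backslashes and double quotes.
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    raise ValueError(f"type {yang_type!r} is not covered by this sketch")


print(field_value("uint64", 5432100))    # 5432100i
print(field_value("decimal64", "2.50"))  # 2.5
print(field_value("enumeration", "up"))  # "up"
```

   With a field named value, as assumed by the InfluxQL queries in
   Section 3, the example counter from Section 2.1 would then be written
   as a line such as:

      interfaces_interface_statistics_in_unicast_pkts,host=router-01,interfaces_interface_name=eth0 value=5432100i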

2.3.  Choice

   Choice constructs from YANG are disregarded and not enforced during
   the mapping process.  Because a TSDB holds data spanning a period of
   time, different branches of a choice may have been active at
   different times within a single data set, rendering validation and
   storage restrictions impractical.

2.4.  Host / device name

   There is an implicit host label identifying the server, typically set
   to the name of the host originating the time series data.

   Instance data retrieved from YANG-based servers does not generally
   identify the server it originates from.  As a time series database is
   likely to contain data from multiple servers, the host label is used
   to identify the source of the data.

3.  Querying YANG modeled time series data

   The process of storing YANG-modeled data in label-centric TSDBs, as
   defined in the previous sections, inherently structures the data in a
   way that leverages the querying capabilities of modern TSDBs.  This
   section provides guidelines on how to construct queries to retrieve
   this data effectively, with examples in InfluxQL and PromQL.

3.1.  Basic Queries

   To retrieve all data points related to incoming unicast packets from
   the IETF interfaces model:

   *  *InfluxQL*:

         SELECT * FROM interfaces_interface_statistics_in_unicast_pkts

   *  *PromQL*:

         interfaces_interface_statistics_in_unicast_pkts

3.2.  Filtering by Labels

   To retrieve incoming unicast packets specifically for the interface
   eth0:

   *  *InfluxQL*:

         SELECT * FROM interfaces_interface_statistics_in_unicast_pkts
           WHERE interfaces_interface_name = 'eth0'

   *  *PromQL*:

         interfaces_interface_statistics_in_unicast_pkts{
           interfaces_interface_name="eth0"}

   Similarly, to filter by device / host name:

   *  *InfluxQL*:

         SELECT * FROM interfaces_interface_statistics_in_unicast_pkts
           WHERE host = 'router-01'

   *  *PromQL*:

         interfaces_interface_statistics_in_unicast_pkts{
           host="router-01"}
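   Label filters such as those above are ordinary PromQL equality
   matchers, so they can be generated from the label set produced by the
   mapping.  The following is a minimal Python sketch; the helper names
   are hypothetical, and it only escapes backslashes and double quotes
   in label values.

```python
def promql_quote(value: str) -> str:
    """Double-quote a PromQL string, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def promql_selector(metric: str, labels: dict[str, str]) -> str:
    """Build a PromQL instant vector selector from equality matchers."""
    matchers = ", ".join(f"{key}={promql_quote(val)}"
                         for key, val in labels.items())
    # Without labels, the bare metric name is already a valid selector.
    return f"{metric}{{{matchers}}}" if matchers else metric


print(promql_selector("interfaces_interface_statistics_in_unicast_pkts",
                      {"interfaces_interface_name": "eth0"}))
# interfaces_interface_statistics_in_unicast_pkts{interfaces_interface_name="eth0"}
```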

3.3.  Time-based Queries

   To retrieve data from the last 24 hours:

   *  *InfluxQL*:

         SELECT * FROM interfaces_interface_statistics_in_unicast_pkts
           WHERE time > now() - 24h

   PromQL has no WHERE clause on time.  Instead, a range vector selector
   selects the samples within a time window ending at the evaluation
   time, so time-based filters in PromQL center around range vectors.
   For data over the last 24 hours:

   *  *PromQL*:

         interfaces_interface_statistics_in_unicast_pkts[24h]

3.4.  Aggregations

   To get the average number of incoming unicast packets over the last
   hour:

   *  *InfluxQL*:

         SELECT MEAN(value) FROM
           interfaces_interface_statistics_in_unicast_pkts
           WHERE time > now() - 1h GROUP BY time(10m)

   *  *PromQL*:

         avg_over_time(
           interfaces_interface_statistics_in_unicast_pkts[1h])

3.5.  Combining Filters

   To retrieve the sum of incoming unicast packets for eth0 on router-01
   over the last day:

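   *  *InfluxQL*:

         SELECT SUM(value) FROM
           interfaces_interface_statistics_in_unicast_pkts
           WHERE interfaces_interface_name = 'eth0'
           AND host = 'router-01'
           AND time > now() - 24h

   *  *PromQL*:

         sum_over_time(interfaces_interface_statistics_in_unicast_pkts{
           interfaces_interface_name="eth0", host="router-01"}[24h])

   Since in-unicast-pkts is a cumulative counter, the number of packets
   received during the period is usually better obtained from the
   increase of the counter, e.g. with the PromQL increase() function,
   than from summing its raw samples.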
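   The combined InfluxQL query above can likewise be assembled from a
   label set.  This Python sketch quotes the metric and label names as
   InfluxQL identifiers and the label values as single-quoted string
   literals, assuming InfluxQL's backslash escapes in string literals;
   the helper names, the fixed SUM(value) projection, and the restricted
   set of duration units are illustrative assumptions.

```python
import re

# InfluxQL duration literal such as "24h"; only a subset of units.
DURATION_RE = re.compile(r"[0-9]+(s|m|h|d|w)")


def quote_ident(name: str) -> str:
    # Mapped names contain only [A-Za-z0-9_], so no escaping is needed.
    return f'"{name}"'


def quote_string(value: str) -> str:
    """Single-quote a string literal, escaping backslashes and quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def influxql_sum(metric: str, labels: dict[str, str], window: str) -> str:
    """Sum the value field of one metric for given labels and window."""
    if not DURATION_RE.fullmatch(window):
        raise ValueError(f"unsupported duration {window!r}")
    conditions = [f"{quote_ident(k)} = {quote_string(v)}"
                  for k, v in labels.items()]
    conditions.append(f"time > now() - {window}")
    return (f"SELECT SUM(value) FROM {quote_ident(metric)} "
            f"WHERE {' AND '.join(conditions)}")


print(influxql_sum("interfaces_interface_statistics_in_unicast_pkts",
                   {"interfaces_interface_name": "eth0",
                    "host": "router-01"}, "24h"))
```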

3.6.  Querying Enumeration Types

   In YANG models, enumerations are defined types with a set of named
   values.  The oper-status leaf in the IETF interfaces model is an
   example of such an enumeration, representing the operational status
   of an interface.

   For instance, the oper-status might have values such as up, down, or
   testing.

   To query interfaces that have an oper-status of up:

   *  *InfluxQL*:

         SELECT * FROM interfaces_interface_oper_status
           WHERE value = 'up'

   *  *PromQL*:

         interfaces_interface_oper_status{value="up"}

   Similarly, to filter interfaces with oper-status of down:

   *  *InfluxQL*:

         SELECT * FROM interfaces_interface_oper_status
           WHERE value = 'down'

   *  *PromQL*:

         interfaces_interface_oper_status{value="down"}

   This approach allows us to effectively query interfaces based on
   their operational status, leveraging the enumeration mapping within
   the TSDB.

4.  Requirements on time series databases

   This document specifies a mapping to a conceptual representation, not
   a particular concrete interface.  To effectively support the mapping
   of YANG-modeled data into a label-centric model, certain requirements
   must be met by the Time Series Databases (TSDBs).  These requirements
   ensure that the data is stored and retrieved in a consistent and
   efficient manner.

4.1.  Support for String Values

   Several YANG leaf types carry string values, including the string
   type itself and all types derived from it, as well as enumerations,
   which are stored using their string representation.

   The chosen TSDB must support the storage and querying of string
   values.  Not all TSDBs inherently offer this capability, and thus,
   it's imperative to ensure compatibility.

4.2.  Sufficient Path Length

   YANG data nodes, especially when representing deep hierarchical
   structures, can result in long paths.  When transformed into metric
   names or labels within the TSDB, these paths might exceed typical
   character limits imposed by some databases.  It's essential for the
   TSDB to accommodate these potentially long names to ensure data
   fidelity and avoid truncation or loss of information.

4.3.  High Cardinality

   Given the possibility of numerous unique label combinations
   (especially with dynamic values like interface names, device names,
   etc.), the chosen TSDB should handle high cardinality efficiently.
   High cardinality can impact database performance and query times, so
   the TSDB needs mechanisms to manage it.
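   To gauge cardinality, the number of distinct label sets (that is,
   time series) per metric can be counted over a batch of mapped
   samples.  The following Python sketch does so; the helper name and
   the small fleet of hosts and interfaces are made up purely for
   illustration.

```python
from collections import defaultdict
from typing import Iterable


def series_per_metric(
        samples: Iterable[tuple[str, dict[str, str]]]) -> dict[str, int]:
    """Count distinct label sets (time series) for each metric name."""
    seen: dict[str, set[frozenset]] = defaultdict(set)
    for metric, labels in samples:
        seen[metric].add(frozenset(labels.items()))
    return {metric: len(label_sets) for metric, label_sets in seen.items()}


# Hypothetical fleet: 3 hosts with 4 interfaces each, sampled twice.
samples = [
    ("interfaces_interface_statistics_in_unicast_pkts",
     {"host": f"router-{h:02d}", "interfaces_interface_name": f"eth{i}"})
    for h in range(1, 4)
    for i in range(4)
    for _ in range(2)  # repeated samples of a series add no cardinality
]
print(series_per_metric(samples))
# {'interfaces_interface_statistics_in_unicast_pkts': 12}
```

   The count is bounded by the product of the number of distinct values
   of each label, which is why per-device list keys such as interface
   names can dominate cardinality.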

5.  Normative References

   [RFC7950]  Bjorklund, M., Ed., "The YANG 1.1 Data Modeling Language",
              RFC 7950, DOI 10.17487/RFC7950, August 2016,
              <https://www.rfc-editor.org/info/rfc7950>.

Author's Address

   Kristian Larsson
   Deutsche Telekom
   Email: kristian@spritelink.net
