Internet Research Task Force M. Li
Internet-Draft C. Zhou
Intended status: Informational D. Chen
Expires: 21 April 2024 China Mobile
19 October 2023
Data Generation and Optimization for Digital Twin Network Performance
Modeling
draft-li-nmrg-dtn-data-generation-optimization-01
Abstract
A Digital Twin Network (DTN) can be used as a secure and cost-effective
environment for network operators to evaluate network performance in
various what-if scenarios. Recently, AI models, especially neural
networks, have been applied to DTN performance modeling. The
quality of deep learning models mainly depends on two aspects: model
architecture and data. This memo focuses on how to improve the model
from the data perspective.
Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 21 April 2024.
Copyright Notice
Copyright (c) 2023 IETF Trust and the persons identified as the
document authors. All rights reserved.
Li, et al. Expires 21 April 2024 [Page 1]
Internet-Draft Data Generation and Optimization for DTN October 2023
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License.
Table of Contents
1. Introduction
2. Acronyms & Abbreviations
3. Requirements
4. Framework of Data Generation and Optimization
4.1. Data Generation Stage
4.2. Data Optimization Stage
5. Data Generation
5.1. Network Topology
5.2. Routing Policy
5.3. Traffic Matrix
6. Discussion
7. Security Considerations
8. IANA Considerations
9. References
9.1. Informative References
9.2. Normative References
Authors' Addresses
1. Introduction
A digital twin is a virtual instance of a physical system (twin) that
is continually updated with the latter's performance, maintenance,
and health status data throughout the physical system's life cycle.
A Digital Twin Network (DTN) is a digital twin that is used in the
context of networking [I-D.irtf-nmrg-network-digital-twin-arch]. A DTN
can be used as a secure and cost-effective environment for network
operators to evaluate network performance in various what-if
scenarios. Recently, AI models, especially neural networks, have
been applied to DTN performance modeling.
The quality of AI models mainly depends on two aspects: model
architecture and data. This memo focuses on the data aspect, because
the quality of training data directly affects the accuracy and
generalization ability of the model. Specifically, it describes how to
design data generation and optimization methods for DTN performance
modeling. These methods generate simulated network data to address the
shortage of practical data and select high-quality data from various
data sources. Training on high-quality data can improve the accuracy
and generalization ability of the model.
2. Acronyms & Abbreviations
DTN: Digital Twin Network
AI: Artificial Intelligence
AIGC: AI-Generated Content
ToS: Type of Service
OOD: Out-of-Distribution
FIFO: First In First Out
SP: Strict Priority
WFQ: Weighted Fair Queuing
DRR: Deficit Round Robin
BFS: Breadth-First Search
CBR: Constant Bit Rate
3. Requirements
Performance modeling is vital in DTN and is involved in typical
network management scenarios such as planning, operation,
optimization, and upgrade. Recently, some studies have applied AI
models to DTN performance modeling, such as RouteNet [RouteNet] and
MimicNet [MimicNet]. AI is a data-driven technology whose
performance heavily depends on data quality.
Network data sources are diverse and of varying quality, which makes
them difficult to use directly as training data for DTN performance
models:
* Practical data from production networks: Data from production
networks usually have high value, but the quantity, type, and
accuracy are limited. Moreover, it is not practical in production
networks to collect data under various configurations;
* Network simulators: Network simulators (e.g., NS-3 and OMNeT++)
can be used to generate simulated network data, which can solve
the problems of quantity, diversity, and accuracy to a certain
extent. However, simulation is usually time-consuming. In
addition, there are usually differences between simulated data and
practical data from production networks, which hinders the
application of trained models to production networks;
* Generative AI models: With the development of AI-Generated Content
(AIGC) technology, generative AI models (e.g., GPT and LLaMA) can
be used to generate simulated network data, which can solve the
problems of quantity and diversity to a certain extent. However,
the accuracy of the data generated by generative AI models is
limited and often has gaps with practical data from production
networks.
Therefore, data generation and optimization methods for DTN
performance modeling are needed, which can generate simulated network
data to solve the problem of practical data shortage and select high-
quality data from multi-source data. High-quality data meets the
requirements of high accuracy, diversity, and fitting the actual
situation of practical data. Training with high-quality data can
improve the accuracy and generalization of DTN performance models.
4. Framework of Data Generation and Optimization
The framework of data generation and optimization for DTN performance
modeling is shown in Figure 1, which includes two stages: the data
generation stage and the data optimization stage.
Data generation Data optimization
+---------------------------+ +-------------------------------------+
| | | |
| +---------+ | | +---------+ |
| | | | | +----------+ | | |
| | Network | | | | Practical| | Easy | |
| | topology| +-----------+ | | | data | | samples | |
| | | | | | | +-----+----+ | | |
| | | | Network | | | | | | +--------+ |
| | | | simulator | | | +-----v----+ | | | | |
| | Routing | | | | | | | | Hard | | High | |
| | policy +-> +-+-+-> Candidate+-> samples +-> quality| |
| | | | | | | | data | | | | data | |
| | | | Generative| | | | | | | | | |
| | | | AI model | | | +----------+ | | +--------+ |
| | Traffic | | | | | | OOD | |
| | matrix | +-----------+ | | | samples | |
| | | Data generator| | | (remove)| |
| +---------+ | | | | |
| Network | | +---------+ |
| configuration | | Data selection |
| | | |
+---------------------------+ +-------------------------------------+
Figure 1: Framework of Data Generation and Optimization for DTN
Performance Modeling
4.1. Data Generation Stage
The data generation stage aims to generate candidate data (simulated
network data) to solve the problem of the shortage of practical data
from production networks. This stage first generates network
configurations and then imports them into data generators to generate
the candidate data.
* Network configurations: Network configurations typically include
network topology, routing policy, and traffic matrix. These
configurations need to be diverse to cover as many scenarios as
possible. Topology configurations include the number and
structure of nodes and edges, node buffers' size and scheduling
strategy, link capacity, etc. Routing policy determines the path
of a packet from the source to the destination. The traffic
matrix describes the traffic entering/leaving the network, which
includes the traffic's source, destination, time and packet size
distribution, Type of Service (ToS), etc.
* Data generators: Data generators can be network simulators (e.g.,
NS-3 and OMNeT++) and/or the generative AI models (e.g., GPT and
LLaMA). Network configurations are imported into data generators
to generate candidate data.
4.2. Data Optimization Stage
The data optimization stage aims to optimize the candidate data from
various sources to select high-quality data.
* Candidate data: Candidate data includes simulated network data
generated in the data generation stage and the practical data from
production networks.
* Data selection: The data selection module examines the
candidate data and classifies each sample as easy, hard, or Out-of-
Distribution (OOD). Hard samples are samples that
are difficult for the model to predict accurately. Exposing the model
to more hard samples during training enables it to perform better on
such samples later on. Easy samples and hard samples are considered
valid and are added to the training data. OOD samples are considered
invalid and are removed.
* High-quality data: High-quality data needs to meet the
requirements of high accuracy, diversity, and fitting the actual
situation of practical data, which can be verified by expert
knowledge (such as the ranges of delay, queue utilization, link
utilization, and average port occupancy).
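The data selection step described above can be sketched as follows.
This is a minimal illustration, not a method defined by this memo. It
assumes that a relative prediction error from the current model is
available for each sample, and it uses a simple range check as a
stand-in for OOD detection. The names Sample, classify, and select,
the threshold, and the value ranges are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical values; this memo does not specify thresholds or ranges.
HARD_ERROR_THRESHOLD = 0.10      # relative error above this => "hard"
VALID_RANGES = {                 # assumed expert-knowledge ranges
    "delay_ms": (0.0, 1000.0),
    "link_utilization": (0.0, 1.0),
}


@dataclass
class Sample:
    features: dict      # e.g. {"delay_ms": 12.3, "link_utilization": 0.4}
    rel_error: float    # relative prediction error of the current model


def classify(sample: Sample) -> str:
    """Return 'ood', 'hard', or 'easy' for one candidate sample."""
    for name, (low, high) in VALID_RANGES.items():
        value = sample.features.get(name)
        # A missing or out-of-range feature marks the sample as OOD.
        if value is None or not (low <= value <= high):
            return "ood"
    return "hard" if sample.rel_error > HARD_ERROR_THRESHOLD else "easy"


def select(candidates: list) -> list:
    """Keep easy and hard samples as training data; drop OOD samples."""
    return [s for s in candidates if classify(s) != "ood"]
```

In practice the range check would likely be replaced by a dedicated
OOD detector, and the error threshold would be tuned per model.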
5. Data Generation
This section will describe how to generate network configurations,
including network topology, routing policy, and traffic matrix. Then
these configurations will be imported into data generators to
generate the candidate data.
5.1. Network Topology
Network topologies are generated using the Power-Law Out-Degree
algorithm, where parameters are set according to real-world
topologies in the Internet Topology Zoo.
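As a rough illustration of power-law out-degree generation, the
sketch below draws a degree "credit" for each node from a discrete
power-law distribution and then connects random pairs of nodes that
still have credits. It is a simplified variant written for this
example: it does not guarantee a connected graph, and the function
name plod_topology, the exponent alpha, and the attempt limit are
assumptions rather than values taken from the Internet Topology Zoo.

```python
import random


def plod_topology(n_nodes, alpha=2.0, max_degree=None, seed=None):
    """Simplified power-law out-degree sketch.

    Returns a set of undirected edges (u, v) with u < v.
    """
    if n_nodes < 2:
        raise ValueError("need at least two nodes")
    rng = random.Random(seed)
    max_degree = max_degree or n_nodes - 1
    degrees = list(range(1, max_degree + 1))
    weights = [d ** -alpha for d in degrees]
    # Degree credits per node, drawn with P(d) proportional to d^-alpha.
    credits = rng.choices(degrees, weights=weights, k=n_nodes)

    edges = set()
    # Bound the number of attempts so the loop always terminates.
    for _ in range(100 * n_nodes):
        live = [v for v in range(n_nodes) if credits[v] > 0]
        if len(live) < 2:
            break
        u, v = rng.sample(live, 2)
        edge = (min(u, v), max(u, v))
        if edge in edges:
            continue  # skip duplicate links
        edges.add(edge)
        credits[u] -= 1
        credits[v] -= 1
    return edges
```

Passing a fixed seed makes the generated topology reproducible, which
helps when the same configuration has to be replayed in a data
generator.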
When the flow rate exceeds the link bandwidth or the bandwidth set
for the flow, packets are temporarily stored in the node buffer. A
larger node buffer size means a potentially larger queuing delay and
possibly a lower packet loss rate. The node scheduling policy
determines the time and order of packet transmission and is randomly
selected from policies such as First In First Out (FIFO), Strict
Priority (SP), Weighted Fair Queuing (WFQ), and Deficit Round Robin
(DRR).
A larger link capacity means a smaller delay and less congestion. To
cover a diverse range of link loads, we set the link capacity to be
proportional to the total average bandwidth of the flows passing
through the link.
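The two configuration rules above (random scheduling policy per node,
link capacity proportional to carried load) can be sketched as
follows. The function names, the directed-link representation, and
the proportionality factor scale are assumptions made for this
example.

```python
import random

SCHEDULING_POLICIES = ["FIFO", "SP", "WFQ", "DRR"]


def pick_scheduling(nodes, rng):
    """Randomly assign one scheduling policy to each node."""
    return {node: rng.choice(SCHEDULING_POLICIES) for node in nodes}


def link_capacities(flows, scale):
    """Set each link capacity to scale * total average bandwidth on it.

    flows: iterable of (path, avg_bw), where path is a list of node IDs
    and avg_bw is the average bandwidth of the flow.
    Returns {(u, v): capacity} for every directed link used by a flow.
    """
    load = {}
    for path, avg_bw in flows:
        # Consecutive node pairs along the path are the traversed links.
        for link in zip(path, path[1:]):
            load[link] = load.get(link, 0.0) + avg_bw
    return {link: scale * total for link, total in load.items()}


# Example: two flows share link ("b", "c").
example_flows = [(["a", "b", "c"], 10.0), (["b", "c"], 5.0)]
example_caps = link_capacities(example_flows, scale=2.0)
example_policies = pick_scheduling(["a", "b", "c"], random.Random(0))
```

Varying scale across generated configurations is one way to sweep
link utilization from lightly loaded to congested.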
5.2. Routing Policy
Routing policy plays a crucial role in routing protocols: it
determines the path of a packet from the source to the destination.
* Default: We set the weight of all links in the topology to be the
same, that is, equal to 1. Then we use the Dijkstra algorithm to
generate the shortest path configuration. The Dijkstra algorithm
finds single-source shortest paths in a digraph with non-negative
link weights. With unit weights, the result is the same minimum-hop
paths that Breadth-First Search (BFS) would find.
* Variants: We randomly select some links (the same link can be
chosen more than once) and add a small weight to them. Then we
use the Dijkstra algorithm to generate a series of variants of the
default shortest path configuration based on the weighted graph.
These variants can add some randomness to the routing
configuration to cover longer paths and larger delays.
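The default and variant routing configurations can be sketched as
below. This is a minimal example under stated assumptions: the graph
is given as a directed adjacency list, node IDs are mutually
comparable (for example, integers), and the function names and the
parameters n_picks and delta are hypothetical.

```python
import heapq
import random


def dijkstra_paths(adj, weights, src):
    """Shortest paths from src to every reachable node.

    adj: {node: [neighbor, ...]}, weights: {(u, v): non-negative weight}.
    Returns {dst: [src, ..., dst]}.
    """
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v in adj[u]:
            nd = d + weights[(u, v)]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    paths = {}
    for dst in dist:
        node, path = dst, [dst]
        while node != src:
            node = prev[node]
            path.append(node)
        paths[dst] = path[::-1]
    return paths


def variant_weights(adj, n_picks, delta, rng):
    """Unit weights plus delta on randomly chosen links.

    The same link can be picked more than once.
    """
    weights = {(u, v): 1.0 for u in adj for v in adj[u]}
    links = sorted(weights)
    for _ in range(n_picks):
        weights[rng.choice(links)] += delta
    return weights
```

Calling dijkstra_paths with all weights equal to 1 gives the default
configuration. Calling it with the output of variant_weights gives
one variant, and repeating this with different random seeds yields a
series of variants.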
5.3. Traffic Matrix
The traffic matrix is essential for network performance
modeling. It can be regarded as a network map that
describes the traffic entering/leaving the network, including the
source, destination, distribution of the traffic, etc.
We generate traffic matrix configurations with variable traffic
intensity to cover low to high loads.
Packet sizes, packet size probabilities, and ToS values are
generated to follow distributions similar to those observed in an
analysis of the validation dataset.
The arrival of packets for each source-destination pair is modeled
using a time distribution such as Poisson, Constant Bit
Rate (CBR), or ON-OFF.
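The three arrival processes named above can be sketched as packet
timestamp generators. This is an illustrative sketch only: the
function names are hypothetical, rate is in packets per second, and
the ON-OFF model assumes exponentially distributed ON and OFF periods
with CBR sending during ON periods, which is one common choice rather
than something this memo specifies.

```python
import random


def poisson_arrivals(rate, duration, rng):
    """Poisson process: exponential inter-arrival times, mean 1/rate."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t >= duration:
            return times
        times.append(t)


def cbr_arrivals(rate, duration):
    """Constant Bit Rate: one packet every 1/rate seconds from t = 0."""
    n = int(duration * rate)
    return [i / rate for i in range(n)]


def on_off_arrivals(rate, duration, mean_on, mean_off, rng):
    """ON-OFF: CBR packets during ON periods, silence during OFF periods.

    ON and OFF period lengths are exponential with the given means.
    """
    t, times = 0.0, []
    while t < duration:
        on_end = min(t + rng.expovariate(1.0 / mean_on), duration)
        while t < on_end:
            times.append(t)
            t += 1.0 / rate
        t = on_end + rng.expovariate(1.0 / mean_off)
    return times
```

Sweeping rate per source-destination pair is one way to vary traffic
intensity from low to high load.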
6. Discussion
Several topics related to data generation and optimization for DTN
performance modeling require further discussion.
* Data generation methods: 1) Generate configurations that cover
enough scenarios and scale from small to large networks. 2) Choose
data generators that consider accuracy, speed, fidelity, etc. 3)
Use data augmentation technology to expand the training data by
using a small amount of practical data to generate similar data
through prior knowledge.
* Data optimization methods: 1) Select data from multi-source
candidate data, including hard sample mining, OOD detection, etc.
2) Verify whether the data quality meets the requirements.
* Deployment: 1) Time/space complexity and explainability of the
data generation and optimization methods. 2) Provide feedback for
data collection to form a closed loop.
7. Security Considerations
TBD
8. IANA Considerations
This document has no requests to IANA.
9. References
9.1. Informative References
[I-D.irtf-nmrg-network-digital-twin-arch]
Zhou, C., Yang, H., Duan, X., Lopez, D., Pastor, A., Wu,
Q., Boucadair, M., and C. Jacquenet, "Digital Twin
Network: Concepts and Reference Architecture", Work in
Progress, Internet-Draft, draft-irtf-nmrg-network-digital-
twin-arch-03, 27 April 2023,
<https://datatracker.ietf.org/doc/html/draft-irtf-nmrg-
network-digital-twin-arch-03>.
[MimicNet] Zhang, Q., Ng, K., Kazer, C., Yan, S., Sedoc, J., and
V. Liu, "MimicNet: Fast Performance Estimates for Data
Center Networks with Machine Learning", ACM SIGCOMM 2021
Conference (SIGCOMM '21), August 2021.
[RouteNet] Rusek, K., Suárez-Varela, J., Almasan, P., Barlet-Ros,
P., and A. Cabellos-Aparicio, "RouteNet: Leveraging Graph
Neural Networks for network modeling and optimization in
SDN", IEEE Journal on Selected Areas in Communications
(JSAC), vol. 38, no. 10, October 2020.
9.2. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
Authors' Addresses
Mei Li
China Mobile
Beijing
100053
China
Email: limeiyjy@chinamobile.com
Cheng Zhou
China Mobile
Beijing
100053
China
Email: zhouchengyjy@chinamobile.com
Danyang Chen
China Mobile
Beijing
100053
China
Email: chendanyang@chinamobile.com