Internet Research Task Force Y. Cui
Internet-Draft Y. Wei
Intended status: Informational Z. Xu
Expires: 17 October 2023 Tsinghua University
P. Liu
Z. Du
China Mobile
15 April 2023
Graph Neural Network Based Modeling for Digital Twin Network
draft-wei-nmrg-gnn-based-dtn-modeling-00
Abstract
This draft introduces the scenarios and requirements for performance
modeling of digital twin networks, and explores the implementation
methods of network models, proposing a network modeling method based
on graph neural networks (GNNs). This method combines GNNs with
graph sampling techniques to improve the expressiveness and
granularity of the model. The model is generated through data
training and validated with typical scenarios. The model performs
well in predicting QoS metrics such as network latency, providing a
reference option for network performance modeling methods.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 17 October 2023.
Copyright Notice
Copyright (c) 2023 IETF Trust and the persons identified as the
document authors. All rights reserved.
Cui, et al. Expires 17 October 2023 [Page 1]
Internet-Draft Network Modeling for DTN April 2023
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Definition of Terms . . . . . . . . . . . . . . . . . . . . . 3
3. Scenarios, Requirements and Challenges of Network Modeling for
DTN . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
3.1. Scenarios . . . . . . . . . . . . . . . . . . . . . . . . 3
3.2. Requirements . . . . . . . . . . . . . . . . . . . . . . 3
3.3. Main Challenges . . . . . . . . . . . . . . . . . . . . . 4
4. Modeling Digital Twin Networks . . . . . . . . . . . . . . . 5
4.1. Consideration/Analysis on Network Modeling Methods . . . 5
4.2. Network Modeling Framework . . . . . . . . . . . . . . . 6
4.3. Building a Network Model . . . . . . . . . . . . . . . . 7
4.3.1. Networking System as a Relation Graph . . . . . . . . 7
4.3.2. Message-passing on the Heterogeneous Graph . . . . . 7
4.3.3. State Transition Learning . . . . . . . . . . . . . . 8
4.3.4. Model Training . . . . . . . . . . . . . . . . . . . 9
4.4. Model Performance in Data Center Networks and Wide Area
Networks . . . . . . . . . . . . . . . . . . . . . . . . 9
4.4.1. QoS Inference in Data Center Networks . . . . . . . . 9
4.4.2. Time-Series Prediction in Data Center Networks . . . 10
4.4.3. Steady-State QoS Inference in Wide Area Networks . . 10
5. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . 10
6. Security Considerations . . . . . . . . . . . . . . . . . . . 10
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11
8. Informative References . . . . . . . . . . . . . . . . . . . 11
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 11
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 11
1. Introduction
Digital twin networks are virtual images (or simulations) of physical
network infrastructures that can help network designers achieve
simplified, automated, elastic, and full-lifecycle operations. The
task of network modeling is to predict how network performance
metrics, such as throughput and latency, change in various "what-if"
scenarios[I-D.irtf-nmrg-network-digital-twin-arch], such as changes
in traffic conditions and reconfigurations of network devices. In
this paper, we propose a network performance modeling framework based
on graph neural networks, which supports modeling various network
configurations including topology, routing, and caching, and can make
time-series predictions of flow-level performance metrics.
2. Definition of Terms
This document makes use of the following terms:
DTN: Digital twin networks.
GNN: Graph neural network.
NGN: Networking Graph Networks.
3. Scenarios, Requirements and Challenges of Network Modeling for DTN
3.1. Scenarios
Digital twin networks are digital virtual mappings of physical
networks, and some of their main applications include network
technology experiments, network configuration validation, network
performance optimization, etc. All of these applications require
accurate network models in the twin network to enable precise
simulation and prediction of the functionality and performance
characteristics of the physical network.
This document mainly focuses on network performance modeling; modeling
for network configuration validation is out of scope.
3.2. Requirements
Physical networks are composed of various network elements and links
between them, and different network elements have different
functionalities and performance characteristics. In the early
planning stages of the network lifecycle, the physical network does
not fully exist, but the network owner hopes to predict the network's
capabilities and effects based on the network model and its
simulation, to determine whether the network can meet the future
application requirements running on it, such as network throughput
capacity and network latency requirements, and to build the network
at the optimal cost. During the network operation stage, network
performance modeling can work in conjunction with the online physical
network to achieve network changes and optimization, and reduce
network operation risks and costs. Therefore, network modeling
requires the ability to capture the various performance-related
factors in the physical network, striving for as much accuracy as
possible. This
also puts higher demands on network modeling, including the following
aspects.
(1) In order to produce accurate predictions, a network model must
have sufficient expressiveness to include as many influencing factors
related to network performance indicators as possible. Otherwise, it
will inevitably fail to generalize to broader network environments.
Among these factors, network configuration can span various different
levels of operation from end hosts to network devices. For example,
congestion control at the host level; scheduling strategies and
active queue management at the queue level; bandwidth and propagation
delay at the link level; shared buffer management strategies at the
device level; and topology and routing schemes at the network
level. In addition, there are complex interactions between these
factors.
(2) In different network scenarios, the granularity of concern for
operators may vary greatly. In wide area network scenarios,
operators primarily focus on the long-term average performance of
aggregated traffic, where path-level steady-state modeling is usually
sufficient to guide the planning process (e.g., traffic engineering).
In local area networks and cloud data center networks, operators are
more concerned with meeting performance metrics such as latency and
throughput, as well as network infrastructure utilization. However,
fine-grained network performance observation is a goal that network
operators and cloud providers continuously strive for, in order to
pinpoint when, and which, traffic is being interfered with. This
requires network models to support flow-level
time series performance prediction.
3.3. Main Challenges
(1) Challenges related to the large state space. Corresponding to
the expressiveness requirement above, the number of potential
scenarios that the network model faces is large.
This is because network systems typically consist of dozens to
hundreds of network nodes, each of which may contain multiple
configurations, leading to an explosion in the combination of
potential states. One simple solution to build a network model is to
construct a large neural network that takes flat feature vectors
containing all configuration information as input. However, the
input size of such a neural network is fixed, and it cannot be scaled
to handle information from an arbitrary number of nodes and
configurations. The final complexity of the neural network will
increase with the number of configurations, making it difficult to
train and generalize.
(2) Challenges related to modeling granularity. Unlike aggregated
end-to-end path-level traffic, the transmission behavior of flows
undergoes cascading effects since it is typically controlled by some
control loop (e.g., congestion control). Once the configurations
related to control (e.g., ECN threshold, queue buffer size) change
during flow transmission, the resulting flow traffic measurements
(e.g., throughput and packet loss) will experience significant
changes, and traffic measured before the change will not reflect the
results of these changes. Therefore, predicting flow-level
performance from traffic demand may be more difficult than inferring
QoS from traffic measurements. Here, we call taking traffic
measurements as input to estimate the corresponding QoS "inference",
and taking traffic demand as additional input to output flow-level
performance (e.g., flow completion time) for a hypothetical scenario
"prediction".
4. Modeling Digital Twin Networks
4.1. Consideration/Analysis on Network Modeling Methods
Traditional network modeling typically uses methods such as queuing
theory and network calculus, which mainly model from the perspective
of queues and their forwarding capabilities. In the construction of
operator networks, network elements come from different device
vendors with varying processing capabilities, and these differences
lack precise quantification. Therefore, modeling networks built with
these devices is a very complex task. In addition to queue
forwarding behavior, the network itself is also influenced by various
configuration policies and related network features (such as ECN,
Policy Routing, etc.), and coupled with the flexibility of network
size, this method is difficult to adapt to the modeling requirements
of digital twin networks.
In recent years, the academic community has proposed data-driven
graph neural network (GNN) methods, which extend existing neural
networks for systems represented in graph form. Networks themselves
are a kind of graph structure, and GNNs can be used to learn the
complex network behavior from the data. The advantage of GNN is its
ability to model non-linear relationships and adapt to different
types of data, improving the expressiveness and granularity of
network modeling. By combining GNN with graph sampling techniques,
the method improves the expressiveness and granularity of network
models. This method involves sampling subgraphs from the original
network based on specific criteria, such as the degree of
connectivity and centrality. Then, these subgraphs are used to train
a GNN model that captures the most relevant network features.
Experimental results show that this method can improve the accuracy
and granularity of network modeling compared to traditional
techniques.
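As an illustration of the sampling idea described above, the
following Python sketch selects a subgraph by degree centrality. The
adjacency-dict topology format and the top-k selection rule are
illustrative assumptions, not the method's exact criteria.

```python
# Sketch of criterion-based subgraph sampling (here: degree centrality).
# The adjacency-dict format and top-k rule are illustrative assumptions.

def degree_centrality(adj):
    """Fraction of other nodes each node connects to."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def sample_subgraph(adj, k):
    """Keep the k most central nodes and the edges among them; the
    induced subgraph can then be used to train the GNN model."""
    c = degree_centrality(adj)
    keep = set(sorted(adj, key=c.get, reverse=True)[:k])
    return {v: [u for u in adj[v] if u in keep] for v in keep}

# Toy 5-node topology with node 0 as a hub.
topo = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0]}
sub = sample_subgraph(topo, 3)  # the hub and its two best-connected peers
```

Other criteria (e.g., betweenness centrality) would slot into the
same structure by replacing the scoring function.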
This document will introduce a method of network modeling using graph
neural networks (GNNs) as a technical option for providing network
modeling for DTN.
4.2. Network Modeling Framework
+--------------------+
| +----------------+ | +----------------------+ +-----------------+
| | Intent |-->|Network Graph Abstract|-->|NGN Configuration|
| +----------------+ | +----------^-----------+ +-------+---------+
| | | |
| +----------------+ | | +--------V---------+
| |Domain Knowledge|--------------+ | State Transition |
| +----------------+ | |Model Construction|
| | +--------+---------+
| | |
| +----------------+ | +---------------+ +---------V---------+
| | Data |----->|Model Training |<----| Network Model Desc|
| +----------------+ | +-------+-------+ +-------------------+
| | |
| Target Network | +-------V-------+
+--------------------+ | Network Model |
+---------------+
Figure 1: Network modeling design process
Network modeling design process:
1. Before modeling, determine the network configurations and
modeling granularity based on the modeling intent.
2. Use domain knowledge from network experts to abstract the network
system into a network relationship graph to represent the complex
relationships between different network entities.
3. Build the network model using configurable graph neural network
modules and determine the form of the aggregation function based on
the properties of the relationships.
4. Use a recurrent graph neural network to model the changes in
network state between adjacent time steps.
5. Train the model parameters using the collected data.
4.3. Building a Network Model
This section describes the process and results of network modeling,
i.e., Steps 2 to 5 of the network modeling design process in
Section 4.2.
4.3.1. Networking System as a Relation Graph
A network system is represented as a heterogeneous relationship graph
(referred to as a "graph" hereafter), which provides a unified
interface for simulating various network configurations and their
complex relationships. Network entities related to performance are
mapped as
graph nodes with relevant characteristics. Heterogeneous nodes
represent different network entities based on their attributes or
configurations. Edges in the graph connect nodes that are considered
directly related. There are two types of nodes in the graph,
physical nodes representing specific network entities with local
configurations (e.g., switches with buffers of a certain size), and
virtual nodes representing performance-related entities (e.g., flows
or paths), thus allowing final performance metrics to be attached to
the graph. Edges reflect the relationships between entities and can
be used to embed domain knowledge-induced biases. Specifically,
edges can be used to model local or global configurations.
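The mapping from network entities to a heterogeneous graph can be
sketched as follows. The Python classes, node types, and attribute
names below are illustrative assumptions, not a normative data model.

```python
# A minimal sketch of the heterogeneous relation graph described above,
# using Python dataclasses. Node types, attribute names, and the edge
# semantics are illustrative assumptions, not a normative data model.
from dataclasses import dataclass, field

@dataclass
class Node:
    ntype: str      # "switch"/"queue" are physical; "flow"/"path" virtual
    attrs: dict = field(default_factory=dict)

@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)  # name -> Node
    edges: list = field(default_factory=list)  # (src, dst) relations

    def add_node(self, name, ntype, **attrs):
        self.nodes[name] = Node(ntype, attrs)

    def add_edge(self, src, dst):
        self.edges.append((src, dst))

g = Graph()
g.add_node("sw1", "switch", buffer_kb=512)    # physical, local config
g.add_node("q1", "queue", ecn_threshold=100)  # physical, local config
g.add_node("f1", "flow", size_bytes=10_000)   # virtual, performance-related
g.add_edge("f1", "q1")   # flow f1 traverses queue q1
g.add_edge("q1", "sw1")  # queue q1 resides on switch sw1
```

Final performance metrics would then be attached to the virtual nodes
(here, "f1").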
4.3.2. Message-passing on the Heterogeneous Graph
This document uses Networking Graph Networks (NGN) [battaglia2018] as
the fundamental building block for network modeling. An NGN module is
defined as a "graph-to-graph" module with heterogeneous nodes that
takes an attribute graph as input and, after a series of message-
passing steps, outputs another graph with different attributes.
Attributes capture the features of nodes and are represented as
tensors of fixed dimensions. Each NGN block contains multiple
configurable functions, such as aggregation, transformation, and
update functions, which can be implemented using standard neural
networks and shared among same-type nodes. The aggregation function
can take the form of a simple sum or an RNN, while the transformation
function can map the information of heterogeneous nodes to the same
hidden space of the target-type nodes, allowing unified operations in
the update function without limiting the modeling capability of GNNs.
One feed-forward NGN pass can be viewed as one step of message
passing on the graph. In each round of message passing, nodes
aggregate same-type messages using the corresponding aggregation
function and transform the aggregated messages using the type
transformation function to handle heterogeneous nodes. The
transformed messages are then fed into the update function to update
the node's state. After a specified number of rounds of message
passing, a readout function is used to predict the final performance
metric.
Typically, NGNs first perform a global update and then independent
local updates for nodes in each local domain. Circular dependencies
between different update operations can be resolved through multiple
rounds of message passing.
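One round of heterogeneous message passing, as described above, can
be sketched in Python. The sum aggregation, scalar per-type
transformations, and residual update below are simplified stand-ins
for the configurable neural functions.

```python
# One round of heterogeneous message passing, with plain Python lists as
# fixed-dimension attribute vectors. The sum aggregation, scalar per-type
# transforms, and residual update are simplified illustrative choices.

def vadd(a, b):
    return [x + y for x, y in zip(a, b)]

def message_passing_round(states, types, edges, transforms):
    """states: node -> hidden vector; types: node -> node type;
    edges: (src, dst) pairs; transforms: type -> scalar weight."""
    dim = len(next(iter(states.values())))
    agg = {v: {} for v in states}          # per-node, per-type sums
    for src, dst in edges:
        t = types[src]
        agg[dst][t] = vadd(agg[dst].get(t, [0.0] * dim), states[src])
    new_states = {}
    for v, per_type in agg.items():
        msg = [0.0] * dim
        for t, summed in per_type.items():
            # Type transformation: map sender-type messages into the
            # receiver's hidden space (here just a scalar weight).
            msg = vadd(msg, [transforms[t] * x for x in summed])
        # Update function: residual combination of state and message.
        new_states[v] = vadd(states[v], msg)
    return new_states

states = {"f1": [1.0, 0.0], "q1": [0.0, 1.0]}
types = {"f1": "flow", "q1": "queue"}
edges = [("f1", "q1"), ("q1", "f1")]
transforms = {"flow": 0.5, "queue": 2.0}
out = message_passing_round(states, types, edges, transforms)
```

In the real model each function is a learned neural network shared
among same-type nodes, and several such rounds are run per pass.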
4.3.3. State Transition Learning
The network model needs to support fine-grained prediction
granularity and transient prediction (such as the state of a flow) at
short time scales. To achieve this, this document uses the recurrent
form of the NGN module to learn to predict future states from the
current state. The model runs at a time step and has an "encoder-
processor-decoder" structure.
+-------------------+
| +--------------+ |
| | +----------+ | |
G_hidden(t-1)---^----->| +>| NGN_core |-+ |------+----->G_hidden(t)
| | +----------+ | |
+------+----+ |Message passing x M| +----V------+
G_in(t)->|NGN_encoder| +-------------------+ |NGN_decoder|->G_out(t)
+-----------+ Processor +-----------+
Figure 2: State transition learning
These three components are NGN modules with the same abstract graph
but different neural network parameters.
Encoder: converts the input state into a fixed-dimensional vector,
independently encoding different nodes, ignoring relationships
between nodes, and not performing message passing.
Processor: performs M rounds of message passing, with the input being
the output of the encoder and the previous output of the processor.
Decoder: independently decodes different nodes as the readout
function, extracting dynamic information from the hidden graph,
including the current performance metrics and the state used for the
next step's state update. Note that the next input graph G_in(t+1)
is updated according to G_out(t), which is not shown in Figure 2.
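The encoder-processor-decoder step of Figure 2 can be sketched as
follows. The NGN modules are stubbed out as toy scalar callables (an
assumption for illustration); in the actual model each is a
graph-to-graph NGN module.

```python
# Sketch of one recurrent "encoder-processor-decoder" time step. The NGN
# modules are toy scalar callables here; in the real model each is a
# graph-to-graph NGN module with its own neural network parameters.

def step(g_in_t, g_hidden_prev, encoder, processor, decoder, m_rounds=3):
    """One time step: encode the input graph without message passing,
    run M message-passing rounds in the processor together with the
    previous hidden graph, then decode the output graph per node."""
    encoded = encoder(g_in_t)          # per-node encoding, no messages
    hidden = g_hidden_prev
    for _ in range(m_rounds):          # "Message passing x M"
        hidden = processor(encoded, hidden)
    return decoder(hidden), hidden     # G_out(t), G_hidden(t)

# Toy stand-ins where a "graph" is a single number.
out, hidden = step(
    1.0, 0.0,
    encoder=lambda g: 2.0 * g,
    processor=lambda e, h: h + e,
    decoder=lambda h: h / 10.0,
)
```

Rolling this step forward over time, with G_in(t+1) derived from
G_out(t), yields the long-term prediction trajectory.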
To support state transition modeling, the model distinguishes between
the static and dynamic features of the network system and represents
them as different graphs. The static graph contains the static
configuration of the system, including physical node configurations
(such as queue priorities and switch buffer sizes) and virtual node
configurations (such as flow sizes). The dynamic graph contains the
temporary state of the system, mainly related to virtual nodes (such
as the remaining size of a flow or end-to-end delay of a path). In
addition, when considering dynamic configurations (such as time-
varying ECN thresholds), the actions taken (i.e., new configurations)
should be placed in the dynamic graph and input at each time step.
4.3.4. Model Training
The L2 loss between the predicted values and the corresponding true
values is used to supervise the output features of each node
generated by the decoder for model training. To generate long-term
prediction trajectories, the updated absolute state predictions are
iteratively fed back to the model as input. As a data preprocessing
and postprocessing step, we standardize the input and
output of the NGN model.
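A minimal sketch of this supervision, assuming toy scalar data and an
identity stand-in for the model; gradient computation is omitted.

```python
# Minimal sketch of the supervision described above: standardized NGN
# inputs/outputs and an L2 loss on decoder outputs. The identity "model"
# and scalar data are illustrative assumptions; gradients are omitted.

def standardize(xs, mean, std):
    return [(x - mean) / std for x in xs]

def destandardize(xs, mean, std):
    return [x * std + mean for x in xs]

def l2_loss(pred, true):
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred)

def model(xs):          # stand-in for the recurrent NGN forward pass
    return xs

mean, std = 10.0, 2.0
inputs = [8.0, 10.0, 12.0]     # standardized before entering the model
targets = [8.0, 10.0, 14.0]    # true values supervising decoder output
pred = destandardize(model(standardize(inputs, mean, std)), mean, std)
loss = l2_loss(pred, targets)
```

In training, this loss is computed on the output features of each
node generated by the decoder.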
4.4. Model Performance in Data Center Networks and Wide Area Networks
4.4.1. QoS Inference in Data Center Networks
This use case aims to verify whether the model can accurately perform
time-series inference and generalize to unseen configurations,
demonstrating the application of online performance monitoring. The
network model needs to infer the evolution of path-level latency in
the time series given real-time measurements of traffic on the given
path. The dataset used in this scenario is generated by ns-3
[NS-3]. Under specific experimental settings, the MAPE of path-level
latency can be controlled below 7% [wang2022].
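The MAPE metric quoted above can be computed as follows; the latency
values below are fabricated solely to illustrate the metric, not
measured results.

```python
# Computing MAPE for path-level latency predictions; the values are
# made-up illustrative numbers, not experimental data.

def mape(pred, true):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(p - t) / abs(t)
                       for p, t in zip(pred, true)) / len(true)

true_latency = [1.00, 2.00, 4.00]   # e.g., path latency in ms
pred_latency = [1.05, 1.90, 4.12]   # per-path model outputs
err = mape(pred_latency, true_latency)  # (5% + 5% + 3%) / 3
```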
4.4.2. Time-Series Prediction in Data Center Networks
This use case verifies whether the model can provide flow-level time-
series modeling capability under different configurations. Unlike
the previous case, the behavior of the network model in this case is
like a network simulator, which needs to predict the Flow Completion
Time (FCT) without traffic collection information, only using flow
descriptions and static topology information as input. The dataset
used in this scenario is generated by ns-3 [NS-3]. Under specific
experimental settings, the predicted FCT distribution matches the
true distribution well, with a Pearson correlation coefficient of 0.9
[wang2022]. In addition, the model can also predict throughput,
latency, and other path/flow-level metrics in time-series prediction.
This use case verifies the model's ability in time-series prediction,
and theoretical analysis combined with experimental verification
shows that the model does not have cumulative errors in long-term
time-series prediction.
4.4.3. Steady-State QoS Inference in Wide Area Networks
This use case aims to verify that the model can work in the Wide Area
Network (WAN) scenario and demonstrate that the model can effectively
model and generalize to global and local configurations, which
reflects the application of offline network planning. It is worth
noting that the WAN scenario has more topology changes compared to
the data center network scenario, which imposes higher demand on the
model's performance. A public network modeling dataset [NM-D] is used
in this scenario for evaluation. Under specific experimental
settings, the model is experimentally verified in three different WAN
topologies, including NSFnet, GEANT2, and RedIRIS, and achieves a
50th percentile APE of 10% for path-level latency, which is
comparable to the performance of the domain-specific model RouteNet
[rusek2019]. This use case verifies the model's generalization in
topology and configuration and its versatility in the scenario.
5. Conclusion
This draft presents a network performance modeling method based on
graph neural networks, addressing the problems and challenges in
network modeling in terms of expressiveness and modeling granularity.
The model's versatility and generalization are verified in typical
network scenarios, and good simulation performance prediction is
achieved.
6. Security Considerations
TBD.
7. IANA Considerations
TBD.
8. Informative References
[battaglia2018]
Battaglia, P. W., Hamrick, J. B., Bapst, V., Sanchez-
Gonzalez, A., Zambaldi, V., Malinowski, M., Tacchetti, A.,
Raposo, D., Santoro, A., Faulkner, R., and others,
"Relational inductive biases, deep learning, and graph
networks", 2018.
[I-D.irtf-nmrg-network-digital-twin-arch]
Zhou, C., Yang, H., Duan, X., Lopez, D., Pastor, A., Wu,
Q., Boucadair, M., and C. Jacquenet, "Digital Twin
Network: Concepts and Reference Architecture", Work in
Progress, Internet-Draft, draft-irtf-nmrg-network-digital-
twin-arch-02, 24 October 2022,
<https://datatracker.ietf.org/doc/html/draft-irtf-nmrg-
network-digital-twin-arch-02>.
[NM-D] "Network Modeling Datasets",
<https://github.com/BNN-UPC/NetworkModelingDatasets>.
[NS-3] "Network Simulator, NS-3", <https://www.nsnam.org/>.
[rusek2019]
Rusek, K., Suarez-Varela, J., Mestres, A., Barlet-Ros, P.,
and A. Cabellos-Aparicio, "Unveiling the potential of
Graph Neural Networks for network modeling and
optimization in SDN", 2019.
[wang2022] Wang, M., Hui, L., Cui, Y., Liang, R., and Z. Liu, "xNet:
Improving Expressiveness and Granularity for Network
Modeling with Graph Neural Networks", IEEE INFOCOM, 2022.
Acknowledgements
Authors' Addresses
Yong Cui
Tsinghua University
30 Shuangqing Rd, Haidian District
Beijing
China
Email: cuiyong@tsinghua.edu.cn
Yunze Wei
Tsinghua University
30 Shuangqing Rd, Haidian District
Beijing
100876
China
Email: yunzewei@outlook.com
Zhiyong Xu
Tsinghua University
30 Shuangqing Rd, Haidian District
Beijing
100876
China
Email: xuzhiyong@tsinghua.edu.cn
Peng Liu
China Mobile
No.32 XuanWuMen West Street
Beijing
100053
China
Email: liupengyjy@chinamobile.com
Zongpeng Du
China Mobile
No.32 XuanWuMen West Street
Beijing
100053
China
Email: duzongpeng@foxmail.com