Workgroup: Network Management Research Group (NMRG)
Internet-Draft: draft-liest-nmrg-ndt-challenges-00
Published: 8 July 2024
Intended Status: Informational
Expires: 9 January 2025
Authors: M. Liebsch (NEC), M. Stiemerling (h_da), N. Schark (h_da)

Challenge: Network Digital Twin - Practical Considerations and Thoughts

Abstract

This document focuses on practical challenges associated with the design, creation and use of Network Digital Twins. Based on the identified challenges, a set of suitable functional elements is described to overcome some of these challenges. Experiences from the design, development and evaluation of an SDN-based Network Digital Twin are described and conclude this document.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 9 January 2025.

Table of Contents

   1.  Terminology
   2.  Introduction
   3.  Practical Challenges in Operating a Network Digital Twin
   4.  NDT Instance -- Functional Components and Reference Points
     4.1.  NDT reference architecture -- an extended view
     4.2.  The role of some functional elements and reference points
   5.  SDNDT - Implementation of a Software-defined Network Digital Twin
     5.1.  Discussion
     5.2.  Implementation Overview
     5.3.  First Findings
   6.  IANA Considerations
   7.  Security Considerations
   8.  Acknowledgments
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Authors' Addresses

1. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2. Introduction

Digital Twins are well known from industrial applications, where, for example, a robotic system or production machine with sensors and actuators is digitally represented on a computer. The twin is used for monitoring and simulation purposes. Meanwhile, the creation and use of a digital twin of a data network is gaining traction and is being discussed in the research community, in particular with a view to automated network management and to mobile communication systems beyond the 5th Generation (5G).

The creation of a Network Digital Twin (NDT) implies many challenges, including the continuous collection and computation of relevant data points from network components of a possibly large network topology with many network functions and nodes. As a key use case, the user of an NDT may apply changes, such as configurations or load, first to the digital representation of the network and then evaluate and assess the impact of these changes on network operation in terms of performance, latency, or stability. Such simulation requires proper modelling of the network's information and behavior. The IRTF's Network Management Research Group (NMRG) is discussing various use cases and is working on a suitable reference architecture for NDTs [I-D.irtf-nmrg-network-digital-twin-arch].

This document focuses on practical challenges associated with the design, creation and use of NDTs. Furthermore, some technology directions and methodologies are discussed as possible solutions to overcome some of these challenges. Experiences from the design, development and evaluation of a prototype implementation that realizes an NDT of a Software-defined Network (SDN) complete this document.

3. Practical Challenges in Operating a Network Digital Twin

A user of a Network Digital Twin may be a person using a suitable frontend, or an automated process, such as a network Operations, Administration and Maintenance (OAM) and orchestration system. The user is assumed to have either a detailed or an abstracted view of the physical network topology. The NDT Instance comprises all functions that are needed to monitor the physical network, to generate and expose a digital representation of it to the user, to take probe requests from a user to simulate changes in the digital representation of the network, and to provide simulation results back to the user. To build a digital representation of the network, the NDT Instance monitors network functions in the physical network and collects relevant data points. The resulting digital representation of the network is denoted in the following as Twin Model.

The following assumptions apply:

o The NDT Instance is aware of the network nodes, functions and segments that are in scope of its duty to build and maintain a Twin Model.
o The NDT Instance is aware of relevant data points and how to collect them from the physical network. The NDT Instance may probe periodically for these data points or schedule periodic or event-based reporting of these data points, e.g., in a network controller.
o The NDT Instance and the user share the same descriptors and level of detail of the Twin Model's topology.
o The NDT Instance utilizes at least one method and model to apply a user's probe request for changes in the network for simulation. The model may be aligned with tools for discrete event simulation, mathematical models, or models from artificial intelligence and machine learning. A model may be pre-provisioned or automatically generated and improved throughout the NDT Instance's operation.
o Based on the provided and assessed simulation results, a user may decide to apply the previously probed or adjusted changes to the physical network. Different options exist for a concrete implementation of such enforcement. The user may have access to a network controller's northbound interface or any other API to apply changes to the physical network. One option is based on the principles of Intent-Based Networking (IBN), where the NDT Instance provides an API to its user to issue probe requests with further semantics that allow the NDT Instance to assess the simulation result and enforce changes in the physical network on its own, without providing simulation results back to the user (see the sketch after this list).
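
As an illustration only, the following Go sketch shows how such a probe request on the U-T reference point could be structured. All identifiers are hypothetical and not part of any standardized NDT API:

   // Hypothetical sketch of a probe request as an NDT user might
   // issue it on the U-T reference point; these types are
   // illustrative and not defined by any existing NDT API.
   package main

   import "fmt"

   // Change describes one simulated modification of a Twin Model
   // component.
   type Change struct {
       Component string // node or link identifier in the Twin Model
       Attribute string // e.g., "load" or "routing-table-entry"
       Value     string
   }

   // ProbeRequest asks the NDT Instance to simulate a set of changes.
   // If Intent is set, the instance may assess the simulation result
   // and enforce the changes itself (IBN-style) instead of returning
   // the results to the user.
   type ProbeRequest struct {
       Changes []Change
       Intent  string
   }

   func main() {
       req := ProbeRequest{
           Changes: []Change{{Component: "router-1",
               Attribute: "load", Value: "80"}},
           Intent: "apply if end-to-end latency stays below 10 ms",
       }
       fmt.Printf("%+v\n", req)
   }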

Figure 1 depicts the generalized principle of generating and using a Network Digital Twin. The NDT Instance maintains a model of the physical network by means of constantly monitored data points. These data points are monitored on the relevant network nodes and network functions (NF) and transmitted directly or indirectly (via a network controller) to the NDT Instance. The current state of the network is exposed to the NDT user. Change requests for the network are issued to the NDT Instance. Such requests may, for example, tune a parameter in a network node, such as load, performance, a routing table entry, or a load balancing strategy. The simulation results may be provided back to the user to take further action.


                                         +-------------+
   +--+                                  |             |
   |NF|- - - - - - - - - t1  - - - - - ->|             |  expose +------+
   +--+                                  |             | ------> |      |
            +--+                         |             |  probe  |      |
            |NF|- - - - -t2 - - - - - - >|     NDT     | <------ | NDT  |
            +--+                         |  Instance & |  result | User |
+--+                                     |    Model    | ------> |      |
|NF|- - - - - - - - - - -t3 - - - - - - >| Computation |         +------+
+--+                                     |             |
                           +--+          |             |
                           |NF|- - t4- ->|             |
                           +--+          +-------------+

Figure 1: Principle of generating and using a Network Digital Twin

The complete life-cycle of an NDT Instance may imply the following challenges:

o A large number and diversity of nodes and functions needs to be monitored, such as physical or virtual routers, switches, network functions, or even compute, storage and networking resources. Scalability needs to be ensured depending on the number and types of data points and the level of detail needed for model generation and maintenance.
o A large number of data points needs to be transmitted upstream towards the NDT Instance for data aggregation, storage and computation. The platforms that host components associated with the NDT Instance must provide sufficient compute, storage and networking resources to handle such data volume and its treatment.
o The change rate, e.g., events per second, of any data point also plays a crucial role. As all changes of a data point could potentially be transferred into the NDT Instance, this may lead to an overload situation of the instance, or it may even be impossible to keep up with all state changes, e.g., when considering TCP flows and their changes with respect to sequence numbers. This requires a proper dimensioning of the granularity of the data point information to be relayed to the NDT Instance (see the sketch after this list).
o Transmission latency (t1, t2, t3, ...) from a number of network functions to the NDT Instance may differ depending on the topological distance of a network function, its monitoring strategy (local data point aggregation) and network performance. In case data points are labeled with timestamps, synchronization of distributed network nodes/functions may be required. Depending on the modelling technique, differences in transmission latency and timestamps may impact the model accuracy.
o Users of an NDT Instance may have different requirements on the relevant network scope, accuracy, and attributes associated with the network nodes, functions and segments, as well as on the tolerable delay of the NDT providing simulation results back to the user.
o Impact and severity of the above items depend on many factors, such as network size, network performance, expected accuracy, and the deployment strategy of NDT enablers (centralized vs. distributed).
o It is likely that an NDT is used by multiple users in parallel. This is not an issue when these users concurrently read state from the NDT, but challenges arise when multiple users apply changes to the same NDT. This in fact means that parallel configurations of the NDT happen, and it has to be ensured that these configurations do not conflict with each other or, in case of a conflict, are merged in a meaningful way.
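
As a minimal sketch of the dimensioning mentioned above (all names are hypothetical), the following Go fragment decimates data point updates to a configured reporting granularity before relaying them towards the NDT Instance:

   // Hypothetical sketch: rate-limit data point updates before they
   // are relayed to the NDT Instance. Names and types are
   // illustrative only.
   package main

   import (
       "fmt"
       "time"
   )

   // DataPoint is an illustrative monitored value from a network
   // function.
   type DataPoint struct {
       Node      string
       Name      string
       Value     float64
       Timestamp time.Time
   }

   // Decimator forwards at most one update per data point within
   // each reporting interval; intermediate changes are dropped,
   // i.e., the granularity becomes coarser.
   type Decimator struct {
       interval time.Duration
       lastSent map[string]time.Time
   }

   func NewDecimator(interval time.Duration) *Decimator {
       return &Decimator{interval: interval,
           lastSent: make(map[string]time.Time)}
   }

   func (d *Decimator) Relay(dp DataPoint, send func(DataPoint)) {
       key := dp.Node + "/" + dp.Name
       if last, ok := d.lastSent[key]; ok &&
           dp.Timestamp.Sub(last) < d.interval {
           return // change rate exceeds the configured granularity
       }
       d.lastSent[key] = dp.Timestamp
       send(dp)
   }

   func main() {
       d := NewDecimator(10 * time.Second)
       send := func(dp DataPoint) {
           fmt.Println("relay to NDT Instance:", dp.Node, dp.Name,
               dp.Value)
       }
       now := time.Now()
       d.Relay(DataPoint{"r1", "ifInOctets", 1.0e6, now}, send)
       // Dropped: arrives within the 10 s granularity window.
       d.Relay(DataPoint{"r1", "ifInOctets", 2.0e6,
           now.Add(time.Second)}, send)
   }

The appropriate interval per data point is a deployment choice and trades model accuracy against load on the NDT Instance.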

4. NDT Instance -- Functional Components and Reference Points

The IRTF's Network Management Research Group (NMRG) drafted and published a Reference Architecture for NDTs [I-D.irtf-nmrg-network-digital-twin-arch]. This section complements the NMRG's NDT reference architecture without interfering with it and adds a few functional elements in support of the subsequent discussion section. While Section 4.1 depicts an NDT reference architecture with the focus on selected functions and reference points, Section 4.2 discusses the roles of some functional elements and reference points.

4.1. NDT reference architecture -- an extended view

The functional architecture of an NDT as described in [I-D.irtf-nmrg-network-digital-twin-arch] comprises northbound interfaces for users or applications to leverage the NDT's features, and southbound interfaces towards the physical network and its components to collect data points and enforce control decisions.

Figure 2 distills and depicts a few additional functional elements and reference points that may play a role in tackling the above challenges. The Network Twin Instance (T) comprises an interface (M-T) to a Management System that initializes the Network Twin Instance, monitors its operation and applies changes as needed, e.g., the scope of the physical network from which a twin should be created, or changes in the physical network topology that need to be taken into account for the twin creation. Monitoring as well as the enforcement of changes in the physical network may be performed through a network controller (NW Controller), which is decoupled from the Network Twin Instance and used by the Network Twin Instance as well as by the NDT User (U) through the NW Controller's northbound interface (T-Ctrl, U-Ctrl). In case of a distributed and collaborative deployment, Network Twin Instances can utilize a federation interface (T-T). The role of the illustrated functional elements and reference points in addressing the above challenges is discussed in the subsequent Section 4.2.


      +------------+             +--------------+
      | Management |             | NDT User (U) |
      +------------+             +--------------+
            ^                          ^ ^
            |M-T                    U-T| |
            |                          | +---------------+
            v                          v                 |
 +----------------------------------------------+        |
 |              Network Twin Instance (T)       |        |
 |                                              |        |
 |  +-------------+      +--------------------+ +-----+  |
 |  | Interpreter |      |  Model Federation  | |     |  |
 |  +-------------+      +--------------------+ |     |  |
 |                                              | T-T |  |
 |  +-------------+      +--------------------+ |     |  |
 |  |             |      | Model Verification | |     |  |
 |  | Twin Models |      +--------------------+ |<----+  |
 |  |             |                             |        |
 |  +-------------+   +------------+            |        |
 |                    |            |  +-------+-+        |
 |                    |            |  |       ^          |
 |                    +------------+  |       |T-Ctrl    |U-Ctrl
 |                                    |       v          v
 |  +-----------+     +------------+  |  +------------------+
 |  |Data Points|     |            |  |  |  NW Controller   |
 |  |  Storage  |     |            |  |  +------------------+
 |  +-----------+     +------------+  |    |      ^       |
 |                                    |    |      |       |
 +------------------------------------+    |      |       |
                                           |      |       |
                                   Ctrl_mon|  Data_Points |Ctrl_conf
                                           v      |       v
 +--------------------------------------------------------------+
 |                   Physical Network (NW)                      |
 +--------------------------------------------------------------+

Figure 2: Functional Elements and Reference Points of a Network Digital Twin

4.2. The role of some functional elements and reference points

The following list describes the role of the functional elements and reference points that are highlighted in Figure 2.

o Interpreter -- Receives probe requests with changes that apply to the Network Twin, together with additional descriptions to guide the Network Twin Instance in computing the result. The U-T reference point may be Intent-based, which requires the Interpreter to classify the request and determine the relevant technical attributes and network functions before applying the changes to the Twin Model. Any other API may apply to the U-T reference point and can give the NDT User full control over the attributes of a particular model component that should change.
o Twin Data and Behavioral Model (TDBM) -- A model of the data and behavior of the complete network that should be represented by the NDT, or of a sub-network in case of distributed and collaborating Network Twin Instances.
o Network Controller (NWC) -- Exposes northbound interfaces to the Network Twin Instance (T-Ctrl) and the NDT User (U-Ctrl). The northbound interface is used to configure physical network monitoring and data point collection, as well as to apply changes to the physical network. The NWC exposes southbound interfaces to the physical network for scheduling monitoring rules on the network components and functions (Ctrl_mon), to collect monitoring results (Data_Points), and to enforce changes in the physical network (Ctrl_conf).
o Data Points Storage -- Stores and structures monitored data points in alignment with the Twin Model. In case of a distributed and collaborative deployment, the Data Points Storage may hold only data points associated with the respective sub-network.
o Model Verification -- In case a machine learning model is being used, the model may be accurate and valid for some time, but may need to be re-trained in case of re-configurations in the physical network. A new model may be trained in the background while another model is in use. In case a model turns out to be inaccurate, a suitable indication may be exposed to all connected NDT users until a new, more accurate model is in service (see the sketch after this list).
o Model Federation -- A distributed deployment of multiple Network Twin Instances (T) can be advantageous in many aspects. A single instance can model a sub-network comprising only a single network component or network function, or may take a few neighboring components and their data points into account. Such a strategy can increase scalability and accuracy due to the lower volume of network topology information and data points, as well as the resulting smaller Twin Model. Accuracy may benefit from lower transmission latency of data points. Distributed Network Twin Instances can collaborate via the T-T reference points and expose complete models to neighboring or all Network Twin Instances. Model aggregation can be accomplished by a single or multiple dedicated Network Twin Instances.
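
As a minimal sketch of the Model Verification step (the error metric and the threshold are assumptions of this example, not prescribed by this document), a model can be flagged for re-training when its predictions drift too far from freshly monitored data points:

   // Hypothetical sketch: flag a Twin Model as inaccurate when the
   // mean absolute prediction error over recent data points exceeds
   // a threshold.
   package main

   import (
       "fmt"
       "math"
   )

   // accurate reports whether the model is still considered valid.
   // predicted and observed hold values for the same data points.
   func accurate(predicted, observed []float64,
       threshold float64) bool {
       if len(predicted) != len(observed) || len(predicted) == 0 {
           return false
       }
       var sum float64
       for i := range predicted {
           sum += math.Abs(predicted[i] - observed[i])
       }
       return sum/float64(len(predicted)) <= threshold
   }

   func main() {
       predicted := []float64{10.0, 12.5, 9.8}
       observed := []float64{10.4, 12.0, 9.9}
       if !accurate(predicted, observed, 1.0) {
           fmt.Println("model inaccurate: re-train, notify NDT users")
           return
       }
       fmt.Println("model within tolerance")
   }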

For large physical networks and their representation as an NDT, a distributed deployment may be advantageous in many aspects. For small sub-networks, In-Network Computing could be a suitable enabler to run machine learning models that represent the NDT of a single node or a few neighboring nodes. Training of such a sub-network model may be performed on the node with In-Network Computing capabilities or by an external node.

5. SDNDT - Implementation of a Software-defined Network Digital Twin

5.1. Discussion

A first architectural and practical challenge is I) how to extract information about the current state of the physical network into the Network Digital Twin (NDT) and II) how to feed changes from the NDT back to the physical network.

There are two extremes:

o A fully distributed approach where each single network element writes its state into a distributed database and changes are fed back from this database to the elements. NDT users also have to read and write state to the distributed database.
o A fully centralized approach is also possible, where a network controller is in charge of reading and writing from/to the network elements. This network controller would have its own database, and NDT users would solely interact with the network controller.

Further, we distinguish between the following types of NDTs (though there are more):

o Static Test System
o Live Reacting System

The static test system is the simpler type of the two. The overall goal is to be able to create an NDT instance on demand that represents the physical network as well as possible. It lacks the need for dynamic updates in either direction, as only a snapshot is taken on demand. By design, changes should only be applied to either the physical system or the Digital Twin if explicitly desired. In a realistic scenario, the NDT has to be updated to keep up with changes made to the physical network, but it is sufficient if the virtual environment is recreated each time, as it only represents a static state of the system at a given point in time. This approach may look overly simplified, but for a first proof-of-concept implementation it may be the best way, as the number of challenges and constraints is low.

The long-term goal is to have an NDT that is (optimally) in total sync with the physical environment, or at least represents the physical network as closely as possible. This calls for an extension of the static test system to a live reacting system, i.e., automatic synchronization of state changes in the physical network to the NDT and triggered changes from the NDT back to the physical network. However, the live reacting system will need live data, i.e., real-time updates to the NDT, which may be challenging for the NDT instance in terms of the amount of information and the required granularity of the data.

In order to achieve a first usable implementation, we limited it to a fully centralized approach for a static test system. This focuses the implementation effort on extracting configuration and runtime information from the physical network to the NDT and on the feedback from the NDT. It of course neglects the real-time aspect, but getting a first workable solution was and is the goal. The resulting snapshot workflow is deliberately simple, as outlined in the sketch below.
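
The following Go fragment outlines this snapshot workflow; the stub functions merely stand in for goSDN and venv-manager functionality and are not real APIs:

   // Hypothetical outline of the static test system workflow; the
   // stubs stand in for goSDN / venv-manager functionality.
   package main

   import "fmt"

   func readTopology() []string { return []string{"r1", "r2"} }

   func readConfig(node string) string { return "config-of-" + node }

   func deployEmulated(nodes []string) {
       fmt.Println("deploy emulated topology:", nodes)
   }

   func applyConfig(node, cfg string) {
       fmt.Println("configure twin node", node, "with", cfg)
   }

   // snapshot recreates the emulated environment from the current
   // state of the physical network; it is simply re-run whenever a
   // fresh twin is needed.
   func snapshot() {
       nodes := readTopology() // via the physical controller's NBI
       deployEmulated(nodes)   // e.g., a containerlab topology
       for _, n := range nodes {
           applyConfig(n, readConfig(n))
       }
   }

   func main() { snapshot() }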

5.2. Implementation Overview

For the implementation of the static test system, we used the goSDN SDN controller [gosdn] as network controller and, on top of it, a specialized application, i.e., the venv-manager [venv-manager]. The emulation of the network elements can be done on virtual machines, but we chose to rely on containerlab [containerlab].
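
For illustration, a minimal containerlab topology file for two emulated network elements could look as follows; the node names and the cEOS image tag are placeholders for whatever is available locally:

   name: ndt-static-test
   topology:
     nodes:
       r1:
         kind: ceos
         image: ceos:4.30.3M   # placeholder image tag
       r2:
         kind: ceos
         image: ceos:4.30.3M
     links:
       - endpoints: ["r1:eth1", "r2:eth1"]

Running "containerlab deploy" on such a file brings up the emulated network elements that the NDT-side controller then manages.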

Figure 3 shows the overall architecture of the static test system. The roles of the Interpreter, Model Federation, and Model Verification, as described in Section 4.2, are currently not implemented, but will follow in the future. The architecture is split into the physical network part and the NDT part. Structurally, both parts are identical:

o network controller -- is an SDN controller (SDN cntrl in the figure) and has the role of reading and writing configuration and state to and from the network elements. The controller is deployed in two instances: in the upper part for the physical network and in the lower part for the NDT instance.
o venv-manager -- is a networking app running on the North-Bound Interface (NBI) of goSDN and is in charge of retrieving the topology of the physical network and feeding it over the interface (2) (right-hand side) to the NDT instance's venv-manager.
o Controller-to-controller interface -- this interface, (1) in the figure, is used to synchronize the NDT instance with the configuration and state data of the physical network, and also from the NDT back to the physical part.
o network elements -- are either physical ones in the physical network domain or emulated ones in the NDT domain (lower part). In our implementation, we use either virtual machines or containerlab for emulating the physical network elements.

The M-T and U-T reference points are exposed by the goSDN controller for the NDT instance.

      +----------------------------------------------+
      |          goSDN controller - eco system       |
      |              for physical network            |
      |  +-------------+      +--------------------+ |
      |  |             |      |                    | |
    +-+->+ SDN cntrl   |<---->|    venv-manager    |<+--+
    | |  |             |      +--------------------+ |  |
    | |  +-------------+                             |  |
    | |      |  ^   |                                |  |
    | +------+--+---+--------------------------------+ (2)
    |        |  |   +----------+                        |
   (1)       |  +----------+   |                        |
    |        |             |   |                        |
    |Ctrl_mon| Data_Points |   |Ctrl_conf               |
    |        |             |   v                        |
    | +----------------------------------------------+  |
    | |           Physical Network (NW)              |  |
    | +----------------------------------------------+  |
    |                                                   |
    |                                                   |
    |      ^   ^                                        |
    |   M-T|   |U-T                                     |
    |      |   |                                        |
    | +----+---+-------------------------------------+  |
    | |    |   | goSDN controller - eco system       |  |
    | |    v   v     for NDT                         |  |
    | |  +-------------+      +--------------------+ |  |
    | |  |             |      |                    | |  |
    +-+->+ SDN cntrl   |<---->|    venv-manager    |<+--+
      |  |             |      +--------------------+ |
      |  +-------------+                             |
      |      |  ^   |                                |
      +------+--+---+--------------------------------+
             |  |   +----------+
             |  +----------+   |
             |             |   |
     Ctrl_mon| Data_Points |   |Ctrl_conf
             |             |   v
      +----------------------------------------------+
      |           Virtual/NDT Network                |
      +----------------------------------------------+

Figure 3: Architecture overview of the static test system

The goSDN controller uses gNMI with YANG models, namely a subset of OpenConfig [openconfig-ym], to represent the configuration and state data of the network elements and thus of the whole network. The gNMI interfaces, with the respective data models, are used to implement the Ctrl_mon and Ctrl_conf interfaces, as well as the collection of data points. The state of the devices is represented in the data models within the controller.
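
For illustration, the following Go fragment issues a gNMI Get for an OpenConfig state path against a network element. The target address and the insecure transport are placeholders, and this is not goSDN's actual client code:

   // Illustrative gNMI Get; the target address and the insecure
   // transport are placeholders, not goSDN's configuration.
   package main

   import (
       "context"
       "fmt"
       "log"
       "time"

       gpb "github.com/openconfig/gnmi/proto/gnmi"
       "google.golang.org/grpc"
       "google.golang.org/grpc/credentials/insecure"
   )

   func main() {
       ctx, cancel := context.WithTimeout(context.Background(),
           10*time.Second)
       defer cancel()

       conn, err := grpc.Dial("192.0.2.1:6030",
           grpc.WithTransportCredentials(insecure.NewCredentials()))
       if err != nil {
           log.Fatal(err)
       }
       defer conn.Close()

       client := gpb.NewGNMIClient(conn)

       // OpenConfig path:
       // /interfaces/interface[name=eth0]/state/oper-status
       req := &gpb.GetRequest{
           Path: []*gpb.Path{{
               Elem: []*gpb.PathElem{
                   {Name: "interfaces"},
                   {Name: "interface",
                       Key: map[string]string{"name": "eth0"}},
                   {Name: "state"},
                   {Name: "oper-status"},
               },
           }},
           Type:     gpb.GetRequest_STATE,
           Encoding: gpb.Encoding_JSON_IETF,
       }

       resp, err := client.Get(ctx, req)
       if err != nil {
           log.Fatal(err)
       }
       for _, n := range resp.GetNotification() {
           for _, u := range n.GetUpdate() {
               fmt.Println(string(u.GetVal().GetJsonIetfVal()))
           }
       }
   }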

For testing purposes, we used Arista's cEOS Docker images within containerlab as a commercial counterpart for network elements. Further, we have developed our own SDN agent, the gnmi-target [danet-gnmi-target], which runs on Linux (tested on Debian and Ubuntu) and, in the future, also on FreeBSD 13.2-RELEASE and newer.

5.3. First Findings

The current implementation of an NDT instance is usable as a static test system. The implementation works with one vendor-specific operating system and with our own gnmi-target as SDN agent on a network element. However, the crucial point will be if and how the required information can be extracted from the physical network. This is usually less a conceptual issue and more a practical question of what is accessible through more or less standardized interfaces.

The current aim was not to have a full-fledged, real-time live reacting system, but to take the first steps, learn, and then move on towards more features, such as live feeds.

Also, a self-learning behavioral model of the network elements was not developed, as we rely on virtualized versions of the network elements' operating systems, such as Arista's cEOS, or on our own SDN agent on plain Linux. This of course neglects any impact of the hardware of a real network element, e.g., port or forwarding-engine behavior.

6. IANA Considerations

This document does not have IANA considerations.

7. Security Considerations

Security considerations will be addressed in future revisions of this memo.

However, one can imagine that an NDT instance holding a full copy of the configuration and state information of a complete network is a huge trove for any attacker.

8. Acknowledgments

Neil Schark is partially funded by the German BMBF DemoQuanDT project. Martin Stiemerling is partially funded by the German BSI ADWISOR5G project.

9. References

9.1. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/info/rfc8174>.

9.2. Informative References

[I-D.irtf-nmrg-network-digital-twin-arch]
Zhou, C., Yang, H., Duan, X., Lopez, D., Pastor, A., Wu, Q., Boucadair, M., and C. Jacquenet, "Network Digital Twin: Concepts and Reference Architecture", Work in Progress, Internet-Draft, draft-irtf-nmrg-network-digital-twin-arch-05, <https://datatracker.ietf.org/doc/html/draft-irtf-nmrg-network-digital-twin-arch-05>.
[gosdn]
"goSDN controller GIT repository", <https://netellyfish.org>.
[venv-manager]
"venv mananger implementation", <https://code.fbi.h-da.de/danet/gosdn/-/tree/master/applications/venv-manager>.
[containerlab]
"venv mananger implementation", <https://containerlab.dev/>.
[openconfig-ym]
"OpenConfig Yang Models repository", <https://github.com/openconfig/public/>.
[danet-gnmi-target]
"da/net gnmi-target", <https://code.fbi.h-da.de/danet/gnmi-target>.

Authors' Addresses

Marco Liebsch
NEC Laboratories Europe GmbH
Kurfuersten-Anlage 36
D-69115 Heidelberg
Germany

Martin Stiemerling
Darmstadt University of Applied Sciences
Schoefferstrasse 3
64295 Darmstadt
Germany

Neil Schark
Darmstadt University of Applied Sciences
Schoefferstrasse 3
64295 Darmstadt
Germany