Internet-Draft | NDT Challenges | July 2024 |
Liebsch, et al. | Expires 9 January 2025 |
This document focuses on practical challenges associated with the design, creation and use of Network Digital Twins. Based on the identified challenges, a set of suitable functional elements is described to overcome some of them. Experiences from the design, development and evaluation of an SDN-based Network Digital Twin conclude this document.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 9 January 2025.¶
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
Digital Twins are well known from industrial applications, where, for example, a robotic system or production machine with sensors and actuators is digitally represented on a computer. The twin is used for monitoring and simulation purposes. More recently, the computation and use of a digital twin of a data network has been gaining traction and is discussed in the research community, in particular with a view to automated network management and to mobile communication systems beyond the 5th Generation (5G).¶
The creation of a Network Digital Twin (NDT) implies many challenges, including the continuous collection and computation of relevant data points from network components of a possibly large network topology with many network functions and nodes. As a key use case, the user of an NDT may apply changes, such as configurations or load, first to the digital representation of the network and then evaluate and assess the impact of these changes on network operation in terms of performance, latency, or stability. Such a simulation requires proper modelling of the network's information and behavior. The IRTF's Network Management Research Group (NMRG) is discussing various use cases and is working on a suitable reference architecture for NDTs [I-D.irtf-nmrg-network-digital-twin-arch].¶
This document focuses on practical challenges associated with the design, creation and use of NDTs. Furthermore, some technology directions and methodologies are discussed as possible solutions to overcome some of these challenges. Experiences from the design, development and evaluation of a prototype implementation that realizes an NDT of a Software-defined Network (SDN) complete this document.¶
A user of a Network Digital Twin may be a person who uses a suitable frontend, or an automated process, such as a network OAM and Orchestration system. The user is probably aware of a detailed or an abstracted view of the physical network topology. The NDT Instance comprises all functions that are needed to monitor the physical network and generate a digital representation of it, to expose this digital representation to the user, to take probe requests from a user to simulate changes in the digital representation, and to provide simulation results back to the user. To build the digital representation, the NDT Instance monitors network functions in the physical network and collects relevant data points. In the following, the resulting digital representation of the network is denoted as Twin Model.¶
The following assumptions apply:¶
Figure 1 depicts generalized principles of generating and using a Network Digital Twin. The NDT Instance maintains a model of the physical network by means of constantly monitored data points. These data points are monitored on the relevant network nodes and network functions (NF) and transmitted directly or indirectly (via a network controller) to the NDT Instance. The current state of the network is exposed to the NDT user. Requests for changes in the network are issued to the NDT Instance. Such requests may, for example, tune a parameter in a network node, such as load, performance, a routing table entry, or the load balancing strategy. The simulation results may be provided back to the user to take further action.¶
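The principle described above can be illustrated with a minimal sketch. It is not taken from any implementation; the names TwinModel, NDTInstance, ingest, expose, probe and max_load are hypothetical, and the "simulation" is reduced to a toy evaluation function. The sketch only shows the separation between the continuously updated Twin Model and a probe request that is evaluated on a copy of it.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class TwinModel:
    """Digital representation of the network (hypothetical layout)."""
    nodes: dict = field(default_factory=dict)  # node name -> {parameter: value}


class NDTInstance:
    """Hypothetical NDT Instance: ingests data points, answers probe requests."""

    def __init__(self):
        self.model = TwinModel()

    def ingest(self, node, data_points):
        # Data points arrive directly from a node or indirectly via a controller.
        self.model.nodes.setdefault(node, {}).update(data_points)

    def expose(self):
        # Return a copy so that a user cannot modify the Twin Model by accident.
        return copy.deepcopy(self.model)

    def probe(self, node, changes, evaluate):
        # Apply the requested change to a copy of the Twin Model only,
        # then let the supplied evaluation function assess the outcome.
        candidate = copy.deepcopy(self.model)
        candidate.nodes.setdefault(node, {}).update(changes)
        return evaluate(candidate)


def max_load(model):
    # Toy evaluation: the highest load value across all nodes.
    return max(n.get("load", 0.0) for n in model.nodes.values())


ndt = NDTInstance()
ndt.ingest("r1", {"load": 0.40})
ndt.ingest("r2", {"load": 0.55})

result = ndt.probe("r1", {"load": 0.90}, max_load)
print(result)                    # 0.9, evaluated on the copy
print(ndt.expose().nodes["r1"])  # {'load': 0.4}, the Twin Model is unchanged
```

Evaluating the probe on a copy keeps the exposed network state free of hypothetical changes until the user decides to apply them to the physical network.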
The functional architecture of an NDT as described in [I-D.irtf-nmrg-network-digital-twin-arch] comprises northbound interfaces for users or applications to leverage the NDT's features, and southbound interfaces towards the physical network and its components to collect data points and enforce control decisions.¶
The complete life-cycle of an NDT Instance may imply the following challenges:¶
The IRTF's Network Management Research Group (NMRG) drafted and published a Reference Architecture for NDTs. This section complements the NMRG's NDT reference architecture without interference and adds a few functional elements in support of the subsequent discussion section. While Section 4.1 depicts an NDT reference architecture with the focus on selected functions and reference points, Section 4.2 discusses the roles of some functional elements and reference points.¶
Figure 2 distills and depicts a few additional functional elements and reference points that may play a role in tackling the above challenges. The Network Twin Instance (T) comprises an interface (M-T) to a Management System, which initializes the Network Twin Instance, monitors its operation and applies changes as needed, e.g. to the scope of the physical network from which a twin should be created, or to reflect changes in the physical network topology that need to be taken into account for the twin creation. Monitoring as well as the enforcement of changes in the physical network may be performed through a network controller (NW Controller), which is decoupled from the Network Twin Instance and used by the Network Twin Instance as well as by the NDT User (U) through the NW Controller's northbound interface (T-Ctrl, U-Ctrl). In case of a distributed and collaborative deployment, Network Twin Instances can utilize a federation interface (T-T). The role of the illustrated functional elements and reference points in addressing the above challenges is discussed in the subsequent Section 4.2.¶
The following list describes the role of the functional elements and reference points that are highlighted in Figure 2.¶
For large physical networks and their representation as an NDT, a distributed deployment may be advantageous in many aspects. For small sub-networks, In-Network Computing could be a suitable enabler to run machine learning models that represent the NDT of a single node or a few neighboring nodes. Training of such a sub-network model may be performed on the node with In-Network Computing capabilities or by an external node.¶
A first architectural and practical challenge is I) how to transfer information about the current state of the physical network to the Network Digital Twin (NDT) and II) how to feed back changes from the NDT to the physical network.¶
There are two extremes:¶
Further we distinguish between these types of NDTs (though there are more):¶
The static test system is the simpler of the two types. The overall goal is to be able to create an NDT instance on demand that represents the physical network as closely as possible. It does not need dynamic updates in either direction, as only a snapshot is taken on demand. By design, changes should only be applied to either the physical system or the Digital Twin if explicitly desired. In a realistic scenario, the NDT has to be updated to keep up with changes made to the physical network, but it is sufficient if the virtual environment is recreated each time, as it should only represent a static state of the system at a given time. This approach may look overly simplified, but for a first proof-of-concept implementation it may be the best way, as the number of challenges and constraints is low.¶
The long-term goal is to have an NDT that is ideally in full sync with the physical environment, or that at least represents the physical network as closely as possible. This calls for an extension of the static test system to a live reacting system, i.e., automatic synchronization of state changes in the physical network to the NDT and of triggered changes from the NDT back to the physical network. However, the live reacting system needs live data, i.e., real-time updates to the NDT, which may be challenging in terms of the amount of information and the data granularity required by the NDT instance.¶
In order to achieve a first usable implementation, we limited it to a fully centralized approach for a static test system. This focuses the implementation efforts on extracting configuration and runtime information from the physical network to the NDT and on the feedback from the NDT. This of course neglects the real-time aspect, but getting a first workable solution was and is the goal.¶
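The difference between the two types can be sketched as follows. This is an illustration under simplifying assumptions, not part of the implementation described in this document: network state is reduced to a nested dictionary, and the function names take_snapshot and apply_update are hypothetical.

```python
import copy


def take_snapshot(physical_state):
    """Static test system: rebuild the twin from a full state dump."""
    return copy.deepcopy(physical_state)


def apply_update(twin_state, node, parameter, value):
    """Live reacting system: patch a single data point into an existing twin."""
    twin_state.setdefault(node, {})[parameter] = value


physical = {"r1": {"mtu": 1500}, "r2": {"mtu": 9000}}
static_twin = take_snapshot(physical)
live_twin = take_snapshot(physical)

# A change happens in the physical network ...
physical["r1"]["mtu"] = 9000
# ... and only the live reacting twin receives it as an incremental update.
apply_update(live_twin, "r1", "mtu", 9000)

static_in_sync = static_twin == physical  # False: the snapshot is stale
live_in_sync = live_twin == physical      # True: the update was applied

# The static twin catches up only when it is recreated on demand.
static_twin = take_snapshot(physical)
```

The static variant trades freshness for simplicity: no update channel is needed, at the cost of a full rebuild whenever a current view is required.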
For the implementation of the static test system, we used the goSDN SDN controller [gosdn] as network controller and, on top of it, a specialized application, i.e., the venv manager [venv-manager]. The emulation of the network elements can be done on virtual machines, but we chose to rely on containerlab [containerlab].¶
Figure 3 shows the principal architecture of the static test system. The roles of Interpreter, federation, and verification, as described in Section 4.2, are currently not implemented, but will follow in the future. The architecture is divided into the physical network part and the NDT part. Structurally, both parts are identical:¶
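To illustrate how a snapshot of a physical topology could be turned into an emulated one, the following sketch renders a containerlab topology definition from a list of nodes and links. It does not reflect how the venv manager works internally; the function render_topology, the lab name, the node names, the node kind "ceos" and the image tag are assumptions chosen for illustration.

```python
def render_topology(name, nodes, links, kind="ceos", image="ceos:latest"):
    """Render a containerlab-style topology file as YAML text.

    nodes: iterable of node names
    links: iterable of ((node_a, if_a), (node_b, if_b)) tuples
    kind and image are placeholders and must match the local setup.
    """
    lines = [f"name: {name}", "topology:", "  nodes:"]
    for node in nodes:
        lines += [
            f"    {node}:",
            f"      kind: {kind}",
            f"      image: {image}",
        ]
    lines.append("  links:")
    for (a, a_if), (b, b_if) in links:
        lines.append(f'    - endpoints: ["{a}:{a_if}", "{b}:{b_if}"]')
    return "\n".join(lines) + "\n"


# Hypothetical two-node snapshot of a physical network.
topology_text = render_topology(
    name="ndt-twin",
    nodes=["r1", "r2"],
    links=[(("r1", "eth1"), ("r2", "eth1"))],
)
print(topology_text)
```

The resulting text could be written to a topology file and handed to containerlab to create the virtual environment; recreating the file from a fresh snapshot matches the on-demand nature of the static test system.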
The M-T and U-T reference points are exposed by the goSDN controller for the NDT instance.¶
The goSDN controller uses gNMI with YANG models, namely a subset of OpenConfig [openconfig-ym], to represent the configuration and state data of the network elements and thus of the whole network. The gNMI interfaces, with the respective data models, are used to implement the Ctrl_mon and Ctrl_conf interfaces as well as the collection of data points. The state of devices is represented in the data models within the controller.¶
For testing purposes, we used Arista's CEOS docker images within containerlab as a commercial counterpart for network elements. Further, we have developed our own SDN agent, the gnmi-target [danet-gnmi-target], which runs on Linux (tested on Debian and Ubuntu) and is planned to also run on FreeBSD 13.2-RELEASE and newer.¶
The current implementation of an NDT instance is usable for a static test system. The implementation works with one vendor-specific operating system and with our own gnmi-target as SDN agent on a network element. However, the crucial point will be if and how the required information can be extracted from the physical network. This is usually less a conceptual issue than a practical question of what is accessible through more or less standardized interfaces.¶
The current aim was not to have a full-fledged, real-time live reacting system, but to take the first steps, learn, and then move on towards more features, such as live feeds.¶
Also, a self-learning behavioral model of the network elements was not developed, as we rely on virtualized versions of the network element's operating system, such as Arista's CEOS or our own SDN agent on plain Linux. This of course neglects any impact of the hardware of a real network element, e.g., port or forwarding engine behavior.¶
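Data points in such a setup are addressed by gNMI paths into the YANG data tree. As a small illustration, the sketch below parses the common string form of a gNMI path into a list of element names with their list keys. It is independent of goSDN; the function names split_path and parse_path are hypothetical, and escaped characters inside key values are deliberately not handled.

```python
import re

# One path element: a name followed by zero or more [key=value] selectors.
_ELEM = re.compile(r"^(?P<name>[^\[\]]+)(?P<keys>(\[[^\[\]=]+=[^\]]*\])*)$")
_KEY = re.compile(r"\[([^\[\]=]+)=([^\]]*)\]")


def split_path(path):
    """Split on '/' unless it appears inside a [key=value] selector."""
    elems, current, depth = [], "", 0
    for ch in path.strip("/"):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "/" and depth == 0:
            elems.append(current)
            current = ""
        else:
            current += ch
    if current:
        elems.append(current)
    return elems


def parse_path(path):
    """Return a list of (element name, {key: value}) tuples."""
    result = []
    for elem in split_path(path):
        match = _ELEM.match(elem)
        if match is None:
            raise ValueError(f"malformed path element: {elem!r}")
        result.append((match.group("name"), dict(_KEY.findall(match.group("keys")))))
    return result


# OpenConfig interface state leaf; the interface name is an example value.
parsed = parse_path("/interfaces/interface[name=Ethernet1/1]/state/oper-status")
print(parsed)
```

Splitting only outside of brackets matters in practice, because interface names such as Ethernet1/1 contain a slash themselves.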
This document does not have IANA considerations.¶
Security considerations are to be addressed in future revisions of this memo.¶
However, one can imagine that an NDT instance holding a full copy of the configuration and state information of a complete network is a valuable trove for any attacker.¶
Neil Schark is partially funded by the German BMBF DemoQuanDT project. Martin Stiemerling is partially funded by the German BSI ADWISOR5G project.¶