cats                                                             N. Tran
Internet-Draft                                                    Y. Kim
Intended status: Informational                       Soongsil University
Expires: 26 September 2025                                 25 March 2025


  Additional CATS requirements consideration for Service Segmentation-
                           related use cases
               draft-dcn-cats-req-service-segmentation-01

Abstract

   This document discusses possible additional CATS requirements when
   considering service segmentation in related CATS use cases such as
   AR-VR and Distributed AI Training.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 26 September 2025.

Copyright Notice

   Copyright (c) 2025 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.





Tran & Kim              Expires 26 September 2025               [Page 1]

Internet-Draft        cats-req-service-segmentation           March 2025


Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
   2.  Terminology used in this draft  . . . . . . . . . . . . . . .   3
   3.  Differences comparison between Normal and Service Segmentation
           CATS scenarios  . . . . . . . . . . . . . . . . . . . . .   3
   4.  Possible Additional CATS Requirements . . . . . . . . . . . .   4
   5.  Example 1: AR-VR Hologram Sequence Subtask Segmentation . . .   4
     5.1.  Differences caused by Service Segmentation  . . . . . . .   6
     5.2.  Additional CATS requirements Explanation  . . . . . . . .   6
   6.  Example 2: Federated Learning model training Parallel Subtask
           Segmentation  . . . . . . . . . . . . . . . . . . . . . .   7
     6.1.  Differences caused by Service Segmentation  . . . . . . .   9
     6.2.  Additional CATS requirements Explanation  . . . . . . . .  10
   7.  Normative References  . . . . . . . . . . . . . . . . . . . .  10
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  11

1.  Introduction

   Service segmentation is a service deployment option that splits a
   service into smaller subtasks, which can be executed in parallel or
   in sequence before their results are aggregated to serve the service
   request [draft-li-cats-task-segmentation-framework].  It is widely
   considered as a way to improve the performance of services such as
   AR-VR or Distributed AI Training, which are also key CATS use cases
   [draft-ietf-cats-usecases-requirements].  For example, according to
   [Ericssion-holographic-5g], an AR holographic communication service
   can be implemented as a pipeline of pre-processing,
   encoding/decoding, and rendering subtasks.  These subtasks can have
   multiple instances running over several edge computing sites.
   Meanwhile, a federated learning model training service can be
   implemented in a hierarchical manner according to
   [hierfedml-ieee-parallel-distributed-system].  In this case, the
   federated learning global model aggregator service combines the
   local model training results from multiple worker model aggregators
   and computing devices.  Different worker model aggregator and device
   combinations can affect the global model training performance.
   Hence, a desirable CATS system should consider these different
   subtask combinations in its design.

   This document discusses how applying CATS in this service
   segmentation scenario differs from the normal CATS scenario, where a
   service instance is not segmented.  Based on these differences,
   possible additional CATS requirements are proposed and analyzed via
   examples from the AR-VR and Distributed AI Training CATS use cases.





2.  Terminology used in this draft

   This document reuses the CATS component terminology defined in
   [draft-ietf-cats-framework].  Additional definitions related to
   service segmentation are:

   Service subtask: An offering that performs only part of the
   functionality of the original service.  The full functionality of
   the original service is realized by aggregating the results from all
   of its service subtasks.  Service subtasks can run in parallel or in
   sequence.

   Service subtask instance: When a service is segmented into multiple
   service subtasks, each service subtask might have multiple instances
   that perform the same partial functionality of the original service.

3.  Differences comparison between Normal and Service Segmentation CATS
    scenarios

   In the normal CATS scenario:

   *  The CATS system objective is selecting an optimal service instance
      to serve a service request

   *  Different candidate CATS paths are caused by: service instances'
      computing and network resources status.

   *  The CATS system delivers the service request to the determined
      optimal service instance

   In the Service Segmentation CATS scenario:

   *  The CATS system objective is selecting an optimal combination of
      all segmented service subtasks to serve a service request.  One
      optimal instance of each service subtask is involved.

   *  Different candidate CATS paths are caused by: service subtask
      instances' computing and network resource status, and possibly
      different ways of segmenting the service (e.g., a service is
      segmented into a different number of subtasks).

   *  The CATS system delivers the service request to the determined
      optimal combination of service subtask instances in the correct
      order (sequence/parallel) and subtask composition.

   *  There might be no original service instance in the network; only
      service subtask instances are available.
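
   As a rough illustration of how segmentation enlarges the set of
   candidate CATS paths, the following Python sketch enumerates one
   instance per subtask for each segmentation option.  The function
   name, the instance identifiers, and the two segmentation options are
   hypothetical and chosen only for this example; they are not defined
   by CATS.

```python
from itertools import product

def candidate_paths(segmentations):
    """Enumerate candidate CATS paths for a service.

    `segmentations` is a list of segmentation options; each option is an
    ordered list of (subtask_name, [instance_ids]) pairs.  The normal
    CATS scenario is the special case of one option with one "subtask".
    """
    paths = []
    for option in segmentations:
        names = [name for name, _ in option]
        instance_lists = [instances for _, instances in option]
        # One instance of each subtask is involved in a candidate path.
        for combo in product(*instance_lists):
            paths.append(tuple(zip(names, combo)))
    return paths

# Hypothetical identifiers, for illustration only.
normal = [[("service", ["inst-a", "inst-b", "inst-c"])]]
segmented = [
    # Option 1: the service is split into two subtasks.
    [("decode", ["d1", "d2", "d3"]), ("render", ["r1", "r2", "r3"])],
    # Option 2: the service is split into three subtasks.
    [("pre", ["p1", "p2"]),
     ("decode", ["d1", "d2", "d3"]),
     ("render", ["r1", "r2", "r3"])],
]

print(len(candidate_paths(normal)))     # 3 candidate paths
print(len(candidate_paths(segmented)))  # 9 + 18 = 27 candidate paths
```

   The count grows with the product of the instance counts per subtask
   and with the number of segmentation options, which is why candidate
   paths need to be distinguishable per combination.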




4.  Possible Additional CATS Requirements

   To handle the differences mentioned above, this document proposes the
   following additional CATS requirements:

   *  R1: A CATS system should provide a method to distinguish different
      CATS candidate paths corresponding to different service subtask
      instance combinations.

   *  R2: A CATS system should provide a method to deliver the service
      request to the determined optimal service subtask instance
      combination in the correct order and composition.

   *  R3: A CATS system should provide a method to map the service
      request to the corresponding segmented subtasks if the original
      service does not exist and only subtask instance endpoints are
      available.

5.  Example 1: AR-VR Hologram Sequence Subtask Segmentation

                      Request AR hologram
                          +--------+
                          | Client |
                          +---|----+
                              |
                      +-------|-------+
                      |    Service*** | ***R3: Map request
                      |    Request    |        to decode + render
                      |  Segmentation |        subtasks
                      |    Component  |
                      +-------|-------+
 **R2: Route request to       |            *R1: Different subtask combination
       the determined         |                 CATS cost (Decode + Render)
       subtask sequence       |                 - Decode Site 1/3/4 &
                        +-----|-----+------+    - Render Site 1/2/3
+-----------------------|   CATS**  |C-PS* |---------------------+
|       Underlay**      | Forwarder |------+          +-------+  |
|    Infrastructure     +-----|-----+                 |C-NMA* |  |
|                             |                       +-------+  |
|       +---------------+-----+---------+---------------+        |
|      3ms             4ms             3ms             2ms       |
|    nw delay        nw delay        nw delay        nw delay    |
|       |               |               |               |        |
|       |               |               |               |        |
|       |      2ms      |      2ms      |      3ms      |        |
|       |   nw delay    |    nw delay   |    nw delay   |        |
|       | /-----------\ | /-----------\ | /-----------\ |        |
+-+-----|/----+---+----\|/----+---+----\|/----+---+----\|-----+--+
  |   CATS**  |   |  CATS**   |   |   CATS**  |   |   CATS**  |
  | Forwarder |   | Forwarder |   | Forwarder |   | Forwarder |
  +-----|-----+   +-----|-----+   +-----|-----+   +-----|-----+
        |               |               |               |
  +-----|-----+   +-----|-----+   +-----|-----+   +-----|-----+
  |+---------+|   |+---------+|   |+---------+|   |+---------+|
  ||  Decode ||   || Render  ||   || Decode  ||   ||  Decode ||
  |+---------+|   |+---------+|   |+---------+|   |+---------+|   +---+---+
  | 3ms delay |   | 3ms delay |   | 5ms delay |   | 8ms delay |   |C-SMA* |
  |           |   |           |   |           |   |           |   +---+---+
  |+---------+|   |           |   |+---------+|   |           |       |
  || Render  ||   |           |   || Render  ||   |           |       |
  |+---------+|   |           |   |+---------+|   |           |       |
  | 9ms delay |   |           |   | 7ms delay |   |           |       |
  +-----|-----+   +-----|-----+   +-----|-----+   +-----|-----+       |
        +---------------+---------------+---------------+-------------+
     Service         Service         Service        Service
      Site 1          Site 2          Site 3         Site 4





    Figure 1: Additional CATS requirements in an AR use case example

   Figure 1 illustrates the additional CATS requirements in an AR
   hologram service use case referenced from
   [Ericssion-holographic-5g].  This example service is responsible for
   returning a processed 3D hologram upon receiving a request from an
   AR client (e.g., AR glasses).  The AR service is segmented into two
   subtasks, Decode and Render, which run in that order.  These
   subtasks have multiple instances running in different service sites.

5.1.  Differences caused by Service Segmentation

   *  The CATS system objective is selecting an optimal sequence of
      Decode and Render service subtask instances.  One optimal instance
      of each of the Decode and Render service subtasks should be
      selected from the candidate instances running in different
      service sites.

   *  Decode and Render service subtask instances running in different
      service sites have different expected request processing delays,
      caused by the current computing resource status at each service
      site.  The network delays from the client to each service site
      also differ.

   *  Once the optimal Decode and Render service subtask instances are
      determined, the CATS system should route the service request to
      the determined Decode instance first, followed by the determined
      Render instance.

   *  The original AR hologram service is not available in the network;
      only AR service subtask instances are available.  These service
      subtask instances are transparent to the user client.  The user
      client might request the AR hologram service via its service ID.

5.2.  Additional CATS requirements Explanation

   When CATS is applied to this example scenario, the additional CATS
   requirements can be explained as follows:

   R1: A CATS system should provide a method to distinguish different
   CATS candidate paths corresponding to different service subtask
   instance combinations.

   *  In this case, each candidate CATS path is represented by the
      combination of one Decode service instance and one Render service
      instance from the available instances at 4 different service
      sites.  There are multiple combination options, such as the
      Decode instance at Service Site 1 and the Render instance at
      Service Site 2, the Decode instance at Service Site 4 and the
      Render instance at Service Site 3, both Decode and Render
      instances at the same Service Site (1 or 3), etc.  For each
      subtask combination, the computing CATS metrics of the Decode and
      Render instances, along with the network CATS metrics of the
      corresponding Service Sites (between client and site and between
      sites), should be aggregated.  For example, in Figure 1, the
      combination of the Decode instance at Service Site 1 and the
      Render instance at Service Site 2 has a total expected CATS delay
      of 15ms: 3ms of computing delay at each instance, plus 9ms of
      network delay (3ms from the client to Service Site 1, 2ms between
      Service Sites 1 and 2, and 4ms between Service Site 2 and the
      client).
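
   The aggregation described for R1 can be sketched in Python using the
   delay values shown in Figure 1.  This is only an illustrative
   sketch, not a CATS metric definition.  It assumes that the response
   returns to the client directly from the Render site, and that
   non-adjacent sites are reached hop by hop along the chain drawn in
   the figure, so that their delays add up; the figure itself only
   gives delays between adjacent sites.  All names are hypothetical.

```python
from itertools import product

# Values read from Figure 1 (milliseconds).
CLIENT_TO_SITE = {1: 3, 2: 4, 3: 3, 4: 2}   # client-side forwarder to site
ADJACENT_SITES = {(1, 2): 2, (2, 3): 2, (3, 4): 3}
DECODE_DELAY = {1: 3, 3: 5, 4: 8}           # sites hosting a Decode instance
RENDER_DELAY = {1: 9, 2: 3, 3: 7}           # sites hosting a Render instance

def inter_site_delay(a, b):
    """Delay between two sites.

    Assumption: non-adjacent sites are reached hop by hop along the
    chain 1-2-3-4 drawn in Figure 1, so their delays add up.
    """
    lo, hi = sorted((a, b))
    return sum(ADJACENT_SITES[(s, s + 1)] for s in range(lo, hi))

def path_delay(decode_site, render_site):
    """Aggregate computing and network delays for one combination."""
    return (CLIENT_TO_SITE[decode_site] + DECODE_DELAY[decode_site]
            + inter_site_delay(decode_site, render_site)
            + RENDER_DELAY[render_site] + CLIENT_TO_SITE[render_site])

# One aggregated cost per (Decode site, Render site) candidate path.
costs = {(d, r): path_delay(d, r)
         for d, r in product(DECODE_DELAY, RENDER_DELAY)}
best = min(costs, key=costs.get)
print(costs[(1, 2)])  # 15, matching the example in the text
print(best)           # (1, 2)
```

   Under these assumptions, the nine candidate paths receive distinct
   aggregated costs, and the Decode at Service Site 1 plus Render at
   Service Site 2 combination has the lowest expected delay.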

   R2: A CATS system should provide a method to deliver the service
   request to the determined optimal service subtask instance
   combination in the correct order and composition.

   *  In this case, the CATS Forwarders and the underlay infrastructure
      should provide a mechanism to route the client's AR hologram
      service request along the optimal combination sequence determined
      by the CATS system.  For example, if the combination of the Decode
      instance at Service Site 1 and the Render instance at Service Site
      2 is selected, the request should be routed in the correct order
      via the CATS Forwarders at the client side, Service Site 1, and
      then Service Site 2, before the final response is returned to the
      client.  Segment Routing is an example method to achieve this
      requirement by routing the request via a list of routing segments
      ([draft-ietf-spring-sr-service-programming],
      [draft-lbdd-cats-dp-sr]).
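
   The ordering aspect of R2 can be sketched as deriving an ordered hop
   list from the selected combination.  The Python sketch below is a
   simplification: it does not model Segment Routing encodings or
   segment identifiers, and the function and forwarder names are
   hypothetical.

```python
def build_forwarder_sequence(client_forwarder, selection, subtask_order):
    """Return the ordered list of CATS Forwarders a request should visit.

    `selection` maps a subtask name to the forwarder of the chosen
    instance; `subtask_order` gives the execution sequence.  Consecutive
    subtasks on the same site collapse into one hop.  The final entry
    is the client-side forwarder, for the response.
    """
    hops = [client_forwarder]
    for subtask in subtask_order:
        forwarder = selection[subtask]
        if hops[-1] != forwarder:
            hops.append(forwarder)
    hops.append(client_forwarder)
    return hops

# Hypothetical forwarder names for the combination discussed above.
selection = {"decode": "fwd-site-1", "render": "fwd-site-2"}
print(build_forwarder_sequence("fwd-client", selection,
                               ["decode", "render"]))
# ['fwd-client', 'fwd-site-1', 'fwd-site-2', 'fwd-client']
```

   In a Segment Routing deployment, a list of this kind would roughly
   correspond to the segment list carried with the request.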

   R3: A CATS system should provide a method to map the service request
   to the corresponding segmented subtasks if the original service does
   not exist and only subtask instance endpoints are available.

   *  In this case, because there is no full AR hologram service, the
      service can only be realized by chaining its subtasks.  Hence, the
      CATS system should provide a component that can segment the
      service request into the corresponding subtasks and return the
      response from these subtasks to the client.  The Task Segmentation
      Module discussed in [draft-li-cats-task-segmentation-framework] is
      an example.
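
   A minimal sketch of the mapping step in R3 is shown below in Python.
   It assumes a static lookup table keyed by service ID; the table, the
   service ID string, and the function name are hypothetical and are
   not taken from the referenced Task Segmentation framework.

```python
# Hypothetical registry: service ID -> ordered subtask chain.
SEGMENTATION_REGISTRY = {
    "ar-hologram": ["decode", "render"],
}

def map_request(service_id, full_service_ids=frozenset()):
    """Map a requested service ID to the subtasks that must be chained.

    If a full instance of the service exists, the request is left
    unsegmented; otherwise the registered subtask chain is returned.
    """
    if service_id in full_service_ids:
        return [service_id]
    try:
        return list(SEGMENTATION_REGISTRY[service_id])
    except KeyError:
        raise LookupError(
            f"no instance or segmentation for {service_id!r}") from None

print(map_request("ar-hologram"))  # ['decode', 'render']
```

   The client only needs to know the original service ID; the subtask
   chain stays internal to the CATS system.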

6.  Example 2: Federated Learning model training Parallel Subtask
    Segmentation

                       Request FL model
                          +--------+
                          | Client |
                          +---|----+
                              |        **R1: Different subtask combination
**R2: Ask Global Aggregator   |        CATS cost (Global + Worker + Device)
to use the determined         |              - Worker 1/2/1+2/3+4/3+4+5...
combination             +-----|-----+------+ - Device 1/2/1+2+3/4+5+...
+-----------------------|    CATS   |C-PS**|---------------------+
|                       | Forwarder |------+          +-------+  |
|      Underlay         +-----|-----+                 |C-NMA**|  |
|   Infrastructure            |                       +-------+  |
|              +--------------+-----------------+                |
|             3ms                              4ms               |
|           nw delay                         nw delay            |
|              |                                |                |
+--------+-----|-----+--------------------+-----|-----+----------+
         |    CATS   |                    |    CATS   |
         | Forwarder |                    | Forwarder |
         +-----|-----+                    +-----|-----+
         +-----|-----+                    +-----|-----+
         |   Global  |     +-------+      |   Global  |
         | Aggregator|     |C-SMA**|      | Aggregator|
         | Instance 1|     +-------+      | Instance 2|
         +-|------|--+                    +-/----|----\
           |      |                        /     |     \
Different network delay between different Worker and Global Aggregators
          /        \                      /      |             \
+--------/-+  +-----\----+     +---------/+  +---|------+  +----\-----+
|  Worker  |  |  Worker  |     |  Worker  |  |  Worker  |  |  Worker  |
|Aggregator|  |Aggregator|     |Aggregator|  |Aggregator|  |Aggregator|
|Instance 1|  |Instance 2|     |Instance 3|  |Instance 4|  |Instance 5|
|          |  |          |     |          |  |          |  |          |
|now serve:|  |now serve:|     |now serve:|  |now serve:|  |now serve:|
|-3 models |  |-2 models |     |-3 models |  |-1 model  |  |-2 models |
|-5 devices|  |-7 devices|     |-4 devices|  |-6 devices|  |-8 devices|
+-----|----+  +----|-----+     +----|-----+  +----|-----+  +----|-----+
      |            |                |             |             |
Different network delay between different devices and Worker Aggregators
      |            |                |             |             |
+-----|------------|----------------|-------------|-------------|-----+
|                        Local Training Devices                       |
|              (Device 1, Device 2, ......., Device N)                |
|                 (Different computing capabilities)                  |
+---------------------------------------------------------------------+






   Figure 2: Additional CATS requirements in a Hierarchical Federated
                       Learning use case example

   Figure 2 illustrates the additional CATS requirements in a Federated
   Learning Model Training service use case referenced from
   [hierfedml-ieee-parallel-distributed-system].  This example service
   is responsible for returning a trained federated learning model upon
   receiving a request from a client.  The federated learning service is
   implemented in a hierarchical manner.  The service endpoint that
   receives client requests is a Global federated learning Aggregator,
   which can have multiple service instances.  Upon receiving a trained
   model request, one or multiple Worker Aggregators and Local Training
   Devices are assigned to the Global Aggregator to train the model in
   parallel.  The number of Local Training Devices assigned to each
   Worker Aggregator also varies.  Each Worker Aggregator aggregates
   the local model parameters from its assigned Local Training Devices,
   and the Global Aggregator aggregates the parameters from its
   associated Worker Aggregators to create the global model used to
   reply to the client request.

6.1.  Differences caused by Service Segmentation

   *  The CATS system objective is selecting an optimal combination of
      a Global Aggregator, Worker Aggregators, and Local Training
      Devices to train a model.  One optimal Global Aggregator instance,
      one or multiple Worker Aggregators assigned to the Global
      Aggregator, and one or multiple Local Training Devices for each
      Worker Aggregator must be determined.

   *  Different Worker Aggregators and Local Training Devices have
      different model training performance, caused by their associated
      computing and network resources and by the number of models and
      devices they are currently handling.  Different numbers of Local
      Training Devices per Worker Aggregator and of Worker Aggregators
      per Global Aggregator also lead to different model training
      performance.

   *  Once the optimal combination of Aggregators and Local Training
      Devices is determined, the determined Worker Aggregators and
      Local Training Devices are assigned to the determined Global
      Aggregator instance.  The CATS system routes the service request
      to the determined Global Aggregator instance.

   *  The Worker Aggregators and Local Training Devices are transparent
      to the user client.  The user client might send a model training
      request via the Global Aggregator service ID.





6.2.  Additional CATS requirements Explanation

   When CATS is applied to this example scenario, the additional CATS
   requirements can be explained as follows:

   R1: A CATS system should provide a method to distinguish different
   CATS candidate paths corresponding to different service subtask
   instance combinations.

   *  In this case, there are multiple combinations of Worker
      Aggregators and Local Training Devices that can be assigned to a
      single Global Aggregator instance.  Hence, selecting only a Global
      Aggregator service instance is not enough.  Different numbers of
      Worker Aggregators per Global Aggregator and of Local Training
      Devices per Worker Aggregator can lead to different Global
      Aggregator model training performance.  Besides, the computing
      resources (CPU/GPU/memory/etc.) differ between Devices and
      between Worker Aggregators.  For a Worker Aggregator, apart from
      the computing resources, the number of models and devices it is
      currently serving can also affect the model aggregation
      performance (e.g., through congestion).  Network conditions
      between Devices and Aggregators also vary.  Hence, CATS metrics
      should reflect the computing and network resource status of each
      Device and Aggregator.  Each CATS candidate path should be
      represented by a metric aggregation of a Global Aggregator
      instance, one or multiple Worker Aggregator instances, and their
      associated Local Training Devices.
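
   The following Python sketch enumerates Global Aggregator and Worker
   Aggregator combinations using the topology and load values drawn in
   Figure 2, and collects the inputs that a per-candidate CATS metric
   could aggregate.  It assumes that each Worker Aggregator is only
   reachable from the Global Aggregator instance it is drawn under, and
   it leaves out Local Training Device assignment and any scoring
   formula, since Figure 2 gives no values for them.  The identifiers
   (G1, W1, and so on) and function names are hypothetical shorthand.

```python
from itertools import combinations

# Topology and load values as drawn in Figure 2.
GLOBAL_NW_DELAY_MS = {"G1": 3, "G2": 4}
REACHABLE_WORKERS = {"G1": ["W1", "W2"], "G2": ["W3", "W4", "W5"]}
WORKER_LOAD = {  # (models, devices) each Worker Aggregator now serves
    "W1": (3, 5), "W2": (2, 7), "W3": (3, 4), "W4": (1, 6), "W5": (2, 8),
}

def candidate_combinations():
    """Yield (global aggregator, worker subset) candidate paths."""
    for g, workers in REACHABLE_WORKERS.items():
        for size in range(1, len(workers) + 1):
            for subset in combinations(workers, size):
                yield g, subset

def describe(g, subset):
    """Collect the per-candidate inputs a CATS metric could aggregate."""
    return {
        "global": g,
        "client_nw_delay_ms": GLOBAL_NW_DELAY_MS[g],
        "workers": subset,
        "models_in_service": sum(WORKER_LOAD[w][0] for w in subset),
        "devices_in_service": sum(WORKER_LOAD[w][1] for w in subset),
    }

candidates = [describe(g, s) for g, s in candidate_combinations()]
print(len(candidates))  # 3 + 7 = 10 Global/Worker combinations
```

   Even without device-level choices, a single service request already
   maps to ten distinguishable candidates in this example, instead of
   the two Global Aggregator instances visible to the client.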

   R2: A CATS system should provide a method to deliver the service
   request to the determined optimal service subtask instance
   combination in the correct order and composition.

   *  In this case, the CATS Path Selector should inform the Global
      Aggregator instance determined by CATS, or the hierarchical
      federated learning orchestration entity, to use the chosen
      combination of Global Aggregator instance, Worker Aggregator
      instances, and Local Training Devices to train the federated
      learning model.

   R3: A CATS system should provide a method to map the service request
   to the corresponding segmented subtasks if the original service does
   not exist and only subtask instance endpoints are available.

   *  In this case, this requirement is not necessary because the full
      original service (Global Aggregator) exists and serves the
      request.  The CATS system only handles routing between the client
      and the Global Aggregator instances.

7.  Normative References

   [draft-ietf-cats-framework]
              Li, C., et al., "A Framework for Computing-Aware Traffic
              Steering (CATS)",  draft-ietf-cats-framework, February
              2025.

   [draft-ietf-cats-usecases-requirements]
              Yao, K., et al., "Computing-Aware Traffic Steering (CATS)
              Problem Statement, Use Cases, and Requirements",  draft-
              ietf-cats-usecases-requirements, February 2025.

   [draft-ietf-spring-sr-service-programming]
              Clad, F., Ed., et al., "Service Programming with Segment
              Routing",  draft-ietf-spring-sr-service-programming,
              February 2025.

   [draft-lbdd-cats-dp-sr]
              Li, C., et al., "Computing-Aware Traffic Steering (CATS)
              Using Segment Routing",  draft-lbdd-cats-dp-sr, January
              2025.

   [draft-li-cats-task-segmentation-framework]
              Li, C., et al., "A Task Segmentation Framework for
              Computing-Aware Traffic Steering",  draft-li-cats-task-
              segmentation-framework, December 2024.

   [Ericssion-holographic-5g]
              "HOLOGRAPHIC COMMUNICATION IN 5G NETWORKS", May 2022,
              <https://www.ericsson.com/49a8b1/assets/local/reports-
              papers/ericsson-technology-review/docs/2022/holographic-
              communication-in-5g-networks.pdf>.

   [hierfedml-ieee-parallel-distributed-system]
              Xu, Z., Zhao, D., Liang, W., Rana, O., Zhou, P., and M.
              Li, "HierFedML: Aggregator Placement and UE Assignment for
              Hierarchical Federated Learning in Mobile Edge Computing",
              January 2023, <https://doi.org/10.1109/TPDS.2022.3218807>.

Authors' Addresses

   Minh-Ngoc Tran
   Soongsil University
   369, Sangdo-ro, Dongjak-gu
   Seoul
   06978
   Republic of Korea
   Email: mipearlska1307@dcn.ssu.ac.kr





   Younghan Kim
   Soongsil University
   369, Sangdo-ro, Dongjak-gu
   Seoul
   06978
   Republic of Korea
   Phone: +82 10 2691 0904
   Email: younghak@ssu.ac.kr










































