Internet-Draft | TTE | December 2024 |
Barth, et al. | Expires 6 June 2025 | [Page] |
Conventional traffic engineering approaches for resource management used by RSVP-TE and SR-TE often leverage estimates of the ingress traffic demands, during path placement. Unforeseen and/or dynamic events, can skew these estimates by significant enough margins to result in unexpected network congestion. Recomputed paths that address the new demands may take a considerable amount of time, leaving the network in a sub-optimal state for far too long.¶
This document proposes one mechanism that can avert congestion on a real-time basis.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 6 June 2025.¶
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Conventional traffic engineering approaches for resource management used by RSVP-TE [RFC3209] and SR-TE [RFC8402] often leverage estimates of the ingress traffic demands, during path placement. Unforeseen and/or dynamic events, can skew these estimates by significant enough margins to result in unexpected network congestion. Recomputed paths that address the new demands may take a considerable amount of time, leaving the network in a sub-optimal state for far too long.¶
Ideally, the network should be able to react to unforeseen congestion events and attempt to distribute load automatically, avoiding as much congestion as possible. Such mechanisms should be usable to compliment the conventional traffic engineering approaches.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here. These words may also appear in this document in lower case as plain English words, absent their normative meanings.¶
Various economic factors of operating real-world networks at scale require network operators to run their networks at relatively high utilizations. While legacy shortest-path routing is helpful in basic path placement, it does not consider the actual traffic demands on the network, resulting in highly utilized paths that can quickly become congested.¶
To address this, network operators have adopted various traffic engineering (TE) techniques whereby an input to the path placement for traffic engineering tunnels utilizes a bandwidth prediction and/or a demand matrix, with bandwidth requirements for the major sources and destinations in the network. These predictions are typically estimates based on historical telemetry and capture network and/or TE tunnel usage.¶
In a more sophisticated view of network demand, a bandwidth requirement can be more accurately viewed as a function over time, with the demand waxing and waning, frequently in a diurnal pattern driven by human interaction. Periodic demands may be driven by complex processes, sometimes scheduled, sometimes in reaction to external events. The salient point is that the elements in the demand matrix are best regarded as time series estimates. As a result, a traffic engineering solution is a set of paths that may themselves vary over time, requiring a periodic optimization not only for a static demand but at several points along the time series.¶
This problem is further compounded by the dynamic changes to demand that were not anticipated in the estimates. Traffic demands may spike or ebb without warning. A singer may announce a concert, causing an unexpected demand as the ticket vendor receives a wave of traffic. Events on social media can cause an unexpected storm of traffic to the media's servers. New product announcements can cause streaming sites to become overloaded. The results of which MAY lead to reoptimization frequencies higher than is practial for a path computing entity.¶
Network resources are also not always consistently predictable. BGP next-hops change, links fail, hardware fails, and software fails. While traffic engineering tools can do contingency analysis and creating protection paths that should mitigate potential congestion, this is typically only done for single failures. In the rare event of multiple failures, traffic engineering would be forced to completely recalculate a solution, which might not be available for unacceptable time periods.¶
All of this leaves network operators in the very uncomfortable situation that in unforeseen circumstances, they may find themselves with a congested network, unable to meet Service Level Objectives (SLOs) and potentially in violation of Service Level Agreements (SLAs). Even more confounding is that this situation could happen even if there is sufficient capacity in the network to support the actual demand, but it cannot be implemented until a global optimization computation completes.¶
To address this issue, we propose an alternate strategy: real-time tactical traffic engineering (TTE). This would be a set of mechanisms that would allow the network to react in real-time to avert congestion and optimize traffic flow. This would work in conjunction with traditional traffic engineering bandwidth management techniques, alleviating problems while optimal traffic assignment is being recomputed.¶
Real-time traffic engineering is a practical approach to a thorny problem: full blown optimality is very hard and can take a long time. A network that can organically, dynamically, and quickly adapt may provide a significant performance improvement while optimal traffic engineering path placement is in-progress.¶
TTE allows the network to dynamically distribute load if congestion is anticipated. The goal is to simply shift load away from congested links and then, if the link congestion abates, shift traffic back.¶
If traffic is on an alternate path (i.e., a TTE path), then it has the potential to create congestion elsewhere in the network. Bringing the traffic back to its original or recently optimized path resulting in a network more aligned with its original, near-optimal traffic pattern.¶
For TTE to operate effectively, it needs to understand when a link is nearing congestion and conversely, when congestion has abated. Each link that is protected by TTE is sampled periodically for its current utilization. The boundaries of acceptable utilization are defined by high and low utilization thresholds. To avoid oscillation, the link must be outside of acceptable utilization for some consecutive number of periodic samples before any action is performed. Further, the high and low thresholds need to be sufficiently far apart such that small load traffic changes will not cause a shift from high to low or low to high.¶
TTE manipulates traffic flows by changing the IPv4 or IPv6 prefixes found in the Forwarding Information Base (FIB), or by changing label entries found the Label Forwarding Information Base (LFIB). For the purposes of this document, both IP prefixes and labels constitute 'flows'. A flow may have one or more paths to its destination.¶
A number of mechanisms exist that potentially create backup paths for a single flow. Typically, these backup paths are used in case of a failure of the original path and allow for a very rapid transfer of traffic from the failed link to the backup path. For this rapid transfer to work, the backup paths are placed in the forwarding table and then marked as being in a backup and inactive state.¶
The key properties of a backup path are that it cannot cause forwarding loops and that it avoids the same link that the primary path is using. TTE makes use of backup paths by turning them into active paths in parallel with the primary path. This creates an Equal Cost Multi-Path (ECMP) situation where some traffic will be forwarded down each of the active paths. In most implementations, ECMP is implemented by hashing of the traffic, so each path will have roughly an equal share of the traffic, however, statistical anomalies are not impossible.¶
Potential backup paths can be computed by several mechanisms:¶
TI-LFA, RLFA: Paths computed by Topology Independent Loop-free Alternates (TI-LFA) [I-D.ietf-rtgwg-segment-routing-ti-lfa] and Remote Loop-Free Alternate (RLFA) [RFC7490] can be used as backup paths.¶
Fast Reroute: Paths computed by RSVP Fast Reroute [RFC4090] can be used as backup paths.¶
Unequal Cost Multi-Path: Interior Gateway Protocols (IGPs) that compute Unequal Cost Multi-Path (UCMP) paths can be used as backup paths.¶
PCE: Paths computed by a Path Computation Element (PCE) [RFC5440] can be used as backup paths.¶
When TTE selects a flow and makes appropriate data plane changes so that traffic is balanced between the primary path(s) and the backup path(s), we say that the flow is 'active'. Similarly, when TTE shifts traffic away from its backup path(s) back to the primary path(s), we say that the flow is 'inactive' or 'deactivated'.¶
In a carefully traffic engineered network, any change to the traffic flow may have an impact in multiple places on the network. When a flow is activated, it may shift traffic to an entirely different path, not just around a single link, and the change may result in congestion along the new path. Networks that are engineered to support protection against link failures should already take this into account. However, even this backup capacity can be saturated if TTE activates enough flows on a variety of links. If TTE is also deployed along the backup path, it may be triggered to activate further flows, further distributing the traffic load.¶
In contrast to re-using traditional backup paths that follow a cumulative least cost metric to a merge point, in the case of RSVP Fast Reroute, or a next-best shortest-path, in the case of Loop-Free Alternatives, mechanisms exist to calculate paths that consider unreserved-bandwidth or residual bandwidth. The resulting TTE specific tunnel(s) MAY be less likely to result in creating secondary congestion.¶
The goal of TTE is to locally address congestion as either a transient condition or until a more optimal global reoptimization can finish. RSVP-TE networks offer an additional option. When congestion is detected and flows are 'activated' an RSVP Labels Switch Router (LSR)could trigger ingress routers to reevaluate the input load of LSPs traversing the congested link. As per RFC4873, a code for "alarm" with sub-code "congestion" could be sent via Resv using AlarmSpec objectif indicating a congestion threshold has been crossed. The Alarm can in turn be 'cleared' when the congestion is abated.¶
TTE is necessarily stochastic. Since we cannot predict flow utilization (and thus link utilization) with absolute certainty, any flow selection is necessarily an estimate of future behavior. TTE assumes that the flows on the link have a typical Gaussian distribution and that there not many 'elephant' flows that have requirements far above the mean and not many 'mice' flows that have requirements far below the mean. TTE also assumes that there are enough flows that the Law of Large Numbers applies. Our simulation results suggest that even 50 flows with a good Gaussian distribution is ample to meet this requirement.¶
Ideally, TTE would only manipulate long-lived flows, as activating a short-lived flow would be ineffective at redirecting bandwidth. However, knowing the lifetime of any specific traffic stream is effectively impossible and the lifetime of an aggregated flow is also unknowable. Historical or policy information could be added to the approaches described here, and this is a matter for further study.¶
When a link is outside of its bandwidth thresholds, TTE must select certain paths to activate or deactivate. TTE can select one or more paths from one or more flows. Which paths and flows to select is a critical decision that strongly affects how quickly TTE converges to a solution where the link bandwidth is within its thresholds. Ideally, TTE selects the right paths and flows to activate to create a 'working set' that avoids congestion.¶
When a link is above its high threshold, then the set of candidate flows for activation are those flows on the link that have inactive paths. Similarly, if a link is below its low threshold, then the set of candidate flows are those that have a backup path that has been previously activated. The task of flow selection is to choose the set of candidates to activate or deactivate to bring the link back within its bandwidth thresholds.¶
In the following sections, we discuss several different possible strategies for flow selection. There are a variety of trade offs between these strategies and the selection of a specific strategy is implementation dependent. Many more strategies are possible.¶
An implementation may employ policy to restrict the set of flows that may be activated by TTE.¶
One approach to flow selection is to randomly select a flow from the set of candidate flows. This strategy has a significant advantage in that it does not require any tracking of per-flow bandwidth, and thus has minimal state and overhead requirements. The disadvantage of this approach is that it may not converge very rapidly. If the discrepancy between current bandwidth and target bandwidth is large, it may take many iterations and many flows may have to be activated before sufficient bandwidth has been moved.¶
In this strategy, overshoot is a distinct possibility. That is, a flow that is selected and activated/deactivated may shift more bandwidth than is required and may cause the link to shift from one extreme to another. However, this is not seriously problematic. Subsequent iterations will correct this and shift bandwidth back. Since each flow selection is an independent event, the odds of continually selecting the same flow is inconsequential, and thus, the odds of persistent oscillation are minimal.¶
More sophisticated strategies are possible, but do require that we track the bandwidth utilization of each flow.¶
An alternate selection strategy is to try to select a set of flows that will shift the link bandwidth utilization from its current level to a more comfortable level. We define the 'target bandwidth' as the average of the high and low bandwidth thresholds. The objective of flow selection is then to select flows that, in aggregate, amount to the difference between the target bandwidth and the current bandwidth. We call this the 'target change'.¶
Flows with a bandwidth bigger than the target change are then effectively elephant flows and should be discarded from the candidate pool. Flows are iteratively randomly selected from the remaining pool until the target change is satisfied.¶
Intuitively, this seems like an effective strategy, but our simulations indicate that it has poor performance. In some situations, there are simply not enough candidates in the pool to meet the target change. As a result, this strategy may sometimes not converge to a solution. From this, we observe that it may sometimes be best to intentionally overshoot the target by selecting an elephant and then converge by an opposing selection of other flows in the next iteration.¶
A similar strategy is to first exclude elephant flows and then select the largest remaining candidates to meet the target change. This has the benefit that it moves fewer flows than purely random selection. It still suffers from not selecting elephant flows if one is necessary.¶
Moving fewer flows is beneficial because it is less disruptive to network traffic. Every time a flow is moved, transport protocols may detect a change in latency and thus a change in round-trip time (RTT). This may be misinterpreted as a congestion event and lead to congestion avoidance. It would be beneficial to minimize this impact.¶
The opposite strategy is to first exclude elephant flows and then select the smallest remaining candidates. This has the unfortunate issue of moving the maximum number of flows and retains the problem of not moving elephants when they are necessary.¶
Analogies to memory allocation problems suggest that selecting the set of candidate flows that most closely total the target change would be a possible strategy. Unfortunately, the number of possibilities is the power set of the number of candidate flows. This can quickly explode and become computationally intractable. This strategy was not simulated.¶
This strategy is a derivative of the maximum fit strategy. In this strategy, if the target change cannot be satisfied by selecting all of the non-elephant candidates, then the smallest elephant is selected instead. This strategy showed excellent results.¶
The following people have substantially contributed to this document:¶
Tarek Saad, Hari Kotni, Minto Jeyananth, Raj Chetan Boddireddy, Shraddha Hegde, Vishnu Pavan Beeram, Chandra Ramachandran¶
The authors would like to thank Jim Uttaro for his comments.¶
This document makes no requests of IANA.¶
This section records the status of known implementations of the procedures defined by this specification at the time of posting of this Internet-Draft and is based on a proposal described in [RFC7942]. The description of implementations in this section is intended to assist the IETF in its decision processes in progressing drafts to RFCs. Please note that the listing of any individual implementation here does not imply endorsement by the IETF. Furthermore, no effort has been spent to verify the information presented here that was supplied by IETF contributors. This is not intended as, and must not be construed to be, a catalog of available implementations or their features. Readers are advised to note that other implementations may exist.¶
According to [RFC7942], "this will allow reviewers and working groups to assign due consideration to documents that have the benefit of running code, which may serve as evidence of valuable experimentation and feedback that have made the implemented protocols more mature. It is up to the individual working groups to use this information as they see fit".¶
Tactical Traffic Engineering is a mechanism that shifts traffic across pre-computed paths. It introduces no new protocols and operates completely locally on a single system. As such, it creates no new security issues.¶