Path computation for IP network optimization
As services converge on IP and move into the cloud, providers need to employ more advanced path computation capabilities to ensure that IP traffic is routed efficiently. This is critical in order to optimize returns on network investments.
Part of the challenge lies in the distributed and autonomous operation of IP routing protocols. Constrained Shortest Path First (CSPF) routing protocols such as OSPF and IS-IS apply the Dijkstra algorithm to determine the shortest path between source and destination. As a result, links on popular routes can easily become congested causing network hotspots, while other links remain underutilized leaving valuable capacity stranded.
Get more bucks out of your bandwidth
Optimizing network utilization and service revenues is a bin-packing problem that is NP-hard in computational complexity theory. Network capacity is a finite resource, and as links fill up it becomes progressively harder to fit more path requests. Network routing protocols such as OSPF and IS-IS achieve fast route convergence, but routers have limited CPU and memory resources available to compute optimal paths.
A centralized Path Computation Element (IETF RFC 4655) has no such constraints on available CPU, memory, or route convergence times. It can operate on-line and employ sophisticated path computation algorithms with dynamic link cost metrics to assist network-embedded routing protocols in finding paths that more efficiently use available link capacity.
Introducing a centralized Path Computation Element
A Path Computation Element (PCE) applies software-defined networking (SDN) paradigms to separate path computation and path signaling functions. This gives operators more control over their networks, adds the ability to apply custom policies, and addresses multi-vendor integration considerations.
The PCE can be considered as part of a centralized control plane that complements the distributed, network-embedded control plane. It uses various open and standard based protocols such as the PCE communication protocol (PCEP – RFC 5440) to interface with underlying routing equipment and communicate route decisions (Figure 1).
Network topology, link state information, and deployed and allocated capacity are tracked in a central traffic engineering database (TED). A centralized PCE can apply sophisticated path computation algorithms to help optimize distributed, network embedded CSPF routing protocols and algorithms that are constrained by compute resources and the need for fast convergence times. While the PCE architecture and interfaces are standardized, the actual path computation algorithms are open because different network layers require different optimization algorithms.
There may be multiple PCEs and computation algorithms to address different layers or network domains (e.g., an IP PCE for the routing layer and transport PCE for the optical layer). A centralized PCE also plays an important role in the application of the emerging segment routing architecture. Segment routing improves scalability of the MPLS control plane by reducing state information and signaling in the network, but must rely on a TED in a centralized PCE to maintain per-service state information on allocated link capacity and to compute source routes.
The STAR path computation algorithm
Optimizing network utilization requires a routing algorithm to meet 2 objectives:
- Efficiency -- Consume as little total link bandwidth as possible to optimize cost
- Balance -- Avoid overloading any links to avoid congestion and deadlock situations
These objectives can be contradictory and may require trade-offs.
Less effective route computation algorithms may lose potential revenue opportunities when requiring more than the minimal amount of required hops to accommodate path requests. This consumes additional network resources that cannot be compensated for by incremental revenue, and will take away from the remaining link capacity available for other path requests. However, if link capacity becomes fully exhausted it may also lead to longer routes or lost revenue opportunities when path requests are rejected due to capacity shortages.
A constrained shortest path first routing algorithm such as Dijkstra picks the path with the least amount of hops, unless administrative link metrics or explicit route constraints dictate otherwise. As a result, CSPF achieves efficiency, but at the expense of balance. The min-max algorithm, on the other hand, will prefer links that are the least loaded. Min-max achieves balance but potentially by taking longer routes, which may also negatively impact network efficiency and revenue generation.
The Bell Labs STAR (Self-Tuned Adaptive Routing) algorithm finds an optimal balance between the shortest path and the least congested path. For each path request, STAR performs a series of alternative path calculations with a variable path stretch factor, and compares their link and network utilization results in order to select the optimal route.
As IP traffic is better balanced over available link capacity, the chance of network hotspots is reduced while more headroom remains available to absorb traffic peaks and future demand growth. More service requests can be routed, more revenue is generated.
To determine the relative efficiency of these algorithms, Bell Labs compared 2 different network topologies: a small 6 node and a larger 59 node topology. For each topology a traffic matrix is defined representing the traffic demand forecasted for the total of all service requests and all links are properly dimensioned to accommodate the projected demand.
Random path requests are generated with a variable capacity between 2 arbitrary end-points, up to the total demand forecasted in the traffic matrix. Path requests are routed by the default CSPF routing algorithm (with link weight set inversely in proportion to the link capacity) and by the STAR path computation algorithm.
Figure 2 shows the link utilization after 50% of path requests have been processed for the 59-node topology. The left diagram shows that CSPF already utilized over 75% of the total link capacity (in red) with some links fully utilized, while there are several (in green) with far less than 50% utilization.
As more path requests are processed, the already congested links will become fully loaded and result in rejected path requests in future. Existing service traffic on these links will also experience more latency due to increased buffering. Normally this would trigger actions to rebalance network traffic (which cause service churn) or add more capacity on congested links.
The results for the STAR algorithm at the right show a far more even link load distribution, with no links loaded more than 75% and the majority of links with a load below 50%. There is far more head room left on all links to accept more service requests. As all network traffic is fairly equally distributed over links there is no need for rebalancing traffic or for upgrading link capacity.
The million dollar question is: How much incremental revenue can be generated by better balancing link utilization and congestion avoidance goals?
To determine this, service revenue is calculated for each successfully routed path request as a function of the capacity and the topology distance in terms of minimal number of hops required by the routed path. For example, a 2 Gbps path between 2 end-points with minimal 3 hop distance generates 6 revenue units, regardless of the number of hops the actual route requires.
Paths that cannot be routed generate no revenues. To simulate service churn each path is given a random lifetime, after which it is deleted from the network.
Figure 3 shows the average results over 5 simulation runs:
- In a 6-node topology, STAR was able to accept 97% of service requests as opposed to 91% with CSPF, and supported 12% more revenue generating traffic than CSPF
- In a 59-node topology STAR could accept virtually all service requests as opposed to 87% with CSPF, and was able to support 24% more revenue generating traffic
Applications of a centralized PCE with STAR
The principle use of STAR is to optimize network utilization or “run the network hotter”. STAR achieves a more balanced link utilization, which improves revenue generation by more efficiently packing service traffic, and results in better end-to-end performance for the connections.
A centralized PCE with the STAR algorithm can also be used in a remedial fashion as an on-line traffic engineering tool to alleviate congestion on heavily utilized links. To minimize service churn, network rebalancing can be applied on the most heavily loaded links and it is possible to restrict the total amount of paths that may be rerouted. The STAR algorithm automatically picks the set of connections to be rerouted for maximum impact and suggests new routing paths.
A PCE with STAR can also enable new elastic service capabilities such as bandwidth on demand and bandwidth calendaring. These dynamic bandwidth services introduce a certain amount of network churn that creates natural opportunities for STAR to balance network traffic.
Computing a path to more profits business caseNetwork Services Platform feature pageIP/optical integration web pageQuantifying IP/optical integration synergies articleCapitalizing on IP/optical integration article
To contact the author or request additional information, please send an email to firstname.lastname@example.org.