Bring insight and automation to peering engineering
In a previous blog post, I talked about how manually managing internet peering points is becoming more problematic as traffic patterns become less predictable. Manual processes for re-directing traffic are too slow and often cause errors that make matters even worse. To quote a recent Ars Technica article, “These reconfigurations happen every day in global routing and mistakes can happen, especially since the configuration of routers is error prone and often still requires manual input, which is prone to 'fat fingers'.”
BGP is the protocol that provides the routing mechanism for peering. Incidents caused by manual BGP misconfiguration are happening with increasing frequency, sometimes causing significant, widespread service or internet outages. In 2017 alone, there were 14,000 incidents, and more than 3,000 systems were affected at least once.
What is needed to avoid ‘fat fingers’, or at least minimize their impact, is a way to automatically re-direct traffic flows.
To illustrate the problems that arise, let’s consider some gamers who are subscribers of an ISP and accessing content from a gaming company. The ISP and the gaming company both share the same goal: provide gamers with the best possible experience. The ISP and the gaming company are using an intermediate peering (or transit) provider to connect to each other.
Suddenly the peering provider sends more traffic to the ISP on the same link that is used for the gaming content. This surge of traffic creates congestion on that link, which slows down the gaming traffic and ultimately degrades the users' quality of experience (QoE).
This issue can be solved in two ways:
- The gaming company can steer its egress traffic to a less congested route with lower latency, using another peering link or another peering provider. This is egress peering engineering for outbound traffic.
- The ISP can steer the ingress traffic from the gaming company to another route with lower latency. This is ingress peering engineering for inbound traffic.
Although BGP is supposed to route packets using the best path available, it has limited abilities to make routing decisions based on real-time capacity utilization and performance. In our example, the ISP or the gaming company needs to continuously monitor the latency and automate re-routing when the network fails to meet the targeted KPIs.
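To make the monitor-and-re-route idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the route names, the latency samples, the 40 ms KPI and the function `pick_route` are all hypothetical, and a real deployment would apply the decision through the operator's own routing policy tooling rather than a print statement.

```python
from statistics import median

def pick_route(current, samples_ms, kpi_ms):
    """Keep the current route while it meets the latency KPI;
    otherwise return the route with the lowest median latency.

    samples_ms maps a (hypothetical) route name to recent latency samples.
    """
    # Median is used so a single spike does not trigger a re-route.
    typical = {route: median(vals) for route, vals in samples_ms.items() if vals}
    if not typical:
        return current  # nothing measured, so make no change
    if typical.get(current, float("inf")) <= kpi_ms:
        return current  # KPI met, so leave traffic where it is
    return min(typical, key=typical.get)

# Hypothetical measurements: peer-a is congested, peer-b is healthy.
samples = {
    "peer-a": [48.0, 52.0, 61.0],
    "peer-b": [22.0, 25.0, 24.0],
}
print(pick_route("peer-a", samples, kpi_ms=40.0))
```

The sketch only moves traffic when the KPI is actually missed, which avoids shifting flows back and forth between two routes that are both good enough.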
One of the biggest challenges facing network operators is keeping track of performance of the many different routes that traffic can take.
The network appliances and analytics tools used today to measure latency and packet loss take their measurements from a single peering router only. They don't look at the broader end-to-end network context. They can't, for instance, compute and compare various path options or consider the effects of using alternate peering points. Rather than simply moving traffic to another link, it is often possible to route the more time-sensitive traffic through a geographically closer peering point, which also results in lower latency.
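The difference between a single-router view and an end-to-end view can be shown with a small Python sketch. The peering point names and per-segment latencies below are invented for illustration, and splitting a path into "backbone", "peering_link" and "remote" segments is an assumption made only for this example.

```python
def end_to_end_ms(segments):
    """Total latency of a path, given its per-segment latencies in ms."""
    return sum(segments.values())

# Hypothetical candidate paths through two peering points.
candidates = {
    "pop-east": {"backbone": 18.0, "peering_link": 4.0, "remote": 12.0},
    "pop-west": {"backbone": 3.0, "peering_link": 9.0, "remote": 11.0},
}

# What a tool measuring only at the peering router would prefer.
best_link = min(candidates, key=lambda p: candidates[p]["peering_link"])

# What an end-to-end comparison prefers.
best_e2e = min(candidates, key=lambda p: end_to_end_ms(candidates[p]))

print(best_link, best_e2e)
```

With these made-up numbers the peering link at pop-east looks fastest in isolation, yet the path through pop-west is shorter overall because it is closer to the subscribers, which is the kind of effect a single-router measurement cannot see.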
End-to-end visibility into the complete service delivery path also has to be complemented by granular visibility into traffic flows. In our example, the ISP should be able to identify the time-sensitive traffic from the gaming company. Understanding application performance across all parts of the network is critical to solving problems more efficiently and even avoiding them proactively.
What we need is greater insight: analytics that combine real-time traffic visibility with automated control over network resources and application traffic flows. These kinds of holistic analytics now exist. Applied to peering, they will enable operators to automate traffic engineering at peering routers and network interconnections to better balance traffic flows, avoid over-provisioning and improve overall performance for their customers.