The Wizard of Network Optimization
Podcast episode 22
If 80 percent of wireless infrastructure is aimed at the roads, how are our mobile networks handling the COVID-19 crush of traffic at home? Enter the Wizards of Network Optimization, who don’t have to climb a pole to ensure your episode of Tiger King streams flawlessly on Netflix.
Host Michael Hainsworth speaks to Nokia’s Amit Mehrotra about the lessons he learned on the front lines of the coronavirus crisis as it was growing.
Below is a transcript of this conversation. Some parts have been edited for clarity.
Michael Hainsworth: As the world shelters in place under COVID-19, demand for wired and wireless internet access has skyrocketed. But how are telcos keeping up? Enter the wizards of optimization, the techies who know which knobs to turn and which buttons to press to get more out of existing wireless spectrum and fiber optics. We turn to Amit Mehrotra, the Dumbledore of Nokia's Hogwarts. We begin by discussing the fact that 80 percent of wireless network infrastructure is aimed at roads. And we're all at home.
Amit Mehrotra: That's a great place to start, Michael. In this world where we're all cocooned inside our homes, it turns out that 80 percent of the coverage we have been building for the last several decades is, like you said, "built for the roads." And we're all sitting inside. That creates a big challenge for the carriers, your service providers, in reaching you, and therein lies a big part of the problem we are trying to solve: when people consume more while sitting inside their homes, it puts extra load on the network. Extra capacity and extra signal boosting are needed. So those are the challenges the carriers face in trying to serve you when you're sitting inside.
MH: So what happens when 80 percent of network traffic is suddenly generated indoors but only 20 percent of the infrastructure is aimed at indoor activity?
AM: When you start to have a lot of indoor consumption, the signal that's serving the user is not of the best quality. And when that happens, it puts an extra load on the capacity of the network. In other words, the wireless carrier might have built their network for adequate capacity, but if you go indoors and start to consume more, watching videos and things like that, you are now putting a bigger load on the network. That very quickly erodes the buffer that carriers typically build up, and you start to run out of capacity. Users start to notice that the signal may not be that great, because your house has nice, solid bricks on the outside that block a lot of the signal from coming inside. So for the user it may become a challenging situation, and for the carrier it becomes an extra load to cater to.
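The link between a weaker indoor signal and extra load on the network can be illustrated with the Shannon bound, which caps how many bits per second each hertz of spectrum can carry at a given signal-to-interference-plus-noise ratio (SINR). This Python sketch is illustrative only: the 15 dB outdoor SINR, the 15 dB wall loss, and the 5 Mbit/s stream are hypothetical numbers, and subtracting the wall loss straight from the SINR assumes a noise-limited link.

```python
import math

def shannon_efficiency(sinr_db: float) -> float:
    """Upper bound on spectral efficiency (bit/s/Hz) at a given SINR in dB."""
    return math.log2(1 + 10 ** (sinr_db / 10))

def spectrum_needed_mhz(target_mbps: float, sinr_db: float) -> float:
    """MHz of spectrum needed to carry target_mbps at the Shannon bound."""
    return target_mbps / shannon_efficiency(sinr_db)

STREAM_MBPS = 5.0       # hypothetical HD video stream
OUTDOOR_SINR_DB = 15.0  # hypothetical street-level SINR
WALL_LOSS_DB = 15.0     # hypothetical brick-wall penetration loss

outdoor = spectrum_needed_mhz(STREAM_MBPS, OUTDOOR_SINR_DB)
# Assumes a noise-limited link, so the wall loss comes straight off the SINR.
indoor = spectrum_needed_mhz(STREAM_MBPS, OUTDOOR_SINR_DB - WALL_LOSS_DB)
print(f"outdoor {outdoor:.2f} MHz, indoor {indoor:.2f} MHz ({indoor / outdoor:.1f}x)")
```

Under these assumptions the indoor user needs about five times as much spectrum for the same stream, which is one way a weaker signal turns into extra load on the cell.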
MH: Telcos plan for a 50 percent increase in demand year-over-year, but with COVID-19, we saw that 50 percent jump happen in a single month. Italy, for example, saw a 400 percent increase in video calls. Can you even plan for that?
AM: That's a great point, and no, you can't. You could argue, or your accountant will argue, that you shouldn't. The challenge here is that this is a completely extraordinary time, and the worst part is that we don't know when it's going to end either. So you can't really plan for this kind of thing. What you do at this stage is bring all your resources to bear. What you do have is the ability to move things around, shift things around, squeeze a little bit more out of the network, and try your best to serve your customers without breaking the network fabric. And that, in a sense, is what network optimization is all about: getting more out of the existing network.
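The growth figures in the question can be put side by side with a little arithmetic. This Python sketch converts the quoted 50 percent annual planning figure into an equivalent compound monthly rate; treating planned growth as compounding evenly month to month is an assumption made for illustration.

```python
import math

def equivalent_monthly_rate(annual_growth: float) -> float:
    """Compound monthly rate that adds up to annual_growth over 12 months."""
    return (1 + annual_growth) ** (1 / 12) - 1

def months_of_planned_growth(jump: float, monthly_rate: float) -> float:
    """How many months of planned growth a single jump of size `jump` represents."""
    return math.log(1 + jump) / math.log(1 + monthly_rate)

planned = equivalent_monthly_rate(0.50)   # 50 percent year-over-year, as quoted
print(f"planned growth: {planned:.2%} per month")
print(f"a 50 percent jump equals {months_of_planned_growth(0.50, planned):.0f} months of planned growth")
```

Planned growth works out to roughly 3.4 percent a month, so a 50 percent jump in one month compresses a full year of planned headroom into four weeks.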
MH: So you helped Netflix optimize traffic by as much as 60 percent. How?
AM: The challenge with watching videos is, number one, we enjoy it. I mean, as a user, I love it. But the majority of the over-the-top (OTT) video being provided, like Netflix's, is encrypted. That means the carrier whose network is carrying those video packets to you doesn't really know what's in the packet. It could be video, or it could be something completely different. So the first challenge was that without knowing what's in there, it's very hard to purposefully give that traffic special status or additional capabilities. So we figured out a way, without needing to unpack the whole thing, to identify what is video and what is not, just from the shape of the traffic and using machine learning. Once we did that, we were able to start overcoming lots of little challenges along the pathways: you could tweak the buffer in one place, and if you allow the packets to go in a different size and by a different route, you achieve a better outcome, better performance. Once you start to piece these things together, you uncover and remove inefficiencies that were inadvertently built into the network, by understanding more and bringing in AI and ML. So that's really what we have done with Netflix and other video services: overcome those bottlenecks by understanding what's in the packets and optimizing for that particular packet.
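Nokia's actual classifier is not described here, so the following is only a minimal sketch of the general idea of classifying encrypted traffic from its shape. It uses a simple nearest-centroid classifier over three hypothetical per-flow features (average packet size, share of near-MTU packets, and burstiness of packet gaps) computed from packet metadata alone; the feature choices, the 1200-byte cutoff, and the tiny training set are assumptions, and a production system would use far richer features and models.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple

Features = Tuple[float, ...]

def flow_features(sizes: Sequence[int], gaps: Sequence[float]) -> Features:
    """Features from packet metadata only (sizes in bytes, gaps in seconds);
    payloads are never inspected."""
    large_share = sum(1 for s in sizes if s >= 1200) / len(sizes)  # hypothetical cutoff
    burstiness = pstdev(gaps) / (mean(gaps) + 1e-9)  # coefficient of variation of gaps
    return (mean(sizes) / 1500, large_share, burstiness)

def train(labelled: Dict[str, List[Features]]) -> Dict[str, Features]:
    """Nearest-centroid training: average the feature vectors of each class."""
    return {label: tuple(mean(col) for col in zip(*rows))
            for label, rows in labelled.items()}

def classify(model: Dict[str, Features], features: Features) -> str:
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: math.dist(model[label], features))

# Tiny hypothetical training set: video arrives as bursts of full-size
# packets separated by idle gaps; other traffic is smaller and steadier.
model = train({
    "video": [flow_features([1500] * 9 + [60], [0.001] * 9 + [2.0]),
              flow_features([1400] * 8 + [80] * 2, [0.002] * 8 + [1.5, 1.5])],
    "other": [flow_features([200, 90, 400, 60, 120], [0.2, 0.3, 0.25, 0.2, 0.3]),
              flow_features([300, 80, 150, 500, 70], [0.5, 0.4, 0.6, 0.5, 0.45])],
})
print(classify(model, flow_features([1500] * 7 + [100] * 3, [0.001] * 8 + [1.8, 1.8])))  # video
```

Segmented video delivery tends to show bursts of full-size packets separated by idle gaps, which is why the burstiness feature helps separate it from chattier traffic in this toy example.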
MH: There are two types of packets though: TCP and UDP. How was optimization different between them?
AM: Once you get inside the walled garden that is the telco network, there's not just TCP and UDP, which sit on the IP side of the packets, but a whole bunch of other protocols that we have to work with. There's GTP, there's signaling, there's the user plane and then there's the control plane. So this requires domain expertise. It requires somebody who truly understands what is traveling and how you piece it together. What is a good flow supposed to look like? Then we look for anomalies. We say, "Okay, I know it's supposed to go this way. Why is it not doing that?"
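The question "what is a good flow supposed to look like?" can be sketched as a baseline comparison: learn the normal range of a few flow metrics, then flag observations that fall far outside it. This Python example uses a simple z-score test; the metric names (session setup time and retransmission rate) and the threshold of three standard deviations are hypothetical choices, not the method described in the interview.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Tuple

def build_baseline(history: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """metric -> (mean, population std dev) over observations of healthy flows."""
    return {m: (mean(vals), pstdev(vals)) for m, vals in history.items()}

def anomalies(baseline: Dict[str, Tuple[float, float]],
              observed: Dict[str, float], threshold: float = 3.0) -> Dict[str, float]:
    """Return metrics whose observed value lies more than `threshold` standard
    deviations from the baseline mean, mapped to their z-scores."""
    flagged = {}
    for metric, value in observed.items():
        mu, sigma = baseline[metric]
        if sigma == 0:
            # No variation seen historically: any change at all is unusual.
            if value != mu:
                flagged[metric] = math.copysign(math.inf, value - mu)
            continue
        z = (value - mu) / sigma
        if abs(z) > threshold:
            flagged[metric] = z
    return flagged

# Hypothetical history for one kind of flow: control-plane setup time (ms)
# and user-plane retransmission rate (percent).
baseline = build_baseline({
    "setup_ms": [110, 120, 115, 118, 112],
    "retransmit_pct": [0.5, 0.7, 0.6, 0.4, 0.6],
})
print(anomalies(baseline, {"setup_ms": 119, "retransmit_pct": 4.0}))
```

Here the setup time sits well within its normal spread, while the retransmission rate is flagged as far outside it.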
MH: So a benefit of utilizing granular subscriber and application awareness, by being able to get into those packets, is that a service provider can tailor new services based upon subscriber preferences and profiles. So this isn't just about addressing the issues of today and the needs of today under COVID-19. The lessons learned here get to be applied to optimization for networks in the future.
AM: Exactly. A lot of the network capacity we're consuming goes to video: more than 80 percent of the traffic on these networks is video. So video was kind of dear to our hearts. We understand that if we can handle video better in general, with or without COVID, we can help carriers be much more efficient in their network resource utilization. That is what we have been doing, and fortunately, we had some of these solutions in hand, so when we were confronted with a crisis like this pandemic, we could bring them out rather quickly. We have the ability to get in and optimize so that video streams use only the resources they absolutely need, leaving the rest for people to use for voice, which, by the way, has also gone through the roof. We're seeing voice usage double in many networks, because people who are now physically disconnected from each other want to stay connected through their wireless devices. That is taking some of the capacity. So we're trying to make sure there is enough room for all of these things to happen simultaneously.
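One standard way to let small flows such as voice calls get everything they ask for, while larger flows share what remains, is max-min fair allocation by progressive filling. The interview does not say which scheduler is used, so this Python sketch is a swapped-in textbook technique; the flow names, the 20 Mbit/s cell capacity, and the 4 Mbit/s video demands (standing in for streams already trimmed to what they need) are hypothetical.

```python
from typing import Dict

def max_min_fair(capacity: float, demands: Dict[str, float]) -> Dict[str, float]:
    """Max-min fair allocation by progressive filling.
    demands: flow -> requested rate. Returns flow -> allocated rate."""
    allocation: Dict[str, float] = {}
    remaining = dict(demands)
    left = capacity
    while remaining:
        share = left / len(remaining)
        # Flows asking for no more than an equal share get their full demand.
        satisfied = {f: d for f, d in remaining.items() if d <= share}
        if not satisfied:
            # Every remaining flow wants more than an equal share: split evenly.
            for f in remaining:
                allocation[f] = share
            break
        for f, d in satisfied.items():
            allocation[f] = d
            left -= d
            del remaining[f]
    return allocation

demands = {
    "voice_1": 0.1, "voice_2": 0.1,   # Mbit/s, hypothetical
    "video_a": 4.0, "video_b": 4.0,   # streams trimmed to what they need
    "bulk_download": 30.0,
}
print(max_min_fair(20.0, demands))
```

In this example both voice calls and both trimmed video streams are fully served, and only the bulk download is held to the roughly 11.8 Mbit/s left over.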
MH: When you focus on optimization, you can focus on the hardware or the software. But in both cases, it amazes me how rarely a technician actually has to visit the areas needing more capacity.
AM: This is increasingly the direction that all of us in the telecommunications industry are going. We understand that fundamentally, there are some physical assets: there's an antenna out there, there's a base station out there, and that's what is serving the needs. But we want to minimize the need to keep going out to that site. Often that translates into building in a little extra capacity, throwing in an extra couple of cables, so that I don't have to visit the site for the next couple of years. Starting already in the previous generation, 4G, we have put little servo motors on these antennae, so you can remotely tilt them up, tilt them down, et cetera, and that's a very standard capability now. Increasingly, things are headed toward a technology called massive MIMO, where you are basically steering the beam in the direction you need it the most. This requires the ability to sense the environment, and the ability to electronically cater to a diverse and dynamic situation without needing physical changes. And by the way, we are finding in this COVID situation that those tools are not widely deployed, but where they are, they're proving to be very useful.
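Electronic beam steering of the kind massive MIMO relies on can be illustrated with the textbook uniform linear array: applying a progressive phase shift across the antenna elements points the main lobe in a chosen direction with no moving parts. This is a simplified analog-beamforming sketch, assuming 8 elements spaced half a wavelength apart; real massive MIMO radios use many more elements, two-dimensional panels, and digital precoding.

```python
import cmath
import math
from typing import List

def steering_phases(num_elements: int, spacing_wl: float, steer_deg: float) -> List[float]:
    """Per-element phase shifts (radians) that point a uniform linear array's
    main lobe at steer_deg from broadside. spacing_wl is in wavelengths."""
    s = math.sin(math.radians(steer_deg))
    return [-2 * math.pi * n * spacing_wl * s for n in range(num_elements)]

def array_gain(phases: List[float], spacing_wl: float, look_deg: float) -> float:
    """Normalized array-factor magnitude (0 to 1) in direction look_deg."""
    s = math.sin(math.radians(look_deg))
    total = sum(cmath.exp(1j * (2 * math.pi * n * spacing_wl * s + p))
                for n, p in enumerate(phases))
    return abs(total) / len(phases)

phases = steering_phases(8, 0.5, 30.0)
for look in (0.0, 30.0):
    print(f"gain toward {look:.0f} degrees: {array_gain(phases, 0.5, look):.3f}")
```

With these settings the array reaches full gain at 30 degrees and places a null at broadside (0 degrees); changing steer_deg re-points the beam purely in software.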
MH: If an actual antenna needs to redirect its signals for either greater capacity or faster speeds, tell me more about how down-tilting or up-tilting works. It's physically moving the antennae to shape the beam without having to climb a pole?
AM: That's the foundational technology there, right? You literally have a situation where an antenna emits a signal, not unlike your home WiFi router, which has a little dongle sticking up that radiates, making sure the signal is transmitted and received. In a similar manner, the larger base station antennae you see are doing effectively the same thing. Now, sometimes you need them to focus all their signal, let's say to cover a mall, and in other cases it's a large, spread-out suburban neighborhood and you want to cater to everyone. No particular hotspots show up there, so you design for coverage rather than for hotspots or capacity. These principles are what network design is based on. But once you've designed the network, the population in that area might change, the infrastructure may evolve, or other things change, and in those situations you may need to go back and up-tilt, down-tilt, or reorient the antennae to point them toward where the hotspots have shifted. What we are trying to do is leverage the ability to adjust those settings remotely, and that's pretty standard practice. But with massive MIMO and many other technologies, there is now the ability to do it in an almost automated way. There's an algorithm that decides, "Well, I'm not getting any traffic from the left side of my array, so I'm going to move to the right side."
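The geometry behind down-tilting is simple trigonometry: to aim an antenna's boresight at a spot a distance d away, tilt it down by the arctangent of the height difference over d. This Python sketch pairs that with a naive rule that re-points toward the busiest hotspot; the mast height, the 1.5 m handset height, the hotspot list, and the "busiest wins" rule are all hypothetical, and it ignores vertical beamwidth, terrain, and interference with neighboring cells, which real tilt planning accounts for.

```python
import math
from typing import Dict, Tuple

def downtilt_deg(mast_height_m: float, target_distance_m: float,
                 ue_height_m: float = 1.5) -> float:
    """Tilt below horizontal that points the boresight at a spot
    target_distance_m away (flat ground, straight-line geometry)."""
    return math.degrees(math.atan2(mast_height_m - ue_height_m, target_distance_m))

def retarget(mast_height_m: float,
             hotspots: Dict[str, Tuple[float, float]]) -> Tuple[str, float]:
    """hotspots: name -> (distance_m, observed traffic). Aim at the busiest one."""
    name, (distance_m, _traffic) = max(hotspots.items(), key=lambda kv: kv[1][1])
    return name, downtilt_deg(mast_height_m, distance_m)

name, tilt = retarget(30.0, {"mall": (300.0, 120.0), "suburb": (1200.0, 40.0)})
print(f"point at {name}: {tilt:.1f} degrees of down-tilt")
```

For a 30 m mast, pointing at the mall 300 m away works out to a tilt of about 5.4 degrees; if traffic shifted to the suburb, the same rule would ease the tilt back toward about 1.4 degrees.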
MH: But planning to proactively manage the amount of bandwidth used in the network for video or otherwise is all about network resource savings, particularly because you're dealing with some premium RAN resources here.
AM: That is correct. I think the most expensive resource the carrier has to deal with is the spectrum itself. Spectrum is generally licensed on large leases, and the carrier has probably paid billions of dollars for the right to broadcast on that particular channel. This is a very important factor. In fact, we have developed machine-learning-based tooling that targets improving spectral efficiency, which is how effectively the spectrum is being used. I'll give you an example. We went into a small network, and by running this SVM capability as part of that spectral performance optimization tooling, we were able to get an additional 60 percent capacity out of the network. Given what you have spent, you want to make sure you're extracting the most out of that particular resource, and that becomes a critical ROI factor.
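Spectral efficiency is throughput divided by bandwidth, in bit/s/Hz, which makes the ROI argument easy to put in numbers. This Python sketch uses a hypothetical 20 MHz carrier carrying 50 Mbit/s and applies the 60 percent gain quoted above; the carrier size and load are assumptions.

```python
def measured_spectral_efficiency(throughput_mbps: float, bandwidth_mhz: float) -> float:
    """Spectral efficiency in bit/s/Hz (Mbit/s divided by MHz)."""
    return throughput_mbps / bandwidth_mhz

def equivalent_spectrum_mhz(bandwidth_mhz: float, capacity_gain: float) -> float:
    """Extra spectrum that would be needed at the old efficiency to match a capacity gain."""
    return bandwidth_mhz * capacity_gain

BANDWIDTH_MHZ = 20.0    # hypothetical licensed carrier
THROUGHPUT_MBPS = 50.0  # hypothetical busy-hour throughput
GAIN = 0.60             # the 60 percent capacity gain quoted in the interview

before = measured_spectral_efficiency(THROUGHPUT_MBPS, BANDWIDTH_MHZ)
after = before * (1 + GAIN)
print(f"{before:.2f} -> {after:.2f} bit/s/Hz, {after * BANDWIDTH_MHZ:.0f} Mbit/s on the same 20 MHz")
print(f"equivalent to about {equivalent_spectrum_mhz(BANDWIDTH_MHZ, GAIN):.0f} MHz of extra spectrum")
```

Under these assumptions the gain is worth roughly 12 MHz of spectrum the carrier did not have to buy.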
MH: What if a telco does have spare spectrum? How do you deploy that?
AM: Okay, that's a great question. With some telcos, depending on where you catch them in the cycle, there is a constant amount of building and activity going on in the field: new antennas, new radios, all of these things being added. You may get lucky in a crisis-type situation, where you already had some latent capacity there and just have to activate it. In other cases, what typically happens is you consume capacity through software licenses. So if you have the hardware already out there, you usually watch and say, okay, I think I need another carrier, another five-megahertz or 10-megahertz carrier, and you let the current one run its course before you turn on another license. One of the things we at Nokia did, as COVID-19 started to spread, was reach out to many carriers across the world and offer them additional capacity licenses to help them in their time of need, so that if they have the hardware in place, they can leverage it. But to your larger question, you do need the required hardware out there before you can do that, so there is a bit of a dependency before you unleash more capacity.
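The dependency described here, where a license can only unlock capacity the installed hardware already supports, can be modeled in a few lines. This Python sketch is a hypothetical model, not a Nokia licensing API; the Site class, its fields, and the megahertz figures are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Site:
    """Hypothetical radio site: installed hardware sets the ceiling,
    software licenses set how much of it is switched on."""
    hw_max_mhz: float    # bandwidth the installed radio hardware supports
    licensed_mhz: float  # bandwidth currently unlocked by licenses

    def activate(self, extra_mhz: float) -> float:
        """Unlock up to extra_mhz; return the bandwidth actually added."""
        if extra_mhz < 0:
            raise ValueError("extra_mhz must be non-negative")
        headroom = self.hw_max_mhz - self.licensed_mhz
        added = min(extra_mhz, headroom)
        self.licensed_mhz += added
        return added

ready = Site(hw_max_mhz=20.0, licensed_mhz=10.0)
maxed_out = Site(hw_max_mhz=10.0, licensed_mhz=10.0)
print(ready.activate(10.0))     # 10.0: a license alone adds a 10 MHz carrier
print(maxed_out.activate(5.0))  # 0.0: new hardware is needed first
```

The first site gains a full carrier from a license alone; the second cannot grow until someone installs more hardware.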
MH: What if the bottleneck isn't wireless but at the fiber-optic layer?
AM: That happens all the time. Depending on how robust the infrastructure is, and sometimes even how new a neighborhood is, the amount of fiber serving a particular area may be constrained. That may or may not be the bottleneck. If you end up with a very limited amount of fiber serving ever-growing demand, there is no option other than to go dig: dig up the ground and put in more fiber. But in the U.S., we have typically seen situations where there is a lot of fiber already in the ground, and you just have to be a little bit resourceful: tap into an adjacent ring of fiber that you may be able to borrow, or do some planning and reacting, which is what a lot of carriers are doing right now. As they're hit with this deluge of additional demand, they're planning these workarounds so that they can borrow capacity from adjacent circuits and build a larger capacity.
MH: The metaphor I'm thinking of is networks are built in rings of capacity.
AM: Yes, that is true. It's definitely true for fiber rings, which typically serve these large neighborhoods. And the circuits are purchased in units of gigabits per second; you could have a 100G circuit, et cetera. So it turns out that all of these parts can become the bottleneck, as you pointed out, and that's the game: making sure you try to stay ahead of the demand.
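The point that any ring or circuit can become the bottleneck, and that borrowing capacity from an adjacent ring helps, comes down to a min-of-sums calculation. This Python sketch treats parallel circuits on a hop as simply additive, which assumes traffic can be balanced perfectly across them (for example through link aggregation); the hop list and the 40G constraint are hypothetical.

```python
from typing import List

def path_capacity_gbps(segments: List[List[float]]) -> float:
    """segments: one list per hop, holding the circuit rates (Gbit/s) available
    in parallel on that hop. End-to-end capacity is set by the weakest hop."""
    return min(sum(circuits) for circuits in segments)

def bottleneck_hop(segments: List[List[float]]) -> int:
    """Index of the hop that limits end-to-end capacity."""
    return min(range(len(segments)), key=lambda i: sum(segments[i]))

# Hypothetical path: access ring, metro ring, aggregation link.
path = [[100.0], [100.0], [40.0]]
print(path_capacity_gbps(path), bottleneck_hop(path))  # 40.0 2

# Borrow a 100G circuit from an adjacent ring on the constrained hop.
path[2].append(100.0)
print(path_capacity_gbps(path))  # 100.0
```

Once the constrained hop is relieved, the limit moves to the next weakest hop, which is why staying ahead of demand means watching every part of the path.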
MH: So the hardware is provisioned with a certain number of software licenses, so it's possible to pump more through an existing line just by essentially flipping a software switch?
AM: That's increasingly where we would like to be. In fact, the next generation, the 5G that we talk so much about, is very much about software-defined networks, where you have capable hardware available and you try to control more and more through software and licenses. I would say most networks today are primarily 4G networks with a layer of 5G starting to show up. We are somewhere in the middle. Most advanced 4G networks can do a lot with software licenses, but fundamentally, the next-generation networks will do this much more effectively.
MH: So, full circle to the issue of COVID-19: it sounds like the next-generation network could largely be managed from anywhere, in a WFH type of environment or otherwise.
AM: Indeed. We'll see what comes out after all the dust settles and we look back at this time. But haven't we, as a society, fundamentally changed as we've encountered this? I was looking at some of the stats, for example. My kids are having Zoom classrooms. They have education apps. I mean, they're sitting there two to three hours a day doing homework, and education apps turn out to be one of the largest growth segments that are going to come out of this COVID-19 experience. Take any collaboration tool, any radio streaming service: you've seen anywhere from a 40 or 50 percent to a 150 percent increase in usage. And when we're talking about wireless usage, we're not just talking about people doing this on their home computers, but on their mobile devices, because usage patterns start to change. An interesting area there is video gaming. It turns out this is a boom time for video games. If I recall the numbers correctly, I read that online video gaming on mobile devices has increased by probably close to 85 percent. I think this is here to stay, Michael, and we might as well start to figure out how we provide that low-latency, high-bandwidth service when my kids are playing Fortnite, so that they don't complain that they missed the shot because the network was slow.
MH: And that's the interesting problem you, as a wizard of optimization, have to deal with. You have to provide the low latency required for fast reaction times in things like video games, while a video or a Netflix movie doesn't require low latency but does require high capacity. So you have to provide both at the same time. No pressure.
AM: Yes, and if you talk to some of the CTOs in our business now, they will say that is a huge challenge. Providing a very low-latency, high-bandwidth network to every square inch of the United States is simply not possible. So what we are doing is letting data tell us how to do this. We are letting social media data and heat maps tell us where people are consuming. For instance, I actually have a heat map for gamers. So I understand where I need the low-latency circuits versus where I need to provide high bandwidth but not low latency. And why is that important? Because just uniformly saying “Give me everything everywhere” is a business model that won't work. But once you start to see the data, you understand what application is being consumed. That can be done now. You know exactly where consumption is happening. That can be done now. You are able to understand, even in an encrypted packet, whether it is a video or not. All of these use big data technologies, AI and ML, and the latest state-of-the-art cloud technologies. So that, to us, is the frontier for where telcos are headed: a data-driven network where optimization decisions are made remotely, and you let your network data and telemetry tell you what you need to do.
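A data-driven split between low-latency and high-bandwidth treatment could start from something as simple as the application mix per map cell. This Python sketch is hypothetical: the cell IDs, the application classes, the 20 percent gaming-share threshold, and the two-way label are assumptions for illustration, and the traffic classes themselves would come from the kind of encrypted-traffic classification discussed earlier in the interview.

```python
from typing import Dict

def provisioning_plan(heatmap: Dict[str, Dict[str, float]],
                      gaming_share_threshold: float = 0.2) -> Dict[str, str]:
    """heatmap: cell id -> {application class -> traffic}.
    Cells where gaming is a large enough share of traffic are flagged for
    low-latency treatment; all others get high-bandwidth treatment."""
    plan = {}
    for cell, traffic in heatmap.items():
        total = sum(traffic.values())
        gaming_share = traffic.get("gaming", 0.0) / total if total else 0.0
        plan[cell] = ("low-latency" if gaming_share >= gaming_share_threshold
                      else "high-bandwidth")
    return plan

heatmap = {
    "cell_a": {"video": 70.0, "gaming": 30.0},
    "cell_b": {"video": 95.0, "gaming": 5.0},
    "cell_c": {},  # no traffic observed yet
}
print(provisioning_plan(heatmap))
# {'cell_a': 'low-latency', 'cell_b': 'high-bandwidth', 'cell_c': 'high-bandwidth'}
```

Only the cell where gaming is a meaningful share of traffic is marked for low-latency treatment, rather than asking for "everything everywhere."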
MH: You mentioned machine learning as a key to this entire situation. Are you worried at all about artificial intelligence putting you out of a job at some point?
AM: Well, on the contrary. You know, to take that phrase ‘wizards of optimization’ forward, the wizards of optimization will be learning at their respective Hogwarts schools how to do AI and ML first. These are the tools that we absolutely need. They're going to be indispensable and necessary as we enter a world where IoT devices are gonna proliferate. You're going to have tens of millions of devices connected. We won't be able to keep up unless we embrace these new technologies, new tools, next level of scaling. We are entering into a very exciting era of 5G, of IoT, and I think COVID is challenging us to start to imagine how it will all work out.