How robots gain the gift of sight
Real Conversations podcast | S4 E11 | September 1, 2022
Sebastian holds an M.Sc. in Electrical Engineering and Information Technology from the Technical University of Munich (TUM). He joined its Chair of Media Technology as a doctoral candidate in 2019, focusing his research on computer vision, spatial intelligence, and machine learning.
The clichéd image of a robot is a clunky, boxy machine crashing around the place. But things are changing. Sebastian Eger at the Technical University of Munich explains how the latest research is giving robots the gift of sight, and what this will really mean.
Below is a transcript of this podcast. Some parts have been edited for clarity.
Michael Hainsworth: In a lab in Munich, autonomous mobile robots are being given the gift of sight. This will help them see their surroundings and understand their locations much like humans do. Sebastian Eger is a doctoral candidate at the Chair of Media Technology at the School of Computation, Information, and Technology at the Technical University of Munich.
He tells me he's trying to solve the problem of GPS not working indoors, and of radio-based positioning not being precise enough in busy, crowded locations, from factory floors and office buildings to the outside world at large.
Sebastian Eger: That's right. My PhD research addresses the problem that GPS signals are not available indoors or in dense city areas. To get a very precise localization estimate without GPS, we want to use cameras.
MH: You're finding that a camera is enough to give an autonomous mobile robot the gift of sight?
SE: That's true. Yes. Cameras give us an enormous amount of information about the environment, and they are relatively cheap. Normally, autonomous mobile robots have more than one camera onboard, so we can cover nearly 360 degrees around the robot, and we can process this information to localize the robot and reconstruct the environment.
MH: You're in good company. We know that Elon Musk over at Tesla has been saying that all his smart driving cars need are cameras as well.
SE: That's true. Elon Musk is definitely right that cameras could be used without additional sensors for perceiving the environment, but it's still a research topic. We have to solve a lot of problems to achieve the accuracy we need when relying only on cameras.
MH: A lot of these sight systems, though, also use lidar, bouncing laser beams off objects to determine their distance and composition. Why don't you use lidar?
SE: The main reason we don't use lidar is that we want a very simple system without much overhead. Lidars are currently relatively large sensors compared to cameras, and they are relatively complex: they rotate all the time, they have moving parts, they are relatively expensive, and they need a lot of energy. We want to focus on small agents, like camera-equipped robots or even drones, which don't always carry lidar.
Additionally, we want to support as many objects, agents, or clients as possible, so we don't really want to rely on lidar. We can still support it: if lidar is available, it definitely helps to reconstruct the environment or to do localization. But the camera already gives us enough accuracy for localization.
MH: With cameras, if you have more than one camera, it's kind of like binocular eyes. You can create depth of field and have a sense of three dimensions just using the cameras. You don't need all that extra bulk.
SE: Exactly. With these stereo cameras or depth cameras, the main problem is always the limited field of view. This is a huge advantage of lidar: the 360-degree field of view. But robots and drones are relatively dynamic; just by moving the cameras into the right positions, we can cover almost the whole environment.
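The depth-from-stereo idea described here can be sketched with the standard triangulation formula; the focal length, baseline, and disparity below are illustrative values, not figures from the interview.

```python
# Depth from a stereo camera pair: Z = f * B / d, where f is the focal length
# in pixels, B the baseline between the two cameras in meters, and d the
# disparity in pixels. All numbers here are illustrative assumptions.

def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Triangulate metric depth from stereo disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# A point seen 40 px apart by two cameras 0.1 m apart, with an 800 px focal length:
z = depth_from_disparity(800.0, 0.1, 40.0)  # 2.0 m away
```

The same relation explains the field-of-view trade-off: each stereo pair only triangulates points both cameras can see, which is why multiple cameras, or a moving camera, are needed for coverage.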
MH: I have to say though, as a Canadian, my biggest concern would be that you'd get snow on the cameras, and you wouldn't be able to see where you were going.
SE: That's true. But indoors, normally there's no snow (chuckle).
MH: The camera though ... That's really only the first step in truly making a robot autonomous. You're going to have to process those images too. Right?
SE: Exactly. Even if you have the camera images, the main part is to process these images and get the information out of the images. Therefore, we follow a remote approach, where we offload the heavy computational stuff from the agents to, for example, an edge cloud. As I explained, we want to focus on small agents, and they don't have the resources which we need to process those images.
MH: The idea is that, whether an AMR (autonomous mobile robot) is small or large, you want to offload that processing of the visual images to something that can handle it with an artificial intelligence-based system. That's where edge cloud comes in. I can imagine as well ... This is why 5G is an excellent technology for that, because you need to do all of this in virtually real-time.
SE: Yes. Especially if you want to support multiple agents simultaneously in a very constrained environment, it should run in real time. 5G, with its very low latency, is very helpful in our case. We can then support five or six autonomous mobile robots simultaneously. They all offload their image data, or they can also perform some feature extraction onboard and then transmit the pre-processed data. But the main burden still lies on the edge cloud.
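The trade-off between offloading raw frames and extracting features onboard can be made concrete with a rough uplink estimate; the frame size, descriptor count, and frame rate below are illustrative assumptions, not numbers from the interview.

```python
# Back-of-envelope uplink comparison: raw frames vs. onboard-extracted
# feature descriptors, per robot. All figures are illustrative assumptions.

def uplink_bits_per_second(payload_bytes: int, fps: float) -> float:
    """Uplink rate needed to stream a per-frame payload at a given frame rate."""
    return payload_bytes * 8 * fps

RAW_FRAME_BYTES = 640 * 480 * 3   # one uncompressed VGA RGB frame
FEATURE_BYTES = 1000 * 32         # ~1000 binary descriptors, 32 bytes each

raw_bps = uplink_bits_per_second(RAW_FRAME_BYTES, 30)   # ~221 Mbit/s
feat_bps = uplink_bits_per_second(FEATURE_BYTES, 30)    # ~7.7 Mbit/s
```

The gap of roughly 30x is why partial onboard feature extraction helps when several robots share one 5G cell, even though the heavy mapping work stays on the edge cloud.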
MH: And if you've got multiple AMRs connected to an edge cloud, then I can imagine you're using existing mapping technology, but you're also able to update those maps in real-time. You're essentially crowdsourcing those map updates amongst all the robots that are working within that edge cloud.
SE: Exactly. This is a very big advantage of the centralized approach we follow, where every robot communicates with a single edge cloud, or the nearest edge cloud, and provides its information to the server, where we do all the mapping of the environment.
The map is necessary, for example, for the robots to navigate and localize. This data is then, of course, also made available to the other robots, so they can avoid obstacles that appear, even very dynamic ones.
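A minimal sketch of this centralized map sharing, assuming a simple grid of obstacle cells; `SharedGridMap` and its methods are hypothetical names for illustration, not part of any system described in the interview.

```python
# Toy shared map on the edge cloud: each robot reports obstacle cells it has
# observed (or seen cleared), and every client reads the merged view.

class SharedGridMap:
    def __init__(self) -> None:
        self.obstacles: set = set()  # occupied (x, y) grid cells

    def report(self, robot_id: str, seen: set, cleared: set) -> None:
        """Merge one robot's observation into the shared map."""
        self.obstacles |= seen
        self.obstacles -= cleared

    def is_blocked(self, cell) -> bool:
        return cell in self.obstacles

shared = SharedGridMap()
shared.report("amr-1", seen={(3, 4)}, cleared=set())   # a box falls onto the floor
blocked = shared.is_blocked((3, 4))                    # other robots now route around it
shared.report("amr-2", seen=set(), cleared={(3, 4)})   # another robot sees it removed
```

A real system would attach timestamps and confidence to each cell, but the crowdsourcing idea is exactly this merge: one robot's observation immediately benefits all the others.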
MH: I can imagine many factory floors might be somewhat sterile and static in their environment, but every once in a while, somebody's going to knock a box onto a floor. You're going to have to be able to move around it.
SE: Exactly. Exactly. Even when there are workers moving around, this is very critical for autonomous mobile robots: they need to know exactly where the humans are, so they can avoid touching them or running them over.
MH: I can imagine though; this isn't just for the factory floor or inside office buildings. This technology can be applied to the outside world?
SE: Yes, of course. For example, autonomous driving: nearly every autonomous-driving firm is currently pursuing this technology. They build up a very detailed map of the environment, for example of a city, and they can use this map for navigating their autonomous cars and know exactly if there are new construction sites or problems on the roads.
MH: Currently, my GPS system crowdsources things like that, but it requires me to hit a button as I'm driving past the construction site to tell that GPS system that there's now a construction site there. You're suggesting that these types of technologies will become autonomous themselves?
SE: Exactly, in the future. Even now, many cars are equipped with cameras. They can autonomously collect data while driving around the city and upload it to the digital twin, and the digital twin can then update the map according to the new data.
MH: The concept of digital twins is quite well-received. Understanding an environment by twinning it gives us a whole host of new capabilities and functionalities. Give me an example or some ideas as to what the benefit of a digital twin is, whether on a city street or on a factory floor.
SE: The main idea behind having a digital twin is that you have an exact copy you can monitor. For example, in a factory environment, you have an inventory of everything lying around and you know exactly the state of your machines. What's going on in real time in the real world is also reflected in the digital twin: the robots which are running, or the autonomous cars in a city environment. You always know exactly where they are. This is also where our technology and my studies sit.
Since we want to localize every agent based on its cameras, we upload the images to the edge cloud. There we reconstruct the environment in real time and update it in real time. We build up the digital twin on the edge cloud and then also use this digital twin for localizing other agents, for example based on camera images.
MH: Again, this comes back to the idea that you're crowdsourcing real-time map updates to ensure that all these AMRs are working in concert.
SE: Exactly. This is very critical, especially for digital twins, because you cannot just create a digital twin once and then rely on it never changing.
MH: Right. There's no point in having a digital twin if you're in a dynamic environment and you're not updating that.
MH: What are other use cases that you'll find as a result of this research for things like digital twins through AMR? It seems to me that this is a building-block type technology that you're working on. We can build a whole bunch of new things based around these simple ideas.
SE: Yes. For example, everyone could use this with their smartphone. They could scan their environment and build a 3D model of their house or garden, and then use this model for planning their own layout or checking how furniture would look. Also, maybe in the future, for their own robots, which will, for example, help them when they get old (chuckle).
MH: Well, it's funny you bring that up. Because I was talking to my daughter about a conversation you and I recently had. She's a big Star Wars fan. And in all those Star Wars movies, there are tiny little robots running around the Death Star.
We always wondered, "What on earth would be the value of those tiny little four-wheeled robots?" Maybe in a galaxy far, far away, they're doing the job you're describing right now, which is mapping in real-time a changing environment.
SE: Exactly and helping and cleaning.
MH: What are some of the applications for the metaverse?
SE: For the metaverse, you need either augmented reality glasses or virtual reality glasses, and both of them also need localization. If you want to, for example, walk around in your office while this office is projected into the metaverse, you want a very precise localization of your position, what you're doing, and your viewing angle.
This is where the technology could also apply. For example, all augmented reality glasses have cameras on them. We can use these cameras for mapping and localization of the environment, then project the whole office, for example, into the metaverse, or project 3D objects into the augmented reality glasses for the user to see.
MH: Again, back to the idea of using the edge cloud and 5G to do the heavy lifting on the processing of that. You're not going to want to put a full-fledged computer on your head every time you want to use a pair of augmented reality glasses.
SE: That's true. Yes. Here too, you need a very low-latency network, because pose estimation is very time-critical. If it isn't fast enough, the user will get sick very quickly because of motion sickness: if the motion in the real world doesn't line up with the virtual part, the user gets sick. We want to avoid this. Therefore, this also has to run in real time, and we offload the pose estimation and the processing of the images onto the edge cloud.
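The real-time constraint described here is often framed as a motion-to-photon budget: every stage of the offloaded pipeline must fit inside it. The 20 ms threshold and the stage timings below are illustrative assumptions, not measurements from this work.

```python
# Motion-to-photon budget check for offloaded pose estimation: the network
# round trip plus processing and rendering must stay under a small threshold
# to avoid motion sickness. All timings here are illustrative assumptions.

def fits_budget(uplink_ms: float, processing_ms: float,
                downlink_ms: float, render_ms: float,
                budget_ms: float = 20.0) -> bool:
    """True if the full offloaded pipeline fits the motion-to-photon budget."""
    return uplink_ms + processing_ms + downlink_ms + render_ms <= budget_ms

fits_budget(2.0, 8.0, 2.0, 5.0)    # True: 17 ms total on a low-latency 5G link
fits_budget(25.0, 8.0, 25.0, 5.0)  # False: a high-latency link blows the budget
```

The point of the sketch is that the network terms dominate the decision: with tens of milliseconds of one-way latency, no amount of edge compute saves the budget, which is why 5G matters here.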
MH: Back in the early days of the current generation of VR (virtual reality), what made it possible was the chips that could handle real-time positioning and micro movements. That would allow the computers to also project the image into your eyeballs in virtual real-time, so that you wouldn't get that VR sickness.
This sounds like another one of those foundational technologies for that next generation leap in augmented reality. Where it's important that not only does it know where our head is moving, but has the speed to process it, and then project an augmented component on top of the real world.
SE: Exactly. Yes. The rather old pose estimation gives you only three degrees of freedom, not six. With six degrees of freedom, you can really walk around and track yourself for the metaverse, instead of just moving your head around and getting different views.
MH: The early version of Pokémon Go was an example of first-generation AR (augmented reality). But the problem was the Pokémon would walk in front of things they should be walking behind. Your technology will give us the ability to understand in 3D where an object should be occluded, so that it's behind an object, not hovering in front of it.
SE: Exactly. If you move, for example, the app also knows that you are moving, and the projected object will be moving in your scene. You could, for example, go behind it and view it from a different angle. Instead of just moving it around.
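The occlusion handling described here boils down to a per-pixel depth comparison; a minimal sketch, assuming the reconstructed real-world depth is available for each pixel (which is exactly what the mapping pipeline provides).

```python
# AR occlusion test: render the virtual object at a pixel only if it is
# nearer to the viewer than the reconstructed real-world surface there.

def visible(virtual_depth_m: float, real_depth_m: float) -> bool:
    """True if the virtual object is in front of the real surface at this pixel."""
    return virtual_depth_m < real_depth_m

visible(1.5, 2.0)  # True: a Pokémon 1.5 m away floats in front of a wall at 2 m
visible(2.5, 2.0)  # False: the wall occludes it, so the pixel shows the wall
```

First-generation AR had no reliable real-world depth, so everything rendered as if `visible` were always true, which is why objects hovered in front of things they should have been behind.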
MH: Nokia's Tech Vision 2030 sees digital-physical fusion as one of the major drivers for advanced networking. By 2030, how far advanced will this technology be?
SE: By 2030, it will be very far along, because the autonomous-driving firms and their technology are already pushing this very fast; they need it to operate in cities and on highways. It will be doable for everyone to create a digital twin and map their own environment, which is already possible today, but with an accuracy that can also be used, for example, for tracking AR and VR glasses.
MH: Let's revisit what the implications are for the telecom industry. For the CSPs. Where do you see 5G fitting into giving AMRs the gift of sight?
SE: 5G is critical because we need its very low latency so we can offload and transmit information between the autonomous mobile robots and the edge cloud for all kinds of computational tasks, like mapping, object detection, and image segmentation.
Therefore, it's very critical. With growing numbers of robots and clients, for example also smartphones, it will be a real scalability challenge to support everyone who relies on the computational performance of the edge cloud.
MH: I've always believed that we don't get big leaps in technology by just one advance. It's the coming together of multiple technologies. It sounds like in this case, we're talking about cameras, 5G, and edge cloud giving us a whole new opportunity to open up a whole new world.
SE: That's true. This combination is the best of three worlds. Cameras are very good sensors in terms of providing us with enough information, but we need a lot of computational effort to process these images, and the edge cloud provides exactly that.
We can put a lot of processing power into the edge cloud, and then the 5G network connects both entities with very low latency. The user will not even know that the whole computation did not happen on their own device. It will be very cool in the future that everyone will be able to use this.