The next frontier of gaming: virtual worlds that move like real ones

Woman wearing glasses looking up at cherry blossom tree, with blurred pink, purple tree outline.

The next breakthrough in XR gaming won’t come from better hardware alone — it will come from next-generation multimedia technology and networks that operate at the speed of human perception. As immersive experiences edge closer to realism, the true differentiator is no longer pixels, processing power, or form factor. It is how effortlessly digital worlds respond to human movement, sound, and intent, in real time.

The hallmark of a truly immersive system lies not in what is seen when the hardware is worn, but in how easily the user forgets it is there at all. 

Reaching this goal has been the guiding aim of immersive gaming for years. Naturally, it has fueled a relentless hardware race, with each new generation of Extended Reality (XR) devices arriving with higher-resolution displays, lighter frames, and more powerful processors. As a result, gamers today can choose from cutting-edge smart glasses, intelligent wearable headsets and increasingly capable spatial computing platforms.

Yet even as hardware advances rapidly, many users still sense something is “off” once an experience begins. While today’s systems offer a glimpse of an alternative reality, there remains a gap between technically impressive XR and true immersion. To bridge this gap, we must look beyond visual fidelity and graphics pipelines toward the science of the ‘flow state’: a condition in which challenge, skill, and focus are in balance, allowing attention to fully engage and the outside world to fade away.

True immersion during gameplay  

Near-total immersion is a peak experience familiar to gamers. It happens when your reflexes take over, your awareness narrows, and the game world feels immediate and responsive. In these moments, every movement, sound, and visual cue works in harmony to sustain engagement. When systems respond seamlessly, players remain absorbed in the experience rather than thinking about the technology enabling it.

As gaming evolves toward spatial computing, shared XR environments, and persistent virtual worlds, this level of responsiveness becomes increasingly critical. Improvements in resolution, form factor and onboard computing continue to matter — but they now deliver diminishing returns on immersion by themselves.

Further progress depends on understanding and supporting the factors that sustain attention, engagement, and presence over time. This means designing systems that respond within the tight temporal limits of human perception — from motion latency to spatial audio cues — so interactions feel natural, instinctive, and unforced. At its core, immersion is a “human problem,” not just a hardware one. 

What enables sustained immersion

Supporting flow-driven experiences requires a complex symphony of technologies working in unison behind the scenes. Our research and innovation focus on closing the perception gap, ensuring that networks, media technologies and immersive formats operate at speeds and scales aligned with human sensory perception. 

Flow state, when attention narrows and action feels effortless, depends on precision. Even small disruptions between physical motion and digital response can break immersion instantly. This is where Nokia’s long-standing expertise in networking, multimedia experience, and standards development becomes instrumental.

Ultra-low latency connectivity: the nervous system of immersion

Ultra-low latency connectivity acts as the nervous system of immersive environments. Micro-delays between gesture, sound, and visual feedback are not merely technical issues; they are perceptual ones. Networks must deliver consistent, predictable responsiveness so movement translates instantly into action.

By designing end-to-end systems — from media encoding to transport — that operate within human reaction thresholds, immersive experiences remain fluid, responsive, and believable.
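To make the idea of “operating within human reaction thresholds” concrete, the sketch below sums the stages of a hypothetical networked XR pipeline against a motion-to-photon target. Every stage timing and the 20 ms threshold are illustrative assumptions, not measurements of any Nokia system:

```python
# Illustrative motion-to-photon latency budget for a networked XR pipeline.
# All stage timings and the 20 ms threshold are assumed values for
# illustration only, not measurements of any specific system.

PERCEPTION_THRESHOLD_MS = 20.0  # commonly cited motion-to-photon target for VR

# Hypothetical per-stage delays, in milliseconds
stages = {
    "sensor sampling": 2.0,
    "pose prediction": 1.0,
    "render": 8.0,
    "encode": 3.0,
    "network transport": 4.0,
    "decode + display scan-out": 5.0,
}

total = sum(stages.values())
for name, ms in stages.items():
    print(f"  {name:<26} {ms:>5.1f} ms")
print(f"end-to-end: {total:.1f} ms")

if total <= PERCEPTION_THRESHOLD_MS:
    print("within the assumed perception threshold")
else:
    print(f"exceeds the assumed threshold by {total - PERCEPTION_THRESHOLD_MS:.1f} ms")
```

With these assumed numbers the budget overshoots by a few milliseconds, which is exactly the point: no single stage is slow, yet the sum breaks the perceptual limit, so every stage from encoding to transport must be co-designed.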

Spatial audio that moves like reality

Sound plays a decisive role in how we orient ourselves and react instinctively. As a primary inventor and leading developer of 6DoF (Six Degrees of Freedom) spatial audio, Nokia enables users to move naturally through a sound scene in all directions, with full head rotation.

As one of the leading contributors to the MPEG-I standard (ISO/IEC 23090-4), we helped define the framework that makes this possible, allowing immersive audio to respond dynamically as users navigate virtual spaces. When sound behaves as it does in the real world — accurate, immediate, and spatially consistent — it strengthens presence and deepens emotional engagement.

Beyond the speed of movement, the sense of space is defined by the authenticity of the soundscape. High-fidelity spatial audio is essential for orientation, giving the brain the precise sensory cues it needs to navigate and react instinctively to virtual threats and environments. When audio feels real and arrives faster than conscious thought, it deepens the user's emotional and physical connection to the experience. This precision, coupled with advanced visual processing, keeps image quality consistently high without introducing the processing delays that pull a user out of immersion.
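At its core, a 6DoF audio renderer must continuously recompute each source's direction and distance in the listener's head-relative frame as the player moves and turns. The toy sketch below illustrates just that geometric step; the function name, the yaw-only rotation, and the simple inverse-distance gain are illustrative simplifications, not the MPEG-I renderer:

```python
import math

def relative_audio_cues(listener_pos, listener_yaw_deg, source_pos):
    """Return (azimuth_deg, distance, gain) of a sound source in the
    listener's head-relative frame. Toy model: 2D positions (x, z),
    yaw-only head rotation, inverse-distance gain."""
    dx = source_pos[0] - listener_pos[0]   # offset to the listener's right
    dz = source_pos[1] - listener_pos[1]   # offset ahead of the listener
    distance = math.hypot(dx, dz)
    # World-frame bearing of the source, then subtract the head yaw
    bearing = math.degrees(math.atan2(dx, dz))
    azimuth = (bearing - listener_yaw_deg + 180.0) % 360.0 - 180.0
    gain = 1.0 / max(distance, 1.0)        # clamp to avoid blow-up up close
    return azimuth, distance, gain

# Listener at the origin facing +z; source 2 m ahead and 2 m to the right
az, dist, gain = relative_audio_cues((0.0, 0.0), 0.0, (2.0, 2.0))
print(az, dist, gain)   # ~45 deg to the right, ~2.83 m away

# Same source after the listener turns 45 deg to the right:
az2, _, _ = relative_audio_cues((0.0, 0.0), 45.0, (2.0, 2.0))
print(az2)              # ~0 deg: now directly ahead
```

The second call shows why head tracking matters: the world hasn't changed, but the head-relative cue must update instantly or the soundscape detaches from the player's motion.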

Ultimately, these innovations work together to address the "human problem" of immersion rather than just the hardware specs. By integrating responsive visual compression with instantaneous feedback loops, we can sustain the delicate balance of focus and skill known as the flow state.  

Man and woman looking into a large LCD screen showing a vivid swirling neon bubble vortex

Volumetric video and interactive holographic experiences

Visual immersion increasingly extends beyond flat or stereoscopic video. Nokia has developed and showcased the world’s first standards-based real-time volumetric video communication system, built on Visual Volumetric Video-based Coding (V3C) and MPEG Immersive Video (MIV) standards.

This system leverages existing 2D video coding tools, making volumetric experiences more scalable and cost-efficient than approaches that rely on proprietary pipelines. It can achieve glass-to-glass latencies of around 160 milliseconds, comfortably within conversational interaction thresholds and below the latency of traditional teleconferencing services.

This capability opens the door to future interactive holographic experiences — where players and participants are no longer passive observers, but active elements inside spatially accurate digital scenes.

Compression that makes immersion practical

Behind every immersive experience lies advanced video compression. Nokia’s inventors have been deeply involved in the development of every major, market-adopted video codec, from H.264/AVC in the early 2000s to H.266/Versatile Video Coding (VVC), completed in 2020.

Each generation has roughly halved the required bitrate without compromising picture quality. This continuous evolution is essential for XR and gaming, where high visual fidelity must be delivered reliably, at scale, and with minimal latency, often over wireless and cloud-based networks.
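As a back-of-the-envelope illustration of that halving trend (the 20 Mbit/s baseline is an arbitrary assumption, and real-world savings vary by content and encoder):

```python
# Rough bitrate needed for comparable quality if each codec generation
# roughly halves the bitrate of its predecessor. The 20 Mbit/s H.264
# baseline is an arbitrary illustrative figure, not a measurement.
generations = ["H.264/AVC", "H.265/HEVC", "H.266/VVC"]
baseline_mbps = 20.0

for i, codec in enumerate(generations):
    print(f"{codec:<11} ~{baseline_mbps / 2 ** i:.1f} Mbit/s")
# H.264/AVC   ~20.0 Mbit/s
# H.265/HEVC  ~10.0 Mbit/s
# H.266/VVC   ~5.0 Mbit/s
```

Two generational steps cut the bandwidth to a quarter, which is what makes streaming high-fidelity XR content over wireless networks plausible at all.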

As a primary contributor to MPEG immersive media standards (MPEG-I), including V3C, Video-based Point Cloud Compression (V-PCC), and MIV, Nokia is helping define how immersive content is captured, compressed, transported, and experienced globally.

Closing the perception gap

Sharper graphics and faster processors will continue to matter — but they are no longer sufficient on their own. The next leap in XR gaming comes from closing the perception gap: aligning networks, media technologies, and immersive formats with the way humans see, hear, and react.

By combining ultra-low latency networking with advanced spatial audio, volumetric video, and highly efficient video compression, Nokia is helping ensure that immersive worlds don’t just look realistic — they move, sound, and respond like real ones.

When technology operates at the speed of human perception, it fades into the background. What remains is presence, flow, and experience.

Discover how Nokia’s technology is pioneering immersive player experiences.
Learn more: https://www.nokia.com/licensing/patents/gaming/

Döne Bugdayci Sansli

About Döne Bugdayci Sansli

Döne Bugdayci Sansli (M.Sc.) is a Principal Researcher at Nokia, where she works on next-generation video coding and compression, turning cutting-edge research into practical contributions to global standards. In addition to algorithm development for video compression standards, she has previously worked on HDR/WCG image and video as well as mobile imaging.

Connect with Döne on LinkedIn
