Building presence, not just connections

What if the next breakthrough in communication isn't about seeing or hearing more, but about feeling present?

We can hear each other. We can see each other. Yet even the most advanced communication technologies rarely make us feel as though we are truly sharing the same space.

That is about to change.

At AES AVARIG 2026 in Paris, from June 30 to July 1, Nokia, Dolby and Fraunhofer IIS will demonstrate how immersive communication, six degrees of freedom (6DoF) experiences and advanced scene rendering can bring people closer to that reality. Visitors will be able to experience first-hand how advances in immersive audio, XR technologies and open standards are creating communication experiences that feel more natural, more interactive and more human.

Exploring next generation communication use cases

For years, the industry has focused on improving the quality and efficiency of voice and video services. The recently standardized 3GPP IVAS codec introduced a new dimension to the conversation by enabling immersive audio experiences that preserve spatial information and create a stronger sense of presence.

As the world’s first spatial audio-enabled voice codec for cellular networks, 3GPP IVAS lays the foundation for immersive communication. What we are demonstrating now builds on that foundation by combining the benefits of IVAS with MPEG-I immersive audio to explore interactive 6DoF experiences and advanced scene rendering. Together, these technologies show how future communication on devices can become richer, more natural and more spatially aware.

This is not a shift away from immersive voice, but an expansion of what becomes possible: from immersive voice to immersive presence.

That distinction matters.

Immersive voice helps us hear conversations more naturally. It can preserve the ambience of the surrounding environment, if listeners choose to include it, and reproduce voices from the direction they come from. Immersive presence takes that idea further. It is about making people feel as though they are inside a shared space, able not only to hear what is happening but also to move through and interact with the scene around them. That could mean true telepresence in a remote location, movement within another audio space, or the ability to step into a preferred virtual environment. Achieving this requires more than a communications standard. It requires a common way to represent, render and interact with sound, as well as visuals, within a shared spatial environment.

This is where MPEG-I immersive audio enters the picture.

Why presence matters

Human communication is inherently spatial. In a physical meeting, we rarely think about how much information we gather from the world around us. We know where people are standing. We recognize who is speaking based on the direction of their voice. We understand movement, proximity and group dynamics almost instinctively.

Most digital communication systems discard much of this information.

Traditional calls reduce conversations to a voice stream. Video conferencing adds visual information but still confines interactions to a flat screen.

As communication increasingly expands into XR environments and spatial computing experiences, those limitations become more apparent.

Future communication systems need to preserve the spatial cues that make real-world interactions feel natural. They need to create presence.

What MPEG-I brings to the table

This is where things get interesting.

Most communication systems today assume that the listener is fixed in one position. Whether you are participating in a voice call, video conference or streaming experience, the content remains largely the same regardless of how you move.

MPEG-I immersive audio introduces a fundamentally different approach.

Rather than delivering the same fixed audio mix to every listener, it enables an interactive spatial scene that can change as the listener moves. Voices, sound sources and participants can be positioned within a three-dimensional environment, while the renderer continuously adapts the experience based on the listener’s position and orientation.

What makes MPEG-I immersive audio so powerful is its ability to support six-degrees-of-freedom audio experiences. We are no longer just reproducing captured audio. We are rendering a spatial scene that people can move through. As the listener changes position, the renderer updates the audio presentation so that the spatial relationships between voices, sound sources and the listener are preserved.

This is what six degrees of freedom, or 6DoF, makes possible. Instead of being locked into a single listening position, users can approach a conversation, move around participants, explore an environment and experience changing perspectives.

The audio behaves much more like it does in the real world.
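To make the idea of a renderer adapting to the listener's position and orientation more concrete, here is a minimal Python sketch. It is not the MPEG-I renderer and does not use any MPEG-I API. The function name `relative_source`, the axis convention (x forward, y left, z up), the yaw-only orientation and the simple inverse-distance gain are all assumptions made for illustration. A full 6DoF renderer would also handle pitch and roll, along with acoustic effects such as reverberation and occlusion.

```python
import math


def relative_source(listener_pos, listener_yaw_deg, source_pos, ref_distance=1.0):
    """Describe a sound source from the listener's point of view.

    Assumed convention for this sketch: x forward, y left, z up, and yaw
    measured counterclockwise when viewed from above. Pitch and roll are
    omitted to keep the example short.
    """
    dx = source_pos[0] - listener_pos[0]
    dy = source_pos[1] - listener_pos[1]
    dz = source_pos[2] - listener_pos[2]

    yaw = math.radians(listener_yaw_deg)
    # Rotate the world-space offset into the listener's frame (inverse yaw).
    x_l = dx * math.cos(yaw) + dy * math.sin(yaw)
    y_l = -dx * math.sin(yaw) + dy * math.cos(yaw)

    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    azimuth = math.degrees(math.atan2(y_l, x_l))  # positive means to the left
    elevation = math.degrees(math.atan2(dz, math.hypot(x_l, y_l)))
    # Inverse-distance attenuation, clamped so that sources closer than
    # ref_distance are not boosted above unity gain.
    gain = ref_distance / max(distance, ref_distance)
    return azimuth, elevation, distance, gain


if __name__ == "__main__":
    talker = (2.0, 0.0, 0.0)
    # Listener at the origin, facing the talker.
    print(relative_source((0.0, 0.0, 0.0), 0.0, talker))
    # Listener turns to the left: the talker moves to the listener's right.
    print(relative_source((0.0, 0.0, 0.0), 90.0, talker))
    # Listener walks toward the talker: distance shrinks and gain rises.
    print(relative_source((1.0, 0.0, 0.0), 0.0, talker))
```

In this sketch the source itself never changes. Only the listener's pose does, and the direction and level at which the source would be presented follow from it. That is the basic difference from a fixed mix delivered identically to every listener.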

That flexibility is where MPEG-I immersive audio becomes especially compelling. It can support object and channel-based audio formats, configurable reverberation for virtual and augmented reality environments, and seamless transitions between VR and AR spaces. It can also support multiple Ambisonics captures, allowing listeners to move between the microphone positions and experience a scene from different vantage points. 
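One simple way to picture moving between several capture positions is a position-dependent crossfade. The Python sketch below weights each capture point by the inverse of its distance to the listener and blends the corresponding audio frames. This is an illustrative assumption, not the method defined by MPEG-I, and the names `capture_weights` and `blend_frames` are hypothetical. Linearly mixing Ambisonics signals is a crude approximation that ignores the direction-dependent processing a real renderer would be expected to apply.

```python
import math


def capture_weights(listener_pos, capture_positions, eps=1e-6):
    """Return normalized inverse-distance weights, one per capture point.

    eps avoids division by zero when the listener stands exactly on a
    capture position, in which case that capture dominates the blend.
    """
    raw = [1.0 / (math.dist(listener_pos, p) + eps) for p in capture_positions]
    total = sum(raw)
    return [w / total for w in raw]


def blend_frames(frames, weights):
    """Weighted sum of equally sized sample frames, one frame per capture."""
    length = len(frames[0])
    return [
        sum(w * frame[i] for w, frame in zip(weights, frames))
        for i in range(length)
    ]


if __name__ == "__main__":
    # Two hypothetical microphone positions, four meters apart.
    mics = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
    # Placeholder sample frames standing in for each capture's audio.
    frames = [[1.0, 0.0], [0.0, 1.0]]

    for listener in [(2.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]:
        weights = capture_weights(listener, mics)
        print(listener, weights, blend_frames(frames, weights))
```

As the listener walks from one microphone position toward the other, the weights shift smoothly, so the blended signal leans toward whichever capture is closest.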

In practice, that means MPEG-I immersive audio can help create experiences that feel less like listening to content and more like being inside it: joining a remote meeting room, walking through a virtual concert space, moving between sound perspectives in a live event, or blending real and virtual audio in an augmented environment. 

This is one of MPEG-I’s important contributions to immersive communication in the future: it turns communication from something users simply receive into something they can actively explore.

Open standards, real collaboration

One of the most rewarding aspects of standards work is seeing different areas of expertise come together.

The demonstration at AES AVARIG showcases technologies contributed by Nokia, Fraunhofer IIS and Dolby, highlighting how collaboration can accelerate innovation and create experiences that no single company could deliver alone.

Each partner contributes expertise from their different technical perspectives. Together, those contributions help create experiences that are greater than the sum of their parts.

For Nokia, this collaborative approach is essential.

Standards succeed because different companies bring their best expertise together and no single company can build an ecosystem alone. The real value comes when technologies developed by different organizations work together seamlessly.

The objective is not to create isolated demonstrations or proprietary solutions. The objective is to create interoperable foundations that allow innovation to scale across networks, devices and applications.

What's new for Nokia since Mobile World Congress?

Visitors who experienced Nokia's immersive communication demonstration at Mobile World Congress earlier this year will notice several important developments.

The most visible change is the introduction of live video communication.

While the MWC demonstration focused primarily on immersive audio communication, the latest version allows participants to appear as themselves within the experience. Using the smartphone's front-facing camera, the remote participant's actual face is captured and displayed on the communication robot.

In this version, the participant appears through live video rather than as an avatar. 

Immersive experiences today have come a long way, and avatars already play an important role in representing users in virtual environments. In our demonstration, we are exploring a complementary approach: enabling communication through live video of the participant. We believe that seeing actual facial expressions while also experiencing spatial audio and 6DoF interaction can deepen the feeling of presence and human connection.

This addition is the next step in transforming the overall experience.

The combination of real-time video, immersive audio and spatial interaction creates a stronger sense of telepresence and human connection.

Our latest version also introduces a portrait-mode user experience that can be comfortably operated with one hand, making the interaction feel familiar despite the advanced technology behind it. This also allows the user to easily move around in a virtual scene in spaces where physical movement is limited.

Equally important is what the demonstration does not require. Many immersive experiences today rely on powerful PCs, tethered systems and bulky head-mounted displays. Nokia's implementation takes a different approach. The experience runs on a mobile platform paired with lightweight XR glasses. That decision was intentional.

Our focus is on technologies that can eventually integrate into real communication ecosystems. Demonstrating immersive communication on a mobile platform helps bridge the gap between experimental technology and practical deployment.

Looking beyond Paris

The most exciting aspect of this work is not a specific device, demonstration or standard. It is the possibility of fundamentally improving how people connect.

Every major advance in communication has expanded our ability to share experiences across distance. Immersive communication represents the next step in that evolution by preserving something that traditional voice and video calls often lose: a genuine sense of presence.

The technologies we are demonstrating today are only the beginning.

As immersive technologies continue to evolve, we will see new possibilities emerge across communication, collaboration, media, entertainment and spatial computing. Many of the most compelling applications have likely not been imagined yet.

The work also continues.

Visitors to AES AVARIG will be among the first to experience this latest evolution of immersive communication. For those unable to join us in Paris, Nokia will continue the journey at the International Broadcasting Convention (IBC) 2026 in Amsterdam this September, where we will showcase new advances in immersive media and communication technologies.

What you see today is not the destination. It is the next step.

That is the power of open standards. They create a foundation that others can build upon, improve and take in new directions.

Building the future of immersive communication will require an ecosystem of innovators, developers, researchers, device makers, operators and content creators working together.

The technology is maturing fast. The opportunities are enormous.

Now it is time to build what comes next.

Join us.

Arto Lehtiniemi

About Arto Lehtiniemi

Arto Lehtiniemi (M.Sc., Dr. Tech) is Head of Immersive Audio Standardization and Research at Nokia, focusing on shaping the future of audio for mobile, VR, and AR. He drives innovations in spatial audio and 6DoF technologies, combining technical expertise with user experience design. As a prolific inventor, Arto has contributed to over 500 patent applications across immersive media, audio, mobile solutions and beyond. His work includes building forward-looking prototypes and enabling next-generation audio experiences. With a strong music background, Arto brings creativity and engineering together to deliver impactful solutions.

Antti Eronen

About Antti Eronen

Antti Eronen is a Principal Researcher at Nokia Technology Standards. He holds an M.Sc. and Ph.D. in signal processing from Tampere University of Technology and has over 20 years of experience in immersive audio innovation. His work spans spatial audio rendering, sound recognition, and multimedia standards. Widely published and a named inventor on numerous patents, Antti is passionate about turning advanced audio research into real-world impact.
