Compelling lifelike audio experiences: MPEG-I Immersive Audio is the future for VR and AR

by Kari Järvinen, Sujeet Mate

5 Dec 2023

No Virtual or Augmented Reality (VR, AR) experience is viable without compelling audio. Immersive audio gives the listener the sensation of being fully immersed in a sound scene. The surrounding sound field continuously adapts to the user’s spatial movements and head orientation to create a feeling of being truly present in the scene – just like in real life.

Audio is of paramount importance for the overall immersive experience, especially in musical performances and live events. Now, imagine enjoying these in the privacy of your own home, moving virtually around the event venue through VR or AR and experiencing the audio scene from any position. You could, for example, immerse yourself in a concert experience, whether by joining the cheering crowd or even the band on stage, surrounded by the most realistic audio.

Furthermore, lifelike immersive audio will make virtual travel and real estate exploration much more enjoyable and engaging. When you visit a new city or a shopping mall, AR immersive audio can bring you information about interesting things nearby and guide you to them as if a personal guide were walking beside you. Augmented audio can deliver sounds from nearby places, tailored to your personal interests, so that you will no longer miss anything worth visiting.

These are just a couple of examples of the useful and exciting applications enabled by immersive audio. And the best part is that to experience this new, rich, and lifelike soundscape, all you have to wear are tiny earbuds – which you might wear in any case.

The new MPEG-I Immersive Audio standard

The MPEG-I Immersive Audio standard is currently being finalized by the MPEG Audio group (ISO/IEC JTC1/SC29/WG6). The Committee Draft is expected to be released in 2024, and the final standard will follow one year later, in 2025.

As one of the latest additions to the MPEG-I suite of standards for immersive media, MPEG-I Immersive Audio will bring lifelike immersive audio experiences into AR and VR. Together, the MPEG-I standards will ensure interoperability between all parts of the ecosystem, thus improving efficiency and reducing costs. These standards will be easily available to serve the needs of the media industry. Creators will be able to make high-quality content for a wide range of applications and products, whilst developers will benefit from the rich set of features and guaranteed high quality. VR, AR, and the metaverse in general will become more affordable and widespread.

MPEG-I Immersive Audio covers technologies for six degrees of freedom (6DoF) immersion including 6DoF rendering and supporting metadata with bitstream syntax for efficient storage and streaming. These will further improve immersive experiences for the end user. The new technology supports audio sources with spatial extent and unique sound characteristics of a real-life environment, such as different musical instruments. Rendering uses modelling to create realistic acoustic environments, like different room sizes, and geometry to describe the elements that affect sound, like walls, doors, and furniture. It also supports complex acoustic phenomena, such as echoes, blocked sounds (occlusion), bending of sounds (diffraction) and changes in sound wave frequency due to motion (the Doppler shift). The rendering even considers the acoustic properties of different materials used in, for example, walls and furniture, making the experience more authentic. This will make navigating in the VR or AR world with 6DoF, including 3D spatial navigation (left-right, forward-back, up-down) and head rotation (roll, pitch, yaw), even more realistic for the end user.
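To make two of the terms above more concrete, here is a minimal Python sketch of a 6DoF listener pose and of the classical Doppler shift. It is an illustration only and is not taken from the MPEG-I Immersive Audio specification: the names Pose6DoF and doppler_frequency are hypothetical, the axis naming is an assumption, and the formula is the textbook one for subsonic motion in still air.

```python
import math
from dataclasses import dataclass

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at 20 °C


@dataclass
class Pose6DoF:
    """Hypothetical listener pose: three translations plus three rotations."""
    x: float = 0.0      # left-right, metres
    y: float = 0.0      # forward-back, metres
    z: float = 0.0      # up-down, metres
    roll: float = 0.0   # radians
    pitch: float = 0.0  # radians
    yaw: float = 0.0    # radians

    @property
    def position(self):
        return (self.x, self.y, self.z)


def doppler_frequency(f_source_hz, source_pos, source_vel,
                      listener_pos, listener_vel=(0.0, 0.0, 0.0),
                      c=SPEED_OF_SOUND_M_S):
    """Classical Doppler shift for subsonic motion in still air."""
    # Vector from the source to the listener, and its length.
    delta = [lp - sp for sp, lp in zip(source_pos, listener_pos)]
    distance = math.sqrt(sum(d * d for d in delta))
    if distance == 0.0:
        return f_source_hz  # direction undefined; leave the frequency unchanged
    unit = [d / distance for d in delta]
    # Speed of the source towards the listener (positive when approaching).
    v_source = sum(v * u for v, u in zip(source_vel, unit))
    # Speed of the listener towards the source (positive when approaching).
    v_listener = -sum(v * u for v, u in zip(listener_vel, unit))
    return f_source_hz * (c + v_listener) / (c - v_source)


listener = Pose6DoF(x=10.0)
# A 440 Hz source at the origin moving towards the listener at 34.3 m/s.
approaching = doppler_frequency(440.0, (0.0, 0.0, 0.0), (34.3, 0.0, 0.0),
                                listener.position)
# The same source moving away from the listener at the same speed.
receding = doppler_frequency(440.0, (0.0, 0.0, 0.0), (-34.3, 0.0, 0.0),
                             listener.position)
print(f"approaching: {approaching:.1f} Hz, receding: {receding:.1f} Hz")
```

In this example the pitch rises when the source approaches and falls when it recedes. A real renderer would apply such a shift continuously as the source and the listener move.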

Sounds are different for VR and AR

As noted above, MPEG-I Immersive Audio has been designed to provide compelling, real-life audio experiences for both VR and AR. The end user of this technology will be fully immersed in the vivid audio of lifelike scenes, such as a walk in a forest, while being able to interact with the virtual or augmented world.

VR and AR differ in one important respect. In VR, the acoustic properties of an audio scene are known during content creation, whereas AR rendering uses the acoustic properties of the actual real-life space around the user. Immersive audio in AR applications adapts to the user’s real-life surroundings and creates the impression that the sound source is truly present in the same physical space. MPEG-I Immersive Audio will use binaural rendering for headphones, as most VR and AR applications do today, but it will also support rendering to loudspeakers.
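As a rough illustration of what the "acoustic properties" of a space can mean, the sketch below estimates a room's reverberation time with Sabine's classical formula, RT60 = 0.161 · V / A, where V is the room volume in cubic metres and A is the total absorption (each surface area multiplied by its absorption coefficient). This is a textbook approximation, not the method defined by MPEG-I Immersive Audio. The function name sabine_rt60, the room dimensions, and the absorption coefficients are all hypothetical values chosen for the example.

```python
def sabine_rt60(volume_m3, surfaces):
    """Estimate reverberation time in seconds with Sabine's formula.

    surfaces: iterable of (area_m2, absorption_coefficient) pairs.
    """
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    if total_absorption <= 0.0:
        raise ValueError("total absorption must be positive")
    return 0.161 * volume_m3 / total_absorption


# Hypothetical 5 m x 4 m x 2.5 m room; the coefficients are illustrative only.
room_volume = 5.0 * 4.0 * 2.5  # 50 cubic metres

# All 85 square metres of floor, ceiling, and walls are weakly absorbing.
bare_room = [(85.0, 0.1)]
# Same room, with the 20 square metre floor covered by an absorptive material.
furnished_room = [(65.0, 0.1), (20.0, 0.5)]

rt60_bare = sabine_rt60(room_volume, bare_room)
rt60_furnished = sabine_rt60(room_volume, furnished_room)
print(f"bare: {rt60_bare:.2f} s, furnished: {rt60_furnished:.2f} s")
```

The more absorptive room has the shorter reverberation time. In VR a value like this could be fixed when the content is authored, whereas in AR it would have to reflect the room the listener is actually standing in.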

The user will be fully immersed in the vivid audio of lifelike scenes, while being able to move around and interact with the virtual or augmented world.

Nokia leads in the development of MPEG-I Immersive Audio

Nokia has been active in MPEG-I Immersive Audio standardization from its very beginning in early 2017. As one of the leading developers, we are also part of the group of companies whose technology was selected as the baseline for the new standard by winning the Call for Proposals. Furthermore, Nokia is among the key contributors to the ongoing post-selection improvement work (Core Experiments). We have also played a significant role in the creation of the software tools necessary for the evaluation and development of new 6DoF immersive audio technology. Currently, Nokia is actively driving the finalization of the standard by improving the existing baseline technology and extending it with new features.

Our researchers continue to lead in the development of 6DoF immersive audio for Virtual and Augmented Reality. With over 30 years of world-leading audio research and innovation, our advanced immersive audio technologies enable compelling, lifelike audio for both virtual and augmented use.

About Kari Järvinen

Kari Järvinen (M.Sc., Lic.Sc. (Tech.)) is a Distinguished Scientist at Nokia Technologies and a Nokia Bell Labs Fellow. He is an internationally acclaimed expert in voice and audio compression/transmission with about 20 years’ experience as chairman of working groups in ETSI and 3GPP standardization.

About Sujeet Mate

Sujeet Mate (M.S., Dr. Tech) is a Principal Researcher at Nokia and a Nokia Bell Labs Distinguished Member of Technical Staff. His expertise is in end-to-end multimedia systems. He contributes to immersive multimedia standards in MPEG Audio/Systems (ISO/IEC JTC1/SC29/WG 6/3) and 3GPP SA4.
