Unveiling MPEG-I: The next generation of VR and AR audio

The MPEG-I immersive audio standard, recently finalized by the MPEG Audio group (ISO/IEC JTC1/SC29/WG6), is the latest addition to the MPEG-I suite for immersive media. It enables lifelike audio experiences in VR and AR and ensures interoperability across the device and service ecosystem, boosting efficiency and reducing production costs.
The standard (ISO/IEC 23090-4) and the reference software (ISO/IEC 23090-34) support six degrees of freedom (6DoF) immersion, advanced rendering, and efficient storage and streaming. The standard models realistic acoustic environments, including early reflections, reverberation, occlusion, diffraction, and Doppler effects, making navigation in VR/AR more authentic. The modelling capabilities of MPEG-I immersive audio will help make high-quality immersive experiences more accessible and widespread for creators, developers, and end users.
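To give a feel for two of these effects, the sketch below computes simple distance attenuation and a Doppler pitch-shift factor for a single moving point source. This is a minimal Python illustration; the 1/r gain law, function names, and constants are assumptions for the example, not the normative MPEG-I rendering algorithms.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C

def distance_gain(distance_m: float, ref_distance_m: float = 1.0) -> float:
    """Inverse-distance (1/r) attenuation relative to a reference distance.

    Illustrative only; a production renderer uses a more elaborate model.
    """
    return ref_distance_m / max(distance_m, ref_distance_m)

def doppler_factor(radial_velocity_mps: float) -> float:
    """Pitch-shift factor for a source moving relative to a static listener.

    radial_velocity_mps > 0 means the source approaches the listener,
    raising the perceived frequency: f_observed = factor * f_source.
    """
    return SPEED_OF_SOUND / (SPEED_OF_SOUND - radial_velocity_mps)

# A source 10 m away, approaching at 20 m/s:
print(f"gain: {distance_gain(10.0):.3f}")      # 0.100
print(f"doppler: {doppler_factor(20.0):.3f}")  # ~1.062 (about +1 semitone)
```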
While VR audio reproduction transports you into completely imaginary acoustic environments, AR reproduction complements your physical sound environment with additional virtual sounds in a plausible manner. In both AR and VR, the user is completely immersed in the surrounding acoustic environment and can navigate the sound scene while audio sources stay at their respective positions.
Technology driving the standard
The MPEG-I immersive audio standard supports 6DoF rendering of objects, channels, and higher-order ambisonics (HOA) for VR and AR. Nokia’s key contributions to the standard have been in late reverberation rendering, AR support, and rendering of multiple HOA captures.
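As a taste of what ambisonics rendering involves, the following sketch rotates a first-order ambisonics (FOA) frame about the vertical axis to compensate for listener head yaw, the core operation behind head-tracked playback. It assumes ACN channel ordering (W, Y, Z, X) with SN3D normalization and is a toy example only; the actual MPEG-I renderer handles higher orders and full 6DoF translation, which this snippet does not.

```python
import numpy as np

def rotate_foa_yaw(foa: np.ndarray, yaw_rad: float) -> np.ndarray:
    """Rotate a first-order ambisonics signal about the z (vertical) axis.

    foa: array of shape (4, n_samples) in ACN order [W, Y, Z, X].
    A positive yaw rotates the sound field counterclockwise (seen from above);
    to compensate listener head rotation, pass the negated head yaw.
    W (omnidirectional) and Z (vertical) are unaffected by yaw.
    """
    w, y, z, x = foa
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    x_rot = c * x - s * y
    y_rot = s * x + c * y
    return np.stack([w, y_rot, z, x_rot])

# Example: a source encoded straight ahead (azimuth 0°, on the +x axis) ...
n = 4
src = np.ones((1, n))
foa = np.vstack([src, 0 * src, 0 * src, src])  # W=1, Y=0, Z=0, X=1
# ... rotated by 90° ends up on the +y axis (the listener's left):
print(rotate_foa_yaw(foa, np.pi / 2).round(3)[:, 0])  # [1. 1. 0. 0.]
```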
The late reverberation renderer in MPEG-I immersive audio is the first for VR and AR applications that can automatically configure itself to physical acoustic environments, and it also supports connected environments. AR support in MPEG-I immersive audio is enabled by the listening space information interface, which lets applications configure the virtual representation of the physical listening space. Rendering of multiple HOA signal captures lets end users experience full-fidelity 6DoF environments that have been captured live, without costly pre-production.
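To illustrate what configuring reverberation to a physical space can involve, the snippet below estimates a classical reverberation time from room geometry and surface absorption using Sabine's formula (RT60 ≈ 0.161 V / A). This is a textbook approximation chosen purely for illustration; the MPEG-I late reverberator's actual parameterization and auto-configuration logic are defined by the standard, not by this formula.

```python
def sabine_rt60(volume_m3: float, surfaces: list[tuple[float, float]]) -> float:
    """Estimate reverberation time (seconds) with Sabine's formula.

    surfaces: list of (area_m2, absorption_coefficient) pairs.
    RT60 ~= 0.161 * V / sum(S_i * alpha_i)
    """
    absorption = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume_m3 / absorption

# A 6 x 5 x 3 m room: painted walls/ceiling (alpha ~0.10), carpeted floor (~0.30).
walls_and_ceiling = [(2 * (6 + 5) * 3 + 6 * 5, 0.10)]
floor = [(6 * 5, 0.30)]
print(f"RT60 = {sabine_rt60(6 * 5 * 3, walls_and_ceiling + floor):.2f} s")  # ~0.78 s
```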
Nokia envisions that MPEG-I immersive audio will enable novel VR, AR, and extended reality applications in which the immersive audio reproduction automatically configures itself to the space being captured or modelled. The rendering of multiple HOA signals facilitates user-friendly capture and transmission of immersive audio scenes with full listener movement, enabling cost-efficient productions. Key domain areas include sports and entertainment, music, immersive social VR/AR experiences, virtual travel, user-created content, and beyond.
The next step in standardization is to develop the rendering technologies further, based on feedback from application developers for key use cases. Nokia has already taken important first steps in enabling the reference rendering software to run on consumer mobile devices. Nokia is continuously researching and gathering feedback from early adopters of the standard and from content creators on how these state-of-the-art techniques can best serve the audio industry and drive the future of immersive 6DoF audio.
Join us at the 2025 AES Long Beach Convention
At the 2025 AES Long Beach convention, Nokia will showcase these lifelike audio experiences with multiple HOA signals captured using off-the-shelf mobile devices, illustrating the potential of the standardized technologies for a wider addressable market. This makes it possible for mobile phone users to capture complete 6DoF audio scenes with their devices. Nokia will also showcase a novel combination of a microphone-array-captured and an object-based 6DoF scene.
To try the magic for yourself, come and visit us at booth 312 in the AES exhibition area. We look forward to discussing the future of immersive audio with you.