MPEG-I immersive audio

MPEG-I immersive audio is a new standard that enables six degrees of freedom (6DoF) rendering of objects, channels, and Higher-Order Ambisonics (HOA) for extended reality (XR). It keeps listeners fully immersed while they move and interact within virtual or augmented worlds.

How it works

MPEG-I immersive audio pairs encoded audio with compact scene metadata and renders it efficiently in real time to headphones or loudspeakers, adapting to listener movement through the 3D space. It enables lifelike audio in VR and AR and is designed for interoperability across devices and services, supporting efficient production workflows.

At the heart of MPEG-I is a clean split between audio signals and a compact description of the scene. Creators can author channels, audio objects, and HOA, then encode them with physics-informed metadata describing source positions, sizes, materials, rooms, portals, and which elements can change at runtime.
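
To make the split concrete, here is a minimal Python sketch of the kind of scene description that travels alongside the audio signals. The field names are hypothetical and greatly simplified; the standard defines its own normative metadata format.

    # Hypothetical, simplified scene description: MPEG-I carries metadata
    # like this alongside the encoded audio. Field names are illustrative,
    # not the standard's normative format.
    scene = {
        "sources": [{
            "id": "engine",
            "type": "object",              # object, channel bed, or HOA
            "signal": "engine_stream",     # which encoded signal it plays
            "position_m": [2.0, 0.0, 1.2], # metres, scene coordinates
            "extent_m": 0.5,               # apparent source size
            "dynamic": True,               # allowed to move at runtime
        }],
        "geometry": [{
            "mesh": "wall_north",
            "material": "brick",           # drives reflection/occlusion
        }],
        "rooms": [{
            "id": "lobby",
            "rt60_s": 0.8,                 # late-reverberation decay time
            "portals": ["door_to_hall"],   # openings coupling rooms
        }],
    }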

On the device, a renderer fuses that stream with listener tracking (head orientation and, when available, position) and optional Listener Space Information (a description of the playback room) to compute what you should hear. This includes direct sound with appropriate distance attenuation and Doppler, early reflections and late reverberation, occlusion and diffraction around surfaces, and smooth updates as the scene evolves.
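
As a rough illustration of the direct-path step alone, the Python sketch below derives a 1/r distance gain and a Doppler factor from tracked positions and velocities. It is a toy free-field model with hypothetical function names; a real MPEG-I renderer layers reflections, diffraction, occlusion, and reverberation on top of this step.

    import math

    C = 343.0  # speed of sound in air, m/s (about 20 degrees C)

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def direct_sound(src_pos, src_vel, lis_pos, lis_vel, ref_dist=1.0):
        """Toy direct-path model: 1/r gain and a Doppler pitch factor.
        Positions in metres, velocities in m/s. Illustrative only."""
        diff = [s - l for s, l in zip(src_pos, lis_pos)]
        dist = max(math.sqrt(dot(diff, diff)), 1e-6)
        gain = ref_dist / max(dist, ref_dist)  # clamp inside ref distance
        u = [d / dist for d in diff]           # listener -> source direction
        v_lis = dot(lis_vel, u)                # + if moving toward source
        v_src = dot(src_vel, u)                # + if moving away from listener
        doppler = (C + v_lis) / (C + v_src)    # > 1 means pitch shifts up
        return gain, doppler

    # Source 5 m away approaching a still listener at 10 m/s:
    gain, doppler = direct_sound([5, 0, 0], [-10, 0, 0], [0, 0, 0], [0, 0, 0])
    # gain = 0.2, doppler ~ 1.03 (pitch up, as expected for an approach)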

The MPEG-I immersive audio standard was recently finalized by the MPEG Audio group (ISO/IEC JTC1/SC29/WG6). Nokia’s key contributions include late-reverberation rendering, AR support, and rendering of multiple HOA captures.

Core features

Interactive by design

Audio responds to head and body movement, scene changes, and user actions.

Rich spatial cues

Occlusion, diffraction, reflections, source extent, and room reverberation for convincing presence.

Authoring flexibility

Channels, objects, HOA, and multi-point HOA for walkable areas.
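
One intuitive way to picture walkable multi-point HOA is position-dependent blending of the captures around the listener. The inverse-distance weighting below is only an illustrative Python sketch, not the rendering method the standard specifies.

    import math

    def hoa_blend_weights(listener_pos, capture_positions, eps=1e-6):
        """Hypothetical inverse-distance weights for mixing several HOA
        captures as the listener walks between capture points."""
        inv = [1.0 / max(math.dist(listener_pos, p), eps)
               for p in capture_positions]
        total = sum(inv)
        return [w / total for w in inv]

    # Listener a quarter of the way from capture A to capture B:
    print(hoa_blend_weights([1.0, 0, 0], [[0, 0, 0], [4.0, 0, 0]]))
    # -> [0.75, 0.25]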

Room-aware

Optional Listener Space Information aligns virtual sound with real spaces and supports accessibility controls such as focus and reverb reduction. In AR, Listener Space Information is used to blend virtual content with the physical scene; in VR, the acoustic description is carried in the bitstream.
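
A minimal sketch of that split, with hypothetical names: the renderer takes its acoustic description from Listener Space Information when one is available in AR mode, and from the bitstream otherwise.

    def select_acoustics(mode, bitstream_scene, listener_space_info=None):
        """Hypothetical selection logic. In VR the acoustic description
        comes from the bitstream; in AR, optional Listener Space
        Information about the real playback room takes over so virtual
        sources blend with the physical space."""
        if mode == "AR" and listener_space_info is not None:
            return listener_space_info
        return bitstream_scene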

Interoperable

Works with MPEG-H 3D Audio assets and common delivery workflows, and does not preclude other audio codecs.

Streamed and downloaded

Supports both streaming-based and file-based operation for a broad range of use cases.

Mobile-ready

Low-complexity render paths, metadata-driven efficiency, and scaling from headphones to speakers.

Future use cases

MPEG-I immersive audio enables XR applications where immersive audio reproduction configures itself to the space being captured or modelled. Potential areas include:

  • Sports and entertainment
  • Music
  • Immersive social experiences
  • XR communications in industrial and business environments
  • Virtual travel and training
  • User-created content, and more