MASA format and OMASA
MASA (Metadata-Assisted Spatial Audio) is a new spatial audio format designed specifically for compact devices such as mobile phones. It is one of the input formats supported by the IVAS (Immersive Voice and Audio Services) codec. OMASA (Objects with MASA) extends this by enabling audio objects to be encoded together with MASA.

How it works
The MASA format delivers accurate and consistent real-time spatial sound across a wide range of devices and listening environments. It carries the spatial audio scene in just two audio channels, accompanied by spatial metadata that describes the position, movement and characteristics of the sounds in the scene.
This makes MASA a lightweight parametric format that is ideal for immersive calls over mobile networks. It aims to preserve the quality of the original audio using the smallest amount of data needed for coding and transmission, and it avoids the complexity and potential degradation of converting between formats. The result is a true 3D sound field, whether played back over headphones, loudspeakers or other setups.
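To make "two audio channels plus spatial metadata" more concrete, here is a minimal sketch of how a renderer might use parametric metadata. It assumes, as is common in parametric spatial audio, that metadata is given per time-frequency tile as a direction plus a direct-to-total energy ratio. The class `SpatialTile`, its field names and the helper `split_tile_energy` are hypothetical illustrations, not the MASA specification or any codec API.

```python
from dataclasses import dataclass


@dataclass
class SpatialTile:
    """Hypothetical metadata for one time-frequency tile (illustrative names only)."""
    azimuth_deg: float      # horizontal direction of arrival
    elevation_deg: float    # vertical direction of arrival
    direct_to_total: float  # share of the tile's energy arriving from that direction, 0..1


def split_tile_energy(tile: SpatialTile, total_energy: float) -> tuple[float, float]:
    """Split a tile's energy into a directional part and a diffuse (ambient) part."""
    ratio = min(max(tile.direct_to_total, 0.0), 1.0)  # clamp to a valid ratio
    direct = ratio * total_energy      # rendered as sound from (azimuth, elevation)
    diffuse = total_energy - direct    # rendered as enveloping ambience
    return direct, diffuse


# Example: a tile where most of the energy comes from 30 degrees azimuth.
tile = SpatialTile(azimuth_deg=30.0, elevation_deg=0.0, direct_to_total=0.8)
direct, diffuse = split_tile_energy(tile, total_energy=1.0)
print(direct, diffuse)
```

Because the metadata only describes how energy is distributed, the two transmitted channels can stay compact while the renderer recreates the scene for headphones or loudspeakers.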
OMASA builds on this by combining audio object coding with MASA’s spatial metadata. In this approach, individual sound elements, such as voices, instruments or noises, are transmitted as separate mono audio objects alongside the spatial audio scene.
Nokia’s Immersive Voice solution can encode spatial audio into the MASA format using the phone’s built-in microphones.
Core features
Accurate spatial positioning
Place sounds anywhere in 3D space, including above and below the listener.
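Placing sounds above and below the listener means describing direction with both azimuth and elevation. The sketch below converts such a direction into a 3D unit vector. The axis and sign conventions (x forward, y left, z up; positive azimuth to the left) are assumptions chosen for illustration and may differ from those used in MASA or IVAS.

```python
import math


def direction_to_vector(azimuth_deg: float, elevation_deg: float) -> tuple[float, float, float]:
    """Convert azimuth/elevation in degrees to a unit vector.

    Assumed convention: azimuth 0 = straight ahead, positive azimuth to the left;
    elevation +90 = directly above the listener, -90 = directly below.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = math.cos(el) * math.cos(az)  # forward component
    y = math.cos(el) * math.sin(az)  # left component
    z = math.sin(el)                 # up component
    return x, y, z


above = direction_to_vector(0.0, 90.0)   # points straight up
below = direction_to_vector(0.0, -90.0)  # points straight down
print(above, below)
```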
Metadata-driven rendering
Capture audio in real time with irregular mobile microphone arrays and reliably transmit spatial data, even in busy network conditions.
Flexibility and wide compatibility
Adapts automatically to different devices, environments and bitrates while maintaining spatial integrity.
Full object-based control (OMASA)
Adjust, move, or mix voices and ambient sounds independently during calls for a more interactive and user-friendly listening experience.
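Because OMASA sends objects as separate mono signals with their own position data, one object can be changed without touching the MASA scene or the other objects. The following is a minimal sketch of that idea; `AudioObject`, `OmasaFrame`, `adjust_object` and `object_output` are hypothetical names, and the frame layout is a simplified assumption rather than the actual OMASA bitstream.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AudioObject:
    """Hypothetical mono object: its own signal plus position metadata."""
    name: str
    samples: list[float]
    azimuth_deg: float
    elevation_deg: float
    gain: float = 1.0


@dataclass
class OmasaFrame:
    """Hypothetical OMASA-style frame: a two-channel MASA scene plus separate objects."""
    masa_channels: tuple[list[float], list[float]]
    objects: list[AudioObject] = field(default_factory=list)


def adjust_object(frame: OmasaFrame, name: str, *,
                  azimuth_deg: Optional[float] = None,
                  gain: Optional[float] = None) -> None:
    """Move or re-level one object; the MASA scene and other objects are untouched."""
    for obj in frame.objects:
        if obj.name == name:
            if azimuth_deg is not None:
                obj.azimuth_deg = azimuth_deg
            if gain is not None:
                obj.gain = gain
            return
    raise KeyError(name)


def object_output(obj: AudioObject) -> list[float]:
    """Apply the object's gain to its mono signal before spatial rendering."""
    return [s * obj.gain for s in obj.samples]


frame = OmasaFrame(
    masa_channels=([0.0] * 4, [0.0] * 4),
    objects=[AudioObject("talker", [0.5, -0.5, 0.25, 0.0], azimuth_deg=0.0, elevation_deg=0.0)],
)
adjust_object(frame, "talker", azimuth_deg=-45.0, gain=0.5)
print(frame.objects[0].azimuth_deg, object_output(frame.objects[0]))
```

Keeping each object's gain and direction as separate fields is what allows a listener-side control to move or mix one voice while the ambient scene carried by MASA stays as it is.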