Frequently asked questions
Looking to learn more? Here are some of the most common questions we hear about the IVAS codec, along with the answers we give when speaking with industry experts.
The IVAS (Immersive Voice and Audio Services) codec is standardized and technically validated. Early demonstrations are already available, with more in development.
Widespread commercial deployment across chipsets, devices, and networks is still on the horizon. However, with industry-wide collaboration, a phased rollout could begin within the next couple of years, accelerating as hardware, software stacks, and operator infrastructure mature.
Not necessarily. IVAS can run purely in software, but for power-efficient, low-latency performance in real-time spatial audio mobile calls, it should ideally be implemented in the DSP or chipset.
Mobile voice codecs like AMR, EVS, and IVAS are typically integrated into the baseband processor’s DSP or audio subsystem.
That said, IVAS can be implemented in software, making it feasible for:
- Early prototypes or apps that want to trial IVAS
- Legacy hardware without specialized DSP support
- Over-the-top (OTT) apps with custom media pipelines
No. IVAS is a 3GPP-standardized, algorithmic codec. It uses traditional signal processing, not AI or machine learning, to deliver real-time immersive spatial audio.
Yes. IVAS is input-format agnostic: it supports mono, stereo, multichannel, and object-based audio, and it doesn't require both ends of a call to use the same configuration. You can still receive spatial audio from others even if you're sending mono.
Let’s take an example:
- If your device has only one microphone, IVAS will transmit mono audio using legacy modes (similar to EVS).
- If the other caller is using a spatial audio-enabled device, you’ll experience immersive audio playback, rendered to stereo or mono depending on your output setup.
This flexibility and backwards compatibility allow gradual adoption across a mixed-device ecosystem.
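To make the example concrete, here is a minimal sketch of how a sending device might pick its transmit format from the capture hardware it has available. The helper function and format labels are illustrative, not part of the IVAS API:

```python
def choose_transmit_format(mic_count: int) -> str:
    """Pick a transmit format from the capture hardware available.

    Illustrative logic only: real implementations also consider device
    tuning, enabled features, and the negotiated session parameters.
    """
    if mic_count <= 1:
        # Single microphone: send mono, using EVS-compatible legacy modes.
        return "mono"
    if mic_count == 2:
        # Two microphones: stereo capture is possible.
        return "stereo"
    # Three or more microphones: a parametric spatial format such as MASA
    # can describe the captured sound scene.
    return "masa"


for mics in (1, 2, 4):
    print(mics, "mic(s) ->", choose_transmit_format(mics))
```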
Yes. While smartphones are key enablers of IVAS services, immersive audio for IVAS can also be captured and created by other devices and systems, such as conference phones, ambisonics microphones, and audio mixers.
Yes. IVAS supports multi-channel, object-based, and metadata-assisted audio, which allows rendering multiple speakers in a spatial scene.
However, keep in mind that IVAS is a codec, not a full communications protocol.
Group call functionality depends heavily on how the surrounding call architecture, especially the IMS or OTT service design, handles multiple audio streams.
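As an illustration of what that service layer adds on top of the codec, a conferencing service could assign each remote participant a position across the listener's frontal arc and pass those positions to the renderer as object metadata. The layout helper below is a hypothetical sketch of that service-side logic, not part of IVAS itself:

```python
def conference_layout(participants: list[str], arc_degrees: float = 120.0) -> dict[str, float]:
    """Assign each remote participant an azimuth (degrees, 0 = straight ahead),
    evenly spaced across a frontal arc. The renderer would place each
    participant's audio object at this position."""
    n = len(participants)
    if n == 1:
        return {participants[0]: 0.0}
    step = arc_degrees / (n - 1)
    start = -arc_degrees / 2
    return {name: start + i * step for i, name in enumerate(participants)}


print(conference_layout(["Alice", "Bob", "Chris"]))
# {'Alice': -60.0, 'Bob': 0.0, 'Chris': 60.0}
```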
For immersive audio modes, IVAS supports a flexible bitrate range from 13.2 to 512 kbit/s, enabling use in both low-bandwidth environments and high-quality streaming scenarios.
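As a sketch of how a sender might use that range, the snippet below clamps to an operating point that fits the available bandwidth. Only the 13.2 and 512 kbit/s endpoints come from the statement above; the intermediate steps in the table are assumptions for illustration:

```python
# Assumed operating points in kbit/s; the 13.2 and 512 endpoints come from
# the text above, the intermediate steps are illustrative.
IVAS_BITRATES_KBPS = [13.2, 16.4, 24.4, 32, 48, 64, 80, 96,
                      128, 160, 192, 256, 384, 512]

def pick_bitrate(available_kbps: float) -> float:
    """Return the highest operating point that fits the available bandwidth,
    falling back to the lowest one if the link is very constrained."""
    candidates = [r for r in IVAS_BITRATES_KBPS if r <= available_kbps]
    return max(candidates) if candidates else IVAS_BITRATES_KBPS[0]

print(pick_bitrate(100))   # 96
print(pick_bitrate(10))    # 13.2
```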
IVAS operates through the IMS (IP Multimedia Subsystem) core in VoLTE and VoNR environments. Key requirements include:
- SIP/SDP signalling support for IVAS media negotiation
- Codec integration with EVS, including fallback to fullband EVS if IVAS isn't supported end-to-end
- Support for real-time metadata (e.g., spatial position, head tracking)
While no special RAN hardware is needed, 5G NR and 5G-Advanced offer ideal throughput and latency conditions.
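As a rough illustration of the SIP/SDP signalling requirement above, an IMS client could offer IVAS as its preferred codec with EVS as a fallback. The payload type numbers, clock rates, and fmtp parameters below are placeholders, not the normative 3GPP payload format:

```python
def build_audio_offer() -> str:
    """Compose a simplified SDP audio media block offering IVAS with an
    EVS fallback. Payload types (96/97) are dynamic placeholders and the
    rtpmap/fmtp lines are illustrative, not taken from the 3GPP payload spec."""
    lines = [
        "m=audio 49152 RTP/AVP 96 97",
        "a=rtpmap:96 IVAS/48000/2",   # preferred codec: IVAS (illustrative clock rate)
        "a=fmtp:96 br=13.2-512",      # illustrative bitrate-range attribute
        "a=rtpmap:97 EVS/16000/1",    # fallback: EVS mono
        "a=ptime:20",
    ]
    return "\r\n".join(lines) + "\r\n"

print(build_audio_offer())
```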
IVAS was co-developed by 13 companies under the framework of the IVAS Codec Public Collaboration. Nokia played a key role in this effort, which culminated in IVAS being approved by the global telecommunications standards body 3GPP in Release 18.
Nokia has also proactively developed the Immersive Voice platform—an end-to-end spatial calling solution that supports both IVAS and Opus codecs.
With Nokia Immersive Voice, you can trial IVAS in:
- One-to-one calls
- Multi-party calls
- Audio and video conferencing
- XR communication
Both are immersive audio formats, but they serve different purposes:
MASA (Metadata-Assisted Spatial Audio) describes how discrete sound sources should be positioned in space. It's lightweight and ideal for interactive, real-time rendering.
Ambisonics captures the entire 3D sound field using spherical harmonics. It’s scene-based, capturing a full environment rather than individual sources—commonly used in VR and 360° video.
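The contrast can be sketched in code. A MASA-style parameter set says where the dominant sound comes from and how direct it is, while a first-order ambisonics (FOA) encode spreads the same mono signal across four spherical-harmonic channels. The field names and panning gains below illustrate the general techniques, not the IVAS bitstream:

```python
import math
from dataclasses import dataclass

@dataclass
class MasaParameters:
    """MASA-style description of one time-frequency tile (illustrative fields):
    where the dominant sound comes from and how much of its energy is direct."""
    azimuth_deg: float            # horizontal direction of arrival
    elevation_deg: float          # vertical direction of arrival
    direct_to_total_ratio: float  # 1.0 = fully direct, 0.0 = fully diffuse

def encode_foa(sample: float, azimuth_deg: float, elevation_deg: float):
    """Encode one mono sample into first-order ambisonics (W, X, Y, Z)
    using standard spherical-harmonic panning gains."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    w = sample                               # omnidirectional component
    x = sample * math.cos(az) * math.cos(el) # front-back
    y = sample * math.sin(az) * math.cos(el) # left-right
    z = sample * math.sin(el)                # up-down
    return w, x, y, z

print(MasaParameters(azimuth_deg=30.0, elevation_deg=0.0, direct_to_total_ratio=0.9))
print(encode_foa(1.0, azimuth_deg=30.0, elevation_deg=0.0))
```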
No. Any functioning headphones will work.
However, for the best experience, headphones with head-tracking can take advantage of IVAS spatial metadata for dynamic rendering, offering more lifelike immersion.
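To see what head-tracking adds: before rendering, the scene is counter-rotated by the listener's head orientation, so sources stay anchored in the room rather than turning with the head. Below is a minimal sketch of that yaw compensation for a first-order ambisonics scene, using standard rotation math rather than the IVAS renderer itself:

```python
import math

def rotate_foa_yaw(w: float, x: float, y: float, z: float, head_yaw_deg: float):
    """Counter-rotate a first-order ambisonics frame by the listener's head yaw
    (positive yaw = head turned to the left) so that sound sources stay fixed
    in the room as the head turns. Only the horizontal components change."""
    theta = math.radians(-head_yaw_deg)   # rotate the scene opposite to the head
    x_rot = x * math.cos(theta) - y * math.sin(theta)
    y_rot = x * math.sin(theta) + y * math.cos(theta)
    return w, x_rot, y_rot, z

# A source straight ahead (X=1, Y=0) heard while the head is turned 90 degrees
# to the left ends up at the listener's right (Y becomes -1) after compensation.
print(rotate_foa_yaw(1.0, 1.0, 0.0, 0.0, head_yaw_deg=90.0))
```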
Yes, but with some caveats.
A smartphone with at least two microphones and two speakers can capture and play back spatial audio. But for optimal results, headphones are preferred, as each of your ears receives a dedicated, processed signal. Without headphones, your orientation, distance from the device, and room reflections can affect how the spatial image is perceived.
That’s why device-specific tuning is essential for IVAS implementations. The technology is also adaptable to multi-speaker environments, like cars, meeting rooms, and home entertainment systems.