Audio communication gets real

by Adriana Vasilache, Lasse Laaksonen

7 Mar 2023

Imagine having a call with your friends or family who live in another country and hearing their voices as if you were all in the same place, together. Or imagine walking on a beach and calling a loved one to share the richness of the soundscape with waves, seagulls, and wind in your hair. With spatial audio, this becomes reality. Even a mundane teleconference becomes a lot more enjoyable as each participant’s voice is placed in a different direction, making it sound like you’re having a conversation around the same table. Keep reading to find out more!

Traditional voice communication has focused on mono

Nokia has played an integral part in developing voice coding technologies for each generation of cellular standards since the very beginning of mobile communication. These 3GPP voice codec standards are crucial components of the communication systems that make today’s connected world possible.

Voice codecs help transmit speech signals as accurately as possible between callers, enabling the people speaking on the phone to hear each other clearly. Low latency between the moment a word is spoken and the moment the corresponding sound reaches the receiver prevents talking over each other and ensures that the conversation flows naturally.

Traditionally, only the speech signal has been transmitted whilst background signals have been removed as unwanted noise. Any real-life sound environment, however, is much more complex than just speech and noise: various sounds and sound reflections arrive from every direction, creating a feeling of presence in a space.

Until recently, voice communication has focused on mono: a single audio channel. Even today, most phone calls are made by lifting the phone to one’s ear. This means that the speech signal can be captured with only one microphone and reproduced with only one speaker, which fails to convey the full sound experience. Most smartphones already have more than one microphone, making it possible to capture more audio channels. Using more than one audio channel to transmit the full sound experience is paving the way for what is known as spatial audio.

IVAS – the new spatial voice and audio codec standard

3GPP SA4, the technical specification group responsible for codecs, such as the Enhanced Voice Services (EVS) codec, is now breaking new ground in spatial audio with the standardization of the EVS Codec Extension for Immersive Voice and Audio Services (IVAS). The IVAS codec builds on the success of the EVS codec that is already widely deployed in various devices and networks around the world. Again, Nokia is an essential contributor to the development of this codec.

The IVAS codec will be interoperable with EVS and offers a straightforward upgrade for 3GPP voice services. IVAS is not merely an extension of the old mono channel world of communications, but a true revolution – with IVAS, for the first time ever, mobile communication becomes spatial.

Immersive calls will capture and share the full spatial audio scene. And with the novel Nokia-proposed metadata-assisted spatial audio (MASA) format, any device with multiple microphones can capture spatial audio for IVAS, without the need for special spherical microphone arrays. The IVAS MASA format is specifically optimized for the direct spatial audio pick-up from smartphones. In addition, this format can be derived from other immersive formats via suitable conversions.

The IVAS codec also brings new features to the receiver’s side, where encoded audio is decoded and rendered for listening. In mobile communication and other use cases focused on personal devices, headphone playback is often the most important way of consuming audio content. Head-tracking technology unleashes the full capabilities of IVAS and the magic of spatial audio: just by turning your head toward a particular person, you will be able to hear their voice more clearly, just like in real life.
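To make the head-tracking idea concrete, here is a minimal sketch in Python. It is not the IVAS renderer: real headphone rendering is binaural and far more sophisticated, and this example swaps in simple constant-power stereo panning purely for illustration. The function names, the participant names, and the angle convention (degrees, positive to the right) are all assumptions made for this sketch.

```python
import math


def relative_azimuth(source_deg: float, head_yaw_deg: float) -> float:
    """Source direction relative to where the listener faces, wrapped to [-180, 180)."""
    return (source_deg - head_yaw_deg + 180.0) % 360.0 - 180.0


def pan_gains(azimuth_deg: float) -> tuple[float, float]:
    """Constant-power (left, right) gains.

    Simplification: directions beyond the sides are clamped to [-90, 90],
    so this toy model cannot tell front from back.
    """
    clamped = max(-90.0, min(90.0, azimuth_deg))
    theta = (clamped + 90.0) / 180.0 * (math.pi / 2.0)  # 0 = hard left, pi/2 = hard right
    return math.cos(theta), math.sin(theta)


def render_gains(participants: dict[str, float],
                 head_yaw_deg: float) -> dict[str, tuple[float, float]]:
    """Per-participant stereo gains after compensating for the listener's head yaw."""
    return {
        name: pan_gains(relative_azimuth(azimuth, head_yaw_deg))
        for name, azimuth in participants.items()
    }


# Hypothetical three-person call, one direction per participant.
call = {"Ana": -60.0, "Ben": 0.0, "Chen": 60.0}
facing_front = render_gains(call, head_yaw_deg=0.0)
facing_chen = render_gains(call, head_yaw_deg=60.0)  # listener turns toward Chen
```

With the listener facing forward, Chen is louder in the right ear than in the left. After the listener turns 60 degrees toward Chen, Chen's left and right gains become equal, so the voice sits straight ahead while the other participants shift to the left.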

Completing the IVAS standard

The IVAS Public Collaboration project was launched in May 2022 with the target of developing a joint candidate for the IVAS codec that meets the objectives of the corresponding 3GPP Work Item, originally launched in 2017. This multi-party development, open to all 3GPP members, is carried out in a transparent and collaborative manner with relevant information publicly available. We are currently on the final stretch of the development work for the IVAS codec selection.

The IVAS standardization process is based on a set of 3GPP-agreed documents, such as design constraints and performance requirements, which have been thoroughly debated over recent years by industry experts. All selection phase documents will be finalized this spring in time for the candidate codec submission. This will be followed by rigorous testing at neutral test laboratories, with a total budget of more than one million euros reserved for testing.

The IVAS codec standardization is expected to be completed in time for 3GPP Release 18 in 2023, delivering the biggest leap forward in voice communication since the very first mobile call.

About Adriana Vasilache

Adriana Vasilache, PhD, is a Principal Researcher at Nokia and a Nokia Bell Labs Fellow. Her expertise is in data modelling and compression, and she is an active contributor to voice and audio codec development and standardization.

Connect with Adriana on LinkedIn

About Lasse Laaksonen

Lasse Laaksonen is a Principal Researcher at Nokia Technologies and a Nokia Bell Labs Distinguished Member of Technical Staff. He heads voice and audio codec development and standardization at Nokia, and he is the Nokia Head of Delegation in 3GPP SA4.

Connect with Lasse on LinkedIn