Transparent AI in Nokia’s audio product development

by Konstantinos Drosos, Mikko Heikkinen

10 Apr 2024

Nokia’s cutting-edge audio products, such as OZO Audio and Immersive Voice, provide mobile device owners with sounds of unparalleled immersion, clarity and focus. This level of quality requires extensive exploration, analysis and testing at every step of the development process. Many of the smart features of these audio solutions - such as noise reduction - are developed by training algorithms. Our research and engineering team runs audio data through machine-learning (ML) models, analyzes the output and tweaks the algorithms until we get the desired result for our product.

Developing an ML model is a complex task that requires multiple parameters to be tuned. These range from specifying the type and complexity of the ML model to defining the speed at which the model should learn. It’s not uncommon to try out hundreds of parameter combinations when creating a top-performing ML model for a specific use case or product. Even the smallest change to these parameters can have a massive impact. To consistently reproduce our results, we need transparency and traceability in our ML code and processes: we must be able to trace each parameter back and identify its impact on the developed ML model.
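To make the scale of this concrete, here is a minimal sketch of how a set of parameter combinations could be enumerated and given stable identifiers so that each result can be traced back to the exact values that produced it. The parameter names and values are hypothetical and are not taken from our actual experiments.

```python
import hashlib
import itertools
import json

# Hypothetical search space; names and values are illustrative only.
SEARCH_SPACE = {
    "model_type": ["cnn", "crnn"],
    "hidden_units": [64, 128, 256],
    "learning_rate": [1e-3, 1e-4],
}


def enumerate_configs(space):
    """Yield one dict per combination of parameter values."""
    names = sorted(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def config_fingerprint(config):
    """Return a short, stable ID derived only from the parameter values."""
    # Sorting the keys makes the ID independent of dict insertion order.
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


configs = list(enumerate_configs(SEARCH_SPACE))
fingerprints = {config_fingerprint(c) for c in configs}
print(len(configs), "combinations,", len(fingerprints), "distinct fingerprints")
```

Even this small space already has a dozen combinations, and adding one more parameter multiplies the count again, which is why a systematic record of each configuration matters.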

At the same time, creating a top-performing ML model requires lots of data. We use various sources of data to develop ML models for our audio solutions. Some of it is gathered in-house through our own R&D work, and some of it is open-source data from external research communities. It may be people talking, ambient environmental noise, synthetic sounds created with simulations, or metadata from sound-wave analyses.

We use the software tools, principles, practices and policies of Machine Learning Operations (MLOps) in our audio development work to protect the traceability, transparency and trustworthiness of our ML models and the data used for developing them. By ensuring the consistent quality of our ML models, we can also ensure that our results are reproducible. Our MLOps setup is built on various open-source tools that support transparency and traceability. The configuration-management tool Hydra is central to this work, as it logs the parameters used in the experiments that yield our models. We use Hydra together with PyTorch Lightning, a framework that helps speed up our development work by bringing all our tools together. Because every parameter is recorded, we can rerun an experiment with the same configuration and reproduce its results.

SLURM is another important tool of ours. It is a cluster-management, resource-allocation and job-scheduling system that shows which user ran a specific job and with what resources. We use SLURM together with the container system Apptainer, allowing us to transfer our code seamlessly and transparently between various allocated resources in our datacenter. This way we can transfer and manage our jobs without the challenges that arise if different versions of the same software are used.
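As an illustration of how the two tools fit together, the sketch below renders the text of a SLURM batch script that runs a training command inside an Apptainer container. The function name, job name, image file and command are hypothetical placeholders, and the resource values are arbitrary; our actual job scripts are not shown here.

```python
def render_sbatch_script(job_name, image, command, gpus=1, time_limit="04:00:00"):
    """Return the text of a SLURM batch script that runs `command` in a container."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --gres=gpu:{gpus}",
        f"#SBATCH --time={time_limit}",
        "",
        # `--nv` exposes the host's NVIDIA GPUs inside the container.
        f"apptainer exec --nv {image} {command}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical job: the image and script names are placeholders.
script = render_sbatch_script(
    job_name="denoiser-train",
    image="training_env.sif",
    command="python train.py",
)
print(script)
```

Because the software environment lives in the container image rather than on the compute node, the same script can run on any allocated node without version mismatches.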

We trace the output of SLURM through MLflow - an end-to-end platform for managing the ML lifecycle. MLflow logs everything from the SLURM job ID to the software version and the data used for a specific job.
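The kind of record described above can be sketched as a small dictionary assembled at the start of a job. This is an illustrative, standard-library-only sketch: the function and field names are hypothetical, and the only SLURM-specific detail relied on is the SLURM_JOB_ID environment variable that SLURM sets inside a running job. In a tracking platform such as MLflow, key-value pairs like these would typically be attached to a run as tags or parameters.

```python
import os
import platform


def collect_run_metadata(code_version, data_version, environ=None):
    """Gather the facts needed to trace a result back to its job."""
    environ = os.environ if environ is None else environ
    return {
        # SLURM sets SLURM_JOB_ID inside a running job; absent elsewhere.
        "slurm_job_id": environ.get("SLURM_JOB_ID", "not-a-slurm-job"),
        "user": environ.get("USER", "unknown"),
        "python_version": platform.python_version(),
        "code_version": code_version,  # e.g. a Git commit hash
        "data_version": data_version,  # e.g. a dataset content hash
    }


# Hypothetical values, with a simulated job environment for illustration.
metadata = collect_run_metadata(
    code_version="abc1234",
    data_version="dataset-v3",
    environ={"SLURM_JOB_ID": "4242", "USER": "researcher"},
)
for key, value in sorted(metadata.items()):
    print(f"{key}: {value}")
```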

Version control of source code is essential to software development, and our processes ensure that we version our data as well as our code. We use a combination of DVC, Artifactory and Git. Together with MLflow and the above-mentioned tools, our versioning tools ensure comprehensive traceability and transparency across every aspect of our work.
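The idea behind versioning data can be sketched with a content hash: if a data file changes by even one byte, its identifier changes, so a model can be tied to the exact data it was trained on. Tools such as DVC track data by content hash; the sketch below uses SHA-256 from the Python standard library purely for illustration and does not reproduce DVC's implementation. The file names and contents are placeholders.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration with temporary placeholder files instead of real audio data.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "clip_a.bin"
    copy = Path(tmp) / "clip_a_copy.bin"
    edited = Path(tmp) / "clip_a_edited.bin"
    original.write_bytes(b"\x00\x01\x02\x03")
    copy.write_bytes(b"\x00\x01\x02\x03")
    edited.write_bytes(b"\x00\x01\x02\x04")

    same_content_same_id = file_digest(original) == file_digest(copy)
    changed_content_new_id = file_digest(original) != file_digest(edited)

print("identical content, identical ID:", same_content_same_id)
print("changed content, different ID:", changed_content_new_id)
```

Storing such an identifier next to the code commit gives a single pair of references that pins down both the code and the data behind a model.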

Our decision to use all these MLOps principles and tools enables us to ensure that our models meet the highest ethical standards and are fully reproducible. This way of working is not mandatory - there is no compliance requirement guiding our approach. It’s a conscious and ethical choice we have made to guarantee to our customers that our ML models are trustworthy, and that the data integrity of our products is sound. In this way, our audio product portfolio exemplifies Nokia’s commitment to ethical, transparent and reproducible AI.

It’s all about fostering our customers’ trust in our work so they can use Nokia’s cutting-edge technology with complete peace of mind.

About Konstantinos Drosos

Konstantinos (Kostas) Drosos is a principal audio machine learning scientist at Nokia. He is the author or co-author of over 50 scientific papers, an active reviewer for various journals and international conferences, and is considered a pioneer in several audio machine learning tasks. He is involved in the research and development of deep-learning-based methods in OZO Audio.

Connect with Kostas on LinkedIn

About Mikko Heikkinen

Mikko Heikkinen is a principal software engineer at Nokia with a broad background in developing advanced multimedia technologies. He is a trusted software generalist currently contributing to the development of OZO audio technologies. He holds several granted patents and patent applications and conducts research in machine learning applied to audio processing.

Connect with Mikko on LinkedIn