Multimedia research and standardization




The latest multimedia technology innovation from Nokia

Our portfolio of innovations continues to grow thanks to our ongoing investment in multimedia R&D and our internationally acclaimed team of experts. The work of our inventors in video research and standardization has been recognized with numerous prestigious awards, including five Technology & Engineering Emmy® Awards. 

Computer Vision

Significant improvement for temporal consistency in video semantic segmentation

Semantic segmentation is far trickier for video than for static images: predictions tend to be either temporally inconsistent or costly and inaccurate. Momentum Adapt is an unsupervised online method that improves temporal performance to deliver the consistency your AI applications need. Uncover how this approach outperforms state-of-the-art algorithms in adapting to even the most severe environmental changes.

Find out more about this novel approach to improving semantic segmentation performance in the whitepaper by Amirhossein Hassankhani, Hamed Rezazadegan Tavakoli and Esa Rahtu.

Meet Momentum Adapt

Learned video/image compression

Getting the full picture of lossless image codecs

Image codec performance can be efficiently enhanced through domain adaptation – but its adaptation overhead can compromise its gain. This is where an adaptive multi-scale progressive probability model delivers: effective domain adaptation without the significant overhead. See how this technique could reduce the bitstream size of lossless image codecs by up to 4.8%.

Want to enhance your lossless image compression? Read the whitepaper from Honglei Zhang, Francesco Cricri, Nannan Zou, Hamed R. Tavakoli and Miska M. Hannuksela.

New AI frontiers for image compression

For the last 30 years, image and video compression algorithms have been designed by engineers – but changes may be afoot. With artificial intelligence set to step up the game, model overfitting at inference time may be necessary to improve the efficiency of learning-based codecs. Learn why Nokia is exploring the potential for modified neural networks to streamline the compression process.

Discover more from the article by Honglei Zhang, Francesco Cricri, Hamed R. Tavakoli, Maria Santamaria, Yat-Hong Lam and Miska M. Hannuksela.

Neural Network Compression

Temporal dependencies: the life hack for federated learning

Federated learning (FL) mitigates some long-standing challenges of large-scale machine learning, including privacy and computation costs, but it also comes with bandwidth challenges of its own. Discover how temporal dependencies are key to improving communication efficiency in FL without sacrificing model accuracy.

Realize the power of efficient federated learning in the whitepaper by Homayun Afrabandpey, Goutham Rangu, Honglei Zhang, Francesco Cricri, Emre Aksu and Hamed R. Tavakoli.

Temporal dependencies of weight updates in communication efficient FL
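
To make the idea of temporal dependencies concrete, here is a minimal sketch. It is not the method from the whitepaper, only an illustration under a simple assumption: a client's weight updates change slowly from one round to the next. In that case, sending the difference from the previous round's update leaves smaller values to encode. The function names and the toy integer updates are hypothetical.

```python
def residuals(updates):
    """Return per-round residuals: each update minus the previous one.

    The first round has no predecessor, so it is sent as-is.
    """
    out = [list(updates[0])]
    for prev, cur in zip(updates, updates[1:]):
        out.append([c - p for p, c in zip(prev, cur)])
    return out


def reconstruct(res):
    """Invert `residuals` by accumulating the differences (receiver side)."""
    out = [list(res[0])]
    for r in res[1:]:
        out.append([p + d for p, d in zip(out[-1], r)])
    return out


def total_magnitude(rows):
    """Sum of absolute values, used here as a crude stand-in for coding cost."""
    return sum(abs(v) for row in rows for v in row)


# Made-up, already-quantized weight updates for three consecutive rounds.
updates = [[40, -12, 7], [38, -11, 7], [37, -11, 6]]
res = residuals(updates)

print(total_magnitude(updates), total_magnitude(res))
```

The receiver recovers the original updates exactly with `reconstruct(res)`, while the residuals carry a much smaller total magnitude than the raw updates in this toy case. Real gains depend on how strongly consecutive updates actually correlate.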

Versatile Video Coding (VVC)

VVC: A great all-rounder for immersive video

Immersive video, with its wide range of exciting content types and services, is taking over the show from conventional 2D. Discover why the Versatile Video Coding (VVC) standard rules the roost when it comes to immersive video compression and implementing advanced features.

VVC caught your eye? Learn more about it in the article by Miska M. Hannuksela and Sachin Deshpande.

VVC for immersive video streaming

Neural network based video post-processing, this time with content adaptation

Decoded video is usually affected by coding artefacts. This can be alleviated by post-processing, for example using neural network based filters, and better filtering can be achieved by adapting the neural network to the video content. However, this comes with a bitrate overhead. In our paper, we show how efficient content adaptation can be performed with the aid of the MPEG NNR standard for compressing the adaptation signal.

Ready to learn more? Read the article by Maria Santamaria, Francesco Cricri, Jani Lainema, Ramin G. Youvalari, Honglei Zhang and Miska M. Hannuksela.

Content-adaptive neural network post-processing filter

A new low latency feature for Versatile Video Coding

Everything from video conferencing to computer vision depends on keeping latency low. We have developed Gradual Decoding Refresh (GDR), a new feature that builds on Versatile Video Coding (VVC). Learn how GDR alleviates delay issues related to intra coded pictures – putting them on par with their inter coded counterparts – and maximizes coding efficiency while minimizing leaks. 

Dive deeper into the topic with Limin Wang, Seungwook Hong and Krit Panusopone.

Gradual Decoding Refresh (GDR)

Video/image coding for machines

Less distraction, more machine learning action

E2E learned compression may take the lead in image coding for machines, but its insufficient flexibility in adaptively allocating bits can sacrifice machine vision performance. Leveraging Regions-of-Interest can minimize the bits allocated for backgrounds, resulting in reduced bitrates while retaining the accuracy of machine tasks. Learn more about how this method can achieve impressive gains within learned image codecs.

Ready to find out more? Read the whitepaper by Jukka I. Ahonen, Nam Le, Honglei Zhang, Francesco Cricri and Esa Rahtu.

Machines, let's ditch the distractions
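
As a toy stand-in for Region-of-Interest bit allocation (not the learned-codec method from the whitepaper), the sketch below quantizes pixel values finely inside an ROI mask and coarsely outside it. The function name, step sizes and sample values are all assumptions made for illustration.

```python
def quantize_with_roi(pixels, roi_mask, fine_step=2, coarse_step=16):
    """Quantize each pixel with a fine step inside the ROI and a coarse step outside."""
    out = []
    for value, in_roi in zip(pixels, roi_mask):
        step = fine_step if in_roi else coarse_step
        # Snap the value down to the nearest multiple of the chosen step.
        out.append((value // step) * step)
    return out


# Made-up 1-D "image": a bright object (the ROI) on a dark background.
pixels = [12, 13, 200, 203, 198, 15, 14, 11]
roi_mask = [False, False, True, True, True, False, False, False]

coded = quantize_with_roi(pixels, roi_mask)
print(coded)
```

In this example every background value collapses to the same symbol, which an entropy coder can represent very cheaply, while values inside the ROI stay within one level of the original.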

Eliminating numerical instability from convolutional neural networks’ equations

Convolutional neural networks can unlock extraordinary tools for image and video coding, but the limited precision of floating-point arithmetic is inescapably problematic. Our post-training quantization technique stops data corruption in its tracks, dividing operations between integer and floating-point domains for maximum numerical stability. See how this technique can realize uncompromised deep learning performance across a variety of platforms.

Ready for better machine performance? Take a look at the whitepaper by Honglei Zhang, Nam Le, Francesco Cricri, Jukka Ahonen and Hamed Rezazadegan Tavakoli.

Convolutional neural networks need accuracy
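
The whitepaper's quantization scheme is not reproduced here. The short sketch below only illustrates the underlying problem: floating-point addition depends on the order of operations, which can vary between platforms, whereas integer accumulation does not. The helper names and the fixed scale of 10 are assumptions chosen for the example.

```python
def float_sum(values):
    """Accumulate in floating point, in the order the values arrive."""
    total = 0.0
    for v in values:
        total += v
    return total


def int_sum(values, scale=10):
    """Scale each value to an integer, accumulate exactly, convert back once."""
    total = 0
    for v in values:
        total += round(v * scale)
    return total / scale


values = [0.1, 0.2, 0.3]

forward = float_sum(values)
backward = float_sum(reversed(values))
print(forward == backward)  # False: the two orders round differently

int_forward = int_sum(values)
int_backward = int_sum(reversed(values))
print(int_forward == int_backward)  # True: integer addition is exact
```

Because the integer path gives the same answer in any order, an encoder and a decoder running on different hardware can stay bit-exact with each other, at the cost of the rounding introduced when scaling.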

Vision enhanced for human- and machine-kind

Images compressed with neural network-based codecs are often plagued with checkerboard artifacts, degrading picture quality for human, if not machine, eyes. In steps a new codec fine-tuning technique to remove these problematic artifacts, enhancing details for humans and retaining machine performance at no extra cost. Discover how every vision can benefit from this technique.

Set your sights on clearer end-to-end coded images in the whitepaper by Nam Le, Honglei Zhang, Francesco Cricri, Ramin G. Youvalari, Hamed R. Tavakoli, Emre Aksu, Miska M. Hannuksela and Esa Rahtu.

Compressed images looking like a chessboard?

Machine oriented image compression: a content-adaptive approach

An increasing number of videos and images are watched by computer algorithms instead of humans. Our research considers how image coding can adapt to non-human eyes, with implications for smart cities, factory robotics, security and much more. Discover how an inference-time content-adaptive approach can improve compression efficiency for machine consumption without modifying codec parameters.

Want to learn more? Read the article by Nam Le, Honglei Zhang, Francesco Cricri, Ramin Ghaznavi-Youvalari, Hamed R. Tavakoli and Esa Rahtu.

Machines are watching

Visual volumetric coding

Dynamic mesh coding: Realizing photorealistic metaverse experiences on every device

Dynamic meshes bring immersive experiences to life, but their full potential can only be unleashed by standards that ensure interoperability. Initially designed for point clouds, the recent MPEG Visual Volumetric Video-based Coding (V3C) framework can extend its talents to efficiently encode and decode these dynamic meshes – on any device. Discover how this approach exceeds the compression performance of today’s best prior art to support tomorrow’s metaverse experiences.

Ready to unlock new immersive opportunities? Get the article by Patrice Rondao Alface, Aleksei Martemianov, Lauri Ilola, Lukasz Kondrad, Christoph Bachhuber and Sebastian Schwarz.

V3C-based Coding of Dynamic Meshes

Breaking the barriers of immersive content with volumetric video

Virtual, augmented and mixed reality applications are on the rise, and volumetric video is the fundamental technology enabling the exploration of real-world captured immersive content. Learn how the family of Visual Volumetric Video-based Coding (V3C) standards efficiently codes, stores and transports volumetric video content with 6 degrees of freedom.

Curious to know more? Read the article by Lauri Ilola, Lukasz Kondrad, Sebastian Schwarz and Ahmed Hamza.


Real-time decoding goes mobile with point cloud compression

From education to entertainment, capturing the real world in multi-dimensional immersive experiences presents a multitude of opportunities – alongside data-heavy complications. The release of the MPEG standard for video-based point cloud compression (V-PCC) for mobile is an immersive media gamechanger. Discover how V-PCC distribution, storage and real-time decoding can now be achieved on every single media device on the market.

Find out more in this article by Sebastian Schwarz and Mika Pesonen.

Real-time decoding

Navigating realities in 3-Dimensions with Point Cloud Compression

Point clouds are integral to immersive digital representations, enabling quick 3D assessments for navigating autonomous vehicles, robotic sensing and other use cases. This level of innovation requires massive amounts of data – and that’s where Point Cloud Compression (PCC) comes in. See how PCC lightens point cloud transmission for current and next-generation networks.

Discover more in the article by Sebastian Schwarz, Marius Preda, Vittorio Baroncini, Madhukar Budagavi, Pablo Cesar, Philip A. Chou, Robert A. Cohen, Maja Krivokuća, Sébastien Lasserre, Zhu Li, Joan Llach, Khaled Mammou, Rufael Mekuria, Ohji Nakagami, Ernestasia Siahaan, Ali Tabatabai, Alexis M. Tourapis and Vladyslav Zakharchenko.

Emerging MPEG standards for Point Cloud Compression

360-degree video

Growing OMAF’s vision in its second generation

Omnidirectional Media Format (OMAF) was the first VR standard to store and distribute immersive media. Now its second edition has its sights set on even more, building upon its predecessor’s best features, from overlays to multiple viewpoints. Discover how to leverage these tools for maximum quality of experience in immersive applications.

Prepare for incredible immersion by reading the whitepaper by Burak Kara, Mehmet N. Akcay, Ali C. Begen, Saba Ahsan, Igor D.D. Curcio, Kashyap Kammachi-Sreedhar and Emre Aksu.

Benchmarking the 2nd edition of OMAF