Multimedia research and standardization
The latest multimedia technology innovation from Nokia
Our portfolio of innovations continues to grow thanks to our ongoing investment in multimedia R&D and our internationally acclaimed team of experts. The work of our inventors in video research and standardization has been recognized with numerous prestigious awards, including five Technology & Engineering Emmy® Awards.
Significant improvement for temporal consistency in video semantic segmentation
Semantic segmentation is a far trickier task for video than for static images, typically yielding predictions that are either temporally inconsistent or costly and inaccurate. Momentum Adapt is an unsupervised online method that improves temporal performance to deliver the consistency your AI applications need. Uncover how this approach outperforms state-of-the-art algorithms in adapting to even the most severe environmental changes.
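To make the idea concrete, here is a minimal sketch of unsupervised online adaptation with a momentum (EMA) teacher; the toy network, loss and hyper-parameters are illustrative assumptions, not the exact Momentum Adapt recipe (which would, for instance, motion-compensate between frames):

```python
# Hypothetical sketch of unsupervised online adaptation for temporal
# consistency (names and hyper-parameters are illustrative, not Nokia's).
import torch
import torch.nn.functional as F

def ema_update(teacher, student, momentum=0.999):
    """Momentum (EMA) update of the teacher from the student weights."""
    with torch.no_grad():
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(momentum).add_(ps, alpha=1.0 - momentum)

def adapt_step(student, teacher, optimizer, frame_prev, frame_curr):
    """One online step: align the current prediction with the teacher's
    prediction on the previous frame (a crude proxy for temporal
    consistency; real methods would warp by optical flow first)."""
    with torch.no_grad():
        target = teacher(frame_prev).softmax(dim=1)  # soft pseudo-labels
    logits = student(frame_curr)
    loss = F.kl_div(logits.log_softmax(dim=1), target, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ema_update(teacher, student)
    return loss.item()

# Toy usage with a stand-in segmentation head (19 Cityscapes-like classes).
student = torch.nn.Conv2d(3, 19, kernel_size=3, padding=1)
teacher = torch.nn.Conv2d(3, 19, kernel_size=3, padding=1)
teacher.load_state_dict(student.state_dict())
opt = torch.optim.SGD(student.parameters(), lr=1e-4)
f0, f1 = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
print(adapt_step(student, teacher, opt, f0, f1))
```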
Learned video/image compression
Getting the full picture of lossless image codecs
Lossless image codec performance can be efficiently enhanced through domain adaptation – but the adaptation overhead can compromise the gain. This is where an adaptive multi-scale progressive probability model delivers: effective domain adaptation without the significant overhead. See how this technique could reduce the bitstream size of lossless image codecs by up to 4.8%.
Want to enhance your lossless image compression? Read the whitepaper by Honglei Zhang, Francesco Cricri, Nannan Zou, Hamed R. Tavakoli and Miska M. Hannuksela.
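As background to why adapting the probability model shrinks the bitstream: a lossless codec's output size is lower-bounded by the cross-entropy between the data and the model, so a domain-fitted model directly saves bits. A toy illustration (not the paper's multi-scale model):

```python
# Illustrative sketch: the bitstream size of a lossless codec is bounded
# below by the cross-entropy of its probability model, so adapting the
# model to the target domain directly shrinks the bitstream.
import numpy as np

def ideal_code_length_bits(symbols, probs):
    """Shannon bound: bits needed to entropy-code `symbols` when the
    model assigns probability probs[s] to each symbol s."""
    return float(-np.sum(np.log2(probs[symbols])))

rng = np.random.default_rng(0)
# Pixels from a "new domain": heavily skewed toward low values.
pixels = rng.choice(4, size=10_000, p=[0.7, 0.2, 0.07, 0.03])

generic_model = np.array([0.25, 0.25, 0.25, 0.25])  # pre-trained, mismatched
adapted_model = np.bincount(pixels, minlength=4) / pixels.size  # domain-fit

b0 = ideal_code_length_bits(pixels, generic_model)
b1 = ideal_code_length_bits(pixels, adapted_model)
print(f"generic: {b0:.0f} bits, adapted: {b1:.0f} bits "
      f"({100 * (b0 - b1) / b0:.1f}% smaller)")
```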
New AI frontiers for image compression
For the last 30 years, image and video compression algorithms have been designed by engineers – but changes may be afoot. With artificial intelligence set to step up the game, model overfitting at inference time may be necessary to improve the efficiency of learning-based codecs. Learn why Nokia is exploring the potential for modified neural networks to streamline the compression process.
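A hedged sketch of what inference-time overfitting can look like: the latent code for one specific image is refined under a rate-distortion objective while the shared encoder and decoder stay fixed, so no extra model data needs to be signalled. The toy decoder, rate proxy and trade-off weight below are assumptions for illustration only:

```python
# Hedged sketch of inference-time overfitting for a learned codec: the
# latent for one specific image is refined under a rate-distortion loss
# while the decoder stays fixed. The model and lambda are toy assumptions.
import torch

torch.manual_seed(0)
decoder = torch.nn.Sequential(
    torch.nn.Linear(16, 64), torch.nn.ReLU(), torch.nn.Linear(64, 3 * 8 * 8)
)
for p in decoder.parameters():
    p.requires_grad_(False)

image = torch.rand(3 * 8 * 8)                     # the one image to code
latent = torch.zeros(16, requires_grad=True)      # start from a blank latent
opt = torch.optim.Adam([latent], lr=1e-2)
lam = 0.01                                        # rate-distortion trade-off

for step in range(200):
    recon = decoder(latent)
    distortion = torch.mean((recon - image) ** 2)
    rate_proxy = torch.mean(latent ** 2)          # stand-in for bit cost
    loss = distortion + lam * rate_proxy
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final distortion: {distortion.item():.5f}")
```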
Neural Network Compression
Temporal dependencies: the life hack for federated learning
Federated learning (FL) mitigates some long-standing challenges of large-scale machine learning, including privacy and computation costs, but it also comes with bandwidth challenges of its own. Discover how temporal dependencies are key to improving communication efficiency in FL without sacrificing model accuracy.
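One way temporal dependencies can help, sketched under assumed mechanics rather than the paper's exact scheme: a client's successive updates are correlated across rounds, so transmitting the residual against the previous round's update leaves far fewer coefficients worth sending:

```python
# Illustrative sketch (assumed mechanics, not the paper's scheme): since a
# client's successive updates are correlated across rounds, transmitting
# the residual w.r.t. the previous round's update compresses well.
import numpy as np

rng = np.random.default_rng(1)

def nonzero_payload(vec, tol):
    """Crude proxy for bandwidth: count coefficients worth sending."""
    return int(np.sum(np.abs(vec) > tol))

prev_update = rng.normal(size=10_000)
# This round's update drifts only slightly from the previous one.
curr_update = prev_update + 0.01 * rng.normal(size=10_000)

residual = curr_update - prev_update               # temporal prediction
print("raw update coeffs:      ", nonzero_payload(curr_update, tol=0.02))
print("temporal residual coeffs:", nonzero_payload(residual, tol=0.02))
```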
Versatile Video Coding (VVC)
VVC: A great all-rounder for immersive video
Immersive video, with its wide range of exciting content types and services, is taking over the show from conventional 2D. Discover why Versatile Video Coding (VVC) rules the roost when it comes to compressing immersive video and implementing advanced features.
VVC caught your eye? Learn more about it in the article by Miska M. Hannuksela and Sachin Deshpande.
Neural network based video post-processing, this time with content adaptation
Decoded video is usually affected by coding artifacts. This can be alleviated by post-processing – for example, using neural-network-based filters – and better filtering can be achieved by adapting the neural network to the video content. However, this comes with a bitrate overhead. In our paper, we show how efficient content adaptation can be performed with the aid of the MPEG NNR standard for compressing the adaptation signal.
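A minimal sketch of the content-adaptation loop, with a plain uniform quantizer standing in for NNR compression of the weight update (the filter, data and step size are toy assumptions):

```python
# Minimal sketch of content adaptation for a post-filter (illustrative;
# real systems compress the weight update with MPEG NNR, here a plain
# uniform quantizer stands in for that step).
import torch

torch.manual_seed(0)
base_filter = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)
adapted = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)
adapted.load_state_dict(base_filter.state_dict())

decoded = torch.rand(1, 3, 32, 32)    # decoded video frame (with artifacts)
original = torch.rand(1, 3, 32, 32)   # pristine source, known at the encoder

opt = torch.optim.Adam(adapted.parameters(), lr=1e-3)
for _ in range(50):                    # encoder-side over-fitting to content
    loss = torch.mean((adapted(decoded) - original) ** 2)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Signal only the quantized weight UPDATE, not the whole network.
step = 2 ** -8
update = {k: torch.round((adapted.state_dict()[k] - v) / step) * step
          for k, v in base_filter.state_dict().items()}

# Decoder side: reconstruct the adapted filter from base + update.
recon = {k: v + update[k] for k, v in base_filter.state_dict().items()}
adapted.load_state_dict(recon)
```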
A new low latency feature for Versatile Video Coding
Everything from video conferencing to computer vision depends on keeping latency low. We have developed Gradual Decoding Refresh (GDR), a new feature of Versatile Video Coding (VVC). Learn how GDR alleviates the delay issues of intra-coded pictures – putting them on par with their inter-coded counterparts – and maximizes coding efficiency while minimizing leaks from unrefreshed picture areas.
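A toy illustration of the refresh schedule: rather than sending one expensive intra picture, the intra-coded region sweeps across the frame over several pictures, flattening the bitrate spike. The refresh period and picture width below are arbitrary:

```python
# Toy illustration of Gradual Decoding Refresh: the intra-coded region
# sweeps across the picture over a refresh period, so no single picture
# carries the full intra-coding cost (all sizes here are illustrative).
REFRESH_PERIOD = 4   # pictures per full refresh
CTU_COLUMNS = 8      # picture width in coding tree unit columns

for poc in range(REFRESH_PERIOD):
    cols_per_pic = CTU_COLUMNS // REFRESH_PERIOD
    start = poc * cols_per_pic
    intra_cols = range(start, start + cols_per_pic)
    row = ["I" if c in intra_cols else "P" for c in range(CTU_COLUMNS)]
    print(f"picture {poc}: {' '.join(row)}")

# After REFRESH_PERIOD pictures every column has been refreshed, so a
# decoder can join cleanly at any refresh point without an IDR picture.
```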
Video/image coding for machines
Less distraction, more machine learning action
End-to-end (E2E) learned compression may take the lead in image coding for machines, but its limited flexibility in adaptively allocating bits can sacrifice machine-vision performance. Leveraging regions of interest (ROIs) can minimize the bits allocated to backgrounds, reducing bitrates while retaining the accuracy of machine tasks. Learn more about how this method can achieve impressive gains within learned image codecs.
Ready to find out more? Read the whitepaper by Jukka I. Ahonen, Nam Le, Honglei Zhang, Francesco Cricri and Esa Rahtu.
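To illustrate the intuition behind ROI-driven bit allocation (with a toy latent and step sizes, not the paper's codec): latents outside the region of interest are quantized more coarsely, so the background contributes far fewer coefficients to the bitstream:

```python
# Hedged sketch of ROI-driven bit allocation in a learned codec: latents
# outside the region of interest are quantized more coarsely, so the
# background costs fewer bits (mask and step sizes are toy assumptions).
import numpy as np

rng = np.random.default_rng(0)
latent = rng.normal(size=(16, 16))                 # toy latent tensor
roi = np.zeros((16, 16), dtype=bool)
roi[4:12, 4:12] = True                             # detected object region

step = np.where(roi, 0.1, 1.0)                     # fine vs coarse steps
q = np.round(latent / step)

def payload(symbols):
    """Crude rate proxy: non-zero quantized coefficients."""
    return int(np.count_nonzero(symbols))

print("ROI coeffs kept:       ", payload(q[roi]))
print("background coeffs kept:", payload(q[~roi]))
```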
Eliminating numerical instability from convolutional neural networks’ equations
Convolutional neural networks can unlock extraordinary tools for image and video coding, but their limited precision in floating-point arithmetic is inescapably problematic. Our post-training quantization technique stops cross-platform mismatches in their tracks, dividing operations between the integer and floating-point domains for maximum numerical stability. See how this technique can realize uncompromised deep learning performance across a variety of platforms.
Ready for better machine performance? Take a look at the whitepaper by Honglei Zhang, Nam Le, Francesco Cricri, Jukka Ahonen and Hamed Rezazadegan Tavakoli.
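A sketch of the integer/floating-point split, with illustrative scales rather than the paper's exact scheme: the dot products that must behave identically on every platform run as exact integer arithmetic, leaving only a single deterministic rescale in floating point:

```python
# Sketch of the integer/floating-point split (illustrative scales): the
# accumulations that must match bit-exactly across platforms run in
# integers, and only a final rescale stays in floating point.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)     # layer weights
x = rng.normal(size=8).astype(np.float32)          # input activations

w_scale = np.abs(w).max() / 127.0
x_scale = np.abs(x).max() / 127.0
w_q = np.round(w / w_scale).astype(np.int8)
x_q = np.round(x / x_scale).astype(np.int8)

# Integer domain: exact, platform-independent accumulation in int32.
acc = w_q.astype(np.int32) @ x_q.astype(np.int32)

# Floating-point domain: one deterministic rescale back to real values.
y = acc.astype(np.float32) * (w_scale * x_scale)
print("max abs error vs float math:", np.abs(y - w @ x).max())
```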
Vision enhanced for human- and machine-kind
Images compressed with neural network-based codecs are often plagued with checkerboard artifacts, degrading picture quality for human, if not machine, eyes. In steps a new codec fine-tuning technique to remove these problematic artifacts, enhancing details for humans and retaining machine performance at no extra cost. Discover how every vision can benefit from this technique.
Machine oriented image compression: a content-adaptive approach
An increasing number of videos and images are watched by computer algorithms instead of humans. Our research considers how image coding can adapt to non-human eyes, with implications for smart cities, factory robotics, security and much more. Discover how an inference-time content-adaptive approach can improve compression efficiency for machine consumption without modifying codec parameters.
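One assumed realization of such an inference-time approach, sketched with random stand-in models: the input is nudged through a differentiable codec proxy so that a machine-vision model keeps its decision at a lower rate, and the unmodified codec then encodes the adjusted input:

```python
# Illustrative sketch (assumed setup): the input is adjusted at inference
# time, through a differentiable codec proxy, so a machine-vision model
# keeps its decision at a lower rate; the unmodified real codec would
# then encode the adjusted input. Models here are random stand-ins.
import torch

torch.manual_seed(0)
task_model = torch.nn.Linear(64, 10)               # stand-in vision model
for p in task_model.parameters():
    p.requires_grad_(False)

image = torch.rand(64)
target = task_model(image).argmax()                # behavior to preserve
x = image.clone().requires_grad_(True)
opt = torch.optim.Adam([x], lr=1e-2)

for _ in range(100):
    # Straight-through estimator around a coarse quantizer ("codec" proxy).
    proxy = torch.round(x * 16) / 16 + (x - x.detach())
    task_loss = torch.nn.functional.cross_entropy(
        task_model(proxy).unsqueeze(0), target.unsqueeze(0))
    rate_proxy = torch.mean(torch.abs(x))          # stand-in for bit cost
    loss = task_loss + 0.1 * rate_proxy
    opt.zero_grad()
    loss.backward()
    opt.step()

print("pixels changed by:", torch.mean(torch.abs(x - image)).item())
```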
Visual volumetric coding
Dynamic mesh coding: Realizing photorealistic metaverse experiences on every device
Dynamic meshes bring immersive experiences to life, but their full potential can only be unleashed by standards that ensure interoperability. Initially designed for point clouds, the recent MPEG Visual Volumetric Video-based Coding (V3C) framework can extend its talents to efficiently encode and decode these dynamic meshes – on any device. Discover how this approach exceeds the compression performance of today’s best prior art to support tomorrow’s metaverse experiences.
Ready to unlock new immersive opportunities? Get the article by Patrice Rondao Alface, Aleksei Martemianov, Lauri Ilola, Lukasz Kondrad, Christoph Bachhuber and Sebastian Schwarz.
Breaking the barriers of immersive content with volumetric video
Virtual, augmented and mixed reality applications are on the rise, and volumetric video is the fundamental technology enabling the exploration of real-world captured immersive content. Learn how the family of Visual Volumetric Video-based Coding (V3C) standards efficiently code, store and transport volumetric video content with 6 degrees of freedom.
Real-time decoding goes mobile with point cloud compression
From education to entertainment, capturing the real world in multi-dimensional immersive experiences presents a multitude of opportunities – alongside data-heavy complications. The release of the MPEG standard for video-based point cloud compression (V-PCC) for mobile is an immersive media gamechanger. Discover how V-PCC content can now be distributed, stored and decoded in real time on everyday media devices.
Navigating realities in 3D with Point Cloud Compression
Point clouds are integral to immersive digital representations, enabling quick 3D assessments for navigating autonomous vehicles, robotic sensing and other use cases. This level of innovation requires massive amounts of data – and that’s where Point Cloud Compression (PCC) comes in. See how PCC lightens point cloud transmission for current and next-generation networks.
Discover more in the article by Sebastian Schwarz, Marius Preda, Vittorio Baroncini, Madhukar Budagavi, Pablo Cesar, Philip A. Chou, Robert A. Cohen, Maja Krivokuća, Sébastien Lasserre, Zhu Li, Joan Llach, Khaled Mammou, Rufael Mekuria, Ohji Nakagami, Ernestasia Siahaan, Ali Tabatai, Alexis M. Tourapis, and Vladyslav Zakharchenko.
Growing OMAF’s vision in its second generation
Omnidirectional Media Format (OMAF) was the first VR standard for storing and distributing immersive media. Now its second edition has its sights set on even more, building upon its predecessor's best features, from overlays to multiple viewpoints. Discover how to leverage these tools for maximum quality of experience in immersive applications.