Multimedia research and standardization

The latest multimedia technology innovation from Nokia

Our portfolio of innovations continues to grow thanks to our ongoing investment in multimedia R&D and our internationally acclaimed team of experts. The work of our inventors in video research and standardization has been recognized with numerous prestigious awards, including five Technology & Engineering Emmy® Awards.

Learn about licensing our multimedia inventions

On this page

Computer Vision
Immersive Voice and Audio Services (IVAS)
Learned video/image compression
Neural Network Compression
Next Generation Immersive Audio
Next-Generation Video Coding
Versatile Video Coding (VVC)
Video/image coding for machines
Visual volumetric coding
360-degree video
Learn more

What’s in store for the future of entertainment?

A new report by MIT Technology Review Insights, produced in partnership with Nokia, explores how emerging tech is remaking the media and entertainment industries. Read the full report.

Standards as infrastructure

The media business, reimagined

Computer Vision

Hard and fast point sampling with NeRF

Neural radiance field (NeRF) training can create incredibly complex 3D scenes from 2D images – at a high computational cost. Our method uses hard sampling to efficiently optimize these networks at twice the speed, saving both time and memory otherwise lost through traditional random sampling. Imagine NeRF’s potential in efficiently representing a variety of multimedia content (beyond just scene representation) with Nokia-enabled consumer-grade hardware compatibility.

Like the sound of speeding up and reducing the memory cost of neural network training down to inference levels? Read the paper by Juuso Korhonen, Goutham Rangu, Hamed Rezazadegan Tavakoli and Juho Kannala to learn more.

Significant improvement for temporal consistency in video semantic segmentation

Semantic segmentation is a far trickier task for video than for static images: predictions tend to be either temporally inconsistent or costly and inaccurate. Momentum Adapt is an unsupervised online method that improves temporal performance to deliver the consistency your AI applications need. Uncover how this approach outperforms state-of-the-art algorithms in adapting to even the most severe environmental changes.

Find out more about this novel approach to improving semantic segmentation performance in the whitepaper by Amirhossein Hassankhani, Hamed Rezazadegan Tavakoli and Esa Rahtu.

Immersive Voice and Audio Services (IVAS)

A brave immersive world for audio communication

Enhanced voice services have delivered a lot of benefits, but their monophonic nature prevents communication use cases from achieving sought-after audio immersion. Now with the Immersive Voice and Audio Services (IVAS) codec, immersive audio can be experienced via 5G mobile systems for the first time ever. Discover how IVAS handles various network conditions for mobile spatial audio calls and more.

Read the paper by Markus Multrus, Stefan Bruhn, Juan Torres, Eleni Fotopoulou, Tomas Toftgård, Erik Norvell, Stefan Döhla, Yuan Gao, Huan-yu Su, Lasse Laaksonen, Adriana Vasilache, Takehiro Moriya, Stéphane Ragot, Marc Emerit, Hiroyuki Ehara, Marek Szczerba, Andrea Genovese, Andre Schevciw, Václav Eksler, and Vladimir Malenovsky.

Immersive sound is calling with Metadata-Assisted Spatial Audio

Many immersive audio formats can only be experienced offline. The metadata-assisted spatial audio (MASA) format supported by 3GPP IVAS can describe audio scenes for live applications such as phone calls, delivering captivating low-delay audio experiences for the users. Find out how MASA can recreate life-like human rendering of audio scenes using spatial parameters.

Capture the magic of MASA by reading the paper from Jouni Paulus, Lasse Laaksonen, Tapani Pihlajakuja, Mikko-Ville Laitinen, Juha Vilkamo and Adriana Vasilache.

Bringing richness, quality and more lifelike interaction to voice and video calling in 5G advanced

IVAS stands for Immersive Voice and Audio Services, and it is a new voice and audio codec standardized by 3GPP. It is part of 3GPP Rel. 18. IVAS is the first 3GPP standard for transmitting conversational stereo and immersive voice and audio.

The IVAS codec enables live immersive audio for any device form factor, bringing people together for real-life interaction with accurate and immersive three-dimensional rendering of captured sound.

Nokia participates in IVAS standardization in 3GPP and is one of its most active contributors and proponents.

Read more about IVAS in 3GPP Rel. 18 and in our whitepaper.

Learned video/image compression

Content-adaptive video compression doesn’t miss a thing

Neural networks that are trained too closely on big datasets may struggle to reconstruct compressed video accurately. Content-adaptive neural network tools such as loop-filters can positively exploit this overfitting, improving coding efficiency with negligible decoding costs. Find out how content-adaptation can also enable decoding customization for specific content in addition to these major coding gains.

Discover the leading research by Ruiying Yang, Maria Santamaria, Francesco Cricri, Honglei Zhang, Jani Lainema, Ramin G. Youvalari, Miska M. Hannuksela and Tapio Elomaa.

Getting the full picture of lossless image codecs

Image codec performance can be efficiently enhanced through domain adaptation – but its adaptation overhead can compromise its gain. This is where an adaptive multi-scale progressive probability model delivers: effective domain adaptation without the significant overhead. See how this technique could reduce the bitstream size of lossless image codecs by up to 4.8%.

Want to enhance your lossless image compression? Read the whitepaper from Honglei Zhang, Francesco Cricri, Nannan Zou, Hamed R. Tavakoli and Miska M. Hannuksela.

New AI frontiers for image compression

For the last 30 years, image and video compression algorithms have been designed by engineers – but changes may be afoot. With artificial intelligence set to step up the game, model overfitting at inference time may be necessary to improve the efficiency of learning-based codecs. Learn why Nokia is exploring the potential for modified neural networks to streamline the compression process.

Discover more from the article by Honglei Zhang, Francesco Cricri, Hamed R. Tavakoli, Maria Santamaria, Yat-Hong Lam, and Miska M. Hannuksela.

Neural Network Compression

Temporal dependencies: the life hack for federated learning

Federated learning (FL) mitigates some long-lasting challenges of large-scale machine learning including privacy and computation costs, but it also comes with bandwidth challenges of its own. Discover how temporal dependencies are key to improving the communication efficiency in FL without sacrificing model accuracy.

Realize the power of efficient federated learning in the whitepaper by Homayun Afrabandpey, Goutham Rangu, Honglei Zhang, Francesco Cricri, Emre Aksu and Hamed R. Tavakoli.

Next Generation Immersive Audio

6DoF rendering: immersive audio, wherever you are

Authentic audio is essential for AR/VR applications, but recreating a captured audio scene for the listener presents a significant challenge. Our six degrees of freedom (6DoF) rendering method enables life-like playback of audio scenes captured with multiple microphone arrays, allowing listeners to move freely within the scene, even beyond the area covered by the microphones.

Eager to hear more? Read the award-winning article by Jussi Leppänen, Archontis Politis, Mikko-Ville Laitinen, Lauros Pajunen and Antti Eronen.

Next-Generation Video Coding

More, more, more: Convolutional cross-component modeling answers streaming demands

Our growing appetite for streaming high-quality media seems insatiable, driving the need for new and advanced coding technologies. Nokia's convolutional cross-component modeling (CCCM) approach excels in next-generation video coding, utilizing advanced filtering for cross-component prediction. Learn how Nokia is leading the way in this exciting technological advancement, achieving significant bit rate reductions compared to current codecs.

Curious about what this means for video streaming? Read the paper by Pekka Astola, Alireza Aminlou, Ramin G. Youvalari, and Jani Lainema to find out more.

Versatile Video Coding (VVC)

Helping machines and humans to see clearly with multi-layer VVC

Humans often need to be kept in the loop on machine-vision tasks, but methods that optimize video for machines often sacrifice the viewing quality for people. The VVC multi-layer scheme ensures that baseline video is always available for machine analysis while enhanced content can be viewed ad hoc by humans. Explore how hybrid VVC solutions benefit everyone (and every thing).

Envision multi-layer VVC benefits with the paper by J. Laitinen, T. Partanen, A. Mercat, J. Vanne, M. Hannuksela, H. Zhang, A. Aminlou and F. Cricri.

Multi-layer videos, now in sync with (CR) SEI

Today’s leading video coding standards may have the ability to support multi-layer videos, but their most widely deployed implementations do not. The Constituent Rectangles SEI message says goodbye to ad-hoc compositions by enabling applications to combine multiple synchronized videos into a single one.

Get the latest on how (CR) SEI messages can flexibly address numerous media use cases. Dive into the article by Jill Boyce and Miska M. Hannuksela in the SMPTE Motion Imaging Journal.

(CR) SEI message for multi-layer video

A pathway to VVC-based broadcasting and streaming

With 50% better compression efficiency than HEVC, Versatile Video Coding (VVC) is a dream for broadcast and streaming – if you know how to use it. Thankfully, the Media Coding Industry Forum (MC-IF) has published the first technical guidelines for broadcast and streaming applications to help you navigate this state-of-the-art standard. Discover best practices for compression performance, interoperability, bitrate ranges and more.

Get started on your next steps with MC-IF’s technical guidelines for broadcast and streaming applications.

VVC: A great all-rounder for immersive video

Immersive video, with its wide range of exciting content types and services, is taking over the show from conventional 2D. Discover why Versatile Video Coding (VVC) rules the roost when it comes to immersive video compression and implementing advanced features.

VVC caught your eye? Learn more about it in the article by Miska M. Hannuksela and Sachin Deshpande.

Neural network based video post-processing, this time with content adaptation

Decoded video is usually affected by coding artefacts. These can be alleviated by post-processing, for example with neural network based filters, and better filtering can be achieved by adapting the neural network to the video content. However, this adaptation comes with a bitrate overhead. In our paper, we show how efficient content adaptation can be performed, with the aid of the MPEG NNR standard for compressing the adaptation signal.

Ready to learn more? Read the article by Maria Santamaria, Francesco Cricri, Jani Lainema, Ramin G. Youvalari, Honglei Zhang and Miska M. Hannuksela.

A new low latency feature for Versatile Video Coding

Everything from video conferencing to computer vision depends on keeping latency low. We have developed Gradual Decoding Refresh (GDR), a new feature that builds on Versatile Video Coding (VVC). Learn how GDR alleviates delay issues related to intra coded pictures – putting them on par with their inter coded counterparts – and maximizes coding efficiency while minimizing leaks.

Dive deeper into the topic with Limin Wang, Seungwook Hong and Krit Panusopone.

Video/image coding for machines

Competitive learning: the content-specific post-processing frontier

For machines intending to perform vision tasks, adapting reconstructed human-ready videos is a must. But how do we address artifacts caused by varying compression rates and unique content? Joint optimization pits content-specific filters against each other for the right to post-process video with fewer resulting artifacts. Discover how this competitive learning can greatly improve the performance of reconstructed data.

Ready to claim victory over video artifacts? Read the paper by Honglei Zhang, Jukka I. Ahonen, Nam Le, Ruiying Yang and Francesco Cricri.

NN-VVC: A sight for all eyes

While video compression technologies are traditionally tailored to human viewers, growing AI activity is driving up the demand for machine consumption too. NN-VVC combines machine learning and conventional codecs to optimize video compression, transmission and storage for both human and machine consumption. Learn how this ground-breaking research surpassed today’s state-of-the-art codecs to win IEEE ISM 2023’s Best Paper Award.

Discover the award-winning paper by Jukka I. Ahonen, Nam Le, Honglei Zhang, Antti Hallapuro, Francesco Cricri, Hamed Rezazadegan Tavakoli, Miska M. Hannuksela and Esa Rahtu.

Less distraction, more machine learning action

E2E learned compression may take the lead in image coding for machines, but its insufficient flexibility in adaptively allocating bits can sacrifice machine vision performance. Leveraging Regions-of-Interest can minimize the bits allocated for backgrounds, resulting in reduced bitrates while retaining the accuracy of machine tasks. Learn more about how this method can achieve impressive gains within learned image codecs.

Ready to find out more? Read the whitepaper by Jukka I. Ahonen, Nam Le, Honglei Zhang, Francesco Cricri and Esa Rahtu.

Eliminating numerical instability from convolutional neural networks’ equations

Convolutional neural networks can unlock extraordinary tools for image and video coding, but their limited precision in floating point arithmetic is inescapably problematic. Our post-training quantization technique stops data corruption in its tracks, dividing operations between integer and floating-point domains for maximum numerical stability. See how this technique can realize uncompromised deep learning performance across a variety of platforms.

Ready for better machine performance? Take a look at the whitepaper by Honglei Zhang, Nam Le, Francesco Cricri, Jukka Ahonen and Hamed Rezazadegan Tavakoli.

Vision enhanced for human- and machine-kind

Images compressed with neural network-based codecs are often plagued with checkerboard artifacts, degrading picture quality for human, if not machine, eyes. In steps a new codec fine-tuning technique to remove these problematic artifacts, enhancing details for humans and retaining machine performance at no extra cost. Discover how every vision can benefit from this technique.

Set your sights on clearer end-to-end coded images in the whitepaper by Nam Le, Honglei Zhang, Francesco Cricri, Ramin G. Youvalari, Hamed R. Tavakoli, Emre Aksu, Miska M. Hannuksela and Esa Rahtu.

Machine oriented image compression: a content-adaptive approach

An increasing number of videos and images are watched by computer algorithms instead of humans. Our research considers how image coding can adapt to non-human eyes, with implications for smart cities, factory robotics, security and much more. Discover how an inference-time content-adaptive approach can improve compression efficiency for machine consumption without modifying codec parameters.

Want to learn more? Read the article by Nam Le, Honglei Zhang, Francesco Cricri, Ramin Ghaznavi-Youvalari, Hamed R. Tavakoli and Esa Rahtu.

Visual volumetric coding

Dynamic mesh coding: Realizing photorealistic metaverse experiences on every device

Dynamic meshes bring immersive experiences to life, but their full potential can only be unleashed by standards that ensure interoperability. Initially designed for point clouds, the recent MPEG Visual Volumetric Video-based Coding (V3C) framework can extend its talents to efficiently encode and decode these dynamic meshes – on any device. Discover how this approach exceeds the compression performance of today’s best prior art to support tomorrow’s metaverse experiences.

Ready to unlock new immersive opportunities? Get the article by Patrice Rondao Alface, Aleksei Martemianov, Lauri Ilola, Lukasz Kondrad, Christoph Bachhuber and Sebastian Schwarz.

Dynamic Meshes

V3C-based Coding of Dynamic Meshes

Breaking the barriers of immersive content with volumetric video

Virtual, augmented and mixed reality applications are on the rise, and volumetric video is the fundamental technology enabling the exploration of real-world captured immersive content. Learn how to efficiently store and distribute volumetric video, which is encoded with the family of Visual Volumetric Video-based Coding (V3C) standards.

Curious to know more? Read the article by Lauri Ilola, Lukasz Kondrad, Sebastian Schwarz and Ahmed Hamza.

Storage and Transport of Visual Volumetric Video-Based Coding

Real-time decoding goes mobile with point cloud compression

From education to entertainment, capturing the real world in multi-dimensional immersive experiences presents a multitude of opportunities – alongside data-heavy complications. The release of the MPEG standard for video-based point cloud compression (V-PCC) for mobile is an immersive media game changer. Discover how V-PCC distribution, storage and real-time decoding can now be achieved on every single media device on the market.

Find out more in this article by Sebastian Schwarz and Mika Pesonen.

Navigating realities in 3-Dimensions with Point Cloud Compression

Point clouds are integral to immersive digital representations, enabling quick 3D assessments for navigating autonomous vehicles, robotic sensing and other use cases. This level of innovation requires massive amounts of data – and that’s where Point Cloud Compression (PCC) comes in. See how PCC lightens point cloud transmission for current and next-generation networks.

Discover more in the article by Sebastian Schwarz, Marius Preda, Vittorio Baroncini, Madhukar Budagavi, Pablo Cesar, Philip A. Chou, Robert A. Cohen, Maja Krivokuća, Sébastien Lasserre, Zhu Li, Joan Llach, Khaled Mammou, Rufael Mekuria, Ohji Nakagami, Ernestasia Siahaan, Ali Tabatabai, Alexis M. Tourapis, and Vladyslav Zakharchenko.

360-degree video

I-Frame splicing: a smarter way to stream

Adaptive streaming allows us to “tag in” higher quality segments when network conditions improve. The same mechanism can be used for swapping in low-quality background segments in 360-degree viewport-dependent streaming. But what about all that wasted bandwidth? I-Frame splicing blends pre-downloaded low quality segments with higher ones, enabling better service experiences without wait or waste. Discover how I-Frame splicing can have an immense impact on bandwidth savings for networks and internet traffic dominated by video.

Learn more in the whitepaper by Mehmet N. Akcay, Burak Kara, Ali C. Begen, Saba Ahsan, Igor D.D. Curcio, Kashyap Kammachi-Sreedhar and Emre Aksu.

Growing OMAF’s vision in its second generation

Omnidirectional Media Format (OMAF) was the first VR standard to store and distribute immersive media. Now its second edition has its sights set on even more, building upon its predecessor’s best features and adding tools from overlays to multiple viewpoints. Discover how to leverage these tools for maximum quality of experience in immersive applications.

Prepare for incredible immersion by reading the whitepaper by Burak Kara, Mehmet N. Akcay, Ali C. Begen, Saba Ahsan, Igor D.D. Curcio, Kashyap Kammachi-Sreedhar and Emre Aksu.

Learn more

Blog

Feel the future: Exploring thermal haptics in Extended Reality

Blog

How AI is Revolutionizing Spatial Audio

Blog

Split and conquer: High-quality XR for all

Blog

The Future of Video Compression: Is VVC Ready for Prime Time?

Blog

From Vision to (Extended) Reality: Nokia brings MPEG V-DMC encoded content to your browser

Blog

From the palm of our hands to the heart of our cars

Nokia Immersive Voice: It’s the real voice around you that counts

Blog

Immersive Voice: It’s the real voice around you that counts

Blog

Multiview HEVC — Stereoscopic video on steroids