Standardizing AI in multimedia: our vision for innovation

Artificial Intelligence (AI) is transforming multimedia as we know it, opening doors to more innovative and powerful capabilities. From crafting personalized interactive experiences to transforming 2D visuals into rich immersive 3D environments instantly, global standardization projects are exploring how AI and multimedia work together – driving the industry into a bold new era of innovation where nearly every aspect of multimedia is being reimagined.
But this surge of innovation faces a critical challenge: the industry sits at a crossroads. The integration of complex machine learning (ML) models often varies significantly across platforms, hardware and implementations. The rapid and fragmented development of AI capabilities risks creating incompatible tools that cannot work together seamlessly, making it difficult to ensure compatibility, fairness and predictable performance, especially when systems need to operate across different environments and devices.
Without coordinated standards to address these kinds of issues, the industry risks developing powerful tools that can’t fully work together, slowing progress and undermining trust in AI-driven media technologies.
AI’s pivotal moment in multimedia is now
Businesses across the multimedia industry are creating their own AI capabilities. Video streaming services are perfecting recommendation engines, many are spearheading 3D immersive experiences thanks to recent emergent AI techniques, photo and video editing apps are adding advanced generative capabilities that reimagine users' ideas, and audio platforms are developing AI composers to elevate personalized music experiences. All of these efforts point towards a convergent multimedia ecosystem. Each innovation drives the industry forward, but the challenge is that they don't currently do so together.
Consider two video platforms using different AI-based compression methods. A video optimized by one system may not play properly on another, or might require extra processing work, impacting the viewer experience. The isolation between platforms creates a ripple effect across the entire industry: users face frustrations with incompatibilities between their favorite apps, companies waste resources building duplicate AI systems, and innovation slows.
The longer this fragmented approach continues, the more the industry drifts from what’s possible. To truly unlock AI’s potential and propel the industry forward, common ground is essential. That’s where MPEG-AI comes in.
The vision for MPEG-AI: bridging AI and multimedia
Serving as an umbrella for the standardization activities, the vision for the MPEG-AI standard considers two aspects of the interaction between AI and multimedia: AI as a multimedia coding tool and multimedia for consumption by AI systems. MPEG-AI sets out how the technology can be used to improve multimedia, for example, AI-based compression of videos and point clouds, and how multimedia can be made easier for AI to use, like feeding AI systems with video data for machine analysis or compressed models.
To ensure optimal performance in the multimedia process, the vision also sets out to enable AI-based standards to either entirely rely on AI methods or combine them with traditional techniques. MPEG-AI, as with all standards, looks to create transparent, thoroughly tested technologies that serve the entire multimedia industry and that users can trust.
Standardization enabling unified innovation
The vision for standardization comes to life through its application. Take AI as a multimedia coding tool – through neural networks, AI can enhance multimedia coding by applying deep learning models to compress and improve the quality of video. MPEG-AI will standardize the use of AI-driven compression methods, ensuring interoperability across platforms.
Looking at multimedia for consumption by AI, neural networks can consume multimedia data – such as videos or images – to improve tasks like object recognition or object tracking. MPEG-AI will define standards for how multimedia is encoded and structured to be more AI-friendly. Feature coding for machine consumption is one such activity: it transforms an input video into a decodable bitstream suitable for any machine task.
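To make the idea concrete, here is a minimal sketch of the feature-coding-for-machines pattern: instead of transmitting pixels, a sender extracts a compact feature tensor, compresses it, and a receiver decodes the features for a downstream machine task. The components are deliberately toy stand-ins and not MPEG's actual tools – average pooling stands in for a learned feature extractor, and zlib stands in for a real entropy coder.

```python
import zlib
import numpy as np

def extract_features(frame: np.ndarray) -> np.ndarray:
    """Toy feature extractor: 8x8 average pooling of a grayscale frame.
    (A real system would use a learned neural network backbone.)"""
    h, w = frame.shape
    return frame[: h // 8 * 8, : w // 8 * 8].reshape(h // 8, 8, w // 8, 8).mean(axis=(1, 3))

def encode(features: np.ndarray) -> bytes:
    """Quantize features to 8 bits, then apply a generic entropy-coder stand-in."""
    q = np.clip(np.round(features), 0, 255).astype(np.uint8)
    return zlib.compress(q.tobytes())

def decode(blob: bytes, shape) -> np.ndarray:
    """Recover the quantized feature tensor for the machine-side task."""
    return np.frombuffer(zlib.decompress(blob), dtype=np.uint8).reshape(shape).astype(np.float32)

frame = np.random.default_rng(1).uniform(0, 255, (720, 1280)).astype(np.float32)
feats = extract_features(frame)           # compact representation of the frame
blob = encode(feats)                      # bitstream sent to the machine task
recovered = decode(blob, feats.shape)     # features consumed by e.g. a detector
```

The bitstream carries only what the machine task needs, so it is far smaller than the original frame – the core efficiency argument for coding features rather than pixels.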
At Nokia, we play an instrumental role in advancing AI-driven multimedia technologies and the standards that support them. Over the past two decades, we’ve contributed almost 5,000 inventions that enable multimedia products and services, and we continue to play a leading role in multimedia research and standardization.
AI infuses new vitality into the standardization landscape, introducing fresh challenges and revolutionizing traditional methods for developing standards. For instance, the once tedious task of creating coding tools has evolved into solving optimization problems by designing neural network architectures and training schemes for these networks. New challenges have, however, emerged, such as reproducibility, efficiency, and impact of underlying hardware requirements on design choices, which the community is actively considering.
As multimedia evolves to incorporate AI-driven capabilities, we're also helping shape the technologies that enable this shift. One area is Neural Network Compression (NNC), ISO/IEC 15938-17:2024, which enables the efficient storage and transmission of AI models used in services like object detection, speech enhancement, video post-processing and super-resolution. NNC is crucial for bringing advanced AI to bandwidth-constrained devices, making intelligent features more accessible across networks and platforms.
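The intuition behind compressing a model can be sketched in a few lines: quantize the network's floating-point parameters to a coarse integer grid, then entropy-code the result. The sketch below only illustrates this general idea under simple assumptions (uniform quantization, zlib as a generic entropy-coder stand-in); the NNC standard specifies its own, more sophisticated quantization and coding tools.

```python
import zlib
import numpy as np

def compress_weights(weights: np.ndarray, step: float = 0.02) -> bytes:
    """Uniformly quantize float32 weights with step size `step`, then entropy-code."""
    q = np.round(weights / step).astype(np.int8)   # coarse uniform quantization
    return zlib.compress(q.tobytes())              # generic lossless coding stand-in

def decompress_weights(blob: bytes, shape, step: float = 0.02) -> np.ndarray:
    """Decode and dequantize back to approximate float32 weights."""
    q = np.frombuffer(zlib.decompress(blob), dtype=np.int8).reshape(shape)
    return q.astype(np.float32) * step

# Illustrative weight tensor, e.g. one layer of a small network
w = np.random.default_rng(0).normal(0.0, 0.05, size=(256, 256)).astype(np.float32)
blob = compress_weights(w)
w_hat = decompress_weights(blob, w.shape)
# Reconstruction error is bounded by step / 2; the blob is much smaller than w
```

The trade-off is the familiar rate–distortion one: a larger quantization step shrinks the bitstream but increases the perturbation of the weights, which in practice is tuned against the model's task accuracy.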
By contributing to standards like MPEG-AI and advancing enablers such as NNC, we are helping ensure that AI-powered multimedia systems can interoperate, scale, and deliver value – across the entire ecosystem.
What’s next?
The introduction of AI into the multimedia landscape, powered by the MPEG-AI family of standards, is going to redefine digital experiences for consumers. People can expect AI-enhanced experiences across a variety of devices: more immersive gaming in virtual and mixed reality, and a richer metaverse. Virtual meetings will be in 3D, our favorite artists could perform in our living rooms as lifelike holograms, and we'll be able to take virtual tours of historic landmarks from our sofas.
Outside of the home, AI in multimedia will enable smart cities and elevate our daily interactions. Agentic AI will create our travel itineraries and book our trips, while augmented reality guides us to the nearest available parking spot through our glasses. Innovations that have long been talked about will become reality in months rather than years – rolled out to the masses thanks to the umbrella family of standards, MPEG-AI.
As future multimedia systems evolve to integrate AI components at their core, one significant bottleneck in service delivery will be the compression of data within AI systems. This data often takes the form of tensors, representing intermediate outputs from neural networks or the parameters of these neural networks. Establishing standards for universally compressing such tensors could significantly boost the adoption of AI services across various scenarios where the compression and delivery of AI tensors are crucial. Currently, the Neural Network Compression ad hoc group is exploring such a universal compression paradigm to enable future multimedia systems. Current findings indicate that the neural network compression standard could operate efficiently on tensors beyond neural network parameters.
Nokia is proud to be an active contributor to the MPEG-AI family of standards, fostering transparency and thorough assessment of technologies. Without standards, devices and systems from different vendors would not work together, and much of the value that modern technology has brought to the world would not have been created. We currently sit at a key inflection point for AI as a multimedia coding tool and multimedia for consumption by AI. The decisions made over the coming months will see the vision become a reality.