Be your own director: Multiview Live Streaming with VVC

Most broadcast live events rely on multi-camera productions, where a director decides which camera feed is delivered to viewers at any given moment. Multiview streaming challenges this model by allowing viewers to take on the role of the director themselves, for example, by following a favorite driver in a Formula 1 race or watching a game from multiple perspectives simultaneously. This shift is enabled both by the growing screen sizes of modern televisions and monitors and by recent advances in multiview video streaming technology. The latest video coding standard, Versatile Video Coding (VVC) [1], makes such personalized multiview experiences considerably easier to realize technically, as this blog post discusses.

Advantages of the single-player multiview approach

Several technical alternatives exist for realizing multiview streaming, as summarized and compared in [2] and [3]. These prior works conclude that the single-player approach, illustrated in Figure 1, offers clear advantages over the alternatives. In this approach, the player fetches camera feeds based on the user's selections and combines them into a single video bitstream, which a single video decoder then decodes. This architecture scales efficiently for content creation and distribution, since the same set of encoded camera feeds serves all users regardless of their individual viewing choices. By combining the selected feeds into one bitstream, the player ensures perfectly synchronized playback across all views. In addition, the single-player approach enables multiview playback on devices that support only a single video decoder.

Figure 1. Single-player multiview approach.
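
To put the scaling argument in numbers, the following minimal sketch compares the number of encoded streams needed in the single-player approach with a hypothetical alternative in which every possible selection of views is pre-composed into its own stream before delivery. The function names and the example of 20 cameras with 4 simultaneous views are illustrative assumptions, not figures taken from [2] or [3].

```python
from math import comb


def single_player_streams(num_cameras: int) -> int:
    """One independently encoded stream per camera serves every viewer."""
    return num_cameras


def precomposed_streams(num_cameras: int, views_on_screen: int) -> int:
    """Hypothetical alternative: one stream per unordered selection of views."""
    return comb(num_cameras, views_on_screen)


if __name__ == "__main__":
    cameras, views = 20, 4
    print("single-player:", single_player_streams(cameras))
    print("pre-composed :", precomposed_streams(cameras, views))
```

Even before screen positions are taken into account, the number of pre-composed variants grows combinatorially with the number of cameras, whereas the single-player approach stays at one encode per camera.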


VVC: A perfect fit for Single-Player Multiview

Figure 2 illustrates how VVC is used in the single-player multiview approach. Each camera feed is encoded independently by a dedicated VVC encoder and made available for streaming. Based on the viewer’s selections, the player receives the desired bitstreams and merges them as independent subpictures into a single VVC bitstream, which a single VVC decoder then decodes.

Figure 2. Use of VVC for the single-player multiview approach.
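
As a rough illustration of the merging step, the sketch below computes where each selected feed would sit as a subpicture in the merged picture, expressed in coding tree unit (CTU) coordinates. It is a simplified sketch under stated assumptions: all feeds have the same resolution, their dimensions are multiples of an assumed CTU size of 128 luma samples, and the feeds fill a complete grid. The names `Feed`, `SubpictureSlot`, and `layout_grid` are hypothetical; the actual constraints are those given in [4].

```python
from dataclasses import dataclass

CTU_SIZE = 128  # assumed CTU size in luma samples; the real value is an encoder choice


@dataclass(frozen=True)
class Feed:
    name: str
    width: int   # luma samples
    height: int  # luma samples


@dataclass(frozen=True)
class SubpictureSlot:
    feed: str
    ctu_x: int       # top-left column of the subpicture, in CTUs
    ctu_y: int       # top-left row of the subpicture, in CTUs
    ctu_width: int   # subpicture width, in CTUs
    ctu_height: int  # subpicture height, in CTUs


def layout_grid(feeds: list[Feed], columns: int) -> tuple[list[SubpictureSlot], int, int]:
    """Place equally sized feeds row by row.

    Returns the subpicture slots and the merged picture width and height
    in luma samples.
    """
    if not feeds or columns < 1 or len(feeds) % columns != 0:
        raise ValueError("this sketch needs a non-empty, completely filled grid")
    width, height = feeds[0].width, feeds[0].height
    if any((f.width, f.height) != (width, height) for f in feeds):
        raise ValueError("this sketch assumes equally sized feeds")
    if width % CTU_SIZE or height % CTU_SIZE:
        raise ValueError("this sketch assumes CTU-aligned feed dimensions")

    ctu_w, ctu_h = width // CTU_SIZE, height // CTU_SIZE
    slots = [
        SubpictureSlot(f.name, (i % columns) * ctu_w, (i // columns) * ctu_h, ctu_w, ctu_h)
        for i, f in enumerate(feeds)
    ]
    rows = len(feeds) // columns
    return slots, columns * width, rows * height


if __name__ == "__main__":
    selected = [Feed(f"camera-{n}", 1280, 768) for n in range(1, 5)]
    slots, merged_w, merged_h = layout_grid(selected, columns=2)
    print(merged_w, merged_h)
    for slot in slots:
        print(slot)
```

A player would derive such a layout from the viewer's selections and then describe it in the parameter sets of the merged bitstream; the coded data of each feed is not re-encoded.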

Although it is possible to realize single-player multiview technology with tiles in High Efficiency Video Coding (HEVC) and some other codecs, VVC offers several important advantages discussed below.

Parallel, distributed, and scalable encoding is achieved, since each camera feed is encoded independently. Players can also receive, decode, and play any standalone VVC bitstream as such without modifications, enabling the multiview functionality to be introduced as an add-on service without affecting the delivery of single-camera streams. 

Only simple constraints are needed for VVC encoders to enable players to merge encoded bitstreams as independent subpictures. For more details on these encoding constraints, please refer to DVB's recommendations [4]. In contrast, relatively complex encoding techniques are required for HEVC encoding to avoid prediction dependencies between tiles so that they can be combined into a single bitstream [5]. Avoiding such dependencies across tile boundaries also reduces compression efficiency compared to independent subpictures.

The player operations for merging bitstreams as independent subpictures into a conforming VVC bitstream require only high-level modifications, specifically to parameter sets [4]. Additionally, the merge base track feature of the VVC packaging format in MP4 files [6] prescribes these modifications, allowing multiview merging to be implemented without the capability to parse or write VVC syntax, as described in further detail in [7]. By comparison, merging HEVC tiles requires rewriting of coded video data, such as slice headers, which increases implementation complexity.
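
The difference between high-level and low-level merging can be sketched as follows. This is a deliberately simplified model, not a VVC implementation: `NalUnit`, its string-valued `kind` labels, and `merge_access_unit` are hypothetical names, and the merged parameter sets are assumed to have been prepared elsewhere (for example, as prescribed by a merge base track [6]). The point it illustrates is that the per-feed parameter sets are replaced, while the coded slice data of every feed passes through byte for byte.

```python
from dataclasses import dataclass

# Simplified labels for this sketch, not the numeric NAL unit type codes of VVC.
PARAMETER_SET_KINDS = {"VPS", "SPS", "PPS"}


@dataclass(frozen=True)
class NalUnit:
    kind: str       # e.g. "SPS", "PPS", "SLICE"
    payload: bytes  # opaque bytes; never parsed or rewritten here


def merge_access_unit(
    per_feed_units: list[list[NalUnit]],
    merged_parameter_sets: list[NalUnit],
) -> list[NalUnit]:
    """Build one merged access unit.

    The merged parameter sets come first. Each feed's own parameter sets
    are dropped, and all of its remaining NAL units are appended unchanged,
    in the order in which the feeds are given (assumed to match the
    subpicture order of the merged layout).
    """
    merged = list(merged_parameter_sets)
    for units in per_feed_units:
        merged.extend(u for u in units if u.kind not in PARAMETER_SET_KINDS)
    return merged


if __name__ == "__main__":
    feed_a = [NalUnit("SPS", b"a-sps"), NalUnit("PPS", b"a-pps"), NalUnit("SLICE", b"a-slice")]
    feed_b = [NalUnit("SPS", b"b-sps"), NalUnit("PPS", b"b-pps"), NalUnit("SLICE", b"b-slice")]
    new_sets = [NalUnit("SPS", b"merged-sps"), NalUnit("PPS", b"merged-pps")]
    for unit in merge_access_unit([feed_a, feed_b], new_sets):
        print(unit.kind, unit.payload)
```

A real player would also have to handle other NAL unit types and verify that the feeds satisfy the encoding constraints in [4]; the sketch leaves both out.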

Same engine for multiple use cases

While this blog post has focused on VVC's suitability for multiview video streaming, the same underlying technology enables a range of other use cases. These include:

  1. Picture-in-picture replacement, where a rectangular region of the main video is replaced by a supplementary video, such as a sign language video [4].
  2. Cloud-based mixing of multi-point real-time video, such as multi-camera surveillance systems or multi-point video conferencing [5].
  3. Viewport-dependent 360-degree video streaming, where only the user's current viewport is delivered at the highest quality [7].
  4. Streaming of stereoscopic 3D video, allowing players to receive a single view for monoscopic playback or both views for stereoscopic 3D presentation.

Conclusion

In summary, VVC makes single-player multiview streaming efficient, scalable, effortless to implement, and ready for deployment. VVC's independent subpictures enable flexible personalized use of multiple views while preserving compatibility with single-view-capable services and devices. As audiences increasingly expect control over how they watch live content, VVC provides a future-proof foundation for delivering richer, more immersive viewing experiences.

References

  1. ITU-T H.266, "Versatile video coding", https://www.itu.int/itu-t/recommendations/rec.aspx?rec=16662 (accessed 12 March 2026).
  2. R. van Brandenburg, "Why Single-Player Multiview is Superior", https://www.tiledmedia.com/everything-about-multiview/ (accessed 12 March 2026).
  3. M. Sorin and J. Le Tanou, "Multiview streaming", https://www.mediakind.com/blog/multiview-streaming/ (accessed 12 March 2026).
  4. DVB Document A001, "Specification for the use of Video and Audio Coding in Broadcast and Broadband Applications", Annex M, "Considerations for personalization and accessibility services with VVC", https://dvb.org/specifications/ (accessed 12 March 2026).
  5. Y. Sánchez, R. Globisch, T. Schierl, and T. Wiegand, "Low Complexity Cloud-video-Mixing Using HEVC", Proceedings of IEEE Consumer Communications and Networking Conference, Jan. 2014, https://ieeexplore.ieee.org/abstract/document/7351200 (accessed 12 March 2026).
  6. ISO/IEC 14496-15, "Carriage of network abstraction layer (NAL) unit structured video in the ISO base media file format", Clause 11 "VVC elementary streams and sample definitions", https://www.iso.org/standard/89118.html (accessed 12 March 2026).
  7. M. M. Hannuksela and S. Deshpande, "VVC for Immersive Video Streaming", Proceedings of ACM Mile-High Video, May 2023, https://dl.acm.org/doi/abs/10.1145/3588444.3591004 (accessed 12 March 2026).
Miska Hannuksela

About Miska Hannuksela

Miska Hannuksela, (M.Sc., Dr. Tech), is the Head of Video Research at Nokia Technologies and a Nokia Bell Labs Fellow. He is an internationally acclaimed expert in video and image compression and end-to-end multimedia systems.
