Fast forwarding video takes more than a swipe

by Miska Hannuksela

13 Feb 2023


Every time you jump to a desired scene within a video stream, continue watching a movie at a later time, switch channels on TV or join a video conference after it’s already started, you use a video decoding functionality called random access. While it is hard to imagine living without these features, developing random access in video coding took decades of innovation.

Evolution of random access in video coding

Today’s video compression standards, including Advanced Video Coding (AVC, 2003), High Efficiency Video Coding (HEVC, 2013), and Versatile Video Coding (VVC, 2020), are the result of decades-long international collaboration. Each of these codec generations has achieved roughly a 50 percent bitrate reduction compared to its predecessor without compromising picture quality. In addition to superior compression efficiency, each new codec has also improved other functionalities, such as random access.

Uncompressed video is composed of still images, called frames, that follow each other; one second of video typically contains about 24 to 60 frames. Video codecs achieve much of their compression efficiency by exploiting the temporal correlation between these frames. One of the key technical challenges with random access is that it comes at the expense of this efficiency, because a frame at which decoding can start cannot exploit temporal correlation with the frames before it. The simplest form of random access is achieved through a special type of frame called an instantaneous decoding refresh (IDR) frame, which is independent of any other frame and cannot be followed by frames that depend on any frame preceding the IDR frame.
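To make the IDR rule concrete, here is a minimal Python sketch that models a sequence as frames listed in decoding order, each with the names of the frames it predicts from. The `Frame` class, the frame names, and the helper functions are hypothetical illustrations, not part of any codec API; real decoders rely on the frame type signaled in the bitstream rather than tracing dependencies.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    name: str                                        # hypothetical label, e.g. "P4"
    refs: list[str] = field(default_factory=list)    # frames used for prediction


def decodable_from(frames: list[Frame], start: int) -> list[str]:
    """Names of frames, in decoding order, that decode correctly when
    decoding begins at index `start` (earlier frames are unavailable)."""
    available: set[str] = set()
    decoded: list[str] = []
    for frame in frames[start:]:
        # A frame is correct only if every frame it predicts from is correct.
        if all(ref in available for ref in frame.refs):
            available.add(frame.name)
            decoded.append(frame.name)
    return decoded


def is_idr_point(frames: list[Frame], i: int) -> bool:
    """True if frame i uses no references and no later frame references
    a frame that precedes it (the IDR property)."""
    earlier = {f.name for f in frames[:i]}
    return not frames[i].refs and all(
        not earlier.intersection(f.refs) for f in frames[i + 1:]
    )


# Decoding order: an intra frame, two inter frames, then an IDR-like frame.
frames = [
    Frame("I0"),
    Frame("P1", ["I0"]),
    Frame("P2", ["P1"]),
    Frame("IDR3"),
    Frame("P4", ["IDR3"]),
    Frame("P5", ["P4"]),
]

print(decodable_from(frames, 3))  # ['IDR3', 'P4', 'P5']
print(decodable_from(frames, 1))  # ['IDR3', 'P4', 'P5']: P1 and P2 are lost
print(is_idr_point(frames, 3))    # True
```

Joining at `P1` shows the practical effect: nothing decodes correctly until the next IDR frame arrives, after which every frame does.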

A clean random access (CRA) frame enables a more sophisticated form of random access. Like an IDR frame, a CRA frame is independent of any other frame. But unlike an IDR frame, a CRA frame may be followed by frames that precede it in display order and depend on one or more frames preceding the CRA frame. When decoding starts from the CRA frame, these leading frames cannot be reconstructed correctly and are skipped, but in continuous playback they are decoded normally. This improves compression efficiency [1] and keeps picture quality consistent between frames.
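The CRA behavior can be modeled by giving each frame a display position (`poc`, short for picture order count) in addition to its references. The sketch below, with hypothetical frame names and helpers, starts decoding at a CRA frame, skips leading frames whose references are unavailable, and returns the remaining frames in display order. In HEVC and VVC the skipped frames correspond to random access skipped leading (RASL) pictures, which decoders identify from their signaled picture type rather than by tracing references as done here.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    name: str                                        # hypothetical label
    poc: int                                         # display (output) order
    refs: list[str] = field(default_factory=list)    # frames used for prediction


def start_at_cra(frames: list[Frame], cra_index: int) -> tuple[list[str], list[str]]:
    """Decode from the intra-coded frame at `cra_index`.

    Returns (output frames in display order, skipped frames in decoding order).
    Frames whose references precede the random access point are skipped.
    """
    if frames[cra_index].refs:
        raise ValueError("decoding must start at an intra-coded frame")
    available: set[str] = set()
    decoded: list[Frame] = []
    skipped: list[str] = []
    for frame in frames[cra_index:]:
        if all(ref in available for ref in frame.refs):
            available.add(frame.name)
            decoded.append(frame)
        else:
            skipped.append(frame.name)
    output = [f.name for f in sorted(decoded, key=lambda f: f.poc)]
    return output, skipped


# Frames listed in decoding order. B6 and B7 precede CRA8 in display order.
frames = [
    Frame("I0", 0),
    Frame("P4", 4, ["I0"]),
    Frame("B2", 2, ["I0", "P4"]),
    Frame("CRA8", 8),
    Frame("B6", 6, ["P4", "CRA8"]),    # depends on a frame before the CRA
    Frame("B7", 7, ["CRA8"]),          # depends only on the CRA
    Frame("P12", 12, ["CRA8"]),
    Frame("B10", 10, ["CRA8", "P12"]),
]

print(start_at_cra(frames, 3))  # (['B7', 'CRA8', 'B10', 'P12'], ['B6'])
```

Starting from `I0` instead, every frame decodes, including `B6`, which is how leading frames behave in continuous playback.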

Figure 1 presents an example of IDR and CRA frames within a video sequence. The arrows illustrate inter-frame prediction, in which each frame is divided into blocks that are encoded by pointing to approximately matching blocks in reference frames, so that besides the pointer (motion vector) only the remaining difference needs to be coded. In this way, inter prediction removes the temporal correlation between frames. In Figure 1, the prediction arrows start from the available reference frames and point to the inter-predicted frames.
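The block matching behind the arrows in Figure 1 can be illustrated with a toy full-search motion estimator that compares blocks by the sum of absolute differences (SAD). This is a minimal sketch on small integer-pixel frames with hypothetical function names; real encoders use fast search strategies, fractional-pixel motion, and rate-distortion costs.

```python
Image = list[list[int]]  # hypothetical: a frame as rows of luma samples


def sad(cur: Image, ref: Image, bx: int, by: int, dx: int, dy: int, size: int) -> int:
    """Sum of absolute differences between the size x size block at (bx, by)
    in `cur` and the block displaced by (dx, dy) in `ref`."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(size)
        for x in range(size)
    )


def motion_search(cur: Image, ref: Image, bx: int, by: int,
                  size: int, search_range: int) -> tuple[int, int, int]:
    """Full search: return (dx, dy, cost) of the best-matching reference block."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            # Skip candidates that would fall outside the reference frame.
            if x0 < 0 or y0 < 0 or x0 + size > width or y0 + size > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


def frame_with_square(x0: int, y0: int, w: int = 8, h: int = 8) -> Image:
    """Dark w x h frame with a bright 2x2 square whose top-left corner is (x0, y0)."""
    frame = [[0] * w for _ in range(h)]
    for y in range(y0, y0 + 2):
        for x in range(x0, x0 + 2):
            frame[y][x] = 200
    return frame


ref = frame_with_square(2, 2)   # reference frame
cur = frame_with_square(3, 2)   # the square has moved one pixel to the right
print(motion_search(cur, ref, bx=2, by=2, size=4, search_range=2))  # (-1, 0, 0)
```

The vector (-1, 0) says the current block is best predicted from the area one pixel to its left in the reference frame, matching the square's rightward motion; only this vector and the (here zero) difference would need to be coded.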

Figure 1. Illustration of IDR, CRA, and inter-predicted frames.

The network throughput experienced by an individual user may vary for many reasons, such as the total amount of network traffic. When the video bitrate exceeds network throughput, the receiver does not have enough bandwidth for real-time playback and must pause playback for rebuffering. Avoiding such interruptions requires adapting the streamed video bitrate so that it does not exceed network throughput. To this end, the same video content is encoded into multiple versions with different resolutions and bitrates, and the player selects which version to stream segment by segment. Each segment starts with an IDR or CRA frame so that decoding can start at the segment boundary. A CRA frame can only be used for switching when the codec supports inter-frame prediction between frames of different resolutions, because the leading frames after the CRA frame may reference frames of the previous segment, which may come from a version with a different resolution. By using CRA frames instead of IDR frames, the required bitrate can be reduced by up to an additional nine percent [2].
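How the player picks a version for each segment is not defined by the video coding standards. A common heuristic, sketched below under the assumption of a simple throughput-based rule, picks the highest-bitrate version that fits within a safety margin of the measured throughput. The bitrate ladder, the 0.8 safety factor, and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Representation:
    height: int        # vertical resolution in pixels
    bitrate_kbps: int  # average bitrate of this encoded version


def select_representation(reps: list[Representation], throughput_kbps: float,
                          safety: float = 0.8) -> Representation:
    """Highest-bitrate version that fits within `safety` x throughput;
    falls back to the lowest-bitrate version if none fits."""
    budget = throughput_kbps * safety
    fitting = [r for r in reps if r.bitrate_kbps <= budget]
    if fitting:
        return max(fitting, key=lambda r: r.bitrate_kbps)
    return min(reps, key=lambda r: r.bitrate_kbps)


# Hypothetical bitrate ladder and per-segment throughput measurements.
ladder = [
    Representation(360, 800),
    Representation(720, 2500),
    Representation(1080, 5000),
]
measured_kbps = [7000, 3500, 900, 6500]

chosen = [select_representation(ladder, t).height for t in measured_kbps]
print(chosen)  # [1080, 720, 360, 1080]
```

Each change of height in `chosen` is a switch between versions at a segment boundary, which is where the choice between IDR and CRA frames matters.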

In gradual decoding refresh (GDR), the content of the frame is refreshed over a period of frames, as depicted in Figure 2. In a GDR frame, only a small portion of the frame is compressed without inter-frame prediction, so that portion can be reconstructed correctly when decoding starts from the GDR frame. In subsequent frames, the refreshed area is either compressed without inter-frame prediction or uses only the refreshed areas of previous frames for prediction. Hence, when decoding starts from the GDR frame, the refreshed area is reconstructed correctly and gradually grows until it covers the entire frame.
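The refresh pattern in Figure 2 can be modeled as a clean (refreshed) region that sweeps across the frame. The sketch below assumes a left-to-right sweep over block columns that completes after `period` frames, and that each frame predicts only from the immediately preceding frame; the function names and parameters are hypothetical, and real GDR encoders may refresh in other patterns.

```python
import math


def clean_columns(frame_index: int, width_cols: int, period: int) -> int:
    """Number of leftmost block columns that are clean (refreshed) in the
    frame `frame_index` frames after the GDR frame (index 0)."""
    k = min(frame_index, period - 1)          # fully refreshed after the period
    return math.ceil((k + 1) * width_cols / period)


def prediction_allowed(block_col: int, ref_right_col: int, frame_index: int,
                       width_cols: int, period: int) -> bool:
    """Check whether a block may predict from a reference block whose
    rightmost block column is `ref_right_col` in the previous frame."""
    if block_col >= clean_columns(frame_index, width_cols, period):
        return True   # dirty area: not yet refreshed, so no constraint
    if frame_index == 0:
        return False  # clean area of the GDR frame is coded without inter prediction
    # Clean-area blocks may only use the previous frame's clean area.
    return ref_right_col < clean_columns(frame_index - 1, width_cols, period)


# Ten block columns refreshed over five frames.
print([clean_columns(k, 10, 5) for k in range(6)])  # [2, 4, 6, 8, 10, 10]
print(prediction_allowed(1, 3, 2, 10, 5))  # True: column 3 was clean in frame 1
print(prediction_allowed(1, 5, 2, 10, 5))  # False: column 5 was still dirty
```

Once `clean_columns` reaches the full width, every block has been reconstructed from correctly decoded data, so a decoder that started at the GDR frame has fully recovered.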

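GDR helps avoid the temporary bitrate peaks caused by IDR and CRA frames, which are much larger than inter-predicted frames because they are coded without inter-frame prediction. Because GDR spreads this cost over several frames, frame sizes stay more even and less buffering is needed to absorb the peaks. Consequently, GDR enables very low end-to-end latency, which makes it well suited for applications requiring low delay, such as cloud gaming, and for networks offering low-delay transmission, such as 5G mobile networks.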
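Why evening out frame sizes lowers latency can be shown with a toy model. Assume, hypothetically, that an intra-coded block column costs ten times as many bits as an inter-coded one, and that the channel rate equals the average bitrate of the stream. The time needed to transmit the largest frame then roughly indicates the extra delay that frame adds. The cost values and frame layout below are illustrative assumptions, not measurements.

```python
COLS, PERIOD, FPS = 10, 10, 60        # block columns, frames per cycle, frame rate
INTRA, INTER = 10_000, 1_000          # hypothetical bits per intra / inter column


def peak_delay_ms(frame_bits: list[int], fps: int) -> float:
    """Time to send the largest frame over a constant-rate channel whose
    rate equals the average bitrate of the sequence."""
    channel_bps = sum(frame_bits) / len(frame_bits) * fps
    return max(frame_bits) / channel_bps * 1000


# One fully intra-coded frame (IDR or CRA) per cycle, the rest inter coded.
periodic_intra = [COLS * INTRA if i == 0 else COLS * INTER for i in range(PERIOD)]
# GDR: one intra-coded column per frame, the remaining columns inter coded.
gdr = [INTRA + (COLS - 1) * INTER for _ in range(PERIOD)]

print(sum(periodic_intra) == sum(gdr))               # True: same total bits
print(round(peak_delay_ms(periodic_intra, FPS), 1))  # 87.7
print(round(peak_delay_ms(gdr, FPS), 1))             # 16.7
```

With the same total bitrate, the GDR layout in this toy model keeps every frame at the average size, so the largest frame takes one frame interval (about 16.7 ms at 60 frames per second) to send instead of several.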

Figure 2. Illustration of gradual decoding refresh (GDR).

Video coding standards specify mandatory features, which all standard-conforming decoders must implement, and optional features, which encoders may indicate in the bitstream but decoders are not required to support. Table 1 summarizes how different random access types are enabled as mandatory or optional features in the video coding standards. As the table shows, VVC offers the most comprehensive random access support of the three.

Table 1. Decoder support for different types of random access.
Note: supplemental enhancement information (SEI) messages have no mandatory decoding process.

Nokia is at the forefront of advancing random access capability

When it comes to random access, VVC is the most advanced video coding standard, as it supports all of these types: IDR, CRA, and GDR.

Nokia has been actively developing video coding standards since the very beginning, from AVC to VVC and beyond. Nokia is also a pioneer in all types of random access, as exemplified by selected standardization contributions on IDR [3], CRA [4], CRA with resolution change [5], and GDR [6].

In addition to specifying the bitstream syntax and decoder operation for random access, we have helped the entire industry incorporate GDR into encoders. While GDR is relatively straightforward to implement in decoders, it requires sophisticated encoder operation that the standard does not specify. Nokia has published papers (e.g., [7]) and given presentations describing how to implement GDR in VVC encoding, and has provided GDR source code both for the VVC reference software [8] and on top of Fraunhofer's VVenC encoder.
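One example of the encoder-side work that the standard leaves open is keeping the motion vectors of refreshed-area blocks from pointing into the unrefreshed area. The sketch below shows one simple approach, clamping the horizontal motion vector, under the assumptions of a left-to-right refresh, integer-pixel motion, and pixel coordinates; a practical VVC encoder would also have to account for the extra samples read by fractional-pixel interpolation filters and for in-loop filtering across the refresh boundary. The function name and parameters are hypothetical and do not describe the cited implementations.

```python
def clamp_mv_x(block_x: int, block_w: int, mv_x: int, clean_width_prev: int) -> int:
    """Clamp the horizontal motion vector of a refreshed-area block so that the
    referenced block lies inside the previous frame's clean area, i.e. inside
    pixel columns [0, clean_width_prev). Integer-pixel motion only; assumes
    clean_width_prev >= block_w so that a valid position exists."""
    min_mv = -block_x                              # reference must start at x >= 0
    max_mv = clean_width_prev - block_w - block_x  # reference must end inside the clean area
    return max(min_mv, min(mv_x, max_mv))


# An 8-pixel-wide block at x = 16; the previous frame is clean up to x = 32.
print(clamp_mv_x(16, 8, 20, 32))   # 8: reference block now spans x = 24..31
print(clamp_mv_x(16, 8, -4, 32))   # -4: already inside, unchanged
print(clamp_mv_x(16, 8, -30, 32))  # -16: stops at the left frame edge
```

Clamping after the search is the simplest option; an encoder could instead restrict the search window up front so that it never evaluates disallowed candidates.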

As the world’s appetite for video continues to grow, our inventors keep contributing to better standards and working with our industry partners to ensure that seamless streaming is available to all.

References

[1] A. Fujibayashi and T. K. Tan, "Random access support for HEVC", JCTVC-D234, Jan. 2011.

[2] R. Skupin, C. Bartnik, A. Wieckowski, K. Suehring, Y. Sanchez, and B. Bross, "Constrained RASL encoding for bitstream switching", JVET-W0133, July 2021.

[3] M. M. Hannuksela, "Signaling of 'clean' random access positions", JVT-C083, May 2002.

[4] Y.-K. Wang and M. M. Hannuksela, "On random access", JVT-D097, July 2002.

[5] M. M. Hannuksela and A. Aminlou, "Use cases and proposed design choices for adaptive resolution changing (ARC)", JVET-M0259, Jan. 2019.

[6] Y.-K. Wang and M. M. Hannuksela, "Gradual decoder refresh using isolated regions", JVT-C074, May 2002.

[7] L. Wang, S. Hong, and K. Panusopone, "Gradual decoding refresh for Versatile Video Coding", IEEE International Conference on Image Processing, Sep. 2021.

[8] S. Hong, L. Wang, and K. Panusopone, "GDR software", JVET-U0097, Jan. 2021.

About Miska Hannuksela

Miska Hannuksela (M.Sc., Dr. Tech.) is the Head of Video Research at Nokia Technologies and a Nokia Bell Labs Fellow. He is an internationally acclaimed expert in video and image compression and end-to-end multimedia systems.