Hierarchical Temporal Structure for End-to-End Neural Network-based Video Compression

Research Paper  /  Jun 2021

This paper presents an end-to-end Artificial Neural Network (ANN)-based compression framework, submitted to the video compression task of the Challenge on Learned Image Compression (CLIC) at CVPR 2021. In this framework, the video frames are divided into Groups Of Pictures (GOPs), in which each frame can be encoded in Intra or Inter mode. In Intra mode, an auto-encoder compresses the pixel values directly. For Inter frames, we leverage bi-directional prediction with reference frame signaling, allowing for efficient hierarchical GOP temporal structures. The motion information, computed using the luminance, and the prediction residuals are compressed using dedicated auto-encoder structures whose layers are conditioned on the GOP structure. The network is trained fully end-to-end, from scratch. The results demonstrate the promise of end-to-end approaches.
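To illustrate what a hierarchical GOP temporal structure with bi-directional prediction can look like, here is a minimal Python sketch of one possible coding order. It is not the paper's implementation and rests on assumptions: a dyadic hierarchy (GOP size a power of two), an Intra first frame, a last frame predicted from the first frame only, and every other frame predicted bi-directionally from the nearest already-coded frames on either side. The function name `hierarchical_gop_order` and the per-frame `level` value (one plausible signal on which layers could be conditioned) are hypothetical.

```python
from typing import List, Tuple

# (frame_index, mode, reference_indices, hierarchy_level)
Frame = Tuple[int, str, Tuple[int, ...], int]


def hierarchical_gop_order(gop_size: int) -> List[Frame]:
    """Return frames 0..gop_size in a hypothetical dyadic hierarchical coding order.

    Assumptions (not from the paper): frame 0 is Intra, frame gop_size is
    Inter-predicted from frame 0 only, and each remaining frame is predicted
    bi-directionally from the two frames bounding its interval.
    """
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")

    order: List[Frame] = [(0, "intra", (), 0), (gop_size, "inter", (0,), 0)]

    # Bisect intervals level by level, so frames that are referenced more
    # often (lower levels) are coded before the frames that depend on them.
    intervals = [(0, gop_size)]
    level = 1
    while intervals:
        next_intervals = []
        for left, right in intervals:
            if right - left < 2:
                continue  # no frame lies strictly between left and right
            mid = (left + right) // 2
            order.append((mid, "inter", (left, right), level))
            next_intervals += [(left, mid), (mid, right)]
        intervals = next_intervals
        level += 1
    return order


if __name__ == "__main__":
    for frame, mode, refs, level in hierarchical_gop_order(8):
        print(frame, mode, refs, level)
```

For a GOP size of 8, this sketch codes frames in the order 0, 8, 4, 2, 6, 1, 3, 5, 7, and every Inter frame references only frames that have already been decoded. Signaling the reference indices per frame, as the abstract describes, is what lets a decoder follow such an order rather than a fixed sequential one.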