Deep bi-prediction blending
This paper presents a learning-based method to improve bi-prediction in video coding. In conventional video coding solutions, block-based motion compensation from already decoded reference pictures is the main tool used to predict the current frame. In particular, bi-predicted blocks, i.e. blocks that combine two different motion-compensated prediction blocks, greatly improve the accuracy of the temporal prediction by averaging the two predictions. In recent codec generations such as VVC, the blending process has been improved, for example by performing weighted blending or by refining the predicted block with a correction offset derived from the two blocks' optical flow. In this context, we introduce a simple neural network that further improves the blending operation. Complexity is balanced both in terms of network size and encoder mode selection. Extensive tests on top of the recently standardized VVC codec show a BD-rate improvement of -1.4% in the random access configuration, for a network size of about 10k parameters. We also propose a simple CPU-based implementation and network quantization to assess the complexity/gain trade-off in a conventional codec framework.
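For readers unfamiliar with the conventional blending mentioned above, the following is a minimal sketch of the two baseline operations: the rounded average of two motion-compensated blocks, and a weighted blend with an integer weight expressed in eighths. It is a simplified illustration working directly on sample values, not the proposed neural network and not the codec's actual implementation, which operates at a higher intermediate precision. The function names, the list-of-rows block layout, and the `bit_depth` parameter are assumptions made for this example.

```python
def blend_average(p0, p1):
    """Default bi-prediction: rounded average of two prediction blocks.

    p0 and p1 are blocks given as lists of rows of integer samples
    (a hypothetical layout chosen for this sketch).
    """
    return [[(a + b + 1) >> 1 for a, b in zip(r0, r1)]
            for r0, r1 in zip(p0, p1)]


def blend_weighted(p0, p1, w, bit_depth=8):
    """Weighted blend with an integer weight w expressed in eighths.

    Computes ((8 - w) * p0 + w * p1 + 4) >> 3 per sample and clips the
    result to the valid sample range. With w = 4 this reduces to the
    rounded average. Weights outside [0, 8] extrapolate, hence the clip.
    """
    max_val = (1 << bit_depth) - 1
    return [[min(max(((8 - w) * a + w * b + 4) >> 3, 0), max_val)
             for a, b in zip(r0, r1)]
            for r0, r1 in zip(p0, p1)]


if __name__ == "__main__":
    block0 = [[100, 102], [104, 106]]
    block1 = [[110, 100], [104, 90]]
    print(blend_average(block0, block1))
    print(blend_weighted(block0, block1, w=5))
```

Both functions are fixed, hand-designed rules applied identically to every sample; the learned blending described in the abstract replaces such a rule with a small network that takes the two prediction blocks as input.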