Video post-production pipelines will increasingly benefit from artificial intelligence tools. For instance, the automatic extraction of specific objects can streamline the post-production workflow. In particular, boom mic removal could be accelerated, and color chart detection could lead to a more efficient color pipeline. For now, these objects are usually segmented via rotoscoping, which requires extensive manual work. Semantic segmentation has made huge progress since the adoption of convolutional networks. Existing, publicly available frameworks such as Detectron2 (\url{https://github.com/facebookresearch/detectron2}) and PointRend \cite{kirillov2019pointrend} already enable high-quality detection and segmentation of $80$ generic classes. However, the performance of these frameworks depends heavily on the quantity and quality of the training data. Unfortunately, collecting relevant video footage and manually extracting the objects (e.g., boom mics and color charts) is out of reach. To alleviate this problem, we propose a lightweight training strategy: training data is generated synthetically by inserting the desired objects into an existing dataset, combined with data augmentation. A pretrained network is then fine-tuned on this new dataset. Despite its simplicity, we show that the system achieves good performance for an automatic video post-production pipeline.
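The synthetic data generation step described in the abstract can be illustrated with a minimal sketch. It assumes the object (e.g., a boom mic) is available as an RGBA cutout and is alpha-composited onto an image from an existing dataset, with a random horizontal flip standing in for data augmentation. The function name `paste_object`, its parameters, and the 0.5 mask threshold are hypothetical choices made for illustration; they are not taken from the paper, whose actual insertion and augmentation procedure may differ.

```python
import numpy as np


def paste_object(background, obj_rgba, top, left, rng=None, flip_prob=0.5):
    """Alpha-composite an RGBA object cutout onto an RGB background.

    Returns the composited uint8 image and a binary mask (uint8, 0 or 1)
    that can serve as the segmentation label for the inserted object.
    Illustrative only: not the paper's actual pipeline.
    """
    if rng is None:
        rng = np.random.default_rng()

    # Simple augmentation (assumed): random horizontal flip of the cutout.
    if rng.random() < flip_prob:
        obj_rgba = obj_rgba[:, ::-1]

    h, w = obj_rgba.shape[:2]
    H, W = background.shape[:2]
    if top < 0 or left < 0 or top + h > H or left + w > W:
        raise ValueError("object must lie fully inside the background")

    # Alpha channel scaled to [0, 1], kept as (h, w, 1) for broadcasting.
    alpha = obj_rgba[..., 3:4].astype(np.float32) / 255.0

    out = background.astype(np.float32)  # astype returns a copy
    region = out[top:top + h, left:left + w]
    out[top:top + h, left:left + w] = (
        alpha * obj_rgba[..., :3] + (1.0 - alpha) * region
    )

    # Label mask: pixels where the object is mostly opaque.
    mask = np.zeros((H, W), dtype=np.uint8)
    mask[top:top + h, left:left + w] = (alpha[..., 0] > 0.5).astype(np.uint8)

    return out.round().astype(np.uint8), mask
```

Each call yields an (image, mask) pair; repeating it over many backgrounds, positions, and cutouts would produce a dataset on which a pretrained segmentation network could be fine-tuned.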
Learning semantic object segmentation for video post-production
Research Paper / Dec 2021 / Computer Vision, Machine learning / Deep learning / Artificial Intelligence