High-level understanding of stories in video, such as movies and TV shows, from raw data is extremely challenging. Modern video question answering (VideoQA) systems often rely on additional human-made sources such as plot synopses, scripts, video descriptions or knowledge bases. In this work, we present a new approach to understanding the whole story without such external sources. The secret lies in the dialog: unlike any prior work, we treat dialog as a noisy source to be converted into a text description via dialog summarization, much as recent methods treat video. The input of each modality is encoded independently by transformers, and a simple fusion method combines all modalities, using soft temporal attention for localization over long inputs. Our model outperforms the state of the art on the KnowIT VQA dataset by a large margin, without using question-specific human annotation or human-made plot summaries. It even outperforms human evaluators who have never watched any whole episode before.
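The abstract does not say how the soft temporal attention or the fusion is computed, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that each modality (for example, the dialog summary and the video description) is split into temporal segments, and that a per-modality transformer (not shown, hypothetical) has already produced a score for every candidate answer in every segment. A segment's relevance is taken, as an illustrative assumption, to be its best answer score. The softmax of those relevances softly localizes the useful segments, and modalities are fused by simply summing their pooled answer scores.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def soft_temporal_attention(segment_scores: List[List[float]]) -> List[float]:
    """Pool per-segment answer scores into one score per candidate answer.

    segment_scores[t][a] is the (hypothetical) score that a per-modality
    encoder assigns to answer a after reading temporal segment t.
    Assumption: a segment's relevance is its best answer score; the softmax
    of the relevances gives soft weights over segments (soft localization).
    """
    relevance = [max(row) for row in segment_scores]
    weights = softmax(relevance)
    num_answers = len(segment_scores[0])
    return [
        sum(w * row[a] for w, row in zip(weights, segment_scores))
        for a in range(num_answers)
    ]


def fuse_modalities(per_modality: List[List[List[float]]]) -> List[float]:
    """Simple late fusion (assumed): sum the attended scores of all modalities."""
    pooled = [soft_temporal_attention(m) for m in per_modality]
    num_answers = len(pooled[0])
    return [sum(scores[a] for scores in pooled) for a in range(num_answers)]


def predict(per_modality: List[List[List[float]]]) -> int:
    """Return the index of the candidate answer with the highest fused score."""
    fused = fuse_modalities(per_modality)
    return max(range(len(fused)), key=fused.__getitem__)


# Toy example with made-up scores: 4 candidate answers, and modalities
# that may have different numbers of temporal segments.
dialog = [[0.1, 0.2, 0.0, 0.1], [2.0, 0.3, 0.1, 0.0], [0.0, 0.1, 0.2, 0.1]]
video = [[0.5, 0.4, 0.3, 0.2], [0.6, 0.1, 0.2, 0.1]]
print(predict([dialog, video]))  # answer 0, driven by the second dialog segment
```

Because each modality is pooled over its own segments before fusion, the modalities need not share a segmentation, only the same set of candidate answers; the soft weights let one highly relevant segment in a long input dominate without a hard localization step.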
On the hidden treasure of dialog in video question answering