Automatic extraction of face tracks is a key component of systems that analyze people in audio-visual content such as TV programs and movies. Because annotated content of this type is scarce, popular algorithms for extracting face tracks have not been fully assessed in the literature. To help fill this gap, we introduce a new dataset based on the full audio-visual person annotation of a feature film.
Thanks to this dataset, state-of-the-art tracking metrics such as track purity can now be used to evaluate the face tracks consumed by, e.g., automatic character naming systems. Because the labeling is consistent, algorithms that cluster faces or face tracks in an unsupervised fashion can also use the dataset as a test bed. Finally, thanks to the corresponding audio annotation, the dataset can be used to evaluate speaker diarization methods and, more generally, to assess multimodal people clustering or naming systems.
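As an illustration of the kind of metric mentioned above, here is a minimal sketch of a track purity computation. It assumes one common definition (the fraction of detections in a track that carry the track's most frequent ground-truth identity), which may differ from the exact definition used in the paper. The function names and the input layout (one list of identity labels per track) are hypothetical and do not reflect the dataset's actual annotation format.

```python
from collections import Counter


def track_purity(labels):
    """Purity of one track: share of detections with the dominant identity.

    `labels` is a hypothetical input: the ground-truth identity of each
    detection in the track, in any order.
    """
    if not labels:
        raise ValueError("track has no detections")
    dominant_count = Counter(labels).most_common(1)[0][1]
    return dominant_count / len(labels)


def overall_purity(tracks):
    """Detection-weighted purity over a collection of tracks."""
    total = sum(len(labels) for labels in tracks)
    if total == 0:
        raise ValueError("no detections in any track")
    dominant = sum(Counter(labels).most_common(1)[0][1]
                   for labels in tracks if labels)
    return dominant / total


if __name__ == "__main__":
    # Toy example with made-up identities "A" and "B".
    tracks = [["A", "A", "A", "B"], ["B", "B"]]
    print([track_purity(t) for t in tracks])
    print(overall_purity(tracks))
```

A track that mixes two people scores below 1.0, so a low value points to identity switches inside the tracker's output. Weighting by detection count keeps short tracks from dominating the overall figure; an unweighted mean over tracks is an equally plausible choice.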
To obtain the data, you are asked to supply your name and email address. You will receive download instructions at that address. We may store the information you supply in order to contact you later about benchmark-related matters. It will not be used in any other way.
To download the Hannah dataset, please send an email to email@example.com.
This work is supported by the AXES EU project.
CITING THE HANNAH DATASET
A. Ozerov, J.-R. Vigouroux, L. Chevallier and P. Pérez. On evaluating face tracks in movies. In Proc. Int. Conf. Image Proc. (ICIP), 2013.