The delivered package contains the development and test sets for the MediaEval 2016 Predicting Media Interestingness Task and for the MediaEval 2017 Predicting Media Interestingness Task. For 2016, it is composed of:

  • Shots and key-frames from a set of 78 Hollywood-like movie trailers of different genres
  • The corresponding ground truth
  • Additional low-level and mid-level features

For 2017, it is composed of:

  • Shots and key-frames from a set of 103 Hollywood-like movie trailers of different genres and 4 continuous extracts of about 15 minutes from full-length movies
  • The corresponding ground truth
  • Additional low-level and mid-level features

All or part of the content is distributed under CC license. The researcher commits to use the content in line with its provisions. Should any provision of the CC license and this license be irreconcilable, the CC license will prevail. The content contains the relevant credits, in accordance with the CC license. The researcher won’t, in any circumstance, delete or alter such credits. The researcher will find such license here

 

Data

The data consists of:

  • the movie shots (obtained after the manual segmentation of the trailers or excerpts). Video shots are provided as individual mp4 files, whose names follow the format:

shotstartingframe-shotendingframe.mp4.

  • collections of key-frames extracted from the previous video shots (one key-frame per shot). The extracted key-frame corresponds to the frame in the middle of each video shot. Its naming format follows:

frameNb_shotstartingframe-shotendingframe.jpg.

  • the corresponding movie titles (or at least the name of the video as it appears on the internet).
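The two naming formats above can be parsed with a short helper. The following is a minimal Python sketch, assuming that the frame numbers in the file names are plain decimal integers; the function name parse_name and the sample file names are hypothetical and not part of the released package.

```python
import re
from typing import NamedTuple, Optional


class ShotName(NamedTuple):
    start_frame: int
    end_frame: int
    key_frame: Optional[int]  # None for .mp4 shot files


# Assumed patterns: "<start>-<end>.mp4" for shots and
# "<frameNb>_<start>-<end>.jpg" for key-frames.
_SHOT_RE = re.compile(r"^(\d+)-(\d+)\.mp4$")
_KEYFRAME_RE = re.compile(r"^(\d+)_(\d+)-(\d+)\.jpg$")


def parse_name(filename: str) -> ShotName:
    """Extract the frame numbers encoded in a shot or key-frame file name."""
    match = _SHOT_RE.match(filename)
    if match:
        return ShotName(int(match.group(1)), int(match.group(2)), None)
    match = _KEYFRAME_RE.match(filename)
    if match:
        return ShotName(int(match.group(2)), int(match.group(3)), int(match.group(1)))
    raise ValueError(f"unrecognized file name: {filename!r}")


# Hypothetical file names, for illustration only.
shot = parse_name("100-149.mp4")
key_frame = parse_name("124_100-149.jpg")
```

Since a key-frame name repeats the start and end frames of its shot, the two file types can be matched on the (start_frame, end_frame) pair.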

 

Ground-Truth

For all these trailers and excerpts, the ground truth consists of binary annotations of each shot and key-frame as interesting or non-interesting, according to the following use case:

The use case scenario of the task derives from a practical use case at InterDigital, which involves helping professionals illustrate a Video on Demand (VOD) web site by selecting interesting frames and/or video excerpts for the movies. The frames and excerpts should help a user decide whether he/she is interested in watching a movie.

Ground truth is provided in two separate text files, one for the shots and one for the key-frames. All data was manually annotated for interestingness by human assessors using a pair-wise comparison protocol [1]. Annotators were shown one pair of images or video shots at a time and asked to tag which of the two they found more interesting. The process was repeated over the whole dataset. To avoid an exhaustive comparison of all possible pairs, a boosting selection method was employed (namely, the adaptive square design method [2]). The resulting annotations were then aggregated with a Bradley-Terry-Luce (BTL) model [1], which yields the final interestingness degree of each image or video shot. The final binary decisions were obtained by applying an empirical threshold to the rankings.
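To illustrate how pair-wise votes can be turned into per-item scores, here is a minimal Python sketch of a Bradley-Terry fit using the standard iterative (minorization-maximization) update. It is not the organizers' implementation: the function name bradley_terry, the iteration count and the toy votes are assumptions made for illustration, and the adaptive square design used to choose the pairs is not reproduced.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def bradley_terry(comparisons: Iterable[Tuple[str, str]],
                  iterations: int = 200) -> Dict[str, float]:
    """Estimate Bradley-Terry scores from (winner, loser) pairs.

    Scores are normalized to sum to 1; a higher score means the item
    was preferred more often, given the strength of its opponents.
    """
    wins = defaultdict(int)         # number of wins per item
    pair_counts = defaultdict(int)  # number of comparisons per unordered pair
    items = set()
    for winner, loser in comparisons:
        if winner == loser:
            continue  # a self-comparison carries no information
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        items.update((winner, loser))

    scores = {item: 1.0 / len(items) for item in items}
    for _ in range(iterations):
        updated = {}
        for item in items:
            denom = 0.0
            for pair, count in pair_counts.items():
                if item in pair:
                    (other,) = pair - {item}
                    denom += count / (scores[item] + scores[other])
            updated[item] = wins[item] / denom if denom > 0 else 0.0
        total = sum(updated.values())
        scores = {item: value / total for item, value in updated.items()}
    return scores


# Toy votes: shot "A" is preferred over shot "B" in 3 of 4 comparisons.
votes = [("A", "B")] * 3 + [("B", "A")]
scores = bradley_terry(votes)
```

Sorting the items by score gives a ranking, to which a threshold could then be applied to obtain binary labels.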

Important note: Because the adaptive square design method [2] was used to annotate the data, the number of annotated shots (and consequently key-frames) in each video is the largest perfect square t = s^2 that does not exceed the total number of shots in that video. E.g., for a video with 55 shots in total, only 49 = 7^2 shots (and the corresponding key-frames) were annotated. This explains the discrepancy one may notice between the number of shots and key-frames provided in the data and the number listed in the annotation files. The same discrepancy appears in the provided features, which were computed on all the data without taking into account the limit imposed by the annotation process. Please use the annotation files as the reference for the number of shots and key-frames.
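The arithmetic in the note can be checked with a few lines of Python. This is a small sketch; the function name annotated_count is hypothetical.

```python
import math


def annotated_count(total_shots: int) -> int:
    """Largest perfect square t = s^2 that does not exceed total_shots."""
    side = math.isqrt(total_shots)  # integer square root, rounded down
    return side * side


# Example from the note: a video with 55 shots has 49 = 7^2 annotated shots.
count = annotated_count(55)
```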

In both cases, the ground-truth text files use the following comma-separated format:

  • Shots: one line per shot:

videoname,shotname,[classification decision: 1(interesting) or 0(not interesting)],[interestingness level],[shot rank in video]

  • Keyframes: one line per key-frame:

videoname,key-framename,[classification decision: 1(interesting) or 0(not interesting)],[interestingness level],[key-frame rank in movie]
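A ground-truth file in this format can be read with Python's csv module. The sketch below assumes that the files have no header line, that the interestingness level is a real number and that the rank is an integer; the names Annotation and read_annotations, as well as the sample rows, are hypothetical.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple


class Annotation(NamedTuple):
    video: str
    item: str          # shot or key-frame file name
    interesting: bool  # classification decision (1 or 0 in the file)
    level: float       # interestingness level
    rank: int          # rank of the item within its video


def read_annotations(lines: Iterable[str]) -> Iterator[Annotation]:
    """Yield one Annotation per non-empty comma-separated line."""
    for row in csv.reader(lines):
        if not row:
            continue
        video, item, decision, level, rank = (field.strip() for field in row)
        yield Annotation(video, item, decision == "1", float(level), int(rank))


# Hypothetical rows, for illustration only.
sample = io.StringIO(
    "video_0,100-149.mp4,1,0.83,1\n"
    "video_0,150-200.mp4,0,0.12,2\n"
)
rows = list(read_annotations(sample))
```

With a real file, the same function can be called on an open file handle, e.g. read_annotations(open(path, newline="")).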

 

Features

Low-level features are also provided together with the data. We would like to thank our colleagues Yu-Gang Jiang and Baohan Xu from Fudan University, China, for making these features available for the task:

  • Dense SIFT features are computed following the original work in [3], except that the local frame patches are densely sampled instead of using interest point detectors. A codebook of 300 codewords is used in the quantization process with a spatial pyramid of three layers [4];
  • HOG descriptors [5] are computed over densely sampled patches. Following [6], HOG descriptors in a 2x2 neighborhood are concatenated to form a descriptor of higher dimension;
  • LBP (Local Binary Patterns) [7];
  • GIST is computed based on the output energy of several Gabor-like filters (8 orientations and 4 scales) over a dense frame grid, as in [8];

(All the aforementioned visual features are extracted using the code from the authors of [6]).

  • Color Histogram in HSV space;
  • MFCCs are computed over 32 ms time windows with 50% overlap. The cepstral vectors are concatenated with their first and second derivatives;
  • fc7 layer (4096 dimensions) and prob layer (1000 dimensions) of AlexNet.

These features are provided in Matlab file format (.mat) and can therefore be loaded in Matlab with the load function, e.g., "load filename.mat". For more information about these features and how they are organized, please refer to the README.txt file in the released package.

 

Please cite the following paper in your publication if you happen to use the above features:

[10] Yu-Gang Jiang, Qi Dai, Tao Mei, Yong Rui, Shih-Fu Chang. Super Fast Event Recognition in Internet Videos. IEEE Transactions on Multimedia, vol. 17, issue 8, pp. 1-13, 2015.

Additionally, some features at video level are also provided:

  • C3D features, extracted from fc6 layer (4096 dimensions) and averaged on a segment level.

 

Mid-Level Features

In addition to the low-level features, mid-level features related to face detection and tracking are also provided. These features were kindly computed by the organizers of the Multimodal Person Discovery in Broadcast TV task:

  • Face-related features. Face tracking-by-detection is applied within each shot using a detector based on histogram of oriented gradients [5] and the correlation tracker proposed in [9]. Format is the following:

* time identifier left top right bottom

* identifier : face track identifier

* time : in seconds

* left : bounding box left boundary (image width ratio)

* top : bounding box top boundary (image height ratio)

* right : bounding box right boundary (image width ratio)

* bottom : bounding box bottom boundary (image height ratio)
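Because the bounding boxes are stored as ratios of the image size, they must be scaled by the frame resolution before they can be drawn or cropped. The following Python sketch assumes that the six fields are separated by whitespace and that the track identifier can be kept as a string; the names FaceBox, parse_face_line and to_pixels, and the sample line, are hypothetical.

```python
from typing import NamedTuple, Tuple


class FaceBox(NamedTuple):
    time: float      # in seconds
    track_id: str    # face track identifier
    left: float      # ratio of image width
    top: float       # ratio of image height
    right: float     # ratio of image width
    bottom: float    # ratio of image height


def parse_face_line(line: str) -> FaceBox:
    """Parse one 'time identifier left top right bottom' line."""
    time, track_id, left, top, right, bottom = line.split()
    return FaceBox(float(time), track_id,
                   float(left), float(top), float(right), float(bottom))


def to_pixels(box: FaceBox, width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert the ratio-based box to (left, top, right, bottom) in pixels."""
    return (round(box.left * width), round(box.top * height),
            round(box.right * width), round(box.bottom * height))


# Hypothetical line and frame size, for illustration only.
box = parse_face_line("1.24 3 0.25 0.125 0.5 0.75")
pixels = to_pixels(box, 1280, 720)
```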

 

References

[1] R. A. Bradley and M. E. Terry, Rank analysis of incomplete block designs: the method of paired comparisons. Biometrika 39(3-4):324–345, 1952.

[2] J. Li, M. Barkowsky and P. Le Callet, Boosting Paired Comparison Methodology in Measuring Visual Discomfort for 3DTV: Performances of three different Designs. SPIE Electronic Imaging, Stereoscopic Displays and Applications, Human Factors, 8648, p. 1-12, 2013.

[3] D. Lowe, Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60:91–110, 2004.

[4] S. Lazebnik, C. Schmid and J. Ponce, Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, 2006.

[5] N. Dalal and B. Triggs, Histograms of oriented gradients for human detection. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, 2005.

[6] J. Xiao, J. Hays, K. Ehinger, A. Oliva and A. Torralba, SUN database: Large-scale scene recognition from abbey to zoo. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, 2010.

[7] T. Ojala, M. Pietikainen and T. Maenpaa, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7):971–987, 2002.

[8] A. Oliva and A. Torralba, Modeling the shape of the scene: a holistic representation of the spatial envelope. International Journal of Computer Vision 42:145–175, 2001.

[9] M. Danelljan et al., Accurate scale estimation for robust visual tracking. In Proc. of the British Machine Vision Conference (BMVC), 2014.

About

The Interestingness Dataset is a collection of movie excerpts and key-frames, together with ground-truth files that classify each sample as interesting or non-interesting. It is intended for assessing the quality of methods that predict the interestingness of multimedia content.

The data was produced by the organizers of the MediaEval 2016 and MediaEval 2017 Predicting Media Interestingness Tasks and was used in the context of this benchmark. A detailed description of the benchmark can be found on our Data Description page. The license conditions are given on the Download page.

 

Acknowledgements

We would like to thank the MediaEval benchmark and its organizers for their support in the creation of this dataset. We would also like to thank the different co-organizers and advisors for the task:

and of course our annotators.

 

CITING THE INTERESTINGNESS DATASET

If you make use of the Interestingness Dataset, or refer to its results, please use the following citations:

C.-H. Demarty, M. Sjöberg, B. Ionescu, T.-T. Do, M. Gygli and N. Q. Duong. MediaEval 2017 Predicting Media Interestingness Task. In Proc. of the MediaEval 2017 Workshop, Dublin, Ireland, September 13-15, 2017.

@InProceedings{Demarty2017,

Title = {MediaEval 2017 Predicting Media Interestingness Task},

Author = {Claire-H\'{e}l\`{e}ne Demarty and Mats Sj\"{o}berg and Bogdan Ionescu and Thanh-Toan Do and Michael Gygli and Ngoc Q.K. Duong},

Booktitle = {Proc. of the MediaEval 2017 Workshop, Dublin, Ireland, Sept. 13-15},

Year = {2017},

 }

 

C.-H. Demarty, M. Sjöberg, B. Ionescu, T.-T. Do, H. Wang, N. Q. Duong, and F. Lefebvre. MediaEval 2016 Predicting Media Interestingness Task. In Proc. of the MediaEval 2016 Workshop, Hilversum, Netherlands, Oct. 20-21, 2016.

@InProceedings{Demarty2016,

Title = {MediaEval 2016 Predicting Media Interestingness Task},

Author = {Claire-H\'{e}l\`{e}ne Demarty and Mats Sj\"{o}berg and Bogdan Ionescu and Thanh-Toan Do and Hanli Wang and Ngoc Q.K. Duong and Fr\'{e}d\'{e}rique Lefebvre},

Booktitle = {Proc. of the MediaEval 2016 Workshop, Hilversum, Netherlands, Oct. 20-21},

Year = {2016},

 }

Download

In order to get the data, you are asked to supply your name and email address. You will receive instructions on how to download the dataset via this email address. We may store the data you supplied in order to contact you later about benchmark-related matters. The data will not be used in any other way.

To download the Interestingness dataset please send an email to interestmanagement@interdigital.com. By doing this you irrevocably agree to any and all provisions of the license agreement on this page.

 

Terms of use

Interestingness Dataset Release Agreement

The scene selection you are about to download, in case you agree with these Terms of use, may not be suitable for children.

The goal of the Interestingness Dataset is to develop new techniques, technology, and algorithms for the automatic prediction of interestingness levels.

The Interestingness dataset consists of:

  • Shots and key-frames extracted from trailers under Creative Commons License.
  • The annotations of these shots and key-frames
  • Additional low-level features computed on the material

InterDigital, the University Politehnica of Bucharest, the University of Helsinki, the Tongji University, the Singapore University of Technology and Design, Singapore and the University of Science, Vietnam share the copyright and all rights of authorship on the data. InterDigital is the principal distributor of the Interestingness dataset. 
 

Release of the dataset

To advance the state of the art in predicting media interestingness, the Interestingness Dataset is made available to the research community for scientific research only. All other uses of the Interestingness Dataset will be considered on a case-by-case basis. To receive a copy of the Interestingness Dataset, the requestor must agree to observe all of these Terms of use.
 

Consent

The researcher(s) agrees to the following restrictions on the Interestingness Dataset:

  1. Redistribution: Without prior written approval from InterDigital, the Interestingness Dataset, in whole or in part, shall not be further distributed, published, copied, or disseminated in any way or form whatsoever, whether for profit or not. For the avoidance of any doubt, this prohibition includes further distributing, copying or disseminating to a different facility or organizational unit in the requesting university, organization, or company.
  2. Modification and Non Commercial Use: Without prior written approval from InterDigital, the Interestingness Dataset, in whole or in part, may not be modified or used for commercial purposes. Modification is allowed for scientific research purposes only. It would be highly appreciated if the modified Interestingness Dataset was shared with InterDigital, at this address: interestmanagement@interdigital.com.
    For the avoidance of doubt, commercial purposes include but are not limited to:
    • Development of commercial systems,
    • proving the efficiency of commercial systems,
    • training or testing of commercial systems,
    • using screenshots of data from the database in advertisements,
    • selling data from the database
  3. Publication Requirements: In no case should the still frames or videos be used in any way that could directly or indirectly harm InterDigital, the University Politehnica of Bucharest, the University of Helsinki, the Tongji University, the Singapore University of Technology and Design, Singapore and the University of Science, Vietnam. InterDigital, the University Politehnica of Bucharest, the University of Helsinki, the Tongji University, the Singapore University of Technology and Design, Singapore and the University of Science, Vietnam permit publication (paper or web-based) of the data for scientific purposes only. Any other publication without scientific and academic value is strictly prohibited.
  4. Citations/References: All documents and papers that report on research that uses the Interestingness Dataset must acknowledge the use of the dataset by including appropriate citations to the following:
    C.-H. Demarty, M. Sjöberg, B. Ionescu, T.-T. Do, H. Wang, N. Q. Duong, and F. Lefebvre. MediaEval 2016 Predicting Media Interestingness Task. In Proc. of the MediaEval 2016 Workshop, Hilversum, Netherlands, Oct. 20-21, 2016.
    C.-H. Demarty, M. Sjöberg, B. Ionescu, T.-T. Do, M. Gygli and N. Q. Duong. MediaEval 2017 Predicting Media Interestingness Task. In Proc. of the MediaEval 2017 Workshop, Dublin, Ireland, Sept. 13-15, 2017.
  5. No Warranty: THE PROVIDER OF THE DATA MAKES NO REPRESENTATIONS AND EXTENDS NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED. THERE ARE NO EXPRESS OR IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, OR THAT THE USE OF THE MATERIAL WILL NOT INFRINGE ANY PATENT, COPYRIGHT, TRADEMARK, OR OTHER PROPRIETARY RIGHTS.

The Principal Investigators can be contacted via email.