Video Memorability Dataset



The VideoMem (Video Memorability) Database is a collection of soundless video excerpts and their corresponding ground-truth memorability files. The memorability scores are computed from measured short-term and long-term memory performance: participants had to recognize short video excerpts a few minutes after viewing them (short-term case) and 24 to 72 hours later (long-term case). The database is accompanied by video features extracted from the excerpts.

It is intended to be used for understanding the memorability of videos and for assessing the quality of methods for predicting the memorability of multimedia content. A detailed description of the dataset can be found on our Data Description page and in the article presenting the dataset (see below for citation). The license conditions are mentioned on the Download page.



All documents and papers that report on research that uses the VideoMem Dataset must acknowledge the use of the dataset by including an appropriate citation to the following:

R. Cohendet, C.-H. Demarty, N. Q. Duong and M. Engilberge. VideoMem: Constructing, Analyzing, Predicting Short-term and Long-term Video Memorability. 2018. arXiv:1812.01973.


Title = {VideoMem: Constructing, Analyzing, Predicting Short-term and Long-term Video Memorability},

Author = {Romain Cohendet and Claire-H\'{e}l\`{e}ne Demarty and Ngoc Q.K. Duong and Martin Engilberge},

Booktitle = {arxiv:1812.01973},

Year = {2018},



It would be highly appreciated if this use was shared with InterDigital, at this address:, with “VideoMem” in the subject of your email.


The delivered package contains the dataset used in [1] and in the MediaEval 2018 Predicting Media Memorability Task [2]. It was intended for the tasks of video memorability understanding and prediction. It is composed of:

  • 10,000 short soundless video excerpts, split into a development set of 8,000 excerpts and a test set of 2,000 excerpts;
  • The corresponding ground truth for the 8,000 video excerpts of the development set;
  • Extracted video features on the 10,000 video excerpts.

You are informed that the original video excerpts remain the property of their legitimate owners and that no license is granted on these excerpts. They are provided and may be used exclusively under article L.122-5 3° a) of the French Code of intellectual property or, where applicable, under the “fair use” doctrine or its equivalent.


Videos were extracted from raw footage used by professionals when creating content. They are varied and contain different scene types (e.g., animal, food and beverages, nature, people, transportation). Videos are released in .webm format, with a bitrate of 3,000 kbps (the same format was used during the annotation process). They are provided as individual files named videoNb.webm.

Textual metadata

Each video comes with its original title, which can be found in the file dev-set_video-captions.txt in the package. These titles can often be read as a list of tags (textual metadata).

Ground truth

The corresponding ground truth for the development set can be found in file ground-truth_dev-set.csv.

It contains one line per video, which consists of:

  • the video's filename;
  • its short-term memorability score;
  • the number of annotations that were used to calculate the short-term memorability score;
  • its long-term memorability score;
  • the number of annotations that were used to calculate the long-term memorability score.
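As a minimal sketch of reading this file, the following Python snippet parses rows in the column order listed above. The field names, the comma delimiter, and the presence of a header row are assumptions rather than documented properties of ground-truth_dev-set.csv; the sample rows are made up for illustration and should be replaced by the real file after inspecting it.

```python
import csv
import io
from typing import Iterable, List, NamedTuple


class MemorabilityRecord(NamedTuple):
    # Field names are illustrative; only the column order comes from the description.
    video: str
    short_term: float
    nb_short_term_annotations: int
    long_term: float
    nb_long_term_annotations: int


def read_ground_truth(lines: Iterable[str], has_header: bool = True) -> List[MemorabilityRecord]:
    """Parse ground-truth rows. `has_header` is an assumption to check on the real file."""
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the header row, if the file has one
    records = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        video, st, nb_st, lt, nb_lt = (cell.strip() for cell in row[:5])
        records.append(
            MemorabilityRecord(
                video,
                float(st),
                int(float(nb_st)),  # tolerate counts written as "30" or "30.0"
                float(lt),
                int(float(nb_lt)),
            )
        )
    return records


# Made-up rows for illustration only (not real dataset values).
sample = io.StringIO(
    "video,short-term,nb_short,long-term,nb_long\n"
    "video3.webm,0.90,30,0.70,12\n"
    "video4.webm,0.85,34,0.60,10\n"
)
records = read_ground_truth(sample)
```

With the real package, the same function can be given an open file handle, for example `open("ground-truth_dev-set.csv", newline="")`, in place of the in-memory sample.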


A set of pre-extracted visual features is also provided.

Precomputed features are organized in different folders, one per feature. Most of the time the chosen file format is plain text. We specify the internal feature format for each feature file below.

Video specialized features

Each of the following two features is provided as one file per video.

  • C3D features [3]
    • outputs: the final classification layer of the C3D model
    • file format: text file
    • feature: a single list of numbers on one line (dimension = 101)
  • HMP [4]
    • outputs: the histogram of motion patterns for each video
    • file format: text file
    • feature: a single list of pairs of numbers with the format: bin: number (dimension = 6075) on one line
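The two text layouts described above (a plain list of numbers, and a list of bin: number pairs) could be read along the following lines. This is a sketch under assumptions: the separator between values (spaces or commas) and any enclosing brackets are not specified in the description, so the parsers accept both, and the function names are hypothetical. The pairs are returned as a sparse mapping so that no assumption is needed about whether bin indices start at 0 or 1.

```python
import re
from typing import Dict, List


def parse_number_list(line: str) -> List[float]:
    """Parse a one-line list of numbers, such as a C3D feature file."""
    # Drop optional surrounding brackets, then split on commas and/or whitespace.
    tokens = re.split(r"[,\s]+", line.strip().strip("[]"))
    return [float(tok) for tok in tokens if tok]


def parse_bin_pairs(line: str) -> Dict[int, float]:
    """Parse 'bin:number' pairs (such as an HMP file) into a sparse {bin: value} mapping."""
    pairs = re.findall(r"(\d+)\s*:\s*([^\s,\]\}]+)", line)
    return {int(b): float(v) for b, v in pairs}


# Made-up inputs for illustration; a real C3D file should yield 101 values,
# and HMP bins range over 6075 possible patterns.
c3d = parse_number_list("0.01 0.02 0.97\n")
hmp = parse_bin_pairs("12:3 40:1.5")
```

Checking `len(c3d) == 101` on a real file is a quick way to confirm that the assumed separator matches the actual format.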

Image features on video

The following features were extracted from three key-frames of each video: the first frame (0), the frame at one-third (56), and the frame at two-thirds (112). There are therefore three files per video, named videoNb-0.txt, videoNb-56.txt, and videoNb-112.txt.

  • HoG [5]
    • outputs: histograms of oriented gradients. Gradients are calculated on 32x32 windows on a greyscale image.
    • file format: text file
    • feature: a single list of numbers on one line (dimension = depends on the image size)
  • LBP [6]
    • outputs: Local Binary Patterns. These features represent some local texture information. LBP values are calculated for patches of 8x15 pixels.
    • file format: text file
    • feature: a single list of numbers on one line (dimension = depends on the image size)
  • InceptionV3 [7]
    • outputs: output of the fc7 layer of the InceptionV3 deep network.
    • file format: text file
    • feature: a single list of pairs of numbers with format imagenet_class:activation (max dimension = 1,000)
  • ORB [8] (An efficient alternative to SIFT or SURF)
    • outputs: Oriented FAST and Rotated BRIEF. ORB is basically a fusion of FAST keypoint detector and BRIEF descriptor with many modifications to enhance the performance.
    • file format: pickle (file extension ".p").
    • feature: a list of keypoints and descriptors. Keypoints and their descriptors were stored using the function pickle_keypoints.
  • Color Histogram
    • outputs: classic color histogram (three channels)
    • file format: text file
    • feature: 3 lists (Red, Green, Blue in that order) of 255 pairs with format bin: number, e.g., 254:1008. One list per line.
  • Aesthetic visual features [9]. These features were extracted from one frame out of every 10 in each video and then aggregated by computing their mean and median values.
    • outputs: a collection of features used in the prediction of visual aesthetics, composed of color, texture, and object-based descriptors, aggregated at video level by median and mean methods.
    • file format: text file
    • feature: a single list of comma-separated numbers on one line (dimension = 109)
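To illustrate the per-key-frame naming and the Color Histogram layout described above, here is a small Python sketch. The folder name `ColorHistogram`, the function names, the reading of videoNb as "video" followed by a number, and the assumption that histogram counts are integers (as in the example 254:1008) are all hypothetical and should be checked against the actual package.

```python
import re
from pathlib import Path
from typing import Dict, List

KEYFRAMES = (0, 56, 112)  # key-frame indices given in the description


def keyframe_feature_paths(feature_dir: str, video_nb: int) -> List[Path]:
    """Build the three per-key-frame file paths for one video (videoNb-<frame>.txt)."""
    return [Path(feature_dir) / f"video{video_nb}-{frame}.txt" for frame in KEYFRAMES]


def parse_color_histogram(text: str) -> Dict[str, Dict[int, int]]:
    """Parse three lines (Red, Green, Blue, in that order) of 'bin:number' pairs."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) != 3:
        raise ValueError(f"expected 3 channel lines, got {len(lines)}")
    histogram = {}
    for channel, line in zip(("red", "green", "blue"), lines):
        pairs = re.findall(r"(\d+)\s*:\s*(\d+)", line)
        histogram[channel] = {int(b): int(count) for b, count in pairs}
    return histogram


# Hypothetical folder name and made-up histogram content, for illustration only.
paths = keyframe_feature_paths("ColorHistogram", 10)
hist = parse_color_histogram("0:5 254:1008\n0:7 1:2\n3:9\n")
```

Each path returned by `keyframe_feature_paths` can then be read with `Path.read_text()` and passed to the parser matching that feature's format.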

We would like to thank Ricardo Manhaes Savii (Federal University of São Paulo) for extracting most of the features. His code to extract the features is available here. We also want to thank Mihai Gabriel Constantin for extracting the aesthetic features.


[1] Cohendet, R., Demarty, C.-H., Duong, N. Q. and Engilberge, M. VideoMem: Constructing, Analyzing, Predicting Short-term and Long-term Video Memorability. 2018. arXiv:1812.01973.

[2] Cohendet, R., Demarty, C.-H., Duong, N. Q., Sjöberg, M., Ionescu, B. and Do, T.-T. MediaEval 2018: Predicting Media Memorability. Proceedings of the MediaEval Workshop, Sophia Antipolis, France, 2018.

[3] Tran, D., Bourdev, L., Fergus, R., Torresani, L. and Paluri, M. (2015). Learning Spatiotemporal Features with 3D Convolutional Networks. In Computer Vision (ICCV), 2015 IEEE International Conference on (pp. 4489-4497).

[4] Almeida, J., Leite, N. J., and Torres, R. D. S. (2011). Comparison of video sequences with histograms of motion patterns. In Image Processing (ICIP), 2011 18th IEEE International Conference on (pp. 3673-3676).

[5] Dalal, N., and Triggs, B. (2005). Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition (CVPR), 2005 IEEE Computer Society Conference on (pp. 886-893).

[6] He, D. C., and Wang, L. (1990). Texture unit, texture spectrum, and texture analysis. In IEEE transactions on Geoscience and Remote Sensing, 28(4), (pp. 509-512).

[7] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2818-2826).

[8] Rublee, E., Rabaud, V., Konolige, K., and Bradski, G. (2011). ORB: An efficient alternative to SIFT or SURF. In Computer Vision (ICCV), 2011 IEEE international conference on (pp. 2564-2571).

[9] Haas, A.F., Guibert, M., Foerschner, A., Co, T., Calhoun, S., George, E., Hatay, M., Dinsdale, E., Sandin, S.A., Smith, J.E., Vermeij, M.J.A., Felts, B., Dustan, P., Salamon, P., and Rohwer, F. (2015). Can we measure beauty? Computational evaluation of coral reef aesthetics. In PeerJ (pp. 1390).


To get the data, you are asked to supply your name and email address. You will receive instructions on how to download the dataset at this email address. We may store the data you supplied in order to contact you later about benchmark-related matters. The data will not be used in any other way.


To download the VideoMem dataset, please send an email to By doing this, you irrevocably agree to any and all provisions of the license agreement on this page.



This Limited Database Evaluation License Agreement (the “Agreement”) is entered into as of Your download of the database (“Effective Date”).

This Agreement governs the download and use of the Database (as defined below). Your use of the Database is subject to the terms and conditions set forth in this Agreement. By installing, using, accessing or copying the Database, you hereby irrevocably accept the terms and conditions of this Agreement. If you do not accept all or parts of the terms and conditions of this Agreement you cannot install, use, access nor copy the Database.



“Authorized Purpose” means any use of the Database for research on the Database and evaluation of the Database exclusively, and academic research using the Database without any commercial use. For the avoidance of doubt, a commercial use includes, but is not limited to:

  • development of commercial systems,
  • proving the efficiency of commercial systems,
  • training or testing of commercial systems,
  • using screenshots of data from the database in advertisements,
  • selling data from the database.

“Database” means the database which consists of:

  • A list of 10,000 soundless video excerpts extracted from raw professional footage;
  • The corresponding ground truth for 8,000 out of the 10,000 excerpts (i.e., for each excerpt, a short-term memorability score, a long-term memorability score and the number of annotations in both cases);
  • Video features that were extracted from the video excerpts.

“Limited Period” means the life of the Intellectual Property Right owned by InterDigital on the Database and Software in each and every country where such Intellectual Property Right would exist.

“Intellectual Property Rights” means all copyrights, trademarks, trade secrets, patents, mask works and other intellectual property rights recognized in any jurisdiction worldwide, including all applications and registrations with respect thereto.

“Materials” means the relevant short excerpts of videos from which the Database has been built and that are provided with the Database with the sole intent of easing the use of the Database.


InterDigital grants Licensee a free, worldwide, non-exclusive, license on copyright owned on the Database to download, use and reproduce solely for the Authorized Purpose for the Limited Period.


Restrictions on use 

Licensee shall not remove, obscure or modify any copyright, trademark or other proprietary rights notices, marks or labels contained on or within the Database, falsify or delete any author attributions, legal notices or other labels of the origin or source of the Materials.

Without prior written approval from InterDigital, the Database and/or the Materials, in whole or in part, shall not be further distributed, published, copied, or disseminated in any way or form whatsoever. For the avoidance of any doubt, this prohibition does not include further distributing, copying or disseminating to a different facility or organizational unit in the same requesting university, organization, or company.

Without prior written approval from InterDigital, the Database and/or the Materials, in whole or in part, may not be modified or used for commercial purposes. For commercial use of the dataset, a specific paying license may be negotiated, please contact us.

In no case should the Database be used in any way that could directly or indirectly harm InterDigital. InterDigital permits publication (paper or web-based) of the data for scientific purposes only. Any other publication without scientific and academic value is strictly prohibited.


Title to and ownership of the Database, the Documentation and/or any Intellectual Property Right protecting the Database shall, at all times, remain with InterDigital. Licensee agrees that except for the rights granted on copyright on the Database set forth in Section 2 above, in no event does anything in this Agreement grant, provide or convey any other rights, immunities or interest in or to any Intellectual Property Rights (including especially patents) of InterDigital or any of its Affiliates whether by implication, estoppel or otherwise.

You are informed that, for efficiency reasons, the Materials are provided by InterDigital with the Database. You are informed that the Materials remain the property of their legitimate owners and that no license is granted on such Materials. The Materials are provided and may be used exclusively under article L.122-5 3° a) of the French Code of intellectual property or, where applicable, under the “fair use” doctrine or its equivalent. Any use, redistribution or diffusion of such Materials is strictly prohibited beyond what is strictly necessary for the use of the Database in strict compliance with this license.



Any publication or oral communication regarding the Database and/or the Software shall be elaborated in good faith and shall not be driven by a deliberate will to denigrate InterDigital or any of its products. In any publication and on any support joined to an oral communication (for instance a PowerPoint document) resulting from the use of the Database, the following statement/citation shall be inserted:

The dataset was provided by InterDigital and is described in the following publication:

Cohendet, R., Demarty, C.-H., Duong N. Q. and Engilberge, M.. VideoMem: Constructing, Analyzing, Predicting Short-term and Long-term Video Memorability. 2018. Arxiv:1812.01973.

In any oral communication resulting from the use of the Database, the Licensee shall orally indicate that the Database is InterDigital's property.


No Warranty - Disclaimer



Hence, the Licensee uses the Database at his own cost, risks, and responsibility. InterDigital shall not be liable for any damage that could arise to Licensee by using the Database, either in accordance with this Agreement or not.

InterDigital shall not be liable for any consequential or indirect losses, including any indirect loss of profits, revenues, business, and/or anticipated savings, whether or not in the contemplation of the Parties at the time of entering into the Agreement unless expressly set out in the Agreement, or arising from gross negligence, willful misconduct or fraud.

Licensee agrees that it will defend, indemnify and hold harmless InterDigital and its Affiliates against any and all losses, damages, costs and expenses arising from a breach by the Licensee of any of its obligations or representations hereunder, including, without limitation, any third party, and/or any claims in connection with any such breach and/or any use of the Database, including any claim from third party arising from access, use or any other activity in relation to this Database.

The Licensee shall not make any warranty, representation, or commitment on behalf of InterDigital to any other third party. 


Term and Termination

This Agreement shall terminate at the end of the Limited Period unless earlier terminated by either party on the ground of material breach by the other party, which breach is not remedied after thirty (30) days advance written notice, specifying the breach with reasonable particularity and referencing this Agreement.


General Provisions

12.1 Severability.  If any provision of this Agreement shall be held to be in contravention of applicable law, this Agreement shall be construed as if such provision were not a part thereof, and in all other respects, the terms hereof shall remain in full force and effect.

12.2 Governing Law.  Regardless of the place of execution, delivery, performance or any other aspect of this Agreement, this Agreement and all of the rights of the parties under this Agreement shall be governed by, construed under and enforced in accordance with the substantive law of France without regard to conflicts of law principles. In case of a dispute that could not be settled amicably, the courts of Nanterre shall be exclusively competent.

12.3 Assignment. InterDigital may assign this license to any third party. Such an assignment will be announced on the website as defined in article 5. Licensee may not assign this agreement to any third party without the previous written agreement from InterDigital.

The Principal Investigators can be contacted via