LaFin Dataset

  • About
  • Description
  • Download

About

The LaFin: Large-scale Flickr interestingness dataset (hereafter “the Dataset”) is a collection of Flickr image IDs corresponding to about 123k Flickr images, equally balanced between interesting and non-interesting images, together with their metadata. In addition to the images, their binary labels, and the associated metadata, some precomputed features are provided: CNN features, semantic features derived from image captioning, and Word2Vec representations of Flickr tags.

It is intended to be used for analyzing socially-driven image interestingness and building a machine learning model for prediction. A detailed description of the dataset can be found on our Data Description page and in the article presenting the dataset (see below for citation). The license conditions are mentioned on the Download page.  

CITING THE LAFIN DATASET

All documents and papers that report on research that uses the LAFIN Image Interestingness Dataset must acknowledge the use of the dataset by including an appropriate citation to the following:

E. Berson, N. Q. K. Duong and C.-H. Demarty. Collecting, Analyzing and Predicting Socially-Driven Image Interestingness. In Proceedings of the EUSIPCO Conference, Spain, 2019.

@InProceedings{Berson2019,
  Title = {Collecting, Analyzing and Predicting Socially-Driven Image Interestingness},
  Author = {Elois Berson and Ngoc Q. K. Duong and Claire-H\'{e}l\`{e}ne Demarty},
  Booktitle = {Proc. of the EUSIPCO Conference, Spain},
  Year = {2019},
}

It would be highly appreciated if this use was shared with InterDigital, at this address: lafinmanagement@interdigital.com.

Description

The delivered package contains the dataset used in [1]. It was intended for the tasks of socially-driven image interestingness understanding and prediction. It is composed of:

  • A list of Flickr image IDs corresponding to 123k Flickr images with associated metadata
  • The binary interestingness labels
  • Extracted image and text features that were used in [1].

List of images and labels

The complete list of Flickr image IDs and their corresponding labels is provided in the file IDs_list.txt.
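
The layout of IDs_list.txt is not specified on this page. As a minimal sketch, assuming each non-empty line holds an image ID followed by a 0/1 label separated by whitespace (an assumption to check against the delivered file), the list could be parsed and its class balance checked roughly as follows. The function names `parse_id_list` and `label_counts` and the sample lines are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def parse_id_list(lines: Iterable[str]) -> Dict[str, int]:
    """Map each Flickr image ID to its binary label.

    ASSUMPTION: one "<image_id> <label>" pair per line, whitespace-separated,
    with label 1 = interesting and 0 = non-interesting. The real layout of
    IDs_list.txt may differ.
    """
    labels = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        image_id, label = line.split()[:2]
        labels[image_id] = int(label)
    return labels


def label_counts(labels: Dict[str, int]) -> Counter:
    """Count how many images carry each label."""
    return Counter(labels.values())


# Made-up sample lines, for illustration only (not real dataset entries).
sample = ["1001 1", "1002 0", "", "1003 1", "1004 0"]
labels = parse_id_list(sample)
counts = label_counts(labels)
```

If the real file follows the assumed layout, passing an open file handle for IDs_list.txt to `parse_id_list` in place of `sample` should work the same way, and the two counts should come out roughly equal for a balanced dataset.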

Features

The set of image and text features used to train the model presented in [1] is provided together with the data:

  • CNN features extracted from each image;
  • Image-captioning based features (IC) extracted for each image;
  • Word2Vec features extracted from the Flickr tags associated with each image.  

CNN features

These features are extracted from the VGG16 network [2] pre-trained on the ImageNet dataset. For each image, a CNN feature is extracted from the last fully-connected layer before the softmax and has a dimension of 4096.

Image-captioning based features (IC)

These features are computed by an image captioning (IC) system as described in [2], which has an encoder comprising a CNN and a long short-term memory recurrent network (LSTM) for learning a joint image-text embedding. The projected CNN feature for each image is extracted and has a dimension of 1024.

Word2Vec features from Flickr tags

A 300-dimensional word embedding vector [3] is computed for each tag associated with an image. These Word2Vec vectors are then averaged to obtain a single feature vector per image.
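
The averaging step can be illustrated with a minimal sketch. It assumes the per-tag vectors are available as a plain dictionary from tag to vector; the function name `average_tag_vectors`, the handling of tags without a vector, and the toy 3-dimensional values are assumptions made for illustration, not a description of how the released features were produced.

```python
from typing import Dict, List, Optional, Sequence


def average_tag_vectors(
    tags: Sequence[str],
    embeddings: Dict[str, List[float]],
) -> Optional[List[float]]:
    """Average the embedding vectors of an image's tags, component by component.

    ASSUMPTION: tags missing from `embeddings` are skipped, and None is
    returned when no tag has a vector. The source does not say how such
    cases are handled in the dataset.
    """
    vectors = [embeddings[t] for t in tags if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Toy 3-dimensional vectors for illustration; the dataset uses 300 dimensions.
toy_embeddings = {
    "beach": [1.0, 0.0, 2.0],
    "sunset": [3.0, 2.0, 0.0],
}
image_feature = average_tag_vectors(
    ["beach", "sunset", "unknown_tag"], toy_embeddings
)
```

Averaging gives every image a vector of the same length regardless of how many tags it has, which makes the result usable as a fixed-size input to a classifier.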

References

[1] E. Berson, N. Q. K. Duong, and C.-H. Demarty, “Collecting, Analyzing and Predicting Socially-Driven Image Interestingness,” in Proceedings of the EUSIPCO Conference, Spain, 2019.

[2] R. Kiros, R. Salakhutdinov, and R. Zemel, “Unifying visual-semantic embeddings with multimodal neural language models,” CoRR, vol. abs/1411.2539, 2014. [Online]. Available: http://arxiv.org/abs/1411.2539

[3] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” in Proceedings of Workshop at ICLR, 2013.

Download

To get the data (Flickr image IDs, metadata, extracted features, and the interestingness labels), you are asked to supply your name and email address. You will receive instructions on how to download the dataset at this email address. We may store the data you supplied in order to contact you later about benchmark-related matters. The data will not be used in any other way.

To download the LAFIN dataset, please send an email to lafinmanagement@interdigital.com. By doing this you irrevocably agree to any and all provisions of the license agreement on this page.

LIMITED DATABASE AND SOFTWARE EVALUATION LICENSE AGREEMENT

This Limited Database License Agreement (the “Agreement”) is entered into as of Your download of the database (“Effective Date”).

The following limited Database license agreement (“the Agreement”) constitutes an agreement between you (the “licensee”) and InterDigital R&D France, a French company existing and organized under the laws of France with its registered offices located at 975 Avenue des champs blancs 35510 Cesson-Sévigné, FRANCE (hereinafter “InterDigital”).

This Agreement governs the download and use of the Database (as defined below). Your use of the Database is subject to the terms and conditions set forth in this Agreement. By installing, using, accessing or copying the Database, you hereby irrevocably accept the terms and conditions of this Agreement. If you do not accept all or parts of the terms and conditions of this Agreement you cannot install, use, access or copy the Database.

Definitions

“Authorized Purpose” means any use of the Database for research on the Database and evaluation of the Database exclusively, and academic research using the Database without any commercial use. For the avoidance of doubt, a commercial use includes, but is not limited to:

  • development of commercial systems,
  • proving the efficiency of commercial systems,
  • training or testing of commercial systems,
  • using screenshots of data from the database in advertisements,
  • selling data from the database.

“Database” means the database which consists of:

  • A list of Flickr image IDs corresponding to about 123k Flickr images with associated metadata;
  • The binary interestingness labels;
  • Extracted image and text features.

“Limited Period” means the life of the Intellectual Property Rights owned by InterDigital on the Database in each and every country where such Intellectual Property Rights would exist.

“Intellectual Property Rights” means all copyrights, trademarks, trade secrets, patents, mask works and other intellectual property rights recognized in any jurisdiction worldwide, including all applications and registrations with respect thereto.

“Materials” means the relevant Flickr image IDs and textual metadata from which the Database has been built and that are provided with the Database with the sole intent of easing the use of the Database.

License

InterDigital grants Licensee a free, worldwide, non-exclusive, license on copyright owned on the Database to download, use and reproduce solely for the Authorized Purpose for the Limited Period.  

Restrictions on use 

Licensee shall not remove, obscure or modify any copyright, trademark or other proprietary rights notices, marks or labels contained on or within the Database, falsify or delete any author attributions, legal notices or other labels of the origin or source of the Materials.

Without prior written approval from InterDigital, the Database in whole or in part, shall not be further distributed, published, copied, or disseminated in any way or form whatsoever. For the avoidance of any doubt, this prohibition does not include further distributing, copying or disseminating to a different facility or organizational unit in the same requesting university, organization, or company.

Without prior written approval from InterDigital, the Database in whole or in part, may not be modified or used for commercial purposes.

In no case should the Database be used in any way that could directly or indirectly harm InterDigital. InterDigital permits publication (paper or web-based) of the data for scientific purposes only. Any other publication without scientific and academic value is strictly prohibited.

Ownership

Title to and ownership of the Database, the Documentation and/or any Intellectual Property Right protecting the Database shall, at all times, remain with InterDigital. Licensee agrees that except for the rights granted on copyright on the Database set forth in Section 2 above, in no event does anything in this Agreement grant, provide or convey any other rights, immunities or interest in or to any Intellectual Property Rights (including especially patents) of InterDigital or any of its Affiliates whether by implication, estoppel or otherwise.  

Publication/Communication

Any publication or oral communication regarding the Database shall be elaborated in good faith and shall not be driven by a deliberate will to denigrate InterDigital or any of its products. In any publication and on any support joined to an oral communication (for instance a PowerPoint document) resulting from the use of the Database, the following statement/citation shall be inserted:

The dataset was provided by InterDigital and is described in the following publication:

E. Berson, N. Q. K. Duong and C.-H. Demarty. Collecting, Analyzing and Predicting Socially-Driven Image Interestingness. In Proceedings of the EUSIPCO Conference, Spain, 2019.

In any oral communication resulting from the use of the Database, the Licensee shall orally indicate that the Database is InterDigital’s property.

No Warranty - Disclaimer

THE DATABASE IS PROVIDED TO LICENSEE ON AN “AS IS” BASIS. INTERDIGITAL MAKES NO WARRANTY THAT THE LICENSED DATABASE WILL OPERATE ON ANY PARTICULAR HARDWARE, PLATFORM, OR ENVIRONMENT. THERE IS NO WARRANTY THAT THE OPERATION OF THE LICENSED DATABASE SHALL BE UNINTERRUPTED, WITHOUT BUGS OR ERROR-FREE. THE DATABASE AND DOCUMENTATION ARE PROVIDED HEREUNDER WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY IMPLIED LIABILITIES AND WARRANTIES OF NONINFRINGEMENT OF INTELLECTUAL PROPERTY, FREEDOM FROM INHERENT DEFECTS, CONFORMITY TO A SAMPLE OR MODEL, MERCHANTABILITY, FITNESS AND/OR SUITABILITY FOR A SPECIFIC OR GENERAL PURPOSE AND THOSE ARISING BY STATUTE OR BY LAW, OR FROM A COURSE OF DEALING OR USAGE OF TRADE.

Hence, the Licensee uses the Database at his own cost, risks, and responsibility. InterDigital shall not be liable for any damage that could arise to Licensee by using the Database, either in accordance with this Agreement or not.

InterDigital shall not be liable for any consequential or indirect losses, including any indirect loss of profits, revenues, business, and/or anticipated savings, whether or not in the contemplation of the Parties at the time of entering into the Agreement unless expressly set out in the Agreement, or arising from gross negligence, willful misconduct or fraud.

Licensee agrees that it will defend, indemnify and hold harmless InterDigital and its Affiliates against any and all losses, damages, costs and expenses arising from a breach by the Licensee of any of its obligations or representations hereunder, including, without limitation, any third party, and/or any claims in connection with any such breach and/or any use of the Database, including any claim from third party arising from access, use or any other activity in relation to this Database.

The Licensee shall not make any warranty, representation, or commitment on behalf of InterDigital to any other third party. 

Term and Termination

This Agreement shall terminate at the end of the Limited Period unless earlier terminated by either party on the ground of material breach by the other party, which breach is not remedied after thirty (30) days advance written notice, specifying the breach with reasonable particularity and referencing this Agreement.  

General Provisions

12.1 Severability. If any provision of this Agreement shall be held to be in contravention of applicable law, this Agreement shall be construed as if such provision were not a part thereof, and in all other respects, the terms hereof shall remain in full force and effect.

12.2 Governing Law.  Regardless of the place of execution, delivery, performance or any other aspect of this Agreement, this Agreement and all of the rights of the parties under this Agreement shall be governed by, construed under and enforced in accordance with the substantive law of France without regard to conflicts of law principles. In case of a dispute that could not be settled amicably, the courts of Rennes shall be exclusively competent.

12.3 Assignment. InterDigital may assign this license to any third party. Such an assignment will be announced on the website as defined in article 5. Licensee may not assign this agreement to any third party without the previous written agreement from InterDigital.

The Principal Investigators can be contacted via lafinmanagement@interdigital.com.