• About
  • Description
  • Download
  • Sources

About

The Long Range (LoRa) protocol for low-power wide-area networks (LPWANs) is a strong candidate for enabling the massive roll-out of the Internet of Things (IoT) because of its low cost, high receiver sensitivity (as low as -137 dBm), and high scalability potential. As tens of thousands of tiny LoRa devices are deployed over large geographic areas, reliable and robust authentication mechanisms will be key to the success of LoRa. We publicly share waveform data from 100 bit-similar devices (produced by identical manufacturing processes), recorded in different deployment scenarios (outdoor vs. indoor) over several days.

Description

Specifications table:

Subject: LoRa IoT
Type of data: LoRa I/Q datasets in SigMF format
How data was acquired: Hardware: transmissions from 100 bit-similar Pycom Pysense sensor boards, each connected to one of 100 bit-similar FiPy radios, collected using a USRP N-210 or B-210.
Data format: Raw I/Q samples. Each recording is stored in two files: (i) a dataset file, a binary file of the recorded digital samples, and (ii) a SigMF metafile, which describes the dataset in plain-text JSON format. Our binary and metadata format extends, and is compatible with, the SigMF specification [1]. We extend the SigMF metadata format to incorporate LoRa-specific details; the markdown files documenting the LoRa and environmental extensions we created are located with the data at the link provided.
Parameters for data collection: The campaign was carried out (i) over several days and (ii) in diverse environments (the Arena testbed [2], an “in-the-wild” indoor environment, and an outdoor environment).
Description of data collection: SigMF data and metadata recordings of over-the-air LoRa transmissions, collected indoors at the Arena testbed, located in the ISEC building at Northeastern University, and in an outdoor environment. Each transmitter is a Pysense board connected to a FiPy radio. The receiver is a fixed USRP B-210 or N-210. The data is part of a 100-radio campaign by Northeastern University and InterDigital that began in 2020.
Data source location:

  • Indoor dataset: Northeastern University Arena Testbed, Boston, MA
  • Outdoor dataset: a residential area, Franklin, MA

Data organization: the data is organized in nested folders, named as follows:

  • Dataset (Indoor Dataset A 2020, Outdoor Dataset A 2020, or Outdoor Dataset B 2020)
    • Device number (1-100)
      • Transmission number (1-10)
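To work with this hierarchy programmatically, the records can be indexed by dataset, device, and transmission. The Python sketch below is a minimal illustration under stated assumptions: the helper `index_records` is hypothetical, and it assumes that each transmission folder holds SigMF file pairs using the `.sigmf-meta` / `.sigmf-data` extensions defined by the SigMF specification. The exact file names inside the folders are not documented here, so the glob pattern may need adjusting to the actual layout.

```python
from pathlib import Path


def index_records(root):
    """Map (dataset folder, device folder, transmission folder) to a list of
    (metafile path, data file path) pairs.

    Hypothetical layout assumed by this sketch:
        root/<dataset>/<device number>/<transmission number>/<name>.sigmf-meta
    with a matching <name>.sigmf-data file next to each metafile.
    """
    index = {}
    for meta_path in sorted(Path(root).glob("*/*/*/*.sigmf-meta")):
        data_path = meta_path.with_suffix(".sigmf-data")
        if not data_path.exists():
            continue  # skip metafiles that have no matching binary file
        transmission = meta_path.parent
        device = transmission.parent
        dataset = device.parent
        key = (dataset.name, device.name, transmission.name)
        index.setdefault(key, []).append((meta_path, data_path))
    return index
```

For example, `index_records("/path/to/lora")` would return keys such as `("Indoor Dataset A 2020", "1", "1")` if the folders are named as listed above. Folder names are kept as strings because the on-disk naming of device and transmission folders is an assumption.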

Why is this data useful?

The data could be useful for several research areas, including radio fingerprinting, which involves the following steps:

  • Capturing a good, representative dataset for each device
  • Ensuring that the data is well annotated and stored in a standardized format to preserve its long-term usability
  • Extracting the impairments unique to each radio from its transmitted waveform
  • Using the extracted features to identify the transmitter when a new waveform is received
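The feature-extraction and identification steps can be illustrated with a deliberately simple Python sketch. This is not the DeepLoRa method, which relies on deep learning; it swaps in two hand-crafted impairment features (DC offset and I/Q amplitude imbalance) and a nearest-centroid classifier. All function names are hypothetical, and fingerprinting bit-similar devices would likely need much richer features than these.

```python
import math
from statistics import fmean


def iq_features(samples):
    """Hand-crafted impairment features from complex baseband samples:
    (mean I, mean Q, RMS(I) / RMS(Q)), i.e. DC offset and I/Q amplitude imbalance."""
    i_vals = [s.real for s in samples]
    q_vals = [s.imag for s in samples]
    rms_i = math.sqrt(fmean(x * x for x in i_vals))
    rms_q = math.sqrt(fmean(x * x for x in q_vals))  # assumes Q is not identically zero
    return (fmean(i_vals), fmean(q_vals), rms_i / rms_q)


def train_centroids(labelled):
    """labelled maps device_id -> list of captures (each a list of complex samples).
    Returns device_id -> centroid (mean feature vector) of that device's captures."""
    centroids = {}
    for device, captures in labelled.items():
        feats = [iq_features(c) for c in captures]
        centroids[device] = tuple(fmean(column) for column in zip(*feats))
    return centroids


def identify(samples, centroids):
    """Return the device whose centroid is nearest (Euclidean distance) to the
    features of a newly received capture. Features are not normalized here."""
    f = iq_features(samples)
    return min(centroids, key=lambda device: math.dist(f, centroids[device]))
```

The nearest-centroid rule was chosen only because it makes the train/identify split of the last two steps easy to see; a learned model would replace both `iq_features` and `identify`.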

Our datasets were collected from 100 bit-similar LoRa devices (produced by identical manufacturing processes) in different deployment scenarios (outdoor vs. indoor) over several days. We labeled our datasets using SigMF metafiles and extended this format with a new extension designed to capture environmental and LoRa-specific information. Each SigMF record consists of (i) the binary file containing the I/Q samples and (ii) the extended SigMF metadata file. Both files of every record are made available to the research community.
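A SigMF record can be read with only the Python standard library. The sketch below is a minimal illustration, not a reference reader: it handles just two SigMF datatypes (`cf32_le` and `ci16_le`) for single-channel interleaved I/Q data, and which datatype this dataset actually uses should be read from each metafile. It also lists key namespaces that are not declared in `core:extensions`, which is how SigMF expects extensions such as the LoRa and environmental ones to be registered. The namespace names in the usage note (`lora`, `env`) are hypothetical; the real extension names are given in the markdown files shipped with the data.

```python
import json
import struct

# SigMF datatypes handled by this sketch: (struct byte order, format char, bytes per component).
_DATATYPES = {
    "cf32_le": ("<", "f", 4),
    "ci16_le": ("<", "h", 2),
}


def load_sigmf_record(meta_text, data_bytes):
    """Parse a SigMF metafile (JSON text) and its binary data file.

    Returns (metadata dict, list of complex samples). Only single-channel,
    interleaved I/Q data in one of the datatypes above is handled.
    """
    meta = json.loads(meta_text)
    datatype = meta["global"]["core:datatype"]
    if datatype not in _DATATYPES:
        raise ValueError(f"datatype not handled by this sketch: {datatype}")
    order, fmt, width = _DATATYPES[datatype]
    n_values = len(data_bytes) // width
    if n_values % 2:
        raise ValueError("expected interleaved I/Q pairs")
    values = struct.unpack(f"{order}{n_values}{fmt}", data_bytes[: n_values * width])
    # Pair even-indexed (I) and odd-indexed (Q) components into complex samples.
    samples = [complex(i, q) for i, q in zip(values[0::2], values[1::2])]
    return meta, samples


def undeclared_namespaces(meta):
    """List key namespaces (the part before ':') that are neither 'core' nor
    declared in global['core:extensions']."""
    global_section = meta.get("global", {})
    declared = {ext["name"] for ext in global_section.get("core:extensions", [])}
    sections = [global_section, *meta.get("captures", []), *meta.get("annotations", [])]
    used = set()
    for section in sections:
        for key in section:
            namespace, sep, _ = key.partition(":")
            if sep:
                used.add(namespace)
    return sorted(used - declared - {"core"})
```

For instance, a metafile whose `global` object declares a hypothetical `lora` extension but also contains an undeclared `env:temperature` key would make `undeclared_namespaces` return `["env"]`.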

Dataset collection methodology and experimental testbeds:

We performed an extensive data collection campaign employing 100 bit-similar Pycom Pysense sensors connected to 100 bit-similar Pycom FiPy radios [3], as shown in the top part of Figure 1. The radios operate on a 902.3 MHz carrier in the 915 MHz ISM band. In each location, each device transmits ten consecutive bursts of packets, with consecutive bursts separated by 1 second. Each burst consists of 100 consecutive packets separated by 10 ms. Each packet's payload carries temperature, humidity, and device voltage readings. A USRP N-210 or B-210 is synchronized to receive the transmitted packets in the data collection testbed. The data is stored in two files: (i) a dataset file, a binary file of the recorded digital samples, and (ii) the SigMF metafile, which describes the dataset in plain-text JSON format. Our binary and metadata format extends, and is compatible with, the SigMF specification [1], and we extend the SigMF metadata format to incorporate LoRa-specific details.
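The transmission pattern above implies 10 bursts × 100 packets = 1,000 packets per device in each location, or 100,000 packets across the 100 devices. The Python sketch below generates a nominal packet timeline for one device. It rests on two assumptions not stated above: that the 10 ms and 1 s figures are idle gaps between the end of one packet (or burst) and the start of the next, and that every packet has the same on-air duration. The `airtime_s` default is a hypothetical placeholder, since actual LoRa airtime depends on spreading factor, bandwidth, and payload length.

```python
def packet_schedule(n_bursts=10, packets_per_burst=100, packet_gap_s=0.010,
                    burst_gap_s=1.0, airtime_s=0.05):
    """Nominal start time (seconds) of every packet one device sends in one location.

    Assumes gaps are idle time between the end of one packet/burst and the start
    of the next; airtime_s is a hypothetical per-packet on-air duration.
    """
    starts = []
    t = 0.0
    for _ in range(n_bursts):
        for p in range(packets_per_burst):
            starts.append(t)
            t += airtime_s
            if p < packets_per_burst - 1:
                t += packet_gap_s  # gap between packets inside a burst
        t += burst_gap_s  # gap before the next burst
    return starts
```

With the defaults, `len(packet_schedule())` is 1,000, and the first packet of the second burst starts at 100 × 0.05 + 99 × 0.01 + 1.0 = 6.99 s. Such a timeline could help when segmenting a long capture into individual packets, provided the real airtime is substituted.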

Indoor testbed: This testbed, shown in the bottom part of Figure 1, is Arena, an open-access wireless testbed based on an 8x8 grid of VERT2450 antennas mounted on the ceiling of a 2,240 sq ft indoor office space. Each of the 64 antennas is connected to a programmable SDR through low-attenuation coaxial cable, enabling sub-6 GHz 5G-and-beyond spectrum research. The testbed contains 24 SDRs controlled by 12 Dell PowerEdge R340 computational servers running Ubuntu 16.04 LTS [2]. In this initial dataset, data is collected from only one of these antennas, as indicated in Figure 1.

Outdoor testbed: We repeated the indoor experiment after moving all LoRa devices outside the building, replicating the indoor transmission scenario. The receiver and the gateway were kept inside the building. The outdoor experiment was conducted in a residential area.

LoRa Dataset

 

Related research article:

The data was collected as part of the work for the following research paper: 

Authors: Amani Al-Shawabka (Northeastern University, USA), Philip Pietraski (InterDigital Communications, USA), Sudhir B Pattar (InterDigital Communications, USA), Francesco Restuccia (Northeastern University, USA), Tommaso Melodia (Northeastern University, USA)

Title: DeepLoRa: Fingerprinting LoRa Devices at Scale Through Deep Learning and Data Augmentation. In the Twenty-second International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing (MobiHoc ’21), July 26–29, 2021, Shanghai, China.

Download

To download the dataset, you must provide your name, email address, and affiliation, and fill in and sign the DSLA (Data Set License Agreement) form available here, by which you agree to the terms of use described below. Then send an email to loradatamanagement@interdigital.com requesting the dataset, with the signed DSLA attached and the information listed above. You will receive instructions on how to download the dataset at the email address you provided.

We may store the data you supplied in order to contact you later about benchmark-related matters. The data will not be used in any other way.

Terms of use

1. Commercial use

The user may only use the dataset for academic research. The user may not use the database for any commercial purposes. Commercial purposes include, but are not limited to:

  • proving the efficiency of commercial systems,
  • training or testing of commercial systems,
  • using screenshots of data from the database in advertisements,
  • selling data from the database,
  • creating military applications.

2. Distribution

The user may not distribute the dataset or portions thereof in any way, with the exception of using small portions of data for the exclusive purpose of clarifying academic publications or presentations. Note that publications will have to comply with the terms stated in article 4.

3. Access

The user may only use the database after the Data Set User License Agreement (DSLA) has been signed and returned to the dataset administrators. The signed DSLA should be returned in digital format by attaching it to the email requesting access to the dataset. Upon receipt of the DSLA, credentials to access the dataset will be provided. The user may not grant anyone access to the database by giving out their access credentials.

4. Publications

Publications include not only papers, but also presentations for conferences or educational purposes. All documents and papers that report on research that uses the LoRa Radio Data Set will cite the following paper:

Al-Shawabka, A., Pietraski, P., Pattar, S.B., Restuccia, F., & Melodia, T. (2021, July 26-29). DeepLoRa: Fingerprinting LoRa Devices at Scale Through Deep Learning and Data Augmentation. [Paper presentation]. 22nd International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, Shanghai, China.

5. Warranty

The database comes without any warranty, and any and all warranties, express or implied, are disclaimed to the fullest extent permissible by law, including but not limited to fitness for a particular purpose, merchantability, and noninfringement. InterDigital, Inc., its affiliates, and any third party shall not be held liable in any manner or for any damage (physical, financial or otherwise) caused by the use of the database. Any conflict between these Terms of Use and the DSLA shall be resolved in favor of the DSLA.

Sources

[1] https://github.com/gnuradio/SigMF/blob/master/sigmf-spec.md

[2] L. Bertizzolo, L. Bonati, E. Demirors, T. Melodia, Arena: A 64-antenna SDR-based Ceiling Grid Testbed for Sub-6 GHz Radio Spectrum Research, in: Proceedings of the 13th International Workshop on Wireless Network Testbeds, Experimental Evaluation & Characterization, 2019, pp. 5–1

[3] https://pycom.io/product/fipy/