Description
Specifications table:
Subject |
LoRa IoT |
Type of data |
LoRa technology I/Q datasets in SigMF format |
How was data acquired |
Hardware: The datasets were transmitted by 100 bit-similar Pycom Pysense sensor boards, each connected to one of 100 bit-similar Pycom FiPy radios, and were collected using a USRP N-210 or B-210. |
Data format |
Raw I/Q samples. The data is stored in two files: (i) a dataset file, which is a binary file of the recorded digital samples, and (ii) the SigMF metafile, which describes the dataset in plain-text JSON format. Our binary and meta formats extend, and remain compatible with, the SigMF specification [1]. In particular, we extend the SigMF meta format to incorporate LoRa-specific details. The markdown files for the LoRa and environmental extensions that we created are located with the data at the link provided. |
Parameters for data collection |
Our campaign was carried out over (i) several days and (ii) diverse environments (Arena [2] “in-the-wild” and an outdoor environment). |
Description of data collection |
SigMF data and meta recordings of over-the-air LoRa transmissions collected in an indoor environment, the Arena testbed located in the Northeastern University ISEC building, and in an outdoor environment. Each transmitter consists of a Pysense board connected to a FiPy radio. The receiver is a fixed USRP B-210 or N-210. The data collected here is part of a 100-radio campaign by Northeastern University and InterDigital that began in 2020. |
Data source location |
Indoor dataset:
Northeastern University Arena Testbed, Boston, MA
Outdoor dataset:
A residential area, Franklin, MA
|
Data organization |
The data is organized in folders that are named as follows:
- Indoor Dataset A 2020/Outdoor Dataset A 2020/Outdoor Dataset B 2020
- Device Number (1-100)
- Transmission number (1-10)
|
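The folder hierarchy above can be traversed programmatically. The following is a minimal sketch, assuming each transmission folder holds a pair of files with the standard SigMF extensions `.sigmf-meta` and `.sigmf-data`; the helper name `find_sigmf_records` is hypothetical, and the exact folder and file names inside each level are not specified here, so the search is done by extension rather than by name.

```python
from pathlib import Path


def find_sigmf_records(root):
    """Return sorted (meta_path, data_path) pairs found anywhere under root.

    Assumption: every recording is stored as <name>.sigmf-meta next to
    <name>.sigmf-data, as the SigMF file naming convention prescribes.
    """
    records = []
    for meta_path in sorted(Path(root).rglob("*.sigmf-meta")):
        data_path = meta_path.with_suffix(".sigmf-data")
        # Skip metafiles whose binary counterpart is missing.
        if data_path.exists():
            records.append((meta_path, data_path))
    return records
```

For example, `find_sigmf_records("Indoor Dataset A 2020")` would return every complete record under that top-level folder, regardless of how the device and transmission subfolders are named.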
Why is this data useful?
The data could be useful for several research areas, including radio fingerprinting, which involves the following steps:
- Capturing a good, representative dataset per device
- Ensuring that the data is well annotated and captured in a standardized format to ensure its longevity
- Extracting the radio's unique impairments associated with the transmitted waveform
- Employing the captured features to identify the transmitter when a new waveform is received
Our datasets employed 100 bit-similar LoRa devices (with identical manufacturing processes) across different deployment scenarios (outdoor vs. indoor) and spanning several days. We labeled our dataset using SigMF metafiles. Moreover, we extended this format with a new extension designed to capture environmental and LoRa-specific information. Each SigMF record consists of (i) the binary file containing the I/Q samples and (ii) the extended SigMF metadata file. Both parts of each SigMF record are made available to the research community.
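As an illustration of how the two parts of a SigMF record fit together, the sketch below reads a metafile and uses it to interpret the size of the binary file. It relies only on fields from the SigMF core namespace (`core:datatype`, `core:sample_rate`); the field names of the LoRa and environmental extensions are not assumed. The helper names `bytes_per_sample` and `describe_record` are hypothetical.

```python
import json
import os
import re

# SigMF datatype strings look like "cf32_le": complex/real, float/int/uint,
# bits per component, and an optional endianness suffix.
_DATATYPE_RE = re.compile(r"^(c|r)(f|i|u)(8|16|32|64)(_le|_be)?$")


def bytes_per_sample(datatype):
    """Return the size in bytes of one sample of the given SigMF datatype."""
    match = _DATATYPE_RE.match(datatype)
    if match is None:
        raise ValueError(f"unrecognized SigMF datatype: {datatype!r}")
    components = 2 if match.group(1) == "c" else 1  # I and Q, or real only
    return components * int(match.group(3)) // 8


def describe_record(meta_path, data_path):
    """Summarize a record: datatype, sample rate, sample count, duration."""
    with open(meta_path, "r", encoding="utf-8") as f:
        meta = json.load(f)
    global_info = meta["global"]
    datatype = global_info["core:datatype"]
    sample_rate = global_info.get("core:sample_rate")  # optional in SigMF
    num_samples = os.path.getsize(data_path) // bytes_per_sample(datatype)
    duration_s = num_samples / sample_rate if sample_rate else None
    return {
        "datatype": datatype,
        "sample_rate": sample_rate,
        "num_samples": num_samples,
        "duration_s": duration_s,
    }
```

Once the datatype is known, the samples themselves can be read with any array library; for instance, if a recording is stored as `cf32_le`, `numpy.fromfile(data_path, dtype="<c8")` would return the complex I/Q samples.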
Dataset collection methodology and the experimental testbeds:
We performed an extensive data collection campaign in which we employed 100 bit-similar Pycom Pysense sensors connected to 100 bit-similar Pycom FiPy radios [3], as shown in the top part of Figure 1. The radios in our setup operate at a carrier frequency of 902.3 MHz in the 915 MHz ISM band. Each device transmits ten consecutive bursts of packets at each location. Consecutive bursts are separated by 1 second. Each burst consists of 100 consecutive packets separated by 10 ms. Each packet carries a payload containing the temperature, humidity, and device voltage readings. A USRP N-210 or B-210 is synchronized to receive the transmitted packets in the data collection testbed. The data is stored in two files: (i) a dataset file, which is a binary file of the recorded digital samples, and (ii) the SigMF metafile, which describes the dataset in plain-text JSON format. Our binary and meta formats extend, and remain compatible with, the SigMF specification [1]. In particular, we extend the SigMF meta format to incorporate LoRa-specific details.
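The transmission schedule described above implies nominal packet counts that can be checked with a few lines of arithmetic. The sketch below uses only the figures stated in the text. It assumes that the 10 ms and 1 second separations are idle gaps between consecutive packets and bursts, and it excludes packet airtime, which depends on LoRa parameters not given here. The function names are hypothetical.

```python
# Values stated in the text.
NUM_DEVICES = 100
BURSTS_PER_LOCATION = 10
PACKETS_PER_BURST = 100
BURST_GAP_MS = 1000   # separation between consecutive bursts
PACKET_GAP_MS = 10    # separation between consecutive packets in a burst


def packets_per_device():
    """Nominal number of packets one device sends at one location."""
    return BURSTS_PER_LOCATION * PACKETS_PER_BURST


def packets_per_location():
    """Nominal number of packets sent by all devices at one location."""
    return NUM_DEVICES * packets_per_device()


def idle_time_ms_per_device():
    """Total gap time per device at one location, excluding packet airtime.

    Assumption: the stated separations are idle intervals between the end
    of one packet (or burst) and the start of the next.
    """
    in_burst = BURSTS_PER_LOCATION * (PACKETS_PER_BURST - 1) * PACKET_GAP_MS
    between_bursts = (BURSTS_PER_LOCATION - 1) * BURST_GAP_MS
    return in_burst + between_bursts
```

Under these assumptions each device sends 1,000 packets per location, the 100 devices together send 100,000, and the gaps alone account for 18,900 ms per device, so a full set of ten bursts lasts at least 18.9 seconds before airtime is added.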
Indoor testbed: This testbed, shown in the bottom part of Figure 1, is an open-access wireless testbed based on an 8x8 grid of VERT2450 antennas mounted on the ceiling of a 2240 square ft indoor office-space environment. Each of the 64 antennas is cabled to a programmable SDR through low-attenuation coaxial cable, enabling sub-6 GHz 5G-and-beyond spectrum research. The testbed contains 24 SDRs controlled by 12 Dell PowerEdge R340 computational servers running Ubuntu 16.04 LTS [2]. In this initial dataset, data is collected from only one of these antennas, as indicated in Figure 1.
Outdoor testbed: We repeated the indoor experiment but moved all LoRa devices outside the building, replicating the indoor transmission scenario. The receiver and the gateway were kept inside the building. The outdoor experiment was conducted in a residential area.

Related research article:
The data was collected as part of the work for the following research paper:
Authors: Amani Al-Shawabka (Northeastern University, USA), Philip Pietraski (InterDigital Communications, USA), Sudhir B Pattar (InterDigital Communications, USA), Francesco Restuccia (Northeastern University, USA), Tommaso Melodia (Northeastern University, USA)
Title: DeepLoRa: Fingerprinting LoRa Devices at Scale Through Deep Learning and Data Augmentation. In The Twenty-second International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing (MobiHoc ’21), July 26–29, 2021, Shanghai, China.