An Architecture for Internet of Things Applications
Looking across many different IoT applications, a fairly common architecture emerges. Like Brutalist architecture, these applications are rugged, hard, and uncompromising, with little concern for a human-scale aesthetic:
At its core it is a six-stage pipeline in which the data is processed in sequence. Variations on this architecture can be generated by branching off at any one of the six stages and repeating some or all of the stages along the resulting sub-path:
The stages correspond to different application types that are typically used in IoT systems:
- Acquisition - gathering data from device management or connectivity platforms, such as ARM mbed Device Server, PubNub, and Stream's IoT-Xtend™ Platform.
- Enhancement - augmenting data with data at rest, usually from databases like Riak, PostgreSQL, or MySQL.
- Analysis - applying machine learning and other forms of applied statistics to the enhanced data, with tools such as ParStream, Simularity, or DataScription.
- Filtering - removing non-actionable data from the stream to increase the relevance of the data to the end user, using search by NGData or stream processing like SQLstream or ScaleDB.
- Transformation - converting the filtered stream into actionable output, through workflow processing like Bipio, Medium One, or ThingWorx, or through procedural scripting like scriptr;.
- Distribution - delivering the salient information to the user, including monitoring systems like Circonus, reporting like JReport, or managed data feeds by Apache NiFi.
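As a rough illustration of the six stages, the pipeline can be sketched as a chain of Python generators. This is only a toy model: the function names, the message fields (`device`, `value`, `location`, `anomalous`), and the threshold-based "analysis" are all assumptions made for the example, and none of the products listed above are involved.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Message = Dict[str, object]


def acquire(readings: Iterable[Tuple[str, float]]) -> Iterator[Message]:
    """Acquisition: turn raw (device, value) readings into messages."""
    for device, value in readings:
        yield {"device": device, "value": value}


def enhance(messages: Iterable[Message], locations: Dict[str, str]) -> Iterator[Message]:
    """Enhancement: join each message with data at rest (a lookup table here)."""
    for m in messages:
        yield {**m, "location": locations.get(m["device"], "unknown")}


def analyze(messages: Iterable[Message], threshold: float) -> Iterator[Message]:
    """Analysis: a stand-in for real statistics, flagging values above a threshold."""
    for m in messages:
        yield {**m, "anomalous": m["value"] > threshold}


def filter_actionable(messages: Iterable[Message]) -> Iterator[Message]:
    """Filtering: drop everything that is not actionable."""
    return (m for m in messages if m["anomalous"])


def transform(messages: Iterable[Message]) -> Iterator[str]:
    """Transformation: convert messages into something a user can act on."""
    for m in messages:
        yield f"ALERT {m['device']} at {m['location']}: value={m['value']}"


def distribute(alerts: Iterable[str]) -> List[str]:
    """Distribution: deliver the results (collected into a list here)."""
    return list(alerts)


def run_pipeline(readings, locations, threshold) -> List[str]:
    """Run all six stages in sequence."""
    stream = acquire(readings)
    stream = enhance(stream, locations)
    stream = analyze(stream, threshold)
    stream = filter_actionable(stream)
    return distribute(transform(stream))


alerts = run_pipeline([("a", 10), ("b", 99)], {"b": "roof"}, threshold=50)
```

Because each stage only consumes an iterator and produces another, branching off at a given stage amounts to feeding that stage's output into a second chain of functions.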
One of the great pleasures of working at wot.io is seeing the development of new systems architectures and their interplay with real world constraints. As Chief Scientist, I spend a lot of my time metering and monitoring the behavior of complex soft real-time systems. In addition to looking at real world systems, I also get to design new test equipment to simulate systems that may one day go to market.
One such tool is ripple, a messaging software analog to an arbitrary waveform generator. Rather than generating a signal by changing volts over time, it generates a message waveform measured in messages per second over time. Much of the behavior of distributed IoT systems is only understandable in terms of message rate and latency. In many ways, the design constraints of these systems are more like those of designing traces on a PCB than those of designing software. A tool like ripple allows us to simulate different types of load on various combinations of application infrastructure.
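To make the idea of a message waveform concrete, here is a minimal sketch of one way to turn a rate function (messages per second as a function of time) into a schedule of send times. It is not ripple's implementation; `sine_rate`, `send_times`, and all of their parameters are hypothetical names chosen for this example.

```python
import math
from typing import Callable, List


def sine_rate(t: float, base: float = 10.0, amplitude: float = 5.0,
              period: float = 60.0) -> float:
    """Message rate (messages/second) at time t, shaped as a sine wave."""
    return base + amplitude * math.sin(2 * math.pi * t / period)


def send_times(rate_fn: Callable[[float], float], duration: float,
               step: float = 1.0) -> List[float]:
    """Sample rate_fn every `step` seconds and spread that many messages
    evenly across the step. Returns the timestamps at which to send."""
    times: List[float] = []
    t = 0.0
    while t < duration:
        count = max(0, round(rate_fn(t) * step))
        times.extend(t + i * step / count for i in range(count))
        t += step
    return times


# A flat 4 messages/second waveform for 2 seconds.
flat = send_times(lambda t: 4.0, duration=2.0)

# One minute of the sine-shaped waveform.
wave = send_times(sine_rate, duration=60.0)
```

A real generator would then sleep until each timestamp and publish a message; the schedule is kept separate here so that the shape of the waveform can be inspected without any messaging infrastructure.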
Not all applications behave the same way under load, and not all data flows are created equal. Variations in message content and size, choice of partitioning scheme, differences in network topology, and hardware utilization can all affect the amount of latency any processing stage introduces into the data flow. The variability in the different data pathways can result in synchronization, ordering, serialization, and consistency issues across the result set.
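A small worked example shows how differing path latencies alone can reorder messages. The two paths, their latencies, and the `arrival_order` helper are assumptions made up for this sketch.

```python
from typing import Dict, List, Tuple


def arrival_order(messages: List[Tuple[int, float, str]],
                  path_latency: Dict[str, float]) -> List[int]:
    """Given (sequence number, send time, path) tuples and a fixed latency
    per path, return the sequence numbers in the order they arrive."""
    arrivals = [(send_time + path_latency[path], seq)
                for seq, send_time, path in messages]
    return [seq for _, seq in sorted(arrivals)]


# Four messages sent 100 ms apart, alternating between two pathways.
sent = [(0, 0.0, "slow"), (1, 0.1, "fast"), (2, 0.2, "slow"), (3, 0.3, "fast")]
order = arrival_order(sent, {"slow": 0.5, "fast": 0.05})
```

Here messages 1 and 3 overtake messages 0 and 2, so a consumer that assumes arrival order equals send order would process the stream incorrectly.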
Consider a case where an application is spread across a few hundred data centers around the world. Due to variations in maintenance, physical failures, and the nature of the internet itself, it is not uncommon for an entire data center to go offline for some period of time. This sort of event can cause an immense backlog of messages from what is now the "distant past" (i.e., yesterday) to come flooding in, changing the results of the past day's analysis and reports. This problem is not limited to hardware failures; it is also common when using remote satellite-based communication schemes. In these cases, a compressed batch of past data may appear all at once on a periodic basis, when the weather and satellite timing permit the transmission.
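The effect of such a backlog on an already published report can be shown with a toy aggregation. The data, the day numbering, and the `daily_totals` function are invented for the illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def daily_totals(messages: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Sum message values per event day (the day the reading was taken,
    not the day it arrived)."""
    totals: Dict[int, int] = defaultdict(int)
    for event_day, value in messages:
        totals[event_day] += value
    return dict(totals)


# Messages that arrived on time: (event day, value).
on_time = [(1, 5), (1, 7), (2, 3)]
report_before = daily_totals(on_time)

# A data center comes back online during day 2 and flushes its day-1 backlog.
backlog = [(1, 40), (1, 2)]
report_after = daily_totals(on_time + backlog)
```

The total for day 1 changes from 12 to 54 after the report for that day would normally have been issued, which is exactly the kind of revision a downstream consumer has to be prepared for.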
Ripple was designed with these issues in mind, to make it easier to simulate the sorts of what-if scenarios we have encountered with real world systems. For our simulations, we use RabbitMQ as a message bus. It provides a reliable, extensible distributed queuing system. It is also convenient to use a protocol like AMQP for data interchange between processes, as it is well supported across languages. The core functionality of ripple consists of:
- modeling application topologies within RabbitMQ
- creating pools of consumers which stand in for applications
- forwarding with delays which allow for simulating different latency characteristics of an application
- generating arbitrary patterns of messaging over time
- simulating "noisy networks" in which message rates vary by some random noise factor
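Two of these features, delayed forwarding and noisy rates, are simple enough to sketch in plain Python without a broker. This is an in-memory approximation under assumed semantics, not ripple's code and not a RabbitMQ API; `noisy_rate` and `DelayedForwarder` are hypothetical names.

```python
import heapq
import itertools
import random
from typing import Any, List, Tuple


def noisy_rate(base_rate: float, noise: float, rng: random.Random) -> float:
    """Scale base_rate by a random factor drawn from [1 - noise, 1 + noise]."""
    return base_rate * (1.0 + rng.uniform(-noise, noise))


class DelayedForwarder:
    """Holds each message for a fixed delay before releasing it, standing in
    for an application that adds latency to the data flow."""

    def __init__(self, delay: float) -> None:
        self.delay = delay
        self._heap: List[Tuple[float, int, Any]] = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order

    def accept(self, now: float, message: Any) -> None:
        """Queue a message received at time `now`."""
        heapq.heappush(self._heap, (now + self.delay, next(self._counter), message))

    def release(self, now: float) -> List[Any]:
        """Return every message whose delay has elapsed by time `now`."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            ready.append(heapq.heappop(self._heap)[2])
        return ready


rng = random.Random(42)  # seeded so a simulation run can be repeated
rates = [noisy_rate(100.0, 0.2, rng) for _ in range(1000)]

forwarder = DelayedForwarder(delay=2.0)
forwarder.accept(0.0, "a")
forwarder.accept(1.0, "b")
```

Time is passed in explicitly rather than read from the clock, which keeps the simulation deterministic and lets a test step through it as fast as it likes.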
In the next blog post, I will describe the ways to use ripple to simulate a number of different real world systems, and describe some of the architectural concepts that can address the observed behaviors.