IoT Architecture in Simulation

Oct 2015/ Posted By: wotio team

In my <a href="http://labs.wot.io/an-architecture-for-internet-of-things-applications/">last blog post</a>, I discussed a sample architecture for an IoT application:
<img src="http://idfiles.leveelabs.com/55bd0288af0b0930ba599bd0c4b7ca38/resources/img_new/labs_wot_io/architecture-1.png" alt="Sample IoT Architecture" />
wherein the data is passed through a series of successive stages:
<ul>
<li>Acquisition - receiving data from the sensor farm</li>
<li>Enhancement - augmenting data in motion with data at rest</li>
<li>Analysis - applying machine learning and statistics to the data</li>
<li>Filtering - removing non-actionable data and noise</li>
<li>Transformation - converting it into an actionable format</li>
<li>Distribution - delivering to the end user or application</li>
</ul>
This architecture is based on a number of real world deployments that have been in production for more than a couple of years. These deployments share a number of problems relating to how the system architecture influences the tradeoffs between cost, throughput, and latency. These three factors are the most common real world constraints that must be taken into account when designing an IoT solution:
<ul>
<li>Cost - the money, time, and mindshare sunk into the system</li>
<li>Throughput - the volume of messages over time the system can handle</li>
<li>Latency - the time it takes for data to translate to action</li>
</ul>
At <a href="http://wot.io">wot.io</a>, we have found it necessary to build new software test equipment to better model the behavior of our production systems. Most existing load testing and modeling tools do not deal well with highly heterogeneous distributed networks of applications. To this end, we have produced tooling like <a href="https://github.com/wotio/ripple">wotio/ripple</a> for modeling the behavior of data services:
<iframe src="https://www.youtube.com/embed/NPO5oiJoCjc" frameborder="0" width="560" height="315" allowfullscreen="allowfullscreen"></iframe>
In the above video, I simulated an application in which 1750 messages per minute were generated in a spiky fashion similar to a couple of real world systems we have encountered. Anyone who has seen a mains powered sensor farm come on after a blackout will recognize this pattern.
<img src="http://idfiles.leveelabs.com/55bd0288af0b0930ba599bd0c4b7ca38/resources/img_new/labs_wot_io/ripplea.png" alt="exchange A" />
This is a typical pattern which results when the device designers assume that the devices will come online at random times, or decide to lockstep the message sending to a GPS clock. This acquisition phase behavior can be very noisy depending on the environmental characteristics.
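As a rough illustration of what a spiky generator pattern can look like, here is a minimal Python sketch. It is not the pattern used in the video, and it is not ripple's pattern format: the function name `spiky_pattern`, the choice of five evenly spaced one-second bursts, and the list-of-counts representation are all assumptions made for this example. Only the 1750 messages per minute total comes from the simulation above.

```python
def spiky_pattern(total_per_minute=1750, bursts=5, seconds=60):
    """Return per-second message counts for one minute (hypothetical format).

    All messages arrive in `bursts` evenly spaced one-second spikes,
    and the remaining seconds are silent.
    """
    pattern = [0] * seconds
    base, extra = divmod(total_per_minute, bursts)
    step = seconds // bursts
    for i in range(bursts):
        # Spread any remainder over the first few bursts so the total is exact.
        pattern[i * step] = base + (1 if i < extra else 0)
    return pattern


pattern = spiky_pattern()
print("messages per minute:", sum(pattern))
print("peak messages per second:", max(pattern))
print("average messages per second:", sum(pattern) / len(pattern))
```

The point of the sketch is the gap between the average rate and the peak rate: a system sized for the per minute average still sees bursts many times larger.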
In the next step, we simulate some acquisition and enhancement phase activity, namely adding data to the data in motion by querying a database. To do this, we add a 10 second delay to each of the messages. The time shifted signal looks like:
<img src="http://idfiles.leveelabs.com/55bd0288af0b0930ba599bd0c4b7ca38/resources/img_new/labs_wot_io/rippleb.png" alt="exchange B" />
The ripple software also allows for simulating a delay ramp, in which the delay increases over time based on the number of messages that have passed through the system. This can be invaluable for simulating systems that suffer from performance degradation due to the volume of data stored in the system. For this sample simulation, however, I've stuck with a fixed 10 second delay. Being able to simulate processing delays is just as useful when multiple streams of data must be coordinated.
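The relationship between a base delay and a ramp can be written down in a few lines. The following Python sketch assumes the ramp is linear, with a fixed number of milliseconds added per message already processed, and that messages are counted from zero. The function name `delay_ms` and its parameter names are made up for this example and are not part of ripple.

```python
def delay_ms(n, base_ms=10_000, ramp_ms=0):
    """Delay in milliseconds applied to the n-th message (n starts at 0).

    Assumes a linear ramp: each message already processed adds ramp_ms.
    """
    return base_ms + ramp_ms * n


# Fixed 10 second delay, as in the sample simulation: every message waits the same.
print(delay_ms(0), delay_ms(5000))

# Hypothetical ramp of 2 ms per message: the 1000th message waits 2 s longer.
print(delay_ms(1000, ramp_ms=2))
```

With a ramp of zero the stage is a pure time shift, which is why the exchange B graph has the same shape as exchange A, only later.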
Another common constraint one encounters is a cost vs throughput constraint. For example, you may want to license a software application that is restricted in the number of CPUs per unit price. The business may only be able to afford enough CPU licenses to account for sufficient throughput of the per minute volume, but not the instantaneous volume.
<img src="http://idfiles.leveelabs.com/55bd0288af0b0930ba599bd0c4b7ca38/resources/img_new/labs_wot_io/ripplec.png" alt="exchange C" />
For these sorts of applications, we can simulate a maximum rate limit on the application. The ripple.c exchange above demonstrates the stretching of the input signal due to queueing that data between exchanges B and C. Here, we're simulating a 40 messages per second throughput limit. Theoretically, this system could process 40 * 60 = 2400 messages per minute, which is sufficient to handle our 1750 messages per minute load, but at a cost of adding latency:
<img src="http://idfiles.leveelabs.com/55bd0288af0b0930ba599bd0c4b7ca38/resources/img_new/labs_wot_io/latency.png" alt="Latency over Time" />
Here we can see the impact of this queuing on the per message latency over time. The above graph shows about 4 minutes of messages and the per message latency of each. The latency climbs because messages are briefly enqueued whenever they arrive faster than they can be processed:
<img src="http://idfiles.leveelabs.com/55bd0288af0b0930ba599bd0c4b7ca38/resources/img_new/labs_wot_io/queueb.png" alt="Queue B" />
This sawtooth graph is the result of feeding data into the system faster than the rate limited process can remove it. This behavior results in highly variable latency across the lifespan of the application:
<img src="http://idfiles.leveelabs.com/55bd0288af0b0930ba599bd0c4b7ca38/resources/img_new/labs_wot_io/hist.png" alt="Latency Histogram" />
In this histogram of the 4 minute sample, you can see a spike around 10s of latency. This spike accounts for roughly 1/8 of all the messages. The other 7/8 of the messages, however, range from 10s of latency to over 35s of latency. This variability in latency is a classic tradeoff that many IoT systems need to make in the real world. If you are expecting to act upon this data, it is important to understand how that latency impacts the timeliness of your decision.
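The interaction between a fixed delay and a rate limit can be approximated with a small discrete time simulation. The Python sketch below is not ripple and will not reproduce the graphs above: the burst pattern (five one-second bursts of 350 messages per minute) is an assumption chosen only so that the total matches 1750 messages per minute, and the names `simulate`, `delay_s`, and `rate` are hypothetical. The 10 second delay and the 40 messages per second limit are taken from the text.

```python
from collections import deque

# Assumed input: 5 bursts of 350 messages per minute (1750 total), one value per second.
PATTERN = [350 if s % 12 == 0 else 0 for s in range(60)]


def simulate(pattern, minutes=4, delay_s=10, rate=40):
    """Return (latencies, depths) for a fixed delay followed by a rate limiter.

    Time advances in one-second ticks. Because the fixed delay shifts every
    message by the same amount, only the queue wait is simulated and delay_s
    is added to it afterwards.
    """
    queue = deque()          # arrival tick of each waiting message
    latencies, depths = [], []
    total = len(pattern) * minutes
    t = 0
    while t < total or queue:
        if t < total:
            queue.extend([t] * pattern[t % len(pattern)])
        for _ in range(min(rate, len(queue))):
            latencies.append(delay_s + (t - queue.popleft()))
        depths.append(len(queue))   # queue depth at the end of this tick
        t += 1
    return latencies, depths


# Capacity check from the text: 40 * 60 = 2400 per minute covers 1750 per minute.
assert 40 * 60 >= sum(PATTERN)

latencies, depths = simulate(PATTERN)
print("messages:", len(latencies))
print("latency range (s):", min(latencies), "to", max(latencies))
print("peak queue depth:", max(depths))
print("share at minimum latency:", latencies.count(min(latencies)) / len(latencies))
```

Even in this simplified model, only the messages at the front of each burst see the minimum latency, and the rest wait in the queue while it drains. A burstier input than the one assumed here stretches the tail further.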
By combining delays and rate limits with different generator patterns, we can better develop models of how our systems behave under load long before they go to production. With <a href="https://github.com/wotio/ripple">wotio/ripple</a>, we were careful to keep our test generation, application simulation, and analysis phases decoupled. The message generator and the latency report generators are separate servers capable of being run on different hardware. As the software is written in <a href="http://www.erlang.org/">Erlang</a>, it is easy to distribute across a number of Erlang VMs running on a cluster and, through Erlang's built in clustering, it can be coordinated from a single shell session.
The test program used to generate the above graphs and topology is as follows:
<script src="https://gist.github.com/cthulhuology/d3f709612e1d3e6a7f10.js"></script>
This sample file demonstrates the following features:
<ul>
<li>consume, Source, Filename - consume messages from Source and log their latency to Filename</li>
<li>pipe, Source, Sink - consume messages from Source and forward to Sink as fast as possible</li>
<li>limit, Source, Sink, Rate - consume messages from Source and forward to Sink at a maximum rate of Rate messages per second</li>
<li>delay, Source, Sink, Base, Ramp - consume messages from Source and forward to Sink with a Base delay in ms with Ramp ms delay added for each message processed</li>
<li>generate, Message, Pattern - send the sample test message (with additional timestamp header) at a rate of messages per second specified in the Pattern.</li>
</ul>
In the near future, we will be adding support for message templates, sample message pools, and message filtering to the publicly released version of the tools. But I hope this gives you some additional tools in your toolbox for developing your IoT applications.
