The making of a Riak data service for wot.io

Nov 2015/ Posted By: wotio team

<p>Today, we wanted to give you a peek under the hood and walk through an example of how wot.io and our partners make <a href="http://wot.io">wot.io</a> data services available in the wot.io data service exchange. In particular, we will show how to create a <a href="http://basho.com/products">Riak</a> data store cluster. Naturally, we will be doing this using <a href="http://docker.com">Docker</a> since Docker containers make up the building blocks of the wot.io operating environment&trade;.</p>
<p>Riak is a key-value NoSQL data store from <a href="http://basho.com">Basho</a>, and it is one of the data services available in wot.io's <a href="http://wot.io">data service exchange&trade;</a>. Let's see how it is done.</p>
<h2 id="initializingariakclusterwithdocker">Initializing a Riak cluster with Docker</h2>
<p>Often, the first step in getting a service to run as a wot.io data service is to create a <a href="http://docker.com">Docker</a> container for it. In this case, much of our work has already been done for us, as a suitable recipe is already available on the public <a href="https://hub.docker.com/">Docker Hub</a>.</p>
<p>For the purposes of this example, we will be instantiating two Docker containers, each with its own Riak instance. Once we confirm that they are running successfully as independents, we will join them together into a cluster.</p>
<p>To get started, we pull the latest Riak Docker image from the <code>devdb/riak</code> repository on the public Docker registry.</p>
<pre><code>$ docker pull devdb/riak:latest
</code></pre>
<p>Pre-downloading the Riak Docker image into the local Docker cache keeps the download out of the container start-up steps that follow.</p>
<h3 id="startingupriak1instance">Starting up riak1 instance</h3>
<p>Once the image has been stored in the Docker cache, we are now ready to kick off the first Riak instance.</p>
<pre><code>$ mkdir ~/riak_storage; cd ~/riak_storage
$ docker -H unix:///var/run/docker.sock run --dns 8.8.8.8 --name riak1 -i -d -p 18098:8098 -p 18087:8087 -v `pwd`/data:/var/lib/riak -t devdb/riak:latest
</code></pre>
<p>The Riak initialization will take a few minutes to complete. Once it's finished, we will be able to check that the instance is empty using the handy HTTP REST interface that Riak exposes:</p>
<pre><code># querying data store on riak1
$ curl -s 'localhost:18098/buckets?buckets=true' | jq .
{
"buckets": []
}
</code></pre>
<p>This shows that there are no buckets in the data store currently. That's ok. We'll populate the data store in a minute.</p>
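<p>Rather than guessing when initialization has finished, one could poll Riak's HTTP <code>/ping</code> endpoint, which answers <code>OK</code> once the node is serving requests. The following is a minimal sketch; the function name, retry count and delay are our own choices for illustration.</p>

```shell
# Poll Riak's HTTP /ping endpoint until it answers "OK" or we give up.
# Usage: wait_for_riak PORT [MAX_TRIES] [DELAY_SECONDS]
wait_for_riak() {
  local port="$1"
  local tries="${2:-60}"
  local delay="${3:-5}"
  local i=0
  while [ "$i" -lt "$tries" ]; do
    # -s silences progress output; a node that is not up yet yields an empty reply
    if [ "$(curl -s "localhost:${port}/ping")" = "OK" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "riak on port ${port} did not respond after ${tries} attempts" >&2
  return 1
}
```

<p>For example, <code>wait_for_riak 18098</code> would block until <strong>riak1</strong> responds, or fail after roughly five minutes with the defaults shown.</p>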
<h3 id="startingupriak2instance">Starting up riak2 instance</h3>
<p>Let's go ahead and instantiate the second Riak container as well:</p>
<pre><code>$ cd ~/riak_storage
$ docker -H unix:///var/run/docker.sock run --dns 8.8.8.8 --name riak2 -i -d -p 28098:8098 -p 28087:8087 -v `pwd`/data2:/var/lib/riak -t devdb/riak:latest
</code></pre>
<p>and confirm that it, too, is empty:</p>
<pre><code># querying data store on riak2
$ curl -s 'localhost:28098/buckets?buckets=true' | jq .
{
"buckets": []
}
</code></pre>
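<p>The two <code>docker run</code> invocations differ only in the container name, the host ports and the data directory. As a sketch, that pattern could be captured in a small helper that prints the command for instance N. The helper name and the <code>dataN</code> directory naming are assumptions for illustration (the walkthrough used <code>data</code> for <strong>riak1</strong>), and the <code>-H</code> option is omitted because the Unix socket it names is Docker's default.</p>

```shell
# Print (rather than execute) the docker run command for Riak instance N.
# Host ports follow the pattern used above: N8098 for HTTP, N8087 for protocol buffers.
# Usage: riak_run_cmd N [BASE_DIR]
riak_run_cmd() {
  local n="$1"
  local base="${2:-$HOME/riak_storage}"
  # N is limited to 1-5 so that the host port N8098 stays below 65536
  case "$n" in
    [1-5]) ;;
    *) echo "instance number must be between 1 and 5" >&2; return 1 ;;
  esac
  # ASSUMPTION: data directories are named data1, data2, ...
  printf '%s %s %s\n' \
    "docker run --dns 8.8.8.8 --name riak${n} -i -d -t" \
    "-p ${n}8098:8098 -p ${n}8087:8087" \
    "-v ${base}/data${n}:/var/lib/riak devdb/riak:latest"
}
```

<p>Printing the command first makes it easy to review; piping the output of <code>riak_run_cmd 3</code> to <code>sh</code> would then start a third instance.</p>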
<h2 id="injectingdataintoriak1datastore">Injecting data into riak1 data store</h2>
<p>Now that both Riak instances are up and running, we are ready to populate one of the instances with some test data. Once again, we can use the <code>curl</code> tool to place data on <strong>riak1</strong> using the HTTP REST interface.</p>
<pre><code># populating with for loop
for i in $(seq 1 5); do
  curl -XPOST -d "content for testkey-${i}" \
    "localhost:18098/buckets/testbucket/keys/testkey-${i}"
done
</code></pre>
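<p>The loop above does not check whether each write succeeded. A hedged variant is sketched below: it asks <code>curl</code> for the HTTP status code and treats anything outside the 2xx range as a failure. The function name is ours, and the explicit <code>Content-Type: text/plain</code> header is an addition that the original loop did not send.</p>

```shell
# Store VALUE under BUCKET/KEY on the Riak node listening on PORT.
# Succeeds only if Riak answers with a 2xx status code.
# Usage: put_key PORT BUCKET KEY VALUE
put_key() {
  local port="$1"
  local bucket="$2"
  local key="$3"
  local value="$4"
  local status
  # -o /dev/null discards the response body; -w prints just the HTTP status code
  status=$(curl -s -o /dev/null -w '%{http_code}' \
    -XPOST -H 'Content-Type: text/plain' -d "$value" \
    "localhost:${port}/buckets/${bucket}/keys/${key}")
  case "$status" in
    2??) return 0 ;;
    *)   echo "storing ${key} failed with HTTP status ${status}" >&2; return 1 ;;
  esac
}
```

<p>Inside the loop, the call would become <code>put_key 18098 testbucket "testkey-${i}" "content for testkey-${i}" || break</code>.</p>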
<h3 id="checkingcontentsonriak1">Checking contents on riak1</h3>
<p>Now that it has some data, querying <strong>riak1</strong> should confirm that our POSTs were successful:</p>
<pre><code># querying data store on riak1
$ curl -s 'localhost:18098/buckets?buckets=true' | jq .
{
"buckets": [
"testbucket"
]
}
</code></pre>
<p>We found the Riak bucket named 'testbucket' that was created by our POSTs. Listing the keys inside 'testbucket', we see:</p>
<pre><code>$ curl -s 'localhost:18098/buckets/testbucket/keys?keys=true' | jq .
{
"keys": [
"testkey-1",
"testkey-5",
"testkey-4",
"testkey-2",
"testkey-3"
]
}
</code></pre>
<p>Querying one particular key, we also have:</p>
<pre><code>$ curl -s localhost:18098/buckets/testbucket/keys/testkey-5
content for testkey-5
</code></pre>
<h3 id="meanwhileriak2remainsempty">Meanwhile, riak2 remains empty...</h3>
<p>We can check that the data store on <strong>riak2</strong> hasn't been touched.</p>
<pre><code># querying data store on riak2 again
$ curl -s 'localhost:28098/buckets?buckets=true' | jq .
{
"buckets": []
}
</code></pre>
<p>So far, the <strong>riak2</strong> instance remains empty. In other words, we have two <em>independent</em> Riak data stores. But we wanted a Riak <em>cluster</em>...</p>
<h2 id="joiningthetworiakinstancesintoacluster">Joining the two Riak instances into a cluster</h2>
<p>We are now ready to join the two Riak instances, but before we do, we'll have to collect some information about them. We need to find the IP addresses of each of the containers.</p>
<p>To confirm the status of the Riak instances, we can check the <code>member-status</code> of the independent instances. This command happens to also tell us the container IP addresses. We can run <code>member-status</code> using the <code>docker exec</code> command for <strong>riak1</strong>:</p>
<pre><code># checking member-status on riak1
$ docker exec riak1 riak-admin member-status
============================ Membership =============================
Status Ring Pending Node
---------------------------------------------------------------------
valid 100.0% -- 'riak@172.17.5.247'
---------------------------------------------------------------------
Valid:1 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
</code></pre>
<p>and again for <strong>riak2</strong>:</p>
<pre><code># checking member-status on riak2
$ docker exec riak2 riak-admin member-status
============================ Membership =============================
Status Ring Pending Node
---------------------------------------------------------------------
valid 100.0% -- 'riak@172.17.5.248'
---------------------------------------------------------------------
Valid:1 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
</code></pre>
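<p>The node names can also be pulled out of the <code>member-status</code> output programmatically instead of being copied by hand. The sketch below relies only on the output format shown above, where each node name appears in single quotes; the helper names are hypothetical.</p>

```shell
# Read `riak-admin member-status` output on stdin and print each
# node name (e.g. riak@172.17.5.247), one per line.
member_nodes() {
  # node names appear in single quotes at the end of each membership row
  sed -n "s/.*'\([^']*@[^']*\)'.*/\1/p"
}

# Strip everything up to and including the "@" to leave just the address part.
node_ip() {
  printf '%s\n' "${1#*@}"
}
```

<p>With these in place, <code>docker exec riak1 riak-admin member-status | member_nodes</code> would print the node name of <strong>riak1</strong>, and <code>node_ip</code> reduces that name to the container's IP address.</p>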
<p>Noting the IP addresses (for <strong>riak1</strong>: 172.17.5.247 and for <strong>riak2</strong>: 172.17.5.248), we can proceed to join the <strong>riak2</strong> instance to the <strong>riak1</strong> instance. To do so, we will run three <code>riak-admin</code> commands: <code>cluster join</code>, <code>cluster plan</code> and <code>cluster commit</code>.</p>
<p>The <code>riak-admin cluster join</code> command stages a request for <strong>riak2</strong> to join <strong>riak1</strong>. Nothing changes in the cluster until the staged request is committed.</p>
<pre><code>$ docker exec riak2 riak-admin cluster join riak@172.17.5.247
Success: staged join request for 'riak@172.17.5.248' to 'riak@172.17.5.247'
</code></pre>
<p>The <code>riak-admin cluster plan</code> command shows the staged changes and how ring ownership will look once they are applied.</p>
<pre><code>$ docker exec riak2 riak-admin cluster plan
========================== Staged Changes ===========================
Action Details(s)
---------------------------------------------------------------------
join 'riak@172.17.5.248'
---------------------------------------------------------------------

NOTE: Applying these changes will result in 1 cluster transition

###################################################################
After cluster transition 1/1
###################################################################

============================ Membership =============================
Status Ring Pending Node
---------------------------------------------------------------------
valid 100.0% 50.0% 'riak@172.17.5.247'
valid 0.0% 50.0% 'riak@172.17.5.248'
---------------------------------------------------------------------
Valid:2 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

WARNING: Not all replicas will be on distinct nodes

Transfers resulting from cluster changes: 32
32 transfers from 'riak@172.17.5.247' to 'riak@172.17.5.248'
</code></pre>
<p>And finally, <code>riak-admin cluster commit</code> applies the staged changes.</p>
<pre><code>$ docker exec riak2 riak-admin cluster commit
Cluster changes committed
</code></pre>
<p>Once you see this message, the cluster transition begins: <strong>riak1</strong> starts handing off to <strong>riak2</strong> the partitions that the plan assigned to it (32 transfers in the plan shown above).</p>
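<p>The three steps can be wrapped in a single helper so that a failed join or plan never reaches the commit. This is a minimal sketch using only the <code>docker exec</code> and <code>riak-admin cluster</code> commands shown above; the function name is our own.</p>

```shell
# Join the Riak node running in CONTAINER to the cluster that TARGET_NODE belongs to.
# Stops at the first failing step so that a bad join or plan is never committed.
# Usage: join_cluster CONTAINER TARGET_NODE
join_cluster() {
  local container="$1"
  local target_node="$2"
  docker exec "$container" riak-admin cluster join "$target_node" || return 1
  docker exec "$container" riak-admin cluster plan || return 1
  docker exec "$container" riak-admin cluster commit
}
```

<p>The sequence from this section then becomes <code>join_cluster riak2 riak@172.17.5.247</code>.</p>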
<h3 id="confirmingthedatastoresareclusteredcorrectly">Confirming the data stores are clustered correctly</h3>
<p>Now we can check the cluster status. If we run <code>member-status</code> immediately after <code>riak-admin cluster commit</code>, we will see the membership ring in this state:</p>
<pre><code>$ docker exec riak2 riak-admin member-status
=========================== Membership ============================
Status Ring Pending Node
---------------------------------------------------------------------
valid 100.0% 50.0% 'riak@172.17.5.247'
valid 0.0% 50.0% 'riak@172.17.5.248'
---------------------------------------------------------------------
</code></pre>
<h3 id="afterdistributiontime">After distribution time</h3>
<p>Since <strong>riak1</strong> was populated with only our test entries, the distribution won't take long. Once it is finished, the clustering is complete and you will see:</p>
<pre><code>$ docker exec riak2 riak-admin member-status
============================ Membership =============================
Status Ring Pending Node
---------------------------------------------------------------------
valid 50.0% -- 'riak@172.17.5.247'
valid 50.0% -- 'riak@172.17.5.248'
---------------------------------------------------------------------
Valid:2 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
</code></pre>
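<p>The difference between the two listings is the Pending column: it shows a percentage while partitions are still being handed off and <code>--</code> once the ring has settled. As a sketch, a script could wait for that condition by parsing the same output. The helper name is hypothetical, and the parsing assumes the four-column row layout shown above.</p>

```shell
# Read `riak-admin member-status` output on stdin.
# Exit 0 if every node row shows "--" in the Pending column, 1 otherwise
# (including when no node rows are found at all).
ring_settled() {
  awk '
    # node rows have four fields: Status, Ring, Pending, Node (name contains "@")
    NF == 4 && $4 ~ /@/ { rows++; if ($3 != "--") pending++ }
    END { exit (rows > 0 && pending == 0) ? 0 : 1 }
  '
}
```

<p>A wait loop could then be written as <code>until docker exec riak2 riak-admin member-status | ring_settled; do sleep 5; done</code>.</p>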
<h3 id="checkingcontentsonriak2">Checking contents on riak2</h3>
<p>Now that the distribution has completed, listing the buckets through <strong>riak2</strong> will show the dataset we originally wrote to <strong>riak1</strong>, since both nodes now belong to the same cluster.</p>
<pre><code># querying the buckets (now) on riak2
$ curl -s 'localhost:28098/buckets?buckets=true' | jq .
{
"buckets": [
"testbucket"
]
}
</code></pre>
<p>And querying the <code>testbucket</code> shows our keys, as expected:</p>
<pre><code># querying the keys in testbucket (now) on riak2
$ curl -s 'localhost:28098/buckets/testbucket/keys?keys=true' | jq .
{
"keys": [
"testkey-3",
"testkey-4",
"testkey-2",
"testkey-5",
"testkey-1"
]
}
</code></pre>
<p>And of course, querying one of these keys, we get:</p>
<pre><code># querying the key (now) on riak2
$ curl -s localhost:28098/buckets/testbucket/keys/testkey-5
content for testkey-5
</code></pre>
<p>Note that the results from <strong>riak2</strong> are the same as those from <strong>riak1</strong>. This is a basic example of how Riak clustering works: once the nodes are joined, data written through one node can be read through the other.</p>
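<p>Riak returns keys in no particular order (the two key listings above contain the same keys in different sequences), so a line-by-line comparison of the raw responses would report a false mismatch. A sketch of an order-independent check, using the same <code>curl</code> and <code>jq</code> tools as above, might look like this; the function names are ours.</p>

```shell
# Print the keys of BUCKET as seen through the node on PORT, sorted, one per line.
sorted_keys() {
  local port="$1"
  local bucket="$2"
  curl -s "localhost:${port}/buckets/${bucket}/keys?keys=true" | jq -r '.keys[]' | sort
}

# Succeed if both nodes report the same non-empty set of keys for BUCKET.
# An empty listing counts as a failure, so two unreachable nodes do not "match".
# Usage: same_keys BUCKET PORT_A PORT_B
same_keys() {
  local bucket="$1"
  local a b
  a=$(sorted_keys "$2" "$bucket")
  b=$(sorted_keys "$3" "$bucket")
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

<p>For the two instances in this walkthrough, the check would be <code>same_keys testbucket 18098 28098</code>.</p>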
<h2 id="encapsulatingtheriakclusterasawotiodataservice">Encapsulating the Riak cluster as a wot.io data service</h2>
<p>Now that we have the ability to instantiate two separate Riak instances as Docker containers and join them together into a single logical cluster, we have all the ingredients for a wot.io data service.</p>
<p>We would simply need to modify the Dockerfile recipe so that the <code>riak-admin cluster join</code>, <code>cluster plan</code> and <code>cluster commit</code> commands are run when each container starts up. While this na&iuml;ve mechanism works, it suffers from a couple of drawbacks:</p>
<ul>
<li>Each cluster node would require its own Docker image because the startup commands are different (i.e., one node's commands have <strong>riak1</strong> as "source" and <strong>riak2</strong> as "target", while the other node's commands are reversed).</li>
<li>The IP addresses of the Riak nodes are hard-coded, dramatically reducing the portability and deployability of our data service.</li>
</ul>
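<p>One common way to sidestep both drawbacks, offered here only as an illustration and not as a description of how wot.io does it, is to keep a single image and pass the node to join in through the environment when the container is started. In the sketch below, <code>RIAK_SEED_NODE</code>, <code>RIAK_NODE_NAME</code> and <code>RIAK_ADMIN</code> are hypothetical variables; they are not provided by the <code>devdb/riak</code> image.</p>

```shell
# ASSUMPTION: RIAK_SEED_NODE and RIAK_NODE_NAME are hypothetical environment
# variables supplied at container start; they are not part of devdb/riak.
# RIAK_ADMIN lets the admin command be swapped out, e.g. for a dry run.
RIAK_ADMIN="${RIAK_ADMIN:-riak-admin}"

# Join the seed node's cluster unless no seed is configured or we are the seed.
maybe_join_seed() {
  if [ -z "${RIAK_SEED_NODE:-}" ] || [ "${RIAK_SEED_NODE}" = "${RIAK_NODE_NAME:-}" ]; then
    echo "starting as a standalone or seed node"
    return 0
  fi
  # Same three steps as in the manual walkthrough, stopping at the first failure
  "$RIAK_ADMIN" cluster join "$RIAK_SEED_NODE" || return 1
  "$RIAK_ADMIN" cluster plan || return 1
  "$RIAK_ADMIN" cluster commit
}
```

<p>Every container would run the same start-up script, and only the values passed with <code>docker run -e</code> would differ, so no addresses are baked into the image. Setting <code>RIAK_ADMIN=echo</code> prints the commands instead of running them.</p>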
<p>There are other details in making a data service production ready. For example, a production data service would probably want to expose decisions like cluster size as configuration parameters. wot.io addresses these concerns with our declarative Configuration Service for orchestrating containers as data services across our operating environment. To complete the system, we would also add adapters to allow any other wot.io service to communicate with the Riak service, sending or querying data. But these are other topics for another day.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Today we've taken a brief peek under the hood of creating a wot.io data service. Thankfully, most customers would never encounter any of the complexities described in this post, because wot.io or one of its <a href="http://www.wot.io/partners/">partners</a> has already done all the heavy lifting.</p>
<p>If you are interested in making <em>your</em> data service available on the wot.io data service exchange, check out our <a href="http://wot.io/wot-io-partner-programs">partner program</a> and we'll help get you connected to the Internet of Things through an interoperable collection of device management platforms and data services.</p>