Blog

The making of a Riak data service for wot.io

November 19, 2015 / riak, basho, data service exchange, docker / Posted By: wotio team

Today, we wanted to give you a peek under the hood and walk through an example of how wot.io and our partners make wot.io data services available in the wot.io data service exchange. In particular, we will show how to create a Riak data store cluster. Naturally, we will be doing this using Docker since Docker containers make up the building blocks of the wot.io operating environment™.

Riak is a key-value NoSQL data store from Basho, and it's one of the data services available in wot.io's data service exchange™. Let's see how it is done.

Initializing a Riak cluster with Docker

Often, the first step in getting a service to run as a wot.io data service is to create a Docker container for it. In this case, much of the work has already been done for us, as a suitable recipe is already available on the public Docker Hub.

For the purposes of this example, we will be instantiating two Docker containers, each with its own Riak instance. Once we confirm that they are running successfully as independents, we will join them together into a cluster.

To get started, we pull the latest Riak Docker image from the devdb/riak repository on the public Docker registry.

$ docker pull devdb/riak:latest

Pre-downloading the image into the local Docker cache this way means the docker run commands below won't have to fetch it first.

Starting up riak1 instance

Once the image has been stored in the Docker cache, we are now ready to kick off the first Riak instance.

$ mkdir ~/riak_storage; cd ~/riak_storage
$ docker -H unix:///var/run/docker.sock run --dns 8.8.8.8 --name riak1 -i -d -p 18098:8098 -p 18087:8087  -v `pwd`/data:/var/lib/riak -t devdb/riak:latest

The Riak initialization will take a few minutes to complete. Once it's finished, we will be able to check that the instance is empty using the handy HTTP REST interface that Riak exposes:

# querying data store on riak1 
$ curl -s localhost:18098/buckets?buckets=true | jq .
{
  "buckets": []      # 0 items
}

This shows that there are no buckets in the data store currently. That's ok. We'll populate the data store in a minute.

Starting up riak2 instance

Let's go ahead and instantiate the second Riak container as well

$ cd ~/riak_storage
$ docker -H unix:///var/run/docker.sock run --dns 8.8.8.8 --name riak2 -i -d -p 28098:8098 -p 28087:8087  -v `pwd`/data2:/var/lib/riak -t devdb/riak:latest

and confirm that it, too, is empty

# querying data store on riak2
$ curl -s localhost:28098/buckets?buckets=true | jq .
{
  "buckets": []      # 0 items
}
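Since the two docker run invocations above differ only in the node number, a small helper can generate them. The sketch below is a dry run that prints each command for review (or for piping to sh); it assumes the post's port convention (node N maps host ports N8098 and N8087) and a uniform data$n volume naming, whereas the commands above used data for riak1 and data2 for riak2.

```shell
#!/bin/sh
# Sketch: print the docker run command for a numbered Riak node.
# Assumes the port convention above: node N maps host ports N8098 and N8087.
riak_run_cmd() {
  n=$1
  echo "docker run --dns 8.8.8.8 --name riak$n -i -d" \
       "-p ${n}8098:8098 -p ${n}8087:8087" \
       "-v $PWD/data$n:/var/lib/riak -t devdb/riak:latest"
}

riak_run_cmd 1
riak_run_cmd 2
```

Piping a generated line to sh would start the corresponding container, e.g. riak_run_cmd 3 | sh for a third node.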

Injecting data into riak1 data store

Now that both Riak instances are up and running, we are ready to populate one of the instances with some test data. Once again, we can use the curl tool to place data on riak1 using the HTTP REST interface.

# populating with for loop
for i in $(seq 1 5); do
  curl -XPOST -d"content for testkey-${i}" \
    localhost:18098/buckets/testbucket/keys/testkey-${i} 
done

Checking contents on riak1

Now that it has some data, querying riak1 should confirm that our POSTs were successful:

# querying data store on riak1
$ curl -s localhost:18098/buckets?buckets=true | jq .
{
  "buckets": [
    "testbucket"      # 1 item
  ]
}

There is the Riak bucket named 'testbucket', created implicitly by our POSTs. Listing the keys inside 'testbucket', we see:

$ curl -s localhost:18098/buckets/testbucket/keys?keys=true | jq .
{
  "keys": [
    "testkey-1",
    "testkey-5",
    "testkey-4",
    "testkey-2",
    "testkey-3"
  ]
}      # 5 keys

Querying one particular key, we also have:

$ curl -s localhost:18098/buckets/testbucket/keys/testkey-5
content for testkey-5

Meanwhile, riak2 remains empty...

We can check that the data store on riak2 hasn't been touched.

# querying data store on riak2 again
$ curl -s localhost:28098/buckets?buckets=true | jq .
{
  "buckets": []
}

So far, the riak2 instance remains empty. In other words, we have two independent Riak data stores. But we wanted a Riak cluster...

Joining the two Riak instances into a cluster

We are now ready to join the two Riak instances, but before we do, we need to collect some information about them: the IP address of each container.

To confirm the status of the Riak instances, we can check the member-status of each instance. Conveniently, this command also tells us the container IP addresses. We can run member-status using the docker exec command for riak1:

# checking member-status on riak1
$ docker exec riak1 riak-admin member-status
============================ Membership =============================
Status     Ring    Pending    Node
---------------------------------------------------------------------
valid     100.0%      --      'riak@172.17.5.247'   # 1 result
---------------------------------------------------------------------
Valid:1 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

and again for riak2:

# checking member-status on riak2
$ docker exec riak2 riak-admin member-status
============================ Membership =============================
Status     Ring    Pending    Node
---------------------------------------------------------------------
valid     100.0%      --      'riak@172.17.5.248'   # 1 result
---------------------------------------------------------------------
Valid:1 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

Noting the IP addresses (172.17.5.247 for riak1 and 172.17.5.248 for riak2; docker inspect would report the same addresses), we can proceed to join the riak2 instance to the riak1 instance. To do so, we will run three riak-admin cluster commands: join, plan, and commit.

The cluster join command stages a request for riak2 to join riak1's cluster.

$ docker exec riak2 riak-admin cluster join riak@172.17.5.247
Success: staged join request for 'riak@172.17.5.248' to 'riak@172.17.5.247'

The cluster plan command displays the staged changes and the cluster transition they will produce.

$ docker exec riak2 riak-admin cluster plan
========================== Staged Changes ===========================
Action         Details(s)
---------------------------------------------------------------------
join           'riak@172.17.5.248'
---------------------------------------------------------------------

NOTE: Applying these changes will result in 1 cluster transition

###################################################################
                   After cluster transition 1/1
###################################################################

============================ Membership =============================
Status     Ring    Pending    Node
---------------------------------------------------------------------
valid     100.0%     50.0%    'riak@172.17.5.247'
valid       0.0%     50.0%    'riak@172.17.5.248'
---------------------------------------------------------------------
Valid:2 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

WARNING: Not all replicas will be on distinct nodes

Transfers resulting from cluster changes: 32
  32 transfers from 'riak@172.17.5.247' to 'riak@172.17.5.248'

And finally, cluster commit applies the staged changes.

$ docker exec riak2 riak-admin cluster commit
Cluster changes committed

Once you see this message, the two data stores begin forming the cluster, and their contents start to sync.
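For repeat use, the join, plan, and commit steps can be bundled into a single helper. The sketch below is a dry run: it prints the three docker exec commands for a given joining container and seed IP so they can be reviewed (or piped to sh).

```shell
#!/bin/sh
# Sketch: print the docker exec commands that join one Riak container to a
# seed node, in the order they must run: join, then plan, then commit.
cluster_join_cmds() {
  joiner=$1
  seed_ip=$2
  echo "docker exec $joiner riak-admin cluster join riak@$seed_ip"
  echo "docker exec $joiner riak-admin cluster plan"
  echo "docker exec $joiner riak-admin cluster commit"
}

cluster_join_cmds riak2 172.17.5.247
```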

Confirming the data stores are clustered correctly

Now we can check the cluster status. If we run member-status immediately after the commit, we will see the membership ring in a transitional state:

$ docker exec riak2 riak-admin member-status
=========================== Membership ============================
Status     Ring    Pending    Node
---------------------------------------------------------------------
valid     100.0%     50.0%    'riak@172.17.5.247'
valid       0.0%     50.0%    'riak@172.17.5.248'
---------------------------------------------------------------------

After the distribution completes

Since riak1 was populated with only our five test entries, the transfers won't take long. Once the distribution is finished, the clustering is complete and you will see:

$ docker exec riak2 riak-admin member-status
============================ Membership =============================
Status     Ring    Pending    Node
---------------------------------------------------------------------
valid      50.0%      --      'riak@172.17.5.247'
valid      50.0%      --      'riak@172.17.5.248'
---------------------------------------------------------------------
Valid:2 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
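Rather than re-running member-status by hand, a small loop can poll until the ring settles. The helper below is a sketch that inspects the member-status text: while hand-off is in progress, a node line shows two percentages (Ring and Pending), and once settled the Pending column is just --.

```shell
#!/bin/sh
# Sketch: succeed when a member-status listing shows a settled ring,
# i.e. no node line carries a second (Pending) percentage.
ring_settled() {
  ! echo "$1" | grep -q '%.*%'
}

# Usage against a live cluster (commented out; needs the containers above):
#   until ring_settled "$(docker exec riak2 riak-admin member-status)"; do
#     sleep 5
#   done
```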

Checking contents on riak2

Now that the distribution has completed, listing the buckets via riak2 shows the same dataset we stored through riak1.

# querying the buckets (now) on riak2
$ curl -s localhost:28098/buckets?buckets=true | jq .
{
  "buckets": [
    "testbucket"
  ]
}

And querying the testbucket shows our keys, as expected:

# querying the keys (now) on riak2
$ curl -s localhost:28098/buckets/testbucket/keys?keys=true | jq .
{
  "keys": [
    "testkey-3",
    "testkey-4",
    "testkey-2",
    "testkey-5",
    "testkey-1"
  ]
}

And of course, querying one of these keys, we get:

# querying the key (now) on riak2
$ curl -s localhost:28098/buckets/testbucket/keys/testkey-5
content for testkey-5

Note that the results from riak2 are the same as those from riak1. This is a basic example of how Riak clustering works and how a basic cluster can be used to distribute and replicate the data store.

Encapsulating the Riak cluster as a wot.io data service

Now that we have the ability to instantiate two separate Riak instances as Docker containers and join them together into a single logical cluster, we have all the ingredients for a wot.io data service.

We would simply need to modify the Dockerfile recipe so that the cluster join, plan, and commit commands are run when each container starts up. While this naïve mechanism works, it suffers from a couple of drawbacks:

  • Each cluster node would require its own Docker image because the startup commands are different (i.e., one node's commands have riak1 as "source" and riak2 as "target", while the other node's commands are reversed).
  • The IP addresses of the Riak nodes are hard coded, dramatically reducing the portability and deployability of our data service.
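One way around the first drawback is a single startup script parameterized by environment rather than a per-node image. The sketch below uses a hypothetical SEED_NODE variable and prints the commands instead of executing them: when SEED_NODE is unset the node starts standalone (the seed itself); otherwise it stages a join to the seed.

```shell
#!/bin/sh
# Dry-run sketch: one startup sequence for every node, parameterized by a
# hypothetical SEED_NODE environment variable instead of per-node images.
startup_cmds() {
  echo "riak start"
  if [ -n "$SEED_NODE" ]; then
    echo "riak-admin cluster join riak@$SEED_NODE"
    echo "riak-admin cluster plan"
    echo "riak-admin cluster commit"
  fi
}

# Seed node: no SEED_NODE, so it just starts Riak
startup_cmds
# Joining node: SEED_NODE points at the seed's address
SEED_NODE=172.17.5.247
startup_cmds
```

This still leaves the second drawback, the hard-coded address itself, which is where orchestration-supplied configuration or service discovery comes in.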

There are other details in making a data service production ready. For example, a production data service would probably want to expose decisions like cluster size as configuration parameters. wot.io addresses these concerns with our declarative Configuration Service for orchestrating containers as data services across our operating environment. To complete the system, we would also add adapters to allow any other wot.io service to communicate with the Riak service, sending or querying data. But these are other topics for another day.

Conclusion

Today we've taken a brief peek under the hood of creating a wot.io data service. Thankfully, most customers will never encounter any of the complexities described in this post, because wot.io or one of its partners has already done all the heavy lifting.

If you are interested in making your data service available on the wot.io data service exchange, check out our partner program and we'll help get you connected to the Internet of Things through an interoperable collection of device management platforms and data services.