Machine learning with the Data Service Exchange (part 1)

December 7, 2015 / Posted By: wotio team

One of the early inspirations for the Data Service Exchange was the need to deploy and evaluate multiple machine learning models against real-time data sets. As machine learning techniques transition from the academic realm to enterprise deployments, the realities of operational cost tend to inform design decisions more than anything else, and the key forcing function becomes percentage accuracy per dollar. With this constraint in place, the choice of model often becomes a search for one that is "good enough": the model that provides adequate accuracy for minimal operational cost.
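To illustrate the "good enough" search, here is a minimal Python sketch. The candidate names, accuracies, and hourly costs are made-up placeholders rather than measurements, and the `Candidate` class and both helper functions are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    accuracy: float       # fraction correct on held-out data, 0.0 to 1.0
    cost_per_hour: float  # operational cost in dollars (placeholder figure)


def accuracy_per_dollar(c: Candidate) -> float:
    """Percentage accuracy bought by each dollar of hourly operating cost."""
    return 100.0 * c.accuracy / c.cost_per_hour


def cheapest_good_enough(candidates: Sequence[Candidate],
                         min_accuracy: float) -> Optional[Candidate]:
    """Return the cheapest candidate meeting the accuracy floor, or None."""
    eligible = [c for c in candidates if c.accuracy >= min_accuracy]
    return min(eligible, key=lambda c: c.cost_per_hour, default=None)


# Placeholder numbers for illustration only; these are not benchmark results.
CANDIDATES = [
    Candidate("model_a", accuracy=0.92, cost_per_hour=0.10),
    Candidate("model_b", accuracy=0.99, cost_per_hour=0.90),
    Candidate("model_c", accuracy=0.995, cost_per_hour=4.00),
]

if __name__ == "__main__":
    choice = cheapest_good_enough(CANDIDATES, min_accuracy=0.98)
    print(choice.name if choice else "no model meets the accuracy floor")
```

Raising the accuracy floor changes the answer, which is the point: the "best" model depends on how much accuracy the application actually needs, not only on which model scores highest.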

To make this concept more concrete, we can build a simple distributed OCR system that uses the data bus to transmit both training and production data. The Data Service Exchange currently provides access to services such as facial and logo detection through Datascription, and visual object recognition and search through Nervve. For this demonstration, however, we will connect a demo application written using Google's TensorFlow machine learning library. This will allow us to demonstrate how to build and deploy a machine learning application into the Data Service Exchange. As TensorFlow is released under the Apache 2 license, we will also be able to share the code for the different models we will be testing.
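As a rough picture of what "training and production data on the same bus" could look like, the sketch below uses Python's standard `queue.Queue` as a stand-in for the data bus. It does not use the wot-python bindings, and the JSON message layout (`kind`, `pixels`, `label`) is an assumption made for illustration, not the Data Service Exchange wire format.

```python
import json
import queue


def make_message(kind, pixels, label=None):
    """Encode one OCR sample as JSON (hypothetical message layout)."""
    if kind not in ("train", "predict"):
        raise ValueError("kind must be 'train' or 'predict'")
    body = {"kind": kind, "pixels": list(pixels)}
    if label is not None:
        body["label"] = label  # only training samples carry a label
    return json.dumps(body)


def drain(bus):
    """Read every pending message and split training from production data."""
    training, production = [], []
    while True:
        try:
            raw = bus.get_nowait()
        except queue.Empty:
            break
        msg = json.loads(raw)
        if msg["kind"] == "train":
            training.append(msg)
        else:
            production.append(msg)
    return training, production


if __name__ == "__main__":
    bus = queue.Queue()  # in-memory stand-in for the real data bus
    bus.put(make_message("train", [0, 1, 1, 0], label=7))
    bus.put(make_message("predict", [1, 1, 0, 0]))
    training, production = drain(bus)
    print(len(training), "training message(s),", len(production), "to classify")
```

In a real deployment the queue would be replaced by the bus bindings, but tagging each message with its purpose lets one consumer serve both the training and the prediction paths.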

Getting Started With Python

The Data Service Exchange supports a wide range of languages and protocol bindings. Currently, we have library support for JavaScript, Erlang, Python, Java, C/C++, and Perl. Since TensorFlow is used from Python, our demo application will use the wot-python bindings. These bindings interface with the AMQP protocol adapter for the data bus, and model the data bus's resource model on top of the AMQP interface. To install the bindings, we'll first create a virtualenv environment in which we'll install our dependencies:


Mac OS X

This will create a virtualenv environment containing TensorFlow and the wot-python bindings for local development. While this can be useful for testing, in production we will deploy a Docker container. The Data Service Exchange can deploy Docker containers and manage their configuration across data centers and cloud environments. As the Data Service Exchange has been deployed in Rackspace, IBM SoftLayer, and Microsoft Azure, it is useful to be able to produce a production software artifact that works across platforms.
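Before moving on, it can help to confirm that the interpreter is really running inside the virtualenv and that the expected packages are importable. The following is a small sketch using only the standard library; the helper names are hypothetical, and the import name of the wot-python bindings is not given here, so only `tensorflow` appears in the usage example.

```python
import importlib.util
import sys


def in_virtualenv():
    """Best-effort check for an isolated environment.

    Older virtualenv releases set sys.real_prefix; the stdlib venv module
    (and newer virtualenv) make sys.base_prefix differ from sys.prefix.
    """
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base != sys.prefix


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    print("inside a virtualenv:", in_virtualenv())
    # Add the bindings' import name here once you know it (not assumed).
    missing = missing_modules(["tensorflow"])
    print("missing modules:", missing if missing else "none")
```

`find_spec` only locates the module without importing it, so the check stays fast even for a package as large as TensorFlow.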

Creating a Dockerfile

We will use the Linux version as the basis for the Docker container for our production application. To start with, we'll base our Dockerfile on the sample code we make available for system integrators:

To build this locally, it is often useful to use VirtualBox, Docker, and Docker Machine to create a Docker development environment. If you are using Boot2Docker on Mac OS X, you will need to tell Docker Machine to increase the memory allocated to the VM itself:

docker-machine create -d virtualbox --virtualbox-memory "12288" wotio

The default 1GB isn't large enough to compile some of TensorFlow with LLVM; I had success with 12GB, but YMMV. Once you have each of these installed for your platform, you can download our sample build environment:

This will build a sequence of Docker container images; we will install TensorFlow on top of the wotio/python image. At this point you'll have a working container suitable for deploying into the Data Service Exchange. In the next blog post we'll build a sample model based on the MNIST data set and train multiple instances using the data bus.