
Machine learning with the wot.io Data Service Exchange (part 1)

Dec 2015 / Posted By: wotio team

<p>One of the early inspirations for the wot.io Data Service Exchange was the need to deploy and evaluate multiple machine learning models against real time data sets. As machine learning techniques transition from an academic realm to enterprise deployments, the realities of operational costs tend to inform design decisions more than anything else, with the key forcing function becoming percentage accuracy per dollar. With this constraint in place, the choice of model often becomes a search for one that is "good enough", or which model provides adequate accuracy for minimal operational cost.</p>
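<p>To make the "accuracy per dollar" criterion concrete, here is a toy sketch in Python. The model names, accuracy figures, and costs are all hypothetical; the point is only to show how a "good enough" model falls out of the constraint:</p>

```python
# Toy illustration (all numbers are hypothetical): rank candidate models
# by the "accuracy per dollar" criterion described above.
models = [
    {"name": "cnn_large", "accuracy": 0.99, "monthly_cost": 900.0},
    {"name": "cnn_small", "accuracy": 0.97, "monthly_cost": 300.0},
    {"name": "softmax",   "accuracy": 0.92, "monthly_cost": 40.0},
]

def accuracy_per_dollar(model):
    """Percentage accuracy per dollar of monthly operational cost."""
    return model["accuracy"] / model["monthly_cost"]

# A "good enough" model: the cheapest one meeting a minimum accuracy bar.
good_enough = min((m for m in models if m["accuracy"] >= 0.95),
                  key=lambda m: m["monthly_cost"])
print(good_enough["name"])
```

<p>Under this framing the most accurate model is rarely the one deployed; the cheapest model clearing the accuracy bar wins.</p>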
<p>To make this concept more concrete, we can build a simple distributed OCR system using the wot.io data bus to transmit both training and production data. The wot.io Data Service Exchange currently provides access to services like facial and logo detection through <a href="http://datascription.com">Datascription</a>, and visual object recognition and search through <a href="http://nervve.com">Nervve</a>. But for this demonstration, we will connect a demo application written using <a href="http://www.tensorflow.org">Google's TensorFlow</a> machine learning library. This will allow us to demonstrate how to build and deploy a machine learning application into the wot.io Data Service Exchange. As <a href="http://www.tensorflow.org">TensorFlow</a> is released under the Apache 2 license, we will also be able to share the code for the different models we will be testing.</p>
<h1 id="gettingstartedwithpython">Getting Started With Python</h1>
<p>The wot.io Data Service Exchange supports a wide range of languages and protocol bindings. Currently, we have library support for JavaScript, Erlang, Python, Java, C/C++, and Perl. Since <a href="http://www.tensorflow.org">TensorFlow</a> is written in Python, our demo application will use the <a href="https://github.com/wotio/wot-python">wot-python</a> bindings. These bindings interface with the AMQP protocol adapter for the wot.io data bus, and map the data bus's resource model onto the AMQP interface. To install the bindings, we'll first create a <a href="https://virtualenv.readthedocs.org/en/latest/">virtualenv</a> environment in which we'll install our dependencies:</p>
<h3 id="linux">Linux</h3>
<ul>
<li>virtualenv wotocrdemo</li>
<li>source wotocrdemo/bin/activate</li>
<li>pip install <a href="https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.5.0-cp27-none-linux_x86_64.whl">https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.5.0-cp27-none-linux_x86_64.whl</a></li>
<li>git clone <a href="https://github.com/wotio/wot-python">https://github.com/wotio/wot-python</a></li>
<li>cd wot-python</li>
<li>python setup.py install</li>
</ul>
<h3 id="macosx">Mac OS X</h3>
<ul>
<li>virtualenv wotocrdemo</li>
<li>source wotocrdemo/bin/activate</li>
<li>pip install <a href="https://storage.googleapis.com/tensorflow/mac/tensorflow-0.5.0-py2-none-any.whl">https://storage.googleapis.com/tensorflow/mac/tensorflow-0.5.0-py2-none-any.whl</a></li>
<li>git clone <a href="https://github.com/wotio/wot-python">https://github.com/wotio/wot-python</a></li>
<li>cd wot-python</li>
<li>python setup.py install</li>
</ul>
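<p>Once the install steps finish, a quick sanity check confirms both dependencies import cleanly. The module names here are assumptions: <code>tensorflow</code> for TensorFlow and <code>wot</code> for the wot-python bindings:</p>

```python
# Quick sanity check that the demo's dependencies installed into the
# virtualenv. Module names are assumptions: "tensorflow" for TensorFlow,
# "wot" for the wot-python bindings.
import importlib

status = {}
for name in ("tensorflow", "wot"):
    try:
        importlib.import_module(name)
        status[name] = "ok"
    except ImportError:
        status[name] = "missing"

print(status)
```

<p>If either package reports <code>missing</code>, re-run the corresponding install step before continuing.</p>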
<p>This will create a virtualenv environment containing TensorFlow and the wot-python bindings for local development. While this can be useful for testing, for the production deployment we will use a <a href="https://docker.io">Docker</a> container. The wot.io Data Service Exchange can deploy Docker containers and manage their configuration across data centers and cloud environments. As the wot.io Data Service Exchange has been deployed in <a href="http://www.rackspace.com">Rackspace</a>, <a href="http://www.softlayer.com">IBM SoftLayer</a>, and <a href="https://azure.microsoft.com">Microsoft Azure</a>, it is useful to be able to produce a production software artifact that works across platforms.</p>
<h3 id="creatingadockerfile">Creating a Dockerfile</h3>
<p>We will use the Linux version as the basis for our production application's Docker container. To start, we'll base our Dockerfile on the sample code we make available for system integrators: <a href="https://github.com/wotio/docker-example">https://github.com/wotio/docker-example</a>. To build this locally, it is often useful to use <a href="https://www.virtualbox.org">VirtualBox</a>, <a href="http://docs.docker.com/engine/installation/#installation">Docker</a> and <a href="https://docs.docker.com/machine/">Docker Machine</a> to create a Docker development environment. If you are using Boot2Docker on Mac OS X, you will need to tell Docker Machine to grow the memory allocation for the VM itself:</p>
<p><code>docker-machine create wotio -d virtualbox --virtualbox-memory "12288"</code></p>
<p>The default 1GB isn't large enough to compile parts of TensorFlow with LLVM; I had success with 12GB, YMMV. Once you have each of these installed for your platform, you can download our sample build environment:</p>
<ul>
<li>git clone <a href="https://github.com/wotio/docker-example">https://github.com/wotio/docker-example</a></li>
<li>cd docker-example</li>
<li>docker-machine create wotio -d virtualbox --virtualbox-memory "12288"</li>
<li>eval $(docker-machine env wotio)</li>
<li>make tensorflow</li>
</ul>
<p>This will build a sequence of Docker container images; it is on top of the wotio/python image that we will install <a href="http://tensorflow.org">TensorFlow</a>. At this point you'll have a working container suitable for deploying into the wot.io Data Service Exchange. In the next blog post we'll build a sample model based on the MNIST data set and train multiple instances using the wot.io Data Bus.</p>
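<p>As a rough preview of what the final layer might look like, here is a hypothetical Dockerfile sketch. The <code>wotio/python</code> base image comes from the docker-example build above, the wheel URL matches the Linux install step, and <code>ocr_demo.py</code> is a placeholder name for our demo application:</p>

```dockerfile
# Hypothetical sketch: layer TensorFlow and our demo onto the wotio/python base image
FROM wotio/python

# Install the same CPU-only TensorFlow wheel used in the Linux steps above
RUN pip install https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.5.0-cp27-none-linux_x86_64.whl

# Install the wot-python bindings for the wot.io data bus
RUN git clone https://github.com/wotio/wot-python.git /opt/wot-python \
    && cd /opt/wot-python \
    && python setup.py install

# ocr_demo.py is a placeholder for the demo application covered in part 2
ADD ocr_demo.py /app/ocr_demo.py
WORKDIR /app
CMD ["python", "ocr_demo.py"]
```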