Machine learning with the Data Service Exchange: Training multiple models in parallel (part 3)

December 17, 2015 / Posted By: wotio team

In the past two parts, we constructed a Docker container suitable for deploying within the context of the Data Service Exchange, and then published the MNIST training set over the Data Bus. In the third part, we will retrofit a model to accept the training data from the Data Bus. The basic architecture we would like for our data flow looks like this:

We will load the MNIST training data via the publisher we created in the last part, and send it to a mnist bus resource. Then, using bindings, we will copy that data to both the model1 and model2 resources, from which our two models will fetch their training data. The two model programs will consume the training data and print out estimates of their accuracy based on the aggregated training set.

This architecture allows us to add additional models or swap out our training data set, without having to fiddle with resource management. As the models can be deployed in their own containers, we can evaluate the effectiveness of each model in parallel, and discard any that we are unhappy with. When dealing with large amounts of training data, this distribution methodology can be especially handy when we are uncertain of the degree to which the choice of training data and test suite influences the apparent fitness of the model.

The publisher code creates the mnist resource. The branch connecting mnist to model1 is created programmatically by the model1 code itself.

Creating the binding and then consuming from the bound resource sets up the remainder of the top branch in the diagram. A similar bit of code in our second model sets up the branch for model2.
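To make the fan-out concrete, here is a minimal in-memory sketch of the topology described above. It is not the Data Bus API: the ToyBus class and its create_resource and create_binding methods are hypothetical stand-ins, and the signatures given to write_resource and consume_resource are assumptions made for illustration only.

```python
from collections import defaultdict, deque


class ToyBus:
    """In-memory stand-in for a message bus (hypothetical, not the Data Bus API)."""

    def __init__(self):
        self.queues = {}                   # resource name -> pending messages
        self.bindings = defaultdict(list)  # source resource -> bound targets

    def create_resource(self, name):
        # Idempotent: asserting a resource that already exists changes nothing.
        self.queues.setdefault(name, deque())

    def create_binding(self, source, target):
        # Idempotent: make sure both ends exist, then link them at most once.
        self.create_resource(source)
        self.create_resource(target)
        if target not in self.bindings[source]:
            self.bindings[source].append(target)

    def write_resource(self, name, message):
        # Copy the message to every bound resource; with no bindings,
        # hold it on the resource itself.
        self.create_resource(name)
        targets = self.bindings[name]
        if not targets:
            self.queues[name].append(message)
        for target in targets:
            self.queues[target].append(message)

    def consume_resource(self, name, callback):
        # A real consumer would block forever; this toy drains and returns.
        self.create_resource(name)
        queue = self.queues[name]
        handled = 0
        while queue:
            callback(queue.popleft())
            handled += 1
        return handled


bus = ToyBus()
bus.create_resource("mnist")            # done by the publisher
bus.create_binding("mnist", "model1")   # done by the first model at startup
bus.create_binding("mnist", "model2")   # done by the second model at startup
bus.create_binding("mnist", "model1")   # a restart re-asserts it harmlessly

messages = [{"label": 7}, {"label": 2}, {"label": 1}]
for message in messages:
    bus.write_resource("mnist", message)

seen1, seen2 = [], []
bus.consume_resource("model1", seen1.append)
bus.consume_resource("model2", seen2.append)
```

Because each program only asserts the pieces it needs, the publisher and the models can start in any order, and a third model could be added by asserting one more binding.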

As the Data Bus uses software-defined routing, the code ensures that this topology exists when the programs start up. Because the programs only assert the existence of the resources and bindings, the under-the-hood configuration can abstract away the scaling of the underlying system.

In the consume_resource event, we invoke a train callback which runs the model against the training data. For each of the models, the training code is largely the same.


The behavior of each is as follows:

  • receive an image and its label from the bus
  • convert the image into a flattened array of 28*28 = 784 32-bit floating-point values
  • scale the 8-bit pixel values to the range [0, 1]
  • convert the label to a one hot vector
  • save the image data and label vector for future accuracy testing
  • run a single training step on the image
  • every 100 iterations, test the accuracy of the model and print it to stdout
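The steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the models' actual code: MajorityModel is a hypothetical stand-in for the TensorFlow models, and preprocess, make_train_callback, and their parameters are names invented for this example.

```python
from array import array

NUM_PIXELS = 28 * 28  # flattened MNIST image size
NUM_CLASSES = 10      # digits 0-9


def preprocess(pixels, label):
    """Turn raw 8-bit pixels and an integer label into model inputs."""
    if len(pixels) != NUM_PIXELS:
        raise ValueError("expected %d pixels, got %d" % (NUM_PIXELS, len(pixels)))
    # array("f") stores C floats (32-bit on common platforms);
    # dividing by 255 maps 0..255 onto [0, 1].
    image = array("f", (p / 255.0 for p in pixels))
    one_hot = [0.0] * NUM_CLASSES
    one_hot[label] = 1.0
    return image, one_hot


class MajorityModel:
    """Hypothetical stand-in: always predicts the most frequent label seen."""

    def __init__(self):
        self.counts = [0] * NUM_CLASSES

    def train_step(self, image, one_hot):
        self.counts[one_hot.index(1.0)] += 1

    def predict(self, image):
        return self.counts.index(max(self.counts))


def make_train_callback(model, report_every=100, report=print):
    images, labels = [], []  # everything seen so far, kept for accuracy checks

    def train(pixels, label):
        image, one_hot = preprocess(pixels, label)
        images.append(image)
        labels.append(one_hot)
        model.train_step(image, one_hot)  # a single training step
        if len(images) % report_every == 0:
            correct = sum(
                model.predict(x) == y.index(1.0) for x, y in zip(images, labels)
            )
            report(correct / len(images))

    return train


# Usage with two synthetic images, reporting every 2 iterations.
reports = []
train = make_train_callback(MajorityModel(), report_every=2, report=reports.append)
train(bytes(NUM_PIXELS), 3)                # all-black image labelled 3
train(bytes([0, 255]) * (NUM_PIXELS // 2), 3)  # striped image labelled 3
```

Passing the report function in as a parameter keeps the callback independent of where the accuracy ends up, whether that is stdout or somewhere else.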

From outside the Docker container, we can inspect the output of the model by invoking the docker logs command on the model's container. As long as the CMD instruction in the Dockerfile runs the model script in the foreground, in the form

CMD python <model script>

all of the output is directed to stdout. As the model programs work as Data Bus consumers and never exit, these commands are properly daemonized from a Docker perspective and do not need to be run in a background process.

We could further modify these models to publish their results to another bus resource or two by adding a write_resource call to the training callback, making the accuracy data available for further storage and analysis. The code for doing so would mirror the code used to publish the mnist data to the bus in the first place. This accuracy data could then be stored in a search engine, database, or other analytics platform for future study and review. This capability makes it easy to run many different models against each other and build up a catalog of results to help guide the further development of our machine learning algorithms.
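As a rough sketch of what publishing the results might look like, the helper below wraps each accuracy reading in a JSON message and hands it to a write function. Everything here is an assumption made for illustration: the "accuracy" resource name, the message fields, and the (resource, message) signature given to write_resource are hypothetical, and a plain list stands in for the bus.

```python
import json
import time


def make_accuracy_reporter(write_resource, model_name, resource="accuracy"):
    """Return a function that publishes accuracy readings as JSON messages.

    write_resource is any callable taking (resource, message); its signature
    is an assumption for this sketch, not the Data Bus API.
    """

    def report(iteration, accuracy):
        message = json.dumps({
            "model": model_name,
            "iteration": iteration,
            "accuracy": accuracy,
            "timestamp": time.time(),
        })
        write_resource(resource, message)
        return message

    return report


# Usage: collect what would have been sent to the bus in a list.
sent = []
report = make_accuracy_reporter(lambda res, msg: sent.append((res, msg)), "model1")
report(100, 0.91)
report(200, 0.93)
```

Tagging each message with the model name and iteration lets a downstream store compare several models' accuracy curves on the same axis.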

All of the source code for these models is available on GitHub and corresponds to the TensorFlow tutorials.