Deploy Your Machine Learning Models to Production with Kubernetes

You’re an AI expert. A deep learning Ninja. A master of machine learning. You’ve just completed another iteration of training your awesome model. This new model is the most accurate you have ever created, and it’s guaranteed to bring a lot of value to your company.

But...

You hit a roadblock that holds back your model's potential. You have full control of the model throughout the process: you can train it, tweak it, and even verify it against the test set. But time and time again, you reach the point where your model is ready for production and your progress grinds to a halt. You need to talk to DevOps, who likely have a long list of tasks that take priority over your model. You patiently wait your turn, until you become unbearably restless in your spinning chair. You have every right to be restless. You know that your model has the potential to produce record-breaking results for your company. Why waste any more time?

There is another way...

Publish your models on Kubernetes. Kubernetes is quickly becoming the cloud standard. Once you know how to deploy your model on Kubernetes, you can do it on any cloud provider (Google Cloud or AWS, for example).

How to deploy models to production using Kubernetes

You’ll never believe how simple deploying models can be. All you need is to wrap your code a little bit. Soon you’ll be able to build and control your machine learning models from research to production. Here’s how:

Layer 1 - your predict code

Since you have already trained your model, you already have predict code. The predict code takes a single sample, feeds it to the model and returns a prediction.

Below is sample code that takes a sentence as input and returns a number that represents the sentence sentiment as predicted by the model. In this example, the IMDB dataset was used to train a model to predict the sentiment of a sentence.

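import keras
import numpy as np

# Load the trained sentiment model once, at import time.
model = keras.models.load_model("./sentiment2.model.h5")

def predict(sentence):
    # encode_sentence (not shown here) is expected to turn the sentence into a list of word indices.
    encoded = encode_sentence(sentence)
    pred = np.array([encoded])
    pred = vectorize_sequences(pred)
    a = model.predict(pred)
    # Convert the NumPy scalar to a plain float so it can be serialized to JSON.
    return float(a[0][0])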

def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results
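
To get a feel for what vectorize_sequences produces, here is a small sketch that runs it on a made-up encoded sentence. The word indices and the tiny dimension of 6 are assumptions chosen only to keep the output readable; the real model uses a 10,000-word vocabulary.

```python
import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    # One row per sequence, one column per word index in the vocabulary.
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        # Put a 1.0 at every word index that occurs in this sequence.
        results[i, sequence] = 1.0
    return results

# Hypothetical encoded sentence: word index 1, then word index 3 twice.
encoded = [1, 3, 3]
vector = vectorize_sequences([encoded], dimension=6)
print(vector)  # [[0. 1. 0. 1. 0. 0.]]
```

Notice that the repeated index 3 still yields a single 1.0: this encoding records which words appear, not how often or in what order.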

*Tip: To make deploying even easier, make sure to track all of your code dependencies in a requirements file.

Layer 2 - Flask server

After we have a working example of the predict code, we need to start speaking HTTP instead of Python.

The way to achieve this is to spawn a Flask server that accepts the input in the JSON body of its requests and returns the model's prediction in its responses.

from flask import Flask, request, jsonify
import predict

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def run():
    # Parse the request body as JSON, even if the Content-Type header is missing.
    data = request.get_json(force=True)
    input_params = data['input']
    result = predict.predict(input_params)
    return jsonify({'prediction': result})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)

In this small snippet we import Flask and define the route it should listen on. When a request is sent to the /predict route, the server takes the input value from the request body and passes it to the predict function we wrote in the first layer. The function's return value is sent back to the client in the HTTP response.
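
The route above assumes every request carries a JSON object with an input field. As a minimal sketch of how you might guard against malformed requests, the hypothetical helper below (extract_sentence is not part of the original code) checks the parsed body before it reaches the model. It only depends on the standard library, so you could call it from run() and translate the ValueError into an HTTP 400 response.

```python
def extract_sentence(payload):
    """Return the sentence stored under the 'input' key, or raise ValueError."""
    if not isinstance(payload, dict) or "input" not in payload:
        raise ValueError("request body must be a JSON object with an 'input' field")
    sentence = payload["input"]
    if not isinstance(sentence, str) or not sentence.strip():
        raise ValueError("'input' must be a non-empty string")
    return sentence

# A well-formed body passes through unchanged.
print(extract_sentence({"input": "I loved this movie"}))
```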

Layer 3 - Kubernetes deployment

And now, on to the final layer! Using Kubernetes, we can declare our deployment in a YAML file. This methodology is called infrastructure as code, and it lets us define the image, the commands we want to run and the exposed port in a single text file.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: predict-imdb
spec:
  replicas: 1
  # apps/v1 requires a selector that matches the pod template labels.
  selector:
    matchLabels:
      app: imdb-server
  template:
    metadata:
      labels:
        app: imdb-server
    spec:
      containers:
      - name: app
        image: tensorflow/tensorflow:latest-devel-py3
        command: ["/bin/sh", "-c"]
        args:
         - git clone https://github.com/itayariel/imdb_keras;
           cd imdb_keras;
           pip install -r requirements.txt;
           python server.py;
        ports:
        - containerPort: 8080

You can see in the file that we declared a Deployment with a single replica. Its container uses the TensorFlow Docker image and runs a set of four commands to start the server.

These commands clone the code from GitHub, change into the repository directory, install the requirements and spin up the Flask server we wrote.

*Note: feel free to change the clone command to suit your needs.

Additionally, it's important to add a Service that exposes the deployment outside of the Kubernetes cluster. Be sure to check your cluster networking settings with your cloud provider.

apiVersion: v1
kind: Service
metadata:
  name: predict-imdb-service
  labels:
    app: imdb-server
spec:
  ports:
    - port: 8080
  selector:
    app: imdb-server
  type: NodePort
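
A Service only forwards traffic to pods whose labels contain every key/value pair in its selector, so the app: imdb-server selector above only works if the pods created by the Deployment carry that label. The sketch below mimics that matching rule with plain dictionaries. The function name and the sample label sets are illustrative assumptions, not a Kubernetes API.

```python
def selector_matches(selector, pod_labels):
    """True when every key/value pair in the selector appears in the pod labels."""
    return all(pod_labels.get(key) == value for key, value in selector.items())

# Selector taken from the Service manifest.
service_selector = {"app": "imdb-server"}

# Hypothetical pods: one labelled like the Service expects, one with no labels.
labelled_pod = {"app": "imdb-server"}
unlabelled_pod = {}

print(selector_matches(service_selector, labelled_pod))    # True
print(selector_matches(service_selector, unlabelled_pod))  # False
```

If requests to the service never reach your server, a mismatch between these labels is one of the first things worth checking.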

Send it to the cloud

Now that we have all the files set, it's time to send the code to the cloud.

Assuming you have a running Kubernetes cluster and its kubeconfig file, run the following commands:

kubectl apply -f deployment.yml

This command will create our deployment on the cluster.

kubectl apply -f service.yml

This command creates a Service that exposes the endpoint to the world. In this example, a NodePort service was used, meaning the service is attached to a port on each of the cluster nodes.

Use the command `kubectl get services` to find the node port assigned to the service (`kubectl get nodes -o wide` lists the node IPs). Now the model can be called using HTTP with the following curl command:

curl http://node-ip:node-port/predict \
-H 'Content-Type: application/json' \
-d '{"input": "I loved this video. Like, love, amazing!!"}'
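
If you would rather call the model from Python than from curl, here is a minimal sketch using only the standard library. It sends the sentence under the input key, which is the key the Flask route reads. The function names and the timeout are assumptions made for this example, and the base URL is a placeholder for your own node IP and port.

```python
import json
import urllib.request

def build_predict_request(base_url, sentence):
    # Encode the sentence as the JSON body the /predict route expects.
    body = json.dumps({"input": sentence}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def predict_remote(base_url, sentence, timeout=10):
    # Send the request and return the value stored under "prediction".
    request = build_predict_request(base_url, sentence)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["prediction"]

# Example call (replace the placeholder with your node IP and node port):
# score = predict_remote("http://node-ip:node-port", "I loved this video")
```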


Wrapping it up - It’s Aliiiive!

Easy, huh? Now you know how to publish models to the internet using Kubernetes, and with just a few lines of code. It actually gets even easier.

cnvrg.io model deployment

cnvrg.io provides an end-to-end platform that allows data scientists to manage, build and automate machine learning from research to production. One of the core features of cnvrg.io is the automation of model deployment. With just a single click, a data scientist can create a production-ready environment that can serve millions of requests to their model.

For every deployment environment, cnvrg.io will set up a Kubernetes cluster with all the tools integrated to help you monitor your models in real time (Prometheus, Grafana). It will track both system-level metrics and the health of your machine learning model. That way you can keep track of prediction confidence, input/output and basically any parameter you'd like.

Additionally, the cnvrg.io platform has integrated Istio for advanced A/B testing functionalities, webhooks, alerts and more. It’s so easy to use you’ll be surprised this solution wasn’t in your life earlier.

cnvrg.io user interface: model A/B testing

So. Go on. Take your own models and deploy away!

You can follow the full example and code from above here. You can also join our Kubernetes Workshop and learn how to set up Kubernetes for your machine learning workflows.