Model Serving
Intro
After training, what do we do with our models?
Models alone don’t have much value — it’s all in how you use them. Whether that’s to drive decisions within your business, or to provide new features for your customers, the role of a serving framework is to bring your models to life.
With a model serving framework, you can:
- Interact with a model via an API. Because of this, anything that talks to your model can do so without knowing any internal details such as which tools were used to train it or what language it’s written in.
- Deploy the model in the cloud alongside other components of your applications.
- Scale the model easily to meet user demand.
For a concrete example, suppose you run an online store and you want each of your customers to see personalised product recommendations. There are lots of ways to train a model for this task, but assuming you’ve already done that part, the next challenge is getting the website talking to it.
Even though the model might be complex, a model serving framework will hide that complexity behind a simple API, so whenever we want a customer to see recommendations, all we need to do is send a query.
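As a rough sketch, that query might look something like this from the website’s backend. The endpoint URL, payload shape, and response format here are all assumptions for illustration; the real details depend on the serving framework and model in question.

```python
import requests

# Hypothetical endpoint exposed by the serving framework.
response = requests.post(
    "http://models.example.com/recommender/predict",
    json={"customer_id": 12345, "num_recommendations": 5},
)
response.raise_for_status()

# Also an assumption: the response body contains a list of product IDs.
recommendations = response.json()["recommendations"]
print(recommendations)
```

Notice that nothing in this snippet depends on how the model was trained or what it looks like inside; that’s the whole point.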
Batch vs real-time
Sometimes, you want a model to give instant results. This is the case in the product recommendation example, where we want to serve relevant suggestions to a customer while they browse a website.
In other cases, results don’t need to be instant, and the model is run on a schedule instead. Imagine we have some products whose prices get updated every week by a model trained to price them according to seasonal trends.
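A minimal sketch of that weekly job might look like the following. It assumes the model is already served behind a prediction endpoint that accepts batches; the file names, URL, and request/response shapes are all hypothetical.

```python
import csv

import requests

# Hypothetical input: one row per product to be re-priced.
with open("products.csv") as f:
    products = list(csv.DictReader(f))

# Send the whole batch to the (assumed) pricing endpoint in one request.
response = requests.post(
    "http://models.example.com/pricer/predict",
    json={"instances": products},
)
response.raise_for_status()
prices = response.json()["predictions"]

# Write the updated prices back out for downstream systems to pick up.
with open("updated_prices.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["product_id", "price"])
    for product, price in zip(products, prices):
        writer.writerow([product["product_id"], price])
```

A job like this would typically be triggered by a scheduler such as cron rather than running continuously.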
Many model serving frameworks are suited to both real-time and batch usage, but it’s important to know which approach you need before implementing model serving.
Do I need it?
While we all need to interact with our models in some way, a model serving framework isn’t the only way to do this.
The strength of model serving is that it can hide complex models behind simple APIs, making this approach a perfect fit for any application that runs in the cloud, including web applications.
But not everything is cloud-based; edge/IoT applications come to mind. Take as an example a smart camera that uses a model to detect faces. In this case the model needs to run directly on the camera’s hardware, because streaming the video to a remote server would simply be too slow.
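In that situation, instead of querying a remote API, the application loads and runs the model in-process. Here’s a rough sketch using TensorFlow Lite, one common runtime for on-device inference; the model file name and the dummy input frame are placeholders.

```python
import numpy as np
from tflite_runtime.interpreter import Interpreter

# "face_detector.tflite" is a placeholder model file.
interpreter = Interpreter(model_path="face_detector.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Placeholder for a real camera frame, shaped to match the model's input
# (real models may also expect a different dtype, e.g. uint8).
frame = np.zeros(input_details[0]["shape"], dtype=np.float32)

interpreter.set_tensor(input_details[0]["index"], frame)
interpreter.invoke()
detections = interpreter.get_tensor(output_details[0]["index"])
print(detections.shape)
```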
What are the options?
The choice of open source model serving frameworks is vast. To narrow it down a little, it’s helpful to consider a few factors:
- Machine learning library support. Any model will have been trained using an ML library such as TensorFlow, PyTorch, or scikit-learn. Some serving tools support multiple ML libraries, while others might support only TensorFlow, for example.
- How the model is packaged. A typical model is made up of the raw model assets plus a bunch of code dependencies. The serving tools in this guide all work by packaging the model and its dependencies into a Docker container (see the sketch after this list). Docker is the industry standard way to package, distribute and deploy software to modern infrastructure.
- Where the model runs. Some serving frameworks simply give you a container that you can run anywhere that supports Docker. Others are built on top of Kubernetes, which is the most popular open source solution for automating the deployment, scaling and management of containers.
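To make the packaging point concrete, here’s a sketch of the kind of small web app that typically ends up inside such a container: a model loaded from disk, wrapped in a thin HTTP layer. In practice the serving framework usually generates this layer for you; the Flask app, model path, and request format below are illustrative assumptions, not any particular framework’s output.

```python
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical path: the model artefact baked into the container image.
model = joblib.load("model.joblib")

@app.route("/predict", methods=["POST"])
def predict():
    # Assume the caller sends {"features": [...]} as JSON.
    features = request.get_json()["features"]
    prediction = model.predict([features])[0]
    return jsonify({"prediction": float(prediction)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

Everything this app needs, from the model file to the Python dependencies, gets baked into the container image, which is what makes the result portable.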
With these in mind, let’s look at some options.