This project shows how to serve an ONNX-optimized image classification model as a web service with FastAPI, Docker, and Kubernetes.

Last update: Dec 23, 2022

Overview

Deploying ML models with FastAPI, Docker, and Kubernetes

By: Sayak Paul and Chansung Park

This project shows how to serve an ONNX-optimized image classification model as a RESTful web service with FastAPI, Docker, and Kubernetes (k8s). The idea is to first Dockerize the API and then deploy it on a k8s cluster running on Google Kubernetes Engine (GKE). We automate this integration with GitHub Actions.

👋 Note: Even though this project uses an image classification model, its structure and techniques can be used to serve other models as well.

Deploying the model as a service with k8s

We decouple the model optimization part from our API code. The optimization part is available within the notebooks/TF_to_ONNX.ipynb notebook.
Then we locally test the API. You can find the instructions within the api directory.
To deploy the API, we define our deployment.yaml workflow file inside .github/workflows. It does the following tasks:
- Looks for any changes in the specified directory. If there are any changes, it:
  - Builds and pushes the latest Docker image to Google Container Registry (GCR).
  - Deploys the Docker container on the k8s cluster running on GKE.
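
The tasks above can be sketched as a small dry-run planner. This is only an illustration of the control flow, not the contents of deployment.yaml: the watched directory, the image name, the tag, and the Kustomize directory below are hypothetical placeholders, and the function returns the commands instead of running them.

```python
from pathlib import PurePosixPath

# Hypothetical values for illustration; the real ones live in the workflow file.
WATCHED_DIR = "api"
IMAGE = "gcr.io/my-project/fastapi-server"


def should_deploy(changed_files, watched_dir=WATCHED_DIR):
    """Return True if any changed file lives under the watched directory."""
    watched = PurePosixPath(watched_dir)
    return any(watched in PurePosixPath(f).parents for f in changed_files)


def deployment_plan(changed_files, image=IMAGE, tag="latest"):
    """List the commands a deployment would run, without running them."""
    if not should_deploy(changed_files):
        return []  # nothing changed in the watched directory: skip the deploy
    ref = f"{image}:{tag}"
    return [
        ["docker", "build", "-t", ref, WATCHED_DIR],  # build the API image
        ["docker", "push", ref],                      # push it to the registry
        ["kubectl", "apply", "-k", "."],              # apply the Kustomize config
    ]


if __name__ == "__main__":
    for cmd in deployment_plan(["api/main.py", "README.md"]):
        print(" ".join(cmd))
```

A push that only touches files outside the watched directory yields an empty plan, which mirrors the "if there are any changes" condition.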

Configurations needed beforehand

Create a k8s cluster on GKE. Here's a relevant resource.
Create a service account key (JSON) file. It's a good practice to only grant it the roles required for the project. For example, for this project, we created a fresh service account and granted it permissions for the following: Storage Admin, GKE Developer, and GCR Developer.
Create a secret named GCP_CREDENTIALS on your GitHub repository and paste the contents of the service account key file into the secret.

Configure bucket storage related permissions for the service account:

$ export PROJECT_ID=<PROJECT_ID>
$ export ACCOUNT=<ACCOUNT>

$ gcloud -q projects add-iam-policy-binding ${PROJECT_ID} \
    --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \
    --role roles/storage.admin

$ gcloud -q projects add-iam-policy-binding ${PROJECT_ID} \
    --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \
    --role roles/storage.objectAdmin

$ gcloud -q projects add-iam-policy-binding ${PROJECT_ID} \
    --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \
    --role roles/storage.objectCreator
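
The three commands above differ only in the role, so they can be generated from a list if you prefer to script the setup. The following is a minimal sketch that builds the same gcloud invocations as argument lists. The function name and the example project and account values are made up for illustration.

```python
import shlex

# The three roles granted in the commands above.
STORAGE_ROLES = [
    "roles/storage.admin",
    "roles/storage.objectAdmin",
    "roles/storage.objectCreator",
]


def iam_binding_commands(project_id, account, roles=STORAGE_ROLES):
    """Build one `gcloud projects add-iam-policy-binding` command per role."""
    member = f"serviceAccount:{account}@{project_id}.iam.gserviceaccount.com"
    return [
        ["gcloud", "-q", "projects", "add-iam-policy-binding", project_id,
         f"--member={member}", "--role", role]
        for role in roles
    ]


if __name__ == "__main__":
    # "my-project" and "my-sa" are placeholders for PROJECT_ID and ACCOUNT.
    for cmd in iam_binding_commands("my-project", "my-sa"):
        print(shlex.join(cmd))
```

Each list can be passed to subprocess.run as is once you have checked the printed commands.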

If you're already on the main branch, then upon a new push the workflow defined in .github/workflows/deployment.yaml should run automatically. Here's how the final output should look (run link):

Notes

Since we use CPU-based pods within the k8s cluster, we apply ONNX optimizations, which are known to provide speed-ups in CPU-based environments. If you are using GPU-based pods, look into TensorRT.
We use Kustomize to manage the deployment on k8s.

Querying the API endpoint

From the workflow outputs, you should see something like this:

NAME             TYPE           CLUSTER-IP   EXTERNAL-IP   PORT(S)        AGE
fastapi-server   LoadBalancer   xxxxxxxxxx   xxxxxxxxxx    80:30768/TCP   23m
kubernetes       ClusterIP      xxxxxxxxxx   <none>        443/TCP        160m
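
If you would rather pull the address out of this table programmatically than read it off by eye, a small parser is enough. The sketch below assumes the text has the whitespace-separated column layout shown above (the layout printed by kubectl get services). The function name is hypothetical and the IP addresses in the sample are documentation placeholders.

```python
def external_ip(table_text, service="fastapi-server"):
    """Return the EXTERNAL-IP of `service` from the service table, or None."""
    lines = [ln for ln in table_text.splitlines() if ln.strip()]
    if not lines:
        return None
    header = lines[0].split()
    name_col = header.index("NAME")
    ip_col = header.index("EXTERNAL-IP")
    for line in lines[1:]:
        cells = line.split()
        if len(cells) > ip_col and cells[name_col] == service:
            ip = cells[ip_col]
            # Values such as <none> or <pending> mean no address is assigned.
            return None if ip.startswith("<") else ip
    return None


if __name__ == "__main__":
    sample = """\
NAME             TYPE           CLUSTER-IP   EXTERNAL-IP    PORT(S)        AGE
fastapi-server   LoadBalancer   10.0.0.1     203.0.113.10   80:30768/TCP   23m
kubernetes       ClusterIP      10.0.0.2     <none>         443/TCP        160m
"""
    print(external_ip(sample))
```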

Note the EXTERNAL-IP corresponding to fastapi-server (if you have named your service that way). Then cURL it:

curl -X POST -F image_file=@cat.jpg -F with_resize=True -F with_post_process=True http://{EXTERNAL-IP}:80/predict/image

You should get the following output (if you're using the cat.jpg image present in the api directory):

"{\"Label\": \"tabby\", \"Score\": \"0.538\"}"

The request assumes that you have a file called cat.jpg present in your working directory.
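
The same request can be sent from Python with only the standard library. This is a sketch rather than a client shipped with the project: the helper names are hypothetical, and the upload field name image_file is an assumption, so adjust it to match the API. Note that the response shown above is a JSON string that itself contains JSON, so it has to be decoded twice.

```python
import json
import urllib.request
import uuid


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode form fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    parts.append(
        (f'--{boundary}\r\nContent-Disposition: form-data; '
         f'name="{file_field}"; filename="{filename}"\r\n'
         f'Content-Type: application/octet-stream\r\n\r\n').encode()
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def decode_prediction(raw):
    """Decode the doubly encoded response body into (label, score)."""
    inner = json.loads(raw)
    result = json.loads(inner) if isinstance(inner, str) else inner
    return result["Label"], float(result["Score"])


def predict(external_ip, image_path="cat.jpg", file_field="image_file"):
    """POST the image to /predict/image. `file_field` is an assumed name."""
    with open(image_path, "rb") as f:
        body, content_type = build_multipart(
            {"with_resize": "True", "with_post_process": "True"},
            file_field, image_path, f.read(),
        )
    req = urllib.request.Request(
        f"http://{external_ip}:80/predict/image",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return decode_prediction(resp.read().decode())
```

For the cat.jpg response shown above, decode_prediction returns ("tabby", 0.538).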

TODO (s)

Set up logging for the k8s pods.
Find a better way to report the latest API endpoint.

Acknowledgements

ML-GDE program for providing GCP credit support.


Releases(v1.0.0)

v1.0.0(Feb 21, 2022)

Source code(tar.gz)
Source code(zip)
resnet50_w_preprocessing.onnx(97.42 MB)
resnet50_w_preprocessing_tf.tar.gz(101.89 MB)

Owner

Sayak Paul
