Deploy an ML inference service on a budget in less than 10 lines of code.

Last update: Dec 25, 2022

Overview

BudgetML: Deploy ML models on a budget

Give us a GitHub star to show your love!

Why

BudgetML is perfect for practitioners who would like to quickly deploy their models to an endpoint, but not waste a lot of time, money, and effort trying to figure out how to do this end-to-end.

We built BudgetML because it's hard to find a simple way to get a model into production quickly and cheaply.

Cloud functions are limited in memory and cost a lot at scale.
Kubernetes clusters are overkill for a single model.
Deploying from scratch involves learning too many different concepts like SSL certificate generation, Docker, REST, Uvicorn/Gunicorn, backend servers etc., that are simply not within the scope of a typical data scientist.

BudgetML is our answer to this challenge. It is supposed to be fast, easy, and developer-friendly. It is by no means meant to be used in a full-fledged production-ready setup. It is simply a means to get a server up and running as fast as possible with the lowest costs possible.

BudgetML lets you deploy your model on a Google Cloud Platform preemptible instance (which is ~80% cheaper than a regular instance) with a secured HTTPS API endpoint. Preemptible instances are shut down at least once every 24 hours, so the tool configures the instance to restart automatically after each shutdown, with only a few minutes of downtime. The goal is the cheapest possible API endpoint with the lowest possible downtime.

Key Features

Automatic FastAPI server endpoint generation (it's faster than Flask).
Fully interactive docs via Swagger.
Built-in SSL certificate generation via LetsEncrypt and docker-swag.
Uses cheap preemptible instances but has 99% uptime!
Complete OAuth2 secured endpoints with Password and Bearer pattern.

Cost comparison

BudgetML uses Google Cloud preemptible instances under the hood to reduce costs by 80%. This can mean hundreds of dollars in savings. The price comparison (as of Jan 31, 2021 [source]) covers the e2-highmem GCP series, a common instance family for memory-intensive tasks such as ML model inference.

Even with the lowest machine_type, there is a $46/month savings, and with the highest configuration this is $370/month savings!
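
The arithmetic behind these figures can be sketched in a few lines. This is only an illustration: the function name and the $57.50 regular price are hypothetical, chosen so that an 80% discount reproduces the $46/month figure quoted above. They are not taken from the GCP price list.

```python
def monthly_savings(regular_monthly_cost: float, discount: float = 0.80) -> float:
    """Return the monthly amount saved when `discount` (0..1) is applied."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return regular_monthly_cost * discount


if __name__ == "__main__":
    # Hypothetical regular price, back-derived from the $46/month savings figure.
    regular = 57.50
    saved = monthly_savings(regular)
    print(f"regular: ${regular:.2f}/month")
    print(f"preemptible: ${regular - saved:.2f}/month")
    print(f"saved: ${saved:.2f}/month")
```

Plug in the current price of your own machine_type to get a realistic estimate.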

Installation

BudgetML is available for easy installation into your environment via PyPI:

pip install budgetml

Alternatively, if you’re feeling brave, feel free to install the bleeding edge:

NOTE: Do so at your own risk, no guarantees given!

pip install git+https://github.com/ebhy/budgetml.git --upgrade

Quickstart

BudgetML aims for as simple a process as possible. First set up a predictor:

# predictor.py
class Predictor:
    def load(self):
        from transformers import pipeline
        self.model = pipeline(task="sentiment-analysis")

    async def predict(self, request):
        # We know we are going to use the `predict_dict` method, so we use
        # the request.payload pattern
        req = request.payload
        return self.model(req["text"])[0]

Then launch it with a simple script:

# deploy.py
import budgetml
from predictor import Predictor

# Add your GCP project name here.
# Use a distinct variable name so the `budgetml` module is not shadowed.
budget = budgetml.BudgetML(project='GCP_PROJECT')

# Launch the endpoint.
budget.launch(
    Predictor,
    domain="example.com",
    subdomain="api",
    static_ip="203.0.113.10",  # placeholder; replace with your own static IP
    machine_type="e2-medium",
    requirements=['tensorflow==2.3.0', 'transformers'],
)

For a deeper dive, check out the detailed guide in the examples directory. For more information about the BudgetML API, refer to the docs.
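
Before launching, it can help to exercise a predictor locally. The sketch below is not part of BudgetML: StubPredictor and run_locally are hypothetical names, the keyword-based "model" stands in for the transformers pipeline so the example runs without extra dependencies, and it assumes that predict only reads request.payload, as in the predictor above.

```python
import asyncio
from types import SimpleNamespace


class StubPredictor:
    """Same load/predict shape as the quickstart Predictor, with a toy model."""

    def load(self):
        # Stand-in for transformers.pipeline: a trivial keyword-based classifier.
        def model(text):
            label = "POSITIVE" if "good" in text.lower() else "NEGATIVE"
            return [{"label": label, "score": 1.0}]

        self.model = model

    async def predict(self, request):
        req = request.payload
        return self.model(req["text"])[0]


def run_locally(predictor_cls, payload):
    """Instantiate the predictor, load it, and call predict once."""
    predictor = predictor_cls()
    predictor.load()
    # Assumption: predict only needs an object exposing a `payload` attribute.
    fake_request = SimpleNamespace(payload=payload)
    return asyncio.run(predictor.predict(fake_request))


if __name__ == "__main__":
    print(run_locally(StubPredictor, {"text": "BudgetML looks good"}))
```

The same run_locally helper should work with the real Predictor class, provided its dependencies are installed locally.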

Screenshots

Interactive docs to test endpoints. Support for Images.

Password-protected endpoints:

Simple prediction interface:

Projects using BudgetML

We are proud that BudgetML is actively being used in the following live products:

ZenML: For production scenarios

BudgetML is for users on a budget. If you're working in a more serious production environment, then consider using ZenML as the perfect open-source MLOps framework for ML production needs. It does more than just deployments, and is more suited for professional workplaces.

Proudly built by two brothers

We are two brothers who love building products, especially ML-related products that make life easier for people. If you use this tool for any of your products, we would love to hear about it and potentially add it to this space. Please get in touch via email.

Oh and please do consider giving us a GitHub star if you like the repository - open-source is hard, and the support keeps us going.

Comments

Extra files/scripts in Docker container

Hi @htahir1, thanks for the super handy library!

I am wondering whether it is possible to include some extra Python files when creating the Docker container. I am trying to run inference with a custom model, so I need a bunch of files such as a checkpoint, a model file, a config, and so on. I couldn't find anything mentioning this in the docs.

Thanks for your help 😄

opened by JulesBelveze 4
[FEATURE] Quickstart example for sockeye

Is your feature request related to a problem? Please describe.
I'm not sure how to run a sockeye (https://github.com/awslabs/sockeye) model with budgetml.

Describe the solution you'd like
A quickstart example to run a sockeye model. For example, the model built in https://awslabs.github.io/sockeye/tutorials/wmt.html.

Describe alternatives you've considered
Using https://github.com/jamesewoo/sockeye-serving/tree/master/src/sockeye_serving or writing FastAPI endpoints that import sockeye.

Additional context
https://github.com/jamesewoo/sockeye-serving/tree/master/src/sockeye_serving does not seem to be in active development.

opened by michaelhochleitner 3
Location error

Describe the bug
As a newbie in GCP, I'm trying to run BudgetML with the "getting started" code shared. After setting up GCP, and running run_budget_ml.py (which contains the budget_ml.launch() call), I get the following error:

Traceback (most recent call last):
  File "run_budget_ml.py", line 24, in
    budgetml.launch(
  File "/Users/yadapruksachatkun/opt/anaconda3/lib/python3.8/site-packages/budgetml/main.py", line 321, in launch
    self.create_scheduler_job(
  File "/Users/yadapruksachatkun/opt/anaconda3/lib/python3.8/site-packages/budgetml/main.py", line 266, in create_scheduler_job
    create_gcp_scheduler_job(project_id, topic, schedule, region)
  File "/Users/yadapruksachatkun/opt/anaconda3/lib/python3.8/site-packages/budgetml/gcp/scheduler.py", line 30, in create_scheduler_job
    response = client.create_job(
  File "/Users/yadapruksachatkun/opt/anaconda3/lib/python3.8/site-packages/google/cloud/scheduler_v1/services/cloud_scheduler/client.py", line 595, in create_job
    response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
  File "/Users/yadapruksachatkun/opt/anaconda3/lib/python3.8/site-packages/google/api_core/gapic_v1/method.py", line 145, in call
    return wrapped_func(*args, **kwargs)
  File "/Users/yadapruksachatkun/opt/anaconda3/lib/python3.8/site-packages/google/api_core/grpc_helpers.py", line 75, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "", line 3, in raise_from
google.api_core.exceptions.InvalidArgument: 400 Location must equal us-west2 because the App Engine app that is associated with this project is located in us-west2

My app engine region is us-west-2, and I also set my project region to us-west-2. What region should I be setting? Thank you!

opened by pruksmhc 1
[BUG] Better alignment with REST API: send 500 not 400 if predictor couldn't get loaded
Describe the bug
Hi! First of all, thanks for such a neat tool! :tada:

It's not a bug, I just thought that sending HTTP 400 is not good when the predictor couldn't get loaded (all /predict* routes):

https://github.com/ebhy/budgetml/blob/7ade99c795451656401b3abdbd088b87eb8538eb/server/app/main.py#L96-L105

I think, it's better to use a 5XX server-side error:

HTTP 400 means that there was a client error (e.g., malformed request syntax, invalid request message framing, or deceptive request routing).

HTTP 500 means that the server encountered an unexpected condition that prevented it from fulfilling the request. This error response is a generic "catch-all" response. Usually, this indicates the server cannot find a better 5xx error code to respond with.
opened by atemate 1
Bump fastapi from 0.63.0 to 0.65.2 in /server
Bumps fastapi from 0.63.0 to 0.65.2.

Release notes

Sourced from fastapi's releases.

0.65.2

Security fixes

🔒 Check Content-Type request header before assuming JSON. Initial PR #2118 by @patrickkwang.

This change fixes a CSRF security vulnerability when using cookies for authentication in path operations with JSON payloads sent by browsers.

In versions lower than 0.65.2, FastAPI would try to read the request payload as JSON even if the content-type header sent was not set to application/json or a compatible JSON media type (e.g. application/geo+json).

So, a request with a content type of text/plain containing JSON data would be accepted and the JSON data would be extracted.

But requests with content type text/plain are exempt from CORS preflights, for being considered Simple requests. So, the browser would execute them right away including cookies, and the text content could be a JSON string that would be parsed and accepted by the FastAPI application.

See CVE-2021-32677 for more details.

Thanks to Dima Boger for the security report! 🙇🔒
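
The idea behind the fix described above can be sketched as follows. This is not FastAPI's actual implementation; is_json_content_type and parse_json_body are hypothetical helper names that only illustrate checking the Content-Type header before treating a request body as JSON.

```python
import json


def is_json_content_type(content_type: str) -> bool:
    """True for application/json and application/*+json media types."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    if media_type == "application/json":
        return True
    return media_type.startswith("application/") and media_type.endswith("+json")


def parse_json_body(content_type: str, body: bytes):
    """Parse the body as JSON only when the declared media type is JSON."""
    if not is_json_content_type(content_type):
        # A text/plain body is never parsed, even if it happens to contain JSON.
        raise ValueError(f"refusing to parse body with content type {content_type!r}")
    return json.loads(body)


if __name__ == "__main__":
    print(parse_json_body("application/json; charset=utf-8", b'{"text": "hi"}'))
    try:
        parse_json_body("text/plain", b'{"text": "hi"}')
    except ValueError as err:
        print("rejected:", err)
```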

Internal

🔧 Update sponsors badge, course bundle. PR #3340 by @tiangolo.

🔧 Add new gold sponsor Jina 🎉. PR #3291 by @tiangolo.

🔧 Add new banner sponsor badge for FastAPI courses bundle. PR #3288 by @tiangolo.

👷 Upgrade Issue Manager GitHub Action. PR #3236 by @tiangolo.

0.65.1

Security fixes

📌 Upgrade pydantic pin, to handle security vulnerability CVE-2021-29510. PR #3213 by @tiangolo.

0.65.0

Breaking Changes - Upgrade

⬆️ Upgrade Starlette to 0.14.2, including internal UJSONResponse migrated from Starlette. This includes several bug fixes and features from Starlette. PR #2335 by @hanneskuettner.

Translations

🌐 Initialize new language Polish for translations. PR #3170 by @neternefer.

Internal

👷 Add GitHub Action cache to speed up CI installs. PR #3204 by @tiangolo.

⬆️ Upgrade setup-python GitHub Action to v2. PR #3203 by @tiangolo.

🐛 Fix docs script to generate a new translation language with overrides boilerplate. PR #3202 by @tiangolo.

✨ Add new Deta banner badge with new sponsorship tier 🙇. PR #3194 by @tiangolo.

👥 Update FastAPI People. PR #3189 by @github-actions[bot].

🔊 Update FastAPI People to allow better debugging. PR #3188 by @tiangolo.

0.64.0

Features

... (truncated)

Commits

4d91f97 🔖 Release version 0.65.2

aabe2c7 📝 Update release notes

377234a 🔒 Create Security Policy

38b7858 📝 Update release notes

fa7e3c9 🐛 Check Content-Type request header before assuming JSON (#2118)

90120dd 📝 Update release notes

3677254 🔧 Update sponsors badge, course bundle (#3340)

40bb0c5 📝 Update release notes

60918d2 🔧 Add new gold sponsor Jina 🎉 (#3291)

3afce2c 📝 Update release notes

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0
Improve HTTP status codes
Submitting this PR in hopes of making the HTTP Status codes more consistent through the project.

HTTP 401 Unauthorized (https://tools.ietf.org/html/rfc7235#section-3.1) for when authentication fails

HTTP 500 when the Predictor is not initialized correctly

Feel free to reject this PR if it is not large enough, but just wanted to bring awareness to consistency in the HTTP Status codes your API is sending
opened by bradleybonitatibus 0
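
The status-code convention proposed in the two threads above (401 when authentication fails, 500 when the predictor could not be loaded, 400 only for client mistakes) can be sketched as a small decision function. The function name and its boolean parameters are hypothetical and do not come from the BudgetML server code.

```python
from http import HTTPStatus


def status_for(authenticated: bool, predictor_loaded: bool, payload_valid: bool) -> HTTPStatus:
    """Pick the HTTP status code for a prediction request."""
    if not authenticated:
        # Authentication failed: the client must supply valid credentials.
        return HTTPStatus.UNAUTHORIZED
    if not predictor_loaded:
        # Server-side problem: the client cannot fix it by changing the request.
        return HTTPStatus.INTERNAL_SERVER_ERROR
    if not payload_valid:
        # Client-side problem: malformed or invalid request.
        return HTTPStatus.BAD_REQUEST
    return HTTPStatus.OK


if __name__ == "__main__":
    result = status_for(authenticated=True, predictor_loaded=False, payload_valid=True)
    print(result.value, result.phrase)
```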

Releases (0.1.0)

0.1.0 (Jan 31, 2021)
Launch Release

First release for the public!

Features

Integration with Google Cloud Platform.

Auto-start orchestration automation.

Easy SSL certificate generation via LetsEncrypt.

FastAPI server with predict, predict_dict, and predict_image endpoints supported.

Custom requirements support.

Custom Docker image support.

Bare-bones docs and examples.
