Free MLOps course from DataTalks.Club

MLOps Zoomcamp

Our MLOps Zoomcamp course

Overview

Objective

Teach practical aspects of productionizing ML services — from collecting requirements to model deployment and monitoring.

Target audience

Data scientists and ML engineers, as well as software engineers and data engineers interested in putting ML into production.

Pre-requisites

  • Python
  • Docker
  • Being comfortable with the command line
  • Prior exposure to machine learning (at work or from other courses, e.g. from ML Zoomcamp)
  • Prior programming experience (at least 1 year)

Timeline

Course start: May 16

Syllabus

This is a draft and will change.

Module 1: Introduction

  • What is MLOps
  • MLOps maturity model
  • Running example: NY Taxi trips dataset
  • Why do we need MLOps
  • Course overview
  • Environment preparation
  • Homework

Module 2: Experiment tracking and model management

  • Experiment tracking intro
  • Getting started with MLflow
  • Experiment tracking with MLflow
  • Saving and loading models with MLflow
  • Model registry
  • MLflow in practice
  • Homework

Module 3: Orchestration and ML Pipelines

  • ML Pipelines: introduction
  • Prefect
  • Turning a notebook into a pipeline
  • Kubeflow Pipelines
  • Homework

Module 4: Model Deployment

  • Batch vs online
  • For online: web services vs streaming
  • Serving models in batch mode
  • Web services
  • Streaming (Kinesis/SQS + AWS Lambda)
  • Homework
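
As a rough illustration of the batch vs online distinction listed in this module, the sketch below scores the same toy model in two ways: once over a stored collection of records (batch) and once per incoming record (online). The `predict_duration` function, its coefficients, and the `distance` field are hypothetical placeholders, not the course's model or code.

```python
from typing import Dict, Iterable, List


def predict_duration(trip: Dict[str, float]) -> float:
    """Hypothetical stand-in for a trained model.

    Predicts trip duration in minutes from trip distance in miles.
    The coefficients are made up for illustration only.
    """
    return 5.0 + 3.0 * trip["distance"]


def score_batch(trips: Iterable[Dict[str, float]]) -> List[float]:
    """Batch mode: score many stored records in one run."""
    return [predict_duration(trip) for trip in trips]


def handle_request(trip: Dict[str, float]) -> Dict[str, float]:
    """Online mode: score a single record as soon as it arrives."""
    return {"duration": predict_duration(trip)}


# Batch: all records are available up front.
batch_predictions = score_batch([{"distance": 1.0}, {"distance": 2.5}])

# Online: one record at a time, e.g. behind a web service or a stream consumer.
online_response = handle_request({"distance": 4.0})
```

The model call is identical in both modes; what differs is how records reach it and how quickly a prediction is needed.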

Module 5: Model Monitoring

  • ML monitoring vs software monitoring
  • Data quality monitoring
  • Data drift / concept drift
  • Batch vs real-time monitoring
  • Tools: Evidently, Prometheus and Grafana
  • Homework
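
To make the data drift topic above a little more concrete, here is a minimal sketch of one common drift score, the Population Stability Index (PSI), in plain Python. This is an illustrative choice and not necessarily what the course or the listed tools use; the bin edges and sample values are assumptions made up for the example.

```python
import math
from typing import List, Sequence


def histogram_shares(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Share of values in each bin defined by consecutive edges.

    Bins are half-open [lo, hi), except the last one, which includes its
    upper edge. Values outside the edges are not counted.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = len(values)
    # A small floor avoids log(0) when a bin is empty.
    return [max(c / total, 1e-6) for c in counts]


def psi(reference: Sequence[float], current: Sequence[float],
        edges: Sequence[float]) -> float:
    """Population Stability Index between a reference and a current sample."""
    ref = histogram_shares(reference, edges)
    cur = histogram_shares(current, edges)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


edges = [0, 4, 8]
reference = [1, 2, 3, 4, 5, 6, 7, 8]   # e.g. a feature in the training data
shifted = [5, 6, 7, 8, 5, 6, 7, 8]     # the same feature in newer data

same_score = psi(reference, reference, edges)   # identical samples
drift_score = psi(reference, shifted, edges)    # distribution has moved
```

A score of zero means the binned distributions match; larger values indicate a bigger shift. Any alerting threshold would have to be chosen for the data at hand.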

Module 6: Best Practices

  • DevOps
  • Virtual environments and Docker
  • Python: logging, linting
  • Testing: unit, integration, regression
  • CI/CD (GitHub Actions)
  • Infrastructure as code (Terraform, CloudFormation)
  • Cookiecutter
  • Makefiles
  • Homework

Module 7: Processes

  • CRISP-DM, CRISP-ML
  • ML Canvas
  • Data Landscape canvas
  • MLOps Stack Canvas
  • Documentation practices in ML projects (Model Cards Toolkit)

Project

  • End-to-end project covering everything above

Running example

To make it easier to connect different modules together, we’d like to use the same running example throughout the course.

Possible candidates:

Instructors

  • Larysa Visengeriyeva
  • Cristian Martinez
  • Kevin Kho
  • Theofilos Papapanagiotou
  • Alexey Grigorev
  • Emeli Dral
  • Sejal Vaidya

Other courses from DataTalks.Club:

FAQ

I want to start preparing for the course. What can I do?

If you haven't used Flask or Docker

If you have no previous experience with ML

  • Check Module 1 from ML Zoomcamp for an overview
  • Module 3 will also be helpful if you want to learn Scikit-Learn (we'll use it in this course)
  • We'll also use XGBoost. You don't have to know it well, but if you want to learn more about it, refer to Module 6 of ML Zoomcamp

I registered but haven't received an invite link. Is that normal?

Yes, this is normal: we haven't automated the process. You'll get an email from us eventually, so don't worry.

If you want to make sure you don't miss anything:

Is it going to be live?

No and yes. There will be two parts:

  • Lectures: Pre-recorded, you can watch them when it's convenient for you.
  • Office hours: Live on Mondays (17:00 CET), but recorded, so you can watch later.

Supporters and partners

Thanks to the course sponsors for making it possible to create this course

Thanks to our friends for spreading the word about the course
