Model Factory is an ML training platform that helps engineers build ML models at scale.

Last update: Sep 23, 2022

Overview

Model Factory

Machine learning powers many businesses today, e.g., search engines, e-commerce, and news or feed recommendation. Training high-quality ML models is critical to all of these systems.

However, training a model is not trivial. Traditionally, engineers train models on a single devvm (development VM). That may be workable if you only need to build a few models. If you want to explore hundreds or even thousands of ideas, repeating the workflow manually becomes a painful process.

There are many issues with the above workflow:

Hard to scale
No tracking
No monitoring
No end-to-end automation
Not easy to share with others
No centralized model management

These pain points slow engineers down when they develop ML models. Model Factory is a project that aims to address them.

Background

There is existing work in the industry that tries to address these issues as well, e.g., Facebook FBLearner and Google Kubeflow.

The key difference between Model Factory and other projects is that Model Factory promotes a pure Python authoring experience, while most others use a DAG (Directed Acyclic Graph). This philosophy gives Model Factory the following advantages:

Easy to learn: there is almost no learning curve. As long as you know how to write Python, you know how to use Model Factory.
More flexible: control flow logic such as loops and conditionals can be implemented easily.
Allows communication between nodes: operators can communicate with each other in free form, which opens up the possibility of building distributed training on top of Model Factory.
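
To make the control flow point concrete, here is a minimal sketch of what a pipeline written as ordinary Python could look like. It does not use the Model Factory API: every function name here (prepare_data, train, evaluate, pipeline) is a hypothetical stand-in for an operator, and the toy model and learning rates are assumptions chosen only for illustration. The thing to notice is that a plain for loop and an early break decide at run time which steps execute next, which is awkward to express in a static DAG.

```python
# Hypothetical sketch: these functions are NOT the Model Factory API.
# They are plain Python stand-ins for pipeline operators.


def prepare_data(n_points):
    """Stand-in operator: build a toy dataset for y = 3x."""
    return [(float(x), 3.0 * x) for x in range(1, n_points + 1)]


def train(data, learning_rate, epochs=50):
    """Stand-in operator: fit the weight w in y = w * x by gradient descent."""
    w = 0.0
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= learning_rate * grad
    return w


def evaluate(w, data):
    """Stand-in operator: mean squared error of the fitted weight."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def pipeline(learning_rates, target_mse=1e-6):
    """Ordinary Python control flow decides which steps run next."""
    data = prepare_data(5)
    best = None
    tried = 0
    for lr in learning_rates:
        tried += 1
        w = train(data, lr)
        mse = evaluate(w, data)
        if best is None or mse < best["mse"]:
            best = {"lr": lr, "w": w, "mse": mse}
        if mse <= target_mse:
            # Good enough: skip the remaining candidates entirely.
            break
    return best, tried


best, tried = pipeline([0.0001, 0.001, 0.01, 0.02])
print(best, tried)
```

In this sketch the number of training runs depends on intermediate results, so the shape of the workflow is not known up front. Whether Model Factory operators are declared as plain functions like these is an assumption; see the Development Guide for the actual authoring interface.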

Installation

Please follow the Installation page to deploy Model Factory in your production or testing environment.

Development Guide

Please follow the Development Guide page to try out your first Model Factory pipeline.
