This is an example of a reproducible modelling project

Last update: Oct 26, 2021

What are we doing?

This example was created for the 2021 fall lecture series of Stanford's Center for Open and REproducible Science (CORES).

A video of the talk can be found at: https://youtu.be/JAQot6b1Cng

The goal of this example analysis is to explore how varying individual training hyper-parameters of a simple classification model affects its performance on scikit-learn's handwritten digit dataset.

Specifically, we will study the effect of varying the learning rate, regularisation strength, number of gradient descent steps, and random shuffling of the data on the 3-fold cross-validation performance of scikit-learn's linear support vector machine classifier.
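A single evaluation of this kind can be sketched as follows. This is a minimal illustration, not the code in scripts/evaluate_hyper_params_effect.py: it assumes the linear SVM is fitted with scikit-learn's SGDClassifier (hinge loss), and the function name cv_accuracy, its argument names, and the default values are hypothetical.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def cv_accuracy(X, y, learning_rate=0.01, reg_strength=1e-4, n_steps=20, seed=0):
    """Mean 3-fold cross-validation accuracy for one hyper-parameter setting.

    Assumption: a linear SVM trained by stochastic gradient descent
    (hinge loss); the project may use a different estimator.
    """
    clf = SGDClassifier(
        loss="hinge",              # hinge loss gives a linear SVM
        penalty="l2",
        alpha=reg_strength,        # regularisation strength
        learning_rate="constant",
        eta0=learning_rate,        # constant learning rate
        max_iter=n_steps,          # passes over the data, a stand-in for gradient steps
        tol=None,                  # always run all n_steps passes
        shuffle=True,
        random_state=seed,         # controls the random shuffling
    )
    folds = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
    return cross_val_score(clf, X, y, cv=folds).mean()


X, y = load_digits(return_X_y=True)
X = X / 16.0  # pixel intensities in this dataset range from 0 to 16
mean_accuracy = cv_accuracy(X, y)
print(f"mean 3-fold accuracy: {mean_accuracy:.3f}")
```

Here max_iter counts full passes over the training data rather than individual gradient updates, so it only approximates the "number of gradient descent steps" described above.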

Importantly, each hyper-parameter is varied separately while all other hyper-parameters are set to default values (for details, see scripts/evaluate_hyper_params_effect.py).
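The one-at-a-time scheme can be sketched in plain Python. The parameter names, default values, and candidate values below are illustrative assumptions, not the ones used in this project.

```python
# Hypothetical defaults and candidate values, chosen only for illustration.
DEFAULTS = {"learning_rate": 0.01, "reg_strength": 1e-4, "n_steps": 100, "seed": 0}
CANDIDATES = {
    "learning_rate": [1e-3, 1e-2, 1e-1],
    "reg_strength": [1e-5, 1e-4, 1e-3],
    "n_steps": [10, 100, 1000],
    "seed": [0, 1, 2],
}


def one_at_a_time(defaults, candidates):
    """Return (varied_name, config) pairs.

    Each config equals the defaults except for the single
    hyper-parameter named by varied_name.
    """
    configs = []
    for name, values in candidates.items():
        for value in values:
            config = dict(defaults)  # copy so the defaults stay untouched
            config[name] = value
            configs.append((name, config))
    return configs


configs = one_at_a_time(DEFAULTS, CANDIDATES)
for name, config in configs[:3]:
    print(name, config)
```

Compared with a full grid over all combinations, this keeps the number of evaluations equal to the sum, not the product, of the candidate counts, at the cost of not capturing interactions between hyper-parameters.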

Project organization

├── LICENSE            <- MIT License
├── Makefile           <- Makefile with targets to 'load', 'evaluate', and 'plot' ('make all' runs all three analysis steps)
├── poetry.lock        <- Details of used package versions
├── pyproject.toml     <- Lists all dependencies
├── README.md          <- This README file.
├── docs/              
|    └──               <- Slides of the practical tutorial
├── data/
|    └──               <- A copy of the handwritten digit dataset provided by scikit-learn
|
├── results/
|    ├── estimates/
|    │    └──          <- Generated estimates of classifier performance
|    └── figures/
|         └──          <- Generated figures
|
├── scripts/
|    ├── load_data.py                       <- Downloads the dataset to specified 'data-path'
|    ├── evaluate_hyper_params_effect.py    <- Runs cross-validated hyper-parameter evaluation
|    ├── plot_hyper_params_effect.py        <- Summarizes results of evaluation in a figure
|    └── run_analysis.sh                    <- Runs all analysis steps
|
└── src/
    ├── hyper/
    │    ├──  __init__.py                   <- Makes 'hyper' a Python module
    │    ├── grid.py                        <- Functionality to sample hyper-parameter grid
    │    ├── evaluation.py                  <- Functionality to evaluate classifier performance, given hyper-parameters
    │    └── plotting.py                    <- Functionality to visualize results
    └── setup.py                            <- Makes 'hyper' pip-installable (pip install -e .)

Data description

We use the handwritten digits dataset provided by scikit-learn. For details on this dataset, see scikit-learn's documentation:

https://scikit-learn.org/stable/datasets/toy_dataset.html#digits-dataset
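For orientation, the dataset can be loaded and inspected directly from scikit-learn. The save_digits helper below is a hypothetical sketch of writing a local copy to a chosen directory; it is not the implementation of scripts/load_data.py, and the file names X.npy and y.npy are assumptions.

```python
from pathlib import Path

import numpy as np
from sklearn.datasets import load_digits


def save_digits(data_path):
    """Write the digits features and labels to data_path (hypothetical layout)."""
    data_path = Path(data_path)
    data_path.mkdir(parents=True, exist_ok=True)
    X, y = load_digits(return_X_y=True)
    np.save(data_path / "X.npy", X)  # assumed file name
    np.save(data_path / "y.npy", y)  # assumed file name
    return X, y


X, y = load_digits(return_X_y=True)
print("samples, features:", X.shape)          # 8x8 images flattened to 64 values
print("classes:", np.unique(y).tolist())      # digit labels
```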

Installation

This project is written for Python 3.9.5 (we recommend pyenv for Python version management).

All software dependencies of this project are managed with Python Poetry. The dependencies are listed in pyproject.toml, and the exact package versions used are recorded in poetry.lock.

To clone this repository to your local machine, run:

git clone https://github.com/athms/reproducible-modelling

To install all dependencies with poetry, run:

cd reproducible-modelling/
poetry install

To reproduce our analyses, you additionally need to install our custom Python module (src/hyper) in your poetry environment:

cd src/
poetry run pip install -e .

Reproducing our analysis

Our analysis can be reproduced either by running scripts/run_analysis.sh:

cd scripts
poetry run bash run_analysis.sh

...or by using make:

poetry run make <ANALYSIS TARGET>

We provide the following targets for make:

| Analysis target | Description |
| --- | --- |
| all | Runs the entire analysis pipeline |
| load | Downloads scikit-learn's handwritten digit dataset |
| evaluate | Runs our cross-validated hyper-parameter evaluation |
| plot | Creates our results figure |

This README file is strongly inspired by the Cookiecutter Data Science project structure.

Owner

Armin Thomas
