Near-Optimal Sparse Allreduce for Distributed Deep Learning (published in PPoPP'22)

Last update: Oct 29, 2022

Overview

Ok-Topk is a scheme for distributed training with sparse gradients. It integrates a novel sparse allreduce algorithm with the decentralized parallel Stochastic Gradient Descent (SGD) optimizer. The sparse allreduce has a communication volume of less than 6k, where k is the number of selected gradient components, which is asymptotically optimal. The convergence of Ok-Topk is proved theoretically and confirmed empirically.
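To make the terms concrete, here is a minimal sketch of top-k gradient sparsification and of the result any sparse allreduce must produce, simulated for P workers in a single process. It is not the Ok-Topk algorithm. It uses the naive allgather-and-sum approach, in which each worker receives (P - 1) * k index/value pairs. That volume grows with P, unlike the bound stated above. All function names are hypothetical.

```python
import heapq
from collections import defaultdict


def topk_sparsify(grad, k):
    """Keep the k largest-magnitude entries of a dense gradient as {index: value}."""
    top = heapq.nlargest(k, range(len(grad)), key=lambda i: abs(grad[i]))
    return {i: grad[i] for i in top}


def allgather_sparse_allreduce(sparse_grads):
    """Sum the sparse gradients of all workers (the allreduce result).

    Simulates the naive scheme: every worker gathers all P * k pairs and
    sums them locally. This is NOT Ok-Topk's communication pattern.
    """
    total = defaultdict(float)
    for sg in sparse_grads:
        for i, v in sg.items():
            total[i] += v
    return dict(total)


def allgather_recv_pairs(num_workers, k):
    """Number of (index, value) pairs each worker receives in the naive allgather."""
    return (num_workers - 1) * k


if __name__ == "__main__":
    grads = [
        [0.1, -0.9, 0.3, 0.0, 0.5, 0.0, 0.0, 0.2],  # worker 0
        [0.0, 0.4, 0.0, -0.7, 0.1, 0.0, 0.0, 0.0],  # worker 1
        [0.6, 0.0, 0.0, 0.0, 0.0, 0.0, -0.8, 0.1],  # worker 2
    ]
    k = 2
    sparse = [topk_sparsify(g, k) for g in grads]
    print("per-worker top-k:", sparse)
    print("allreduced sum:", allgather_sparse_allreduce(sparse))
    print("pairs received per worker:", allgather_recv_pairs(len(grads), k))
```

The union of the selected indices can hold up to P * k entries. Keeping both the result and the per-worker traffic close to k is the problem Ok-Topk's allreduce addresses.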

Set up the environment

To install the required Python modules:

conda create --name py38_oktopk python=3.8

conda activate py38_oktopk

pip3 install pip==20.2.4

pip install -r requirements.txt

MPICC="cc -shared" pip install --no-binary=mpi4py mpi4py

git clone https://github.com/NVIDIA/apex

cd apex

pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
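After these steps, a quick sanity check can confirm that the key packages are importable in the py38_oktopk environment. This is a minimal sketch. The module list (torch, mpi4py, apex) is an assumption, because the full list lives in requirements.txt. The check only finds installed packages and does not verify that apex's C++/CUDA extensions compiled.

```python
import importlib.util

# Assumed top-level module names; requirements.txt may list more.
REQUIRED_MODULES = ("torch", "mpi4py", "apex")


def check_modules(names=REQUIRED_MODULES):
    """Map each top-level module name to True if Python can locate it."""
    # find_spec returns None (without importing) when a top-level module is absent.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(f"{name:8s} {'found' if found else 'MISSING'}")
```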

Prepare Datasets

CIFAR-10 for VGG

cd ./VGG/vgg_data

wget https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz

tar -zxvf cifar-10-python.tar.gz
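Extracting the tarball creates a cifar-10-batches-py directory containing data_batch_1 to data_batch_5, test_batch and batches.meta. The sketch below checks that these files exist and loads one batch in the pickled format published with CIFAR-10. Run it from ./VGG/vgg_data. It assumes the VGG scripts read this directory unchanged, and the helper names are hypothetical. Loading a real batch also requires numpy, because the images are stored as a numpy array.

```python
import pickle
from pathlib import Path

# Files contained in cifar-10-python.tar.gz (under cifar-10-batches-py/).
CIFAR10_FILES = [f"data_batch_{i}" for i in range(1, 6)] + ["test_batch", "batches.meta"]


def missing_cifar10_files(root="cifar-10-batches-py"):
    """Return the expected CIFAR-10 files that are absent under root."""
    root = Path(root)
    return [name for name in CIFAR10_FILES if not (root / name).is_file()]


def load_cifar10_batch(path):
    """Load one batch file and return (data, labels); keys are byte strings."""
    with open(path, "rb") as f:
        batch = pickle.load(f, encoding="bytes")
    return batch[b"data"], batch[b"labels"]


if __name__ == "__main__":
    missing = missing_cifar10_files()
    if missing:
        print("missing:", ", ".join(missing))
    else:
        data, labels = load_cifar10_batch("cifar-10-batches-py/data_batch_1")
        print(len(labels), "labels in data_batch_1")
```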

AN4 for LSTM

cd ./LSTM/audio_data

wget https://www.dropbox.com/s/l5w4up20u5pfjxf/an4.zip

unzip an4.zip

Wikipedia for BERT

cd ./BERT/bert/bert_data/

Then prepare the dataset by following the instructions in the accompanying README file.

Run jobs

We run the experiments on GPU clusters managed by the SLURM job scheduler. To evaluate the performance of Ok-Topk, Gaussiank, gtopk, topkA, topkDSA, and dense allreduce, submit the jobs as follows.

To run VGG jobs

cd ./VGG

./sbatch_vgg_jobs.sh

To run LSTM jobs

cd ./LSTM

./sbatch_lstm_jobs.sh

To run BERT jobs

cd ./BERT/bert/

./sbatch_bert_jobs.sh

Publication

Ok-Topk is published in PPoPP'22 (DOI).

License

See LICENSE.

Owner

Shigang Li