AdamW optimizer and cosine learning rate annealing with restarts

Last update: Dec 20, 2022

Overview

This repository contains an implementation of the AdamW optimization algorithm and the cosine learning rate scheduler described in "Decoupled Weight Decay Regularization". The AdamW implementation is straightforward and differs little from the existing Adam implementation for PyTorch, except that it decouples weight decay from the gradient-based update. The cosine annealing scheduler with restarts lets the model converge to a (possibly) different local minimum on every restart and normalizes the weight decay hyperparameter according to the length of the restart period. Unlike the schedulers in the standard PyTorch scheduler suite, this scheduler adjusts the optimizer's learning rate on every batch update rather than on every epoch, as the paper prescribes.
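The difference between L2 regularization inside Adam and decoupled weight decay can be sketched for a single scalar weight. This is a minimal illustration rather than the repository's code: the function names are hypothetical, and scaling the decay term by lr is an assumption of this sketch.

```python
import math

def _adam_moments(grad, m, v, t, beta1, beta2):
    # Update the biased moment estimates and return bias-corrected values.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return m, v, m_hat, v_hat

def adam_l2_step(w, grad, m, v, t, lr=1e-3, weight_decay=0.0,
                 beta1=0.9, beta2=0.999, eps=1e-8):
    # L2 regularization: the decay term is folded into the gradient,
    # so it is also rescaled by the adaptive denominator.
    grad = grad + weight_decay * w
    m, v, m_hat, v_hat = _adam_moments(grad, m, v, t, beta1, beta2)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

def adamw_step(w, grad, m, v, t, lr=1e-3, weight_decay=0.0,
               beta1=0.9, beta2=0.999, eps=1e-8):
    # Decoupled weight decay: the moments see only the loss gradient,
    # and the decay is applied to the weight directly.
    m, v, m_hat, v_hat = _adam_moments(grad, m, v, t, beta1, beta2)
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v

# One step from w = 1.0 with a zero loss gradient.
w_l2, _, _ = adam_l2_step(1.0, 0.0, 0.0, 0.0, t=1, lr=0.1, weight_decay=0.1)
w_dec, _, _ = adamw_step(1.0, 0.0, 0.0, 0.0, t=1, lr=0.1, weight_decay=0.1)
```

With a zero loss gradient, the L2 variant still moves the weight by roughly lr (to about 0.9 here), because the decay term is normalized by the adaptive denominator. The decoupled variant shrinks the weight by lr * weight_decay * w (to about 0.99 here).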

Cyclical Learning Rates

Besides the "cosine" and "arccosine" policies (arccosine has a steeper profile at the limiting points), there are the "triangular", "triangular2" and "exp_range" policies, which implement the schedules proposed in "Cyclical Learning Rates for Training Neural Networks". The ratio of the increasing and decreasing phases of the triangular policy can be adjusted with the triangular_step parameter. The minimum allowed lr is set by the min_lr parameter.

The triangular schedule is enabled by passing policy="triangular".
The triangular2 schedule halves the maximum lr on each restart cycle and is enabled by passing policy="triangular2", or by combining policy="triangular" with eta_on_restart_cb=ReduceMaxLROnRestart(ratio=0.5). The ratio parameter sets the factor by which lr is scaled on each restart.
The exp_range schedule is enabled by passing policy="exp_range". It scales the maximum lr exponentially with the iteration count. The base of the exponent is set by the gamma parameter.
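The three policies can be sketched as one function that differs only in how the amplitude is scaled. This is a rough illustration under stated assumptions, not the repository's implementation: cyclical_lr and triangular_factor are hypothetical names, cycle_len is a fixed cycle length in iterations, and triangular_step is assumed to be the fraction of the cycle spent increasing.

```python
def triangular_factor(t, triangular_step=0.5):
    # t is the fraction of the current cycle already completed, in [0, 1).
    # Rises linearly from 0 to 1, then falls linearly back to 0.
    if t < triangular_step:
        return t / triangular_step
    return (1.0 - t) / (1.0 - triangular_step)

def cyclical_lr(iteration, cycle_len, max_lr, min_lr,
                policy="triangular", triangular_step=0.5, gamma=0.9999):
    cycle = iteration // cycle_len               # index of the current cycle
    t = (iteration % cycle_len) / cycle_len      # position inside the cycle
    if policy == "triangular":
        scale = 1.0                              # constant amplitude
    elif policy == "triangular2":
        scale = 0.5 ** cycle                     # amplitude halves every cycle
    elif policy == "exp_range":
        scale = gamma ** iteration               # amplitude decays per iteration
    else:
        raise ValueError(f"unknown policy: {policy}")
    return min_lr + (max_lr - min_lr) * scale * triangular_factor(t, triangular_step)

# Peak of the first and second cycle under each policy.
peaks = {
    name: (cyclical_lr(50, 100, 0.1, 0.001, policy=name),
           cyclical_lr(150, 100, 0.1, 0.001, policy=name))
    for name in ("triangular", "triangular2", "exp_range")
}
```

In this sketch the scaling is applied to the range between min_lr and max_lr, so the learning rate never drops below min_lr.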

These schedules can be combined with shrinking or expanding restart periods and weight decay normalization, and can be used with AdamW and other PyTorch optimizers.
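As a hedged sketch of how restart periods and weight decay normalization interact, the snippet below uses the cosine annealing and normalization formulas from "Decoupled Weight Decay Regularization". The function names and the exact restart bookkeeping are assumptions made for illustration and may differ from the scheduler in this repository.

```python
import math

def cosine_lr(base_lr, t_cur, restart_period, min_lr=0.0):
    # t_cur: epochs (possibly fractional) elapsed since the last restart.
    cos_term = 1 + math.cos(math.pi * t_cur / restart_period)
    return min_lr + 0.5 * (base_lr - min_lr) * cos_term

def normalized_weight_decay(wd_norm, batch_size, epoch_size, restart_period):
    # lambda = lambda_norm * sqrt(b / (B * T)), with T the current restart period.
    return wd_norm * math.sqrt(batch_size / (epoch_size * restart_period))

def per_batch_lrs(base_lr, batches_per_epoch, total_epochs, restart_period, t_mult=1.0):
    lrs = []
    period = float(restart_period)
    epochs_since_restart = 0
    for _ in range(total_epochs):
        if epochs_since_restart >= period:
            epochs_since_restart = 0
            period *= t_mult          # t_mult > 1 expands, t_mult < 1 shrinks
        for b in range(batches_per_epoch):
            # The learning rate is updated on every batch, not once per epoch.
            t_cur = epochs_since_restart + b / batches_per_epoch
            lrs.append(cosine_lr(base_lr, t_cur, period))
        epochs_since_restart += 1
    return lrs

lrs = per_batch_lrs(1e-3, batches_per_epoch=2, total_epochs=6,
                    restart_period=2, t_mult=2)
wd = normalized_weight_decay(1.0, batch_size=32, epoch_size=1024, restart_period=2)
```

Here the learning rate returns to base_lr at the restart after two epochs, and the next period lasts four epochs. A longer restart period yields a smaller effective weight decay for the same normalized value.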

Example:

    batch_size = 32
    epoch_size = 1024
    model = resnet()
    optimizer = AdamW(model.parameters(), lr=1e-3, weight_decay=1e-5)
    scheduler = CyclicLRWithRestarts(optimizer, batch_size, epoch_size, restart_period=5, t_mult=1.2, policy="cosine")
    for epoch in range(100):
        scheduler.step()  # call once at the start of every epoch
        for batch in train_batches:  # train_batches is a placeholder for your own batch loop
            ...  # forward pass and loss computation
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            scheduler.batch_step()  # call after every batch update
        validate(...)

Owner

Maksym Pyrozhok
