Code for STFT Transformer used in BirdCLEF 2021 competition.

Last update: Sep 29, 2022

STFT_Transformer

The STFT Transformer is a new way to apply Transformers to audio data, similar to how Vision Transformers are applied to images. It was developed for the BirdCLEF 2021 competition hosted on Kaggle. The PDF document gives more context; it has been submitted to the BirdCLEF 2021 workshop.
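To make the analogy with Vision Transformers concrete, here is a minimal sketch. It is not the code from this repository: it computes a magnitude STFT with NumPy and cuts it into square patches, the way a Vision Transformer cuts an image into tokens. The function names, window size, hop, patch size, sample rate and test tone are all illustrative assumptions; refer to the PDF document and stft_transformer_final.py for the actual model.

```python
import numpy as np


def stft_magnitude(signal, n_fft=256, hop=128):
    """Magnitude STFT with a Hann window, shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def to_patches(spec, patch=16):
    """Cut a 2-D array into non-overlapping patch x patch tiles, each flattened."""
    t = (spec.shape[0] // patch) * patch  # drop frames that do not fill a tile
    f = (spec.shape[1] // patch) * patch  # drop frequency bins likewise
    tiles = spec[:t, :f].reshape(t // patch, patch, f // patch, patch)
    return tiles.swapaxes(1, 2).reshape(-1, patch * patch)


# Illustrative input (an assumption): one second of a 440 Hz tone at 8 kHz.
sr = 8000
time = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440.0 * time)

spec = stft_magnitude(signal)
tokens = to_patches(spec)
print(spec.shape, tokens.shape)
```

Each row of tokens is one flattened time-frequency patch; in a Vision Transformer style model, rows like these would be linearly projected and fed to the Transformer as a sequence.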

The code is provided as is; it has not been rewritten. Because competitions are done in a hurry, the code may not meet usual open source standards.

The code assumes this directory structure:

<base_dir>/code

<base_dir>/input

<base_dir>/input/freefield1010

<base_dir>/checkpoints

<base_dir>/data
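The layout above can be checked or created before downloading any data. The following is a small sketch using only the Python standard library; the helper names missing_dirs and create_layout are hypothetical and are not part of this repository.

```python
from pathlib import Path
import tempfile

# Sub-directories listed in this README, relative to <base_dir>.
EXPECTED_DIRS = ["code", "input", "input/freefield1010", "checkpoints", "data"]


def missing_dirs(base_dir):
    """Return the expected sub-directories that do not exist under base_dir."""
    base = Path(base_dir)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]


def create_layout(base_dir):
    """Create the expected sub-directories; existing ones are left untouched."""
    for d in EXPECTED_DIRS:
        (Path(base_dir) / d).mkdir(parents=True, exist_ok=True)


# Demonstration in a throwaway directory, so nothing real is modified.
with tempfile.TemporaryDirectory() as tmp:
    before = missing_dirs(tmp)
    create_layout(tmp)
    after = missing_dirs(tmp)

print("missing before:", before)
print("missing after:", after)
```

Calling missing_dirs on the real base directory before running the scripts gives a quick way to spot a forgotten directory.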

The code has to be run from the code directory. Download the competition data into the input directory and the freefield1010 data into the input/freefield1010 directory. Run data_final.py first: it reads the audio files from input and stores the relevant parts in the data directory as numpy files.

Then run stft_transformer_final.py to train the model for one fold. During the competition I ran 5 folds by editing the FOLD global variable in the script (I know, this is substandard).
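If editing the script by hand for each fold is inconvenient, one possible alternative is to read the fold index from an environment variable. This is a hypothetical pattern, not something the script does today: the get_fold helper, the FOLD environment variable and the 0 to 4 numbering of folds are all assumptions.

```python
import os

N_FOLDS = 5  # this README mentions 5 folds


def get_fold(environ=os.environ, default=0):
    """Read the fold index from a FOLD environment variable (hypothetical convention)."""
    fold = int(environ.get("FOLD", default))
    if not 0 <= fold < N_FOLDS:
        raise ValueError(f"FOLD must be in [0, {N_FOLDS}), got {fold}")
    return fold


# Would replace the hand-edited global at the top of a training script.
FOLD = get_fold()
print(f"training fold {FOLD}")
```

With a change like this in place, the five folds could be launched one after the other by setting FOLD to a different value for each run, without touching the source file.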

Once all 5 models are trained, one can upload the weights to a Kaggle dataset and use the submission notebook I used. This should give a score worth 15th rank in the competition. Achieving this rank with a single model is significant, as all top teams used an ensemble of models.
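The submission notebook is not reproduced here. As an illustration only, a common way to combine models trained on different folds is to average their per-class probabilities and then threshold the mean. The sketch below assumes that approach; the average_folds helper, the 0.5 threshold and the random probabilities are made up for the example and do not come from the notebook.

```python
import numpy as np


def average_folds(fold_probs, threshold=0.5):
    """Average per-fold probabilities and threshold the mean.

    fold_probs has shape (n_folds, n_clips, n_classes).
    Returns the mean probabilities and a boolean array of predictions.
    """
    probs = np.asarray(fold_probs, dtype=float)
    mean = probs.mean(axis=0)
    return mean, mean >= threshold


# Made-up probabilities: 5 folds, 2 audio clips, 3 classes.
rng = np.random.default_rng(0)
fold_probs = rng.uniform(size=(5, 2, 3))

mean_probs, predicted = average_folds(fold_probs)
print(mean_probs.shape, predicted.shape)
```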

Owner

Jean-François Puget