YOLOX_AUDIO is an audio event detection model based on YOLOX

Overview

Introduction

YOLOX_AUDIO is an audio event detection model based on YOLOX, an anchor-free version of YOLO. This repository is implemented in PyTorch. The main goal of YOLOX_AUDIO is to detect and classify pre-defined audio events in the multi-spectrogram domain using image object detection frameworks.

Updates!!

  • 【2021/11/15】 We released YOLOX_AUDIO to the public

Quick Start

Installation

Step1. Install YOLOX_AUDIO.

git clone https://github.com/intflow/YOLOX_AUDIO.git
cd YOLOX_AUDIO
pip3 install -U pip && pip3 install -r requirements.txt
pip3 install -v -e .  # or  python3 setup.py develop

Step2. Install pycocotools.

pip3 install cython; pip3 install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
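
Optionally, verify the install with a quick import check:

python3 -c "from pycocotools.coco import COCO; print('pycocotools OK')"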

Data Preparation

Step1. Prepare audio waveform files for training: AUDIO_DATAPATH/wav

Step2. Write the audio annotation file for training: AUDIO_DATAPATH/label.json

{
    "00000.wav": {
        "speaker": [
            "W",
            "M",
            "C",
            "W"
        ],
        "on_offset": [
            [
                1.34425,
                2.4083125
            ],
            [
                4.0082708333333334,
                4.5560625
            ],
            [
                6.2560416666666665,
                7.956104166666666
            ],
            [
                9.756083333333333,
                10.876624999999999
            ]
        ]
    },
    "00001.wav": {
        "speaker": [
            "W",
            "M",
            "C",
            "M",
            "W",
            "C"
        ],
        "on_offset": [
            [
                1.4325416666666666,
                2.7918958333333332
            ],
            [
                2.1762916666666667,
                4.109729166666667
            ],
            [
                7.109708333333334,
                8.530916666666666
            ],
            [
                8.514125,
                9.306104166666668
            ],
            [
                12.606083333333334,
                14.3345625
            ],
            [
                14.148958333333333,
                15.362958333333333
            ]
        ]
    },
    ...
}
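
The "speaker" and "on_offset" lists are index-aligned: the i-th [onset, offset] pair (in seconds) belongs to the i-th speaker label. A minimal sketch (not part of the repo) for loading such a label.json and sanity-checking that alignment:

# Minimal sketch (not part of the repo): load label.json and check that every
# clip has exactly one [onset, offset] pair (in seconds) per speaker label.
import json

with open("AUDIO_DATAPATH/label.json") as f:
    labels = json.load(f)

for wav_name, entry in labels.items():
    speakers = entry["speaker"]       # class codes as in the example above ("W", "M", "C")
    on_offsets = entry["on_offset"]   # [onset_sec, offset_sec] per event
    assert len(speakers) == len(on_offsets), f"{wav_name}: mismatched list lengths"
    for spk, (onset, offset) in zip(speakers, on_offsets):
        assert onset < offset, f"{wav_name}: onset must come before offset ({spk})"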

Step3. Convert audio files into spectrogram images.

python tools/json_gen_audio2coco.py

Please change the dataset paths and file names to suit your needs:

import os

# Dataset root containing the wav/ folder and the on/offset annotation json
root = '/data/AIGC_3rd_2021/GIST_tr2_veryhard5000_all_tr2'
# Recreate the output folder for the generated spectrogram images
os.system('rm -rf '+root+'/img/')
os.system('mkdir '+root+'/img/')
wav_folder_path = os.path.join(root, 'wav')    # input waveforms
img_folder_path = os.path.join(root, 'img')    # output spectrogram images
train_label_path = os.path.join(root, 'tr2_devel_5000.json')        # input on/offset labels
train_label_merge_out = os.path.join(root, 'label_coco_bbox.json')  # output COCO-style labels
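
For reference, the core of this step is turning each waveform into a (log-mel) spectrogram image. A rough sketch of that idea using librosa (listed under References); the actual feature parameters and image layout used by tools/json_gen_audio2coco.py may differ:

# Rough sketch of the wav-to-spectrogram-image step; the exact parameters
# (sr, n_mels, hop_length, ...) used by tools/json_gen_audio2coco.py may
# differ and should be taken from that script.
import librosa
import numpy as np
from PIL import Image

def wav_to_spectrogram_image(wav_path, img_path, sr=16000, n_mels=128):
    y, _ = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    # Scale to 0..255 so the spectrogram can be saved as an 8-bit grayscale image.
    img = (255 * (mel_db - mel_db.min()) / (mel_db.max() - mel_db.min())).astype(np.uint8)
    Image.fromarray(img[::-1]).save(img_path)  # flip so low frequencies sit at the bottom

wav_to_spectrogram_image("AUDIO_DATAPATH/wav/00000.wav", "AUDIO_DATAPATH/img/00000.png")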

Training

Step1. Change the data loading paths in exps/yolox_audio__tr2/yolox_x.py:

        self.train_path = '/data/AIGC_3rd_2021/GIST_tr2_veryhard5000_all_tr2'
        self.val_path = '/data/AIGC_3rd_2021/tr2_set_01_tune'
        self.train_ann = "label_coco_bbox.json"
        self.val_ann = "label_coco_bbox.json"
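
For context, these attributes live inside the experiment class defined in exps/yolox_audio__tr2/yolox_x.py. A trimmed sketch of what that file might look like, assuming the fork keeps YOLOX's yolox.exp.Exp base class (the real file contains more settings):

# Trimmed sketch of exps/yolox_audio__tr2/yolox_x.py; the real file contains
# more settings. Assumes the fork keeps YOLOX's yolox.exp.Exp base class.
import os
from yolox.exp import Exp as MyExp

class Exp(MyExp):
    def __init__(self):
        super(Exp, self).__init__()
        self.depth = 1.33   # YOLOX-X depth multiplier
        self.width = 1.25   # YOLOX-X width multiplier
        self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]

        # Data loading paths to adapt to your dataset (Step1 above)
        self.train_path = '/data/AIGC_3rd_2021/GIST_tr2_veryhard5000_all_tr2'
        self.val_path = '/data/AIGC_3rd_2021/tr2_set_01_tune'
        self.train_ann = "label_coco_bbox.json"
        self.val_ann = "label_coco_bbox.json"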

Step2. Begin training:

python3 tools/train.py -expn yolox_audio__tr2 -n yolox_audio_x \
-f exps/yolox_audio__tr2/yolox_x.py -d 4 -b 32 --fp16 \
-c /data/pretrained/yolox_x.pth
  • -d: number of GPU devices
  • -b: total batch size; the recommended value for -b is num-gpu * 8 (see the example after this list)
  • -f: path of the experiment file
  • --fp16: mixed precision training
  • --cache: cache images into RAM to accelerate training, which requires a large amount of system RAM
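
For example, a single-GPU run with the corresponding batch size would be:

python3 tools/train.py -expn yolox_audio__tr2 -n yolox_audio_x \
-f exps/yolox_audio__tr2/yolox_x.py -d 1 -b 8 --fp16 \
-c /data/pretrained/yolox_x.pth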

We encourage using a pretrained YOLOX model for training: https://github.com/Megvii-BaseDetection/YOLOX

Inference

Run the demo as follows (see demo_audio.py):

python3 tools/demo.py --demo image -expn yolox_audio__tr2 -n yolox_audio_x \
-f exps/yolox_audio__tr2/yolox_x.py \
-c YOLOX_outputs/yolox_audio__tr2/best_ckpt.pth \
--path /data/AIGC_3rd_2021/GIST_tr2_100/img/ \
--save_folder /data/yolox_out \
--conf 0.2 --nms 0.65 --tsize 256 --save_result --device gpu

From demo_audio.py you can obtain the on/offset VAD times and the class of each audio chunk.
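
Each detection is a box on a spectrogram image, so the horizontal pixel coordinates must be mapped back to seconds. A hypothetical sketch of that mapping (the real conversion depends on the spectrogram parameters chosen in Step3; the function below is illustrative, not part of the repo):

# Hypothetical sketch: map a detection's horizontal pixel bounds on a
# spectrogram image back to on/offset times in seconds. The scaling depends
# on how the spectrogram images were generated in Step3.

def box_to_on_offset(x1_px, x2_px, img_width_px, clip_duration_sec):
    """Convert horizontal pixel bounds of a detected box to (onset, offset) seconds."""
    sec_per_px = clip_duration_sec / img_width_px
    return x1_px * sec_per_px, x2_px * sec_per_px

# Example: a box spanning pixels 120-260 on a 1600-px-wide image of a 16 s clip
onset, offset = box_to_on_offset(120, 260, img_width_px=1600, clip_duration_sec=16.0)
print(f"event from {onset:.2f}s to {offset:.2f}s")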

References

  • YOLOX baseline implemented in PyTorch: YOLOX

@article{yolox2021,
  title={YOLOX: Exceeding YOLO Series in 2021},
  author={Ge, Zheng and Liu, Songtao and Wang, Feng and Li, Zeming and Sun, Jian},
  journal={arXiv preprint arXiv:2107.08430},
  year={2021}
}

  • librosa for audio feature extraction: librosa

McFee, Brian, Colin Raffel, Dawen Liang, Daniel P. W. Ellis, Matt McVicar, Eric Battenberg, and Oriol Nieto. "librosa: Audio and music signal analysis in Python." In Proceedings of the 14th Python in Science Conference, pp. 18-25. 2015.

Acknowledgement

This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-00014).

Owner

intflow Inc.
Official code repositories of intflow.ai