Utility tools for the "Divide and Remaster" dataset, introduced as part of the Cocktail Fork problem paper

Overview

Divide and Remaster Utility Tools

CFP Icon

Utility tools for the "Divide and Remaster" dataset, introduced as part of the Cocktail Fork problem paper

The DnR dataset is build from three, well-established, audio datasets; Librispeech, Free Music Archive (FMA), and Freesound Dataset 50k (FSD50K). We offer our dataset in both 16kHz and 44.1kHz sampling-rate along time-stamped annotations for each of the classes (genre for 'music', audio-tags for 'sound-effects', and transcription for 'speech'). We provide below more informations on how the dataset is build and what it's consists of exactly. We also go over the process of building the dataset from scratch for the cases it needs to.



Dataset Overview

The Divide and Remaster (DnR) dataset is a dataset aiming at providing research support for a relatively unexplored case of source separation with mixtures involving music, speech, and sound-effects (SFX) as their sources. The dataset is build from three, well-established, datasets. Consequently if one wants to build DnR from scratch, the aforementioned datasets will have to be downloaded first. Alternatively, DnR is also available on Zenodo

Get the DnR Dataset

In order to obtain DnR, several options are available depending on the task at hand:

Download

  • DnR-HQ (44.1kHz) is available on Zenodo at the following or simply run:
link to the Zenodo dataset coming soon ...
  • Alternatively, if DnR-16kHz is needed, please first download DnR-HQ locally. You can then downsample the dataset (either in-place or not) by cloning the dnr-utils repository and running:
python dnr_utils.py --task=downsample --inplace=True

Building DnR From Scratch

In the section, we go over the DnR building process. Since DnR is directly drawn from *FSD50K*, *LibriSpeech*/*LibriVox*, and *FMA, we first need to download these datasets. Please head to the following links for more details on how to get them:

Datasets Downloads

FSD50K
FMA-Medium Set
LibriSpeech/LibriVox



Please note that for FMA, the medium set only is required. In addition to the audio files, the metadata should also be downloaded. For LibriSpeech DnR uses dev-clean, test-clean, and train-clean-100. DnR will use the folder structure as well as metadata from LibriSpeech, but ultimately will build the LibriSpeech-HQ dataset off the original LibriVox mp3s, which is why we need them both for building DnR.

After download, all four datasets are expected to be found in the same root directory. Our root tree may look something like that. As the standardization script will look for specific file name, please make sure that all directory names conform to the ones described below:

root
├── fma-medium
│   ├── fma_metadata
│   │   ├── genres.csv
│   │   └── tracks.csv
│   ├── 008
│   ├── 008
│   ├── 009
│   └── 010
│   └── ...
├── fsd50k
│   ├── FSD50K.dev_audio
│   ├── FSD50K.eval_audio
│   └── FSD50K.ground_truth
│   │   ├── dev.csv
│   │   ├── eval.csv
│   │   └── vocabulary.csv
├── librispeech
│   ├── dev-clean
│   ├── test-clean
│   └── train-clean-100
└── librivox
    ├── 14
    ├── 16
    └── 17
    └── ...

Datasets Standardization

Once all four datasets are downloaded, some standardization work needs to be taken care of. The standardization process can be be executed by running standardization.py, which can be found in the dnr-utils repository. Prior to running the script you may want to install all the necessary dependencies included as part of the requirement.txt with pip install -r requirements.txt. Note: pydub uses ffmpeg under its hood, a system install of fmmpeg is thus required. Please see pydub's install instructions for more information. The standardization command may look something like:

python standardization.py --fsd50k-path=./FSD50K --fma-path=./FMA --librivox-path=./LibriVox --librispeech-path=./LibiSpeech  --dest-dir=./dest --validate-audio=True

DnR Dataset Compilation

Once the three resulting datasets are standardized, we are ready to finally compile DnR. At this point you should already have cloned the dnr-utils repository, which contains two key files:

  • config.py contains some configuration entries needed by the main script builder. You want to set all the appropriate paths pointing to your local datasets and ground truth files in there.
  • The compilation for a given set (here, train, val, and eval) can be executed with compile_dataset.py, for example by running the following commands for each set:
python compile_dataset.py with cfg.train
python compile_dataset.py with cfg.val
python compile_dataset.py with cfg.eval

Known Issues

Some known bugs and issues that we're aware. if not listed below, feel free to open a new issue here:

  • If building from scratch, pydub will fail at reading 15 mp3 files from the FMA medium-set and will return the following error: mp3 @ 0x559b8b084880] Failed to read frame size: Could not seek to 1026.

  • If building DnR from scratch, the script may return the following error, coming from pyloudnorm: Audio must be have length greater than the block size. That's because some audio segment, especially SFX events, may be shorter than 0.2 seconds, which is the minimum sample length (window) required by pyloudnorm for normalizing the audio. We just ignore these segments.


Contact and Support

Have an issue, concern, or question about DnR or its utility tools ? If so, please open an issue here

For any other inquiries, feel free to shoot an email at: [email protected], my name is Darius Petermann ;)


Owner
Darius Petermann
Signal Processing and Machine Learning for Audio
Darius Petermann
The official implementation of You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient.

You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient (paper) @misc{zhang2021compress,

46 Dec 07, 2022
Code for our work "Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection".

A2S-USOD Code for our work "Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection". Code will be released upon

15 Dec 16, 2022
[Link]mareteutral - pars tradg wth M []

pairs-trading-with-ML Jonathan Larkin, August 2017 One popular strategy classification is Pairs Trading. Though this category of strategies can exhibi

Jonathan Larkin 134 Jan 06, 2023
How Effective is Incongruity? Implications for Code-mix Sarcasm Detection.

Code for the paper: How Effective is Incongruity? Implications for Code-mix Sarcasm Detection - ICON ACL 2021

2 Jun 05, 2022
Python-experiments - A Repository which contains python scripts to automate things and make your life easier with python

Python Experiments A Repository which contains python scripts to automate things

Vivek Kumar Singh 11 Sep 25, 2022
SlideGraph+: Whole Slide Image Level Graphs to Predict HER2 Status in Breast Cancer

SlideGraph+: Whole Slide Image Level Graphs to Predict HER2 Status in Breast Cancer A novel graph neural network (GNN) based model (termed SlideGraph+

28 Dec 24, 2022
Optimizes image files by converting them to webp while also updating all references.

About Optimizes images by (re-)saving them as webp. For every file it replaced it automatically updates all references. Works on single files as well

Watermelon Wolverine 18 Dec 23, 2022
McGill Physics Hackathon 2021: Reaction-Diffusion Models for the Generation of Biological Patterns

DiffuseAnimals: Reaction-Diffusion Models for the Generation of Biological Patterns Introduction Reaction-diffusion equations can be utilized in order

Austin Szuminsky 2 Mar 07, 2022
Neural Scene Flow Fields using pytorch-lightning, with potential improvements

nsff_pl Neural Scene Flow Fields using pytorch-lightning. This repo reimplements the NSFF idea, but modifies several operations based on observation o

AI葵 178 Dec 21, 2022
Source code for the paper "PLOME: Pre-training with Misspelled Knowledge for Chinese Spelling Correction" in ACL2021

PLOME:Pre-training with Misspelled Knowledge for Chinese Spelling Correction (ACL2021) This repository provides the code and data of the work in ACL20

197 Nov 26, 2022
Learning To Have An Ear For Face Super-Resolution

Learning To Have An Ear For Face Super-Resolution [Project Page] This repository contains demo code of our CVPR2020 paper. Training and evaluation on

50 Nov 16, 2022
Code and data for the EMNLP 2021 paper "Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts". Coming soon!

ToxiChat Code and data for the EMNLP 2021 paper "Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts". Install depen

Ashutosh Baheti 11 Jan 01, 2023
Patch SVDD for Image anomaly detection

Patch SVDD Patch SVDD for Image anomaly detection. Paper: https://arxiv.org/abs/2006.16067 (published in ACCV 2020). Original Code : https://github.co

Hong-Jeongmin 0 Dec 03, 2021
Spline is a tool that is capable of running locally as well as part of well known pipelines like Jenkins (Jenkinsfile), Travis CI (.travis.yml) or similar ones.

Welcome to spline - the pipeline tool Important note: Since change in my job I didn't had the chance to continue on this project. My main new project

Thomas Lehmann 29 Aug 22, 2022
Code for the prototype tool in our paper "CoProtector: Protect Open-Source Code against Unauthorized Training Usage with Data Poisoning".

CoProtector Code for the prototype tool in our paper "CoProtector: Protect Open-Source Code against Unauthorized Training Usage with Data Poisoning".

Zhensu Sun 1 Oct 26, 2021
Experiments with differentiable stacks and queues in PyTorch

Please use stacknn-core instead! StackNN This project implements differentiable stacks and queues in PyTorch. The data structures are implemented in s

Will Merrill 141 Oct 06, 2022
3DMV jointly combines RGB color and geometric information to perform 3D semantic segmentation of RGB-D scans.

3DMV 3DMV jointly combines RGB color and geometric information to perform 3D semantic segmentation of RGB-D scans. This work is based on our ECCV'18 p

Владислав Молодцов 0 Feb 06, 2022
code for Multi-scale Matching Networks for Semantic Correspondence, ICCV

MMNet This repo is the official implementation of ICCV 2021 paper "Multi-scale Matching Networks for Semantic Correspondence.". Pre-requisite conda cr

joey zhao 25 Dec 12, 2022
Custom Implementation of Non-Deep Networks

ParNet Custom Implementation of Non-deep Networks arXiv:2110.07641 Ankit Goyal, Alexey Bochkovskiy, Jia Deng, Vladlen Koltun Official Repository https

Pritama Kumar Nayak 20 May 27, 2022
A Pytorch Implementation of ClariNet

ClariNet A Pytorch Implementation of ClariNet (Mel Spectrogram -- Waveform) Requirements PyTorch 0.4.1 & python 3.6 & Librosa Examples Step 1. Downlo

Sungwon Kim 286 Sep 15, 2022