The mini-MusicNet dataset

Overview

mini-MusicNet

A music-domain dataset for multi-label classification

Music transcription is sequence-to-sequence prediction problem: given an audio performance, we must predict a corresponding sequence of notes. If we ignore correlations in the sequence of notes, music transcription simplifies to a multi-label classification problem. Given an audio performance, we are tasked with predicting the set of notes present in an audio performance at a given time. The mini-MusicNet dataset is derived from the MusicNet dataset, providing a scaled-down, pre-processed subset of MusicNet suitable for multi-label classification.

This repository provides information for downloading and interacting with mini-MusicNet, as well as some algorithmic baselines for multi-label classification with mini-MusicNet.

About mini-MusicNet

Download. The mini-MusicNet dataset can be downloaded here. To follow the tutorial in the next section or run explore.ipynb, please download mini-MusicNet to the minimusic sub-directory of the root of this repository.

This dataset consists of n = 82,500 data points with d = 4,096 features and k = 128 binary labels per datapoint. Each data point is an approximately 9ms audio clip: these clips are sampled at regular intervals from the underlying MusicNet dataset. Each clip is normalized to amplitudes in [-1,1]. The label on a datapoint is a binary k-dimensional (multi-hot) vector that indicates the notes being performed at the center of the audio clip. We define train, validation, and test splits with n = 62,500, 10,000, and 10,000 data points respectively. The mini-MusicNet dataset can be acquired here. Alternatively, you can use construct.py to reconstruct mini-MusicNet from a copy of MusicNet.

Exploring mini-MusicNet

To get started, let's load and visualize the training data. The contents of this section are summarized in the explore.ipynb notebook.

import numpy as np
import matplotlib.pyplot as plt

Xtrain = np.load('minimusic/audio-train.npy')
Ytrain = np.load('minimusic/labels-train.npy')

fig, ax = plt.subplots(1, 2, figsize=(10,2))
ax[0].set_title('Raw acoustic features')
ax[0].plot(Xtrain[0])
ax[1].set_title('Fourier transform of the raw features')
ax[1].plot(np.abs(np.fft.rfft(Xtrain[0])[0:256])) # clip to 256 features for easier visualization

Now let's see how linear (ridge) regression performs on the raw audio features. We'll measure results using average precision.

from sklearn.metrics import average_precision_score

Xtest = np.load('minimusic/audio-test.npy')
Ytest = np.load('minimusic/labels-test.npy')

R = .001
beta = np.dot(np.linalg.inv(np.dot(Xtrain.T,Xtrain) + R*np.eye(Xtrain.shape[1])),np.dot(Xtrain.T,Ytrain))

print('Train AP:', round(average_precision_score(Ytrain, np.dot(Xtrain, beta), average='micro'), 2))
print('Test AP:', round(average_precision_score(Ytest, np.dot(Xtest, beta), average='micro'), 2))

Train AP: 0.19 Test AP: 0.04

That's not so great. We can do much better by transforming our audio wave to the Fourier domain.

Xtrainfft = np.abs(np.fft.rfft(Xtrain))
Xtestfft = np.abs(np.fft.rfft(Xtest))

R = .001
beta = np.dot(np.linalg.inv(np.dot(Xtrainfft.T,Xtrainfft) + R*np.eye(Xtrainfft.shape[1])),np.dot(Xtrainfft.T,Ytrain))

print('Train AP:', round(average_precision_score(Ytrain, np.dot(Xtrainfft, beta), average='micro'), 2))
print('Test AP:', round(average_precision_score(Ytest, np.dot(Xtestfft, beta), average='micro'), 2))

Train AP: 0.57 Test AP: 0.47

Finally, it can often be more revealing to look at a precision-recall curve, rather than the scalar average precision (the area under the P/R curve). Let's see what our full P/R curve looks like for ridge regression on Fourier features.

fig, ax = plt.subplots(1, 2, figsize=(10,4))
ax[0].set_title('Train P/R Curve')
plot_pr_curve(ax[0], Ytrain, np.dot(Xtrainfft, beta))
ax[1].set_title('Test P/R Curve')
plot_pr_curve(ax[1], Ytest, np.dot(Xtestfft, beta))

And that's enough to get us started! We hope that mini-MusicNet can be a useful resource for empirical work in multi-label classification.

References

For further information about MusicNet, or if you want to cite this work, please see:

@inproceedings{thickstun2017learning,
  author    = {John Thickstun and Zaid Harchaoui and Sham M. Kakade},
  title     = {Learning Features of Music from Scratch},
  booktitle = {International Conference on Learning Representations},
  year      = {2017},
}
Owner
John Thickstun
John Thickstun
Official page of Struct-MDC (RA-L'22 with IROS'22 option); Depth completion from Visual-SLAM using point & line features

Struct-MDC (click the above buttons for redirection!) Official page of "Struct-MDC: Mesh-Refined Unsupervised Depth Completion Leveraging Structural R

Urban Robotics Lab. @ KAIST 37 Dec 22, 2022
Official PyTorch Implementation of Mask-aware IoU and maYOLACT Detector [BMVC2021]

The official implementation of Mask-aware IoU and maYOLACT detector. Our implementation is based on mmdetection. Mask-aware IoU for Anchor Assignment

Kemal Oksuz 46 Sep 29, 2022
Source code for CVPR 2020 paper "Learning to Forget for Meta-Learning"

L2F - Learning to Forget for Meta-Learning Sungyong Baik, Seokil Hong, Kyoung Mu Lee Source code for CVPR 2020 paper "Learning to Forget for Meta-Lear

Sungyong Baik 29 May 22, 2022
Computer Vision and Pattern Recognition, NUS CS4243, 2022

CS4243_2022 Computer Vision and Pattern Recognition, NUS CS4243, 2022 Cloud Machine #1 : Google Colab (Free GPU) Follow this Notebook installation : h

Xavier Bresson 142 Dec 15, 2022
[NeurIPS'20] Multiscale Deep Equilibrium Models

Multiscale Deep Equilibrium Models 💥 💥 💥 💥 This repo is deprecated and we will soon stop actively maintaining it, as a more up-to-date (and simple

CMU Locus Lab 221 Dec 26, 2022
OpenLT: An open-source project for long-tail classification

OpenLT: An open-source project for long-tail classification Supported Methods for Long-tailed Recognition: Cross-Entropy Loss Focal Loss (ICCV'17) Cla

Ming Li 37 Sep 15, 2022
Convenient tool for speeding up the intern/officer review process.

icpc-app-screen Convenient tool for speeding up the intern/officer applicant review process. Eliminates the pain from reading application responses of

1 Oct 30, 2021
MISSFormer: An Effective Medical Image Segmentation Transformer

MISSFormer Code for paper "MISSFormer: An Effective Medical Image Segmentation Transformer". Please read our preprint at the following link: paper_add

Fong 22 Dec 24, 2022
Rl-quickstart - Reinforcement Learning Quickstart

Reinforcement Learning Quickstart To get setup with the repository, git clone ht

UCLA DataRes 3 Jun 16, 2022
SwinIR: Image Restoration Using Swin Transformer

SwinIR: Image Restoration Using Swin Transformer This repository is the official PyTorch implementation of SwinIR: Image Restoration Using Shifted Win

Jingyun Liang 2.4k Jan 05, 2023
PixelPick This is an official implementation of the paper "All you need are a few pixels: semantic segmentation with PixelPick."

PixelPick This is an official implementation of the paper "All you need are a few pixels: semantic segmentation with PixelPick." [Project page] [Paper

Gyungin Shin 59 Sep 25, 2022
This is the code for HOI Transformer

HOI Transformer Code for CVPR 2021 accepted paper End-to-End Human Object Interaction Detection with HOI Transformer. Reproduction We recomend you to

BigBangEpoch 124 Dec 29, 2022
Using Streamlit to host a multi-page tool with model specs and classification metrics, while also accepting user input values for prediction.

Predicitng_viability Using Streamlit to host a multi-page tool with model specs and classification metrics, while also accepting user input values for

Gopalika Sharma 1 Nov 08, 2021
Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression

Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression YOLOv5 with alpha-IoU losses implemented in PyTorch. Example r

Jacobi(Jiabo He) 147 Dec 05, 2022
Towards Flexible Blind JPEG Artifacts Removal (FBCNN, ICCV 2021)

Towards Flexible Blind JPEG Artifacts Removal (FBCNN, ICCV 2021) Jiaxi Jiang, Kai Zhang, Radu Timofte Computer Vision Lab, ETH Zurich, Switzerland 🔥

Jiaxi Jiang 282 Jan 02, 2023
Mixup for Supervision, Semi- and Self-Supervision Learning Toolbox and Benchmark

OpenSelfSup News Downstream tasks now support more methods(Mask RCNN-FPN, RetinaNet, Keypoints RCNN) and more datasets(Cityscapes). 'GaussianBlur' is

AI Lab, Westlake University 332 Jan 03, 2023
用opencv的dnn模块做yolov5目标检测,包含C++和Python两个版本的程序

yolov5-dnn-cpp-py yolov5s,yolov5l,yolov5m,yolov5x的onnx文件在百度云盘下载, 链接:https://pan.baidu.com/s/1d67LUlOoPFQy0MV39gpJiw 提取码:bayj python版本的主程序是main_yolov5.

365 Jan 04, 2023
It is a simple library to speed up CLIP inference up to 3x (K80 GPU)

CLIP-ONNX It is a simple library to speed up CLIP inference up to 3x (K80 GPU) Usage Install clip-onnx module and requirements first. Use this trick !

Gerasimov Maxim 93 Dec 20, 2022
A PyTorch implementation for PyramidNets (Deep Pyramidal Residual Networks)

A PyTorch implementation for PyramidNets (Deep Pyramidal Residual Networks) This repository contains a PyTorch implementation for the paper: Deep Pyra

Greg Dongyoon Han 262 Jan 03, 2023
Dense Passage Retriever - is a set of tools and models for open domain Q&A task.

Dense Passage Retrieval Dense Passage Retrieval (DPR) - is a set of tools and models for state-of-the-art open-domain Q&A research. It is based on the

Meta Research 1.1k Jan 03, 2023