Text-to-Music Retrieval using Pre-defined/Data-driven Emotion Embeddings

Last update: Dec 05, 2022

Overview

Text2Music Emotion Embedding

Reference

Emotion Embedding Spaces for Matching Music to Stories, ISMIR 2021 [paper]

-- Minz Won, Justin Salamon, Nicholas J. Bryan, Gautham J. Mysore, and Xavier Serra

@inproceedings{won2021emotion,
  title={Emotion embedding spaces for matching music to stories},
  author={Won, Minz and Salamon, Justin and Bryan, Nicholas J. and Mysore, Gautham J. and Serra, Xavier},
  booktitle={ISMIR},
  year={2021}
}

Requirements

conda create -n YOUR_ENV_NAME python=3.7
conda activate YOUR_ENV_NAME
pip install -r requirements.txt

Data

You need to collect the audio files of the AudioSet mood subset (link).
Read the audio files and store them in .npy format.
Other relevant data, including Alm's dataset (original link), the ISEAR dataset (original link), emotion embeddings, pretrained Word2Vec, and data splits, are all available here (link).
Unzip ttm_data.tar.gz and place the extracted data folder under text2music-emotion-embedding/.
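
The conversion to .npy is not spelled out above. The following is a minimal sketch, assuming the clips have already been saved as 16-bit PCM WAV files. The function name wav_to_npy, the mono downmix, and the float32 scaling are illustrative assumptions, not the repository's own preprocessing; check the code under src/ for the sample rate and array shape it actually expects.

```python
import wave
from pathlib import Path

import numpy as np


def wav_to_npy(wav_path, out_dir):
    """Read a 16-bit PCM WAV file and save it as a mono float32 .npy array.

    Hypothetical helper: the repository may expect a different sample rate,
    channel layout, or dtype.
    """
    wav_path, out_dir = Path(wav_path), Path(out_dir)
    with wave.open(str(wav_path), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = wf.getnchannels()
        frames = wf.readframes(wf.getnframes())
    # Interleaved int16 samples -> (n_frames, n_channels).
    samples = np.frombuffer(frames, dtype="<i2").reshape(-1, n_channels)
    # Average the channels to mono and scale to the range [-1, 1).
    mono = samples.astype(np.float32).mean(axis=1) / 32768.0
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / (wav_path.stem + ".npy")
    np.save(out_path, mono)
    return out_path
```

To convert a whole folder, call wav_to_npy(p, "audioset_npy") for each p in Path("audioset_wav").glob("*.wav"); both folder names are placeholders.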

Training

Here is an example of training a metric learning model.

python3 src/metric_learning/main.py \
        --dataset 'isear' \
        --num_branches 3 \
        --data_path YOUR_DATA_PATH_TO_AUDIOSET

For more examples, check the bash files under the scripts folder.

Test

Here is an example of testing a pretrained model.

python3 src/metric_learning/main.py \
        --mode 'TEST' \
        --dataset 'alm' \
        --model_load_path 'data/pretrained/alm_cross.ckpt' \
        --data_path 'YOUR_DATA_PATH_TO_AUDIOSET'

Pretrained three-branch metric learning models (alm_cross.ckpt and isear_cross.ckpt) are included in ttm_data.tar.gz. To reproduce the results, place the unzipped data folder under text2music-emotion-embedding/.
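
Before running the test command, it can help to confirm that the data folder ended up in the right place. The following is a small sketch, not part of the repository: missing_files is a hypothetical helper, and the location of isear_cross.ckpt is assumed to match the data/pretrained/ path shown for alm_cross.ckpt.

```python
from pathlib import Path

# alm_cross.ckpt path is taken from the test command above; the directory
# for isear_cross.ckpt is an assumption based on that path.
EXPECTED_FILES = [
    "data/pretrained/alm_cross.ckpt",
    "data/pretrained/isear_cross.ckpt",
]


def missing_files(repo_root, expected=EXPECTED_FILES):
    """Return the expected files that are absent under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    # Run from inside text2music-emotion-embedding/.
    missing = missing_files(".")
    if missing:
        print("Missing files (is data/ placed correctly?):")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected checkpoints found.")
```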

Visualization

The embedding distribution of each model can be projected onto a 2-dimensional space. We used uniform manifold approximation and projection (UMAP) to visualize the distribution. UMAP is known to preserve more of the global structure than t-SNE.
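
A 2-D projection of this kind can be sketched as follows, assuming the umap-learn package is installed and the embeddings are available as an (n_samples, n_dims) array. The function name project_2d and the seed parameter are illustrative; the UMAP settings used for the published figures are not stated here, so library defaults are used.

```python
import numpy as np


def project_2d(embeddings, seed=0):
    """Project an (n_samples, n_dims) array to 2-D with UMAP.

    Hypothetical helper: uses umap-learn defaults, which may differ from
    the settings used by the authors.
    """
    embeddings = np.asarray(embeddings, dtype=np.float32)
    if embeddings.ndim != 2:
        raise ValueError("expected a 2-D array of shape (n_samples, n_dims)")
    import umap  # from the umap-learn package; imported only when needed

    reducer = umap.UMAP(n_components=2, random_state=seed)
    return reducer.fit_transform(embeddings)
```

The returned array has shape (n_samples, 2) and can be passed to any scatter-plot routine, with points colored by emotion label.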

Demo

Please try some examples retrieved by the three-branch metric learning model [Soundcloud].

License

Some License

Owner

Minz Won
