The official repository for Audio ALBERT

AALBERT

This is the official repository of AALBERT, a PyTorch Lightning reimplementation of the paper "Audio ALBERT: A Lite BERT for Self-Supervised Learning of Audio Representation". The original code is in the AlbertNew branch of the s3prl repo. In the paper, we propose Audio ALBERT, which achieves performance comparable to massive pre-trained networks on downstream tasks while having 91% fewer parameters.

Dependencies

  • Python 3.8
  • Computing power (a high-end GPU) and memory (both system RAM and GPU memory) are extremely important if you'd like to train your own model.
  • Required packages and their uses are listed in requirements.txt.
  • Install them with: pip install -r requirements.txt
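
A quick environment check before training can save a failed run. The sketch below is not part of this repository: `check_environment` is a hypothetical helper, and treating Python 3.8 as a minimum rather than an exact requirement is an assumption. It only reports whether PyTorch is importable and whether it can see a CUDA device.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 8)):
    """Return a small report on the interpreter and GPU visibility.

    Hypothetical helper; `min_python` is an assumed lower bound.
    """
    report = {"python_ok": sys.version_info >= min_python}

    # Only import torch if it is installed, so the check itself never crashes.
    has_torch = importlib.util.find_spec("torch") is not None
    report["torch_installed"] = has_torch

    if has_torch:
        import torch

        report["cuda_available"] = bool(torch.cuda.is_available())
    else:
        report["cuda_available"] = False
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as false, pretraining will most likely be impractically slow.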

Pretrain Stage

We use LibriSpeech as the pretraining dataset. You can download the dataset from this link.

  • Stage 1: set the dataset path in the config to your local dataset path:

    • AALBERT: config path: upstream/aalbert/pretrain_config.yaml
          line 16: datarc:
                  {Your dataset key name}: {your local dataset path}
    • Mockingjay: config path: upstream/mockingjay/pretrain_config.yaml
          line 16: datarc:
                  {Your dataset key name}: {your local dataset path}
  • Stage 2: run pretraining script

    python run_pretrain.py -n aalbert_pretrained -u aalbert

    • -n: experiment name
    • -u: upstream model (two options: aalbert / mockingjay)
    • The model is saved in the result folder once the pretraining stage finishes.
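
If you launch pretraining from another Python script, the command above can be assembled programmatically. This is a minimal sketch, not code from this repository: `build_pretrain_command` and `UPSTREAMS` are hypothetical names, and the only facts taken from the README are the script name, the `-n` and `-u` flags, and the two upstream options.

```python
import sys

# The two upstream options listed in this README.
UPSTREAMS = ("aalbert", "mockingjay")


def build_pretrain_command(experiment_name, upstream="aalbert"):
    """Build the argument list for run_pretrain.py (hypothetical helper)."""
    if not experiment_name:
        raise ValueError("experiment_name must be a non-empty string")
    if upstream not in UPSTREAMS:
        raise ValueError(f"upstream must be one of {UPSTREAMS}, got {upstream!r}")
    # Use the current interpreter so the active virtual environment is respected.
    return [sys.executable, "run_pretrain.py", "-n", experiment_name, "-u", upstream]


if __name__ == "__main__":
    print(" ".join(build_pretrain_command("aalbert_pretrained", "aalbert")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.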

Downstream Stage

Here, we take VoxCeleb1 speaker classification as our downstream task. You can download the dataset from its official website.

After pretraining, we can use the pretrained model to extract features for different downstream tasks.

  • Stage 1: set the dataset path in the config to your local dataset path
    • voxceleb1_speaker: config path: downstream/voxceleb1_speaker/train_config.yaml
    line  9: datarc:
    line 10:    file_path: {your dataset folder path}
    line 11:    meta_path: {your label file path}
  • Stage 2: run downstream script
    • voxceleb1_speaker:
      python run_downstream.py \
      -c downstream/voxceleb1_speaker/train_config.yaml \
      -g result/pretrain/{your_pretrained_model_folder}/model_config.yaml  \
      -t result/pretrain/{your_pretrained_model_folder}/pretrained_config.yaml \
      -u aalbert \
      -d voxceleb1_speaker \
      -k result/pretrained/{your_pretrained_model_folder}/checkpoints/{checkpoint_you_want_to_use.ckpt} \
      -n voxceleb1_result
    • -n: experiment name
    • -c: downstream training config
    • -g: pretrained model config
    • -t: pretraining config of the pretrained model
    • -u: upstream model (two options: aalbert / mockingjay)
    • -d: downstream task name
    • -k: model checkpoint path
    • -f: whether to fine-tune the pretrained model (default: False)
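
Because the downstream command takes four file paths, it can help to check them before launching. The sketch below is illustrative and not part of this repository: `build_downstream_command` and `missing_paths` are hypothetical helpers, and all paths are passed in explicitly rather than derived from a folder layout. The `-f` flag is left out because the README does not show whether it takes a value.

```python
import sys
from pathlib import Path


def build_downstream_command(train_config, model_config, pretrain_config,
                             checkpoint, upstream="aalbert",
                             task="voxceleb1_speaker",
                             experiment_name="voxceleb1_result"):
    """Build the argument list for run_downstream.py (hypothetical helper)."""
    return [
        sys.executable, "run_downstream.py",
        "-c", str(train_config),     # downstream training config
        "-g", str(model_config),     # pretrained model config
        "-t", str(pretrain_config),  # pretraining config of the pretrained model
        "-u", upstream,              # aalbert or mockingjay
        "-d", task,                  # downstream task name
        "-k", str(checkpoint),       # model checkpoint path
        "-n", experiment_name,       # experiment name
    ]


def missing_paths(*paths):
    """Return the paths that do not exist on disk, in the order given."""
    return [str(p) for p in paths if not Path(p).exists()]
```

A typical use is to call `missing_paths` on the two config files and the checkpoint first, and only run the command when the returned list is empty.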
Owner
pohan