Code for the ICASSP-2021 paper: Continuous Speech Separation with Conformer.

Last update: Nov 28, 2022

Related tags

Overview

Continuous Speech Separation with Conformer

Introduction

We examine the use of the Conformer architecture for continuous speech separation. Conformer allows the separation model to efficiently capture both local and global context information, which is helpful for speech separation. Experimental results using the LibriCSS dataset show that the Conformer separation model achieves state of the art results for both single-channel and multi-channel settings.

For a detailed description and experimental results, please refer to our paper: Continuous Speech Separation with Conformer (Accepted by ICASSP 2021).

Environment

python 3.6.9, torch 1.7.1

Get Started

Download the overlapped speech of LibriCSS dataset.

wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1PdloA-V8HGxkRu9MnT35_civpc3YXJsT' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1PdloA-V8HGxkRu9MnT35_civpc3YXJsT" -O overlapped_speech.zip && rm -rf /tmp/cookies.txt && unzip overlapped_speech.zip && rm overlapped_speech.zip

Download the Conformer separation models.

wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1OlTbEvxYUoqWIHfeAXCftL9srbWUo4I1' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1OlTbEvxYUoqWIHfeAXCftL9srbWUo4I1" -O checkpoints.zip && rm -rf /tmp/cookies.txt && unzip checkpoints.zip && rm checkpoints.zip

Run the separation.

3.1 Single-channel separation

export MODEL_NAME=1ch_conformer_base
python3 separate.py \
    --checkpoint checkpoints/$MODEL_NAME \
    --mix-scp utils/overlapped_speech_1ch.scp \
    --dump-dir separated_speech/monaural/utterances_with_$MODEL_NAME \
    --device-id 0 \
    --num_spks 2

The separated speech can be found in the directory 'separated_speech/monaural/utterances_with_$MODEL_NAME'

3.2 Seven-channel separation

export MODEL_NAME=conformer_base
python3 separate.py \
    --checkpoint checkpoints/$MODEL_NAME \
    --mix-scp utils/overlapped_speech_7ch.scp \
    --dump-dir separated_speech/7ch/utterances_with_$MODEL_NAME \
    --device-id 0 \
    --num_spks 2 \
    --mvdr True

The separated speech can be found in the directory 'separated_speech/7ch/utterances_with_$MODEL_NAME'

Citation

If you find our work useful, please cite our paper:

@inproceedings{CSS_with_Conformer,
  title={Continuous speech separation with conformer},
  author={Chen, Sanyuan and Wu, Yu and Chen, Zhuo and Wu, Jian and Li, Jinyu and Yoshioka, Takuya and Wang, Chengyi and Liu, Shujie and Zhou, Ming},
  booktitle={ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={5749--5753},
  year={2021},
  organization={IEEE}
}

Code for the ICASSP-2021 paper: Continuous Speech Separation with Conformer.

Related tags

Overview

Continuous Speech Separation with Conformer

Introduction

Environment

Get Started

Citation

Owner

Sanyuan Chen (陈三元)

TiP-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling

Python KNN model: Predicting a probability of getting a work visa. Tableau: Non-immigrant visas over the years.

Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding

PyTorch implementation of the paper Dynamic Token Normalization Improves Vision Transfromers.

The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training

source code of Adversarial Feedback Loop Paper

LIMEcraft: Handcrafted superpixel selectionand inspection for Visual eXplanations

PyTorch implementation of NeurIPS 2021 paper: "CoFiNet: Reliable Coarse-to-fine Correspondences for Robust Point Cloud Registration"

Isaac Gym Reinforcement Learning Environments

ConformalLayers: A non-linear sequential neural network with associative layers

TensorFlow implementation of "TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?"

Learning Super-Features for Image Retrieval

Imbalanced Gradients: A Subtle Cause of Overestimated Adversarial Robustness

Object detection (YOLO) with pytorch, OpenCV and python

Conditional Gradients For The Approximately Vanishing Ideal

[2021][ICCV][FSNet] Full-Duplex Strategy for Video Object Segmentation

CLASP - Contrastive Language-Aminoacid Sequence Pretraining

AWS documentation corpus for zero-shot open-book question answering.

PyTorch version repo for CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes

CoaT: Co-Scale Conv-Attentional Image Transformers