Auxiliary Raw Net (ARawNet) is an ASVspoof detection model that takes both raw waveforms and handcrafted features as inputs to balance the trade-off between performance and model complexity.

Last update: Jul 08, 2022


Overview

This repository is an implementation of the Auxiliary Raw Net (ARawNet), an ASVspoof detection system that takes both raw waveforms and handcrafted features as inputs to balance the trade-off between performance and model complexity. The paper is available here.

Model performance is evaluated on the ASVspoof 2019 dataset.

Setup

Environment


speechbrain==0.5.7
pandas
torch==1.9.1
torchaudio==0.9.1
nnAudio==0.2.6
ptflops==0.6.6

Create a conda environment with conda env create -f environment.yml.
Activate the conda environment with conda activate .
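After activating the environment, you can confirm that the pinned versions listed above were actually installed. The helper below is a hypothetical sketch and is not part of the repository. It uses only the standard library's importlib.metadata (Python 3.8+). It ignores local version labels such as "+cu111", which CUDA builds of torch commonly carry.

```python
from importlib import metadata
from typing import Callable, Dict, Optional, Tuple

# Pins copied from the package list above (pandas is unpinned, so it is skipped).
PINNED = {
    "speechbrain": "0.5.7",
    "torch": "1.9.1",
    "torchaudio": "0.9.1",
    "nnAudio": "0.2.6",
    "ptflops": "0.6.6",
}


def check_pins(
    pins: Dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, found)} for every package that does not match.

    `found` is None when the package is not installed. A local version label
    such as "+cu111" is stripped before comparing.
    """
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, expected in pins.items():
        try:
            found: Optional[str] = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        base = found.split("+", 1)[0] if found is not None else None
        if base != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINNED)
    for name, (expected, found) in problems.items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
    if not problems:
        print("All pinned packages match.")
```

Passing a custom `get_version` makes the check easy to test without touching the real environment.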

Data preprocessing

.
├── data                       
│   │
│   ├── PA                  
│   │   └── ...
│   └── LA           
│       ├── ASVspoof2019_LA_asv_protocols
│       ├── ASVspoof2019_LA_asv_scores
│       ├── ASVspoof2019_LA_cm_protocols
│       ├── ASVspoof2019_LA_train
│       ├── ASVspoof2019_LA_dev
│       
│
└── ARawNet

Download the dataset. Our experiments use the Logical Access (LA) scenario of the ASVspoof 2019 dataset, which can be downloaded here.
Unzip the data and save it to a folder named data in the same directory as ARawNet, as shown above.
Run python preprocess.py, or use our preprocessed data directly from "/processed_data".
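This README does not document what preprocess.py does. As an illustration of the input it works from: the LA countermeasure (CM) protocol files in ASVspoof2019_LA_cm_protocols list one trial per line. The sketch below assumes five whitespace-separated columns: speaker ID, utterance ID, a placeholder field ("-" in LA), the attack/system ID ("-" for bona fide), and the key (bonafide or spoof). `CMTrial` and `parse_cm_protocol` are hypothetical names, and the sample IDs are made up.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class CMTrial(NamedTuple):
    speaker: str    # speaker ID
    utterance: str  # audio file name without extension
    system: str     # attack ID such as "A01"; "-" for bona fide speech
    key: str        # "bonafide" or "spoof"


def parse_cm_protocol(lines: Iterable[str]) -> List[CMTrial]:
    """Parse CM protocol lines, assuming five whitespace-separated columns."""
    trials: List[CMTrial] = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != 5:
            raise ValueError(f"line {line_no}: expected 5 columns, got {len(fields)}")
        speaker, utterance, _placeholder, system, key = fields
        if key not in ("bonafide", "spoof"):
            raise ValueError(f"line {line_no}: unknown key {key!r}")
        trials.append(CMTrial(speaker, utterance, system, key))
    return trials


# Illustrative lines in the assumed format (IDs invented for this example).
sample = [
    "LA_0001 LA_T_0000001 - - bonafide",
    "LA_0001 LA_T_0000002 - A01 spoof",
    "LA_0002 LA_T_0000003 - A04 spoof",
]
trials = parse_cm_protocol(sample)
print(Counter(t.key for t in trials))
```

Because the function accepts any iterable of strings, an open file handle for one of the protocol files can be passed to it directly.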

Train

python train_raw_net.py yaml/RawSNet.yaml --data_parallel_backend --data_parallel_count=2

Evaluate

python eval.py

Check Model Size and Multiply-Accumulate Operations (MACs)

python check_model_size.py yaml/RawSNet.yaml
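A multiply-accumulate (MAC) is one multiplication followed by one addition. In a 1-D convolution, each output element costs (in_channels / groups) × kernel_size MACs. The hand count below illustrates what a MAC counter such as ptflops tallies per layer. It is not a reproduction of check_model_size.py. The layer sizes are hypothetical, and ptflops' own conventions (for example, how bias terms are counted) may differ slightly.

```python
def conv1d_out_len(length: int, kernel: int, stride: int = 1,
                   padding: int = 0, dilation: int = 1) -> int:
    # Same output-length formula as torch.nn.Conv1d.
    return (length + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def conv1d_params(in_ch: int, out_ch: int, kernel: int,
                  groups: int = 1, bias: bool = True) -> int:
    return out_ch * (in_ch // groups) * kernel + (out_ch if bias else 0)


def conv1d_macs(in_ch: int, out_ch: int, kernel: int, length: int,
                stride: int = 1, padding: int = 0, dilation: int = 1,
                groups: int = 1) -> int:
    # Each output element needs (in_ch / groups) * kernel multiply-accumulates.
    out_len = conv1d_out_len(length, kernel, stride, padding, dilation)
    return out_ch * out_len * (in_ch // groups) * kernel


def linear_params(in_f: int, out_f: int, bias: bool = True) -> int:
    return in_f * out_f + (out_f if bias else 0)


def linear_macs(in_f: int, out_f: int) -> int:
    return in_f * out_f


# Hypothetical layers: a strided first convolution over 4 s of 16 kHz audio,
# and a 2-class output layer on a 1024-dim embedding.
samples = 4 * 16000
print("conv params:", conv1d_params(1, 128, 3))
print("conv MACs:  ", conv1d_macs(1, 128, 3, samples, stride=3))
print("fc params:  ", linear_params(1024, 2))
print("fc MACs:    ", linear_macs(1024, 2))
```

A whole-model figure, such as the GMac values reported later in this README, is the sum of these per-layer counts.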

Model Performance

Evaluation metric

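$$\text{min t-DCF} = \min_{s}\left\{\beta\,P^{\text{cm}}_{\text{miss}}(s) + P^{\text{cm}}_{\text{fa}}(s)\right\}$$

where s is the countermeasure (CM) decision threshold. P^cm_miss(s) and P^cm_fa(s) are the CM miss rate (bona fide trials rejected) and false-alarm rate (spoofed trials accepted) at that threshold. β is a constant determined by the ASV system's error rates and by the cost and prior parameters of the evaluation.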

Explanations can be found here: t-DCF
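To make the min t-DCF concrete, the sketch below sweeps the CM threshold s over every observed score and returns the smallest value of β·P_miss(s) + P_fa(s). It assumes that higher scores mean "more bona fide" and that β is already known. In ASVspoof 2019, β is derived from the ASV system's error rates and the cost and prior parameters, and this sketch does not compute it. The function names are hypothetical, and this is not the official ASVspoof evaluation script.

```python
import math
from typing import Sequence, Tuple


def cm_error_rates(bona: Sequence[float], spoof: Sequence[float],
                   threshold: float) -> Tuple[float, float]:
    """CM miss and false-alarm rates when score >= threshold is accepted as bona fide."""
    p_miss = sum(x < threshold for x in bona) / len(bona)
    p_fa = sum(x >= threshold for x in spoof) / len(spoof)
    return p_miss, p_fa


def min_tdcf(bona: Sequence[float], spoof: Sequence[float],
             beta: float) -> Tuple[float, float]:
    """Minimum of beta * P_miss(s) + P_fa(s) over candidate thresholds s.

    Candidates are every observed score plus +inf (reject everything).
    Returns (min_cost, best_threshold). O(n^2), which is fine for a sketch.
    """
    best_cost, best_s = math.inf, math.inf
    for s in sorted(set(bona) | set(spoof)) + [math.inf]:
        p_miss, p_fa = cm_error_rates(bona, spoof, s)
        cost = beta * p_miss + p_fa
        if cost < best_cost:
            best_cost, best_s = cost, s
    return best_cost, best_s


# Toy scores and an arbitrary beta, for illustration only.
cost, thr = min_tdcf([2.0, 3.0, 4.0], [0.0, 1.0, 2.5], beta=2.0)
print(f"min t-DCF = {cost:.4f} at threshold {thr}")
```

A larger β penalises CM misses more heavily, so the optimal threshold moves toward accepting more trials.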

Experiment Results

Model	Front-end	Main Encoder	Auxiliary Encoder (E_A)	EER (%)	min t-DCF
Res2Net	Spec	Res2Net	-	8.783	0.2237
Res2Net	LFCC	Res2Net	-	2.869	0.0786
Res2Net	CQT	Res2Net	-	2.502	0.0743
Rawnet2	Raw waveforms	Rawnet2	-	5.13	0.1175
ARawNet	Mel-Spectrogram	XVector	✅	1.32	0.03894
ARawNet	Mel-Spectrogram	XVector	-	2.39320	0.06875
ARawNet	Mel-Spectrogram	ECAPA-TDNN	✅	1.39	0.04316
ARawNet	Mel-Spectrogram	ECAPA-TDNN	-	2.11	0.06425
ARawNet	CQT	XVector	✅	1.74	0.05194
ARawNet	CQT	XVector	-	3.39875	0.09510
ARawNet	CQT	ECAPA-TDNN	✅	1.11	0.03645
ARawNet	CQT	ECAPA-TDNN	-	1.72667	0.05077
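The EER column is the operating point at which the CM miss rate equals the false-alarm rate. The sketch below approximates it by picking the observed-score threshold where the two rates are closest and averaging them there. It again assumes that higher scores mean "more bona fide". This is a simple approximation with hypothetical function names. Official ASVspoof scoring tools compute the EER from the full DET curve and should be used for reported numbers.

```python
from typing import Sequence, Tuple


def error_rates(bona: Sequence[float], spoof: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Miss and false-alarm rates when score >= threshold is accepted as bona fide."""
    p_miss = sum(x < threshold for x in bona) / len(bona)
    p_fa = sum(x >= threshold for x in spoof) / len(spoof)
    return p_miss, p_fa


def compute_eer(bona: Sequence[float], spoof: Sequence[float]) -> Tuple[float, float]:
    """Approximate EER as (P_miss + P_fa) / 2 where |P_miss - P_fa| is smallest.

    Returns (eer, threshold); eer is a fraction, so multiply by 100 for %.
    """
    best = None  # (gap, eer, threshold)
    for s in sorted(set(bona) | set(spoof)):
        p_miss, p_fa = error_rates(bona, spoof, s)
        gap = abs(p_miss - p_fa)
        if best is None or gap < best[0]:
            best = (gap, (p_miss + p_fa) / 2, s)
    return best[1], best[2]


# Toy scores for illustration only.
eer, thr = compute_eer([2.0, 3.0, 4.0], [0.0, 1.0, 2.5])
print(f"EER = {eer:.2%} at threshold {thr}")
```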

Main Encoder	Auxiliary Encoder	Parameters	MACs
Rawnet2	-	25.43 M	7.61 GMac
Res2Net	-	0.92 M	1.11 GMac
XVector	✅	5.81 M	2.71 GMac
XVector	-	4.66 M	1.88 GMac
ECAPA-TDNN	✅	7.18 M	3.19 GMac
ECAPA-TDNN	-	6.03 M	2.36 GMac
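The cost of the auxiliary encoder can be read off this table as the difference between the rows with and without it for the same main encoder. The snippet below does that arithmetic on the published numbers. The dictionary layout and the `aux_overhead` name are illustrative only.

```python
from typing import Tuple

# (parameters in millions, GMac) taken from the table above.
TABLE = {
    "XVector": {"with_aux": (5.81, 2.71), "without_aux": (4.66, 1.88)},
    "ECAPA-TDNN": {"with_aux": (7.18, 3.19), "without_aux": (6.03, 2.36)},
}


def aux_overhead(with_aux: Tuple[float, float],
                 without_aux: Tuple[float, float]) -> Tuple[float, float, float, float]:
    """Absolute and relative parameter / MAC cost added by the auxiliary encoder."""
    extra_params = with_aux[0] - without_aux[0]
    extra_macs = with_aux[1] - without_aux[1]
    return (extra_params, extra_macs,
            extra_params / without_aux[0], extra_macs / without_aux[1])


for encoder, row in TABLE.items():
    dp, dm, rp, rm = aux_overhead(row["with_aux"], row["without_aux"])
    print(f"{encoder}: +{dp:.2f} M params ({rp:.1%}), +{dm:.2f} GMac ({rm:.1%})")
```

For both main encoders, the auxiliary encoder adds about 1.15 M parameters and 0.83 GMac. This is well below Rawnet2's 25.43 M parameters and 7.61 GMac.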

Cite Our Paper

If you use this repository, please consider citing:

@inproceedings{Teng2021ComplementingHF, title={Complementing Handcrafted Features with Raw Waveform Using a Light-weight Auxiliary Model}, author={Zhongwei Teng and Quchen Fu and Jules White and M. Powell and Douglas C. Schmidt}, year={2021} }

@inproceedings{Fu2021FastAudioAL, title={FastAudio: A Learnable Audio Front-End for Spoof Speech Detection}, author={Quchen Fu and Zhongwei Teng and Jules White and M. Powell and Douglas C. Schmidt}, year={2021} }
