MicRank is a Learning to Rank neural channel selection framework where a DNN is trained to rank microphone channels.

Last update: Nov 10, 2022

Overview

MicRank: Learning to Rank Microphones for Distant Speech Recognition

Application Scenario

Many applications nowadays envision the presence of multiple heterogeneous recording devices (e.g. Microsoft Project Denmark, the CHiME-5, CHiME-6 and Voices from a Distance challenges, the DIRHA project, etc.).

Audio signals captured by different microphones can be combined at the front-end level using beamforming techniques. However, this combination can be very challenging because, in an ad-hoc microphone network, microphones can be very far from each other. Moreover, some could be close to noise sources or, for a particular utterance, too far from the speaker to be of any use. To further complicate things, synchronization issues may appear.

An alternative approach is to select only the best microphone for each utterance, or only a promising subset of microphones for beamforming or ROVER combination, thus potentially saving resources and/or improving results by excluding "bad" channels. This can be done with automatic Channel Selection or Channel Ranking algorithms.
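The selection step described above can be sketched in a few lines. This is only an illustration, not code from this repository: the function name `select_channels`, the channel ids and the score values are all hypothetical, and the scores are assumed to come from some channel ranking method where a higher value means a more promising channel.

```python
from typing import Dict, List


def select_channels(scores: Dict[str, float], k: int = 1) -> List[str]:
    """Return the k channel ids with the highest scores, best first."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Sort by descending score; break ties by channel id for determinism.
    ranked = sorted(scores, key=lambda ch: (-scores[ch], ch))
    return ranked[:k]


# Hypothetical per-utterance scores (higher = more promising channel).
utterance_scores = {"mic_A": 0.12, "mic_B": 0.87, "mic_C": 0.55, "mic_D": 0.31}

best = select_channels(utterance_scores, k=1)    # single best microphone
subset = select_channels(utterance_scores, k=2)  # subset for beamforming/ROVER
print(best, subset)
```

With `k=1` this is plain channel selection; with a larger `k` the returned subset could be passed on to a beamformer or to ROVER, leaving the low-scoring channels out.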

What is MicRank

MicRank is a Learning to Rank neural channel selection framework where a DNN is trained to rank microphone channels based on ASR back-end performance or any other metric or back-end task (e.g. STOI, if one wishes to rank microphones based on speech intelligibility).
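To make the idea of training a ranker from back-end performance more concrete, here is a minimal sketch of a generic pairwise (RankNet-style) Learning to Rank objective for one utterance. It is an assumption-laden illustration and not necessarily the loss used in MicRank: the function name `pairwise_ranking_loss` is hypothetical, the per-channel word error rates are made-up values, and a real training setup would compute this in a deep learning framework on the DNN outputs.

```python
import math
from itertools import combinations
from typing import Sequence


def pairwise_ranking_loss(scores: Sequence[float], wers: Sequence[float]) -> float:
    """RankNet-style loss averaged over all channel pairs of one utterance.

    scores: model outputs, one per channel (higher = predicted better).
    wers:   back-end error per channel (lower = actually better).
    """
    if len(scores) != len(wers):
        raise ValueError("scores and wers must have the same length")
    total, n_pairs = 0.0, 0
    for i, j in combinations(range(len(scores)), 2):
        if wers[i] == wers[j]:
            continue  # tied pairs carry no ordering information
        # Orient the pair so that 'better' is the channel with the lower WER.
        better, worse = (i, j) if wers[i] < wers[j] else (j, i)
        margin = scores[better] - scores[worse]
        # Numerically stable log(1 + exp(-margin)); small when the order is right.
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
        n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


# Hypothetical utterance with three channels and their WERs (in percent).
wers = [10.0, 30.0, 60.0]
print(pairwise_ranking_loss([2.0, 0.0, -2.0], wers))  # scores agree with WER order
print(pairwise_ranking_loss([-2.0, 0.0, 2.0], wers))  # scores reversed: larger loss
```

Because the loss only looks at per-channel scores and a per-channel target metric, the WER could be swapped for any other figure of merit (for instance STOI, with the comparison direction flipped), and no assumption is made about array geometry or synchronization between channels.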

It is agnostic with respect to the array geometry and type of recognition back-end and it does not require sample-level synchronization between devices.

Remarkably, it considerably improves over previous selection techniques, reaching performance comparable to, and in some instances better than, oracle signal-based measures such as PESQ, STOI or SDR. This is achieved with a very small model of only 266k learnable parameters, making the method much more computationally efficient than decoder-based or posterior-based channel selection methods.

LibriAdHoc Synthetic Dataset Recipe

Coming Soon

Citing MicRank

If you found this code useful, please cite:

@misc{cornell2021learning,
      title={Learning to Rank Microphones for Distant Speech Recognition}, 
      author={Samuele Cornell and Alessio Brutti and Marco Matassoni and Stefano Squartini},
      year={2021},
      eprint={2104.02819},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}

Owner

Samuele Cornell
