TalkNet: Audio-visual Active Speaker Detection Model

Overview

Is someone talking? TalkNet: an audio-visual active speaker detection model

This repository contains the code for our ACM MM 2021 paper on TalkNet, an active speaker detection (ASD) model that detects whether the face on screen is speaking. [Paper] [Video_English] [Video_Chinese].

[Figure: overall.png]

  • Awesome ASD: Papers on active speaker detection from recent years.

  • TalkNet in AVA-Activespeaker dataset: Code to preprocess the AVA-ActiveSpeaker dataset, train TalkNet on the AVA train set, and evaluate it on the AVA val/test sets.

  • TalkNet in TalkSet and Columbia ASD dataset: Code to generate TalkSet, an in-the-wild ASD dataset based on VoxCeleb2 and LRS3, train TalkNet on TalkSet, and evaluate it on the Columbia ASD dataset.

  • An ASD Demo with pretrained TalkNet model: An end-to-end script that detects and marks the speaking face using the pretrained TalkNet model.


Dependencies

Starting from a new environment

conda create -n TalkNet python=3.7.9 anaconda
conda activate TalkNet
pip install -r requirement.txt

Starting from an existing environment

pip install -r requirement.txt

TalkNet in AVA-Activespeaker dataset

Data preparation

The following script can be used to download and prepare the AVA dataset for training.

python trainTalkNet.py --dataPathAVA AVADataPath --download 

AVADataPath is the folder where you want to save the AVA dataset and its preprocessing outputs; the details can be found here. Please read them carefully.

Training

Then you can train TalkNet on AVA end-to-end with:

python trainTalkNet.py --dataPathAVA AVADataPath

Outputs:
  • exps/exps1/score.txt: output score file
  • exps/exp1/model/model_00xx.model: trained model
  • exps/exps1/val_res.csv: predictions for the val set
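To resume or evaluate from the most recent checkpoint, it can help to pick the highest-numbered model file automatically. The sketch below assumes only the naming pattern shown above (model_ followed by a zero-padded number, then .model); the helper name latest_checkpoint and the directory passed to it are hypothetical, not part of the repository.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed pattern, based on "model_00xx.model" above: a zero-padded number.
CKPT_PATTERN = re.compile(r"^model_(\d+)\.model$")


def latest_checkpoint(model_dir: str) -> Optional[Path]:
    """Return the checkpoint with the highest number in model_dir, or None."""
    directory = Path(model_dir)
    if not directory.is_dir():
        return None
    best_num, best_path = -1, None
    for path in directory.iterdir():
        match = CKPT_PATTERN.match(path.name)
        # Compare numerically so model_0012 beats model_0003.
        if match and int(match.group(1)) > best_num:
            best_num, best_path = int(match.group(1)), path
    return best_path


if __name__ == "__main__":
    # Hypothetical location; point this at your own experiment folder.
    print(latest_checkpoint("exps/exp1/model"))
```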

Pretrained model

Our pretrained model achieves 92.3 mAP on the validation set; you can check this with:

python trainTalkNet.py --dataPathAVA AVADataPath --evaluation

The pretrained model will be downloaded automatically into TalkNet_ASD/pretrain_AVA.model. It achieves 90.8 mAP on the test set.
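To make the metric concrete, the sketch below computes plain, non-interpolated average precision (AP) for the binary speaking / not-speaking decision in pure Python. It is only an illustration: the mAP numbers above come from the repository's own evaluation, which may rank, interpolate, or aggregate differently, and average_precision is a hypothetical helper name.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP for one binary class.

    scores: predicted speaking scores (higher = more likely speaking).
    labels: ground truth, 1 = speaking, 0 = not speaking.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    num_pos = sum(labels)
    if num_pos == 0:
        raise ValueError("AP is undefined without positive samples")
    # Rank samples from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    return precision_sum / num_pos


if __name__ == "__main__":
    # Toy inputs for illustration only.
    print(round(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]), 4))  # 0.8333
```

With a single binary class, mAP reduces to this AP; averaging over several classes would simply take the mean of per-class AP values.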


TalkNet in TalkSet and Columbia ASD dataset

Data preparation

We found it challenging to apply the model trained on AVA to videos outside AVA (the reason is explained here, Q1). So we built TalkSet, an in-the-wild active speaker detection dataset based on VoxCeleb2 and LRS3.

We do not plan to upload this dataset, since we only modified existing datasets rather than building it from scratch. In the TalkSet folder we provide .txt files that describe which files we used to generate TalkSet and their ASD labels. You can generate TalkSet yourself if you are interested in training an ASD model in the wild.

We also provide our TalkNet model pretrained on TalkSet. You can evaluate it on the Columbia ASD dataset or on other raw videos in the wild.

Usage

The model pretrained on TalkSet will be downloaded into TalkNet_ASD/pretrain_TalkSet.model when you run the following script:

python demoTalkNet.py --evalCol --colSavePath colDataPath

The Columbia ASD dataset and its labels will also be downloaded into colDataPath. Finally, you should get the following F1 results:

Speaker  Bell  Boll  Lieb  Long  Sick  Avg.
F1       98.1  88.8  98.7  98.0  97.7  96.3

(These results differ from those in our paper because we retrained the model, but the average F1 is very similar.)
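The Avg. column appears to be the unweighted mean of the five per-speaker F1 scores: (98.1 + 88.8 + 98.7 + 98.0 + 97.7) / 5 = 96.26, which rounds to 96.3. The sketch below reproduces that arithmetic and shows the textbook F1 formula. f1_score is a hypothetical helper; how the evaluation script counts true and false positives per speaker is not shown in this README.

```python
from statistics import mean


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2 * precision * recall / (precision + recall), in the 0..1 range."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-speaker F1 values copied from the table above.
per_speaker_f1 = {"Bell": 98.1, "Boll": 88.8, "Lieb": 98.7, "Long": 98.0, "Sick": 97.7}

if __name__ == "__main__":
    print(round(mean(list(per_speaker_f1.values())), 1))  # 96.3
```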


An ASD Demo with pretrained TalkNet model

Data preparation

We provide an end-to-end script that detects and extracts the active speaker from a raw video using our model pretrained on TalkSet.

Put the raw video (both .mp4 and .avi are supported) into the demo folder, e.g. 001.mp4.
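If the demo folder holds several videos, a small helper can list the names to pass to --videoName. This sketch assumes, as in the 001.mp4 / --videoName 001 example, that the name is the file name without its extension; list_demo_videos and demo_command are hypothetical helpers that only build the command, which you could then run with subprocess.run.

```python
from pathlib import Path
from typing import List

# Extensions the demo is described as accepting.
VIDEO_EXTENSIONS = {".mp4", ".avi"}


def list_demo_videos(demo_dir: str = "demo") -> List[str]:
    """Return sorted video names (file stems) in demo_dir, e.g. ["001"]."""
    directory = Path(demo_dir)
    if not directory.is_dir():
        return []
    # is_file() skips the per-video output folders the demo creates.
    return sorted(
        p.stem
        for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
    )


def demo_command(video_name: str) -> List[str]:
    """Build the demo command for one video name."""
    return ["python", "demoTalkNet.py", "--videoName", video_name]


if __name__ == "__main__":
    for name in list_demo_videos():
        print(" ".join(demo_command(name)))
```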

Usage

python demoTalkNet.py --videoName 001

The model pretrained on TalkSet will be downloaded into TalkNet_ASD/pretrain_TalkSet.model. The structure of the output results is described here.

The output video demo/001/pyavi/video_out.avi marks the active speaker with a green box and non-active speakers with red boxes.
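The green/red marking can be sketched as a per-frame threshold on a speaking score. Everything here is an assumption for illustration, not the demo's actual code: the helper names, the threshold of 0.0, and the centered moving average used to smooth frame scores before thresholding (a common way to suppress single-frame flicker). Colours are BGR tuples, the channel order that OpenCV drawing functions expect.

```python
from typing import List, Sequence, Tuple

Colour = Tuple[int, int, int]

# BGR order, as used by OpenCV drawing functions.
GREEN_BGR: Colour = (0, 255, 0)
RED_BGR: Colour = (0, 0, 255)


def smooth_scores(scores: Sequence[float], half_window: int = 2) -> List[float]:
    """Centered moving average; windows are truncated at the sequence edges."""
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half_window)
        hi = min(len(scores), i + half_window + 1)
        window = scores[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def box_colours(
    scores: Sequence[float], threshold: float = 0.0, half_window: int = 2
) -> List[Colour]:
    """Green for frames judged as speaking, red otherwise (assumed rule)."""
    return [
        GREEN_BGR if s >= threshold else RED_BGR
        for s in smooth_scores(scores, half_window)
    ]


if __name__ == "__main__":
    frame_scores = [-1.5, -0.2, 0.8, 1.2, 0.9, -0.7]  # toy values
    print(box_colours(frame_scores))
```

With OpenCV installed, each colour could be passed as the color argument of cv2.rectangle when drawing the face box on the corresponding frame.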


Citation

Please cite the following if our paper or code is helpful to your research.

@article{tao2021TalkNet,
  title={Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection},
  author={Ruijie Tao and Zexu Pan and Rohan Kumar Das and Xinyuan Qian and Mike Zheng Shou and Haizhou Li},
  journal={ACM Multimedia (MM)},
  year={2021}
}

I have summarized some potential FAQs. This is my first open-source project, so please let me know how I can further improve this repository. Thanks for your support!

Owner
NUS ECE PhD student