This repo contains simple-to-use, pretrained/training-less models for speaker diarization.

Last update: Jan 20, 2022

PyDiar

This repo contains simple-to-use, pretrained/training-less models for speaker diarization.

Supported Models

Binary Key Speaker Modeling

Based on pyBK by Jose Patino, which implements the diarization system from "The EURECOM submission to the first DIHARD Challenge" by Jose Patino, Héctor Delgado, and Nicholas Evans.

If you have any other models you would like to see added, please open an issue.

Usage

This library seeks to provide a very basic interface. To use the Binary Key model on a file, do something like this:

import numpy as np
from pydiar.models import BinaryKeyDiarizationModel, Segment
from pydiar.util.misc import optimize_segments
from pydub import AudioSegment

INPUT_FILE = "test.wav"

sample_rate = 32000
# Load the audio, then resample to the target rate and downmix to mono
audio = AudioSegment.from_wav(INPUT_FILE)
audio = audio.set_frame_rate(sample_rate)
audio = audio.set_channels(1)

diarization_model = BinaryKeyDiarizationModel()
segments = diarization_model.diarize(
    sample_rate, np.array(audio.get_array_of_samples())
)
optimized_segments = optimize_segments(segments)

Now optimized_segments contains a list of segments, each with its start, length, and speaker id.
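As a rough illustration of working with such a list, the sketch below totals the speaking time per speaker. The Segment class here is a hypothetical stand-in, not pydiar's own class: the field names start, length and speaker_id, and the use of seconds as the unit, are assumptions based on the description above, so check them against the library before relying on this.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical stand-in for pydiar's Segment; field names are assumed.
    start: float      # assumed: seconds from the start of the audio
    length: float     # assumed: duration in seconds
    speaker_id: int


def speaking_time(segments):
    """Return the summed segment length for each speaker id."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg.speaker_id] += seg.length
    return dict(totals)


# Made-up sample data standing in for the output of optimize_segments()
segments = [
    Segment(start=0.0, length=2.5, speaker_id=0),
    Segment(start=2.5, length=4.0, speaker_id=1),
    Segment(start=6.5, length=1.5, speaker_id=0),
]
print(speaking_time(segments))
```

With real output, the same function could be called on optimized_segments directly, provided the attribute names match.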

Example

A simple script that reads an audio file, diarizes it, and transcribes it into the WebVTT format can be found in examples/generate_webvtt.py. To use it, download a Vosk model from https://alphacephei.com/vosk/models and then run the script using:

poetry install
poetry run python -m examples.generate_webvtt -i PATH/TO/INPUT.wav -m PATH/TO/VOSK_MODEL
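To give a sense of what the WebVTT output involves, here is a minimal sketch that turns diarized, transcribed cues into a WebVTT document. It is not the code from examples/generate_webvtt.py: the helper names format_timestamp and to_webvtt, the tuple layout, the "Speaker N" labels, and the assumption that start and length are in seconds are all illustrative, and the transcript text is a placeholder for what a speech recognizer would produce.

```python
def format_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues):
    """Build a WebVTT document from (start, length, speaker_id, text) tuples.

    start and length are assumed to be in seconds.
    """
    lines = ["WEBVTT", ""]
    for start, length, speaker_id, text in cues:
        end = start + length
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        # A WebVTT voice tag attributes the cue text to a speaker
        lines.append(f"<v Speaker {speaker_id}>{text}")
        lines.append("")
    return "\n".join(lines)


# Placeholder cues; real text would come from the transcription step
cues = [
    (0.0, 2.5, 0, "Hello, thanks for joining."),
    (2.5, 4.0, 1, "Happy to be here."),
]
print(to_webvtt(cues))
```

Working in integer milliseconds before splitting into hours, minutes and seconds avoids rounding a value such as 59.9996 s into an invalid "60.000" seconds field.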
