In this repository, I have developed an end to end Automatic speech recognition project. I have developed the neural network model for automatic speech recognition with PyTorch and used MLflow to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry.

Overview

End to End Automatic Speech Recognition

architecture

In this repository, I have developed an end to end Automatic speech recognition project. I have developed the neural network model for automatic speech recognition with PyTorch and used MLflow to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry. The Neural Acoustic model is built with reference to the DeepSpeech2 model, but not the exact DeepSpeach2 model or the DeepSpeech model as mentioned in their respective research papers.

Technologies used:

  1. MLflow.mlflow
    • to manage the ML lifecycle.
    • to track and compare model performance in the ml lifecyle.
    • experimentation, reproducibility, deployment, and a central model registry.
  2. Pytorch.pytorch
    • The Acoustic Neural Network is implemented with pytorch.
    • torchaudio for feature extraction and data pre-processing.

Speech Recognition Pipeline

architecture1

Dataset

In this project, the LibriSpeech dataset has been used to train and validate the model. It has audio data for input and text speech for the respective audio to be predicted by our model. Also, I have used a subset of 2000 files from the training and test set of the LibriSpeech dataset for faster training and validation over limited GPU power and usage limit.

Pre-Processing

In this process Torchaudio has been used to extract the waveform and sampling rate from the audiofile. Then have been used MFCC(Mel-frequency cepstrum coefficients) for feature extraction from the waveform. MelSpectogram, Spectogram and Frequency Masking could also be used in this case for feature exxtraction.

Acoustic Model architecture.

The Neural Network architecture consist of Residul-CNN blocks, BidirectionalGRU blocks, and fully connected Linear layers for final classification. From the input layer, we have two Residual CNN blocks(in sequential) with batch normalization, followed by a fully connected layer, hence connecting it to three bi-directional GRU blocks(in sequential) and finally fully connected linear layers for classification.

CTC(Connectionist Temporal Classification) Loss as the base loss function for our model and AdamW as the optimizer.

Decoding

We have used Greedy Decoder which argmax's the output of the Neural Network and transforms it into text through character mapping.

ML Lifecycle Pipeline

architecture2
We start by initializing the mlflow server where we need to specify backend storage, artifact uri, host and the port. Then we create an experiment, start a run within the experiment which inturn tracks the training and validation loss. Then we save the model followed by registring it and further use the registered model for deployment over the production.

Implementation

First we need to initialize the mlflow server.

mlflow run -e server . 

To start the server in a non-conda environmnet

mlflow run -e server . --no-conda

the server could also be initialized directly from the terminal by the following command. But for this the tracking uri need to be set manually.

mlflow server \
--backend-store-uri sqlite:///mlflow.db \
--default-artifact-root ./mlruns \
--host 127.0.0.1

Then we need to start the model training.

mlflow run -e train --experiment-name "SpeechRecognition" . -P epoch=20 -P batch=32

To train in non-conda environment.

mlflow run -e train --experiment-name "SpeechRecognition" . -P epoch=20 -P batch=32 --no-conda

To train the model through python command.

python main.py --epoch=20 --batch=20

This command functions the same as the above mlflow commands. It's just that I was facing some issues or bugs while running with mlflow command which worked prefectly fine while running with the python command.

Trained model performance

trainloss
testloss
lr
wer
cer
Now its time to validate the registered model. Enter the registered model name with respective model stage and version and file_id of the LibriSpeech dataset Test file.

mlflow run -e validate . -P train=False -P registered_model=SpeechRecognitionModel -P model_stage=Production file_id=1089-134686-0000
python main.py --train=False --registered_model=SpeechRecognitionModel --model_stage=Production --file_id=1089-134686-0000

Dashboard

dashboard

Registered model

regmodel

Artifacts

artifacts

Results

testresult

Target: she spoke with a sudden energy which partook of fear and passion and flushed her thin cheek and made her languid eyes flash
Predicted: she spot with a sudn inderge which pert huopk obeer an pasion amd hust her sting cheek and mad herlang wld ise flush
Target: we look for that reward which eye hath not seen nor ear heard neither hath entered into the heart of man
Predicted: we look forthat rewrd which i havt notse mor iear herd meter hat entere incs the hard oftmon
Target: there was a grim smile of amusement on his shrewd face
Predicted: there was a grim smiriel of a mise men puisoreud face
Target: if this matter is not to become public we must give ourselves certain powers and resolve ourselves into a small private court martial
Predicted: if this motere is not to mecome pubotk we mestgoeourselv certan pouors and resal orselveent a srmall pribut court nmatheld
Taarget: no good my dear watson
Predicted: no good my deare otsen 
Target: well she was better though she had had a bad night
Predicted: all she ws bhatter thu shu oid hahabaut night 
Target: the air is heavy the sea is calm
Predicted: the ar is haavyd the see is coomd 
Target: i left you on a continent and here i have the honor of finding you on an island
Predicted: i left you n a contonent and herei hafe the aner a find de youw on an ihalnd 
Target: the young man is in bondage and much i fear his death is decreed
Predicted: th young manis an bondage end much iffeer his dethis de creed 
Target: hay fever a heart trouble caused by falling in love with a grass widow
Predicted: hay fever ahar trbrl cawaese buy fallling itlelov wit the gressh wideo
Target: bravely and generously has he battled in my behalf and this and more will i dare in his service
Predicted: bravly ansjenereusly has he btaoled and miy ba hah andthis en morera welig darind his serves 

Future Scopes

  • There are other Neural Network models like Wav2Vec, Jasper which also be used and tested against for better model performance.
  • This is not a real-time automatic speech recognition project, where human speech would be decoded to text in real-time like in Amazon Alexa and Google Assistant. It takes the audio file as input and returns predicted speech. So, this could be taken to further limits by developing it into real-time automatic speech recognition.
  • The entire project has been done for local deployment. For the Productionisation of the model and datasets AWS s3 bucket and Microsoft Azure could be used, Kubernetes would also serve as a better option for the Productionisation of the model.
Owner
Victor Basu
Hello! I am Data Scientist and I love to do research on Data Science and Machine Learning
Victor Basu
⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x using fastT5.

Reduce T5 model size by 3X and increase the inference speed up to 5X. Install Usage Details Functionalities Benchmarks Onnx model Quantized onnx model

Kiran R 399 Jan 05, 2023
BookNLP, a natural language processing pipeline for books

BookNLP BookNLP is a natural language processing pipeline that scales to books and other long documents (in English), including: Part-of-speech taggin

654 Jan 02, 2023
Transformer Based Korean Sentence Spacing Corrector

TKOrrector Transformer Based Korean Sentence Spacing Corrector License Summary This solution is made available under Apache 2 license. See the LICENSE

Paul Hyung Yuel Kim 3 Apr 18, 2022
Python generation script for BitBirds

BitBirds generation script Intro This is published under MIT license, which means you can do whatever you want with it - entirely at your own risk. Pl

286 Dec 06, 2022
Various Algorithms for Short Text Mining

Short Text Mining in Python Introduction This package shorttext is a Python package that facilitates supervised and unsupervised learning for short te

Kwan-Yuet 466 Dec 06, 2022
STT for TorchScript is a port of Coqui STT based on DeepSpeech to PyTorch.

st3 STT for TorchScript is a port of Coqui STT based on DeepSpeech to PyTorch. Currently it supports converting pbmm models to pt scripts with integra

Vlad Ki 8 Oct 18, 2021
Machine learning models from Singapore's NLP research community

SG-NLP Machine learning models from Singapore's natural language processing (NLP) research community. sgnlp is a Python package that allows you to eas

AI Singapore | AI Makerspace 21 Dec 17, 2022
Open source code for AlphaFold.

AlphaFold This package provides an implementation of the inference pipeline of AlphaFold v2.0. This is a completely new model that was entered in CASP

DeepMind 9.7k Jan 02, 2023
GCRC: A Gaokao Chinese Reading Comprehension dataset for interpretable Evaluation

GCRC GCRC: A New Challenging MRC Dataset from Gaokao Chinese for Explainable Eva

Yunxiao Zhao 5 Nov 04, 2022
超轻量级bert的pytorch版本,大量中文注释,容易修改结构,持续更新

bert4pytorch 2021年8月27更新: 感谢大家的star,最近有小伙伴反映了一些小的bug,我也注意到了,奈何这个月工作上实在太忙,更新不及时,大约会在9月中旬集中更新一个只需要pip一下就完全可用的版本,然后会新添加一些关键注释。 再增加对抗训练的内容,更新一个完整的finetune

muqiu 317 Dec 18, 2022
Deep Learning for Natural Language Processing - Lectures 2021

This repository contains slides for the course "20-00-0947: Deep Learning for Natural Language Processing" (Technical University of Darmstadt, Summer term 2021).

0 Feb 21, 2022
Crowd sourced training data for Rasa NLU models

NLU Training Data Crowd-sourced training data for the development and testing of Rasa NLU models. If you're interested in grabbing some data feel free

Rasa 169 Dec 26, 2022
📔️ Generate a text-based journal from a template file.

JGen 📔️ Generate a text-based journal from a template file. Contents Getting Started Example Overview Usage Details Reserved Keywords Gotchas Getting

Harrison Broadbent 21 Sep 25, 2022
All the code I wrote for Overwatch-related projects that I still own the rights to.

overwatch_shit.zip This is (eventually) going to contain all the software I wrote during my five-year imprisonment stay playing Overwatch. I'll be add

zkxjzmswkwl 2 Dec 31, 2021
PUA Programming Language written in Python.

pua-lang PUA Programming Language written in Python. Installation git clone https://github.com/zhaoyang97/pua-lang.git cd pua-lang pip install . Try

zy 4 Feb 19, 2022
Auto-researching tool generating word documents.

About ResearchTE automates researching by generating document with answers to given questions. Supports getting results from: Google DuckDuckGo (with

1 Feb 14, 2022
Some embedding layer implementation using ivy library

ivy-manual-embeddings Some embedding layer implementation using ivy library. Just for fun. It is based on NYCTaxiFare dataset from kaggle (cut down to

Ishtiaq Hussain 2 Feb 10, 2022
⛵️The official PyTorch implementation for "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing" (EMNLP 2020).

BERT-of-Theseus Code for paper "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing". BERT-of-Theseus is a new compressed BERT by progre

Kevin Canwen Xu 284 Nov 25, 2022
A python framework to transform natural language questions to queries in a database query language.

__ _ _ _ ___ _ __ _ _ / _` | | | |/ _ \ '_ \| | | | | (_| | |_| | __/ |_) | |_| | \__, |\__,_|\___| .__/ \__, | |_| |_| |___/

Machinalis 1.2k Dec 18, 2022
Maix Speech AI lib, including ASR, chat, TTS etc.

Maix-Speech 中文 | English Brief Now only support Chinese, See 中文 Build Clone code by: git clone https://github.com/sipeed/Maix-Speech Compile x86x64 c

Sipeed 267 Dec 25, 2022