Scribosermo STT Setup

Scribosermo is an LGPL-licensed, open-source speech recognition engine to "Train fast Speech-to-Text networks in different languages".

Evaluation tests for the German language suggest that it is currently one of the fastest and most accurate open-source STT systems.

This repository aims to offer build scripts to run and test Scribosermo on different platforms, focusing on Raspberry Pi SBCs. The ultimate goal is to build a module for the SEPIA STT-Server.

Test Scribosermo

The easiest way to get started is to build and use the Docker container:

1. Use the scripts inside the build folder. Tested on aarch64 and amd64 platforms.
2. Download a model. Check the tests folder for more information and licenses.
3. Put the model inside a folder and share this folder with your Docker container, e.g. use a run flag similar to: -v my/model/folder:/home/admin/scribosermo-stt-setup/tests/model
4. Run the container. It will automatically call the Python test script testing_tflite.py.
NOTE: The Python test script is currently configured to use German. You may need to modify it if you change the model or language.
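
The container steps above could be sketched roughly as follows. This is a minimal dry-run sketch, not the repository's own script: the image tag `scribosermo-stt-setup:latest` and the default host folder `model` are assumptions for illustration, and only the mount target inside the container is taken from the instructions above. The host path is made absolute because Docker bind mounts generally expect one.

```shell
#!/bin/sh
# Hypothetical image tag; replace with the tag produced by the build scripts.
IMAGE_TAG="${IMAGE_TAG:-scribosermo-stt-setup:latest}"
# Host folder holding the downloaded model (assumed default location).
MODEL_DIR="${MODEL_DIR:-$PWD/model}"
# Mount target inside the container, as given in the steps above.
CONTAINER_MODEL_DIR="/home/admin/scribosermo-stt-setup/tests/model"

# Turn a relative host path into an absolute one.
case "$MODEL_DIR" in
    /*) ;;
    *) MODEL_DIR="$PWD/$MODEL_DIR" ;;
esac

MOUNT_SPEC="${MODEL_DIR}:${CONTAINER_MODEL_DIR}"

# Print the command for review; set RUN=1 to actually start the container.
printf 'docker run --rm -v "%s" "%s"\n' "$MOUNT_SPEC" "$IMAGE_TAG"
if [ "${RUN:-0}" = "1" ]; then
    docker run --rm -v "$MOUNT_SPEC" "$IMAGE_TAG"
fi
```

Running the script without `RUN=1` only prints the command, which makes it easy to check the mount path before starting the container.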

Build wheels on Debian 10 (the long way)

If you can't find matching Python wheel files for your platform, the following steps might help to fill in the missing parts:

1. Install the required packages: apt-get update && apt-get install -y --no-install-recommends sudo git wget curl nano unzip zip procps build-essential cmake python3-pip python3-dev python3-setuptools python3-wheel python3-venv libsndfile1
2. Install the Rust compiler (might be required): curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
   Then refresh the terminal environment: source $HOME/.cargo/env
3. Create and activate a Python virtual environment: mkdir -p install && cd install && python3 -m venv env && source env/bin/activate
4. Make sure pip is up to date (tested with v21.3.1): pip3 install --upgrade pip
5. Install part 1: pip3 install wheel setuptools setuptools_rust transformers tqdm librosa datasets jiwer
6. Install part 2: pip3 install --extra-index-url https://google-coral.github.io/py-repo/ tflite_runtime
7. Install part 3: pip3 install ds-ctcdecoder==0.10.0a3
8. Create wheels as needed: pip3 wheel [package]
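
The final "create wheels" step could be wrapped in a small loop like the one below. This is a sketch under assumptions: the output folder `wheelhouse`, the `PACKAGES` default, and the `build_wheel` helper are hypothetical names chosen for illustration. Only the `pip3 wheel` command, the extra index URL, and the package names come from the steps above. By default it only prints the commands.

```shell
#!/bin/sh
# Output folder for the built wheels (assumed name).
WHEEL_DIR="${WHEEL_DIR:-wheelhouse}"
# Space-separated requirements to build wheels for (example selection).
PACKAGES="${PACKAGES:-tflite_runtime ds-ctcdecoder==0.10.0a3}"
# Extra index from "Install part 2" above.
EXTRA_INDEX="https://google-coral.github.io/py-repo/"

# build_wheel REQUIREMENT: print the pip command, or run it when RUN=1.
build_wheel() {
    if [ "${RUN:-0}" = "1" ]; then
        pip3 wheel --wheel-dir "$WHEEL_DIR" --extra-index-url "$EXTRA_INDEX" "$1"
    else
        echo "pip3 wheel --wheel-dir $WHEEL_DIR --extra-index-url $EXTRA_INDEX $1"
    fi
}

# Word splitting on $PACKAGES is intentional: one wheel build per requirement.
for pkg in $PACKAGES; do
    build_wheel "$pkg"
done
```

Run it inside the activated virtual environment with `RUN=1` to build the wheels. The resulting files in the output folder can then be copied to another machine with the same platform and Python version.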

Credits

DanBmh - Development and maintenance of Scribosermo
Domcross - German STT evaluation, scripts and packages
SEPIA Framework - Open assistant and STT server stuff

Installation, test and evaluation of Scribosermo speech-to-text engine

Releases(v0.0.1)

v0.0.1(Oct 31, 2021)

Required files/pre-built libraries to install Scribosermo

Owner

Florian Quirin
