Common Voice Dataset explorer

Last update: Nov 16, 2022

Overview

Common Voice Dataset Explorer

The Common Voice dataset is published by Mozilla.

Made during the Hugging Face fine-tuning week.

Usage

pip install -r requirements.txt

streamlit run common_voice.py

Details

Made using Streamlit.
Uses https://github.com/PablocFonseca/streamlit-aggrid for interactivity, because Streamlit plots can't be clicked yet.
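
The grid-instead-of-clickable-plot idea can be sketched roughly as follows. This is a minimal illustration, not the code in common_voice.py: the sample rows, the helper names (clips_per_locale, filter_by_locale, render_app), and the choice of "locale" as the grouping column are all assumptions made for the example. The UI part assumes streamlit, streamlit-aggrid, and pandas are installed.

```python
from collections import Counter

# Hypothetical sample rows. The field names resemble Common Voice metadata
# columns, but the values are made up for illustration.
SAMPLE_ROWS = [
    {"locale": "tr", "sentence": "Merhaba", "up_votes": 2},
    {"locale": "tr", "sentence": "Evet", "up_votes": 3},
    {"locale": "en", "sentence": "Hello world", "up_votes": 4},
]


def clips_per_locale(rows):
    """Return one summary row per locale with its clip count."""
    counts = Counter(row["locale"] for row in rows)
    return [{"locale": loc, "clips": n} for loc, n in sorted(counts.items())]


def filter_by_locale(rows, locale):
    """Return only the rows that belong to the given locale."""
    return [row for row in rows if row["locale"] == locale]


def render_app(rows=SAMPLE_ROWS):
    """Draw a selectable summary grid and the rows behind the selection."""
    # Imported lazily so the helpers above work without the UI packages.
    import pandas as pd
    import streamlit as st
    from st_aggrid import AgGrid, GridOptionsBuilder

    summary = pd.DataFrame(clips_per_locale(rows))
    builder = GridOptionsBuilder.from_dataframe(summary)
    builder.configure_selection(selection_mode="single")
    response = AgGrid(summary, gridOptions=builder.build())

    # Depending on the streamlit-aggrid version, the selection may be None,
    # a list of dicts, or a DataFrame, so all three cases are handled.
    selected = response["selected_rows"]
    if selected is None or len(selected) == 0:
        st.info("Select a row in the grid to see its sentences.")
        return
    first = selected[0] if isinstance(selected, list) else selected.iloc[0]
    st.dataframe(pd.DataFrame(filter_by_locale(rows, first["locale"])))
```

To try it, add a call to render_app() at the end of a script and launch that script with streamlit run. Clicking a row in the summary grid plays the role that clicking a bar in a plot would otherwise play.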

I put this together as quickly as I could, so it is not perfect.

Feel free to open a PR or an issue.

Owner

Ceyda Cinarel

AI researcher & engineer~ ♥ all things NLP 🤖 generative models ★ like trying out new libraries & tools ♥ Python

Uses a context-free grammar formalism to parse English sentences and determine their structure, helping a computer better understand their meaning.

Sentence Parser. Executing the program: make sure Python 3.6+ is installed. Install requirements ($ pip install -r requirements.txt). Run the program:

12 Sep 28, 2022

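2021 second-semester data crawling final project

Notice. Topic: a job posting scheduler built with web crawling. Schedule: choose the topic; coding; explain the core code + plan the slide structure // Sat 12/4; create the slides + script + recording // ~12/10 to 12/11, Fri to Sat; video editing // ~Sat 12/11. Web crawler: Saramin_average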

2 Aug 16, 2022

This repository details the steps in creating a Part of Speech tagger using Trigram Hidden Markov Models and the Viterbi Algorithm without using external libraries.

POS-Tagger This repository details the creation of a Part-of-Speech tagger using Trigram Hidden Markov Models to predict word tags in a word sequence.

1 Dec 09, 2021

KLUE-baseline contains the baseline code for the Korean Language Understanding Evaluation (KLUE) benchmark.

KLUE Baseline Korean(한국어) KLUE-baseline contains the baseline code for the Korean Language Understanding Evaluation (KLUE) benchmark. See our paper fo

74 Dec 13, 2022

A sample project that exists for PyPUG's "Tutorial on Packaging and Distributing Projects"

A sample Python project A sample project that exists as an aid to the Python Packaging User Guide's Tutorial on Packaging and Distributing Projects. T

4.5k Dec 30, 2022

SinglepassTextCluster, a text clustering tool based on the single-pass clustering algorithm that uses TF-IDF vectors and doc2vec, which can be used for clustering your own real-time corpus. It is a small automatic text clustering component with two built-in text vectorization methods (TF-IDF and doc2vec) that outputs the number of clusters, the documents in each cluster, and the cluster sizes.

Project background: SinglepassTextCluster, a TextCluster tool based on the single-pass clustering algorithm that uses TF-IDF vectors and doc2vec, which can be used for individ

34 Dec 18, 2022

Healthsea is a spaCy pipeline for analyzing user reviews of supplementary products for their effects on health.

Welcome to Healthsea ✨ Create better access to health with spaCy. Healthsea is a pipeline for analyzing user reviews to supplement products by extract

75 Dec 19, 2022

Code for the paper TestRank: Bringing Order into Unlabeled Test Instances for Deep Learning Tasks

TestRank in Pytorch Code for the paper TestRank: Bringing Order into Unlabeled Test Instances for Deep Learning Tasks by Yu Li, Min Li, Qiuxia Lai, Ya

3 May 19, 2022

The under-the-hood workings of transformers, fine-tuning GPT-3 models, DeBERTa, vision models, and the start of the Metaverse, using a variety of NLP platforms: Hugging Face, OpenAI API, Trax, and AllenNLP

150 Dec 23, 2022

SDL: Synthetic Document Layout dataset

Pipeline for quickly building TF-IDF + LogReg text classification baselines.

MicBot - MicBot uses Google Translate to speak everyone's chat messages

A library for Multilingual Unsupervised or Supervised word Embeddings

GSoC'2021 | TensorFlow implementation of Wav2Vec2

Community and sentiment analysis based on tweets

Text Normalization

Reproduces the supervised and unsupervised training methods from the SimCSE paper using PyTorch + transformers

Framework for fine-tuning pretrained transformers for Named-Entity Recognition (NER) tasks

Uses PaddlePaddle to reproduce the paper mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer

InferSent sentence embeddings