Data and code to support "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley)

Last update: Dec 06, 2022

anlp21

Course materials for "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley).

Syllabus: http://people.ischool.berkeley.edu/~dbamman/info256.html

Notebook	Description
1.words/EvaluateTokenizationForSentiment	The impact of tokenization choices on sentiment classification.
1.words/ExploreTokenization	Different methods for tokenizing texts (whitespace, NLTK, spaCy, regex)
1.words/TokenizePrintedBooks	Design a better tokenizer for printed books
1.words/Text_Complexity	Implement type-token ratio and Flesch-Kincaid Grade Level scores for text
2.compare/ChiSquare, Mann-Whitney Tests	Explore two tests for finding distinctive terms
2.compare/Log-odds ratio with priors	Implement the log-odds ratio with an informative (and uninformative) Dirichlet prior
3.dictionaries/DictionaryTimeSeries	Plot sentiment over time using human-defined dictionaries
3.dictionaries/Empath	Explore using Empath dictionaries to characterize texts
4.embeddings/DistributionalSimilarity	Explore the distributional hypothesis to build high-dimensional, sparse representations for words
4.embeddings/WordEmbeddings	Explore word embeddings using Gensim
4.embeddings/Semaxis	Implement SemAxis for scoring terms along a user-defined axis (e.g., positive-negative, concrete-abstract, hot-cold)
4.embeddings/BERT	Explore the basics of token representations in BERT and use it to find token nearest neighbors
4.embeddings/SequenceEmbeddings	Use sequence embeddings to find TV episode summaries most similar to a short description
5.eda/WordSenseClustering	Infer distinct word senses using KMeans clustering over BERT representations
5.eda/Haiku KMeans	Explore text representation in clustering by trying to group haiku and non-haiku poems into two distinct clusters
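
Two of the measures named in the table, the type-token ratio (1.words/Text_Complexity) and the log-odds ratio with a Dirichlet prior (2.compare/Log-odds ratio with priors), can be sketched in a few lines. This is a minimal illustration and not the notebooks' own code: the function names, the whitespace tokenization, and the default `alpha=0.01` are assumptions made here. The prior shown is the uninformative case (the same pseudo-count for every word); an informative prior would instead set each word's pseudo-count from its frequency in a background corpus.

```python
import math
from collections import Counter


def type_token_ratio(tokens):
    """Distinct tokens (types) divided by total tokens; 0.0 for empty input."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def log_odds_z_scores(tokens_a, tokens_b, alpha=0.01):
    """Z-scored log-odds ratio of each word between two corpora.

    Assumes an uninformative Dirichlet prior: every word in the joint
    vocabulary gets the same pseudo-count `alpha` (a hypothetical default).
    Positive scores mark words more characteristic of corpus A,
    negative scores mark words more characteristic of corpus B.
    """
    counts_a = Counter(tokens_a)
    counts_b = Counter(tokens_b)
    vocab = set(counts_a) | set(counts_b)
    if len(vocab) < 2:
        raise ValueError("need at least two distinct words across both corpora")

    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    alpha_0 = alpha * len(vocab)  # total pseudo-count mass of the prior

    scores = {}
    for word in vocab:
        y_a = counts_a[word] + alpha  # smoothed count in corpus A
        y_b = counts_b[word] + alpha  # smoothed count in corpus B
        # Difference of the smoothed log-odds of the word in each corpus.
        delta = (math.log(y_a / (n_a + alpha_0 - y_a))
                 - math.log(y_b / (n_b + alpha_0 - y_b)))
        # Approximate variance of that difference.
        variance = 1.0 / y_a + 1.0 / y_b
        scores[word] = delta / math.sqrt(variance)
    return scores


if __name__ == "__main__":
    text = "the cat sat on the mat".split()
    print("type-token ratio:", type_token_ratio(text))

    positive = "good good film".split()
    negative = "bad bad film".split()
    ranked = sorted(log_odds_z_scores(positive, negative).items(),
                    key=lambda item: item[1], reverse=True)
    for word, z in ranked:
        print(word, round(z, 3))
```

Dividing by the estimated standard deviation means a rare word with a large but noisy log-odds difference is ranked below a frequent word with a smaller but more reliable difference.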

Owner

David Bamman
