Interpretable Models for NLP using PyTorch

Last update: Dec 17, 2022

Overview

This repo is deprecated. Please find the updated package here.

Anuvada: Interpretable Models for NLP using PyTorch

One of the common criticisms of deep learning has been its black-box nature. To address this issue, researchers have developed many ways to visualise and explain the inference. Examples include attention (in the case of RNNs) and activation maps, guided backpropagation and occlusion (in the case of CNNs). This library is an ongoing effort to provide high-level access to such models, built on PyTorch.
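To make the attention idea concrete, here is a minimal plain-Python sketch of why attention weights are readable as an explanation: raw per-token scores are normalised with a softmax, and the resulting weights show how much each token contributed. The tokens, the scores, and the helper name `attention_weights` are made up for illustration and are not part of anuvada; in a trained model the scores would be computed from the RNN hidden states.

```python
import math


def attention_weights(scores):
    """Softmax: turn raw relevance scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


tokens = ["a", "detective", "hunts", "the", "killer"]
# Made-up scores for illustration only (assumption, not model output).
scores = [0.1, 2.0, 1.0, 0.1, 2.5]

weights = attention_weights(scores)

# Print tokens from most to least attended.
for token, weight in sorted(zip(tokens, weights), key=lambda pair: -pair[1]):
    print(f"{token:10s} {weight:.2f}")
```

Plotting these weights over the input text is the usual way attention is visualised.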

Installing

Clone this repo and add it to your Python library path.
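One way to do this without changing your environment permanently is to extend `sys.path` at the top of your script. A minimal sketch, assuming the repo was cloned to a hypothetical `~/src/anuvada` directory (substitute the directory that actually contains the `anuvada` package folder):

```python
import os
import sys

# Hypothetical clone location (assumption); replace with your own path.
ANUVADA_REPO = os.path.expanduser("~/src/anuvada")

# Prepend so this checkout takes precedence over any other copy.
if ANUVADA_REPO not in sys.path:
    sys.path.insert(0, ANUVADA_REPO)
```

Setting the `PYTHONPATH` environment variable to the same directory before starting Python has the same effect.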

Getting started

Importing libraries

import anuvada
import numpy as np
import torch
import pandas as pd

from anuvada.models.classification_attention_rnn import AttentionClassifier

Creating the dataset

from anuvada.datasets.data_loader import CreateDataset
from anuvada.datasets.data_loader import LoadData

data = CreateDataset()

df = pd.read_csv('MovieSummaries/movie_summary_filtered.csv')

# passing only the first 512 samples, I don't have a GPU!
y = list(df.Genre.values)[0:512]
x = list(df.summary.values)[0:512]

x, y = data.create_dataset(x, y, folder_path='test', max_doc_tokens=500)

Loading created dataset

l = LoadData()

x, y, token2id, label2id, lengths_mask = l.load_data_from_path('test')
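The names returned here (`token2id`, `label2id`, `lengths_mask`) suggest the usual preprocessing for RNN text classifiers: map each token to an integer id, pad every document to a fixed length, and keep the true lengths so padding can be ignored. The sketch below illustrates that general idea in plain Python. It is not anuvada's implementation: `build_vocab`, `encode`, the whitespace tokenisation, and the use of id 0 for padding are all assumptions made for illustration.

```python
def build_vocab(docs):
    """Assign each distinct token an integer id; 0 is reserved for padding."""
    token2id = {"<pad>": 0}
    for doc in docs:
        for token in doc.split():
            token2id.setdefault(token, len(token2id))
    return token2id


def encode(doc, token2id, max_doc_tokens):
    """Return (padded ids, true length) for one document."""
    ids = [token2id[t] for t in doc.split()][:max_doc_tokens]  # truncate
    length = len(ids)
    padded = ids + [0] * (max_doc_tokens - length)  # pad on the right
    return padded, length


docs = ["a quiet drama", "a loud action film"]
token2id = build_vocab(docs)
encoded = [encode(d, token2id, max_doc_tokens=5) for d in docs]
```

With equal-length rows like these, a batch can be stacked into a single integer array, and the stored lengths tell the model which positions are real tokens.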

Convert to torch tensors

x = torch.from_numpy(x)

y = torch.from_numpy(y)

Create attention classifier

acf = AttentionClassifier(vocab_size=len(token2id), embed_size=25, gru_hidden=25, n_classes=len(label2id))

loss = acf.fit(x, y, lengths_mask, epochs=5)

Epoch 1 / 5
[========================================] 100%  loss: 3.9904

Epoch 2 / 5
[========================================] 100%  loss: 3.9851

Epoch 3 / 5
[========================================] 100%  loss: 3.9783

Epoch 4 / 5
[========================================] 100%  loss: 3.9739

Epoch 5 / 5
[========================================] 100%  loss: 3.9650

To do list

Implement Attention with RNN
Implement Attention Visualisation
Implement working Fit Module
Implement support for masking gradients in RNN (Working now!)
Implement a generic data set loader
Implement CNN Classifier with feature map visualisation

Acknowledgments

https://github.com/henryre/pytorch-fitmodule

Owner

Sandeep Tammu
