Sentiment-Analysis and EDA on the IMDB Movie Review Dataset

Last update: Jan 12, 2022

Overview

This work explores and compares different text representation approaches used for Sentiment Analysis (e.g. Bag of Words, TF-IDF, Word Embeddings). It also applies and compares different classification algorithms for Sentiment Analysis tasks in Natural Language Processing (e.g. tree-based algorithms, linear models and Support Vector Machines).
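
To make the difference between the first two representations concrete, here is a minimal sketch in plain Python. It is not taken from the project's code: the helper names (`tokenize`, `bag_of_words`, `tf_idf`), the whitespace tokenizer, the unsmoothed `log(N / df)` weighting and the two toy reviews are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def bag_of_words(docs):
    """Return the sorted vocabulary and one term-count vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Weight each raw count by idf = log(N / df), the unsmoothed textbook form."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weighted = [[c * w for c, w in zip(vec, idf)] for vec in counts]
    return vocab, weighted


# Toy reviews, invented for this example.
reviews = [
    "a great movie with a great cast",
    "a dull movie",
]
vocab, bow = bag_of_words(reviews)
_, tfidf = tf_idf(reviews)
```

In this toy corpus the words "a" and "movie" appear in both reviews, so their TF-IDF weight drops to zero, while "great" keeps a positive weight. Bag of Words, by contrast, counts every term equally. Library implementations usually apply smoothing and normalization, so their numbers will differ from this sketch.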

Author: Nikolas Petrou, MSc in Data Science

Technical-Report and Code Availability

The complete text and analysis of the work are available in the EDA-and-Sentiment-Analysis-on IMDB-Dataset.pdf file.
The implementation and code of the project are located in the Implementation-Python Files folder.

Dataset

For this work, a large dataset of movie reviews was used: the publicly available Internet Movie Database (IMDB) review dataset.

The data can be obtained from Kaggle or directly from Stanford.
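
As a rough sketch of how the Stanford copy could be read into memory, assuming the archive has been extracted locally and follows a `<root>/<split>/pos` and `<root>/<split>/neg` layout with one `.txt` file per review. The function name `load_split`, the 0/1 label encoding and the path in the usage note are illustrative assumptions, not part of the project's code.

```python
from pathlib import Path


def load_split(root, split):
    """Read every review under root/split/{neg,pos} into (texts, labels).

    Labels are encoded as 0 for negative and 1 for positive (an assumption
    made for this sketch).
    """
    texts, labels = [], []
    for label_name, label in (("neg", 0), ("pos", 1)):
        folder = Path(root) / split / label_name
        # Sort so the order is reproducible across runs and platforms.
        for path in sorted(folder.glob("*.txt")):
            texts.append(path.read_text(encoding="utf-8"))
            labels.append(label)
    return texts, labels
```

A call such as `load_split("aclImdb", "train")` would then return the training reviews and their labels, provided the archive was extracted to a folder with that (assumed) name.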

Methodology

An abstract methodology scheme of the work is illustrated in the following Figure.

In summary, the initial questions were first set with respect to the dataset used. Subsequently, the data scraping and data collection were performed. After the data preprocessing steps, different analytics and analyses were employed in order to better understand the data. Finally, during the final analysis, different methodologies and models were used to classify the textual data by sentiment. It is important to mention that the whole process followed a cyclical scheme.
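
The preprocessing steps are not listed here, so the following is only a sketch of what such a step commonly looks like for review text, under stated assumptions: markup removal, lowercasing, dropping non-letter characters and filtering a small stop-word list. The function name `preprocess`, the regular expressions and the stop-word set are hypothetical choices, not the project's actual pipeline.

```python
import re

# Matches HTML-like tags such as "<br />" that can appear in scraped reviews.
TAG_RE = re.compile(r"<[^>]+>")
# Matches anything that is not a lowercase ASCII letter or whitespace.
NON_LETTER_RE = re.compile(r"[^a-z\s]")
# A tiny illustrative stop-word list; negations such as "not" are kept on
# purpose because they carry sentiment.
STOPWORDS = {"a", "an", "the", "and", "of", "to", "is", "it", "this", "that"}


def preprocess(review):
    """Turn one raw review into a list of cleaned tokens."""
    text = TAG_RE.sub(" ", review)       # strip markup
    text = text.lower()                  # normalize case
    text = NON_LETTER_RE.sub(" ", text)  # drop digits and punctuation
    return [tok for tok in text.split() if tok not in STOPWORDS]


tokens = preprocess("This movie is GREAT!<br /><br />Loved it.")
# tokens == ["movie", "great", "loved"]
```

The cleaned tokens can then be passed to whichever representation is being compared, for example Bag of Words or TF-IDF.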
