Community and sentiment analysis based on tweets

Overview

Social Media Analytics project

The project aims to analyze the opinions and interactions of Italian users through the posts they published on Twitter on the day the new Super Green Pass measures came into force. In particular, we look for the reference hubs present in the network, as well as the sentiment and emotions of people with respect to the new restrictions.

Motivation

One of the most debated topics in Italy in the last months of 2021 was the introduction of the Super Green Pass, required to access indoor clubs, events, gyms, etc. This measure came into force on 6 December 2021 and in practice bars those who have not completed the vaccination cycle from various services. For this reason, the project analyzes the reactions of the Italian Twitter community to the Super Green Pass, with the aim of understanding who the users writing and interacting on the platform are, and whether specific communities exist among the users who commented on the introduction of this extension. We also want to identify the possible influential nodes of the network and examine the sentiment around them.

Data

The data was collected from Twitter using its API and the Tweepy Python package. All tweets were written on 6 December 2021 in Italian.
In the data folder you can find the .csv file with all the collected tweets, plus two extra files that contain the sentiment extracted for each tweet and the aggregated sentiment per cluster.
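
As a rough illustration of how the per-cluster file relates to the per-tweet one, the sketch below aggregates sentiment labels by cluster. The column names (`tweet_id`, `cluster`, `sentiment`) and the sample rows are assumptions made for this example, not the actual schema of the files in the data folder, and the notebook may compute the aggregation differently.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample in the shape of a per-tweet sentiment file.
# The real column names in the data folder may differ.
SAMPLE_CSV = """tweet_id,cluster,sentiment
1,0,negative
2,0,neutral
3,0,negative
4,1,positive
5,1,neutral
"""


def sentiment_share_per_cluster(csv_file):
    """Return {cluster: {sentiment: share}} from a per-tweet sentiment CSV."""
    counts = defaultdict(Counter)
    for row in csv.DictReader(csv_file):
        counts[row["cluster"]][row["sentiment"]] += 1

    shares = {}
    for cluster, counter in counts.items():
        total = sum(counter.values())
        shares[cluster] = {label: n / total for label, n in counter.items()}
    return shares


shares = sentiment_share_per_cluster(io.StringIO(SAMPLE_CSV))
```

To run it on a real file, pass an open file handle instead of the in-memory sample and adjust the column names to match the file.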

Files

All the developed code is in the file Code.ipynb. You can also find the report and the presentation prepared for the exam; both are in Italian.

How to run code?

We advise you to run all the code on the Google Colaboratory platform. All notebooks are already set up to import the necessary packages. If you have any doubts, please feel free to contact us!

Graph visualization

In the Pyvis_export folder you can find two exported interactive visualizations of the network graph. You can also find static versions as .jpg images if you want to view them quickly (the HTML versions are quite slow to open).

Results

We found that the hubs are not famous people. This may be an expected result given the particular context of the no-vax discussion, where ideas and content matter more than the celebrity of the person.
Focusing on sentiment analysis, we noticed that the vast majority of tweets are neutral or negative. This is a far cry from the situation offline, where most people have been vaccinated and are not that disappointed with the new rules.
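
This README does not state how the hubs were identified. One common approach is to rank users by in-degree in the interaction graph, that is, by how many distinct users retweet or mention them. The sketch below shows that idea under this assumption; the user names and the edge list are made up for illustration.

```python
from collections import defaultdict

# Hypothetical (source_user, target_user) interactions, e.g. retweets or mentions.
interactions = [
    ("anna", "marco"),
    ("luca", "marco"),
    ("sara", "marco"),
    ("marco", "anna"),
    ("luca", "anna"),
    ("anna", "marco"),  # repeated interaction, counted once
]


def top_hubs(edges, k=2):
    """Rank users by in-degree: the number of distinct users interacting with them."""
    incoming = defaultdict(set)
    for source, target in edges:
        if source != target:  # ignore self-interactions
            incoming[target].add(source)
    # Sort by descending in-degree, breaking ties alphabetically.
    ranked = sorted(incoming.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(user, len(sources)) for user, sources in ranked[:k]]


hubs = top_hubs(interactions)
```

Other centrality measures, such as betweenness or PageRank, would be equally reasonable choices and may rank users differently.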

About us

Riccardo Confalonieri - Data Science Student @ University of Milano-Bicocca

Justin Armanini - Data Science Student @ University of Milano-Bicocca

Chiara Cormio - Data Science Student @ University of Milano-Bicocca
