AI-UPV at IberLEF-2021 EXIST task: Sexism Prediction in Spanish and English Tweets Using Monolingual and Multilingual BERT and Ensemble Models

Last update: Jun 08, 2022

Description

This repository contains the code for the paper "Sexism Prediction in Spanish and English Tweets Using Monolingual and Multilingual BERT and Ensemble Models". The paper will be published at SEPLN-WS-IberLEF 2021, the 3rd Workshop on the Iberian Languages Evaluation Forum at the SEPLN 2021 Conference. The implementation and the dataset are described in the paper (link: Paper is soon...).

Paper Abstract

The popularity of social media has created problems such as hate speech and sexism. Identifying and classifying sexism in social media are highly relevant tasks, as they would help build a healthier social environment. Nevertheless, these tasks are considerably challenging. This work proposes a system that combines multilingual and monolingual BERT models with data-point translation and ensemble strategies for sexism identification and classification in English and Spanish. It was conducted in the context of the sEXism Identification in Social neTworks 2021 (EXIST 2021) shared task, proposed by the Iberian Languages Evaluation Forum (IberLEF). The proposed system and its main components are described, and an in-depth hyperparameter analysis is conducted. The main results observed were: (i) the system obtained better results than the baseline model (multilingual BERT); (ii) ensemble models obtained better results than monolingual models; and (iii) the E6 model (ensemble model considering all individual models and the best standardized values) obtained the best accuracies and F1-scores for both tasks. This work obtained first place in both tasks at EXIST, with the highest accuracies (0.780 for task 1 and 0.658 for task 2) and F1-scores (F1-binary of 0.780 for task 1 and F1-macro of 0.579 for task 2).
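The abstract reports accuracy, F1-binary (task 1), and F1-macro (task 2). For readers unfamiliar with these metrics, the following is a minimal sketch of how they are commonly computed. It is not the authors' evaluation script: the function names, the label strings, and the toy predictions are all illustrative assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_label(y_true, y_pred, label):
    """F1 score treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_binary(y_true, y_pred, positive):
    """F1 of the positive class only (binary setting, as in task 1)."""
    return f1_for_label(y_true, y_pred, positive)


def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 (multi-class setting, as in task 2)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


# Toy data with assumed label names; not taken from the EXIST dataset.
gold = ["sexist", "sexist", "non-sexist", "non-sexist"]
pred = ["sexist", "non-sexist", "non-sexist", "non-sexist"]

acc = accuracy(gold, pred)
f1_bin = f1_binary(gold, pred, positive="sexist")
f1_mac = f1_macro(gold, pred)
print(f"accuracy={acc:.3f} f1_binary={f1_bin:.3f} f1_macro={f1_mac:.3f}")
```

F1-macro weights every class equally regardless of its frequency, which is why it is a common choice for multi-class tasks with imbalanced categories.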

Credits

EXIST Shared Task Organizers

Task website: http://nlp.uned.es/exist2021/

Owner

Angel de Paula
