Natural Language Processing - Sommer Semester 2022

Overview

Natural Language Processing (DIS25a/NLP)

This course can be taken for the Bachelor Programm Data and Information Science (DIS25a) or the Master Program Digital Sciences (NLP).

After easter all sessions are hosted at TH Köln, Claudiusstraße 1. The sessions will be held life. Slides will be usually available a night before the actual lecture. We try to record all lectures and tutorials for later referal (not sure how this works out with the sessions at Claudiusstraße).

Schedule for Summer Semester 2022

(L) Lectures; (T) Tutorials; (P) Project

The first lectures and tutorial were recorded and are available online. The password is the same as for the Zoom sessions.

Date Slot 13:30h Slot 15:15h DIS25a (DIS B.Sc.) NLP (DS M.Sc.)
1.4.2022 Introduction and Overview (L) Basic Text Processing (L) x x
8.4.2022 Basic NLP Pipeline: NLTK (T) (solution) Common Toolkit: Spacy (T) (solution) x x
15.4.2022 no lecture
22.4.2022 WordNet (L) Vector Semantics (L) x x
29.4.2022 WordNet, GermaNet (T) (solution) Vector Semantics (T) (solution) x x
6.5.2022 Information Extraction (L) Sentiment Analysis (L) x x
13.5.2022 no lecture
20.5.2022 Language Models and Ethics in NLP (L) Group assignment (P) x x
27.5.2022 Group work (P) Group work (P) x
3.6.2022 Data Programming for IE (L) Group work (P) / Oral Exam Master x x
10.6.2022 Guest Lecture: Dimitar Dimitrov(L) Group work (P) x
17.6.2022 Group work (P) Group work (P) x
24.6.2022 Student talks - Project presentation (P) Student talks - Project presentation (P) x
31.8.2022 Submission of term papers x

Bachelor: Group Assignments

In the group assignments a group of four students has to work on a bias-related topic with a specific focus and on one of three datasets. In the group work phases starting on 20.5.2022 we will be available during the lecture time to help and advise.

In the presentations on 24.6.2022 you are expected to present a concept regarding your specific topic and dataset. Please decribe the motivation, the dataset, your methods and NLP pipeline, a working prototype and some first insights and results.

The feedback gathered during the presentation should be used to write a final term paper on your specific topic and work. Please read the guidelines for the term paper.

Datasets

Choose one of the following datasets to work on:

Topics

Choose one of the following topcis:

Gender Bias

Gender bias is a group bias in which different genders are represented differently in terms of an aspect in a given (set of) document(s) than expected. Aspects for which there can be a bias range from quantitative measures (e.g., how many documents have male/female authors) to more complex NLP measures (e.g., different sentiments in texts about male/female politicians or topical bias, different distributions of topics in texts geared towards male/female readers).

Exaples for papers that investigate gender bias:

Ethnic Bias

Like gender bias, ethnic or racial bias describes bias towards groups of people belonging to an ethnical (or religious) group. Ethnic bias includes harmful stereotypes and less blatant but still dangerous aspects like topical bias. Detecting ethnic bias is not only important because it may lead to even more severe instances of racism, and it is an infringement of the constitutional right to equal treatment.

Exaples for papers that investigate ethnic bias:

Non-Neutral Speech

Non-neutral language consists of many aspects of language that is subjective, opinionated, or otherwise implies valuation. This includes toxicity, ranging from forms of hate speech such as racism, incivility, profane, offensive and aggressive language to over-positive praises. Non-neutral language is especially problematic when it appears in types of documents that claim to be neutral, such as wikipedia or (public) news. A related concept is framing bias, defined as the use of subjective words or phrases linked with a particular opinion.

Exaples for papers that investigate non-neutral language:

Stance Detection

Stance is a concept that describes an opinion on a subject, most often in a political context. The goal of stance detection is to detect the stances of users/authors towards these subjects. Often, the subjects are known due to context (for example, abortion, weapon laws and gay marriage in political texts) or they have to be determined using approaches like entity recognition. A related concept is that of target-dependent or aspect-based sentiment analysis, in which the opinions on aspects (targets) are detected.

Exaples for papers that investigate stance detection:

Owner
Classrooms of IR Group at Technische Hochschule Köln
Classrooms of IR Group at Technische Hochschule Köln
Exploit grafana Pre-Auth LFI

Grafana-LFI-8.x Exploit grafana Pre-Auth LFI How to use python3

2 Jul 25, 2022
Exploiting CVE-2021-44228 in Unifi Network Application for remote code execution and more

Log4jUnifi Exploiting CVE-2021-44228 in Unifi Network Application for remote cod

96 Jan 02, 2023
DoSer.py - Simple DoSer in Python

DoSer.py - Simple DoSer in Python What is DoSer? DoSer is basically an HTTP Denial of Service attack that affects threaded servers. It works like this

1 Oct 12, 2021
Aiminsun 165 Dec 21, 2022
Script checks provided domains for log4j vulnerability

log4j Script checks provided domains for log4j vulnerability. A token is created with canarytokens.org and passed as header at request for a single do

Matthias Nehls 2 Dec 12, 2021
A tool to crack a wifi password with a help of wordlist

A tool to crack a wifi password with a help of wordlist. This may take long to crack a wifi depending upon number of passwords your wordlist contains. Also it is slower as compared to social media ac

Saad 144 Dec 29, 2022
Unauthenticated Sqlinjection that leads to dump data base but this one impersonated Admin and drops a interactive shell

Unauthenticated Sqlinjection that leads to dump database but this one impersonated Admin and drops a interactive shell

sam 16 Nov 09, 2022
A proxy for asyncio.AbstractEventLoop for testing purposes

aioloop-proxy A proxy for asyncio.AbstractEventLoop for testing purposes. When tests writing for asyncio based code, there are controversial requireme

aio-libs 12 Dec 12, 2022
Laravel RCE (CVE-2021-3129)

CVE-2021-3129 - Laravel RCE About The script has been made for exploiting the Laravel RCE (CVE-2021-3129) vulnerability. This script allows you to wri

Joshua van der Poll 21 Dec 27, 2022
Denial Attacks by Various Methods

Denial Service Attack Denial Attacks by Various Methods IIIIIIIIIIIIIIIIIIII PPPPPPPPPPPPPPPPP VVVVVVVV VVVVVVVV I::

Baris Dincer 9 Nov 26, 2022
Log4j2 intranet scan

Log4j2-intranet-scan ⚠️ 免责声明 本项目仅面向合法授权的企业安全建设行为,在使用本项目进行检测时,您应确保该行为符合当地的法律法规,并且已经取得了足够的授权 如您在使用本项目的过程中存在任何非法行为,您需自行承担相应后果,我们将不承担任何法律及连带责任 在使用本项目前,请您务

k3rwin 16 Dec 19, 2022
一款Web在线自动免杀工具

一款利用加载器以及Python反序列化绕过AV的在线免杀工具 因为打包方式的局限性,不能跨平台,若要生成exe格式的只能在Windows下运行本项目 打包速度有点慢,提交后稍等一会 开发环境及运行 前端使用Bootstrap框架,后端使用Django框架 。

yhy 172 Nov 28, 2022
A small Python Script To get all levels of subdomains from a list

getlevels A small Python Script To get all levels of subdomains Easily get 1st level, 2nd level, 3rd level, 4th level .... nth level subdomains Usag

9 Feb 15, 2022
Website OSINT untuk mencari informasi dari email dan nomor telepon. Dibuat dengan React dan Flask.

Inspektur Cari informasi mengenai email dan nomor telepon dengan mudah. Inspektur adalah aplikasi OSINT yang berguna untuk mencari informasi berdasark

Bagas Wastu 36 Dec 04, 2022
XSS scanner in python

DeadXSS XSS scanner in python How to Download: Step 1: git clone https://github.com/Deadeye0x/DeadXSS.git Step 2: cd DeadXSS Step 3: python3 DeadXSS.p

2 Jul 17, 2022
A Fast Broken Link Hijacker Tool written in Python

Broken Link Hijacker BrokenLinkHijacker(BLH) is a Fast Broken Link Hijacker Tool written in Python.

Mayank Pandey 70 Nov 30, 2022
Execution After Redirect (EAR) / Long Response Redirection Vulnerability Scanner written in python3

Execution After Redirect (EAR) / Long Response Redirection Vulnerability Scanner written in python3, It Fuzzes All URLs of target website & then scan them for EAR

Pushpender Singh 9 Dec 12, 2022
Vulnerability Exploitation Code Collection Repository

Introduction expbox is an exploit code collection repository List CVE-2021-41349 Exchange XSS PoC = Exchange 2013 update 23 = Exchange 2016 update 2

0x0021h 263 Feb 14, 2022
A python script to brute-force guess the passwords to Instagram accounts

Instagram-Brute-Force The purpose of this script is to brute-force guess the passwords to Instagram accounts. Specifics: Comes with 2 separate modes i

Moondog 2 Nov 16, 2021
CVE-2022-21907 - Windows HTTP协议栈远程代码执行漏洞 CVE-2022-21907

CVE-2022-21907 Description POC for CVE-2022-21907: Windows HTTP协议栈远程代码执行漏洞 creat

antx 365 Nov 30, 2022