Course materials for a 3-day seminar "Machine Learning and NLP: Advances and Applications" at New College of Florida

Overview

Machine Learning and NLP: Advances and Applications

This repository hosts the course materials used for a 3-day seminar "Machine Learning and NLP: Advances and Applications" as part of Independent Study Period 2020 at New College of Florida.

Note that the seminar was held in Jan 2020, and the content may be a little bit oudated (as of Feb 2022). Please also refer to a Fall 2021 full semester course "CIS6930 Topics in Computing for Data Science", which covers much wider (and a little bit newer) Deep Learning topics.

Syllabus

Course Description

This 3-day course provides students with an opportunity to learn Machine Learning and Natural Language Processing (NLP) from basics to applications. The course covers some state-of-the-art NLP techniques including Deep Learning. Each day consists of a lecture and a hands-on session to help students learn how to apply those techniques to real-world applications. During the hands-on session, students will be given assignments to develop programming code in Python. Three days are too short to fully understand the concepts that are covered by the course and learn to apply those techniques to actual problems. Students are strongly encouraged to complete reading assignments before the lecture to be ready for the course assignments, and bring a lot of questions to the course. :)

Learning Objectives

Students successfully completing the course will

  • demonstrate the ability to apply machine learning and natural language processing techniques to various types of problems.
  • demonstrate the ability to build their own machine learning models using Python libraries.
  • demonstrate the ability to read and understand research papers in ML and NLP.

Course Outline

  • Wed 1/22 Day 1: Machine Learning basics [Slides]

    • Machine learning examples
    • Problem formulation
    • Evaluation and hyper-parameter tuning
    • Data Processing basics with pandas
    • Machine Learning with scikit-learn
    • Hands-on material: [ipynb] Open In Colab
  • Thu 1/23 Day 2: NLP basics [Slides]

    • Unsupervised learning and visualization
    • Topic models
    • NLP basics with SpaCy and NLTK
    • Understanding NLP pipeline for feature extraction
    • Machine learning for NLP tasks (text classification, sequential tagging)
    • Hands-on material [ipynb] Open In Colab
    • Follow-up
      • Commonsense Reasoning (Winograd Schema Challenge)
  • Fri 1/24 Day 3: Advanced techniques and applications [Slides]

    • Basic Deep Learning techniques
    • Word embeddings
    • Advanced Deep Learning techniques for NLP
    • Problem formulation and applications to (non-)NLP tasks
    • Pre-training models: ELMo and BERT
    • Hands-on material: [ipynb] Open In Colab
    • Follow-up
      • The Illustrated Transformer – Jay Alammar – Visualizing machine learning one concept at a time
      • Cross-lingual word/sentence embeddings

Reading Assignments & Recommendations:

The following online tutorials for students who are not familiar with the Python libraries used in the course. Each day will have a hands-on session that requires those libraries. Please do not expect to have enough time to learn how to use those libraries during the lecture.

The following list is a good starting point.

The course will cover the following papers as examples of (non-NLP) applications (probably in Day 3.) Students who'd like to learn how to apply Deep Learning techniques to your own problems are encouraged to read the following papers.

  • [1] A. Asai, S. Evensen, B. Golshan, A. Halevy, V. Li, A. Lopatenko, D. Stepanov, Y. Suhara, W.-C. Tan, Y. Xu, "HappyDB: A Corpus of 100,000 Crowdsourced Happy Moments" Proc LREC 18, 2018. [Paper] [Dataset]
  • [2] S. Evensen, Y. Suhara, A. Halevy, V. Li, W.-C. Tan, S. Mumick, "Happiness Entailment: Automating Suggestions for Well-Being," Proc. ACII 2019, 2019. [Paper]
  • [3] Y. Suhara, Y. Xu, A. Pentland, "DeepMood: Forecasting Depressed Mood Based on Self-Reported Histories via Recurrent Neural Networks," Proc. WWW '17, 2017. [Paper]
  • [4] N. Bhutani, Y. Suhara, W.-C. Tan, A. Halevy, H. V. Jagadish, "Open Information Extraction from Question-Answer Pairs," Proc. NAACL-HLT 2019, 2019. [Paper]

Computing Resources:

The course requires students to write code:

  • Students are expected to have a personal computer at their disposal. Students should have a Python interpreter and the listed libraries installed on their machines.

The hands-on sessions will require the following Python libraries. Please install those libraries on your computer prior to the course. See also the reading assignment section for the recommended tutorials.

  • pandas
  • scikit-learn
  • gensim
  • spacy
  • nltk
  • torch (PyTorch)
Owner
Yoshi Suhara
Yoshi Suhara
MiniJVM is simple java virtual machine written by python language, it can load class file from file system and run it.

MiniJVM MiniJVM是一款使用python编写的简易JVM,能够从本地加载class文件并且执行绝大多数指令。 支持的功能 1.从本地磁盘加载class并解析 2.支持绝大多数指令集的执行 3.支持虚拟机内存分区以及对象的创建 4.支持方法的调用和参数传递 5.支持静态代码块的初始化 不支

keguoyu 60 Apr 01, 2022
Linux Pressure Stall Information (PSI) Status App

Linux Pressure Stall Information (PSI) Status App psistat is a simple python3 program to display the PSIs and to capture/display exception events. psi

Joe D 3 Sep 18, 2022
Your Google Recon is Now Automated

GRecon : GRecon (Greei-Conn) is a simple python tool that automates the process of Google Based Recon AKA Google Dorking The current Version 1.0 Run 7

adnane-tebbaa 189 Dec 21, 2022
PyWorkflow(PyWF) - A Python Binding of C++ Workflow

PyWorkflow(PyWF) - A Python Binding of C++ Workflow 概览 C++ Workflow是一个高性能的异步引擎,本项目着力于实现一个Python版的Workflow,让Python用户也能享受Workflow带来的绝佳体验。

Sogou-inc 108 Dec 01, 2022
使用clash核心,对服务器进行Netflix解锁批量测试。

注意事项 测速及解锁测试仅供参考,不代表实际使用情况,由于网络情况变化、Netflix封锁及ip更换,测速具有时效性 本项目使用 Python 编写,使用前请完成环境安装 首次运行前请安装pip及相关依赖,也可使用 pip install -r requirements.txt 命令自行安装 Net

11 Dec 07, 2022
frida-based ceserver. iOS analysis is possible with Cheat Engine.

frida-ceserver frida-based ceserver. iOS analysis is possible with Cheat Engine. Original by Dark Byte. Usage Install frida on iOS. python main.py Cyd

KenjiroIchise 89 Jan 08, 2023
An ultra fast cross-platform multiple screenshots module in pure Python using ctypes.

Python MSS from mss import mss # The simplest use, save a screen shot of the 1st monitor with mss() as sct: sct.shot() An ultra fast cross-platfo

Mickaël Schoentgen 799 Dec 30, 2022
This app converts an pdf file into the audio file.

PDF-to-Audio This app takes an pdf as an input and convert it into audio, and the library text-to-speech starts speaking the preffered page given in t

Ojas Barawal 3 Aug 04, 2021
Very simple encoding scheme that will encode data as a series of OwOs or UwUs.

OwO Encoder Very simple encoding scheme that will encode data as a series of OwOs or UwUs. The encoder is a simple state machine. Still needs a decode

1 Nov 15, 2021
Python script to automate the change of desktop background

wallomator Python script to automate the change of desktop background A python script that automates the process of changing the desktop background. I

Mohammed Haaris Javed 10 Jun 16, 2022
Swubcase - The shitty programming language

What is Swubcase? Swubcase is easy-to-use programming language that can fuck you

5 Jun 19, 2022
Run unpatched binaries on Nix/NixOS

Run unpatched binaries on Nix/NixOS

Thiago Kenji Okada 160 Jan 08, 2023
Simple Python tool to check if there is an Office 365 instance linked to a domain.

o365chk.py Simple Python script to check if there is an Office365 instance linked to a particular domain.

Steven Harris 37 Jan 02, 2023
Aoc 2021 kedro playground with python

AOC 2021 Overview This is your new Kedro project, which was generated using Kedro 0.17.5. Take a look at the Kedro documentation to get started. Rules

1 Dec 20, 2021
Experimental proxy for dumping the unencrypted packet data from Brawl Stars (WIP)

Brawl Stars Proxy Experimental proxy for version 39.99 of Brawl Stars. It allows you to capture the packets being sent between the Brawl Stars client

4 Oct 29, 2021
This repository provides a set of easy to understand and tested Python samples for using Acronis Cyber Platform API.

Base Acronis Cyber Platform API operations with Python !!! info Copyright © 2019-2021 Acronis International GmbH. This is distributed under MIT licens

Acronis International GmbH 3 Aug 11, 2022
India's own RPA Platform Python Powered

Welcome to My-AutoPylot , Made in India with ❤️ What is My-AutoPylot? PyBots is an Indian firm based in Vadodara, Gujarat. My-AutoPylot is a product d

PyBots Pvt Ltd 28 Sep 12, 2022
A partial-transpiler that converts a subset of Python to the Folders esoteric programming language

Py2Folders A partial-transpiler that converts a subset of Python to the Folders esoteric programming language Folders Folders is an esoteric programmi

Daniel Johnson 1 Dec 23, 2021
Python library to decode the EU Covid-19 vaccine certificate

DCC Utils Python library to decode the EU Covid-19 vaccine certificate, as specified by the EU. Setup pip install dcc-utils Make sure zbar is installe

Developers Italia 13 Mar 11, 2022
Flask html response minifier

Flask-HTMLmin Minify flask text/html mime type responses. Just add MINIFY_HTML = True to your deployment config to minify HTML and text responses of y

Hamid Feizabadi 85 Dec 07, 2022