A web scraping pipeline project that retrieves TV and movie data from two sources, then transforms and stores data in a MySQL database.

Last update: Mar 28, 2022

Overview

New to Streaming Scraper

An in-progress web scraping project built with Python, R, and SQL.

The scraped data consist of movie and TV show information. The goal of the project is to show new-to-streaming titles that arrive on Netflix each month, along with additional details such as critic and audience ratings.

Current stage: Working out how to present the data with R Markdown.

Testing at: https://charlesdungy.github.io/new-to-streaming-scraper/

Future stage: Complete the documentation and code comments.

Description

Data are retrieved from two different data sources: What's on Netflix (WON) and Rotten Tomatoes (RT). RT data are cleaned and transformed with Python, while WON data are cleaned and transformed with R.

All data are piped into a MySQL database, then retrieved for presentation in R.

Here is a high-level look at the pipeline:

Data Source 1 is WON data. Data Source 2 is RT data.
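The Python half of the pipeline (clean RT data, then store it) might look roughly like the sketch below. This is an illustration only, not the project's actual code: the record fields, the `rt_ratings` table, and the helper names `parse_percent`, `clean_rt_row`, and `load_rt_rows` are all hypothetical, and the standard-library `sqlite3` module stands in for MySQL so the example runs without a database server.

```python
import sqlite3

# Hypothetical raw Rotten Tomatoes records, shaped as a scraper might return them.
RAW_RT_ROWS = [
    {"title": "  Example Movie ", "critic": "91%", "audience": "85%"},
    {"title": "Another Show", "critic": "", "audience": "72%"},
]


def parse_percent(text):
    """Convert a string such as '91%' to the int 91; return None if blank."""
    text = text.strip().rstrip("%")
    return int(text) if text else None


def clean_rt_row(row):
    """Trim the title and convert both rating strings to integers."""
    return (
        row["title"].strip(),
        parse_percent(row["critic"]),
        parse_percent(row["audience"]),
    )


def load_rt_rows(conn, rows):
    """Create the (hypothetical) ratings table and upsert the cleaned rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rt_ratings ("
        "title TEXT PRIMARY KEY, critic_score INTEGER, audience_score INTEGER)"
    )
    # INSERT OR REPLACE is SQLite syntax, used here only because SQLite
    # is the stand-in database for this sketch.
    conn.executemany(
        "INSERT OR REPLACE INTO rt_ratings VALUES (?, ?, ?)",
        [clean_rt_row(r) for r in rows],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL database
load_rt_rows(conn, RAW_RT_ROWS)
stored = conn.execute("SELECT * FROM rt_ratings ORDER BY title").fetchall()
print(stored)
```

Pointing this at MySQL would require a MySQL driver and some SQL changes: for example, MySQL drivers for Python typically use `%s` placeholders rather than `?`, and the upsert would be written as `REPLACE INTO` or `INSERT ... ON DUPLICATE KEY UPDATE`.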

Main Packages/Tools

Python

R

SQL

MySQL

Current Directory Tree

License

MIT

Owner

Charles Dungy

A simplistic scraper made to download tons of random screenshots made by people.

Web Crawlers for Data Labelling of Malicious Domain Detection & IP Reputation Evaluation

Telegram Group Scrapper

Genshin Impact scraper that captures artifact information from the Genshin Impact interface

SkyScrapers: A collection of various scraping apps

Use Flask API to wrap Facebook data. Grab the wrapper of Facebook public pages without an API key.

Automated data scraper for Thailand COVID-19 data

Docker containerized Python Flask API that uses selenium to scrape and interact with websites

A happy and lightweight Python package that searches the Google News RSS feed, returns a usable JSON response, and scrapes complete articles. No need to write scrapers for fetching articles anymore.

Python script to check whether there are any differences in an application's responses when the request comes from a search engine's crawler.

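Scrapes the career websites of several major universities in Jiangsu with Python and notifies users in three ways: a WeChat message, direct command-line output, or a Windows balloon notification.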

Nekopoi scraper using python3

A python module to parse the Open Graph Protocol

A high-level distributed crawling framework.

A scalable frontier for web crawlers

A scrapy pipeline that provides an easy way to store files and images using various folder structures.

A simple django-rest-framework api using web scraping

Scrapes Every Email Address of Every Society in Every University

Web scraper for Zillow

Screenhook is a script that captures an image of a web page and sends it to a Discord webhook.
