A universal package of scraper scripts for humans

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Contributing
  5. Sponsors
  6. License
  7. Contact
  8. Acknowledgements

About The Project

Scrapera is a completely Chromedriver-free package that provides scraper scripts for the most commonly used machine learning and data science domains. It scrapes directly and asynchronously from public API endpoints, which removes the heavy browser overhead and makes Scrapera fast and robust to DOM changes. Currently, Scrapera supports the following crawlers:

  • Images
  • Text
  • Audio
  • Videos
  • Miscellaneous

The main aim of this package is to group common scraping tasks in one place, so that ML researchers and engineers can focus on their models rather than on the data collection process.
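The asynchronous, browser-free approach described above can be illustrated with a minimal sketch. This is not Scrapera's actual implementation: `fetch_all`, `fake_fetch`, and the `example.com` URLs are hypothetical names chosen for illustration, and the network call is simulated so the snippet runs with the standard library alone. A real scraper would replace `fake_fetch` with a call to an async HTTP client.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for an HTTP request to a public API endpoint."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"payload from {url}"


async def fetch_all(
    urls: List[str],
    fetch: Callable[[str], Awaitable[str]],
    limit: int = 5,
) -> Dict[str, str]:
    """Fetch all URLs concurrently, with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url: str) -> str:
        # The semaphore caps concurrency so the endpoint is not flooded.
        async with semaphore:
            return await fetch(url)

    # gather() returns results in the same order as the input URLs.
    results = await asyncio.gather(*(bounded(u) for u in urls))
    return dict(zip(urls, results))


if __name__ == "__main__":
    endpoints = [f"https://example.com/api/items?page={n}" for n in range(3)]
    for url, body in asyncio.run(fetch_all(endpoints, fake_fetch)).items():
        print(url, "->", body)
```

Because no browser is started, the cost per request is only the HTTP round trip, and many requests can overlap while each one waits on the network.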

    DISCLAIMER: The owner and contributors take no responsibility for misuse of data obtained through Scrapera. Contact the owner if any module provided by Scrapera violates copyright terms.

    Getting Started

    Prerequisites

    Prerequisites can be installed separately from the requirements.txt file:

    pip install -r requirements.txt

    Installation

    Scrapera is built with Python 3 and can be installed directly with pip:

    pip install scrapera

    Alternatively, to install the latest version directly from GitHub, run:

    pip install git+https://github.com/DarshanDeshpande/Scrapera.git
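After either install command, one way to confirm that the package is visible to the current interpreter is to query its installed version. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+). The helper name `installed_version` is hypothetical, and the snippet assumes the distribution is registered under the name `scrapera`, as the pip command above suggests.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "scrapera" is assumed to be the distribution name used by pip.
    version = installed_version("scrapera")
    print(f"scrapera {version}" if version else "scrapera is not installed")
```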

    Usage

    To use any sub-module, import its scraper class, instantiate it, and call its scrape method:

    from scrapera.video.vimeo import VimeoScraper

    # Create the scraper, then pass the video URL and the requested quality
    scraper = VimeoScraper()
    scraper.scrape('https://vimeo.com/191955190', '540p')

    For more examples, please refer to the test folders inside the respective modules.

    Contributing

    Scrapera welcomes any and all contributions and scraper requests. Please raise an issue if a scraper fails. Feel free to fork the repository and add your own scrapers to help the community!
    For more guidelines, refer to CONTRIBUTING.

    License

    Distributed under the MIT License. See LICENSE for more information.

    Sponsors

    Contact

    Feel free to reach out with any issues or requests related to Scrapera.

    Darshan Deshpande (Owner) - Email | LinkedIn

    Acknowledgements
