Amazon Scraper: A command-line tool for scraping Amazon product data

Overview

Amazon Product Scraper: 2021

Description

A command-line tool for scraping Amazon product data to CSV or JSON format(s).

Requirements

  • Python 3
  • pip3

Installation

Using git clone (you'll need git installed for this):

git clone https://github.com/scrapewalrus/amazon-scraper-python-2021

Or download and extract the zip file of the project manually

You'll also need to install requirements for the project to run. Locate amazon-product-scraper folder via terminal and type pip install -r requirements.txt:

Usage

To launch the Amazon scraper locate the amazon-product-scraper folder via terminal and type python amazon_scraper.py -k "your keyword". This will start the program.

NOTE: you must declare either -k or --keyword before entering your keyword. It's a required argument.

Example:

amazon_scraper.py - the name of a scraper file.

-k or --keyword - required argument to pass before entering your keyword.

-p or --proxies - optional argument to enable proxies. To avoid getting blocked I highly recommend using proxies. I'm using Residential Proxies from Oxylabs. For highest success rate, I suggest Residential Proxies over Datacenter as they're almost impossible to detect and have the smallest footprint. If you decide to use different proxy provider services keep in mind that you'll have to make some minor adjustments in get-proxies.py file.

-j or --json - optional argument for storing extracted data in .json format. Default output format is .csv.

Example of product data #1: JSON

[
    {
        "SOURCE_URL": "https://www.amazon.com/s?k=funny+t+shirt+for+women&page=1",
        "PAGE": 1,
        "KEYWORD": "funny t shirt for women",
        "PRODUCT_LINK": "https://www.amazon.com/Mostly-T-Shirt-Womens-Letter-Printed/dp/B07QN2NQ59/ref=sr_1_3?dchild=1&keywords=funny+t+shirt+for+women&qid=1627833682&sr=8-3",
        "PRODUCT_NAME": "I'm Mostly Peace Love and Light Funny T-Shirt Womens Graphic Printed Short Sleeve Tops Tee",
        "PRICE": "$21.99",
        "PRODUCT_RATING": "4.6",
        "NUMBER_OF_RATINGS": "1,637"
    },
    {
        "SOURCE_URL": "https://www.amazon.com/s?k=funny+t+shirt+for+women&page=1",
        "PAGE": 1,
        "KEYWORD": "funny t shirt for women",
        "PRODUCT_LINK": "https://www.amazon.com/YITAN-Women-Graphic-Funny-X-Large/dp/B074QMG4D7/ref=sr_1_4?dchild=1&keywords=funny+t+shirt+for+women&qid=1627833682&sr=8-4",
        "PRODUCT_NAME": "YITAN Women's Cute Juniors Tops Teen Girl Tee Funny T Shirt",
        "PRICE": "$12.99",
        "PRODUCT_RATING": "4.6",
        "NUMBER_OF_RATINGS": "12,281"
    },
    {
        "SOURCE_URL": "https://www.amazon.com/s?k=funny+t+shirt+for+women&page=1",
        "PAGE": 1,
        "KEYWORD": "funny t shirt for women",
        "PRODUCT_LINK": "https://www.amazon.com/DANVOUY-Womens-V-Neck-Doesnt-Definitely/dp/B07V55ZXVS/ref=sr_1_5?dchild=1&keywords=funny+t+shirt+for+women&qid=1627833682&sr=8-5",
        "PRODUCT_NAME": "DANVOUY Womens If My Mouth Doesn't Say It My Face Definitely Will T Shirt",
        "PRICE": "$12.99",
        "PRODUCT_RATING": "4.6",
        "NUMBER_OF_RATINGS": "6,787"
    }]

Example of product data #2: CSV

amazon-csv-product-data-example

Doro is a CLI based pomodoro app and countdown timer application built using python.

Doro - CLI based pomodoro app Doro is a CLI based pomodoro app and countdown timer application built using python. Install $ pip install doro Usage Po

Suresh Kumar 14 May 23, 2022
Open-Source Python CLI package for copying DynamoDB tables and items in parallel batch processing + query natural & Global Secondary Indexes (GSIs)

Python Command-Line Interface Package to copy Dynamodb data in parallel batch processing + query natural & Global Secondary Indexes (GSIs).

1 Oct 31, 2021
Output Analyzer for you terminal commands

Output analyzer (OZER) You can specify a few words inside config.yaml file and specify the color you want to be used. installing: Install command usin

Ehsan Shirzadi 1 Oct 21, 2021
A simple script that outputs the current date on the user interface/terminal.

Py-Date A simple script that outputs the current date on the user interface/terminal. How to Run Open your terminal and cd into the folder containi

Arinzechukwu Okoye 1 Jan 13, 2022
Seamlessly run Python code in IPython from Vim

Seamlessly run Python code from Vim in IPython, including executing individual code cells similar to Jupyter notebooks and MATLAB. This plugin also supports other languages and REPLs such as Julia.

Hans Chen 269 Dec 20, 2022
🖥️ A cross-platform modern shell.

Ergonomica WARNING: master on this repository is not the same as a stable release! Currently, this software is purely experimental, as I am cleaning i

813 Dec 27, 2022
topalias - Linux alias generator from bash/zsh command history with statistics, written on Python.

topalias topalias - Linux alias generator from bash/zsh command history with statistics, written on Python. Features Generate short alias for popular

Sergey Chudakov 38 May 26, 2022
Dart Version Manager CLI implemented with Python and Typer.

Dart Version Manager CLI implemented with Python and Typer.

EducUp 6 Jun 26, 2022
🎄 Advent of Code command-line tool.

🎄 advent-cli advent-cli is a command-line tool for interacting with Advent of Code, specifically geared toward writing solutions in Python. It can be

Christian Ferguson 6 Dec 01, 2022
A curated list of awesome things related to Textual

Awesome Textual | A curated list of awesome things related to Textual. Textual is a TUI (Text User Interface) framework for Python inspired by modern

Marcelo Trylesinski 5 May 08, 2022
Wik is use to get information about anything on the shell using Wikipedia.

WIK wik is a tool to view wikipedia pages from your terminal. It also let you search for any wikipedia up to date article on one query from your termi

Yash Singh 340 Dec 18, 2022
A CLI tools to get you started on any project in any language

Any Template A faster easier to Quick start any programming project. Installation pip3 install any-template Features No third party dependencies. Tem

Adwaith Rajesh 2 Jan 11, 2022
Loading animation; a progress bar

Loading animation; a progress bar. When you know the remaining time or task completion percentage, then you’re able to show an animated progress bar:

Goldy 1 Jan 23, 2022
(BionicLambda Universal SHell) A simple shell made in Python. Docs and possible C port incoming.

blush 😳 (BionicLambda Universal SHell) A simple shell made in Python. Docs and possible C port incoming. Note: The Linux executables were made on Ubu

3 Jun 30, 2021
You'll never want to use cd again.

Jmp Description Have you ever used the cd command? You'll never touch that outdated thing again when you try jmp. Navigate your filesystem with unprec

Grant Holmes 21 Nov 03, 2022
A communist shell written in Python

kash A communist shell written in Python It doesn't support escapes, quotes, comment lines, |, &&, , or similar yet. If you need help, get it from

Çınar Yılmaz 1 Dec 10, 2021
Helping you manage your data science projects sanely.

PyDS CLI Helping you manage your data science projects sanely. Requirements Anaconda/Miniconda/Miniforge/Mambaforge (Mambaforge recommended!) git on y

Eric Ma 16 Apr 25, 2022
CPOST is a CLI tool to assist with the proper sizing of Clara Deploy pipelines

CPOST (Clara Pipeline Operator Sizing Tool) Tool to measure resource usage of Clara Platform pipeline operators Cpost is a tool that will help you run

NVIDIA Corporation 5 Sep 27, 2021
Command line, configuration and persistence utilities

Zensols Utilities Command line, configuration and persistence utilities generally used for any more than basic application. This general purpose libra

Paul Landes 2 Nov 17, 2022
PdpCLI is a pandas DataFrame processing CLI tool which enables you to build a pandas pipeline from a configuration file.

PdpCLI Quick Links Introduction Installation Tutorial Basic Usage Data Reader / Writer Plugins Introduction PdpCLI is a pandas DataFrame processing CL

Yasuhiro Yamaguchi 15 Jan 07, 2022