Materials to reproduce our findings in our stories, "Amazon Puts Its Own 'Brands' First Above Better-Rated Products" and "When Amazon Takes the Buy Box, it Doesn’t Give it up"

Overview

Amazon Brands and Exclusives

This repository contains code to reproduce the findings featured in our story "Amazon Puts Its Own 'Brands' First Above Better-Rated Products" and "When Amazon Takes the Buy Box, it Doesn’t Give it up" from our series Amazon's Advantage.

Our methodology is described in "How We Analyzed Amazon’s Treatment of Its Brands in Search Results".

Data that we collected and analyzed is in the data folder.
To use the full input dataset (which is not hosted here), please refer to Download data.

Jupyter notebooks used for data preprocessing and analysis are available in the notebooks folder.
Descriptions for each notebook are outlined in the Notebooks section below.

Installation

Python

Make sure you have Python 3.6+ installed. We used Miniconda to create a Python 3.8 virtual environment.

Then install the Python packages:
pip install -r requirements.txt

Notebooks

These notebooks are intended to be run sequentially, but they are not dependent on one another. If you want a quick overview of the methodology, you only need to concern yourself with the notebooks with an asterisk(*).

0-data-preprocessing.ipynb

This notebook parses Amazon search results and Amazon product pages, and produces the intermediary datasets (data/output/datasets/) used in ranking analysis and random forest classifiers.

1-data-analysis-search-results.ipynb *

Bulk of the ranking analysis and stats in the data analysis.

2-random-forest-analysis.ipynb *

Feature engineering training set, finding optimal hyperparameters, and performing the ablation study on a random forest model. The most predictive feature is verified using three separate methods.

3-survey-results.ipynb

Visualizing the survey results from our national panel of 1,000 adults.

4-limiations-product-page-changes.ipynb

Analysis of how often the Buy Box's default shipper and seller change between Amazon and a third party.

utils.py

Contains convenient functions used in the notebooks.

parsers.py

Contains parsers for search results and product pages.

Data

This directory is where inputs, intermediaries, and outputs are saved.

data
├── output
│   ├── figures
│   ├── tables
│   └── datasets
│       ├── amazon_private_label.csv.xz
│       ├── products.csv.xz
│       ├── searches.csv.xz
│       ├── training_set.csv.gz
│       ├── pairwise_training_set.csv.gz
│       └── trademarks
└── input
    ├── combined_queries_with_source.csv
    ├── best_sellers
    ├── generic_search_terms
    ├── search-private-label
    ├── search-selenium
    ├── search-selenium-our-brands-filter_
    ├── selenium-products
    ├── seller_central
    └── spotcheck

data/output/ contains tables, figures, and datasets used in our methodology.

data/output/datasets/amazon_private_label.csv.xz is our dataset of Amazon brands, exclusives, and proprietary electronics (N=137,428 products). We use each product's unique ID (called an ASIN) to identify Amazon's own products in our methodology.

data/output/datasets/trademarks contains a dataset of trademarked brands registered by Amazon. The data was collected from USPTO.gov and Amazon. We included an additional README with the exact steps we took to build this dataset in the directory.

data/output/datasets/searches.csv.xz parsed search result pages from top and generic searches (N=187,534 product positions). You can filter this by search_term for each of these subsets from data/input/combined_queries_with_source.csv.

data/output/datasets/products.csv.xz parsed product pages from the searches above (N=157,405 product pages).

data/output/training_set.csv.gz metadata used to train and evaluate the random forest. Additionally, feature engineering is conducted in notebooks/2-random-forest-analysis.ipynb, which produces pairwise_training_set.csv.gz.

Every file in data/input except combined_queries_with_source.csv is stored in AWS s3. Those are not hosted in this repository.

Download Data

You can find the raw inputs in data/input in s3://markup-public-data/amazon-brands/.

If you trust us, you can download the HTML and JSON files in data/input using this script: sh data/download_input_data.sh

Note this is not necessary to run notebooks and see full results.

data/input/search-selenium/ (12 GB uncompressed)

First page of search results collected in January 2021. Download the HTML files search-selenium.tar.xz (238 MB compressed) here.

data/input/selenium-products/ (220 GB uncompressed)

Product pages collected in February 2021. Download the HTML files selenium-products.tar.xz (9 GB compressed) here.

data/input/search-selenium-our-brands-filter_/ (35 GB uncompressed)

Search results filtered by "our brands". Contains every page of search results. Download search-selenium-our-brands-filter_.tar.xz (403 MB compressed) here.

data/input/search-private-label/ (25 GB uncompressed)

API responses for search results filtered down to products Amazon identifies as "our brands". Contains paginated API results. Download the JSON files search-private-label.tar.xz (402 MB uncompressed) here.

data/input/seller_central/ (105 MB)

Seller central data for Q4 2020. Download the CSV file All_Q4_2020.csv.xz (105 MB compressioned) here.

data/input/best_sellers/ (4 GB)

Amazon's best sellers under the category "Amazon Devices & Accessories". Download the HTML files best_sellers.tar.xz (60MB compressed) here.

data/input/spotcheck/ (4 GB)

A sub-sample of product pages for spot-checking Buy Box changes. Download the HTML files spotcheck.tar.xz (159 MB compressed) here.

Minimal telegram voice chat music bot, in pyrogram.

VCBOT Fully working VC (user)Bot, based on py-tgcalls and py-tgcalls-wrapper with minimal features. Deploying To heroku: Local machine/VPS: git clone

Aditya 33 Nov 12, 2022
Wrapper for wttr.in weather forecast.

pywttr Wrapper for wttr.in weather forecast. Asynchronous version here. Installation pip install pywttr Example This example prints the average temper

Almaz 6 Dec 25, 2022
Torrent-Igruha SDK Python

Простой пример использования библиотеки: Устанавливаем библиотеку python -m

LORD_CODE 2 Jun 25, 2022
Get your Pixiv token (for running upbit/pixivpy)

gppt: get-pixivpy-token Get your Pixiv token (for running upbit/pixivpy) Refine pixiv_auth.py + its fork Install ❭ pip install gppt Run Note: In advan

haruna 58 Jan 04, 2023
Обертка для мини-игры "рабы" на python

Slaves API Библиотека для игры Рабы на Python. Большая просьба Поставьте звездочку на репозиторий. Это много для меня значит. Версии Т.к. разработчики

Zdorov Philipp 13 Mar 31, 2021
Utility for downloading fanfiction in bulk from the Archive of Our Own

What is this? This is a program intended to help you download fanfiction from the Archive of Our Own in bulk. This program is primarily intended to wo

73 Dec 30, 2022
Recommendation systems are among most widely preffered marketing strategies.

Recommendation systems are among most widely preffered marketing strategies. Their popularity comes from close prediction scores obtained from relationships of users and items. In this project, two r

Sübeyte 8 Oct 06, 2021
A Python module for communicating with the Twilio API and generating TwiML.

twilio-python The default branch name for this repository has been changed to main as of 07/27/2020. Documentation The documentation for the Twilio AP

Twilio 1.6k Jan 05, 2023
An advanced automatic top.gg dank memer voter that votes automatically for you.

Auto Dank Memer Voter An automatic dank memer voter that sends votes onto top.gg every 12 hours, unless their is captcha. I am working on a captcha de

6 Aug 27, 2022
Network simulation tools

Overview I'm building my network simulation environments with Vagrant using libvirt plugin on a Ubuntu 20.04 system... and I always hated how boring i

Ivan Pepelnjak 219 Jan 07, 2023
Telegram RAT written in Python

teleRAT Python based RAT that uses Telegram for sending commands and receiving data to and from a victim computer. Setup.py Insert your API key into t

96 Jan 01, 2023
Wordy is a Wordle-like Discord bot but with a twist.

Wordy Discord Bot Wordy is a Wordle-like Discord bot but with a twist. It already supports 6 languages from the beginning: English, Italian, French, G

The Coding Channel 2 Sep 06, 2022
SEMID - OSINT module with lots of discord functions

SEMID Framework About Semid is a framework with different Discord functions and

Hima 20 Sep 23, 2022
An information scroller Twitter trends, news, weather for raspberry pi and Pimoroni Unicorn Hat Mini and Scroll Phat HD.

uticker An information scroller Twitter trends, news, weather for raspberry pi and Pimoroni Unicorn Hat Mini and Scroll Phat HD. Features include: Twi

kottuora 5 Oct 31, 2022
Rich presence app for playstation 3. Display what game you are playing on the PS3 via Discord

PS3-Rich-Presence-for-Discord Discord Rich Presence script for PS3 consoles on HFW&HEN or CFW. Written in Python. Display what you are playing on your

17 Dec 11, 2022
Stackoverflow Telegram Bot With Python

Template for Telegram Bot Template to create a telegram bot in python. How to Run Set your telegram bot token as environment variable TELEGRAM_BOT_TOK

PyTopia 10 Mar 07, 2022
A telegram bot written in Python to fetch random SFW & NSFW anime images

Tsuzumi A telegram bot written in python to fetch both random SFW & NSFW Anime images using nekos.life & waifu.pics API Commands SFW Commands : /

Nisarga Adhikary 3 Oct 12, 2022
Free python/telegram bot for easy execution and surveillance of crypto trading plans on multiple exchanges.

EazeBot Introduction Have you ever traded cryptocurrencies and lost overview of your planned buys/sells? Have you encountered the experience that your

Marcel Beining 100 Dec 06, 2022
Irenedao-nft-generator - Original scripts used to generate IreneDAO NFTs

IreneDAO NFT Generator Scripts to generate IreneDAO NFT. Make sure you have Pill

libevm 60 Oct 27, 2022
Battle Pass farming tft bot

Tft bot Bot para farmar pontos do Passe de Batalha do TFT Descrição A cada partida de tft jogada você ganha 100 pontos no passe, porém você não precis

Leonardo Gonçalves 4 Jan 27, 2022