4th place solution to the Datafactory challenge by Intermarché.

Last update: Mar 19, 2022


Overview

Solution to the Datafactory challenge by Intermarché.

4th place solution to the Datafactory challenge by Intermarché. The objective of the challenge is to predict Intermarché's sales in the first quarter of 2019, using the data from the previous year (2018) to train the model.

Data 💿

We have the sales records for a set of (store, item) pairs, for each day of 2018 on which at least one sale occurred. The data are structured as follows:

date	store	item	quantity
2018-01-01	1	12	1
2018-01-01	1	17	2
2018-01-01	1	22	3

We have additional tables available such as:

Product characteristics.
Store characteristics.
Product prices by store and by quarter.

Solution 🤖

The main difficulty of the challenge is identifying the days on which a store recorded no sales of a given product: Intermarché does not provide records for which the target variable (quantity) equals 0. I found that adding up to 5 zero-quantity records after each sale of a given (store, item) pair maximized my model's performance and limited the overfitting of my aggregates.
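A minimal pandas sketch of this zero-padding step, assuming a DataFrame with the columns shown in the Data section (date, store, item, quantity) and reading "up to 5 zeros" as: for each sale on day d, each of the days d+1 to d+5 that has no recorded sale for the same pair gets a row with quantity 0. The function name `add_trailing_zeros` and the optional `last_day` cutoff are illustrative and not taken from the repository's scripts.

```python
import pandas as pd


def add_trailing_zeros(sales: pd.DataFrame, max_zeros: int = 5, last_day=None) -> pd.DataFrame:
    """Add quantity-0 rows on up to `max_zeros` days after each recorded sale.

    Assumes `sales` has columns date (datetime64), store, item, quantity and at
    most one row per (date, store, item). Hypothetical helper, not the repo's code.
    """
    keys = ["date", "store", "item"]
    # Candidate days d+1 ... d+max_zeros for every sale on day d.
    candidates = pd.concat(
        [sales[keys].assign(date=sales["date"] + pd.Timedelta(days=k))
         for k in range(1, max_zeros + 1)],
        ignore_index=True,
    ).drop_duplicates()
    if last_day is not None:
        # Optionally stop at the end of the training period.
        candidates = candidates[candidates["date"] <= pd.Timestamp(last_day)]
    # Drop candidate days on which the pair already has a recorded sale.
    merged = candidates.merge(sales[keys], on=keys, how="left", indicator=True)
    zeros = merged.loc[merged["_merge"] == "left_only", keys].assign(quantity=0)
    padded = pd.concat([sales, zeros], ignore_index=True)
    return padded.sort_values(["store", "item", "date"]).reset_index(drop=True)
```

Because candidate days that already carry a sale are dropped, two sales close together produce fewer than 5 zeros between them, which matches the "up to 5" wording.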

Features:

Aggregates by (store, item) pair (mean + std)
Aggregates on prices (mean)
Aggregates on store characteristics (mean)
Aggregates on product characteristics (mean)
Rolling medians over the last 9 weeks
Date features (weekend, holidays, day of the week)
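A rough pandas sketch of three of these feature families: (store, item) aggregates, a 9-week (63-day) rolling median, and date features. It assumes one row per (store, item, day) with columns date, store, item, quantity; the function name, column names and the `holidays` argument are hypothetical. The price, store and product aggregates would follow the same groupby pattern after merging the corresponding tables, whose columns are not described here. In practice the aggregates should be computed on the training period only to avoid leaking the target.

```python
import pandas as pd


def add_basic_features(df: pd.DataFrame, holidays=()) -> pd.DataFrame:
    """Illustrative (store, item) aggregates, 9-week rolling median and date features.

    Assumes columns date (datetime64), store, item, quantity. Hypothetical helper.
    """
    df = df.sort_values(["store", "item", "date"]).copy()
    qty = df.groupby(["store", "item"])["quantity"]
    # Aggregates by (store, item) pair: mean and standard deviation of the quantity.
    df["qty_mean"] = qty.transform("mean")
    df["qty_std"] = qty.transform("std")
    # Rolling median over the previous 9 weeks (63 days); closed="left" excludes
    # the current day so the feature does not see its own target.
    medians = pd.Series(float("nan"), index=df.index)
    for _, group in df.groupby(["store", "item"]):
        rolled = group.set_index("date")["quantity"].rolling("63D", closed="left").median()
        medians.loc[group.index] = rolled.to_numpy()
    df["qty_median_9w"] = medians
    # Calendar features; `holidays` is a caller-supplied list of dates.
    df["day_of_week"] = df["date"].dt.dayofweek
    df["is_weekend"] = (df["day_of_week"] >= 5).astype(int)
    df["is_holiday"] = df["date"].isin(pd.to_datetime(list(holidays))).astype(int)
    return df
```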

I used LightGBM with 3-fold cross-validation and bagging to make my predictions. I trained the model on the transformed target log(1 + quantity). The Poisson loss helped slightly. I did not tune the model's hyperparameters.
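A minimal numpy sketch of K-fold training with bagging on a log1p target. "Bagging" is read here as averaging several models per fold, each fitted on a random subsample of the training folds; that reading, the function `cv_bagged_predict` and the `MeanModel` stand-in are assumptions, not the repository's code. Any object with `fit`/`predict` can be plugged in through `make_model(seed)`, so LightGBM is not required to run the sketch.

```python
import numpy as np


class MeanModel:
    """Stand-in regressor for smoke tests; swap in e.g. a LightGBM model."""

    def __init__(self, seed=0):
        self.seed = seed

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


def cv_bagged_predict(X, y, X_test, make_model, n_splits=3, n_bags=3,
                      subsample=0.8, seed=0):
    """K-fold CV with bagging on a log1p-transformed target.

    Returns (test predictions, out-of-fold predictions), both on the original
    quantity scale. `make_model(seed)` must return an object with fit/predict.
    """
    X = np.asarray(X, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_log = np.log1p(np.asarray(y, dtype=float))  # quantity -> log(1 + quantity)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y_log)), n_splits)
    test_pred = np.zeros(len(X_test))
    oof_pred = np.zeros(len(y_log))
    for k, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        size = max(1, int(subsample * len(train_idx)))
        for b in range(n_bags):
            # Bagging: each model is fitted on a random subsample of the training folds.
            bag_idx = rng.choice(train_idx, size=size, replace=False)
            model = make_model(seed + k * n_bags + b)
            model.fit(X[bag_idx], y_log[bag_idx])
            oof_pred[val_idx] += model.predict(X[val_idx]) / n_bags
            test_pred += model.predict(X_test) / (n_splits * n_bags)
    # Undo the log1p transform: exp(p) - 1.
    return np.expm1(test_pred), np.expm1(oof_pred)
```

With LightGBM installed, one option is `make_model=lambda s: lightgbm.LGBMRegressor(objective="poisson", random_state=s)`; the out-of-fold predictions can then be scored against `y` to compare settings.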

Finally, I replaced all predictions for February and March with the predictions for the second and third weeks of January.
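One possible way to implement this copy in pandas, sketched under explicit assumptions: the "second and third weeks of January" are taken as the 14 days starting Monday 2019-01-07 (ISO weeks 2 and 3), and each February or March date is shifted back by a whole number of 14-day periods onto that window, which preserves the day of the week. The mapping, the `TEMPLATE_START` constant and the function name are hypothetical; the README does not say how dates were matched.

```python
import numpy as np
import pandas as pd

# Hypothetical template: Monday of ISO week 2 of 2019 plus 14 days (weeks 2 and 3).
TEMPLATE_START = pd.Timestamp("2019-01-07")


def copy_january_template(sub: pd.DataFrame, start=TEMPLATE_START, n_days=14) -> pd.DataFrame:
    """Replace February/March predictions with those of a 14-day January window.

    Assumes columns date (datetime64), store, item, quantity (the prediction).
    Pairs missing from the template keep their original prediction.
    """
    sub = sub.copy()
    in_template = (sub["date"] >= start) & (sub["date"] < start + pd.Timedelta(days=n_days))
    lookup = sub.loc[in_template].set_index(["date", "store", "item"])["quantity"]
    target = sub["date"].dt.month.isin([2, 3])
    target_dates = sub.loc[target, "date"]
    # Shift back by whole 14-day periods so the weekday is unchanged.
    periods = (target_dates - start).dt.days // n_days
    source_dates = target_dates - pd.to_timedelta(periods * n_days, unit="D")
    keys = pd.MultiIndex.from_arrays(
        [source_dates, sub.loc[target, "store"], sub.loc[target, "item"]]
    )
    copied = lookup.reindex(keys).to_numpy(dtype=float)
    current = sub.loc[target, "quantity"].to_numpy(dtype=float)
    sub.loc[target, "quantity"] = np.where(np.isnan(copied), current, copied)
    return sub
```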

I also set to 0 all predictions for (store, item, day of the week) triplets that have too few records in the training set.
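A short pandas sketch of this rule, assuming `train` holds the raw sales records (date, store, item, quantity) and `sub` holds the predictions with the same columns. The threshold `min_count=3` is a hypothetical value; the README does not state the one used.

```python
import pandas as pd


def zero_sparse_triplets(sub: pd.DataFrame, train: pd.DataFrame, min_count: int = 3) -> pd.DataFrame:
    """Set predictions to 0 for (store, item, day-of-week) triplets seen fewer
    than `min_count` times in training. Hypothetical helper and threshold."""
    counts = (
        train.assign(dow=train["date"].dt.dayofweek)
        .groupby(["store", "item", "dow"])
        .size()
        .rename("n_train")
        .reset_index()
    )
    # A left merge keeps every prediction row, in its original order.
    out = sub.assign(dow=sub["date"].dt.dayofweek).merge(
        counts, on=["store", "item", "dow"], how="left"
    )
    out["n_train"] = out["n_train"].fillna(0)  # triplets never seen in training
    out.loc[out["n_train"] < min_count, "quantity"] = 0.0
    return out.drop(columns=["dow", "n_train"])
```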

Run ♻️

To reproduce my results, download the data into the data/raw folder, then run:

python scripts/prepare_raw_data.py
python scripts/features/aggs_items.py
python scripts/features/aggs_prices.py
python scripts/features/aggs_stores.py
python scripts/features/aggs.py 
python scripts/features/lags.py
python scripts/features/cal.py 
python scripts/make_train_test.py
python scripts/learn.py
python scripts/polish_sub.py
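The same sequence can also be driven from one Python file. This is a hypothetical convenience wrapper (`run_pipeline` is not part of the repository); it assumes it is called from the repository root, as the commands above are, and stops at the first script that exits with an error.

```python
import subprocess
import sys
from pathlib import Path

# Order taken from the Run section.
PIPELINE = [
    "scripts/prepare_raw_data.py",
    "scripts/features/aggs_items.py",
    "scripts/features/aggs_prices.py",
    "scripts/features/aggs_stores.py",
    "scripts/features/aggs.py",
    "scripts/features/lags.py",
    "scripts/features/cal.py",
    "scripts/make_train_test.py",
    "scripts/learn.py",
    "scripts/polish_sub.py",
]


def run_pipeline(root=".", dry_run=False):
    """Run every step in order; with dry_run=True only return the commands."""
    root = Path(root)
    commands = [[sys.executable, script] for script in PIPELINE]
    if dry_run:
        return commands
    if not (root / "data" / "raw").is_dir():
        raise FileNotFoundError("Download the challenge data into data/raw first.")
    for cmd in commands:
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, cwd=root, check=True)  # raises CalledProcessError on failure
    return commands
```

Using `sys.executable` runs each script with the same interpreter (and virtual environment) as the wrapper.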

License

This project is free and open-source software licensed under the MIT license.


Owner

Raphael Sourty

Ego4d dataset repository. Download the dataset, visualize, extract features & example usage of the dataset

[CVPR 2021] Few-shot 3D Point Cloud Semantic Segmentation

This repo contains the implementation of YOLOv2 in Keras with Tensorflow backend.

The BCNet related data and inference model.

An official source code for paper Deep Graph Clustering via Dual Correlation Reduction, accepted by AAAI 2022

Pytorch code for "DPFM: Deep Partial Functional Maps" - 3DV 2021 (Oral)

Official Pytorch Implementation of: "Semantic Diversity Learning for Zero-Shot Multi-label Classification"(2021) paper

UMT is a unified and flexible framework which can handle different input modality combinations, and output video moment retrieval and/or highlight detection results.

StarGAN - Official PyTorch Implementation (CVPR 2018)

Tensors and neural networks in Haskell

A set of tools for creating and testing machine learning features, with a scikit-learn compatible API

IPATool-py: download ipa easily

AdamW optimizer for bfloat16 models in pytorch.

Code release for the paper “Worldsheet Wrapping the World in a 3D Sheet for View Synthesis from a Single Image”, ICCV 2021.

GemNet model in PyTorch, as proposed in "GemNet: Universal Directional Graph Neural Networks for Molecules" (NeurIPS 2021)

CCAFNet: Crossflow and Cross-scale Adaptive Fusion Network for Detecting Salient Objects in RGB-D Images

Code for "PV-RAFT: Point-Voxel Correlation Fields for Scene Flow Estimation of Point Clouds", CVPR 2021

Code for "Adversarial attack by dropping information." (ICCV 2021)

Constrained Language Models Yield Few-Shot Semantic Parsers

Python suite to construct benchmark machine learning datasets from the MIMIC-III clinical database.