4th place solution to datafactory challenge by Intermarché.

Overview

Solution to Datafactory challenge by Intermarché.

4th place solution to datafactory challenge by Intermarché. The objective of the challenge is to predict the sales made by intermarche in the first quarter of 2019. We have the data of the past year (2018) to train our model to fit the sales.

Data 💿

We have the record of sales for a set of pairs (store, item) and for each day of 2018 (if there was at least one sale). The data are structured as:

date store item quantity
2018-01-01 1 12 1
2018-01-01 1 17 2
2018-01-01 1 22 3

We have additional tables available such as:

  • Product characteristics.
  • Store characteristics.
  • Product prices by store and by quarter.

Solution 🤖

The main difficulty of the challenge is to find the days for which a store has recorded no sales for a given product. Indeed, Intermarché does not provide records for which the target variable (quantity) is equal to 0. I found that adding up to 5 zeros after a sale for a given pair (store / item) maximized the performance of my model and limited the overfitting of my aggregates.

Features:

  • Aggregates by item / store (mean + std)
  • Aggregates on prices. (mean)
  • Aggregates on the characteristics of the stores. (mean)
  • Aggregates on product characteristics. (mean)
  • Rolling medians over the last 9 weeks.
  • Features on dates. (weekend / holidays / day of the week)

I used LightGBM and performed a 3-fold cross-validation with bagging to make my prediction. I transformed the target variable to train my model using quantity = log(1 + quantity). Poisson loss helps a bit. I didn't look for the hyperparameters of the model.

Finally I set all predictions of February and March as the predictions of the second and third week of January.

Also I set to 0 the set of predictions associated to triplets (store / item / day of the week) for which we have not enough records in the training set.

Run ♻️

To reproduce my results, you must download the data in the folder data/raw.

python scripts/prepare_raw_data.py
python scripts/features/aggs_items.py
python scripts/features/aggs_prices.py
python scripts/features/aggs_stores.py
python scripts/features/aggs.py 
python scripts/features/lags.py
python scripts/features/cal.py 
python scripts/make_train_test.py
python scripts/learn.py
python scripts/polish_sub.py

License

This project is free and open-source software licensed under the MIT license.

Owner
Raphael Sourty
PhD Student @ IRIT and Renault
Raphael Sourty
This is a Pytorch implementation of paper: DropEdge: Towards Deep Graph Convolutional Networks on Node Classification

DropEdge: Towards Deep Graph Convolutional Networks on Node Classification This is a Pytorch implementation of paper: DropEdge: Towards Deep Graph Con

401 Dec 16, 2022
Codebase for INVASE: Instance-wise Variable Selection - 2019 ICLR

Codebase for "INVASE: Instance-wise Variable Selection" Authors: Jinsung Yoon, James Jordon, Mihaela van der Schaar Paper: Jinsung Yoon, James Jordon,

Jinsung Yoon 50 Nov 11, 2022
imbalanced-DL: Deep Imbalanced Learning in Python

imbalanced-DL: Deep Imbalanced Learning in Python Overview imbalanced-DL (imported as imbalanceddl) is a Python package designed to make deep imbalanc

NTUCSIE CLLab 19 Dec 28, 2022
This thesis is mainly concerned with state-space methods for a class of deep Gaussian process (DGP) regression problems

Doctoral dissertation of Zheng Zhao This thesis is mainly concerned with state-space methods for a class of deep Gaussian process (DGP) regression pro

Zheng Zhao 21 Nov 14, 2022
A Java implementation of the experiments for the paper "k-Center Clustering with Outliers in Sliding Windows"

OutliersSlidingWindows A Java implementation of the experiments for the paper "k-Center Clustering with Outliers in Sliding Windows" Dataset generatio

PaoloPellizzoni 0 Jan 05, 2022
Learning Dense Representations of Phrases at Scale (Lee et al., 2020)

DensePhrases DensePhrases provides answers to your natural language questions from the entire Wikipedia in real-time. While it efficiently searches th

Princeton Natural Language Processing 540 Dec 30, 2022
Which Style Makes Me Attractive? Interpretable Control Discovery and Counterfactual Explanation on StyleGAN

Interpretable Control Exploration and Counterfactual Explanation (ICE) on StyleGAN Which Style Makes Me Attractive? Interpretable Control Discovery an

Bo Li 11 Dec 01, 2022
Benchmarks for semi-supervised domain generalization.

Semi-Supervised Domain Generalization This code is the official implementation of the following paper: Semi-Supervised Domain Generalization with Stoc

Kaiyang 49 Dec 10, 2022
BMW TechOffice MUNICH 148 Dec 21, 2022
This is the code of using DQN to play Sekiro .

Update for using DQN to play sekiro 2021.2.2(English Version) This is the code of using DQN to play Sekiro . I am very glad to tell that I have writen

144 Dec 25, 2022
Code for the paper "PortraitNet: Real-time portrait segmentation network for mobile device" @ CAD&Graphics2019

PortraitNet Code for the paper "PortraitNet: Real-time portrait segmentation network for mobile device". @ CAD&Graphics 2019 Introduction We propose a

265 Dec 01, 2022
TAUFE: Task-Agnostic Undesirable Feature DeactivationUsing Out-of-Distribution Data

A deep neural network (DNN) has achieved great success in many machine learning tasks by virtue of its high expressive power. However, its prediction can be easily biased to undesirable features, whi

KAIST Data Mining Lab 8 Dec 07, 2022
A tutorial on DataFrames.jl prepared for JuliaCon2021

JuliaCon2021 DataFrames.jl Tutorial This is a tutorial on DataFrames.jl prepared for JuliaCon2021. A video recording of the tutorial is available here

Bogumił Kamiński 106 Jan 09, 2023
Joint Unsupervised Learning (JULE) of Deep Representations and Image Clusters.

Joint Unsupervised Learning (JULE) of Deep Representations and Image Clusters. Overview This project is a Torch implementation for our CVPR 2016 paper

Jianwei Yang 278 Dec 25, 2022
Text Summarization - WCN — Weighted Contextual N-gram method for evaluation of Text Summarization

Text Summarization WCN — Weighted Contextual N-gram method for evaluation of Text Summarization In this project, I fine tune T5 model on Extreme Summa

Aditya Shah 1 Jan 03, 2022
Airborne Optical Sectioning (AOS) is a wide synthetic-aperture imaging technique

AOS: Airborne Optical Sectioning Airborne Optical Sectioning (AOS) is a wide synthetic-aperture imaging technique that employs manned or unmanned airc

JKU Linz, Institute of Computer Graphics 39 Dec 09, 2022
Codes for the ICCV'21 paper "FREE: Feature Refinement for Generalized Zero-Shot Learning"

FREE This repository contains the reference code for the paper "FREE: Feature Refinement for Generalized Zero-Shot Learning". [arXiv][Paper] 1. Prepar

Shiming Chen 28 Jul 29, 2022
Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper

Evaluating the Factual Consistency of Abstractive Text Summarization Authors: Wojciech Kryściński, Bryan McCann, Caiming Xiong, and Richard Socher Int

Salesforce 165 Dec 21, 2022
Reinforcement learning library in JAX.

Reinforcement learning library in JAX.

Yicheng Luo 96 Oct 30, 2022
CAUSE: Causality from AttribUtions on Sequence of Events

CAUSE: Causality from AttribUtions on Sequence of Events

Wei Zhang 21 Dec 01, 2022