A library built upon PyTorch for building embeddings on discrete event sequences using self-supervision

Overview

pytorch-lifestream a library built upon PyTorch for building embeddings on discrete event sequences using self-supervision. It can process terabyte-size volumes of raw events like game history events, clickstream data, purchase history or card transactions.

It supports various methods of self-supervised training, adapted for event sequences:

  • Contrastive Learning for Event Sequences (CoLES)
  • Contrastive Predictive Coding (CPC)
  • Replaced Token Detection (RTD) from ELECTRA
  • Next Sequence Prediction (NSP) from BERT
  • Sequences Order Prediction (SOP) from ALBERT

It supports several types of encoders, including Transformer and RNN. It also supports many types of self-supervised losses.

The following variants of the contrastive losses are supported:

Install from PyPi

pip install pytorch-lifestream

Install from source

# Ubuntu 20.04

sudo apt install python3.8 python3-venv
pip3 install pipenv

pipenv sync  --dev # install packages exactly as specified in Pipfile.lock
pipenv shell
pytest

Demo notebooks

  • Self-supervided training and embeddings for downstream task notebook
  • Self-supervided embeddings in CatBoost notebook
  • Self-supervided training and fine-tuning notebook
  • PySpark and Parquet for data preprocessing notebook

Experiments on public datasets

pytorch-lifestream usage experiments on several public event datasets are available in the separate repo

Comments
  • torch.stack in def collate_feature_dict

    torch.stack in def collate_feature_dict

    ptls/data_load/utils.py

    Hello!

    If the dataloader has a feature called target. And the batchsize is not a multiple of the length of the dataset, then an error pops up on the last batch: "Sizes of tensors must match except in dimension 0". Due to the use of torch.staсk when processing a feature startwith 'target'.

    opened by Ivanich-spb 11
  • Not supported multiGPU option from pytorchlightning.Trainer

    Not supported multiGPU option from pytorchlightning.Trainer

    Try to set Trainer(gpus=[0,1]), while using PtlsDataModule as data module, get such error:

    AttributeError: Can't pickle local object 'PtlsDataModule.__init__.<locals>.train_dataloader'

    opened by mazitovs 1
  • Correct seq_len for feature dict

    Correct seq_len for feature dict

    rec = {
        'mcc': [0, 1, 2, 3],
        'target_distribution': [0.1, 0.2, 0.4, 0.1, 0.1, 0.0],
    }
    

    How to get correct seq_len. true len: 4 possible length: 4, 6 'target_distribution' is incorrect field to get length, this is not a sequence, this is an array

    opened by ivkireev86 1
  • Save categories encodings along with model weights in demos

    Save categories encodings along with model weights in demos

    Вместе с обученной моделью необходимо сохранять обученный препроцессор и разбивку на трейн-тест. Иначе категории могут поехать и сохраненная предобученная модель станет бесполезной.

    opened by ivkireev86 1
  • Documentation index

    Documentation index

    Прототип главной страницы документации. Три секции:

    • описание моделей библиотеки
    • гайд как использовать библиотеку
    • как писать свои компоненты

    Есть краткое описание и ссылки на подробные (которые напишем потом).

    В описании модулей предложена структура библиотеки. Предполагается, что мы эти модули в ближайшее создадим и перетащим туда соответсвующие классы из библиотеки. Старые, модули, которые станут пустыми, удалим. Далее будем придерживаться схемы, описанной в этом документе.

    На ревью предлагается чекнуть предлагаемую структуру библиотеки, названия модулей ну и сам описательный текст документа.

    opened by ivkireev86 1
  • KL cyclostationarity test tools

    KL cyclostationarity test tools

    Test provides a hystogram with self-samples similarity vs. random sample similarity. Shows compatibility with CoLES.

    Think about tests for other frameworks.

    opened by ivkireev86 0
  • Repair pyspark tests

    Repair pyspark tests

    def test_dt_to_timestamp(): spark = SparkSession.builder.getOrCreate() df = spark.createDataFrame(data=[ {'dt': '1970-01-01 00:00:00'}, {'dt': '2012-01-01 12:01:16'}, {'dt': '2021-12-30 00:00:00'} ])

        df = df.withColumn('ts', dt_to_timestamp('dt'))
        ts = [rec.ts for rec in df.select('ts').collect()]
    
      assert ts == [0, 1325419276, 1640822400]
    

    E assert [-10800, 1325...6, 1640811600] == [0, 1325419276, 1640822400] E At index 0 diff: -10800 != 0 E Use -v to get more diff

    ptls_tests/test_preprocessing/test_pyspark/test_event_time.py:16: AssertionError


    def test_datetime_to_timestamp(): t = DatetimeToTimestamp(col_name_original='dt') spark = SparkSession.builder.getOrCreate() df = spark.createDataFrame(data=[ {'dt': '1970-01-01 00:00:00', 'rn': 1}, {'dt': '2012-01-01 12:01:16', 'rn': 2}, {'dt': '2021-12-30 00:00:00', 'rn': 3} ]) df = t.fit_transform(df) et = [rec.event_time for rec in df.select('event_time').collect()]

      assert et[0] == 0
    

    E assert -10800 == 0

    ptls_tests/test_preprocessing/test_pyspark/test_event_time.py:48: AssertionError

    opened by ikretus 0
  • docs. Development guide (for demo notebooks)

    docs. Development guide (for demo notebooks)

    • add current patterns
    • when model training start print message "model training stats, please wait. See tensorboard to track progress", use it with enable_progress=False
    documentation user feedback 
    opened by ivkireev86 0
Releases(v0.5.1)
  • v0.5.1(Dec 28, 2022)

    What's Changed

    • fixed cpc import by @ArtyomVorobev in https://github.com/dllllb/pytorch-lifestream/pull/90
    • add softmaxloss and tests by @ArtyomVorobev in https://github.com/dllllb/pytorch-lifestream/pull/87
    • MLM NSP Module by @mazitovs in https://github.com/dllllb/pytorch-lifestream/pull/88
    • fix test dropout error by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/91

    New Contributors

    • @ArtyomVorobev made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/90
    • @mazitovs made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/88

    Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.5.0...v0.5.1

    Source code(tar.gz)
    Source code(zip)
  • v0.5.0(Nov 9, 2022)

    What's Changed

    • Fix metrics reset by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/72
    • Pandas preprocessing without df copy, faster preprocessing for large datasets by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/73
    • fix in supervised-sequence-to-target.ipynb by @blinovpd in https://github.com/dllllb/pytorch-lifestream/pull/74
    • ptls.nn.PBDropout by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/75
    • tanh for rnn starter by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/76
    • Auc regr metric by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/78
    • spatial dropout for NoisyEmbedding, LastMaxAvgEncoder, warning for bidir RnnEncoder by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/80
    • Hparam tuning demo. hydra, optuna, tensorboard by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/81
    • tabformer by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/83
    • Supervised Coles Module, trx_encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/84

    New Contributors

    • @blinovpd made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/74

    Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.4.0...v0.5.0

    Source code(tar.gz)
    Source code(zip)
  • v0.4.0(Jul 27, 2022)

    What's Changed

    • Seq encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/29
    • regr.task ZILNLoss, RMSE, BucketAccuracy by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/36
    • lighting modules and nn layers refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/34
    • Demo colab by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/40
    • Fix drop target arrays by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/42
    • feature naming by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/43
    • Update abs_module.py by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/37
    • Extended inference demo by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/45
    • fix import path by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/46
    • Experiments sync by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/50
    • Experiments sync by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/52
    • Target dist by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/58
    • Data load refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/60
    • doc update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/62
    • doc update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/63

    New Contributors

    • @ikretus made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/36

    Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.3.0...v0.4.0

    What's Changed

    • Seq encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/29
    • regr.task ZILNLoss, RMSE, BucketAccuracy by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/36
    • lighting modules and nn layers refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/34
    • Demo colab by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/40
    • Fix drop target arrays by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/42
    • feature naming by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/43
    • Update abs_module.py by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/37
    • Extended inference demo by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/45
    • fix import path by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/46
    • Experiments sync by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/50
    • Experiments sync by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/52
    • Target dist by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/58
    • Data load refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/60
    • doc update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/62
    • doc update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/63

    New Contributors

    • @ikretus made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/36

    Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.3.0...v0.4.0

    What's Changed

    • Seq encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/29
    • regr.task ZILNLoss, RMSE, BucketAccuracy by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/36
    • lighting modules and nn layers refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/34
    • Demo colab by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/40
    • Fix drop target arrays by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/42
    • feature naming by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/43
    • Update abs_module.py by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/37
    • Extended inference demo by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/45
    • fix import path by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/46
    • Experiments sync by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/50
    • Experiments sync by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/52
    • Target dist by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/58
    • Data load refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/60
    • doc update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/62
    • doc update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/63

    New Contributors

    • @ikretus made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/36

    Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.3.0...v0.4.0

    Source code(tar.gz)
    Source code(zip)
  • v0.3.0(Jun 12, 2022)

    More Pythonic Core API: constructor arguments instead of config objects

    What's Changed

    • cpc params by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/9
    • All modules by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/15
    • Mlm pretrain by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/13
    • all encoders and get rid of get_loss by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/19
    • init by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/20
    • Documentation index by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/8
    • Demos api update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/18
    • loss output correction by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/22
    • Test fixes by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/23
    • readme_demo_link by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/25
    • init by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/26
    • work without logger by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/7
    • trx_encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/28

    Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.1.2...v0.3.0

    Source code(tar.gz)
    Source code(zip)
Owner
Dmitri Babaev
Dmitri Babaev
Code for CMaskTrack R-CNN (proposed in Occluded Video Instance Segmentation)

CMaskTrack R-CNN for OVIS This repo serves as the official code release of the CMaskTrack R-CNN model on the Occluded Video Instance Segmentation data

Q . J . Y 61 Nov 25, 2022
A general framework for deep learning experiments under PyTorch based on pytorch-lightning

torchx Torchx is a general framework for deep learning experiments under PyTorch based on pytorch-lightning. TODO list gan-like training wrapper text

Yingtian Liu 6 Mar 17, 2022
Directed Greybox Fuzzing with AFL

AFLGo: Directed Greybox Fuzzing AFLGo is an extension of American Fuzzy Lop (AFL). Given a set of target locations (e.g., folder/file.c:582), AFLGo ge

380 Nov 24, 2022
This is an official PyTorch implementation of Task-Adaptive Neural Network Search with Meta-Contrastive Learning (NeurIPS 2021, Spotlight).

NeurIPS 2021 (Spotlight): Task-Adaptive Neural Network Search with Meta-Contrastive Learning This is an official PyTorch implementation of Task-Adapti

Wonyong Jeong 15 Nov 21, 2022
Learning Synthetic Environments and Reward Networks for Reinforcement Learning

Learning Synthetic Environments and Reward Networks for Reinforcement Learning We explore meta-learning agent-agnostic neural Synthetic Environments (

AutoML-Freiburg-Hannover 16 Sep 02, 2022
Single-step adversarial training (AT) has received wide attention as it proved to be both efficient and robust.

Subspace Adversarial Training Single-step adversarial training (AT) has received wide attention as it proved to be both efficient and robust. However,

15 Sep 02, 2022
Goal of the project : Detecting Temporal Boundaries in Sign Language videos

MVA RecVis course final project : Goal of the project : Detecting Temporal Boundaries in Sign Language videos. Sign language automatic indexing is an

Loubna Ben Allal 6 Dec 21, 2022
The official codes for the ICCV2021 Oral presentation "Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework"

P2PNet (ICCV2021 Oral Presentation) This repository contains codes for the official implementation in PyTorch of P2PNet as described in Rethinking Cou

Tencent YouTu Research 208 Dec 26, 2022
fastgradio is a python library to quickly build and share gradio interfaces of your trained fastai models.

fastgradio is a python library to quickly build and share gradio interfaces of your trained fastai models.

Ali Abdalla 34 Jan 05, 2023
Check out the StyleGAN repo and place it in the same directory hierarchy as the present repo

Variational Model Inversion Attacks Kuan-Chieh Wang, Yan Fu, Ke Li, Ashish Khisti, Richard Zemel, Alireza Makhzani Most commands are in run_scripts. W

Jackson Wang 15 Dec 26, 2022
Deep Learning and Logical Reasoning from Data and Knowledge

Logic Tensor Networks (LTN) Logic Tensor Network (LTN) is a neurosymbolic framework that supports querying, learning and reasoning with both rich data

171 Dec 29, 2022
Predict bus arrival time using VertexAI and Nvidia's Jetson Nano

bus_prediction predict bus arrival time using VertexAI and Nvidia's Jetson Nano imagenet the command for imagenet.py look like this python3 /path/to/i

10 Dec 22, 2022
Official Pytorch implementation of 'GOCor: Bringing Globally Optimized Correspondence Volumes into Your Neural Network' (NeurIPS 2020)

Official implementation of GOCor This is the official implementation of our paper : GOCor: Bringing Globally Optimized Correspondence Volumes into You

Prune Truong 71 Nov 18, 2022
Reading Group @mila-iqia on Computational Optimal Transport for Machine Learning Applications

Computational Optimal Transport for Machine Learning Reading Group Over the last few years, optimal transport (OT) has quickly become a central topic

Ali Harakeh 11 Aug 26, 2022
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR2021)

NExT-QA We reproduce some SOTA VideoQA methods to provide benchmark results for our NExT-QA dataset accepted to CVPR2021 (with 1 'Strong Accept' and 2

Junbin Xiao 50 Nov 24, 2022
A library that can print Python objects in human readable format

objprint A library that can print Python objects in human readable format Install pip install objprint Usage op Use op() (or objprint()) to print obj

319 Dec 25, 2022
This is the official implementation of Elaborative Rehearsal for Zero-shot Action Recognition (ICCV2021)

Elaborative Rehearsal for Zero-shot Action Recognition This is an official implementation of: Shizhe Chen and Dong Huang, Elaborative Rehearsal for Ze

DeLightCMU 26 Sep 24, 2022
A fast poisson image editing implementation that can utilize multi-core CPU or GPU to handle a high-resolution image input.

Poisson Image Editing - A Parallel Implementation Jiayi Weng (jiayiwen), Zixu Chen (zixuc) Poisson Image Editing is a technique that can fuse two imag

Jiayi Weng 110 Dec 27, 2022
Pose Detection and Machine Learning for real-time body posture analysis during exercise to provide audiovisual feedback on improvement of form.

Posture: Pose Tracking and Machine Learning for prescribing corrective suggestions to improve posture and form while exercising. This repository conta

Pratham Mehta 10 Nov 11, 2022