Python package for machine learning for healthcare using the OMOP Common Data Model

Last update: Jan 03, 2023

Overview

omop-learn

What is omop-learn?

This library was developed to support rapid prototyping, in Python, of predictive machine-learning models. These models are built on longitudinal medical data from a database that follows the OMOP CDM standard. omop-learn makes it easy to define predictive clinical tasks, featurizations of OMOP data, and relevant patient cohorts. It also provides methods based on sparse tensor implementations for quickly manipulating the collected features in their rawest possible form, so that the data can be transformed dynamically.
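The sparse-tensor idea can be illustrated with a small sketch. Each observed event is stored as a coordinate (person, concept, time bin) with a count, in coordinate (COO) form. A dynamic transformation, such as summing over a block of time bins, then operates directly on those stored coordinates. This is a dependency-free stand-in written for illustration only. The class and method names (`SparseEventTensor`, `add_event`, `sum_over_time`) are hypothetical and are not omop-learn's API. The library itself builds its tensors with the PyData `sparse` package.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SparseEventTensor:
    """Hypothetical COO-style tensor indexed by (person, concept, time_bin)."""

    shape: tuple  # (n_persons, n_concepts, n_time_bins)
    data: dict = field(default_factory=dict)  # (person, concept, time_bin) -> count

    def add_event(self, person, concept, time_bin, count=1):
        key = (person, concept, time_bin)
        if not all(0 <= i < n for i, n in zip(key, self.shape)):
            raise IndexError(f"{key} is outside shape {self.shape}")
        # Only nonzero cells are stored, so memory grows with the number of events,
        # not with persons x concepts x time bins.
        self.data[key] = self.data.get(key, 0) + count

    def sum_over_time(self, start, end):
        """Collapse time bins in [start, end) into a (person, concept) -> count map."""
        out = defaultdict(int)
        for (person, concept, t), value in self.data.items():
            if start <= t < end:
                out[(person, concept)] += value
        return dict(out)
```

In this sketch, choosing a different aggregation window means calling `sum_over_time` with a different bin range on the same stored tensor. The database does not need to be queried again.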

Two machine-learning models are included with the library. The first is a windowed linear model. It aggregates features over several backwards-facing windows of different lengths and feeds the aggregated features into a regularized logistic regression model. This model was inspired by the work of Razavian et al. '15. Despite its simplicity, it is often competitive with state-of-the-art algorithms.

The second is SARD (Self-Attention with Reverse Distillation), a novel deep-learning algorithm. It uses self-attention so that each medical event can be contextualized by the other events in a patient's timeline. SARD also makes use of reverse distillation, a training technique we introduce that effectively initializes a deep model using a high-performing linear proxy, in this case the windowed linear model described above. For details of this method and of the SARD architecture, please see our paper Kodialam et al. AAAI '21.
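The windowed linear model can be sketched minimally as follows, assuming each patient's events arrive as (concept, days before the prediction date) pairs. Each backwards-facing window counts how often each concept occurs within that many days. The per-window counts are concatenated into one feature vector, and an L2-regularized logistic regression is then fit by plain gradient descent. The window lengths, the function names, and the hand-written optimizer are all illustrative assumptions. The library's own windows, regularization, and solver may differ.

```python
import math

WINDOWS = (30, 180, 365)  # hypothetical backwards-facing window lengths, in days


def windowed_features(events, concepts, windows=WINDOWS):
    """Count each concept inside each backwards window.

    events: iterable of (concept, days_before_prediction) pairs.
    Returns one flat vector: window 0 counts, then window 1 counts, and so on.
    """
    index = {c: i for i, c in enumerate(concepts)}
    feats = [0.0] * (len(concepts) * len(windows))
    for concept, days_before in events:
        if concept not in index:
            continue  # concepts outside the vocabulary are ignored
        for w, length in enumerate(windows):
            if 0 <= days_before < length:
                feats[w * len(concepts) + index[concept]] += 1.0
    return feats


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, l2=0.1, lr=0.1, epochs=500):
    """L2-regularized logistic regression fit by full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Mean data gradient plus the L2 penalty gradient; the bias is not penalized.
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
```

In practice, the usual choice would be to replace the hand-written optimizer with a library solver such as scikit-learn's `LogisticRegression`, whose `C` parameter is the inverse of the regularization strength.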

Documentation

For a more detailed summary of omop-learn's data collection pipeline and documentation of its functions, please see the full documentation for this repo. It also describes how to create your own cohorts, predictive tasks, and features.

Dependencies

The following libraries are necessary to run omop-learn:

numpy
sqlalchemy
pandas
torch
sklearn
matplotlib
ipywidgets
IPython.display
gensim.models
scipy.sparse
sparse

Note that sparse refers to the PyData Sparse library, which is separate from scipy.sparse. See the PyData Sparse documentation for details.
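A quick way to check the dependency list above before opening the notebooks is sketched below. It is a hypothetical helper that is not part of omop-learn. It uses `importlib.util.find_spec` to look up each import name. For dotted names such as `IPython.display`, `find_spec` imports the parent package first, so the check is not entirely side-effect free. Also note that `sklearn` is installed as the `scikit-learn` package and `sparse` as the PyData `sparse` package.

```python
import importlib.util

# Import names taken from the dependency list above.
REQUIRED = [
    "numpy", "sqlalchemy", "pandas", "torch", "sklearn", "matplotlib",
    "ipywidgets", "IPython.display", "gensim.models", "scipy.sparse", "sparse",
]


def missing_modules(names):
    """Return the names whose module spec cannot be found.

    For dotted names the parent package is imported by find_spec; if that parent
    is absent, ModuleNotFoundError is raised and the name is reported as missing.
    """
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ModuleNotFoundError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing
```

When every dependency is installed, calling `missing_modules(REQUIRED)` returns an empty list.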

Running omop-learn

We provide several example notebooks, all of which use the same example task: predicting mortality within a six-month window for patients over the age of 70.
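The example task's cohort and label could be expressed roughly as follows, assuming per-patient birth and death dates are already available. In the OMOP CDM, these come from the person and death tables. The sketch approximates "six months" as 180 days, computes age in completed years at the prediction date, and reads "over 70" strictly. All three choices are assumptions. The notebooks' exact definitions, such as any observation-period requirements, may differ. The `Patient` class and `build_examples` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

HORIZON = timedelta(days=180)  # assumed approximation of "six months"
MIN_AGE = 70  # "patients over the age of 70"


@dataclass
class Patient:
    person_id: int
    birth_date: date
    death_date: Optional[date] = None


def age_on(birth: date, when: date) -> int:
    """Age in completed years on a given date."""
    before_birthday = (when.month, when.day) < (birth.month, birth.day)
    return when.year - birth.year - int(before_birthday)


def build_examples(patients, prediction_date: date):
    """Return (person_id, label) pairs; label 1 means death within the horizon."""
    examples = []
    for p in patients:
        if p.death_date is not None and p.death_date <= prediction_date:
            continue  # already deceased at prediction time: not in the cohort
        if age_on(p.birth_date, prediction_date) <= MIN_AGE:
            continue  # "over 70" read strictly; use < MIN_AGE if the task means >= 70
        died = p.death_date is not None and p.death_date <= prediction_date + HORIZON
        examples.append((p.person_id, int(died)))
    return examples
```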

End of Life Linear Model Example.ipynb and End of Life Deep Model Example.ipynb run the windowed linear model and the deep SARD model, respectively. Note that your machine must have access to a GPU to run the deep model.
End of Life Linear Model Example (With Nontemporal Features).ipynb demonstrates how to add nontemporal features.
End of Life Linear Model Ancestors Example.ipynb demonstrates how to add feature ancestors.
End of Life Linear Model Example More Prediction Times.ipynb uses a larger dataset with predictions from any date within a time range.
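Adding feature ancestors can be pictured as rolling each concept's count up to the broader concepts that contain it. For example, codes for several specific diagnoses also contribute to a shared parent concept. The sketch below assumes an ancestor mapping has already been extracted from the OMOP concept_ancestor table into a plain dictionary. Because concept_ancestor also lists each concept as its own ancestor, self-links are skipped to avoid double counting. The function name and the toy concept names are hypothetical, and the notebook's actual procedure may differ.

```python
from collections import Counter


def add_ancestor_features(counts, ancestors):
    """Return counts extended with ancestor concepts.

    counts: concept -> number of occurrences for one patient.
    ancestors: concept -> iterable of its ancestor concepts.
    """
    expanded = Counter(counts)
    for concept, n in counts.items():
        # Each distinct ancestor is credited once per occurrence of the descendant.
        for anc in set(ancestors.get(concept, ())):
            if anc != concept:  # skip the self-link present in concept_ancestor
                expanded[anc] += n
    return dict(expanded)
```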

To run the models, first fill in config.py with the connection information for your Postgres server containing an OMOP CDM database. Then run the cells of the notebook in order. Further documentation of the exact steps taken to define a task, collect data, and run a predictive model is embedded within the notebooks.
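The connection settings might look something like the sketch below, assuming a standard PostgreSQL URL of the kind accepted by SQLAlchemy's `create_engine`. Every variable name here is hypothetical. The repository's config.py defines its own fields, and those should be filled in instead. The password is read from a hypothetical environment variable, `OMOP_DB_PASSWORD`, which keeps credentials out of version control.

```python
import os
from urllib.parse import quote_plus

# Hypothetical settings; replace with the fields the repository's config.py expects.
DB_HOST = "localhost"
DB_PORT = 5432
DB_NAME = "omop"
DB_USER = "omop_user"


def postgres_url(user, password, host, port, dbname):
    """Build a postgresql:// URL, escaping characters such as '@' or '/' in credentials."""
    return f"postgresql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{dbname}"


DB_URL = postgres_url(
    DB_USER, os.environ.get("OMOP_DB_PASSWORD", ""), DB_HOST, DB_PORT, DB_NAME
)
```

A string built this way can be passed to `sqlalchemy.create_engine(DB_URL)`.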

Contributors and Acknowledgements

omop-learn was written by Rohan Kodialam and Jake Marcus, with additional contributions by Rebecca Boiarsky, Ike Lage, and Shannon Hwang.

This package was developed as part of a collaboration with Independence Blue Cross and would not have been possible without the advice and support of Aaron Smith-McLallen, Ravi Chawla, Kyle Armstrong, Luogang Wei, and Jim Denyer.

Owner

Sontag Lab
