The easiest tool for extracting radiomics features and training ML models on them.

Last update: Aug 04, 2022

Overview

Simple pipeline for experimenting with radiomics features

Installation

git clone https://github.com/piotrekwoznicki/ClassyRadiomics.git
cd ClassyRadiomics
pip install -e .

Example - Hydronephrosis detection from CT images:

Extract radiomics features and save them to a CSV table

import pandas as pd

# table_dir is assumed to be a pathlib.Path to the folder containing paths.csv
df = pd.read_csv(table_dir / "paths.csv")
extractor = FeatureExtractor(
    df=df,
    out_path=(table_dir / "features.csv"),
    image_col="img_path",
    mask_col="seg_path",
    verbose=True,
)
extractor.extract_features()

Create a dataset from the features table

feature_df = pd.read_csv(table_dir / "features.csv")
data = Dataset(
    dataframe=feature_df,
    features=feature_cols,
    target="Hydronephrosis",
    task_name="Hydronephrosis detection"
)
data.cross_validation_split_test_from_column(
    column_name="cohort", test_value="control"
)

Select classifiers to compare

classifier_names = [
    "Gaussian Process Classifier",
    "Logistic Regression",
    "SVM",
    "Random Forest",
    "XGBoost",
]
classifiers = [MLClassifier(name) for name in classifier_names]

Create an evaluator to train and evaluate selected classifiers

evaluator = Evaluator(dataset=data, models=classifiers)
evaluator.evaluate_cross_validation()
evaluator.boxplot_by_class()
evaluator.plot_all_cross_validation()
evaluator.plot_test()

Comments

Preprocessing features fails during machine learning

Describe the bug

Running machine learning fails both in the self-hosted webapp and in example_WORC.ipynb.

Steps/Code to Reproduce

import pandas as pd
from pathlib import Path
from autorad.external.download_WORC import download_WORCDatabase

# Set where we will save our data and results
base_dir = Path.cwd() / "autorad_tutorial"
data_dir = base_dir / "data"
result_dir = base_dir / "results"
data_dir.mkdir(exist_ok=True, parents=True)
result_dir.mkdir(exist_ok=True, parents=True)

%load_ext autoreload
%autoreload 2

# download data (it may take a few minutes)
download_WORCDatabase(
    dataset="Desmoid",
    data_folder=data_dir,
    n_subjects=100,
)

from autorad.utils.preprocessing import get_paths_with_separate_folder_per_case

# create a table with all the paths
paths_df = get_paths_with_separate_folder_per_case(data_dir, relative=True)
paths_df.sample(5)


from autorad.data.dataset import ImageDataset
from autorad.feature_extraction.extractor import FeatureExtractor
import logging

logging.getLogger().setLevel(logging.CRITICAL)

image_dataset = ImageDataset(
    paths_df,
    ID_colname="ID",
    root_dir=data_dir,
)

# Let's take a look at the data, plotting random 10 cases
image_dataset.plot_examples(n=10, window=None)

extractor = FeatureExtractor(image_dataset, extraction_params="MR_default.yaml")
feature_df = extractor.run()

feature_df.head()

label_df = pd.read_csv(data_dir / "labels.csv")
label_df.sample(5)

from autorad.data.dataset import FeatureDataset

merged_feature_df = feature_df.merge(label_df, left_on="ID",
    right_on="patient_ID", how="left")
feature_dataset = FeatureDataset(
    merged_feature_df,
    target="diagnosis",
    ID_colname="ID"
)

splits_path = result_dir / "splits.json"
feature_dataset.split(method="train_val_test", save_path=splits_path)

from autorad.models.classifier import MLClassifier
from autorad.training.trainer import Trainer

models = MLClassifier.initialize_default_sklearn_models()
print(models)

trainer = Trainer(
    dataset=feature_dataset,
    models=models,
    result_dir=result_dir,
    experiment_name="Fibromatosis_vs_sarcoma_classification",
)
trainer.run_auto_preprocessing(
        selection_methods=["boruta"],
        oversampling=False,
        )

Expected Results

Initialising the trainer and running preprocessing on the features

Actual Results

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [15], in <cell line: 7>()
      1 trainer = Trainer(
      2     dataset=feature_dataset,
      3     models=models,
      4     result_dir=result_dir,
      5     experiment_name="Fibromatosis_vs_sarcoma_classification",
      6 )
----> 7 trainer.run_auto_preprocessing(
      8         selection_methods=["boruta"],
      9         oversampling=False,
     10         )

File ~/AutoRadiomics/autorad/training/trainer.py:78, in Trainer.run_auto_preprocessing(self, oversampling, selection_methods)
     70 preprocessor = Preprocessor(
     71     normalize=True,
     72     feature_selection_method=selection_method,
     73     oversampling_method=oversampling_method,
     74 )
     75 try:
     76     preprocessed[selection_method][
     77         oversampling_method
---> 78     ] = preprocessor.fit_transform(self.dataset.data)
     79 except AssertionError:
     80     log.error(
     81         f"Preprocessing with {selection_method} and {oversampling_method} failed."
     82     )

File ~/AutoRadiomics/autorad/preprocessing/preprocessor.py:66, in Preprocessor.fit_transform(self, data)
     64 result_y = {}
     65 all_features = X.train.columns.tolist()
---> 66 X_train_trans, y_train_trans = self.pipeline.fit_transform(
     67     X.train, y.train
     68 )
     69 self.selected_features = self.pipeline["select"].selected_features(
     70     column_names=all_features
     71 )
     72 result_X["train"] = pd.DataFrame(
     73     X_train_trans, columns=self.selected_features
     74 )

File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/sklearn/pipeline.py:434, in Pipeline.fit_transform(self, X, y, **fit_params)
    432 fit_params_last_step = fit_params_steps[self.steps[-1][0]]
    433 if hasattr(last_step, "fit_transform"):
--> 434     return last_step.fit_transform(Xt, y, **fit_params_last_step)
    435 else:
    436     return last_step.fit(Xt, y, **fit_params_last_step).transform(Xt)

File ~/AutoRadiomics/autorad/feature_selection/selector.py:47, in CoreSelector.fit_transform(self, X, y)
     44 def fit_transform(
     45     self, X: np.ndarray, y: np.ndarray
     46 ) -> tuple[np.ndarray, np.ndarray]:
---> 47     self.fit(X, y)
     48     return X[:, self.selected_columns], y

File ~/AutoRadiomics/autorad/feature_selection/selector.py:124, in BorutaSelector.fit(self, X, y, verbose)
    122 with warnings.catch_warnings():
    123     warnings.simplefilter("ignore")
--> 124     model.fit(X, y)
    125 self.selected_columns = np.where(model.support_)[0].tolist()
    126 if not self.selected_columns:

File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/boruta/boruta_py.py:201, in BorutaPy.fit(self, X, y)
    188 def fit(self, X, y):
    189     """
    190     Fits the Boruta feature selection with the provided estimator.
    191 
   (...)
    198         The target values.
    199     """
--> 201     return self._fit(X, y)

File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/boruta/boruta_py.py:251, in BorutaPy._fit(self, X, y)
    249 def _fit(self, X, y):
    250     # check input params
--> 251     self._check_params(X, y)
    252     self.random_state = check_random_state(self.random_state)
    253     # setup variables for Boruta

File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/boruta/boruta_py.py:517, in BorutaPy._check_params(self, X, y)
    513 """
    514 Check hyperparameters as well as X and y before proceeding with fit.
    515 """
    516 # check X and y are consistent len, X is Array and y is column
--> 517 X, y = check_X_y(X, y)
    518 if self.perc <= 0 or self.perc > 100:
    519     raise ValueError('The percentile should be between 0 and 100.')

File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/sklearn/utils/validation.py:964, in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
    961 if y is None:
    962     raise ValueError("y cannot be None")
--> 964 X = check_array(
    965     X,
    966     accept_sparse=accept_sparse,
    967     accept_large_sparse=accept_large_sparse,
    968     dtype=dtype,
    969     order=order,
    970     copy=copy,
    971     force_all_finite=force_all_finite,
    972     ensure_2d=ensure_2d,
    973     allow_nd=allow_nd,
    974     ensure_min_samples=ensure_min_samples,
    975     ensure_min_features=ensure_min_features,
    976     estimator=estimator,
    977 )
    979 y = _check_y(y, multi_output=multi_output, y_numeric=y_numeric)
    981 check_consistent_length(X, y)

File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/sklearn/utils/validation.py:746, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    744         array = array.astype(dtype, casting="unsafe", copy=False)
    745     else:
--> 746         array = np.asarray(array, order=order, dtype=dtype)
    747 except ComplexWarning as complex_warning:
    748     raise ValueError(
    749         "Complex data not supported\n{}\n".format(array)
    750     ) from complex_warning

ValueError: could not broadcast input array from shape (60,1015) into shape (60,)

opened by wagon-master 3

BUG: Time and memory inefficient concating in pandas on every case.

In the feature extraction, we concatenate a pd.DataFrame for every case. AFAIK, constructing a pd.DataFrame this way leads to a new memory allocation (and copy) every time, which is highly memory inefficient. Especially when parallelized across many CPUs, and combined with the already memory-intensive forking in joblib, this can lead to OOM events (and is slow, of course). Wouldn't it be more convenient to return only the feature set that is currently being processed?
https://github.com/pwoznicki/AutoRadiomics/blob/e475893c566de057d742f32da5cb9ece23a44eb0/autorad/feature_extraction/extractor.py#L109-L115
These are subsequently collected in results anyway:
https://github.com/pwoznicki/AutoRadiomics/blob/e475893c566de057d742f32da5cb9ece23a44eb0/autorad/feature_extraction/extractor.py#L135-L144
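
The pattern proposed in this issue could look roughly like the sketch below. It is not the AutoRadiomics implementation: extract_case and extract_all are hypothetical names, and the feature values are placeholders. The point is only that each case returns a plain dict and the DataFrame is built once at the end.

```python
import pandas as pd


def extract_case(case_id: str) -> dict:
    """Hypothetical stand-in for per-case feature extraction.

    Returns only the features of the current case as a plain dict.
    """
    return {"ID": case_id, "feature_a": len(case_id), "feature_b": 0.5}


def extract_all(case_ids: list[str]) -> pd.DataFrame:
    # Collect lightweight dicts; no DataFrame is created per case.
    rows = [extract_case(case_id) for case_id in case_ids]
    # Build the table once, after all cases have been processed.
    return pd.DataFrame(rows)


feature_df = extract_all(["case_001", "case_002", "case_003"])
```

With joblib, the list comprehension would presumably be replaced by a Parallel call that returns the same list of dicts, so each worker only sends back the features of its own case.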

opened by laqua-stack 2

Feature/add inference mlflow

Major changes:

fixed training with autologging of training parameters, preprocessor and classifier in MLflow

webapp: added a Predict subpage for inference on a single case, returning the class probability and a SHAP explanation

webapp: moved all steps into subpages

webapp: added Getting started on the landing page

Fixes:

webapp: fixed extraction params discarding Feature Names selected from Feature Classes

opened by pwoznicki 1

example_WORC.ipynb not being up to date with the repository

Describe the bug

In example_WORC.ipynb, some function calls no longer work because the code in the repository changed and the notebook was not updated to reflect those changes.

Steps/Code to Reproduce

import pandas as pd
from pathlib import Path
from autorad.external.download_WORC import download_WORCDatabase

# Set where we will save our data and results
base_dir = Path.cwd() / "autorad_tutorial"
data_dir = base_dir / "data"
result_dir = base_dir / "results"
data_dir.mkdir(exist_ok=True, parents=True)
result_dir.mkdir(exist_ok=True, parents=True)

%load_ext autoreload
%autoreload 2

# download data (it may take a few minutes)
download_WORCDatabase(
    dataset="Desmoid",
    data_folder=data_dir,
    n_subjects=100,
)



from autorad.data.utils import get_paths_with_separate_folder_per_case  # 1

# create a table with all the paths
paths_df = get_paths_with_separate_folder_per_case(data_dir, relative=True)
paths_df.sample(5)


from autorad.data.dataset import ImageDataset
from autorad.feature_extraction.extractor import FeatureExtractor
import logging

logging.getLogger().setLevel(logging.CRITICAL)

image_dataset = ImageDataset(
    paths_df,
    ID_colname="ID",
    root_dir=data_dir,
)

# Let's take a look at the data, plotting random 10 cases
image_dataset.plot_examples(n=10, window=None)

extractor = FeatureExtractor(image_dataset, extraction_params="default_MR.yaml") # 2
feature_df = extractor.run()

Expected Results

1: Importing the function get_paths_with_separate_folder_per_case

2: Using default_MR.yaml as value for extraction_params

Actual Results

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Input In [7], in <cell line: 1>()
----> 1 from autorad.data.utils import get_paths_with_separate_folder_per_case
      3 # create a table with all the paths
      4 paths_df = get_paths_with_separate_folder_per_case(data_dir, relative=True)

ModuleNotFoundError: No module named 'autorad.data.utils'

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [18], in <cell line: 1>()
----> 1 extractor = FeatureExtractor(image_dataset, extraction_params="default_MR.yaml")
      2 feature_df = extractor.run()

File ~/AutoRadiomics/autorad/feature_extraction/extractor.py:41, in FeatureExtractor.__init__(self, dataset, feature_set, extraction_params, n_jobs)
     39 self.dataset = dataset
     40 self.feature_set = feature_set
---> 41 self.extraction_params = self._get_extraction_param_path(
     42     extraction_params
     43 )
     44 log.info(f"Using extraction params from {self.extraction_params}")
     45 self.n_jobs = set_n_jobs(n_jobs)

File ~/AutoRadiomics/autorad/feature_extraction/extractor.py:55, in FeatureExtractor._get_extraction_param_path(self, extraction_params)
     53     result = default_extraction_param_dir / extraction_params
     54 else:
---> 55     raise ValueError(
     56         f"Extraction parameter file {extraction_params} not found."
     57     )
     58 return result

ValueError: Extraction parameter file default_MR.yaml not found.

Fix

1: Change "from autorad.data.utils" to "from autorad.utils.preprocessing".

2: Change extractor = FeatureExtractor(image_dataset, extraction_params="default_MR.yaml") to extractor = FeatureExtractor(image_dataset, extraction_params="MR_default.yaml").

opened by wagon-master 1

Bugfix/refactor

New features:

log feature dataset and splits in MLflow

update docs & add getting-started

Fixes:

fix evaluation in the web app

fix docs build on Read the Docs

opened by pwoznicki 0

Support various readers (Nibabel, ITK)

Currently we use Nibabel for loading images. It works only for NIfTI images, but a user may want to load a DICOM image without converting it to NIfTI.

Consider using the MONAI LoadImage() function, which provides a common interface for loading both NIfTI and DICOM images.
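
To illustrate what a common interface has to decide, here is a minimal sketch of picking a reader from the file name. It is independent of MONAI and of the AutoRadiomics code; guess_image_format is a hypothetical helper, and relying on suffixes alone is an assumption (DICOM files, for instance, do not always carry a .dcm extension).

```python
from pathlib import Path


def guess_image_format(path: str) -> str:
    """Hypothetical helper: guess the image format from the file name only."""
    name = Path(path).name.lower()
    # NIfTI images are stored as .nii or gzip-compressed .nii.gz
    if name.endswith((".nii", ".nii.gz")):
        return "nifti"
    # Assumption: DICOM slices use the conventional .dcm suffix
    if name.endswith(".dcm"):
        return "dicom"
    raise ValueError(f"Unsupported image format: {path}")


formats = [guess_image_format(p) for p in ("scan.nii.gz", "slice_001.DCM")]
```

A loader built on this would then call the matching reader for each format instead of assuming NIfTI everywhere.
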
enhancement

opened by pwoznicki 0

Releases(v0.2.2)

v0.2.2(Jul 30, 2022)

Includes fixes for the web application, bug fixes in the spatial util functions, and a function for voxel-based extraction

Owner

Piotr Woźnicki

Recently graduated medical doctor, working on medical image analysis.
