HistoKT: Cross Knowledge Transfer in Computational Pathology

Related tags

Deep LearningHistoKT
Overview

HistoKT: Cross Knowledge Transfer in Computational Pathology

Exciting News! HistoKT has been accepted to ICASSP 2022.

HistoKT: Cross Knowledge Transfer in Computational Pathology,
Ryan Zhang, Jiadai Zhu, Stephen Yang, Mahdi S. Hosseini, Angelo Genovese, Lina Chen, Corwyn Rowsell, Savvas Damaskinos, Sonal Varma, Konstantinos N. Plataniotis
Accepted in 2022 IEEE International Conference on Acourstics, Speech, and Signal Processing (ICASSP2022)

Overview

In computational pathology, the lack of well-annotated datasets obstructs the application of deep learning techniques. Since pathologist time is expensive, dataset curation is intrinsically difficult. Thus, many CPath workflows involve transferring learned knowledge between various image domains through transfer learning. Currently, most transfer learning research follows a model-centric approach, tuning network parameters to improve transfer results over few datasets. In this paper, we take a data-centric approach to the transfer learning problem and examine the existence of generalizable knowledge between histopathological datasets. First, we create a standardization workflow for aggregating existing histopathological data. We then measure inter-domain knowledge by training ResNet18 models across multiple histopathological datasets, and cross-transferring between them to determine the quantity and quality of innate shared knowledge. Additionally, we use weight distillation to share knowledge between models without additional training. We find that hard to learn, multi-class datasets benefit most from pretraining, and a two stage learning framework incorporating a large source domain such as ImageNet allows for better utilization of smaller datasets. Furthermore, we find that weight distillation enables models trained on purely histopathological features to outperform models using external natural image data.

Results

We report our transfer learning using ResNet18 results accross various datasets, with two initialization methods (random and ImageNet initialization). Each item in the matrix represents the Top-1 test accuracy of a ResNet18 model trained on the source dataset and deep-tuned on the target dataset. Items are highlighted in a colour gradient from deep red to deep green, where green represents significant accuracy improvement after tuning, and red represents accuracy decline after tuning.

No Pretraining

ImageNet Initialization

Table of Contents

Getting Started

Dependencies

  • Requirements are specified in requirements.txt
argon2-cffi==20.1.0
async-generator==1.10
attrs==21.2.0
backcall==0.2.0
bleach==3.3.0
cffi==1.14.5
colorama==0.4.4
cycler==0.10.0
decorator==4.4.2
defusedxml==0.7.1
entrypoints==0.3
et-xmlfile==1.1.0
h5py==3.2.1
imageio==2.9.0
ipykernel==5.5.4
ipython==7.23.1
ipython-genutils==0.2.0
ipywidgets==7.6.3
jedi==0.18.0
Jinja2==3.0.0
joblib==1.0.1
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==6.1.12
jupyter-console==6.4.0
jupyter-core==4.7.1
jupyterlab-pygments==0.1.2
jupyterlab-widgets==1.0.0
kiwisolver==1.3.1
MarkupSafe==2.0.0
matplotlib==3.4.2
matplotlib-inline==0.1.2
mistune==0.8.4
nbclient==0.5.3
nbconvert==6.0.7
nbformat==5.1.3
nest-asyncio==1.5.1
networkx==2.5.1
notebook==6.3.0
numpy==1.20.3
openpyxl==3.0.7
packaging==20.9
pandas==1.2.4
pandocfilters==1.4.3
parso==0.8.2
pickleshare==0.7.5
Pillow==8.2.0
prometheus-client==0.10.1
prompt-toolkit==3.0.18
pyaml==20.4.0
pycparser==2.20
Pygments==2.9.0
pyparsing==2.4.7
pyrsistent==0.17.3
python-dateutil==2.8.1
pytz==2021.1
PyWavelets==1.1.1
pywin32==300
pywinpty==0.5.7
PyYAML==5.4.1
pyzmq==22.0.3
qtconsole==5.1.0
QtPy==1.9.0
scikit-image==0.18.1
scikit-learn==0.24.2
scipy==1.6.3
Send2Trash==1.5.0
six==1.16.0
sklearn==0.0
terminado==0.9.5
testpath==0.4.4
threadpoolctl==2.1.0
tifffile==2021.4.8
torch==1.8.1+cu102
torchaudio==0.8.1
torchvision==0.9.1+cu102
tornado==6.1
traitlets==5.0.5
typing-extensions==3.10.0.0
wcwidth==0.2.5
webencodings==0.5.1
widgetsnbextension==3.5.1

Running the Code

This codebase was created in collaboration with the RMSGD repository. As such, much of the training pipeline is shared.

Downloading datasets

All available datasets can be found on their respective websites. Some datasets, such as ADP, are available by request.

A list of all datasets used in this paper can be found below:

Preprocessing and Training

To prepare datasets for training, please use the functions found in dataset_processing\standardize_datasets.py after downloading all the datasets and placing them all in one folder.

cd HistoKT/dataset_processing
python standardize_datasets.py

A standardized version of each dataset will be created in the dataset folder.

To run the code for training, use the src/adas/train.py file:

cd HistoKT
python src/adas/train.py --config CONFIG --data DATA_FOLDER

Options for Training

--config CONFIG       Set configuration file path: Default = 'configAdas.yaml'
--data DATA           Set data directory path: Default = '.adas-data'
--output OUTPUT       Set output directory path: Default = '.adas-output'
--checkpoint CHECKPOINT
                    Set checkpoint directory path: Default = '.adas-checkpoint'
--resume RESUME       Set checkpoint resume path: Default = None
--pretrained_model PRETRAINED_MODEL
                    Set checkpoint pretrained model path: Default = None
--freeze_encoder FREEZE_ENCODER
                    Set if to freeze encoder for post training: Default = True
--root ROOT           Set root path of project that parents all others: Default = '.'
--save-freq SAVE_FREQ
                    Checkpoint epoch save frequency: Default = 25
--cpu                 Flag: CPU bound training: Default = False
--gpu GPU             GPU id to use: Default = 0
--multiprocessing-distributed
                    Use multi-processing distributed training to launch N processes per node, which has N GPUs. This is the fastest way to use PyTorch for either   
                    single node or multi node data parallel training: Default = False
--dist-url DIST_URL   url used to set up distributed training:Default = 'tcp://127.0.0.1:23456'
--dist-backend DIST_BACKEND
                    distributed backend: Default = 'nccl'
--world-size WORLD_SIZE
                    Number of nodes for distributed training: Default = -1
--rank RANK           Node rank for distributed training: Default = -1
--color_aug COLOR_AUG
                    override config color augmentation, can also choose "no_aug"
--norm_vals NORM_VALS
                    override normalization values, use dataset string. e.g. "BACH_transformed"

Training Output

All training output will be saved to the OUTPUT_PATH location. After a full experiment, results will be recorded in the following format:

  • OUTPUT
    • Timestamped xlsx sheet with the record of train and validation (notated as test) acc, loss, and rank metrics for each layer in the network (refer to AdaS)
  • CHECKPOINT
    • checkpoint dictionaries with a snapshot of the model's parameters at a given epoch.

Code Organization

Configs

We provide sample configuration files for ResNet18 over all used datasets in configs\NewPretrainingConfigs

These configs were used for training the model on each dataset from random initialization.

All available options can be found in the config files.

Visualization

We provide sample code to plot training curves in Plots

We provide sample code on using the statistical method t-SNE to visualize the high-dimensional features in T-sne.

We provide sample code on using the visual explanation algorithm Grad-CAM heat-maps in gradCAM.

Version History

  • 0.1
    • Initial Release
Owner
Mahdi S. Hosseini
Assistant Professor in ECE Department at University of New Brunswick. My research interests cover broad topics in Machine Learning and Computer Vision problems
Mahdi S. Hosseini
This is a collection of our NAS and Vision Transformer work.

AutoML - Neural Architecture Search This is a collection of our AutoML-NAS work iRPE (NEW): Rethinking and Improving Relative Position Encoding for Vi

Microsoft 832 Jan 08, 2023
RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching

RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching This repository contains the source code for our paper: RAFT-Stereo: Multilevel

Princeton Vision & Learning Lab 328 Jan 09, 2023
clDice - a Novel Topology-Preserving Loss Function for Tubular Structure Segmentation

README clDice - a Novel Topology-Preserving Loss Function for Tubular Structure Segmentation CVPR 2021 Authors: Suprosanna Shit and Johannes C. Paetzo

110 Dec 29, 2022
Evaluation framework for testing segmentation networks in PyTorch

Evaluation framework for testing segmentation networks in PyTorch. What segmentation network to choose for next Kaggle competition? This benchmark knows the answer!

Eugene Khvedchenya 37 Apr 27, 2022
Let's Git - Versionsverwaltung & Open Source Hausaufgabe

Let's Git - Versionsverwaltung & Open Source Hausaufgabe Herzlich Willkommen zu dieser Hausaufgabe für unseren MOOC: Let's Git! Wir hoffen, dass Du vi

1 Dec 13, 2021
PyTorch implementation for paper Neural Marching Cubes.

NMC PyTorch implementation for paper Neural Marching Cubes, Zhiqin Chen, Hao Zhang. Paper | Supplementary Material (to be updated) Citation If you fin

Zhiqin Chen 109 Dec 27, 2022
Traditional deepdream with VQGAN+CLIP and optical flow. Ready to use in Google Colab

VQGAN-CLIP-Video cat.mp4 policeman.mp4 schoolboy.mp4 forsenBOG.mp4

23 Oct 26, 2022
A proof of concept ai-powered Recaptcha v2 solver

Recaptcha Fullauto I've decided to open source my old Recaptcha v2 solver. My latest version will be opened sourced this summer. I am hoping this proj

Nate 60 Dec 20, 2022
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption, CVPR 2021 (Oral)

TAP: Text-Aware Pre-training TAP: Text-Aware Pre-training for Text-VQA and Text-Caption by Zhengyuan Yang, Yijuan Lu, Jianfeng Wang, Xi Yin, Dinei Flo

Microsoft 61 Nov 14, 2022
PyTorch implementation of Histogram Layers from DeepHist: Differentiable Joint and Color Histogram Layers for Image-to-Image Translation

deep-hist PyTorch implementation of Histogram Layers from DeepHist: Differentiable Joint and Color Histogram Layers for Image-to-Image Translation PyT

Winfried Lötzsch 10 Dec 06, 2022
Multi-Task Learning as a Bargaining Game

Nash-MTL Official implementation of "Multi-Task Learning as a Bargaining Game". Setup environment conda create -n nashmtl python=3.9.7 conda activate

Aviv Navon 87 Dec 26, 2022
Speech recognition tool to convert audio to text transcripts, for Linux and Raspberry Pi.

Spchcat Speech recognition tool to convert audio to text transcripts, for Linux and Raspberry Pi. Description spchcat is a command-line tool that read

Pete Warden 279 Jan 03, 2023
Real-time multi-object tracker using YOLO v5 and deep sort

This repository contains a two-stage-tracker. The detections generated by YOLOv5, a family of object detection architectures and models pretrained on the COCO dataset, are passed to a Deep Sort algor

Mike 3.6k Jan 05, 2023
Weakly Supervised Segmentation with Tensorflow. Implements instance segmentation as described in Simple Does It: Weakly Supervised Instance and Semantic Segmentation, by Khoreva et al. (CVPR 2017).

Weakly Supervised Segmentation with TensorFlow This repo contains a TensorFlow implementation of weakly supervised instance segmentation as described

Phil Ferriere 220 Dec 13, 2022
Open-sourcing the Slates Dataset for recommender systems research

FINN.no Recommender Systems Slate Dataset This repository accompany the paper "Dynamic Slate Recommendation with Gated Recurrent Units and Thompson Sa

FINN.no 48 Nov 28, 2022
MT3: Multi-Task Multitrack Music Transcription

MT3: Multi-Task Multitrack Music Transcription MT3 is a multi-instrument automatic music transcription model that uses the T5X framework. This is not

Magenta 867 Dec 29, 2022
(CVPR 2022) Pytorch implementation of "Self-supervised transformers for unsupervised object discovery using normalized cut"

(CVPR 2022) TokenCut Pytorch implementation of Tokencut: Self-supervised Transformers for Unsupervised Object Discovery using Normalized Cut Yangtao W

YANGTAO WANG 200 Jan 02, 2023
A convolutional recurrent neural network for classifying A/B phases in EEG signals recorded for sleep analysis.

CAP-Classification-CRNN A deep learning model based on Inception modules paired with gated recurrent units (GRU) for the classification of CAP phases

Apurva R. Umredkar 2 Nov 25, 2022
Anomaly detection analysis and labeling tool, specifically for multiple time series (one time series per category)

taganomaly Anomaly detection labeling tool, specifically for multiple time series (one time series per category). Taganomaly is a tool for creating la

Microsoft 272 Dec 17, 2022
Clairvoyance: a Unified, End-to-End AutoML Pipeline for Medical Time Series

Clairvoyance: A Pipeline Toolkit for Medical Time Series Authors: van der Schaar Lab This repository contains implementations of Clairvoyance: A Pipel

van_der_Schaar \LAB 89 Dec 07, 2022