An unopinionated replacement for PyTorch's Dataset and ImageFolder, that handles Tar archives

Last update: Dec 20, 2022

Simple Tar Dataset

An unopinionated replacement for PyTorch's Dataset and ImageFolder classes, for datasets stored as uncompressed Tar archives.

Just Tar it: No particular structure is enforced in the Tar archive. This means that you can archive your files as they are, without modification, and handle any data or metadata in your own dataset code.

Why? Storing a dataset as millions of small files makes access inefficient, and can create other difficulties in large-scale scenarios (e.g. running out of inodes, or inefficient operations on distributed filesystems, which are optimised for fewer, larger files). Tar is a simple archive format for which numerous utilities exist. Because a Tar archive is uncompressed, it allows fast random access to individual files inside a single archive file.
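The following minimal sketch illustrates why a single uncompressed archive can still be read randomly. It uses Python's standard tarfile module and is not TarDataset's own code. The member headers are scanned once to build an index from name to member, and after that any file can be read directly by name. The helper names build_index and read_member are hypothetical.

```python
import io
import tarfile


def build_index(tar):
    """Scan the archive headers once; map each regular file's name to its TarInfo.

    Each TarInfo records the offset of the member's data inside the archive,
    so later reads can seek straight to it instead of rescanning the file.
    """
    return {m.name: m for m in tar.getmembers() if m.isfile()}


def read_member(tar, index, name):
    """Read one member's bytes by name, using the prebuilt index."""
    return tar.extractfile(index[name]).read()


# Pack 1,000 small "files" into one uncompressed, in-memory archive.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    for i in range(1000):
        data = f"sample {i}".encode()
        info = tarfile.TarInfo(name=f"data/{i:04d}.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

# Reopen it, index it once, then read an arbitrary member directly.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    index = build_index(tar)
    n_files = len(index)
    sample = read_member(tar, index, "data/0742.txt")

print(n_files, sample)  # 1000 b'sample 742'
```

The index is built only once, so each later read costs one seek plus one read, however many files the archive holds.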

Example

The default TarDataset simply loads all PNG, JPG and JPEG images from a Tar file, and lets you iterate over them.

Images are returned as tensors. Here, the color of each image's top-left pixel is printed as RGB values.

from tardataset import TarDataset

dataset = TarDataset('example-data/colors.tar')

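for (idx, image) in enumerate(dataset):
  # image is a channels-first tensor, so image[:, 0, 0] holds the channel values of the top-left pixel
  print(f"Image #{idx}, color: {image[:,0,0]}")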
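The sketch below shows one way to select image files from an archive by extension, using the standard tarfile module. It is an illustration, not TarDataset's implementation. The case-insensitive extension check, the sorting and the helper names list_images and add_bytes are all assumptions of this sketch. Only member names are inspected, so no image decoding library is needed here.

```python
import io
import tarfile

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")


def list_images(tar):
    """Names of regular files with an image extension (case-insensitive), sorted."""
    return sorted(
        m.name
        for m in tar.getmembers()
        if m.isfile() and m.name.lower().endswith(IMAGE_EXTENSIONS)
    )


def add_bytes(tar, name, data):
    """Add an in-memory file to an open archive."""
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


# Contents are placeholders: only the member names matter for filtering.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    add_bytes(tar, "a.png", b"...")
    add_bytes(tar, "b.JPG", b"...")
    add_bytes(tar, "notes.txt", b"...")
    add_bytes(tar, "sub/c.jpeg", b"...")

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    images = list_images(tar)

print(images)  # ['a.png', 'b.JPG', 'sub/c.jpeg']
```

Sorting the names gives a deterministic order, so a given index always refers to the same file.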

Usage

For image classification datasets, where images are usually stored in one folder per class (e.g. ImageNet), TarImageFolder is a drop-in replacement for torchvision.datasets.ImageFolder.
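The sketch below gives a rough idea of how ImageFolder-style labels can be recovered from a Tar archive. It treats the first folder under a chosen root as the class name and numbers the classes in sorted order, as torchvision's ImageFolder does. It is not TarImageFolder's code: the function folder_samples and its root parameter are hypothetical. It also accepts only the three extensions mentioned above, whereas ImageFolder itself accepts more.

```python
import io
import tarfile

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")


def folder_samples(tar, root=""):
    """Build ImageFolder-style (name, class_index) samples from a Tar archive.

    The class of an image is the first directory below `root`, mirroring how
    ImageFolder treats root/<class>/.../<image>.
    """
    prefix = root.rstrip("/") + "/" if root else ""
    labelled = []
    for m in tar.getmembers():
        if not (m.isfile() and m.name.lower().endswith(IMAGE_EXTENSIONS)):
            continue
        if not m.name.startswith(prefix):
            continue
        parts = m.name[len(prefix):].split("/")
        if len(parts) < 2:  # file directly under root: no class folder
            continue
        labelled.append((m.name, parts[0]))
    classes = sorted({cls for _, cls in labelled})
    class_to_idx = {cls: i for i, cls in enumerate(classes)}
    samples = sorted((name, class_to_idx[cls]) for name, cls in labelled)
    return classes, class_to_idx, samples


def add_bytes(tar, name, data):
    """Add an in-memory file to an open archive."""
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    for name in ["train/dog/a.png", "train/cat/b.jpg", "train/cat/c.jpg",
                 "train/labels.txt", "train/top.png"]:
        add_bytes(tar, name, b"...")

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    classes, class_to_idx, samples = folder_samples(tar, root="train")

print(classes)  # ['cat', 'dog']
print(samples)  # [('train/cat/b.jpg', 0), ('train/cat/c.jpg', 0), ('train/dog/a.png', 1)]
```

In this example, train/labels.txt is skipped because it is not an image. train/top.png is skipped because it sits directly under the root, outside any class folder.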

For more complex scenarios (say, you store some data in one or more JSON files, or you have folders with video frames in specific formats), you can subclass TarDataset and read the data in any format you like.
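TarDataset's subclassing hooks are not described here, so this sketch reads the archive directly with the standard tarfile module. It assumes a hypothetical layout: an annotations.json file at the archive root maps each video folder to a label, and frames are stored as <video>/<frame>.jpg. The class VideoClipDataset is also hypothetical. It exposes __len__ and __getitem__, the same interface a map-style PyTorch Dataset provides. In practice, you would put this logic in a TarDataset subclass instead.

```python
import io
import json
import tarfile


class VideoClipDataset:
    """Hypothetical map-style dataset over a Tar archive with an assumed layout:

        annotations.json      {"<video>": <label>, ...}
        <video>/<frame>.jpg   frames of each video
    """

    def __init__(self, fileobj):
        self.tar = tarfile.open(fileobj=fileobj, mode="r")
        # Index regular files once so reads can seek directly to them.
        self.members = {m.name: m for m in self.tar.getmembers() if m.isfile()}
        self.labels = json.loads(self._read("annotations.json"))
        self.videos = sorted(self.labels)
        # Frames per video, sorted by name (assumes zero-padded frame numbers,
        # so lexical order matches temporal order).
        self.frames = {
            v: sorted(n for n in self.members
                      if n.startswith(v + "/") and n.endswith(".jpg"))
            for v in self.videos
        }

    def _read(self, name):
        return self.tar.extractfile(self.members[name]).read()

    def __len__(self):
        return len(self.videos)

    def __getitem__(self, index):
        video = self.videos[index]
        frames = [self._read(n) for n in self.frames[video]]  # raw encoded bytes
        return frames, self.labels[video]


def add_bytes(tar, name, data):
    """Add an in-memory file to an open archive."""
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    add_bytes(tar, "annotations.json", json.dumps({"clip_a": 0, "clip_b": 1}).encode())
    add_bytes(tar, "clip_a/000.jpg", b"a0")
    add_bytes(tar, "clip_a/001.jpg", b"a1")
    add_bytes(tar, "clip_b/000.jpg", b"b0")
buffer.seek(0)

dataset = VideoClipDataset(buffer)
print(len(dataset))  # 2
print(dataset[0])    # ([b'a0', b'a1'], 0)
```

The frames come back as raw bytes. A real dataset would decode them (e.g. with an image library) and apply transforms.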

Jupyter notebook tutorial

There is a more comprehensive set of examples as a Jupyter notebook in example.ipynb.

Full "ImageNet in a Tar file" example

A large-scale data loading example is given in imagenet-example.py. Only the section of code responsible for data loading was modified from the official PyTorch ImageNet example.

First, ensure that the data is in the expected format for the original example to work, in a folder named ILSVRC12. Then, create a Tar archive from it (tar cf ILSVRC12.tar ILSVRC12 on Linux or a utility like 7-Zip on Windows). Finally, run our modified imagenet-example.py, passing it the path to the Tar archive instead.
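If the tar command or 7-Zip is not convenient, the archive can also be created from Python. This minimal sketch uses the standard tarfile module and assumes the ILSVRC12 folder layout described above. The helper make_tar is hypothetical, and the demo uses a tiny stand-in folder rather than real ImageNet data. Mode "w" writes an uncompressed archive, which keeps random access cheap.

```python
import tarfile
import tempfile
from pathlib import Path


def make_tar(src_dir, tar_path):
    """Archive src_dir (recursively) into an uncompressed Tar file.

    Members are stored under the folder's own name (e.g. ILSVRC12/train/...),
    the same top-level prefix that `tar cf ILSVRC12.tar ILSVRC12` produces.
    """
    src = Path(src_dir)
    with tarfile.open(str(tar_path), mode="w") as tar:  # "w" means no compression
        tar.add(str(src), arcname=src.name)


# Demo on a tiny stand-in folder; folder and file names are illustrative.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "ILSVRC12"
    class_dir = root / "train" / "n01440764"
    class_dir.mkdir(parents=True)
    (class_dir / "img_0001.JPEG").write_bytes(b"fake image bytes")

    archive = Path(tmp) / "ILSVRC12.tar"
    make_tar(root, archive)
    with tarfile.open(str(archive)) as tar:
        names = tar.getnames()

print(names)
```

The archive is written next to the source folder rather than inside it, so the archive does not try to include itself.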

Author

João Henriques, Visual Geometry Group (VGG), University of Oxford