A set of examples around hub for creating and processing datasets

Last update: Dec 14, 2022

Examples for Hub - Dataset Format for AI

A repository showcasing examples of using Hub

Uploading Dataset Places365

Colab Tutorials

Notebook	Link
Getting Started with Hub
Creating Object Detection Datasets
Creating Complex Detection Datasets
Data Processing Using Parallel Computing
Training an Image Classification Model in PyTorch

Getting Started with Hub 🚀

Installation

Hub is written entirely in Python and can be installed quickly using pip.

pip3 install hub

Creating Datasets

A Hub dataset can be created in various locations (storage providers). The path format for each provider is shown below:

Storage provider	Example path
Hub cloud	hub://user_name/dataset_name
AWS S3	s3://bucket_name/dataset_name
GCP	gcp://bucket_name/dataset_name
Local storage	path to local directory
In-memory	mem://dataset_name
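
The path prefix is what tells one storage provider from another. As a rough illustration only, the sketch below maps a path to the provider names used in the table. `storage_provider` and `PREFIXES` are hypothetical names written for this example and are not part of the Hub API.

```python
# Hypothetical helper (not part of Hub): map a dataset path to the
# storage provider it refers to, using the prefixes from the table above.
PREFIXES = {
    "hub://": "Hub cloud",
    "s3://": "AWS S3",
    "gcp://": "GCP",
    "mem://": "In-memory",
}


def storage_provider(path: str) -> str:
    """Return the provider name for a dataset path."""
    for prefix, provider in PREFIXES.items():
        if path.startswith(prefix):
            return provider
    # Anything without a recognized prefix is treated as a local directory.
    return "Local storage"


if __name__ == "__main__":
    for example in ("hub://user_name/dataset_name", "./datasets/local_copy"):
        print(example, "->", storage_provider(example))
```

Treating every unrecognized path as local storage mirrors the table, where local storage is the only entry without a prefix.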

Let's create a dataset in the Hub cloud. If you haven't already, create a new Hub account from the terminal using activeloop register. You will be asked for a username, email address, and password. The username you enter here will be used in the dataset path.

$ activeloop register
Enter your details. Your password must be atleast 6 characters long.
Username:
Email:
Password:

Initialize an empty dataset in the Hub cloud:

import hub

ds = hub.empty("hub://<USERNAME>/test-dataset")

Next, create a tensor to hold images in the dataset we just initialized:

images = ds.create_tensor("images", htype="image", sample_compression="jpg")

Assuming you have a list of image file paths, let's upload them to the dataset:

image_paths = ...
with ds:
    for image_path in image_paths:
        image = hub.read(image_path)
        ds.images.append(image)
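
The upload loop assumes that a list of image file paths already exists. One way to build such a list, assuming the images sit in a local directory, is sketched here. The function name `collect_image_paths`, the extension set, and the `./images` directory are illustrative choices and not part of Hub.

```python
from pathlib import Path
from typing import List

# Assumed set of extensions for this sketch; adjust to your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def collect_image_paths(root: str) -> List[str]:
    """Return sorted paths of image files found under root (recursively)."""
    return sorted(
        str(path)
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )


if __name__ == "__main__":
    # "./images" is a placeholder directory name.
    for image_path in collect_image_paths("./images"):
        print(image_path)
```

Sorting the paths keeps the upload order reproducible between runs, which can make it easier to line up samples with labels added later.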

Alternatively, you can upload NumPy arrays. Since the images tensor was created with sample_compression="jpg", the arrays will be compressed with JPEG compression.

import numpy as np

with ds:
    for _ in range(1000):  # 1000 random images
        # 100x100 image with 3 channels; uint8 keeps pixel values in the 0-255 range
        random_image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        ds.images.append(random_image)
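
Before appending arrays in bulk, it can help to check their shape and dtype. The sketch below generates one random image as an 8-bit unsigned array and prints both. `make_random_image` is a hypothetical helper, and the choice of `uint8` is an assumption based on JPEG commonly storing 8-bit channels; without an explicit `dtype`, `np.random.randint` returns a platform-dependent integer type.

```python
import numpy as np


def make_random_image(height: int = 100, width: int = 100) -> np.ndarray:
    """Return a random RGB image as an 8-bit unsigned array."""
    # randint's upper bound is exclusive, so 256 yields values in 0..255.
    return np.random.randint(0, 256, (height, width, 3), dtype=np.uint8)


if __name__ == "__main__":
    image = make_random_image()
    print(image.shape, image.dtype)
```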

Loading Datasets

You can load the dataset you just created with a single line of code:

import hub

ds = hub.load("hub://<USERNAME>/test-dataset")

You can also access publicly available Hub datasets, not just the ones you created. Here is how you would load the Objectron Bikes dataset:

import hub

ds = hub.load('hub://activeloop/objectron_bike_train')

To get the first image in the Objectron Bikes dataset in numpy format:

image_arr = ds.image[0].numpy()

Documentation

Getting started guides, examples, tutorials, API reference, and other usage information can be found on our documentation page.

Owner

Activeloop