Interactive dimensionality reduction for large datasets

Related tags

Deep Learningblossom
Overview

BlosSOM 🌼

BlosSOM is a graphical environment for running semi-supervised dimensionality reduction with EmbedSOM. You can use it to explore multidimensional datasets, and produce great-looking 2-dimensional visualizations.

WARNING: BlosSOM is still under development, some stuff may not work right, but things will magically improve without notice. Feel free to open an issue if something looks wrong.

screenshot

BlosSOM was developed at the MFF UK Prague, in cooperation with IOCB Prague.

MFF logoIOCB logo

Overview

BlosSOM creates a landmark-based model of the dataset, and dynamically projects all dataset point to your screen (using EmbedSOM). Several other algorithms and tools are provided to manage the landmarks; a quick overview follows:

  • High-dimensional landmark positioning:
    • Self-organizing maps
    • k-Means
  • 2D landmark positioning
    • k-NN graph generation (only adds edges, not vertices)
    • force-based graph layouting
    • dynamic t-SNE
  • Dimensionality reduction
    • EmbedSOM
    • CUDA EmbedSOM (with roughly 500x speedup, enabling smooth display of a few millions of points)
  • Manual landmark position optimization
  • Visualization settings (colors, transparencies, cluster coloring, ...)
  • Dataset transformations and dimension scaling
  • Import from matrix-like data files
    • FCS3.0 (Flow Cytometry Standard files)
    • TSV (Tab-separated CSV)
  • Export of the data for plotting

Compiling and running BlosSOM

You will need cmake build system and SDL2.

For CUDA EmbedSOM to work, you need the NVIDIA CUDA toolkit. Append -DBUILD_CUDA=1 to cmake options to enable the CUDA version.

Windows (Visual Studio 2019)

Dependencies

The project requires SDL2 as an external dependency:

  1. install vcpkg tool and remember your vcpkg directory
  2. install SDL: vcpkg install SDL2:x64-windows

Compilation

git submodule init
git submodule update

mkdir build
cd build

# You need to fix the path to vcpkg in the following command:
cmake .. -G "Visual Studio 16 2019" -A x64 -DCMAKE_BUILD_TYPE="Release" -DCMAKE_INSTALL_PREFIX=./inst -DCMAKE_TOOLCHAIN_FILE=your-vcpkg-clone-directory/scripts/buildsystems/vcpkg.cmake

cmake --build . --config Release
cmake --install . --config Release

Running

Open Visual Studio solution BlosSOM.sln, set blossom as startup project, set configuration to Release and run the project.

Linux (and possibly other unix-like systems)

Dependencies

The project requires SDL2 as an external dependency. Install libsdl2-dev (on Debian-based systems) or SDL2-devel (on Red Hat-based systems), or similar (depending on the Linux distribution). You should be able to install cmake package the same way.

Compilation

git submodule init
git submodule update

mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=./inst    # or any other directory
make install                              # use -j option to speed up the build

Running

./inst/bin/blossom

Documentation

Quickstart

  1. Click on the "plus" button on the bottom right side of the window
  2. Choose Open file (the first button from the top) and open a file from the demo_data/ directory
  3. You can now add and delete landmarks using ctrl+mouse click, and drag them around.
  4. Use the tools and settings available under the "plus" button to optimize the landmark positions and get a better visualization.

See the HOWTO for more details and hints.

Performance and CUDA

If you pass -DBUILD_CUDA=1 to the cmake commands, you will get extra executable called blossom_cuda (or blossom_cuda.exe, on Windows).

The 2 versions of BlosSOM executable differ mainly in the performance of EmbedSOM projection, which is more than 100× faster on GPUs than on CPUs. If the dataset gets large, only a fixed-size slice of the dataset gets processed each frame (e.g., at most 1000 points in case of CPU) to keep the framerate in a usable range. The defaults in BlosSOM should work smoothly for many use-cases (defaulting at 1k points per frame on CPU and 50k points per frame on GPU).

If required (e.g., if you have a really fast GPU), you may modify the constants in the corresponding source files, around the call sites of clean_range(), which is the function that manages the round-robin refreshing of the data. Functionality that dynamically chooses the best data-crunching rate is being implemented and should be available soon.

License

BlosSOM is licensed under GPLv3 or later. Several small libraries bundled in the repository are licensed with MIT-style licenses.

DiAne is a smart fuzzer for IoT devices

Diane Diane is a fuzzer for IoT devices. Diane works by identifying fuzzing triggers in the IoT companion apps to produce valid yet under-constrained

seclab 28 Jan 04, 2023
This is the official source code for SLATE. We provide the code for the model, the training code, and a dataset loader for the 3D Shapes dataset. This code is implemented in Pytorch.

SLATE This is the official source code for SLATE. We provide the code for the model, the training code and a dataset loader for the 3D Shapes dataset.

Gautam Singh 66 Dec 26, 2022
Official implementation of the paper Image Generators with Conditionally-Independent Pixel Synthesis https://arxiv.org/abs/2011.13775

CIPS -- Official Pytorch Implementation of the paper Image Generators with Conditionally-Independent Pixel Synthesis Requirements pip install -r requi

Multimodal Lab @ Samsung AI Center Moscow 201 Dec 21, 2022
An implementation of "Learning human behaviors from motion capture by adversarial imitation"

Merel-MoCap-GAIL An implementation of Merel et al.'s paper on generative adversarial imitation learning (GAIL) using motion capture (MoCap) data: Lear

Yu-Wei Chao 34 Nov 12, 2022
🐸STT integration examples

🐸 STT 0.9.x Examples These are various examples on how to use or integrate 🐸 STT using our packages. It is a good way to just try out 🐸 STT before

coqui 92 Dec 19, 2022
Industrial Image Anomaly Localization Based on Gaussian Clustering of Pre-trained Feature

Industrial Image Anomaly Localization Based on Gaussian Clustering of Pre-trained Feature Q. Wan, L. Gao, X. Li and L. Wen, "Industrial Image Anomaly

smiler 6 Dec 25, 2022
Neural networks applied in recognizing guitar chords using python, AutoML.NET with C# and .NET Core

Chord Recognition Demo application The demo application is written in C# with .NETCore. As of July 9, 2020, the only version available is for windows

Andres Mauricio Rondon Patiño 24 Oct 22, 2022
Hso-groupie - A pwnable challenge in Real World CTF 4th

Hso-groupie - A pwnable challenge in Real World CTF 4th

Riatre Foo 42 Dec 05, 2022
ICRA 2021 - Robust Place Recognition using an Imaging Lidar

Robust Place Recognition using an Imaging Lidar A place recognition package using high-resolution imaging lidar. For best performance, a lidar equippe

Tixiao Shan 293 Dec 27, 2022
Calling Julia from Python - an experiment on data loading

Calling Julia from Python - an experiment on data loading See the slides. TLDR After reading Patrick's blog post, we decided to try to replace C++ wit

Abel Siqueira 8 Jun 07, 2022
Final term project for Bayesian Machine Learning Lecture (XAI-623)

Mixquality_AL Final Term Project For Bayesian Machine Learning Lecture (XAI-623) Youtube Link The presentation is given in YoutubeLink Problem Formula

JeongEun Park 3 Jan 18, 2022
Python Single Object Tracking Evaluation

pysot-toolkit The purpose of this repo is to provide evaluation API of Current Single Object Tracking Dataset, including VOT2016 VOT2018 VOT2018-LT OT

348 Dec 22, 2022
Deep Learning for Natural Language Processing SS 2021 (TU Darmstadt)

Deep Learning for Natural Language Processing SS 2021 (TU Darmstadt) Task Training huge unsupervised deep neural networks yields to strong progress in

2 Aug 05, 2022
Implementation of the state-of-the-art vision transformers with tensorflow

ViT Tensorflow This repository contains the tensorflow implementation of the state-of-the-art vision transformers (a category of computer vision model

Mohammadmahdi NouriBorji 2 Mar 16, 2022
使用深度学习框架提取视频硬字幕;docker容器免安装深度学习库,使用本地api接口使得界面和后端识别分离;

extract-video-subtittle 使用深度学习框架提取视频硬字幕; 本地识别无需联网; CPU识别速度可观; 容器提供API接口; 运行环境 本项目运行环境非常好搭建,我做好了docker容器免安装各种深度学习包; 提供windows界面操作; 容器为CPU版本; 视频演示 https

歌者 16 Aug 06, 2022
quantize aware training package for NCNN on pytorch

ncnnqat ncnnqat is a quantize aware training package for NCNN on pytorch. Table of Contents ncnnqat Table of Contents Installation Usage Code Examples

62 Nov 23, 2022
Tightness-aware Evaluation Protocol for Scene Text Detection

TIoU-metric Release on 27/03/2019. This repository is built on the ICDAR 2015 evaluation code. If you propose a better metric and require further eval

Yuliang Liu 206 Nov 18, 2022
Code release for the ICML 2021 paper "PixelTransformer: Sample Conditioned Signal Generation".

PixelTransformer Code release for the ICML 2021 paper "PixelTransformer: Sample Conditioned Signal Generation". Project Page Installation Please insta

Shubham Tulsiani 24 Dec 17, 2022
A set of tools for creating and testing machine learning features, with a scikit-learn compatible API

Feature Forge This library provides a set of tools that can be useful in many machine learning applications (classification, clustering, regression, e

Machinalis 380 Nov 05, 2022
Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models.

WECHSEL Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. arXiv: https://arx

Institute of Computational Perception 45 Dec 29, 2022