Hyperbolic Hierarchical Clustering.

Related tags

Deep LearningHypHC
Overview

Hyperbolic Hierarchical Clustering (HypHC)

This code is the official PyTorch implementation of the NeurIPS 2020 paper:

From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical Clustering
Ines Chami, Albert Gu, Vaggos Chatziafratis and Christopher Ré
Stanford University
Paper: https://arxiv.org/abs/2010.00402

Abstract. Similarity-based Hierarchical Clustering (HC) is a classical unsupervised machine learning algorithm that has traditionally been solved with heuristic algorithms like Average-Linkage. Recently, Dasgupta reframed HC as a discrete optimization problem by introducing a global cost function measuring the quality of a given tree. In this work, we provide the first continuous relaxation of Dasgupta's discrete optimization problem with provable quality guarantees. The key idea of our method, HypHC, is showing a direct correspondence from discrete trees to continuous representations (via the hyperbolic embeddings of their leaf nodes) and back (via a decoding algorithm that maps leaf embeddings to a dendrogram), allowing us to search the space of discrete binary trees with continuous optimization. Building on analogies between trees and hyperbolic space, we derive a continuous analogue for the notion of lowest common ancestor, which leads to a continuous relaxation of Dasgupta's discrete objective. We can show that after decoding, the global minimizer of our continuous relaxation yields a discrete tree with a (1+epsilon)-factor approximation for Dasgupta's optimal tree, where epsilon can be made arbitrarily small and controls optimization challenges. We experimentally evaluate HypHC on a variety of HC benchmarks and find that even approximate solutions found with gradient descent have superior clustering quality than agglomerative heuristics or other gradient based algorithms. Finally, we highlight the flexibility of HypHC using end-to-end training in a downstream classification task.

Installation

This code has been tested with python3.7. First, create a virtual environment (or conda environment) and install the dependencies:

python3 -m venv hyphc_env

source hyphc_env/bin/activate

pip install -r requirements.txt

Then install the mst and unionfind packages which are used to decode embeddings into trees and compute the discrete Dasgupta Cost efficiently:

cd mst; python setup.py build_ext --inplace

cd unionfind; python setup.py build_ext --inplace

Datasets

source download_data.sh

This will download the zoo, iris and glass datasets from the UCI machine learning repository. Please refer to the paper for the download links of the other datasets used in the paper.

Code Usage

Train script

To use the code, first set environment variables in each shell session:

source set_env.sh

To train the HypHC mode, use the train script:

python train.py
    optional arguments:
      -h, --help            show this help message and exit
      --seed SEED
      --epochs EPOCHS
      --batch_size BATCH_SIZE
      --learning_rate LEARNING_RATE
      --eval_every EVAL_EVERY
      --patience PATIENCE
      --optimizer OPTIMIZER
      --save SAVE
      --fast_decoding FAST_DECODING
      --num_samples NUM_SAMPLES
      --dtype DTYPE
      --rank RANK
      --temperature TEMPERATURE
      --init_size INIT_SIZE
      --anneal_every ANNEAL_EVERY
      --anneal_factor ANNEAL_FACTOR
      --max_scale MAX_SCALE
      --dataset DATASET

Examples

We provide examples of training commands for the zoo, iris and glass datasets. For instance, to train HypHC on zoo, run:

source examples/run_zoo.sh

This will create an embedding directory and save training logs, embeddings and the configuration parameters in a embedding/zoo/[unique_id] where the unique id is based on the configuration parameters used to train the model.

Citation

If you find this code useful, please cite the following paper:

@inproceedings{NEURIPS2020_ac10ec1a,
 author = {Chami, Ines and Gu, Albert and Chatziafratis, Vaggos and R\'{e}, Christopher},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {H. Larochelle and M. Ranzato and R. Hadsell and M. F. Balcan and H. Lin},
 pages = {15065--15076},
 publisher = {Curran Associates, Inc.},
 title = {From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical Clustering},
 url = {https://proceedings.neurips.cc/paper/2020/file/ac10ec1ace51b2d973cd87973a98d3ab-Paper.pdf},
 volume = {33},
 year = {2020}
}
Owner
HazyResearch
We are a CS research group led by Prof. Chris Ré.
HazyResearch
Single cell current best practices tutorial case study for the paper:Luecken and Theis, "Current best practices in single-cell RNA-seq analysis: a tutorial"

Scripts for "Current best-practices in single-cell RNA-seq: a tutorial" This repository is complementary to the publication: M.D. Luecken, F.J. Theis,

Theis Lab 968 Dec 28, 2022
Hi Guys, here I am providing examples, which will help you in Lerarning Python

LearningPython Hi guys, here I am trying to include as many practice examples of Python Language, as i Myself learn, and hope these will help you in t

4 Feb 03, 2022
PROJECT - Az Residential Real Estate Analysis

AZ RESIDENTIAL REAL ESTATE ANALYSIS -Decided on libraries to import. Includes pa

2 Jul 05, 2022
From this paper "SESNet: A Semantically Enhanced Siamese Network for Remote Sensing Change Detection"

SESNet for remote sensing image change detection It is the implementation of the paper: "SESNet: A Semantically Enhanced Siamese Network for Remote Se

1 May 24, 2022
RoboDesk A Multi-Task Reinforcement Learning Benchmark

RoboDesk A Multi-Task Reinforcement Learning Benchmark If you find this open source release useful, please reference in your paper: @misc{kannan2021ro

Google Research 66 Oct 07, 2022
A library for finding knowledge neurons in pretrained transformer models.

knowledge-neurons An open source repository replicating the 2021 paper Knowledge Neurons in Pretrained Transformers by Dai et al., and extending the t

EleutherAI 96 Dec 21, 2022
Official Implementation of "Learning Disentangled Behavior Embeddings"

DBE: Disentangled-Behavior-Embedding Official implementation of Learning Disentangled Behavior Embeddings (NeurIPS 2021). Environment requirement The

Mishne Lab 12 Sep 28, 2022
Another pytorch implementation of FCN (Fully Convolutional Networks)

FCN-pytorch-easiest Trying to be the easiest FCN pytorch implementation and just in a get and use fashion Here I use a handbag semantic segmentation f

Y. Dong 158 Dec 21, 2022
Official implementation for paper: A Latent Transformer for Disentangled Face Editing in Images and Videos.

A Latent Transformer for Disentangled Face Editing in Images and Videos Official implementation for paper: A Latent Transformer for Disentangled Face

InterDigital 108 Dec 09, 2022
Linear image-to-image translation

Linear (Un)supervised Image-to-Image Translation Examples for linear orthogonal transformations in PCA domain, learned without pairing supervision. Tr

Eitan Richardson 40 Aug 31, 2022
使用yolov5训练自己数据集(详细过程)并通过flask部署

使用yolov5训练自己的数据集(详细过程)并通过flask部署 依赖库 torch torchvision numpy opencv-python lxml tqdm flask pillow tensorboard matplotlib pycocotools Windows,请使用 pycoc

HB.com 19 Dec 28, 2022
Code and data for ACL2021 paper Cross-Lingual Abstractive Summarization with Limited Parallel Resources.

Multi-Task Framework for Cross-Lingual Abstractive Summarization (MCLAS) The code for ACL2021 paper Cross-Lingual Abstractive Summarization with Limit

Yu Bai 43 Nov 07, 2022
Official code of the paper "Expanding Low-Density Latent Regions for Open-Set Object Detection" (CVPR 2022)

OpenDet Expanding Low-Density Latent Regions for Open-Set Object Detection (CVPR2022) Jiaming Han, Yuqiang Ren, Jian Ding, Xingjia Pan, Ke Yan, Gui-So

csuhan 64 Jan 07, 2023
ChainerRL is a deep reinforcement learning library built on top of Chainer.

ChainerRL and PFRL ChainerRL (this repository) is a deep reinforcement learning library that implements various state-of-the-art deep reinforcement al

Chainer 1.1k Jan 01, 2023
Repository of the paper Compressing Sensor Data for Remote Assistance of Autonomous Vehicles using Deep Generative Models at ML4AD @ NeurIPS 2021.

Compressing Sensor Data for Remote Assistance of Autonomous Vehicles using Deep Generative Models Code and supplementary materials Repository of the p

Daniel Bogdoll 4 Jul 13, 2022
Intel® Nervana™ reference deep learning framework committed to best performance on all hardware

DISCONTINUATION OF PROJECT. This project will no longer be maintained by Intel. Intel will not provide or guarantee development of or support for this

Nervana 3.9k Dec 20, 2022
African language Speech Recognition - Speech-to-Text

Swahili-Speech-To-Text Table of Contents Swahili-Speech-To-Text Overview Scenario Approach Project Structure data: models: notebooks: scripts tests: l

2 Jan 05, 2023
Bringing Characters to Life with Computer Brains in Unity

AI4Animation: Deep Learning for Character Control This project explores the opportunities of deep learning for character animation and control as part

Sebastian Starke 5.5k Jan 04, 2023
A Python package for faster, safer, and simpler ML processes

Bender 🤖 A Python package for faster, safer, and simpler ML processes. Why use bender? Bender will make your machine learning processes, faster, safe

Otovo 6 Dec 13, 2022