DuBE: Duple-balanced Ensemble Learning from Skewed Data

Overview

DuBE: Duple-balanced Ensemble Learning from Skewed Data

"Towards Inter-class and Intra-class Imbalance in Class-imbalanced Learning"
(IEEE ICDE 2022 Submission) [Documentation] [Examples]

DuBE is an ensemble learning framework for (multi)class-imbalanced classification. It is an easy-to-use solution to imbalanced learning problems, features good performance, computing efficiency, and wide compatibility with different learning models. Documentation and examples are available at https://duplebalance.readthedocs.io.

Table of Contents

Background

Imbalanced Learning (IL) is an important problem that widely exists in data mining applications. Typical IL methods utilize intuitive class-wise resampling or reweighting to directly balance the training set. However, some recent research efforts in specific domains show that class-imbalanced learning can be achieved without class-wise manipulation. This prompts us to think about the relationship between the two different IL strategies and the nature of the class imbalance. Fundamentally, they correspond to two essential imbalances that exist in IL: the difference in quantity between examples from different classes as well as between easy and hard examples within a single class, i.e., inter-class and intra-class imbalance.

image

Existing works fail to explicitly take both imbalances into account and thus suffer from suboptimal performance. In light of this, we present Duple-Balanced Ensemble, namely DUBE, a versatile ensemble learning framework. Unlike prevailing methods, DUBE directly performs inter-class and intra-class balancing without relying on heavy distance-based computation, which allows it to achieve competitive performance while being computationally efficient.

image

Install

Our DuBE implementation requires following dependencies:

You can install DuBE by clone this repository:

git clone https://github.com/ICDE2022Sub/duplebalance.git
cd duplebalance
pip install .

Usage

For more detailed usage example, please see Examples.

A minimal working example:

# load dataset & prepare environment
from duplebalance import DupleBalanceClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_classes=3,
                           n_informative=4, weights=[0.2, 0.3, 0.5],
                           random_state=0)

# ensemble training
clf = DupleBalanceClassifier(
    n_estimators=10,
    random_state=42,
    ).fit(X_train, y_train)

# predict
y_pred_test = clf.predict_proba(X_test)

Documentation

For more detailed API references, please see API reference.

Our DupleBalance implementation can be used much in the same way as the ensemble classifiers in sklearn.ensemble. The DupleBalanceClassifier class inherits from the sklearn.ensemble.BaseEnsemble base class.

Main parameters are listed below:

Parameters Description
base_estimator object, optional (default=sklearn.tree.DecisionTreeClassifier())
The base estimator to fit on self-paced under-sampled subsets of the dataset. NO need to support sample weighting. Built-in fit(), predict(), predict_proba() methods are required.
n_estimators int, optional (default=10)
The number of base estimators in the ensemble.
resampling_target {'hybrid', 'under', 'over', 'raw'}, default="hybrid"
Determine the number of instances to be sampled from each class (inter-class balancing).
- If under, perform under-sampling. The class containing the fewest samples is considered the minority class :math:c_{min}. All other classes are then under-sampled until they are of the same size as :math:c_{min}.
- If over, perform over-sampling. The class containing the argest number of samples is considered the majority class :math:c_{maj}. All other classes are then over-sampled until they are of the same size as :math:c_{maj}.
- If hybrid, perform hybrid-sampling. All classes are under/over-sampled to the average number of instances from each class.
- If raw, keep the original size of all classes when resampling.
resampling_strategy {'hem', 'shem', 'uniform'}, default="shem")
Decide how to assign resampling probabilities to instances during ensemble training (intra-class balancing).
- If hem, perform hard-example mining. Assign probability with respect to instance's latest prediction error.
- If shem, perform soft hard-example mining. Assign probability by inversing the classification error density.
- If uniform, assign uniform probability, i.e., random resampling.
perturb_alpha float or str, optional (default='auto')
The multiplier of the calibrated Gaussian noise that was add on the sampled data. It determines the intensity of the perturbation-based augmentation. If 'auto', perturb_alpha will be automatically tuned using a subset of the given training data.
k_bins int, optional (default=5)
The number of error bins that were used to approximate error distribution. It is recommended to set it to 5. One can try a larger value when the smallest class in the data set has a sufficient number (say, > 1000) of samples.
estimator_params list of str, optional (default=tuple())
The list of attributes to use as parameters when instantiating a new base estimator. If none are given, default parameters are used.
n_jobs int, optional (default=None)
The number of jobs to run in parallel for :meth:predict. None means 1 unless in a :obj:joblib.parallel_backend context. -1 means using all processors. See :term:Glossary <n_jobs> for more details.
random_state int / RandomState instance / None, optional (default=None)
If integer, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by numpy.random.
verbose int, optional (default=0)
Controls the verbosity when fitting and predicting.
Fbone (Flask bone) is a Flask (Python microframework) starter/template/bootstrap/boilerplate application.

Fbone (Flask bone) is a Flask (Python microframework) starter/template/bootstrap/boilerplate application.

Wilson 1.7k Dec 30, 2022
Semantic Segmentation in Pytorch

PyTorch Semantic Segmentation Introduction This repository is a PyTorch implementation for semantic segmentation / scene parsing. The code is easy to

Hengshuang Zhao 1.2k Jan 01, 2023
Source code for "FastBERT: a Self-distilling BERT with Adaptive Inference Time".

FastBERT Source code for "FastBERT: a Self-distilling BERT with Adaptive Inference Time". Good News 2021/10/29 - Code: Code of FastPLM is released on

Weijie Liu 584 Jan 02, 2023
This is an official implementation for "DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation"

DeciWatch: A Simple Baseline for 10× Efficient 2D and 3D Pose Estimation This repo is the official implementation of "DeciWatch: A Simple Baseline for

117 Dec 24, 2022
Source code for "Interactive All-Hex Meshing via Cuboid Decomposition [SIGGRAPH Asia 2021]".

Interactive All-Hex Meshing via Cuboid Decomposition Video demonstration This repository contains an interactive software to the PolyCube-based hex-me

Lingxiao Li 131 Dec 05, 2022
Powerful unsupervised domain adaptation method for dense retrieval.

Powerful unsupervised domain adaptation method for dense retrieval

Ubiquitous Knowledge Processing Lab 191 Dec 28, 2022
Ansible Automation Example: JSNAPY PRE/POST Upgrade Validation

Ansible Automation Example: JSNAPY PRE/POST Upgrade Validation Overview This example will show how to validate the status of our firewall before and a

Calvin Remsburg 1 Jan 07, 2022
Anderson Acceleration for Deep Learning

Anderson Accelerated Deep Learning (AADL) AADL is a Python package that implements the Anderson acceleration to speed-up the training of deep learning

Oak Ridge National Laboratory 7 Nov 24, 2022
PyTorch code for JEREX: Joint Entity-Level Relation Extractor

JEREX: "Joint Entity-Level Relation Extractor" PyTorch code for JEREX: "Joint Entity-Level Relation Extractor". For a description of the model and exp

LAVIS - NLP Working Group 50 Dec 01, 2022
This program can detect your face and add an Christams hat on the top of your head

Auto_Christmas This program can detect your face and add a Christmas hat to the top of your head. just run the Auto_Christmas.py, then you can see the

3 Dec 22, 2021
An Efficient Training Approach for Very Large Scale Face Recognition or F²C for simplicity.

Fast Face Classification (F²C) This is the code of our paper An Efficient Training Approach for Very Large Scale Face Recognition or F²C for simplicit

33 Jun 27, 2021
GDSC-ML Team Interview Task

GDSC-ML-Team---Interview-Task Task 1 : Clean or Messy room In this task we have to classify the given test images as clean or messy. - Link for datase

Aayush. 1 Jan 19, 2022
Datasets, Transforms and Models specific to Computer Vision

torchvision The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision. Installat

13.1k Jan 02, 2023
A PyTorch implementation of EfficientNet and EfficientNetV2 (coming soon!)

EfficientNet PyTorch Quickstart Install with pip install efficientnet_pytorch and load a pretrained EfficientNet with: from efficientnet_pytorch impor

Luke Melas-Kyriazi 7.2k Jan 06, 2023
9th place solution in "Santa 2020 - The Candy Cane Contest"

Santa 2020 - The Candy Cane Contest My solution in this Kaggle competition "Santa 2020 - The Candy Cane Contest", 9th place. Basic Strategy In this co

toshi_k 22 Nov 26, 2021
The Python code for the paper A Hybrid Quantum-Classical Algorithm for Robust Fitting

About The Python code for the paper A Hybrid Quantum-Classical Algorithm for Robust Fitting The demo program was only tested under Conda in a standard

Anh-Dzung Doan 5 Nov 28, 2022
Unofficial implementation of MUSIQ (Multi-Scale Image Quality Transformer)

MUSIQ: Multi-Scale Image Quality Transformer Unofficial pytorch implementation of the paper "MUSIQ: Multi-Scale Image Quality Transformer" (paper link

41 Jan 02, 2023
Multi-modal Content Creation Model Training Infrastructure including the FACT model (AI Choreographer) implementation.

AI Choreographer: Music Conditioned 3D Dance Generation with AIST++ [ICCV-2021]. Overview This package contains the model implementation and training

Google Research 365 Dec 30, 2022
A GOOD REPRESENTATION DETECTS NOISY LABELS

A GOOD REPRESENTATION DETECTS NOISY LABELS This code is a PyTorch implementation of the paper: Prerequisites Python 3.6.9 PyTorch 1.7.1 Torchvision 0.

<a href=[email protected]"> 64 Jan 04, 2023
An intuitive library to extract features from time series

Time Series Feature Extraction Library Intuitive time series feature extraction This repository hosts the TSFEL - Time Series Feature Extraction Libra

Associação Fraunhofer Portugal Research 589 Jan 04, 2023