Stochastic Gradient Trees implementation in Python

Last update: Nov 18, 2022

Overview

Stochastic Gradient Trees - Python

A Python implementation of Stochastic Gradient Trees¹ by Henry Gouk, Bernhard Pfahringer, and Eibe Frank, based on the code in the paper's accompanying repository.

Requires Python 3.7 or later.

Required Python libraries:

numpy>=1.20.2
scipy>=1.6.2
pandas>=1.3.3
scikit-learn>=0.24.2
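
The minimum versions listed above can be checked before running the examples. The following is only a sketch: `MINIMUM_VERSIONS`, `version_tuple`, and `check_requirements` are hypothetical helper names that are not part of this repository, and the comparison looks only at the leading numeric part of each version component, so it is cruder than a full version-specifier parser.

```python
import importlib
import re

# Minimum versions copied from the requirements list above.
# Keys are import names, so scikit-learn appears as "sklearn".
MINIMUM_VERSIONS = {
    "numpy": "1.20.2",
    "scipy": "1.6.2",
    "pandas": "1.3.3",
    "sklearn": "0.24.2",
}


def version_tuple(version):
    """Turn '1.20.2' into (1, 20, 2), keeping leading digits of each part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at a component with no leading digits, e.g. 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def check_requirements(minimums=MINIMUM_VERSIONS):
    """Return {module name: 'ok' | 'too old' | 'missing'}."""
    report = {}
    for module_name, minimum in minimums.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            report[module_name] = "missing"
            continue
        # Assumption: the module exposes __version__, as the four above do.
        installed = getattr(module, "__version__", "0")
        ok = version_tuple(installed) >= version_tuple(minimum)
        report[module_name] = "ok" if ok else "too old"
    return report
```

Calling `check_requirements()` returns a dictionary that maps each module name to one of the three status strings, so a missing or outdated dependency shows up before the examples fail on import.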

Usage:

    from sklearn.datasets import load_breast_cancer
    from sklearn.metrics import accuracy_score, confusion_matrix, log_loss
    from sklearn.model_selection import train_test_split

    from StochasticGradientTree import StochasticGradientTreeClassifier


    def train(X, y):
        # Hold out 34% of the rows for evaluation.
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.34)

        tree = StochasticGradientTreeClassifier()
        tree.fit(X_train, y_train)

        # Hard class labels and per-class probabilities for the held-out rows.
        y_pred = tree.predict(X_test)
        proba = tree.predict_proba(X_test)

        acc_test = accuracy_score(y_test, y_pred)
        print(confusion_matrix(y_test, y_pred))
        print('Acc test: ', acc_test)
        print('Cross entropy loss: ', log_loss(y_test, proba))

        return tree, acc_test


    if __name__ == "__main__":
        breast = load_breast_cancer(as_frame=True)

        # Separate the feature columns from the target column.
        X = breast.frame.copy()
        y = breast.frame.target
        X.drop(['target'], axis=1, inplace=True)

        tree, _ = train(X, y)
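
The cross-entropy loss printed by the usage example is the mean negative log-probability that the model assigns to the true class of each test row. The sketch below spells that quantity out in plain Python. It is not scikit-learn's implementation of `log_loss`; the function name `cross_entropy` is hypothetical, and it assumes that the probability columns follow the sorted class labels.

```python
import math


def cross_entropy(y_true, proba, classes=None, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true  -- sequence of class labels
    proba   -- sequence of rows, one probability per class
    classes -- label order of the probability columns
               (assumption: sorted unique labels when not given)
    eps     -- clipping bound that keeps log() away from zero
    """
    if classes is None:
        classes = sorted(set(y_true))
    column = {label: i for i, label in enumerate(classes)}

    total = 0.0
    for label, row in zip(y_true, proba):
        # Probability the model gave to the correct class, clipped to (0, 1).
        p = min(max(row[column[label]], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(y_true)


# Three rows, two classes; the second and third rows belong to class 1.
y_demo = [0, 1, 1]
proba_demo = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]]
print(cross_entropy(y_demo, proba_demo))
```

Confident correct predictions contribute values close to zero, while a confident wrong prediction is penalised heavily, which is why this number can move even when accuracy does not.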

Binary classification example:

    python classification_breast.py

Multiclass classification (using the One-vs-the-rest multiclass strategy):

    python classification_iris.py
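
The one-vs-the-rest strategy named above trains one binary model per class (that class against all others) and predicts the class whose model is most confident. The sketch below illustrates the idea in plain Python and is not the code in `classification_iris.py`. Both `OneVsRest` and `CentroidScorer` are hypothetical names; `CentroidScorer` is a toy stand-in for a binary learner, used only so the example runs without the tree package.

```python
class CentroidScorer:
    """Toy binary model: scores a row by how much closer it lies to the
    positive-class centroid than to the negative-class centroid."""

    def fit(self, X, y):
        self.pos_ = self._centroid([x for x, t in zip(X, y) if t == 1])
        self.neg_ = self._centroid([x for x, t in zip(X, y) if t == 0])
        return self

    def score(self, x):
        # Larger score means the row looks more like the positive class.
        return self._sqdist(x, self.neg_) - self._sqdist(x, self.pos_)

    @staticmethod
    def _centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    @staticmethod
    def _sqdist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))


class OneVsRest:
    """Train one binary model per class; predict the class whose model
    returns the highest score."""

    def __init__(self, make_binary_model):
        self.make_binary_model = make_binary_model

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.models_ = []
        for label in self.classes_:
            # Relabel: the current class is 1, every other class is 0.
            binary_targets = [1 if t == label else 0 for t in y]
            self.models_.append(self.make_binary_model().fit(X, binary_targets))
        return self

    def predict(self, X):
        predictions = []
        for x in X:
            scores = [model.score(x) for model in self.models_]
            best = max(range(len(scores)), key=scores.__getitem__)
            predictions.append(self.classes_[best])
        return predictions


# Three well-separated clusters, two points each.
X_demo = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]
y_demo = ["a", "a", "b", "b", "c", "c"]

ovr = OneVsRest(CentroidScorer).fit(X_demo, y_demo)
print(ovr.predict([[0.2, 0.5], [5.1, 5.4], [9.8, 0.3]]))  # ['a', 'b', 'c']
```

With a real binary classifier the per-class score would typically be the predicted probability of the positive class rather than a distance difference.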

Regression example:

    python regression_diabetes.py

¹ Gouk, H., Pfahringer, B., and Frank, E. Stochastic gradient trees. In Proceedings of The Eleventh Asian Conference on Machine Learning, volume 101 of Proceedings of Machine Learning Research, pp. 1094–1109. PMLR, 2019.

Owner

John Koumentis
