pandas, scikit-learn, xgboost and seaborn integration

Last update: Dec 30, 2022

Related tags

Machine Learning pandas-ml

Overview

pandas-ml

pandas, scikit-learn and xgboost integration.

Installation

$ pip install pandas_ml

Documentation

http://pandas-ml.readthedocs.org/en/stable/

Example

>>> import pandas_ml as pdml
>>> import sklearn.datasets as datasets

# create ModelFrame instance from sklearn.datasets
>>> df = pdml.ModelFrame(datasets.load_digits())
>>> type(df)
<class 'pandas_ml.core.frame.ModelFrame'>

# binarize data (features), not touching target
>>> df.data = df.data.preprocessing.binarize()
>>> df.head()
   .target  0  1  2  3  4  5  6  7  8 ...  54  55  56  57  58  59  60  61  62  63
0        0  0  0  1  1  1  1  0  0  0 ...   0   0   0   0   1   1   1   0   0   0
1        1  0  0  0  1  1  1  0  0  0 ...   0   0   0   0   0   1   1   1   0   0
2        2  0  0  0  1  1  1  0  0  0 ...   1   0   0   0   0   1   1   1   1   0
3        3  0  0  1  1  1  1  0  0  0 ...   1   0   0   0   1   1   1   1   0   0
4        4  0  0  0  1  1  0  0  0  0 ...   0   0   0   0   0   1   1   1   0   0
[5 rows x 65 columns]

# split into training and test data
>>> train_df, test_df = df.model_selection.train_test_split()

# create estimator (accessor is mapped to sklearn namespace)
>>> estimator = df.svm.LinearSVC()

# fit to training data
>>> train_df.fit(estimator)

# predict test data
>>> test_df.predict(estimator)
0     4
1     2
2     7
...
448    5
449    8
Length: 450, dtype: int64

# Evaluate the result
>>> test_df.metrics.confusion_matrix()
Predicted   0   1   2   3   4   5   6   7   8   9
Target
0          52   0   0   0   0   0   0   0   0   0
1           0  37   1   0   0   1   0   0   3   3
2           0   2  48   1   0   0   0   1   1   0
3           1   1   0  44   0   1   0   0   3   1
4           1   0   0   0  43   0   1   0   0   0
5           0   1   0   0   0  39   0   0   0   0
6           0   1   0   0   1   0  35   0   0   0
7           0   0   0   0   2   0   0  42   1   0
8           0   2   1   0   1   0   0   0  33   1
9           0   2   1   2   0   0   0   0   1  38
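To make the evaluation step concrete, here is a plain-Python sketch of what a confusion matrix counts: cell (i, j) is the number of test rows whose target is label i and whose prediction is label j, matching the Target rows and Predicted columns shown above. This is an illustration only, not pandas-ml's implementation; `confusion_matrix` and `accuracy` here are hypothetical local helpers.

```python
from collections import Counter


def confusion_matrix(target, predicted, labels=None):
    """Return (labels, matrix) where matrix[i][j] counts rows whose target is
    labels[i] and whose prediction is labels[j]."""
    if len(target) != len(predicted):
        raise ValueError("target and predicted must have the same length")
    if labels is None:
        labels = sorted(set(target) | set(predicted))
    counts = Counter(zip(target, predicted))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


def accuracy(matrix):
    """Fraction of rows on the diagonal, i.e. predicted correctly."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Small demo: rows are targets, columns are predictions.
labels, matrix = confusion_matrix([0, 1, 2, 2, 1], [0, 2, 2, 2, 1])
print("Predicted", labels)
for label, row in zip(labels, matrix):
    print("Target", label, row)
```

Read this way, the diagonal of the matrix printed above sums to 411 of the 450 test rows, so the LinearSVC in this example classifies about 91% of the test digits correctly.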

Supported Packages

scikit-learn
patsy
xgboost

Comments

Fixed imports of deprecated modules which were removed in pandas 0.24.0

Certain functions were deprecated in a previous version of pandas and moved to a different module (see #117). This PR fixes the imports of those functions.

opened by kristofve 8
REL: v0.4.0
[x] Compat/test for sklearn 0.18.0 (#81)

[x] initial fix (#81)

[x] wrapper for cross validation classes (re-enable skipped tests) (#85)

[x] tests for multioutput (#86)

[x] Update doc

[x] Compat/test for pandas 0.19.0 (#83)

[x] Update release note (#88)
opened by sinhrks 4
Import error

I tried to import pandas_ml, but it raised the error:

AttributeError: type object 'NDFrame' has no attribute 'groupby'

I'm running Python 3.8.1 and installed pandas_ml via pip (pip version 20.0.2).

I dug into the code; the error is raised at line 80 of series.py:

@Appender(pd.core.generic.NDFrame.groupby.__doc__)

pandas is imported at the top of that file with the usual import pandas as pd.

I guess there is a version incompatibility...

Thanks in advance for any help.

opened by ierezell 2
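The failing line evaluates pd.core.generic.NDFrame.groupby.__doc__ while the ModelSeries class body executes, so importing pandas_ml fails as soon as the installed pandas no longer defines groupby on NDFrame. The sketch below checks exactly that attribute and reports the pandas version next to it. Our assumption, not confirmed by these reports, is that groupby moved off NDFrame in pandas 1.0; if that holds, installing a pandas release older than 1.0 should let pandas-ml 0.6.1 import. `diagnose_pandas` and `major_minor` are hypothetical helper names.

```python
import importlib
import importlib.metadata


def major_minor(version):
    """Parse a version string such as '1.0.0rc0' into (1, 0), keeping leading digits only."""
    def leading_int(part):
        digits = ""
        for ch in part:
            if not ch.isdigit():
                break
            digits += ch
        return int(digits) if digits else 0

    parts = version.split(".")
    minor = leading_int(parts[1]) if len(parts) > 1 else 0
    return leading_int(parts[0]), minor


def diagnose_pandas():
    """Return None if pandas is not installed; otherwise its version and whether
    NDFrame still defines groupby (the attribute pandas-ml's series.py reads)."""
    try:
        version = importlib.metadata.version("pandas")
    except importlib.metadata.PackageNotFoundError:
        return None
    generic = importlib.import_module("pandas.core.generic")
    return {
        "version": version,
        "major_minor": major_minor(version),
        "ndframe_has_groupby": hasattr(generic.NDFrame, "groupby"),
    }


if __name__ == "__main__":
    print(diagnose_pandas())
```

If ndframe_has_groupby comes back False, the import error above is expected regardless of how pandas_ml was installed.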
Confusion Matrix not accessible

Hi,

I've been using confusion_matrix since it was an independent package. I installed pandas_ml to keep using it, but it seems that the setup.py script does not install it.

Could it be an issue with the find_packages function?

opened by mmartinortiz 2
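One way to test the find_packages hypothesis: setuptools.find_packages only reports directories that contain an __init__.py, and it does not search below a directory that lacks one, so a missing __init__.py silently drops a subpackage from the installed distribution. The sketch below re-implements that rule in plain Python for illustration; `find_packages_sketch` is a hypothetical name, and the real function additionally supports include/exclude patterns and does not sort its results.

```python
import os


def find_packages_sketch(root):
    """Simplified stand-in for setuptools.find_packages(where=root).

    A directory is reported only if it contains __init__.py and its name has no
    dot; directories that fail the test are not searched any deeper.
    """
    found = []

    def walk(path, prefix):
        for name in sorted(os.listdir(path)):
            full = os.path.join(path, name)
            if "." in name or not os.path.isdir(full):
                continue
            if not os.path.isfile(os.path.join(full, "__init__.py")):
                continue  # not a package: skip it and everything below it
            package = prefix + name
            found.append(package)
            walk(full, package + ".")

    walk(root, "")
    return found
```

Against a real checkout, running python -c "from setuptools import find_packages; print(find_packages())" from the repository root shows what setuptools itself would pick up.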

Seaborn Scatterplot matrix / pairplot integration

import seaborn as sns
sns.set()

df = sns.load_dataset("iris")
sns.pairplot(df, hue="species")

displays:

[image: iris_scatter_matrix]

but pairplot doesn't work the same way with ModelFrame

import pandas as pd
pd.set_option('display.max_rows', 10)
import sklearn.datasets as datasets
import pandas_ml as pdml  # https://github.com/pandas-ml/pandas-ml
import seaborn as sns
import matplotlib.pyplot as plt
df = pdml.ModelFrame(datasets.load_iris())
sns.pairplot(df, hue=".target")

[image: iris_modelframe]

There are some useless subplots.

opened by scls19fr 2

Error while running train.py from speech commands in tensorflow examples.

I get the following error:

  File "train.py", line 27, in <module>
    from callbacks import ConfusionMatrixCallback
  File "/home/tesseract/ayush_workspace/NLP/WakeWord/tensorflow_trainer/ml/callbacks.py", line 21, in <module>
    from pandas_ml import ConfusionMatrix
  File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/__init__.py", line 3, in <module>
    from pandas_ml.core import ModelFrame, ModelSeries # noqa
  File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/__init__.py", line 3, in <module>
    from pandas_ml.core.frame import ModelFrame # noqa
  File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/frame.py", line 18, in <module>
    from pandas_ml.core.series import ModelSeries
  File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/series.py", line 11, in <module>
    class ModelSeries(ModelTransformer, pd.Series):
  File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/series.py", line 80, in ModelSeries
    @Appender(pd.core.generic.NDFrame.groupby.__doc__)
AttributeError: type object 'NDFrame' has no attribute 'groupby'

This happens with both version 5 and 6.1.

opened by ayush7 1
Error in the example at https://pandas-ml.readthedocs.io/en/latest/xgboost.html

Code from the example at https://pandas-ml.readthedocs.io/en/latest/xgboost.html:

import pandas_ml as pdml
import sklearn.datasets as datasets
df = pdml.ModelFrame(datasets.load_digits())
train_df, test_df = df.cross_validation.train_test_split()
estimator = df.xgboost.XGBClassifier()
train_df.fit(estimator)
predicted = test_df.predict(estimator)
q=1
test_df.metrics.confusion_matrix()
train_df.xgboost.plot_importance()

tuned_parameters = [{'max_depth': [3, 4]}]
cv = df.grid_search.GridSearchCV(df.xgb.XGBClassifier(), tuned_parameters, cv=5)

df.fit(cv)
df.grid_search.describe(cv)
q=1

gives the error:

File "E:\Pandas\my_code\S_pandas_ml_feb27.py", line 10, in
    train_df.xgboost.plot_importance()
File "C:\Users\sndr\Anaconda3\Lib\site-packages\pandas_ml\xgboost\base.py", line 61, in plot_importance
    return xgb.plot_importance(self._df.estimator.booster(),
builtins.TypeError: 'str' object is not callable

I use Windows and Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 bit (AMD64)].

opened by Sandy4321 1
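The traceback points at self._df.estimator.booster(): on XGBClassifier, booster is a constructor parameter that holds a string such as "gbtree", so calling it fails with exactly this TypeError. Newer xgboost releases expose the underlying Booster through get_booster(); older releases apparently exposed a booster() method, which is what pandas-ml calls. Until the accessor is updated, calling xgboost.plot_importance(estimator.get_booster()) directly is one possible workaround. The helper below is a hypothetical sketch of the compatibility check such a fix could use; `resolve_booster` is not part of pandas-ml or xgboost.

```python
def resolve_booster(estimator):
    """Return the underlying Booster of an xgboost scikit-learn-style estimator.

    Assumption: newer xgboost provides get_booster(), while `booster` is a plain
    string parameter; very old releases provided a booster() method instead.
    """
    getter = getattr(estimator, "get_booster", None)
    if callable(getter):
        return getter()
    legacy = getattr(estimator, "booster", None)
    if callable(legacy):
        return legacy()
    raise TypeError(
        f"{type(estimator).__name__} has no get_booster() or booster() method; "
        f"its booster attribute is {legacy!r}"
    )
```

Checking get_booster first matters: on newer estimators the booster attribute exists but is a string, which is the exact case that breaks the current code.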
pandas 0.24.0 has deprecated pandas.util.decorators

See https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.24.0.html#deprecations

This causes the import statement in https://github.com/pandas-ml/pandas-ml/blob/master/pandas_ml/core/frame.py to break.

It looks like the import just needs to be changed to 'from pandas.util'.

opened by usul83 1
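A common way to survive a module move like this one is to try the new location first and fall back to the old one. In recent pandas, Appender is defined in the private module pandas.util._decorators (private, so it may move again). The helper below is a hypothetical, stdlib-only sketch of that fallback pattern; `import_first` is not part of pandas or pandas-ml.

```python
import importlib


def import_first(candidates):
    """Return the first attribute importable from a list of (module, attribute) pairs.

    Tries each location in order and falls back to the next one on
    ImportError or AttributeError.
    """
    errors = []
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError) as exc:
            errors.append(f"{module_name}.{attr}: {exc}")
    raise ImportError("no candidate could be imported:\n" + "\n".join(errors))


# Sketch of how the decorator import could use it (assumption: Appender is
# still defined in pandas.util._decorators in the installed pandas):
# Appender = import_first([
#     ("pandas.util._decorators", "Appender"),
#     ("pandas.util.decorators", "Appender"),
# ])
```

For a single moved name, a plain try/except ImportError around the two imports does the same job; a helper like this only pays off when several names moved.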
mean_absoloute_error

from sklearn import metrics
print('MAE:', metrics.mean_absoloute_error(y_test, y_pred))

module 'sklearn.metrics' has no attribute 'mean_absoloute_error'

This error occurred. Any solution?

opened by vikramk1507 0
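The AttributeError comes from the misspelling: scikit-learn's function is spelled mean_absolute_error, so print('MAE:', metrics.mean_absolute_error(y_test, y_pred)) should work. For reference, here is a plain-Python sketch of the quantity it reports for one-dimensional inputs, the mean of the absolute differences; the local function only borrows the name for clarity and is not scikit-learn's code.

```python
def mean_absolute_error(y_true, y_pred):
    """Mean of |y_true[i] - y_pred[i]| for one-dimensional inputs."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


print("MAE:", mean_absolute_error([3, -0.5, 2, 7], [2.5, 0.0, 2, 8]))  # prints MAE: 0.5
```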
AttributeError: type object 'NDFrame' has no attribute 'groupby'

from pandas_ml import ConfusionMatrix
cm = ConfusionMatrix(actu, pred)
cm.print_stats()

AttributeError                            Traceback (most recent call last)
in
----> 1 from pandas_ml import confusion_matrix
      2
      3 cm = ConfusionMatrix(actu, pred)
      4 cm.print_stats()

/usr/local/lib/python3.8/site-packages/pandas_ml/__init__.py in
      1 #!/usr/bin/env python
      2
----> 3 from pandas_ml.core import ModelFrame, ModelSeries # noqa
      4 from pandas_ml.tools import info # noqa
      5 from pandas_ml.version import version as version # noqa

/usr/local/lib/python3.8/site-packages/pandas_ml/core/__init__.py in
      1 #!/usr/bin/env python
      2
----> 3 from pandas_ml.core.frame import ModelFrame # noqa
      4 from pandas_ml.core.series import ModelSeries # noqa

/usr/local/lib/python3.8/site-packages/pandas_ml/core/frame.py in
     16 from pandas_ml.core.accessor import _AccessorMethods
     17 from pandas_ml.core.generic import ModelPredictor, _shared_docs
---> 18 from pandas_ml.core.series import ModelSeries
     19
     20

/usr/local/lib/python3.8/site-packages/pandas_ml/core/series.py in
      9
     10
---> 11 class ModelSeries(ModelTransformer, pd.Series):
     12     """
     13     Wrapper for pandas.Series to support sklearn.preprocessing

/usr/local/lib/python3.8/site-packages/pandas_ml/core/series.py in ModelSeries()
     78         return df
     79
---> 80     @Appender(pd.core.generic.NDFrame.groupby.__doc__)
     81     def groupby(self, by=None, axis=0, level=None, as_index=True, sort=True,
     82                 group_keys=True, squeeze=False):

AttributeError: type object 'NDFrame' has no attribute 'groupby'

opened by gfranco008 5
AttributeError: module 'sklearn.metrics' has no attribute 'jaccard_similarity_score'

I am using scikit-learn version 0.23.1 and I get the following error: AttributeError: module 'sklearn.metrics' has no attribute 'jaccard_similarity_score' when calling the function ConfusionMatrix.

opened by petraknovak 11
Error while running train.py from speech commands in tensorflow examples. AttributeError: type object 'NDFrame' has no attribute 'groupby'

I get the following error:

  File "train.py", line 27, in <module>
    from callbacks import ConfusionMatrixCallback
  File "/home/tesseract/ayush_workspace/NLP/WakeWord/tensorflow_trainer/ml/callbacks.py", line 21, in <module>
    from pandas_ml import ConfusionMatrix
  File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/__init__.py", line 3, in <module>
    from pandas_ml.core import ModelFrame, ModelSeries # noqa
  File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/__init__.py", line 3, in <module>
    from pandas_ml.core.frame import ModelFrame # noqa
  File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/frame.py", line 18, in <module>
    from pandas_ml.core.series import ModelSeries
  File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/series.py", line 11, in <module>
    class ModelSeries(ModelTransformer, pd.Series):
  File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/series.py", line 80, in ModelSeries
    @Appender(pd.core.generic.NDFrame.groupby.__doc__)
AttributeError: type object 'NDFrame' has no attribute 'groupby'

This happens with both version 5 and 6.1.

opened by ayush7 3

Pandas 1.0.0rc0/0.6.1 module 'sklearn.preprocessing' has no attribute 'Imputer'

From the scikit-learn documentation:

sklearn.preprocessing.Imputer (Warning: DEPRECATED)

class sklearn.preprocessing.Imputer(*args, **kwargs)
Imputation transformer for completing missing values.
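sklearn.preprocessing.Imputer was deprecated in scikit-learn 0.20 and removed in 0.22; its replacement is sklearn.impute.SimpleImputer. Code that still references preprocessing.Imputer, as this report suggests pandas-ml 0.6.1 does, therefore fails on newer scikit-learn. To show what the default strategy="mean" does, here is a plain-Python sketch of column-wise mean imputation. Treating both None and NaN as missing is an assumption of this sketch (the scikit-learn classes default to NaN), and unlike scikit-learn it simply raises on a column with no observed values.

```python
import math


def is_missing(value):
    # Assumption for this sketch: None and float NaN both mark a missing value.
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_mean(rows):
    """Replace each missing entry with the mean of the observed values in its column."""
    if not rows:
        return []
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if not is_missing(row[j])]
        if not observed:
            raise ValueError(f"column {j} has no observed values")
        means.append(sum(observed) / len(observed))
    return [[means[j] if is_missing(row[j]) else row[j] for j in range(n_cols)]
            for row in rows]
```

With scikit-learn 0.22 or later, the library equivalent is SimpleImputer(strategy="mean") from sklearn.impute.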

Releases (v0.6.1)

v0.6.1 (Mar 5, 2019)

v0.6.0 (Jan 15, 2019)

v0.5.0 (Nov 16, 2017)

v0.4.0 (Oct 15, 2016)
Support scikit-learn v0.17.x and v0.18.0.

Support imbalanced-learn via .imbalance accessor.

Added pandas_ml.ConfusionMatrix class for easier evaluation of classification results.

v0.3.0 (Oct 22, 2015)

v0.2.0 (Sep 12, 2015)

v0.1.1 (Mar 13, 2015)

v0.1.0 (Mar 7, 2015)

v0.0.1 (Mar 1, 2015)


Owner

GitHub Repository

Combines MLflow with a database (PostgreSQL) and a reverse proxy (NGINX) into a multi-container Docker application

Combines MLflow with a database (PostgreSQL) and a reverse proxy (NGINX) into a multi-container Docker application (with docker-compose).

2 Dec 03, 2021

Code for the TCAV ML interpretability project

Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) Been Kim, Martin Wattenberg, Justin Gilmer, C

552 Dec 27, 2022

This is an auto-ML tool specialized in detecting outliers

Auto-ML tool specialized in detecting outliers Description This tool will allow you, with a Dash visualization, to compare 10 models of machine le

1 Nov 03, 2021

50% faster, 50% less RAM Machine Learning. Numba rewritten Sklearn. SVD, NNMF, PCA, LinearReg, RidgeReg, Randomized, Truncated SVD/PCA, CSR Matrices all 50+% faster

[Due to the time taken @ uni, work + hell breaking loose in my life, since things have calmed down a bit, will continue committing!!!] [By the way, I'm

1.4k Jan 01, 2023

PySurvival is an open source python package for Survival Analysis modeling

PySurvival What is Pysurvival ? PySurvival is an open source python package for Survival Analysis modeling - the modeling concept used to analyze or p

265 Dec 27, 2022

Microsoft contributing libraries, tools, recipes, sample codes and workshop contents for machine learning & deep learning.

366 Jan 03, 2023

A Python-based application demonstrating various search algorithms, namely Depth-First Search (DFS), Breadth-First Search (BFS), and A* Search (Manhattan Distance Heuristic)

A Python-based application demonstrating various search algorithms, namely Depth-First Search (DFS), Breadth-First Search (BFS), and the A* Search (using the Manhattan Distance Heuristic)

17 Aug 14, 2022

Time series changepoint detection

changepy Changepoint detection in time series in pure python Install pip install changepy Examples from changepy import pelt from cha

92 Nov 08, 2022

TensorFlow Decision Forests (TF-DF) is a collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models.

TensorFlow Decision Forests (TF-DF) is a collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models. The library is a collection of Keras models

538 Jan 01, 2023

XManager: A framework for managing machine learning experiments 🧑‍🔬

XManager is a platform for packaging, running and keeping track of machine learning experiments. It currently enables one to launch experiments locally or on Google Cloud Platform (GCP). Interaction

620 Dec 27, 2022

Transpile trained scikit-learn estimators to C, Java, JavaScript and others.

sklearn-porter Transpile trained scikit-learn estimators to C, Java, JavaScript and others. It's recommended for limited embedded systems and critical

1.2k Jan 05, 2023

Automatically create Faiss knn indices with the most optimal similarity search parameters.

It selects the best indexing parameters to achieve the highest recalls given memory and query speed constraints.

419 Jan 01, 2023

MaD GUI is a basis for graphical annotation and computational analysis of time series data.

MaD GUI Machine Learning and Data Analytics Graphical User Interface MaD GUI is a basis for graphical annotation and computational analysis of time se

10 Dec 19, 2022

Pydantic based mock data generation

This library offers powerful mock data generation capabilities for pydantic based models. It can also be used with other libraries that use pydantic as a foundation, for example SQLModel, Beanie and

396 Dec 28, 2022

LILLIE: Information Extraction and Database Integration Using Linguistics and Learning-Based Algorithms

LILLIE: Information Extraction and Database Integration Using Linguistics and Learning-Based Algorithms Based on the work by Smith et al. (2021) Query

5 Aug 06, 2022

Neighbourhood Retrieval (Nearest Neighbours) with Distance Correlation.

Neighbourhood Retrieval with Distance Correlation Assign Pseudo class labels to datapoints in the latent space. NNDC is a slim wrapper around FAISS. N

1 Jan 16, 2022

A Python library for detecting patterns and anomalies in massive datasets using the Matrix Profile

matrixprofile-ts matrixprofile-ts is a Python 2 and 3 library for evaluating time series data using the Matrix Profile algorithms developed by the Keo

696 Dec 26, 2022

Unofficial pytorch implementation of the paper "Context Reasoning Attention Network for Image Super-Resolution (ICCV 2021)"

CRAN Unofficial pytorch implementation of the paper "Context Reasoning Attention Network for Image Super-Resolution (ICCV 2021)" This code doesn't exa

4 Nov 11, 2021

Model factory is a ML training platform to help engineers to build ML models at scale

Model Factory Machine learning today is powering many businesses today, e.g., search engine, e-commerce, news or feed recommendation. Training high qu

16 Sep 23, 2022

CrayLabs and user contributed examples of using SmartSim for various simulation and machine learning applications.

SmartSim Example Zoo This repository contains CrayLabs and user contributed examples of using SmartSim for various simulation and machine learning appl

14 Mar 30, 2022
