Python histogram library - histograms as updateable, fully semantic objects with visualization tools. [P]ython [HYST]ograms.

Overview

physt Physt logo

P(i/y)thon h(i/y)stograms. Inspired (and based on) numpy.histogram, but designed for humans(TM) on steroids(TM).

The goal is to unify different concepts of histograms as occurring in numpy, pandas, matplotlib, ROOT, etc. and to create one representation that is easily manipulated with from the data point of view and at the same time provides nice integration into IPython notebook and various plotting options. In short, whatever you want to do with histograms, physt aims to be on your side.

Note: bokeh plotting backend has been discontinued (due to external library being redesigned.)

Travis ReadTheDocs Join the chat at https://gitter.im/physt/Lobby PyPI version Anaconda-Server Badge Anaconda-Server Badge

Versioning

  • Versions 0.3.x support Python 2.7 (no new releases in 2019)
  • Versions 0.4.x support Python 3.5+ while continuing the 0.3 API
  • Versions 0.4.9+ support only Python 3.6+ while continuing the 0.3 API
  • Versions 0.5.x slightly change the interpretation of *args in h1, h2, ...

Simple example

from physt import h1

# Create the sample
heights = [160, 155, 156, 198, 177, 168, 191, 183, 184, 179, 178, 172, 173, 175,
           172, 177, 176, 175, 174, 173, 174, 175, 177, 169, 168, 164, 175, 188,
           178, 174, 173, 181, 185, 166, 162, 163, 171, 165, 180, 189, 166, 163,
           172, 173, 174, 183, 184, 161, 162, 168, 169, 174, 176, 170, 169, 165]

hist = h1(heights, 10)           # <--- get the histogram data
hist << 190                      # <--- add a forgotten value
hist.plot()                      # <--- and plot it

Heights plot

2D example

from physt import h2
import seaborn as sns

iris = sns.load_dataset('iris')
iris_hist = h2(iris["sepal_length"], iris["sepal_width"], "human", bin_count=[12, 7], name="Iris")
iris_hist.plot(show_zero=False, cmap="gray_r", show_values=True);

Iris 2D plot

3D directional example

import numpy as np
from physt import special_histograms

# Generate some sample data
data = np.empty((1000, 3))
data[:,0] = np.random.normal(0, 1, 1000)
data[:,1] = np.random.normal(0, 1.3, 1000)
data[:,2] = np.random.normal(1, .6, 1000)

# Get histogram data (in spherical coordinates)
h = special_histograms.spherical(data)                 

# And plot its projection on a globe
h.projection("theta", "phi").plot.globe_map(density=True, figsize=(7, 7), cmap="rainbow")   

Directional 3D plot

See more in docstring's and notebooks:

Installation

Using pip:

pip install physt

Features

Implemented

  • 1D histograms
  • 2D histograms
  • ND histograms
  • Some special histograms
    • 2D polar coordinates (with plotting)
    • 3D spherical / cylindrical coordinates (beta)
  • Adaptive rebinning for on-line filling of unknown data (beta)
  • Non-consecutive bins
  • Memory-effective histogramming of dask arrays (beta)
  • Understands any numpy-array-like object
  • Keep underflow / overflow / missed bins
  • Basic numeric operations (* / + -)
  • Items / slice selection (including mask arrays)
  • Add new values (fill, fill_n)
  • Cumulative values, densities
  • Simple statistics for original data (mean, std, sem)
  • Plotting with several backends
    • matplotlib (static plots with many options)
    • vega (interactive plots, beta, help wanted!)
    • folium (experimental for geo-data)
    • plotly (very basic, help wanted!)
    • ascii (experimental)
  • Algorithms for optimized binning
    • human-friendly
    • mathematical
  • IO, conversions
    • I/O JSON
    • I/O xarray.DataSet (experimental)
    • O ROOT file (experimental)
    • O pandas.DataFrame (basic)

Planned

  • Rebinning
    • using reference to original data?
    • merging bins
  • Statistics (based on original data)?
  • Stacked histograms (with names)
  • Potentially holoviews plotting backend (instead of the discontinued bokeh one)

Not planned

  • Kernel density estimates - use your favourite statistics package (like seaborn)
  • Rebinning using interpolation - it should be trivial to use rebin (https://github.com/jhykes/rebin) with physt

Rationale (for both): physt is dumb, but precise.

Dependencies

  • Python 3.5+
  • numpy
  • (optional) matplotlib - simple output
  • (optional) xarray - I/O
  • (optional) protobuf - I/O
  • (optional) uproot - I/O
  • (optional) astropy - additional binning algorithms
  • (optional) folium - map plotting
  • (optional) vega3 - for vega in-line in IPython notebook (note that to generate vega JSON, this is not necessary)
  • (optional) asciiplotlib - for ASCII bar plots
  • (optional) xtermcolot - for ASCII color maps
  • (testing) py.test, pandas
  • (docs) sphinx, sphinx_rtd_theme, ipython

Publicity

Talk at PyData Berlin 2018:

Contribution

I am looking for anyone interested in using / developing physt. You can contribute by reporting errors, implementing missing features and suggest new one.

Thanks to:

Patches:

Alternatives and inspirations

Comments
  • python 2.7 plotting is not working

    python 2.7 plotting is not working

    When runnin plot() function I get the error below even though matplotlib is installed. Also the algorithm is pretty slow when running on something bigger than toy example.

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python2.7/dist-packages/physt/plotting/__init__.py", line 137, in __call__
        return plot(self.histogram, kind=kind, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/physt/plotting/__init__.py", line 91, in plot
        backend_name, backend = _get_backend(backend)
      File "/usr/local/lib/python2.7/dist-packages/physt/plotting/__init__.py", line 70, in _get_backend
        raise RuntimeError("No plotting backend available. Please, install matplotlib (preferred) or bokeh (limited).")
    RuntimeError: No plotting backend available. Please, install matplotlib (preferred) or bokeh (limited).
    
    bug 
    opened by romange 13
  • Smooth polar histograms?

    Smooth polar histograms?

    Thanks for writing this awesome library!

    I have a question regarding smoothing of polar 2D histograms. I am constructing a histogram like described on this page https://physt.readthedocs.io/en/latest/special_histograms.html#Polar-histogram and now I want to smooth it with a Gaussian kernel (like scipy.ndimage.gaussian_filter). What is the most elegant / correct method to do that?

    question 
    opened by horsto 7
  • Rebinning histograms related project

    Rebinning histograms related project

    Hi I found a project on rebinning histogram at https://github.com/jhykes/rebin and I opened an issue (jhykes/rebin#5) on that project page asking about integrating his code to this project. I hope you will appreciate it.

    enhancement idea? 
    opened by DancingQuanta 7
  • Option to center labels on bins

    Option to center labels on bins

    If you have a large dataset with a small number of values (such as consisting only of integers 1-10) then it would be nice to have the bin x-axis labels at the center under the respective bin instead of at the bin edges.

    I recognise this case is more of a 'histogram as bar plot' kind of thing, but it is a use-case I have often.

    opened by nzjrs 5
  • Usage of spherical histogram

    Usage of spherical histogram

    Hi, I have tried the example of spherical histogram. After a small modification of the code (normalized the data as unit vectors),

    n = 100 data = np.empty((n, 3)) data[:,0] = np.random.normal(0, 1, n) data[:,1] = np.random.normal(0, 1, n) data[:,2] = np.random.normal(0, 1, n) for i in range(n): scale = np.sqrt(data[i,0]**2 + data[i,1]**2 + data[i,2]**2) data[i,0] = data[i,0]/scale data[i,1] = data[i,1]/scale data[i,2] = data[i,2]/scale

    h = special.spherical_histogram(data, theta_bins=20, phi_bins=20) ax.scatter(data[:,0], data[:,1], data[:,2])

    globe = h.projection("theta", "phi") globe.plot.globe_map(density=True, figsize=(7, 7), cmap="rainbow")

    plt.show()

    I got an error: “RuntimeError: Bins not in rising order.” What did I do wrong? Thank you for your support.

    question 
    opened by zhengpuchen 3
  • approximate histograms

    approximate histograms

    I'm following the paper (http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf) implemented by https://github.com/carsonfarmer/streamhist, and the notion of approximate histograms seems elegant and efficient.

    After seeing the internals of streamhist (trying to fix bugs) and reading the paper, I can imagine ways to make a better implementation: e.g. much more efficient discovery of bins to be joined, and avoiding temporary lists when possible. Also the code seems overly complex, partially due to features like "bin freezing" which try to workaround poor bin joining performance.

    Anyway since streamhist is defunct, I'm thinking about trying an implementation. I wonder if this kind of histogram would fit into physt (and if sortedcollections would be reasonable as a dependency).

    opened by belm0 3
  • please make this library discoverable

    please make this library discoverable

    name: physt (?) github tag line: P(i/y)thon h(i/y)stograms (???)

    google search for "python streaming histogram"

    • top result is https://github.com/carsonfarmer/streamhist (unused / unmaintained)
    • physt not in initial 10 pages of results...

    For over a year I've wanted to find a Python library which supports efficient histogram updates without a bunch of ugly dependencies. I've searched many times. Today I happened to get lucky by seeing physt mentioned at the bottom of a SO question (https://stackoverflow.com/questions/40627274/).

    To improve discoverability by search, please consider updating the github tag line to concisely and accurately describe the library (... rather than be cute).

    opened by belm0 2
  • Warning in current numpy

    Warning in current numpy

    If you try to merge bins:

    from physt import h2
    from scipy.stats import multivariate_normal
    hist = h2(*multivariate_normal.rvs((0,0), size=100_000).T, bins=100)
    hist.merge_bins(2)
    

    You get a warning from numpy:

    /home/schreihf/.local/lib/python3.7/site-packages/physt/histogram_base.py:572: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
      new_frequencies[new_index] += old_frequencies[old_index]
    /home/schreihf/.local/lib/python3.7/site-packages/physt/histogram_base.py:573: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
      new_errors2[new_index] += old_errors2[old_index]
    
    opened by henryiii 2
  • Add 2D & ND histograms

    Add 2D & ND histograms

    • [x] Analogous data model to Histogram1D
    • [x] refactor HistogramBase class -> common behaviour of 1D and 2D
    • [x] revisit binning schemas
    • [x] histogram2D facade function to be compatible with numpy one
    • [x] plotting
    • [x] arithmetic operations
    • [x] documentation
    • [ ] stats
    enhancement 
    opened by janpipek 2
  • ImportError with newer plotly

    ImportError with newer plotly

    [SOMEDIR}\physt\physt\plotting\plotly.py in <module>
         12 
         13 import plotly.offline as pyo
    ---> 14 import plotly.plotly as pyp
         15 import plotly.graph_objs as go
         16 
    
    ~\Miniconda3\lib\site-packages\plotly\plotly\__init__.py in <module>
          2 from _plotly_future_ import _chart_studio_error
          3 
    ----> 4 _chart_studio_error("plotly")
    
    ~\Miniconda3\lib\site-packages\_plotly_future_\__init__.py in _chart_studio_error(submodule)
         41 
         42 def _chart_studio_error(submodule):
    ---> 43     raise ImportError(
         44         """
         45 The plotly.{submodule} module is deprecated,
    
    ImportError: 
    The plotly.plotly module is deprecated,
    please install the chart-studio package and use the
    chart_studio.plotly module instead. 
    
    bug visualization 
    opened by janpipek 1
  • Wrong bars center in polar_map

    Wrong bars center in polar_map

    I have found that the bars in polar_map are centered on the left edge of the phi bins instead of their center. Because of this, the representation of the histogram does not coincide with the data, as in the figure below: polarmap_wrong

    I think this can be easily solved by replacing

    bars = ax.bar(phipos[i], dr[i], width=dphi[i], bottom=rpos[i], color=bin_color,

    with

    bars = ax.bar(phipos[i] + 0.5*dphi[i], dr[i], width=dphi[i], bottom=rpos[i], color=bin_color,

    in the definition of polar_map.

    By the way, thank you for this amazing package!

    bug visualization 
    opened by ruhugu 1
  • Be more explicit about bins too narrow for float representation

    Be more explicit about bins too narrow for float representation

    If the computed range for the binning divided by the number of bins is lower than the minimum float difference at the scale, we receive an error [ValueError: Bins not in rising order.] which is not very informative.

    To reproduce:

    data = [1, np.nextafter(1, 2)]
    physt.h1(data)
    

    It also happens when the range is 0, like in:

    data = [1, 1]
    physt.h1(data)
    
    enhancement 
    opened by janpipek 1
Releases(v0.5.2)
Owner
Jan Pipek
PyData Prague
Jan Pipek
The repository is my code for various types of data visualization cases based on the Matplotlib library.

ScienceGallery The repository is my code for various types of data visualization cases based on the Matplotlib library. It summarizes the code and cas

Warrick Xu 2 Apr 20, 2022
A flexible tool for creating, organizing, and sharing visualizations of live, rich data. Supports Torch and Numpy.

Visdom A flexible tool for creating, organizing, and sharing visualizations of live, rich data. Supports Python. Overview Concepts Setup Usage API To

FOSSASIA 9.4k Jan 07, 2023
Histogramming for analysis powered by boost-histogram

Hist Hist is an analyst-friendly front-end for boost-histogram, designed for Python 3.7+ (3.6 users get version 2.4). See what's new. Installation You

Scikit-HEP Project 97 Dec 25, 2022
Interactive Dashboard for Visualizing OSM Data Change

Dashboard and intuitive data downloader for more interactive experience with interpreting osm change data.

1 Feb 20, 2022
A command line tool for visualizing CSV/spreadsheet-like data

PerfPlotter Read data from CSV files using pandas and generate interactive plots using bokeh, which can then be embedded into HTML pages and served by

Gino Mempin 0 Jun 25, 2022
Simple implementation of Self Organizing Maps (SOMs) with rectangular and hexagonal grid topologies

py-self-organizing-map Simple implementation of Self Organizing Maps (SOMs) with rectangular and hexagonal grid topologies. A SOM is a simple unsuperv

Jonas Grebe 1 Feb 10, 2022
Minimal Ethereum fee data viewer for the terminal, contained in a single python script.

Minimal Ethereum fee data viewer for the terminal, contained in a single python script. Connects to your node and displays some metrics in real-time.

48 Dec 05, 2022
Plot and save the ground truth and predicted results of human 3.6 M and CMU mocap dataset.

Visualization-of-Human3.6M-Dataset Plot and save the ground truth and predicted results of human 3.6 M and CMU mocap dataset. human-motion-prediction

Gaurav Kumar Yadav 5 Nov 18, 2022
This project is created to visualize the system statistics such as memory usage, CPU usage, memory accessible by process and much more using Kibana Dashboard with Elasticsearch.

System Stats Visualizer This project is created to visualize the system statistics such as memory usage, CPU usage, memory accessible by process and m

Vishal Teotia 5 Feb 06, 2022
Fast data visualization and GUI tools for scientific / engineering applications

PyQtGraph A pure-Python graphics library for PyQt5/PyQt6/PySide2/PySide6 Copyright 2020 Luke Campagnola, University of North Carolina at Chapel Hill h

pyqtgraph 3.1k Jan 08, 2023
mysql relation charts

sqlcharts 自动生成数据库关联关系图 复制settings.py.example 重命名为settings.py 将数据库配置信息填入settings.DATABASE,目前支持mysql和postgresql 执行 python build.py -b,-b是读取数据库表结构,如果只更新匹

6 Aug 22, 2022
NW 2022 Hackathon Project by Angelique Clara Hanzel, Aryan Sonik, Damien Fung, Ramit Brata Biswas

Spiral-Data-Visualizer NW 2022 Hackathon Project by Angelique Clara Hanzell, Aryan Sonik, Damien Fung, Ramit Brata Biswas Description This project vis

Damien Fung 2 Jan 16, 2022
A script written in Python that generate output custom color (HEX or RGB input to x1b hexadecimal)

ColorShell ─ 1.5 Planned for v2: setup.sh for setup alias This script converts HEX and RGB code to x1b x1b is code for colorize outputs, works on ou

Riley 4 Oct 31, 2021
The visual framework is designed on the idea of module and implemented by mixin method

Visual Framework The visual framework is designed on the idea of module and implemented by mixin method. Its biggest feature is the mixins module whic

LEFTeyes 9 Sep 19, 2022
The interactive graphing library for Python (includes Plotly Express) :sparkles:

plotly.py Latest Release User forum PyPI Downloads License Data Science Workspaces Our recommended IDE for Plotly’s Python graphing library is Dash En

Plotly 12.7k Jan 05, 2023
Python library that makes it easy for data scientists to create charts.

Chartify Chartify is a Python library that makes it easy for data scientists to create charts. Why use Chartify? Consistent input data format: Spend l

Spotify 3.2k Jan 04, 2023
Rockstar - Makes you a Rockstar C++ Programmer in 2 minutes

Rockstar Rockstar is one amazing library, which will make you a Rockstar Programmer in just 2 minutes. In last decade, people learned C++ in 21 days.

4k Jan 05, 2023
Curvipy - The Python package for visualizing curves and linear transformations in a super simple way

Curvipy - The Python package for visualizing curves and linear transformations in a super simple way

Dylan Tintenfich 55 Dec 28, 2022
view cool stats related to your discord account.

DiscoStats cool statistics generated using your discord data. How? DiscoStats is not a service that breaks the Discord Terms of Service or Community G

ibrahim hisham 5 Jun 02, 2022
python partial dependence plot toolbox

PDPbox python partial dependence plot toolbox Motivation This repository is inspired by ICEbox. The goal is to visualize the impact of certain feature

Li Jiangchun 723 Jan 07, 2023