Fast scatter density plots for Matplotlib

Overview


About

Plotting millions of points can be slow. Real slow... 😴

So why not use density maps?

The mpl-scatter-density mini-package makes it easy to create your own scatter density maps, for both interactive and non-interactive use. Fast. The following animation shows real-time interactive use with 10 million points, but interactive performance is still good even with 100 million points (and more if you have enough RAM).

Demo of mpl-scatter-density with NY taxi data

When panning, the density map is shown at a lower resolution to keep things responsive (though this is customizable).
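The idea behind a scatter density map can be sketched with NumPy alone: instead of drawing every marker, count the points that fall into each pixel and display the counts as an image. The snippet below uses np.histogram2d as a stand-in (the package itself relies on fast-histogram), and the function name density_map, the grid sizes, and the factor of 4 between them are illustrative assumptions rather than the package's actual implementation.

```python
import numpy as np


def density_map(x, y, xlim, ylim, shape):
    """Count points per pixel on a (ny, nx) grid.

    Hypothetical helper: a plain NumPy stand-in for the histogramming
    step, not the code used inside mpl-scatter-density.
    """
    ny, nx = shape
    # y is passed first so that rows of the result correspond to y
    counts, _, _ = np.histogram2d(y, x, bins=(ny, nx), range=(ylim, xlim))
    return counts


rng = np.random.default_rng(0)
x = rng.normal(4, 2, 100_000)
y = rng.normal(3, 1, 100_000)

# Full-resolution map, e.g. for a static figure
full = density_map(x, y, (-5, 10), (-5, 10), (400, 600))

# Coarser map of the same data, as might be used while panning
coarse = density_map(x, y, (-5, 10), (-5, 10), (100, 150))

print(full.shape, coarse.shape)
```

Either array could then be displayed with imshow (using origin='lower' and a matching extent); the coarse version has 16 times fewer pixels to fill and draw.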

To install, simply do:

pip install mpl-scatter-density

This package requires NumPy, Matplotlib, and fast-histogram; these will be installed by pip if they are missing. Both Python 2.7 and Python 3.x are supported, and the package should work correctly on Linux, macOS, and Windows.

Usage

There are two main ways to use mpl-scatter-density, both of which are explained below.

scatter_density method

The easiest way to use this package is to import mpl_scatter_density, then create Matplotlib axes as usual while adding a projection='scatter_density' option (if your reaction is 'wait, what?', see the Q&A section below). This returns a ScatterDensityAxes instance that has a scatter_density method in addition to all the usual methods (scatter, plot, etc.).

import numpy as np
import mpl_scatter_density
import matplotlib.pyplot as plt

# Generate fake data

N = 10000000
x = np.random.normal(4, 2, N)
y = np.random.normal(3, 1, N)

# Make the plot - note that for the projection option to work, the
# mpl_scatter_density module has to be imported above.

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, projection='scatter_density')
ax.scatter_density(x, y)
ax.set_xlim(-5, 10)
ax.set_ylim(-5, 10)
fig.savefig('gaussian.png')

Which gives:

Result from the example script

The scatter_density method takes the same options as imshow (for example cmap, alpha, norm, etc.), but also takes the following optional arguments:

  • dpi: this is an integer that is used to determine the resolution of the density map. By default, this is 72, but you can change it as needed, or set it to None to use the default for the Matplotlib backend you are using.
  • downres_factor: this is an integer that is used to determine how much to downsample the density map when panning in interactive mode. Set this to 1 if you don't want any downsampling.
  • color: this can be set to any valid matplotlib color, and will be used to automatically make a monochromatic colormap based on this color. The colormap will fade to transparent, which means that this mode is ideal when showing multiple density maps together.

Here is an example of using the color option:

import numpy as np
import matplotlib.pyplot as plt
import mpl_scatter_density  # noqa

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, projection='scatter_density')

n = 10000000

x = np.random.normal(0.5, 0.3, n)
y = np.random.normal(0.5, 0.3, n)

ax.scatter_density(x, y, color='red')

x = np.random.normal(1.0, 0.2, n)
y = np.random.normal(0.6, 0.2, n)

ax.scatter_density(x, y, color='blue')

ax.set_xlim(-0.5, 1.5)
ax.set_ylim(-0.5, 1.5)

fig.savefig('double.png')

Which produces the following output:

Result from the example script

ScatterDensityArtist

If you are a more experienced Matplotlib user, you might want to use the ScatterDensityArtist directly (this is used behind the scenes in the above example). To use this, initialize the ScatterDensityArtist with the axes as first argument, followed by any arguments you would have passed to scatter_density above (you can also take a look at the docstring for ScatterDensityArtist). You should then add the artist to the axes:

from mpl_scatter_density import ScatterDensityArtist

# ax is an existing Matplotlib axes; x and y are the data arrays
a = ScatterDensityArtist(ax, x, y)
ax.add_artist(a)

Advanced

Non-linear stretches for high dynamic range plots

In some cases, your density map might have a high dynamic range, and you might therefore want to show the log of the counts rather than the counts. You can do this by passing a matplotlib.colors.Normalize object to the norm argument in the same way as for imshow. For example, the astropy package includes a nice framework for making such a Normalize object for different functions. The following example shows how to show the density map on a log scale:

import numpy as np
import mpl_scatter_density
import matplotlib.pyplot as plt

# Make the norm object to define the image stretch
from astropy.visualization import LogStretch
from astropy.visualization.mpl_normalize import ImageNormalize
norm = ImageNormalize(vmin=0., vmax=1000, stretch=LogStretch())

N = 10000000
x = np.random.normal(4, 2, N)
y = np.random.normal(3, 1, N)

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, projection='scatter_density')
ax.scatter_density(x, y, norm=norm)
ax.set_xlim(-5, 10)
ax.set_ylim(-5, 10)
fig.savefig('gaussian_log.png')

Which produces the following output:

Result from the example script
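To see what a logarithmic norm does to the per-pixel counts, the sketch below compares Matplotlib's built-in Normalize and LogNorm on a few example values. LogNorm is a Normalize subclass, so it should be usable through the same norm argument when astropy is not available, but that combination is an assumption and is not shown in the examples above. Note also that LogNorm masks non-positive values, so empty pixels would be treated as masked.

```python
import numpy as np
from matplotlib.colors import LogNorm, Normalize

# Example per-pixel counts spanning three orders of magnitude
counts = np.array([1.0, 10.0, 100.0, 1000.0])

# Linear scaling squeezes the low counts towards zero
linear = Normalize(vmin=0, vmax=1000)(counts)

# Logarithmic scaling spaces each decade evenly between 0 and 1
logged = LogNorm(vmin=1, vmax=1000)(counts)

print(np.round(linear, 3))
print(np.round(logged, 3))
```

With the linear norm the three lowest counts all map to at most 0.1 of the colormap range, whereas the log norm gives each decade an equal share.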

Adding a colorbar

You can show a colorbar in the same way as you would for an image - the following example shows how to do it:

import numpy as np
import mpl_scatter_density
import matplotlib.pyplot as plt

N = 10000000
x = np.random.normal(4, 2, N)
y = np.random.normal(3, 1, N)

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, projection='scatter_density')
density = ax.scatter_density(x, y)
ax.set_xlim(-5, 10)
ax.set_ylim(-5, 10)
fig.colorbar(density, label='Number of points per pixel')
fig.savefig('gaussian_colorbar.png')

Which produces the following output:

Result from the example script

Color-coding 'markers' with individual values

In the same way that a 1-D array of values can be passed to Matplotlib's scatter function/method, a 1-D array of values can be passed to scatter_density using the c= argument:

import numpy as np
import mpl_scatter_density
import matplotlib.pyplot as plt

N = 10000000
x = np.random.normal(4, 2, N)
y = np.random.normal(3, 1, N)
c = x - y + np.random.normal(0, 5, N)

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, projection='scatter_density')
ax.scatter_density(x, y, c=c, vmin=-10, vmax=+10, cmap=plt.cm.RdYlBu)
ax.set_xlim(-5, 13)
ax.set_ylim(-5, 11)
fig.savefig('gaussian_color_coded.png')

Which produces the following output:

Result from the example script

Note that to keep performance as good as possible, the values from the c argument are averaged inside each pixel of the density map, then the colormap is applied. This is a little different to what scatter would converge to in the limit of many points (since in that case it would apply the color to all the markers, then average the colors).
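The difference between the two orderings can be seen with a single pixel containing two points with opposite extreme values. The sketch below uses only NumPy and Matplotlib's colormap machinery; the two-point pixel is a made-up example, and the code illustrates the distinction rather than reproducing the package's internals.

```python
import numpy as np
from matplotlib import cm
from matplotlib.colors import Normalize

norm = Normalize(vmin=-10, vmax=10)
cmap = cm.RdYlBu

# Hypothetical pixel holding two points with opposite extreme values
c_in_pixel = np.array([-10.0, 10.0])

# Ordering described above: average the values, then apply the colormap
color_of_mean = np.array(cmap(norm(c_in_pixel.mean())))

# What many overlapping markers would tend towards:
# apply the colormap to each value, then average the RGBA colors
mean_of_colors = cmap(norm(c_in_pixel)).mean(axis=0)

print(np.round(color_of_mean, 3))
print(np.round(mean_of_colors, 3))
```

The first result is the colormap's midpoint color, while the second is a blend of the two end colors, which need not appear anywhere in the colormap.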

Q&A

Isn't this basically the same as datashader?

This follows the same ideas as datashader, but the aim of mpl-scatter-density is specifically to bring datashader-like functionality to Matplotlib users. Furthermore, mpl-scatter-density is intended to be very easy to install - for example it can be installed with pip. But if you have datashader installed and regularly use bokeh, mpl-scatter-density won't do much for you. Note that if you are interested in datashader and Matplotlib together, there is a work in progress (pull request) by @tacaswell to create a Matplotlib artist similar to that in this package but powered by datashader.

What about vaex?

Vaex is a powerful package to visualize large datasets on N-dimensional grids, and therefore has some functionality that overlaps with what is here. However, the aim of mpl-scatter-density is just to provide a lightweight solution to make it easy for users already using Matplotlib to add scatter density maps to their plots rather than provide a complete environment for data visualization. I highly recommend that you take a look at Vaex and determine which approach is right for you!

Why on earth have you defined scatter_density as a projection?

If you are a Matplotlib developer: I truly am sorry for distorting the intended purpose of projection 😊 . But you have to admit that it's a pretty convenient way to have users get a custom Axes sub-class even if it has nothing to do with actual projection!

Where do you see this going?

There are a number of things we could add to this package, for example a way to plot density maps as contours, or a way to color code each point by a third quantity and have that reflected in the density map. If you have ideas, please open issues, and even better contribute a pull request! 😄

Can I contribute?

I'm glad you asked - of course you are very welcome to contribute! If you have some ideas, you can open issues or create a pull request directly. Even if you don't have time to contribute actual code changes, I would love to hear from you if you are having issues using this package.

Running tests

To run the tests, you will need pytest and the pytest-mpl plugin. You can then run the tests with:

pytest mpl_scatter_density --mpl