PipeChain is a utility library for creating functional pipelines.

Overview

PipeChain

Motivation

PipeChain is a utility library for creating functional pipelines. Let's start with a motivating example. We have a list of Australian phone numbers from our users. We need to clean this data before we insert it into the database. With PipeChain, you can do this whole process in one neat pipeline:

from pipechain import PipeChain, PLACEHOLDER as _

nums = [
    "493225813",
    "0491 570 156",
    "55505488",
    "Barry",
    "02 5550 7491",
    "491570156",
    "",
    "1800 975 707"
]

PipeChain(
    nums
).pipe(
    # Remove spaces
    map, lambda x: x.replace(" ", ""), _
).pipe(
    # Remove non-numeric entries
    filter, lambda x: x.isnumeric(), _
).pipe(
    # Add the mobile code to the start of 8-digit numbers
    map, lambda x: "04" + x if len(x) == 8 else x, _
).pipe(
    # Add the 0 to the start of 9-digit numbers
    map, lambda x: "0" + x if len(x) == 9 else x, _
).pipe(
    # Convert to a set to remove duplicates
    set
).eval()
{'0255507491', '0455505488', '0491570156', '0493225813', '1800975707'}

Without PipeChain, we would have to horrifically nest our code, or else use a lot of temporary variables:

set(
    map(
        lambda x: "0" + x if len(x) == 9 else x,
        map(
            lambda x: "04" + x if len(x) == 8 else x,
            filter(
                lambda x: x.isnumeric(),
                map(
                    lambda x: x.replace(" ", ""),
                    nums
                )
            )
        )
    )
)
{'0255507491', '0455505488', '0491570156', '0493225813', '1800975707'}

Installation

pip install pipechain

Usage

Basic Usage

PipeChain has only two exports: PipeChain, and PLACEHOLDER.

PipeChain is a class that defines a pipeline. You create an instance of the class, and then call .pipe() to add another function onto the pipeline:

from pipechain import PipeChain, PLACEHOLDER
PipeChain(1).pipe(str)
PipeChain(arg=1, pipes=[functools.partial(
   
    )])

   

Finally, you call .eval() to run the pipeline and return the result:

PipeChain(1).pipe(str).eval()
'1'

You can "feed" the pipe at either end, either during construction (PipeChain("foo")), or during evaluation .eval("foo"):

PipeChain().pipe(str).eval(1)
'1'

Each call to .pipe() takes a function, and any additional arguments you provide, both positional and keyword, will be forwarded to the function:

PipeChain(["b", "a", "c"]).pipe(sorted, reverse=True).eval()
['c', 'b', 'a']

Argument Position

By default, the previous value is passed as the first positional argument to the function:

PipeChain(2).pipe(pow, 3).eval()
8

The only magic here is that if you use the PLACEHOLDER variable as an argument to .pipe(), then the pipeline will replace it with the output of the previous pipe at runtime:

PipeChain(2).pipe(pow, 3, PLACEHOLDER).eval()
9

Note that you can rename PLACEHOLDER to something more usable using Python's import statement, e.g.

from pipechain import PLACEHOLDER as _
PipeChain(2).pipe(pow, 3, _).eval()
9

Methods

It might not see like methods will play that well with this pipe convention, but after all, they are just functions. You should be able to access any object's method as a function by accessing it on that object's parent class. In the below example, str is the parent class of "":

"".join(["a", "b", "c"])
'abc'
PipeChain(["a", "b", "c"]).pipe(str.join, "", _).eval()
'abc'

Operators

The same goes for operators, such as +, *, [] etc. We just have to use the operator module in the standard library:

from operator import add, mul, getitem

PipeChain(5).pipe(mul, 3).eval()
15
PipeChain(5).pipe(add, 3).eval()
8
PipeChain(["a", "b", "c"]).pipe(getitem, 1).eval()
'b'

Test Suite

Note, you will need poetry installed.

To run the test suite, use:

git clone https://github.com/multimeric/PipeChain.git
cd PipeChain
poetry install
poetry run pytest test/test.py
Owner
Michael Milton
Michael Milton
Aggregating gridded data (xarray) to polygons

A package to aggregate gridded data in xarray to polygons in geopandas using area-weighting from the relative area overlaps between pixels and polygons. Check out the binder link above for a sample c

Kevin Schwarzwald 42 Nov 09, 2022
Using Python to scrape some basic player information from www.premierleague.com and then use Pandas to analyse said data.

PremiershipPlayerAnalysis Using Python to scrape some basic player information from www.premierleague.com and then use Pandas to analyse said data. No

5 Sep 06, 2021
A meta plugin for processing timelapse data timepoint by timepoint in napari

napari-time-slicer A meta plugin for processing timelapse data timepoint by timepoint. It enables a list of napari plugins to process 2D+t or 3D+t dat

Robert Haase 2 Oct 13, 2022
A real data analysis and modeling project - restaurant inspections

A real data analysis and modeling project - restaurant inspections Jafar Pourbemany 9/27/2021 This project represents data analysis and modeling of re

Jafar Pourbemany 2 Aug 21, 2022
Package for decomposing EMG signals into motor unit firings, as used in Formento et al 2021.

EMGDecomp Package for decomposing EMG signals into motor unit firings, created for Formento et al 2021. Based heavily on Negro et al, 2016. Supports G

13 Nov 01, 2022
Data Intelligence Applications - Online Product Advertising and Pricing with Context Generation

Data Intelligence Applications - Online Product Advertising and Pricing with Context Generation Overview Consider the scenario in which advertisement

Manuel Bressan 2 Nov 18, 2021
Weather analysis with Python, SQLite, SQLAlchemy, and Flask

Surf's Up Weather analysis with Python, SQLite, SQLAlchemy, and Flask Overview The purpose of this analysis was to examine weather trends (precipitati

Art Tucker 1 Sep 05, 2021
Analyzing Earth Observation (EO) data is complex and solutions often require custom tailored algorithms.

eo-grow Earth observation framework for scaled-up processing in Python. Analyzing Earth Observation (EO) data is complex and solutions often require c

Sentinel Hub 18 Dec 23, 2022
A notebook to analyze Amazon Recommendation Review Dataset.

Amazon Recommendation Review Dataset Analyzer A notebook to analyze Amazon Recommendation Review Dataset. Features Calculates distinct user count, dis

isleki 3 Aug 22, 2022
A multi-platform GUI for bit-based analysis, processing, and visualization

A multi-platform GUI for bit-based analysis, processing, and visualization

Mahlet 529 Dec 19, 2022
The Spark Challenge Student Check-In/Out Tracking Script

The Spark Challenge Student Check-In/Out Tracking Script This Python Script uses the Student ID Database to match the entries with the ID Card Swipe a

1 Dec 09, 2021
Statistical package in Python based on Pandas

Pingouin is an open-source statistical package written in Python 3 and based mostly on Pandas and NumPy. Some of its main features are listed below. F

Raphael Vallat 1.2k Dec 31, 2022
A collection of learning outcomes data analysis using Python and SQL, from DQLab.

Data Analyst with PYTHON Data Analyst berperan dalam menghasilkan analisa data serta mempresentasikan insight untuk membantu proses pengambilan keputu

6 Oct 11, 2022
Extract Thailand COVID-19 Cluster data from daily briefing pdf.

Thailand COVID-19 Cluster Data Extraction About Extract Clusters from Thailand Daily COVID-19 briefing PDF Download latest data Here. Data will be upd

Noppakorn Jiravaranun 5 Sep 27, 2021
Probabilistic reasoning and statistical analysis in TensorFlow

TensorFlow Probability TensorFlow Probability is a library for probabilistic reasoning and statistical analysis in TensorFlow. As part of the TensorFl

3.8k Jan 05, 2023
A lightweight interface for reading in output from the Weather Research and Forecasting (WRF) model into xarray Dataset

xwrf A lightweight interface for reading in output from the Weather Research and Forecasting (WRF) model into xarray Dataset. The primary objective of

National Center for Atmospheric Research 43 Nov 29, 2022
Deep universal probabilistic programming with Python and PyTorch

Getting Started | Documentation | Community | Contributing Pyro is a flexible, scalable deep probabilistic programming library built on PyTorch. Notab

7.7k Dec 30, 2022
Making the DAEN information accessible.

The purpose of this repository is to make the information on Australian COVID-19 adverse events accessible. The Therapeutics Goods Administration (TGA) keeps a database of adverse reactions to medica

10 May 10, 2022
Implementation in Python of the reliability measures such as Omega.

OmegaPy Summary Simple implementation in Python of the reliability measures: Omega Total, Omega Hierarchical and Omega Hierarchical Total. Name Link O

Rafael Valero Fernández 2 Apr 27, 2022