whylogs: A Data and Machine Learning Logging Standard

Overview

whylogs is an open source standard for data and ML logging.

The whylogs logging agent is the easiest way to enable logging, testing, and monitoring in an ML/AI application. The lightweight agent profiles data in real time, collecting thousands of metrics from structured data, unstructured data, and ML model predictions with zero configuration.

whylogs can be installed in any Python, Java or Spark environment; it can be deployed as a container and run as a sidecar; or invoked through various ML tools (see integrations).

whylogs is designed by data scientists, ML engineers and distributed systems engineers to log data in the most cost-effective, scalable and accurate manner. No sampling. No post-processing. No manual configurations.

whylogs is released under the Apache 2.0 open source license. It supports many languages and is easy to extend. This repo contains the whylogs CLI and language SDKs; individual libraries are in their own repos.

This repository contains both a Python implementation and a Java implementation.

If you have any questions, comments, or just want to hang out with us, please join our Slack channel.

Getting started

Using pip

Install whylogs using the pip package manager by running

pip install whylogs

From source

make install # installs dependencies
make         # builds the wheel

Quickly Logging Data

whylogs is easy to get up and running:

from whylogs import get_or_create_session
import pandas as pd

session = get_or_create_session()

df = pd.read_csv("path/to/file.csv")

with session.logger(dataset_name="my_dataset") as logger:

    # log a pandas DataFrame
    logger.log_dataframe(df)

    # log a dict of feature name -> value
    logger.log({"name": 1})

    # log an image from a file path
    logger.log_image("path/to/image.png")

whylogs collects approximate statistics and sketches of data on a per-column basis into a statistical profile. These metrics include:

  • Simple counters: boolean, null values, data types.
  • Summary statistics: sum, min, max, median, variance.
  • Unique value counter or cardinality: tracks the approximate number of unique values of your feature using the HyperLogLog algorithm.
  • Histograms for numerical features. The whylogs binary output can be queried with dynamic binning based on the shape of your data.
  • Top frequent items (default is 128). Note that this configuration affects the memory footprint, especially for text features.
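
As a rough illustration of what a per-column profile holds, the sketch below tracks a few of the counters and summary statistics listed above in a single pass. It is plain Python, not the whylogs implementation: the `ColumnProfile` class and its fields are hypothetical, and it keeps exact counts where whylogs relies on approximate sketches such as HyperLogLog.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ColumnProfile:
    """Hypothetical single-pass profile for one column (illustration only)."""

    count: int = 0
    null_count: int = 0
    type_counts: Counter = field(default_factory=Counter)
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def track(self, value: Any) -> None:
        self.count += 1
        if value is None:
            self.null_count += 1
            return
        self.type_counts[type(value).__name__] += 1
        # bool is a subclass of int; keep it out of the numeric summary
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            self.total += value
            self.minimum = min(self.minimum, value)
            self.maximum = max(self.maximum, value)


profile = ColumnProfile()
for v in [3, 1.5, None, "a", True, 7]:
    profile.track(v)
print(profile)
```

Because every field is updated incrementally, memory use stays constant per column no matter how many rows are tracked.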

Multiple Profile Plots

To view your logger profiles, you can use methods within whylogs.viz:

from whylogs.viz import ProfileVisualizer

# profile_day_1 and profile_day_2 are previously logged dataset profiles
visualization = ProfileVisualizer()
visualization.set_profiles([profile_day_1, profile_day_2])
figure = visualization.plot_distribution("<feature_name>")
figure.savefig("/my/image/path.png")

Individual profiles are saved to disk, AWS S3, or the WhyLabs API automatically when loggers are closed, according to the Session configuration.

Current profiles from active loggers can be loaded from memory with:

profile = logger.profile()

Profile Viewer

You can also load a local profile viewer, where you upload the json summary file. The default path for the json files is set as output/{dataset_name}/{session_id}/json/dataset_profile.json.

from whylogs.viz import profile_viewer
profile_viewer()

This will open a viewer in your default browser, where you can load a profile JSON summary using the Select JSON profile button. Once the JSON file is selected, you can view your profile's features and their associated statistics.
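
To locate the JSON summary to upload, the default path pattern mentioned above can be assembled with a small helper. This is a hypothetical convenience function, not part of whylogs; `dataset_name` and `session_id` are whatever values your own session used, and the session id shown is a made-up placeholder.

```python
from pathlib import Path


def default_json_summary_path(dataset_name: str, session_id: str, root: str = "output") -> Path:
    """Build output/{dataset_name}/{session_id}/json/dataset_profile.json (hypothetical helper)."""
    return Path(root) / dataset_name / session_id / "json" / "dataset_profile.json"


# "example-session-id" is a placeholder, not a real session id.
path = default_json_summary_path("my_dataset", "example-session-id")
print(path)
```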

Documentation

The documentation of this package is generated automatically.

Features

  • Accurate data profiling: whylogs calculates statistics from 100% of the data, never requiring sampling, ensuring an accurate representation of data distributions
  • Lightweight runtime: whylogs utilizes approximate statistical methods to achieve minimal memory footprint that scales with the number of features in the data
  • Any architecture: whylogs scales with your system, from local development mode to live production systems in multi-node clusters, and works well with batch and streaming architectures
  • Configuration-free: whylogs infers the schema of the data, requiring zero manual configuration to get started
  • Tiny storage footprint: whylogs turns data batches and streams into statistical fingerprints, 10-100MB uncompressed
  • Unlimited metrics: whylogs collects all possible statistical metrics about structured or unstructured data

Data Types

whylogs supports both structured and unstructured data, specifically:

Data type | Features | Notebook Example
Structured Data | Distribution, cardinality, schema, counts, missing values | Getting started with structured data
Images | EXIF metadata, derived pixel features, bounding boxes | Getting started with images
Video | In development | GitHub Issue #214
Tensors | Derived 1D features (more in development) | GitHub Issue #216
Text | Top-k values, counts, cardinality | String Features
Audio | In development | GitHub Issue #212

Integrations

Current integrations

Integration | Features
Spark | Run whylogs in an Apache Spark environment
Pandas | Log and monitor any pandas DataFrame
Kafka | Log and monitor Kafka topics with whylogs
MLflow | Enhance MLflow metrics with whylogs
GitHub Actions | Unit test data with whylogs and GitHub Actions
RAPIDS | Use whylogs in a RAPIDS environment
Java | Run whylogs in a Java environment
Docker | Run whylogs in Docker
AWS S3 | Store whylogs profiles in S3

Examples

For a full set of our examples, please check out whylogs-examples.

Check out our example notebooks with Binder.

Roadmap

whylogs is maintained by WhyLabs.

Community

If you have any questions, comments, or just want to hang out with us, please join our Slack channel.

Contribute

We welcome contributions to whylogs. Please see our contribution guide and our development guide for details.

Comments
  • Some Example Notebooks are out of date or have unclear UX

    Problem

    Some of the sample notebooks have become outdated and need modification. Some may need to be updated, others are duplicated, and some may not serve a clear purpose.

    This issue is to track through the comments those that should be looked into so the knowledge is not lost or siloed.

    Fixes

    Update through comments with PR or decisions made.

    Related to #508

    bug maintenance stale :zzz: 
    opened by TheMellyBee 24
  • Link to our doc/nbviewer for example notebooks


    Description

    The constraint violation report is not rendered (end of notebook, cells 35+) when viewing the example notebook on GitHub, whereas it renders correctly in Colab or locally.

    I Googled around, but the causes of render issues are so legion that it's hard to know what might be causing this. The underlying cell has quite a bit of JavaScript; maybe there's some cross-site library issue (totally out of my subject area here)?

    This is suboptimal because folks won't be able to see what the constraint report looks like without going to the Colab site or running the notebook themselves.

    If we can't find a fix, maybe at least add a note pointing readers to the Colab site?

    wontfix 
    opened by jghoman 9
  • int values are being stringified with floating points


    Description

    import pandas as pd
    import whylogs as why

    df = pd.DataFrame({'int-only': [1, 1, 1]})

    print(why.log(df).view().to_pandas()["frequent_items/frequent_strings"])
    

    Expected to see only "1", not "1.0000"

    The actual output:

    column
    int-only    [FrequentItem(value='1.000000', est=3, upper=3...
    Name: frequent_items/frequent_strings, dtype: object
    

    Related

    I suspect somehow the Python code is passing the numpy array as float type, but I'm not sure
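
A small, library-independent illustration of how such a string can arise: once an integer has been converted to a float, fixed-point formatting pads it to six decimal places. This only demonstrates the suspected mechanism in plain Python; it is not taken from the whylogs code path.

```python
value = 1

# Formatting the integer directly keeps it integral.
as_int = str(value)

# After a cast to float, fixed-point formatting pads to six decimals,
# the same shape as the '1.000000' seen in the frequent-items output.
as_float = format(float(value), "f")

print(as_int, as_float)
```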

    bug stale :zzz: 
    opened by andyndang 8
  • [Windows/conda/python3.8] Unable to import whylogs : No matching distribution found for whylabs-datasketches>=2.2.0b1


    Hi, I was able to install whylogs successfully using pip install whylogs, but when I try to import it in my notebooks I get this error: No module named 'google.protobuf.pyext._message'

    OS: Windows 10 (Anaconda env); Python: 3.8; whylogs version: 0.1.5

    I tried uninstalling protobuf and reinstalling it. The current protobuf version is 3.17.3.

    Please help me resolve this error.

    opened by dalavayi 8
  • updated glossary for v1


    Description

    updated glossary for v1

    General Checklist

    • [ ] Tests added for this feature/bug if it was a bug, test must cover it.
    • [ ] Conform by the style guides, by using formatter
    • [ ] Documentation updated
    • [ ] (optional) Please add a label to your PR
    opened by FelipeAdachi 7
  • Support custom mergeable metrics in whylogs


    There are three kinds of metrics that whylogs users track:

    1. Tracking derived metrics from customers. Typically this is numerical data. You can use the approach above to track these metrics because they will show up as a “whylogs” column
    2. Custom metrics that are mergeable: basically if you have metrics that can be “summed” or “aggregated” across different profiles, this is a feature request that we are tracking from other customers as well.
    3. One-off metrics: sometimes users have one-off metrics that they want to piggy back on top of whylogs. These metrics are not aggregatable, but they want to use whylogs object to store these metrics.
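
To make the second kind concrete, here is a minimal sketch of a mergeable metric: one whose state from two profiles can be combined into the state a single pass over both batches would have produced. The `SumCountMetric` class is hypothetical and does not reflect an existing whylogs interface.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SumCountMetric:
    """Hypothetical mergeable metric: keeps a running sum and count."""

    total: float = 0.0
    count: int = 0

    @classmethod
    def from_values(cls, values: Iterable[float]) -> "SumCountMetric":
        values = list(values)
        return cls(total=float(sum(values)), count=len(values))

    def merge(self, other: "SumCountMetric") -> "SumCountMetric":
        # Component-wise addition, so merging is associative and commutative.
        return SumCountMetric(self.total + other.total, self.count + other.count)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else float("nan")


batch_a = SumCountMetric.from_values([1.0, 2.0, 3.0])
batch_b = SumCountMetric.from_values([10.0])
merged = batch_a.merge(batch_b)
print(merged, merged.mean)
```

A one-off metric, by contrast, would carry a value with no meaningful `merge` operation.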
    feature v1 
    opened by andyndang 7
  • The code repository for `whylogs-sketching` as mentioned on PyPI is NONACCESSIBLE

    Description

    whylogs has a dependency: whylogs-sketching.

    When you try to access the code repository of whylogs-sketching from its PyPI page (https://pypi.org/project/whylogs-sketching), it points to the following repository, which DOES NOT EXIST or IS NOT ACCESSIBLE.

    • https://github.com/whylabs/whylogs-sketching

    Please update the metadata to point to the correct repository. (Or, is the repository a private one?)

    question integrations stale :zzz: 
    opened by sugatoray 6
  • Break metrics.py <-> schema.py circular dependency


    Description

    General Checklist

    • [ ] Tests added for this feature/bug if it was a bug, test must cover it.
    • [ ] Conform by the style guides, by using formatter
    • [ ] Documentation updated
    • [ ] (optional) Please add a label to your PR
    opened by richard-rogers 6
  • Create requirements.txt


    The github CI for the docs keeps running into issues due to the lack of a doc/requirements.txt. This hopefully will fix that

    General Checklist

    • [X] Conform by the style guides, by using formatter
    • [X] (optional) Please add a label to your PR
    documentation 
    opened by TheMellyBee 6
  • Unable to import whylogs


    Description

    Unable to import whylogs

    >>> import whylogs as why gives AttributeError: module 'site' has no attribute 'getsitepackages'

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/monk/why/lib/python3.7/site-packages/whylogs/__init__.py", line 29, in <module>
        from .api.usage_stats import emit_usage as __emit_usage_stats
      File "/home/monk/why/lib/python3.7/site-packages/whylogs/api/usage_stats/__init__.py", line 30, in <module>
        _SITE_PACKAGES = site.getsitepackages()
    AttributeError: module 'site' has no attribute 'getsitepackages'

    • System info

      • Operating system and version: Ubuntu 18.04
      • Tried on Python3.7, Python3.8, Python3.9
      • virtualenv
    • Steps to reproduce:

      • Create a virtualenv -- virtualenv why --python=python3.7
      • activate the virtualenv -- source why/bin/activate
      • pip install whylogs
      • python
      • import whylogs

    I was able to install this on Colab, so I don't know why it's not working on my local machine.
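
Some older virtualenv releases ship their own `site` module that lacks `getsitepackages`, which would be consistent with the traceback above. A defensive lookup can fall back to `sysconfig`; the sketch below shows one way to do that. It is not the fix applied in whylogs, and the function name `find_site_packages` is hypothetical.

```python
import site
import sysconfig
from typing import List


def find_site_packages() -> List[str]:
    """Return site-packages directories, tolerating a `site` module without getsitepackages."""
    getter = getattr(site, "getsitepackages", None)
    if getter is not None:
        return list(getter())
    # Fallback: the pure-Python library directory of the active interpreter.
    return [sysconfig.get_paths()["purelib"]]


paths = find_site_packages()
print(paths)
```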

    opened by mayankjobanputra 5
  • mypy is not catching signature issues in core classes in our CI


    Description

    mypy is enabled in our pre-commit hooks but seems to be missing issues in the signatures of classes such as: DatasetProfile's get_default_path method has no type annotation for the input parameter. Why is this not showing up in pre-commit checks?
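
One possible explanation is that mypy does not report missing annotations by default; its `disallow_untyped_defs` option enables that check. Independently of mypy, the parameters that lack annotations can be listed at runtime, as in the sketch below. The `unannotated_parameters` helper and the `Example` class are hypothetical stand-ins, not code from this repository.

```python
import inspect
from typing import Callable, List


def unannotated_parameters(func: Callable) -> List[str]:
    """List parameter names (other than self/cls) that have no type annotation."""
    params = inspect.signature(func).parameters.values()
    return [
        p.name
        for p in params
        if p.annotation is inspect.Parameter.empty and p.name not in ("self", "cls")
    ]


class Example:
    # Stand-in for a method whose input parameter is missing an annotation.
    def get_default_path(self, suffix) -> str:
        return f"profile_{suffix}.bin"


missing = unannotated_parameters(Example.get_default_path)
print(missing)
```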

    stale :zzz: workflow 
    opened by jamie256 5
  • make lint-fix doesn't fix linting issue


    Description

    • This repo: https://github.com/whylabs/whylogs/tree/2dd782a3e284aa5063c5a4b0738407b7244ec6e2
    • make lint-fix is a no op locally
    • make format-fix doesn't change anything

    However, the CI is failing:

    - files were modified by this hook
    
    reformatted python/tests/core/view/test_dataset_profile.py
    
    All done! ✨ 🍰 ✨
    
    
    opened by andyndang 1
  • Multi-column constraint proposal


    Description

    Refactor to support multi-column constraints

    Changes

    Adds DatasetConstraint and some other refactoring

    A DatasetConstraint implements constraints that aren't attached to a specific column. It has a Callable that takes a DatasetProfileView. It returns a bool and a dictionary mapping column_name/metric_name to the metrics used in evaluating the constraint. The latter is necessary for generating the report.
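
A rough sketch of the shape described above, assuming a profile view can be treated as a mapping from `column_name/metric_name` to a value. The `ProfileView` alias, the `DatasetConstraintSketch` class, and the example condition are hypothetical stand-ins, not the classes added in this PR.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

# Hypothetical stand-in for DatasetProfileView: "column/metric" -> value.
ProfileView = Dict[str, Any]
ConditionResult = Tuple[bool, Dict[str, Any]]


@dataclass
class DatasetConstraintSketch:
    name: str
    condition: Callable[[ProfileView], ConditionResult]

    def validate(self, view: ProfileView) -> ConditionResult:
        return self.condition(view)


def a_max_below_b_min(view: ProfileView) -> ConditionResult:
    # Return both the verdict and the metrics consulted, so a report can show them.
    used = {key: view[key] for key in ("a/max", "b/min")}
    return used["a/max"] < used["b/min"], used


constraint = DatasetConstraintSketch("a below b", a_max_below_b_min)
passed, metrics_used = constraint.validate({"a/max": 3, "b/min": 5, "b/max": 9})
print(passed, metrics_used)
```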

    It looks like there's a similar case of a MetricConstraint that applies to all metrics of a specified type across all columns. ~I've changed the implementation of this to find the set of metrics the constraint applies to by using the MetricSelector, then attaching the constraint to each specific column. This might not be the best approach... maybe converting it to a DatasetConstraint would be better.~ MetricConstraintWrapper turns such a MetricConstraint into the Callable for an equivalent DatasetConstraint.

    I'm implementing a PrefixCondition that implements the prefix expression syntax used to serialize the Predicate expressions with the addition of a few arithmetic operators. This is probably a reasonable serialization format, but might not be the ultimate user interface for specifying constraints... It also might want a better name :)
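
For readers unfamiliar with prefix (Polish) notation, the sketch below evaluates a small expression in which each operator precedes its operands. The token format and the operator set are assumptions made for illustration; they do not reproduce the actual PrefixCondition syntax.

```python
import operator
from typing import Callable, Dict, List, Union

Number = Union[int, float]

# Assumed operator set: a few arithmetic and comparison operators, all binary.
OPERATORS: Dict[str, Callable] = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "<": operator.lt,
    ">": operator.gt,
    "==": operator.eq,
}


def evaluate_prefix(tokens: List[str], variables: Dict[str, Number]):
    """Evaluate a tokenized prefix expression, e.g. '< + a b 10'."""

    def parse(pos: int):
        token = tokens[pos]
        if token in OPERATORS:
            left, pos = parse(pos + 1)
            right, pos = parse(pos)
            return OPERATORS[token](left, right), pos
        if token in variables:
            return variables[token], pos + 1
        return float(token), pos + 1

    result, end = parse(0)
    if end != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


# Reads as: (a + b) < 10
result = evaluate_prefix("< + a b 10".split(), {"a": 3, "b": 4})
print(result)
```

Prefix form needs no parentheses or precedence rules, which is part of what makes it convenient as a serialization format.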


    opened by richard-rogers 0
  • add a print_debug method to help with reported issues


    Description

    We should add a print_debug() method that outputs the whylogs version, OS, python version, and which extras are installed to help with debugging issues (especially for identifying already fixed issues being reported in older builds).
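
A possible shape for such a helper, using only the standard library. The function name `collect_debug_info`, the returned keys, and the optional package names probed are assumptions for illustration; the request above does not specify them.

```python
import platform
import sys
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def collect_debug_info(optional_packages: Iterable[str] = ("pandas", "numpy")) -> Dict[str, object]:
    # The optional package names are placeholders; real extras would be listed here.
    return {
        "whylogs": installed_version("whylogs"),
        "python": sys.version.split()[0],
        "os": f"{platform.system()} {platform.release()}",
        "optional": {name: installed_version(name) for name in optional_packages},
    }


info = collect_debug_info()
for key, value in info.items():
    print(f"{key}: {value}")
```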

    feature 
    opened by jamie256 0
  • Need example notebook of single value metrics


    Description

    We have complex custom value examples, but it would help to have a simple single value metric example that shows the WhyLabs integration.

    documentation 
    opened by jamie256 0
Releases(v1.1.20)
  • v1.1.20(Jan 3, 2023)

    whylogs release v1.1.20

    Hi everyone! We’ve now released whylogs 1.1.20 🚀. whylogs is the open standard for data and ML logging created by WhyLabs. 👩🏽‍🔬 This version includes:

    • Data validation docs #1040 [@FelipeAdachi]
    • Bump version to release version #1038 [@jamie256]
  • v1.1.19(Dec 27, 2022)

    whylogs release v1.1.19

    Hi everyone! We’ve now released whylogs 1.1.19 🚀. whylogs is the open standard for data and ML logging created by WhyLabs. 👩🏽‍🔬 This version includes:

    • Support computing segmented performance metrics in top level API #1037 [@jamie256]
    • Update performance metrics example to show log_full_data behavior #1028 [@FelipeAdachi]
  • v1.1.18(Dec 20, 2022)

    whylogs release v1.1.18

    Hi everyone! We’ve now released whylogs 1.1.18 🚀. whylogs is the open standard for data and ML logging created by WhyLabs. 👩🏽‍🔬 This version includes:

    • hellinger calculation edge cases #1023 [@FelipeAdachi]
    • add_resolver to DeclarativeResolver and DeclarativeSchema #1016 [@FelipeAdachi]
    • updating blog link in KS benchmarks/KS_Profiling.ipynb and adding it … #1025 [@FelipeAdachi]
    • Fix read_v0_to_view cardinality metric #1020 [@jamie256]
    • Fix conversion of counters between v0 and v1 format #1026 [@jamie256]
    • Adding warning for vector dictionary on why.log #1022 [@murilommen]
  • v1.1.17(Dec 13, 2022)

    whylogs release v1.1.17

    Hi everyone! We’ve now released whylogs 1.1.17 🚀. whylogs is the open standard for data and ML logging created by WhyLabs. 👩🏽‍🔬 This version includes:

    • performance improvement for low latency row logging #1014 [@jamie256]
    • Fix reversed frequent item upper/lower bounds #1013 [@richard-rogers]
    • [fix] make dataset_id param default on ProfileStore #1012 [@murilommen]
    • Fluent condition predicate API #967 [@richard-rogers]
    • Declarative dataset schema #997 [@richard-rogers]
  • v1.1.16(Dec 6, 2022)

    whylogs release v1.1.16

    Hi everyone! We’ve now released whylogs 1.1.16 🚀. whylogs is the open standard for data and ML logging created by WhyLabs. 👩🏽‍🔬 This version includes:

    • Register non-standard metrics for deserialization #1010 [@richard-rogers]
    • Add whylogs benchmark from docs to README.md #1006 [@jamie256]
    • Add DatasetSchema to Fugue API, add pytest-spark #1008 [@goodwanghan]
    • [feature] Implementing an SQLite Store #958 [@murilommen]
    • Bump version to release version #1001 [@jamie256]
  • v1.1.15(Nov 29, 2022)

    whylogs release v1.1.15

    Hi everyone! We’ve now released whylogs 1.1.15 🚀. whylogs is the open standard for data and ML logging created by WhyLabs. 👩🏽‍🔬 This version includes:

    • Fix deserialize timezone info and merge of _metadata #1000 [@jamie256]
    • fixing broken links in examples/README.md #999 [@FelipeAdachi]
    • Add actions to ConditionCountMetric #989 [@richard-rogers]
    • update ks experiments #995 [@FelipeAdachi]
    • add condition_meets constraints factory helper function for condition count metrics. #991 [@FelipeAdachi]
    • viz - make null or undefined statistics appear as "-" instead of 0 #986 [@FelipeAdachi]
    • replace UCI repository dataset link to local S3 one #993 [@FelipeAdachi]
    • Fix Open in Colab link on Constraints_Suite #992 [@jamie256]
    • add tooltip text to "Observations" in summary drift report and profile summary #988 [@FelipeAdachi]
  • v1.1.14(Nov 22, 2022)

    whylogs release v1.1.14

    Hi everyone! We’ve now released whylogs 1.1.14 🚀. whylogs is the open standard for data and ML logging created by WhyLabs. 👩🏽‍🔬 This version includes:

    • Fix formatted output for warning message when referencing utc_now #984 [@jamie256]
    • default performance metrics log full data false #980 [@jamie256]
    • add feature weights example to repository #982 [@FelipeAdachi]
    • Add error/warning logging for old or future dataset_timestamps #977 [@jamie256]
    • Add summary information to Viz' Constraints Report #976 [@FelipeAdachi]
    • Add setter for dataset_timestamp and model_performance_metrics #978 [@jamie256]
  • v1.1.13(Nov 15, 2022)

    whylogs release v1.1.13

    Hi everyone! We’ve now released whylogs 1.1.13 🚀. whylogs is the open standard for data and ML logging created by WhyLabs. 👩🏽‍🔬 This version includes:

    • pyspark datasetschema #972 [@gramhagen]
    • local segmented profiles serialization fix #971 [@jamie256]
    • Delete datasets symbolic link #970 [@FelipeAdachi]
    • Add timezone info to default dataset_timestamp #969 [@jamie256]
    • Standardize using single quotes around pip install in examples #960 [@jamie256]
  • v1.1.12(Nov 9, 2022)

    whylogs release v1.1.12

    Hi everyone! We’ve now released whylogs 1.1.12 🚀. whylogs is the open standard for data and ML logging created by WhyLabs. 👩🏽‍🔬 This version includes:

    • Fix whylabs upload bug #963 [@jamie256]
  • v1.1.11(Nov 8, 2022)

    whylogs release v1.1.11

    Hi everyone! We’ve now released whylogs 1.1.11 🚀. whylogs is the open standard for data and ML logging created by WhyLabs. 👩🏽‍🔬 This version includes:

    • Allow passing schema to log method #957 [@jamie256]
    • add benchmark for KS test. #951 [@FelipeAdachi]
    • Add separate NaN and inf counts, added tests #955 [@jamie256]
    • Don't reopen tempfile when writing profiles #954 [@jamie256]
    • enable quantiles as a parameter for KS drift calculation #945 [@FelipeAdachi]
    • [fix] Docs: Replacing advanced with basic examples #944 [@murilommen]
  • v1.1.10(Nov 1, 2022)

    whylogs release v1.1.10

    Hi everyone! We’ve now released whylogs 1.1.10 🚀. whylogs is the open standard for data and ML logging created by WhyLabs. 👩🏽‍🔬 This version includes:

    • Java Loggers #939 [@TheMellyBee]

    ✨ Features

    • implements hellinger distance as a drift calculation #940 [@FelipeAdachi]
    • Add zero() to result_set and DatasetProfileView and test merge #942 [@jamie256]

    🐛 Bug Fixes

    • Avoid KeyError indexing into pandas DataFrame #928 [@jamie256]
    • [Fix] Fixing broken links on readme #941 [@murilommen]

    📚 Documentation

    • Adjusting overall README examples and integrations #934 [@murilommen]
  • v1.1.9(Oct 26, 2022)

    whylogs release v1.1.9

    Hi everyone! We’ve now released whylogs 1.1.9 🚀. whylogs is the open standard for data and ML logging created by WhyLabs. 👩🏽‍🔬 This version includes:

    • Add the Result Set to Java #908 [@TheMellyBee]
    • Add segmented profile serialization to rolling logger #935 [@jamie256]
    • remove TransientLogger::flush() and ::close() #931 [@richard-rogers]
    • Fix stubs to handle no pandas scenario #930 [@jamie256]
    • Update to latest stable release of whylabs-client and poetry update #924 [@jamie256]
    • Changing quantiles for KS drift detection from 9 to 100 buckets. #932 [@FelipeAdachi]

    🐛 Bug Fixes

    • Renames add_metrics to add_metric as it only adds a singular metric #917 [@TheMellyBee]
  • v1.1.8(Oct 18, 2022)

    whylogs release v1.1.8

    Hi everyone! We’ve now released whylogs 1.1.8 🚀. whylogs is the open standard for data and ML logging created by WhyLabs. 👩🏽‍🔬 This version includes:

    • Adding GCS writer #913 [@murilommen]
    • Handle numpy.bool_ when resolving metrics for bool types #914 [@jamie256]
    • Allow test to run on automated version bump PR #906 [@jamie256]
    • Implements Profiles & Views in Java #867 [@TheMellyBee]
    • Bump version to release version #905 [@github-actions]
  • v1.1.7(Oct 11, 2022)

    whylogs release v1.1.7

    Hi everyone! We’ve now released whylogs 1.1.7 🚀. whylogs is the open standard for data and ML logging created by WhyLabs. 👩🏽‍🔬 This version includes:

    • Count booleans in cardinality metric, and add tests #904 [@jamie256]
    • Make SummaryConfig optional #903 [@richard-rogers]
    • Wrong image for visualizer #899 [@dleybz]
    • Generalize CompoundMetric for better whylabs support #894 [@richard-rogers]
    • Adding Profile Store functionality #852 [@murilommen]
    • More links to WhyLabs #898 [@dleybz]
    • Bump version to release version #897 [@github-actions]
  • v1.1.6(Oct 4, 2022)

    whylogs release v1.1.6

    Hi everyone! We’ve now released whylogs 1.1.6 🚀. whylogs is the open standard for data and ML logging created by WhyLabs. 👩🏽‍🔬 This version includes:

    • Enable Fugue Implementation to take arbitrary column names #896 [@goodwanghan]
    • Update image logging example notebook #886 [@richard-rogers]
    • Bump version to release version #895 [@github-actions]
  • v1.1.5(Sep 30, 2022)

    whylogs release v1.1.5

    Hi everyone! We’ve now released whylogs 1.1.5 🚀. whylogs is the open standard for data and ML logging created by WhyLabs. 👩🏽‍🔬 This version includes:

    • Fixed standard aggregators/serializers/deserializers #893 [@andyndang]
    • Fix stubs to allow isinstance checks #891 [@jamie256]
  • v1.1.4(Sep 27, 2022)

    whylogs release v1.1.4

    Hi everyone! We’ve now released whylogs 1.1.4 🚀. whylogs is the open standard for data and ML logging created by WhyLabs. 👩🏽‍🔬 This version includes:

    • Comment out .doc lfs pattern in .gitattributes #884 [@jamie256]
    • add dependencies to viz extra #883 [@FelipeAdachi]
    • Update version to 1.1.3 #882 [@jamie256]
  • v1.1.3(Sep 27, 2022)

    whylogs release v1.1.3

    Hi everyone! We’ve now released whylogs 1.1.3 🚀. whylogs is the open standard for data and ML logging created by WhyLabs. 👩🏽‍🔬 This version includes:

    • removing image compounded metric in profile visualization #880 [@FelipeAdachi]
    • Fix ImageMetric::merge() bug #878 [@richard-rogers]
    • Notebook visualizer support for ImageMetric #877 [@richard-rogers]
    • Update ecommerce.rst #871 [@FelipeAdachi]
    • Better whylabs support for logged images #870 [@richard-rogers]
    • Fix empty view to_pandas #868 [@jamie256]
    • MetricConfig was supposed to be an optional argument to Metric::zero() #863 [@richard-rogers]
  • v1.1.2(Sep 22, 2022)

    whylogs release v1.1.2

    Hi everyone! We’ve now released whylogs 1.1.2 🚀. whylogs is the open standard for data and ML logging created by WhyLabs. 👩🏽‍🔬 This version includes:

    • Add 'requests' package as dependency of whylogs[whylabs] #855 [@TheMellyBee]
    • Update required protobuf version to 3.19.4 or higher #862 [@jamie256]
    • Add open in colab button and fix links #861 [@jamie256]
    • Speed up confusion matrix calculation and add test #859 [@jamie256]
    • Example updates - adding WhyLabs blurb to every examples #856 [@FelipeAdachi]
    • Move .dockerignore to root #853 [@jamie256]

    ✨ Features

    • Update whylabs-client version for feature importance #854 [@jamie256]
    • feature importance writers API #857 [@FelipeAdachi]
  • v1.1.1(Sep 20, 2022)

    whylogs release v1.1.1

    Hi everyone! We’ve now released whylogs 1.1.1 🚀. whylogs is the open standard for data and ML logging created by WhyLabs. 👩🏽‍🔬 This version includes:

    • Update fugue integration #850 [@goodwanghan]
    • Fugue Integration #839 [@goodwanghan]
    • poetry update #847 [@jamie256]
    • Update example links to point to nbviewer #834 [@jghoman]
    • Uncomment pip install so notebook examples mostly just work #837 [@jamie256]
    • Update getting started with WhyLabs notebook to v1 #832 [@jamie256]
    • 'Check out our docs' link in Inspecting Profiles notebook is 404ing #829 [@jghoman]
    • Add support for serder into bytes and pickle support for DatasetProfileView #827 [@andyndang]
    • StandardResolver and Supporting classes #779 [@TheMellyBee]
    • Added image logging example notebook #815 [@richard-rogers]
    • Initial Sagemaker example readme and link to v0 #826 [@jamie256]
    • Update README.md #824 [@FelipeAdachi]
    • Update weather dataset test dates to UTC time #777 [@bernease]
    • regression metrics example #818 [@FelipeAdachi]
    • classification metrics example #819 [@FelipeAdachi]
    • Writer Examples - fixing environment variables and deprecated parameters #810 [@FelipeAdachi]

    🐛 Bug Fixes

    • Fix bug in compound metric to_summary_dict #841 [@richard-rogers]
    • Fix multi-column segment key translation #848 [@jamie256]
    • typing-extensions dependency entry update #845 [@jamie256]
    • Have Writer's Return a Response #823 [@TheMellyBee]
  • v1.1.0(Sep 14, 2022)

    whylogs release v1.1.0

    Hi everyone! We’ve now released whylogs 1.1.0 🚀. whylogs is the open standard for data and ML logging created by WhyLabs. 👩🏽‍🔬 This version includes:

    • Initialize RegressionMetricsMessage prediction_field to 0 #814 [@jamie256]
    • Handle missing PIL #812 [@richard-rogers]
    • Added top level performance metrics methods #811 [@jamie256]
    • segments example #806 [@FelipeAdachi]
    • Model Performance Metrics in v1.1 #796 [@jamie256]
    • Datasets Module - Add Ecommerce Dataset #797 [@FelipeAdachi]
    • Tweak the information collection sentence so it's clear immediately this can be turned off. #801 [@jghoman]
    • Fix whylabs writer uncompound on segmented profiles #803 [@jamie256]
    • Fixed handling int with frequent item bug #794 [@andyndang]
    • Image metric logging v1 #776 [@richard-rogers]
    • Initial segments support for v1.1 #792 [@jamie256]
    • Hardcode en/latest into Announcement URL. Otherwise is coming up 404. #795 [@jghoman]
    • Add .dockerignore to .gitignore #790 [@jamie256]
  • v1.0.14(Sep 6, 2022)

    whylogs release v1.0.14

    Hi everyone! We’ve now released whylogs 1.0.14 🚀. whylogs is the open standard for data and ML logging created by WhyLabs. 👩🏽‍🔬 This version includes:

    • Update version of create-pull-request #789 [@jamie256]
    • Update whylogs-sketching version and update poetry.lock #788 [@jamie256]
    • Condition Validators #761 [@FelipeAdachi]
    • Add Serialization and Deserialization Registry #762 [@TheMellyBee]
  • v1.0.13(Aug 31, 2022)

    whylogs release v1.0.13

    Hi everyone! We’ve now released whylogs 1.0.13 🚀. whylogs is the open standard for data and ML logging created by WhyLabs. 👩🏽‍🔬 This version includes:

    • move link check into separate action #783 [@jamie256]
    • including Dask example #780 [@murilommen]
    • Aggregator for Metric Components #757 [@TheMellyBee]
    • bumpversion and allow dirty fix #782 [@FelipeAdachi]
    • changing WhyLabs Writer example's section title #778 [@FelipeAdachi]
    • Enable overriding SSL cert store #773 [@andyndang]
    • Update Documentation - Getting Started #768 [@FelipeAdachi]
    • DatasetSchema no longer a @dataclass #750 [@richard-rogers]
  • v1.0.12(Aug 23, 2022)

    whylogs release v1.0.12

    Hi everyone! We’ve now released whylogs 1.0.12 🚀. whylogs is the open standard for data and ML logging created by WhyLabs. 👩🏽‍🔬 This version includes:

    • Remove the java release's python integration now that we have pyspark #760 [@jamie256]
    • WhyLabsWriter's log_reference_profile #753 [@FelipeAdachi]
    • Adding Java to the CI #758 [@TheMellyBee]
    • Using our dedicated stats endpoint for telemetry collection #759 [@andyndang]

    🐛 Bug Fixes

    • Add Twine to dev dependency #766 [@TheMellyBee]
    • Broken Link Detection Action #749 [@TheMellyBee]

    📚 Documentation

    • Broken Link Detection Action #749 [@TheMellyBee]
  • v1.0.11 (Aug 16, 2022)

    whylogs release v1.0.11

    Hi everyone! We’ve now released whylogs 1.0.11 🚀. whylogs is the open standard for data and ML logging created by WhyLabs. 👩🏽‍🔬 This version includes:

    • Java IntegralMetric and Components #719 [@TheMellyBee]
    • Update push-release to not block on autobump version #752 [@jamie256]
    • Update the telemetry account ID #751 [@andyndang]
    • Condition Count Metric Example #748 [@FelipeAdachi]
    • Datasets Module #724 [@FelipeAdachi]
    • Enabling whylogs Reader #745 [@murilommen]
    • Fix Schema Configuration example #744 [@FelipeAdachi]
    • Viz module UI/UX fixes #742 [@FelipeAdachi]
    • Fix Constraints Report's search feature #737 [@FelipeAdachi]
    • Update README.md #739 [@jamie256]
    • adding BigQuery example #734 [@murilommen]

    📚 Documentation

    • Updates the ReadMe for Java #741 [@TheMellyBee]
  • v1.0.10 (Aug 9, 2022)

    whylogs release v1.0.10

    Hi everyone! We’ve now released whylogs 1.0.10 🚀. whylogs is the open standard for data and ML logging created by WhyLabs. 👩🏽‍🔬 This version includes:

    • just adding Custom Metrics in the documentation #735 [@FelipeAdachi]
    • Add some missing properties on metrics to cover things in summaries #731 [@jamie256]
    • make plots not sticky when scrolling horizontally #730 [@FelipeAdachi]
    • Condition count metric / value constraints #618 [@richard-rogers]

    ✨ Features

    • Add callback for rolling logger profile writes #725 [@jamie256]
  • v1.0.9 (Aug 2, 2022)

    whylogs release v1.0.9

    Hi everyone! We’ve now released whylogs 1.0.9 🚀. whylogs is the open standard for data and ML logging created by WhyLabs. 👩🏽‍🔬 This version includes:

    • add bump2version to dev dependencies and sync poetry.lock file #721 [@jamie256]
    • Update .bumpversion.cfg #720 [@TheMellyBee]
    • Fix DatasetProfile not resolving with a custom schema #718 [@richard-rogers]
    • Making constraint helpers publicly available #715 [@murilommen]
    • Examples of custom Metrics #586 [@richard-rogers]
  • v1.0.8 (Jul 26, 2022)

    whylogs release v1.0.8

    Hi everyone! We’ve now released whylogs 1.0.8 🚀. whylogs is the open standard for data and ML logging created by WhyLabs. 👩🏽‍🔬 This version includes:

    • Add testing python for 3.10 and restrict compatibility to less than 3.11 #710 [@jamie256]
    • Copy minimal structure for whylogs java #680 [@andyndang]
    • Fix using None as type when pandas is not present #706 [@jamie256]
    • Allow python 3.10 in pyproject.toml #704 [@jamie256]
    • Stabilizing IPython init #707 [@murilommen]
    • Add python working directory for python commands in release workflow #701 [@jamie256]

    ✨ Features

    • single-profile visualization (Profile Summary) #705 [@FelipeAdachi]
    • Orders all views columns before sending to pandas #702 [@TheMellyBee]

    📚 Documentation

    • Fix small typo. #711 [@jghoman]
    • Flask Integration of whylogs and WhyLabs Notebook #685 [@TheMellyBee]
  • v1.0.7 (Jul 19, 2022)

    whylogs release v1.0.7

    Hi everyone! We’ve now released whylogs 1.0.7 🚀. whylogs is the open standard for data and ML logging created by WhyLabs. 👩🏽‍🔬 This version includes:

    • Decoupling Notebook Profile Visualizer #661 [@murilommen]
    • add sort by drift p-values #699 [@FelipeAdachi]
    • Update dependency on whylogs-sketching that has kll merge fix #693 [@jamie256]
    • Change KLL accuracy according to numerical/submetrics type #690 [@FelipeAdachi]
    • Comment pip install in examples notebooks #687 [@FelipeAdachi]
    • Adding Push-Release #689 [@TheMellyBee]
    • Adds the BumpVersion for our CICD workflow #691 [@TheMellyBee]

    📚 Documentation

    • Updates links for readme, merges it in, and removes an unavailable co… #698 [@TheMellyBee]
    • fix contribution.md hyperlink on readme #696 [@murilommen]
  • v1.0.6 (Jul 14, 2022)

    whylogs release v1.0.6

    Hi everyone! We’ve now released whylogs 1.0.6 🚀. whylogs is the open standard for data and ML logging created by WhyLabs. 👩🏽‍🔬 This version includes:

    • Fix README Docs hyperlink #683 [@murilommen]
    • Add WhyLabs Writer example notebook #672 [@FelipeAdachi]
    • Optionally skip lower-casing the strings #678 [@richard-rogers]
    • Make Profile/Profile Views retain the same order of columns as original DataFrame #675 [@FelipeAdachi]
    • Constraints - helper functions #652 [@FelipeAdachi]
    • make Writing_Profiles & Mlflow_Logging testable #670 [@FelipeAdachi]
    • [Fix] Making installs on examples "shell safe" #608 [@murilommen]
    • Adds in Release Drafter to v1 #688 [@TheMellyBee]
Owner
WhyLabs
Observability for AI pipelines and applications. Instrument data pipelines, analyze data quality and drift, catch deviations before they cause model failures.