Python client for using Prefect Cloud with Saturn Cloud

Overview

prefect-saturn

prefect-saturn is a Python package that makes it easy to run Prefect Cloud flows on a Dask cluster with Saturn Cloud. For a detailed tutorial, see "Fault-Tolerant Data Pipelines with Prefect Cloud".

Installation

prefect-saturn is available on PyPI.

pip install prefect-saturn

prefect-saturn can also be installed directly from GitHub:

pip install git+https://github.com/saturncloud/prefect-saturn.git

Getting Started

prefect-saturn is intended for use inside a Saturn Cloud environment, such as a Jupyter notebook.

import prefect
from prefect import Flow, task
from prefect_saturn import PrefectCloudIntegration


@task
def hello_task():
    logger = prefect.context.get("logger")
    logger.info("hello prefect-saturn")


flow = Flow("sample-flow", tasks=[hello_task])

project_name = "sample-project"
integration = PrefectCloudIntegration(
    prefect_cloud_project_name=project_name
)
flow = integration.register_flow_with_saturn(flow)

flow.register(
    project_name=project_name,
    labels=["saturn-cloud"]
)

Customize Dask

You can customize the size and behavior of the Dask cluster used to run Prefect flows. prefect_saturn.PrefectCloudIntegration.register_flow_with_saturn() accepts arguments to accomplish this, such as dask_cluster_kwargs.

For example, the code below tells Saturn that this flow should run on a Dask cluster with 3 xlarge workers, and that prefect should shut down the cluster once the flow run has finished.

flow = integration.register_flow_with_saturn(
    flow=flow,
    dask_cluster_kwargs={
        "n_workers": 3,
        "worker_size": "xlarge",
        "autoclose": True
    }
)

flow.register(
    project_name=project_name,
    labels=["saturn-cloud"]
)

Contributing

See CONTRIBUTING.md for documentation on how to test and contribute to prefect-saturn.

Comments
  • [CU-feu7x7] saturn labels

    • [x] passes make lint
    • [x] adds tests to tests/ (if appropriate)

    What does this PR change?

    Automatically add saturn-specific labels to the flow environment.

    How does this PR improve prefect-saturn?

    Along with the related change in saturn, this allows flows to be assigned to the correct cluster agent even if there are agents in multiple clusters that are all using the same prefect-cloud tenant.
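The matching behavior described above can be sketched as a subset check, assuming Prefect's rule that an agent only picks up a flow run when all of the flow's labels are also present on the agent. The helper name and the label values below are hypothetical; the actual Saturn-specific labels are not shown in this description.

```python
def agent_can_run(flow_labels, agent_labels):
    """Return True when every flow label is also present on the agent."""
    return set(flow_labels) <= set(agent_labels)


# Hypothetical label values, used only for illustration.
flow_labels = ["saturn-cloud", "cluster-a"]
agent_in_cluster_a = ["saturn-cloud", "cluster-a"]
agent_in_cluster_b = ["saturn-cloud", "cluster-b"]

print(agent_can_run(flow_labels, agent_in_cluster_a))
print(agent_can_run(flow_labels, agent_in_cluster_b))
```

With only the shared "saturn-cloud" label, both agents would qualify, which is the ambiguity the extra labels are meant to remove.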

    opened by bhperry 7
  • Bump development version

    • [X] passes make lint
    • [X] adds tests to tests/ (if appropriate)

    What does this PR change?

    • Bumps version to 0.5.1.9000 for development

    How does this PR improve prefect-saturn?

    With this change, installations from source control are guaranteed to have a version newer than the latest release on PyPI but older than the next release on PyPI.
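A simplified sketch of that ordering, comparing dotted numeric versions as tuples of integers. Real packaging tools follow PEP 440 rules; the helper name and the "0.5.2" next-release number are assumptions made for illustration.

```python
def version_key(version):
    """Split a dotted numeric version such as '0.5.1.9000' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


released = version_key("0.5.1")
development = version_key("0.5.1.9000")
next_release = version_key("0.5.2")  # hypothetical next release number

# The development version sorts after the last release and before the next one.
print(released < development < next_release)
```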

    opened by dotNomad 6
  • [CU-frwyvw] Set flow instance size

    • [x] passes make lint
    • [x] adds tests to tests/ (if appropriate)

    What does this PR change?

    Adds instance size argument for flow node

    How does this PR improve prefect-saturn?

    Enables users to select a larger node size for their flow to run on. This means we no longer force users to farm all of the work out to Dask clusters; they can instead run flows with a local executor without being constrained by a medium-tier node.

    opened by bhperry 4
  • pickle is a bad choice for hashing.

    Without this PR, changing the Python version could lead to different hashes for flow metadata. This PR hashes identifying information for flows using JSON rather than pickle. Because this changes the hashing function, re-registering a flow will create a new flow in Saturn Cloud.
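A small sketch of why pickle output is a fragile hash input. Pickle bytes depend on the protocol version, and the default protocol changed from 3 to 4 in Python 3.8, so the same object can hash differently across interpreters. The metadata fields and helper name below are hypothetical.

```python
import hashlib
import json
import pickle

# Hypothetical identifying metadata for a flow.
flow_metadata = {"project": "sample-project", "name": "sample-flow"}


def sha256_hex(data):
    """Return the hex SHA-256 digest of a bytes payload."""
    return hashlib.sha256(data).hexdigest()


# The same object pickled with two protocols yields different bytes and hashes.
pickle_v2_hash = sha256_hex(pickle.dumps(flow_metadata, protocol=2))
pickle_v4_hash = sha256_hex(pickle.dumps(flow_metadata, protocol=4))

# JSON with sorted keys gives the same text regardless of key insertion order.
reordered = {"name": "sample-flow", "project": "sample-project"}
json_hash = sha256_hex(json.dumps(flow_metadata, sort_keys=True).encode("utf-8"))
json_hash_reordered = sha256_hex(json.dumps(reordered, sort_keys=True).encode("utf-8"))

print(pickle_v2_hash == pickle_v4_hash)
print(json_hash == json_hash_reordered)
```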

    opened by hhuuggoo 3
  • add expectation that BASE_URL does not end in a slash

    • [x] passes make lint
    • [x] adds tests to tests/ (if appropriate)

    What does this PR change?

    Changes PrefectCloudIntegration to expect the environment BASE_URL to NOT end in a trailing slash. In all Saturn production environments, BASE_URL does not end in a trailing slash.

    This PR also bumps the version of prefect-saturn to 0.2.0, since this change in behavior is a breaking change.

    How does this PR improve prefect-saturn?

    Without this fix, prefect-saturn is currently broken in Saturn production environments.

    Why can't we just keep the code that adds or removes slashes as needed?

    This project uses prefect's WebhookStorage (https://docs.prefect.io/orchestration/execution/storage_options.html#webhook). That type of storage uses template strings to reference environment variables, like this:

    flow = Flow(
        "some-flow",
        storage=Webhook(
            build_request_kwargs={
                "url": "${BASE_URL}/api/whatever",
            },
            build_request_http_method="POST",
            # ... remaining Webhook arguments elided
        )
    )
    

    There is no place in there where we can introduce Saturn-written code that checks the environment variable BASE_URL and deals with missing or extra trailing slashes. So an assumption about whether or not BASE_URL ends in a slash has to be hard-coded into the codebase here.
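To illustrate the problem, here is a minimal sketch that uses the standard library's string.Template as a stand-in for the template rendering done by Webhook storage. The helper names and example URLs are hypothetical; this is not how prefect-saturn or Prefect implement it.

```python
from string import Template

# Same shape as the template string in the example above.
URL_TEMPLATE = "${BASE_URL}/api/whatever"


def render_url(base_url):
    """Substitute BASE_URL into the template, as a template renderer would."""
    return Template(URL_TEMPLATE).substitute(BASE_URL=base_url)


def check_base_url(base_url):
    """Hypothetical up-front check that BASE_URL has no trailing slash."""
    if base_url.endswith("/"):
        raise ValueError("BASE_URL must not end in a slash")
    return base_url


print(render_url("https://app.example.com"))
print(render_url("https://app.example.com/"))
```

The second call produces a path with a doubled slash, because the substitution is purely textual and nothing can normalize the value at render time.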

    opened by jameslamb 2
  • Fix tenant_id attr for Prefect 15

    • [x] passes make lint
    • [ ] adds tests to tests/ (if appropriate)

    I don't believe any tests need to be added.

    What does this PR change?

    This PR changes Client()._active_tenant_id to Client().tenant_id when using Prefect >= 0.15, since _active_tenant_id is no longer a property of Prefect's Client in 0.15.

    How does this PR improve prefect-saturn?

    This allows prefect-saturn to work with prefect in version 0.15 which removed the private attribute.
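One way to express this kind of compatibility shim is an attribute fallback, sketched below with stand-in objects instead of a real Prefect Client. This is an illustrative alternative to the version check the PR describes, and the helper name is hypothetical.

```python
from types import SimpleNamespace


def get_tenant_id(client):
    """Prefer the public tenant_id attribute, falling back to the older private one."""
    tenant_id = getattr(client, "tenant_id", None)
    if tenant_id is None:
        tenant_id = getattr(client, "_active_tenant_id", None)
    return tenant_id


# Stand-ins for clients from newer and older Prefect versions.
new_style_client = SimpleNamespace(tenant_id="tenant-a")
old_style_client = SimpleNamespace(_active_tenant_id="tenant-b")

print(get_tenant_id(new_style_client))
print(get_tenant_id(old_style_client))
```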

    opened by dotNomad 1
  • [DEV-1227] Replace pyyaml

    Unlike PyYAML, ruamel.yaml supports:

    • YAML <= 1.2. PyYAML only supports YAML <= 1.1. This is vital, as YAML 1.2 intentionally breaks backward compatibility with YAML 1.1 in several edge cases. That would usually be a bad thing, but here it makes YAML 1.2 a strict superset of JSON, which YAML 1.1 is not.
    • Roundtrip preservation when calling yaml.dump() to dump a dictionary loaded by a prior call to yaml.load().

    See more details at https://yaml.readthedocs.io/en/latest/pyyaml.html

    opened by wreis 1
  • Switch from Docker storage to Webhook storage

    • [x] passes make lint
    • [x] adds tests to tests/ (if appropriate)

    What does this PR change?

    This PR replaces Docker storage with Webhook storage.

    This PR also adds a small working example to the README, to show users how it works. A longer-form tutorial will be up on https://www.saturncloud.io/docs/ some time in the next week.

    How does this PR improve prefect-saturn?

    The use of Webhook storage will make the integration between Prefect Cloud and Saturn Cloud much faster and less error-prone. It eliminates several hacks and special cases, should be much quicker, and removes an awkward race condition. With Docker storage, after calling .register_flow_with_saturn() you had to wait for a k8s job that built and pushed the image to complete. That could take up to 10 minutes, and nothing in prefect-saturn or the Saturn UI allowed you to see logs or other progress of that job.

    That job was also very brittle...it required docker-in-docker trickery, patching user-chosen images with several build-time-only dependencies, and running a sequence of multiple gnarly shell scripts.

    With Webhook storage, all of that complexity is eliminated. When you call .register_flow_with_saturn(), the flow is serialized with cloudpickle, sent to Saturn, and stored as bytes in an object store. At run time, when flow.storage.get_flow() is called, Saturn retrieves the binary content of the flow from an object store and sends it back over the wire. That's it! Everything is synchronous, so no weird race conditions, and it's just passing bytes around, so the storage process goes from 10+ minutes to under a second.
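The round trip described above can be sketched with an in-memory dictionary standing in for the object store and the standard library's pickle standing in for cloudpickle. The function names and the dictionary-based "flow" are hypothetical; the real exchange happens over HTTP between Prefect and Saturn.

```python
import pickle

# In-memory stand-in for the object store that holds serialized flows.
object_store = {}


def store_flow(flow_id, flow):
    """Serialize the flow to bytes and store it, returning the payload size."""
    payload = pickle.dumps(flow)  # the text above says cloudpickle is used
    object_store[flow_id] = payload
    return len(payload)


def get_flow(flow_id):
    """Fetch the stored bytes and deserialize them back into a flow object."""
    return pickle.loads(object_store[flow_id])


# A plain dictionary plays the role of a flow in this sketch.
flow = {"name": "sample-flow", "tasks": ["hello_task"]}
store_flow("flow-123", flow)
restored = get_flow("flow-123")
print(restored == flow)
```

Both steps are synchronous calls that move bytes around, which is why no background build job or polling is needed.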

    opened by jameslamb 1
  • Add Saturn project details and remove unnecessary stuff

    • [x] passes make lint
    • [x] adds tests to tests/ (if appropriate)

    What does this PR change?

    • removes fields from saturn_details that are unnecessary as of #8
    • renames saturn_details to storage_details since that now only contains information for building storage
    • bump version floor to Python 3.7 (all the Saturn images are Python 3.7)
    • set ignore_healthchecks=True on the storage object.
      • This avoids a weird error where the image being built with flow.storage.build() doesn't have stuff like the Saturn start script in it, which could cause errors.
      • We don't need to care about that because every job in the flow's lifecycle will have the start script added to its command / args

    How does this PR improve prefect-saturn?

    This PR removes unnecessary fields that could have become dependencies in users' code, reducing the surface for breaking changes.

    The ignore_healthchecks thing makes it possible for users to rely on the start script to install libraries.

    opened by jameslamb 1
  • change strategy for identifying flows

    • [x] passes make lint
    • [x] adds tests to tests/ (if appropriate)

    What does this PR change?

    This PR changes the strategy for uniquely identifying flows. Now the flow_hash sent to Saturn Cloud is the sha256 hash of:

    • project name
    • flow name
    • Prefect Cloud tenant id

    This means that flow_hash is equivalent to the Prefect Cloud concept flow_group_id. This PR proposes creating this hash ourselves because Prefect Cloud's flow_group_id can't be known until you've registered a flow with Prefect Cloud, and prefect-saturn needs to register with Saturn first.
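A minimal sketch of such a hash, assuming the three values are serialized as JSON with sorted keys before hashing. The function signature, field names, and serialization format are assumptions for illustration, not the library's actual implementation.

```python
import hashlib
import json


def flow_hash(project_name, flow_name, tenant_id):
    """Hash the identifying details of a flow with SHA-256."""
    identifying_details = {
        "project_name": project_name,
        "flow_name": flow_name,
        "tenant_id": tenant_id,
    }
    encoded = json.dumps(identifying_details, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# Same project and flow name in two hypothetical tenants.
hash_tenant_a = flow_hash("sample-project", "sample-flow", "tenant-a")
hash_tenant_b = flow_hash("sample-project", "sample-flow", "tenant-b")
print(hash_tenant_a != hash_tenant_b)
```

Because the inputs are known before registration, the hash can be computed before Prefect Cloud has assigned a flow_group_id.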

    How does this PR improve prefect-saturn?

    This PR provides a reliable way to uniquely identify all versions of the same flow. It improves on the previous model, where tenant id was not considered. Under that model, two flows with identical names and code, in Prefect Cloud projects with the same name but in different tenants, would get the same hash and conflict.

    Because this PR no longer considers the task graph in the hash, it also means that the hash will not change as a flow's task graph changes. That means pushing Docker storage to a container registry should be a lot faster, since it'll be more likely to hit the registry's cache.

    Notes for Reviewers

    I had to introduce prefect.client.Client in this PR, and then mock it with patch(). A lot of the diff in the test files is just whitespace, the result of adding in a with patch(....). I recommend reviewing with whitespace changes hidden.

    opened by jameslamb 1
  • Add more metadata to flows

    This PR includes a few updates that improve execution and testing for the end-to-end experience with Prefect Cloud.

    What does this PR change?

    This PR adds more details from Saturn to the flow, so the agent running it knows how to configure the first job that loads the flow.

    How does this PR improve prefect-saturn?

    This PR allows flows executed from Prefect Cloud to take advantage of Saturn-y customization features like env secrets, filesystem secrets, and a custom start script.

    opened by jameslamb 1
Releases(v0.6.0)
  • v0.6.0(Nov 4, 2021)

    What's Changed

    • No default dask cluster by @jsignell in https://github.com/saturncloud/prefect-saturn/pull/50
    • Add encoding kwarg to open by @jsignell in https://github.com/saturncloud/prefect-saturn/pull/52

    New Contributors

    • @jsignell made their first contribution in https://github.com/saturncloud/prefect-saturn/pull/50

    Full Changelog: https://github.com/saturncloud/prefect-saturn/compare/v0.5.1...v0.6.0

  • v0.5.1(Jul 15, 2021)

  • v0.5.0(Apr 23, 2021)

    Breaking

    None

    Features

    • replace pyyaml with ruamel-yaml (#28)
    • add support for KubernetesRun "RunConfig", if using this library with prefect >= 0.13.10 (#34, #36, #37)
      • this does not break compatibility with prefect >0.13.0,<=0.13.9

    Bug Fixes

    • fix deprecation warnings from prefect 0.14.x (#32)
      • some modules were reorganized from prefect 0.13.x to 0.14.x, and using the 0.13.x-style paths raises deprecation warnings

    Docs

    • support Python 3.8 (#33)
      • this library was already compatible with Python 3.8, but that is now tested on every build and documented in the package classifiers
    • fix keywords in package metadata (#39)
      • this improves the discoverability of this project on PyPI
  • v0.4.4(Dec 9, 2020)

    Breaking

    None

    Features

    None

    Bug Fixes

    • set an explicit default of autoclose = False for dask_cluster_kwargs (#25)
      • this ensures that, by default, flows registered with prefect-saturn leave their Dask cluster up at the end of execution
      • this avoids the risk of one flow run closing down a Dask cluster that is in use by another flow run
      • this was already prefect-saturn's behavior, but only indirectly because autoclose defaults to False in dask-saturn. Now that is directly the default in prefect-saturn.
    • add tests on describe_sizes() (#24)
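The explicit default described in the autoclose fix above can be sketched as a dictionary merge in which user-supplied values override the defaults. The constant and helper names are hypothetical and do not reflect the library's internals.

```python
# Hypothetical defaults applied before user-supplied values.
DEFAULT_DASK_CLUSTER_KWARGS = {"autoclose": False}


def resolve_dask_cluster_kwargs(user_kwargs=None):
    """Merge user-supplied kwargs over the defaults without mutating either."""
    return {**DEFAULT_DASK_CLUSTER_KWARGS, **(user_kwargs or {})}


print(resolve_dask_cluster_kwargs())
print(resolve_dask_cluster_kwargs({"n_workers": 3, "autoclose": True}))
```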

    Docs

    • added more docs in the README on how to customize the Dask cluster used by DaskExecutor (#25)
  • v0.4.3(Dec 9, 2020)

    Breaking

    None

    Features

    • You can now set the instance size for the node that runs flow.run() (#23)
      • PrefectCloudIntegration.register_flow_with_saturn() gets a new keyword argument instance_size
      • use new function describe_sizes() to list the valid options

    Bug Fixes

    None

    Docs

    • added documentation on changing the size of the instance that a flow runs on
  • v0.4.2(Nov 19, 2020)

    Breaking

    None

    Features

    None

    Bug Fixes

    • fix broken installations from source distribution (prefect-saturn-*.tar.gz) (#22)

    Docs

    • package LICENSE file with package artifacts (#22)
  • v0.4.1(Nov 6, 2020)

  • v0.4.0(Oct 21, 2020)

  • v0.3.0(Oct 16, 2020)

  • v0.2.0(Sep 18, 2020)

    Breaking

    • prefect-saturn now expects that the environment variable BASE_URL does not end in a slash (#17)

    Features

    None

    Bug Fixes

    None

    Docs

    None

  • v0.1.1(Aug 31, 2020)

  • v0.1.0(Aug 13, 2020)

    Breaking

    • Moved .add_storage() and .add_environment() internal, and made .register_flow_with_saturn() do more. (#14) Now the interface is just like this:

      integration = PrefectCloudIntegration("some-project")
      flow = integration.register_flow_with_saturn(flow)
      flow.register(project_name="some-project", labels=["saturn-cloud"])
      
    • Replaced Docker storage with Webhook storage (#14)

    • Bumped prefect version floor to 0.13.0 (the first release that had Webhook) (#14)

    Features

    None

    Bug Fixes

    None

    Docs

    • Added a minimal working example in README (#14)
  • v0.0.2(Jul 23, 2020)

    • moved some details of building KubernetesJobEnvironment into Saturn's back-end and out of this library
    • removed unnecessary elements in saturn_details
    • renamed saturn_details to storage_details since it now only contains things needed for building storage
  • v0.0.1(Jul 21, 2020)

Owner
Saturn Cloud
End-to-End Data Science in Python Featuring Dask and Jupyter