API>local_db>AWS_RDS - Disclaimer! All data used is for educational purposes only.

Overview

API>local_db>AWS_RDS

Disclaimer! All data used is for educational purposes only.

alt text
ETL pipeline diagram.

Aim of project

By creating a fully working pipeline:

  • Familiarise with ETL
  • Improve python (API,pandas), SQL (triggers&procedures) knowledge
  • Work with a cloud storage service

What does it do?

The data used is electricity prices and weather conditions. The pipeline is fully autonomous: scheduled daily (crontab), electricity price data (.xls) is dowloaded, weather data fetched via an API, and inserted into a local database (Postgres). It is then cleaned and transferred (PL/pgSQL) into 3NF-tables (see ERDs below). Lastly, the clean useful data is migrated to Amazon Web Services' RDS remote database via the foreign-fata wrapper in PL/pgSQL.

alt text
Price data ERD.

alt text
Weather data ERD

Further improvements/learnings

  • Switch from time to event-based triggers
  • Upload data in batches, not 'for each row'
  • Prevent SQL injection
A stock analysis app with streamlit

StockAnalysisApp A stock analysis app with streamlit. You select the ticker of the stock and the app makes a series of analysis by using the price cha

Antonio Catalano 50 Nov 27, 2022
2019 Data Science Bowl

Kaggle-2019-Data-Science-Bowl-Solution - Here i present my solution to kaggle 2019 data science bowl and how i improved it to win a silver medal in that competition.

Deepak Nandwani 1 Jan 01, 2022
Validation and inference over LinkML instance data using souffle

Translates LinkML schemas into Datalog programs and executes them using Souffle, enabling advanced validation and inference over instance data

Linked data Modeling Language 7 Aug 07, 2022
A distributed block-based data storage and compute engine

Nebula is an extremely-fast end-to-end interactive big data analytics solution. Nebula is designed as a high-performance columnar data storage and tabular OLAP engine.

Columns AI 131 Dec 26, 2022
The Dash Enterprise App Gallery "Oil & Gas Wells" example

This app is based on the Dash Enterprise App Gallery "Oil & Gas Wells" example. For more information and more apps see: Dash App Gallery See the Dash

Austin Caudill 1 Nov 08, 2021
Minimal working example of data acquisition with nidaqmx python API

Data Aquisition using NI-DAQmx python API Based on this project It is a minimal working example for data acquisition using the NI-DAQmx python API. It

Pablo 1 Nov 05, 2021
DefAP is a program developed to facilitate the exploration of a material's defect chemistry

DefAP is a program developed to facilitate the exploration of a material's defect chemistry. A large number of features are provided and rapid exploration is supported through the use of autoplotting

6 Oct 25, 2022
For making Tagtog annotation into csv dataset

tagtog_relation_extraction for making Tagtog annotation into csv dataset How to Use On Tagtog 1. Go to Project Downloads 2. Download all documents,

hyeong 4 Dec 28, 2021
talkbox is a scikit for signal/speech processing, to extend scipy capabilities in that domain.

talkbox is a scikit for signal/speech processing, to extend scipy capabilities in that domain.

David Cournapeau 76 Nov 30, 2022
Udacity-api-reporting-pipeline - Udacity api reporting pipeline

udacity-api-reporting-pipeline In this exercise, you'll use portions of each of

Fabio Barbazza 1 Feb 15, 2022
A utility for functional piping in Python that allows you to access any function in any scope as a partial.

WithPartial Introduction WithPartial is a simple utility for functional piping in Python. The package exposes a context manager (used with with) calle

Michael Milton 1 Oct 26, 2021
Statistical Rethinking: A Bayesian Course Using CmdStanPy and Plotnine

Statistical Rethinking: A Bayesian Course Using CmdStanPy and Plotnine Intro This repo contains the python/stan version of the Statistical Rethinking

Andrés Suárez 3 Nov 08, 2022
Datashader is a data rasterization pipeline for automating the process of creating meaningful representations of large amounts of data.

Datashader is a data rasterization pipeline for automating the process of creating meaningful representations of large amounts of data.

HoloViz 2.9k Jan 06, 2023
This tool parses log data and allows to define analysis pipelines for anomaly detection.

logdata-anomaly-miner This tool parses log data and allows to define analysis pipelines for anomaly detection. It was designed to run the analysis wit

AECID 32 Nov 27, 2022
PyTorch implementation for NCL (Neighborhood-enrighed Contrastive Learning)

NCL (Neighborhood-enrighed Contrastive Learning) This is the official PyTorch implementation for the paper: Zihan Lin*, Changxin Tian*, Yupeng Hou* Wa

RUCAIBox 73 Jan 03, 2023
Probabilistic Programming in Python: Bayesian Modeling and Probabilistic Machine Learning with Theano

PyMC3 is a Python package for Bayesian statistical modeling and Probabilistic Machine Learning focusing on advanced Markov chain Monte Carlo (MCMC) an

PyMC 7.2k Dec 30, 2022
Python data processing, analysis, visualization, and data operations

Python This is a Python data processing, analysis, visualization and data operations of the source code warehouse, book ISBN: 9787115527592 Descriptio

FangWei 1 Jan 16, 2022
Python package to transfer data in a fast, reliable, and packetized form.

pySerialTransfer Python package to transfer data in a fast, reliable, and packetized form.

PB2 101 Dec 07, 2022
Transform-Invariant Non-Negative Matrix Factorization

Transform-Invariant Non-Negative Matrix Factorization A comprehensive Python package for Non-Negative Matrix Factorization (NMF) with a focus on learn

EMD Group 6 Jul 01, 2022
Techdegree Data Analysis Project 2

Basketball Team Stats Tool In this project you will be writing a program that reads from the "constants" data (PLAYERS and TEAMS) in constants.py. Thi

2 Oct 23, 2021