Explorative Data Analysis Guidelines

Overview

Explorative Data Analysis

Get data into a usable format!
Find out if the following predictive modeling phase will be successful!

  • Combine everything into a single big table
    • Convert files to .csv
    • Merge files
    • Fix encoding issues
    • Clean column names (english, no whitespace, no special chars)
    • Are there duplicate columns?
    • Fix datatypes (datetime, int, float, string)
  • Look at the raw data
    • Sort data
    • Filter data by various criteria
  • Investigation
    • Non-sensical observations/artifacts?
    • Coding of categorical features?
    • Missing values?
    • Outliers?
    • Constant values (=Zero Importance)?
    • Low importance features?
    • Collinear, correlated or otherwise dependent features?
    • Highly skewed features?
    • Irrelevant features?
  • Univariate Analysis
    • Look at mean, median, min, max, std, iqr, quantiles (1%, 5%, 25%, 50%, 75%, 95%, 99%)
    • Draw boxplots, histograms
  • Multivariate Analysis
    • Draw scatter plots
    • Create correlation matrix
  • Time Series? -> Plot variables over time
  • Fixing issues
    • Impute missing values (mode, median, mean)
    • Remove variables that have too many missings
    • Remove observations that have too many missings
    • Select appropriate time slice
  • Preparation
    • Clip values that are too small/too large
    • Scale to [0,1] or normalize (mean=0, std=1) or Robust / Quantile Scaling
    • One-hot encoding, Label Encoding (0,1,2,3)
    • Create log-transformed versions for highly skewed variables
    • Create binned versions for variables
    • Combine categories for highly skewed categorical variables
    • Create sum/difference/product/quotient of variables
    • Create polynomial features
Owner
Florian Rohrer
Florian Rohrer
VSCode extension that generates docstrings for python files

VSCode Python Docstring Generator Visual Studio Code extension to quickly generate docstrings for python functions. Features Quickly generate a docstr

Nils Werner 506 Jan 03, 2023
🌱 Complete API wrapper of Seedr.cc

Python API Wrapper of Seedr.cc Table of Contents Installation How I got the API endpoints? Start Guide Getting Token Logging with Username and Passwor

Hemanta Pokharel 43 Dec 26, 2022
Build documentation in multiple repos into one site.

mkdocs-multirepo-plugin Build documentation in multiple repos into one site. Setup Install plugin using pip: pip install git+https://github.com/jdoiro

Joseph Doiron 47 Dec 28, 2022
The purpose of this project is to share knowledge on how awesome Streamlit is and can be

Awesome Streamlit The fastest way to build Awesome Tools and Apps! Powered by Python! The purpose of this project is to share knowledge on how Awesome

Marc Skov Madsen 1.5k Jan 07, 2023
Sphinx Bootstrap Theme

Sphinx Bootstrap Theme This Sphinx theme integrates the Bootstrap CSS / JavaScript framework with various layout options, hierarchical menu navigation

Ryan Roemer 584 Nov 16, 2022
Python For Finance Cookbook - Code Repository

Python For Finance Cookbook - Code Repository

Packt 544 Dec 25, 2022
300+ Python Interview Questions

300+ Python Interview Questions

Pradeep Kumar 1.1k Jan 02, 2023
Variable Transformer Calculator

✠ VASCO - VAriable tranSformer CalculatOr Software que calcula informações de transformadores feita para a matéria de "Conversão Eletromecânica de Ene

Arthur Cordeiro Andrade 2 Feb 12, 2022
Yet Another MkDocs Parser

yamp Motivation You want to document your project. You make an effort and write docstrings. You try Sphinx. You think it sucks and it's slow -- I did.

Max Halford 10 May 20, 2022
Markdown documentation generator from Google docstrings

mkgendocs A Python package for automatically generating documentation pages in markdown for Python source files by parsing Google style docstring. The

Davide Nunes 44 Dec 18, 2022
DataAnalysis: Some data analysis projects in charles_pikachu

DataAnalysis DataAnalysis: Some data analysis projects in charles_pikachu You can star this repository to keep track of the project if it's helpful fo

9 Nov 04, 2022
Automated generation of real Swagger/OpenAPI 2.0 schemas from Django REST Framework code.

drf-yasg - Yet another Swagger generator Generate real Swagger/OpenAPI 2.0 specifications from a Django Rest Framework API. Compatible with Django Res

Cristi Vîjdea 3k Dec 31, 2022
An MkDocs plugin to export content pages as PDF files

MkDocs PDF Export Plugin An MkDocs plugin to export content pages as PDF files The pdf-export plugin will export all markdown pages in your MkDocs rep

Terry Zhao 266 Dec 13, 2022
PyPresent - create slide presentations from notes

PyPresent Create slide presentations from notes Add some formatting to text file

1 Jan 06, 2022
Testing-crud-login-drf - Creation of an application in django on music albums

testing-crud-login-drf Creation of an application in django on music albums Befo

Juan 1 Jan 11, 2022
A clean customizable documentation theme for Sphinx

A clean customizable documentation theme for Sphinx

Pradyun Gedam 1.5k Jan 06, 2023
Get link preview of a website.

Preview Link You may have seen a preview of a link with a title, image, domain, and description when you share a link on social media. This preview ha

SREEHARI K.V 8 Jan 08, 2023
charcade is a string manipulation library that can animate, color, and bruteforce strings

charcade charcade is a string manipulation library that can animate, color, and bruteforce strings. Features Animating text for CLI applications with

Aaron 8 May 23, 2022
A document format conversion service based on Pandoc.

reformed Document format conversion service based on Pandoc. Usage The API specification for the Reformed server is as follows: GET /api/v1/formats: L

David Lougheed 3 Jul 18, 2022
Generate a backend and frontend stack using Python and json-ld, including interactive API documentation.

d4 - Base Project Generator Generate a backend and frontend stack using Python and json-ld, including interactive API documentation. d4? What is d4 fo

Markus Leist 3 May 03, 2022