Diabetes Prediction with Logistic Regression

Exploratory Data Analysis
Data Preprocessing
Model & Prediction
Model Evaluation
Model Validation: Holdout
Model Validation: 10-Fold Cross Validation
Prediction for A New Observation

Business Problem

The goal is to develop a machine learning model that can predict whether a person has diabetes, given that person's characteristics.

Dataset Story

The data set is part of a larger data set maintained by the National Institute of Diabetes and Digestive and Kidney Diseases in the United States. The data was collected for a diabetes study conducted on Pima Indian women aged 21 years and older living in the city of Phoenix. It consists of 768 observations and 8 numerical independent variables. The target variable is "Outcome": 1 indicates that the diabetes test result is positive, and 0 indicates that it is negative.

Variables

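Pregnancies: Number of pregnancies
Glucose: 2-hour plasma glucose concentration in the oral glucose tolerance test
BloodPressure: Diastolic blood pressure (mm Hg)
SkinThickness: Triceps skin fold thickness (mm)
Insulin: 2-hour serum insulin (mu U/ml)
BMI: Body mass index (weight in kg / (height in m)^2)
DiabetesPedigreeFunction: A function that scores the likelihood of diabetes based on family history
Age: Age in years
Outcome: Whether the person has diabetes (1) or not (0)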
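
As a rough illustration of how these variables might be read into features and a target, here is a minimal sketch that uses only the Python standard library. The column names (including `BMI` and the unspaced `BloodPressure`) are assumptions about the CSV header, `load_rows` is a hypothetical helper, and the two sample rows are illustrative values only.

```python
import csv
import io

# Assumed CSV header: eight independent variables plus the target.
FEATURES = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]
TARGET = "Outcome"


def load_rows(text):
    """Parse CSV text into (X, y): feature rows as floats, labels as 0/1 ints."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in FEATURES + [TARGET] if c not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    X, y = [], []
    for row in reader:
        label = int(row[TARGET])
        if label not in (0, 1):
            raise ValueError(f"unexpected {TARGET} value: {label}")
        X.append([float(row[c]) for c in FEATURES])
        y.append(label)
    return X, y


# Illustrative rows only; the real data set has 768 observations.
SAMPLE = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
"""

X, y = load_rows(SAMPLE)
print(len(X), "rows,", len(X[0]), "features, labels:", y)
```

In practice the same function could be pointed at the contents of the project's CSV file; the file name is not given here, so it is left out.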

In this study, the diabetes data set was examined and a Logistic Regression model was used to predict whether a person has diabetes. First, the dependent variable "Outcome" was examined. In the final step, new variables were derived in an attempt to improve the model's performance. The accuracy and F1 score of the resulting model were 0.63, and the AUC was 0.84. Finally, the model was used to predict whether a randomly selected person has diabetes.
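
To make the three reported metrics concrete, the sketch below computes accuracy, F1 score, and AUC from scratch in plain Python. It is not the project's evaluation code: the labels and predicted probabilities are made up for illustration, the 0.5 decision threshold is an assumption, and none of the numbers come from the study.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for labels coded 1 = positive, 0 = negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Share of observations whose predicted class matches the true class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, y_score):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1, 0.8, 0.3]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 threshold

print("accuracy:", accuracy(y_true, y_pred))
print("f1:", f1_score(y_true, y_pred))
print("auc:", roc_auc(y_true, y_prob))
```

Accuracy and F1 depend on the chosen threshold because they are computed from predicted classes, whereas AUC is computed from the predicted probabilities and only reflects how well positives are ranked above negatives. This is one reason a model can show a noticeably higher AUC than accuracy or F1.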

Owner

AZİZE SULTAN PALALI
