Predicting Global Crop Yield for World Hunger

Overview

Project 5: Predicting Global Crop Yield for World Hunger

Problem Statement

You are a team of data scientists hand-picked by the United Nations to build a machine learning model that helps the UN reach its Zero Hunger goal by 2030. Nearly 1 in 8 people, about 870 million, do not have enough food to lead a healthy life. The planet currently holds 7.9 billion people, and the global population has been increasing steadily and is expected to reach 8.5 billion by 2030. A back-of-envelope calculation shows that ending world hunger by 2030 means planning for nearly 940 million people if the current rate of hunger holds, or up to 1.5 billion if every person added to the population is counted along with those already hungry. Either way, roughly 1 to 1.5 billion people would lack sufficient food. For this reason, your team has been tasked with analyzing global historical crop yield data and showing how machine learning and data science can identify the most important factors related to crop yield, such as temperature, rainfall, irrigation, and pesticides.
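
The back-of-envelope calculation can be reproduced in a few lines. This is a minimal sketch that uses only the figures quoted in the problem statement. The two scenario definitions (a constant hunger rate, and every additional person counted as hungry) are one reading of how the 940 million and 1.5 billion figures were obtained; the source does not spell them out.

```python
# Figures quoted in the problem statement
hungry_now = 870_000_000          # people without enough food today
population_now = 7_900_000_000    # current world population
population_2030 = 8_500_000_000   # projected world population

# Scenario 1 (assumed): the share of hungry people stays constant
hunger_rate = hungry_now / population_now
hungry_at_current_rate = hunger_rate * population_2030

# Scenario 2 (assumed worst case): every additional person is also hungry
hungry_worst_case = hungry_now + (population_2030 - population_now)

print(f"Constant rate: {hungry_at_current_rate / 1e6:.0f} million")  # 936 million
print(f"Worst case:    {hungry_worst_case / 1e9:.2f} billion")       # 1.47 billion
```

Rounded, the two scenarios give the "nearly 940 million" and "up to 1.5 billion" bounds used in the text.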

Project Goal:

  1. Create a model that successfully predicts crop yield from basic agricultural features on a global scale, using longitudinal data

  2. Using this data and these models, can you predict which crops will be the most important to target for worldwide production, and on which continents? What about in which countries?

Executive Summary:

For this work, our main data set was pulled from FAOSTAT (the Food and Agriculture Databank of the FAO). Our goal was to build several types of regression models to predict crop yield, a parameter we consider central to addressing the global hunger crisis and to supporting the UN mission of ending world hunger by 2030. We first cleaned the data by dropping null values and merging the available data sets. In the exploratory data analysis, we visualized the cleaned data to get a better sense of how crop yield relates to the other features. In the modeling phase, we tested various models on two feature sets and selected the strongest one by comparing R2, MAE, RMSE, and MSE scores. We concluded that the AdaBoost Regressor was the best model, reaching an R2 score of 0.96 on the testing set. The features most predictive of our target variable, crop yield, included 'crop potatoes', 'area' (in hectares), and 'fertilizer use'. Our model successfully predicted crop yield in a global data set. We also found that potatoes have a high yield but low levels of production, while other crops such as rice and wheat have high levels of production despite decreasing harvested area, indicating higher agronomic efficiency.
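
One common way to rank predictive features with a fitted AdaBoost model is its `feature_importances_` attribute. The sketch below is illustrative only: the data are synthetic, the column names (`Crop_Potatoes`, `noise`) are hypothetical, and the target is constructed so that the potato dummy dominates. It is not the project's data or its actual ranking procedure.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import AdaBoostRegressor

rng = np.random.default_rng(0)
n = 300

# Synthetic stand-in features (hypothetical names, not the real dataset)
X = pd.DataFrame({
    "Area_ha": rng.uniform(1e3, 1e6, n),
    "Value_N_tonnes": rng.uniform(0, 1e5, n),
    "Crop_Potatoes": rng.integers(0, 2, n),
    "noise": rng.normal(size=n),
})

# Synthetic target: by construction the potato dummy drives most of the yield
y = (150_000 * X["Crop_Potatoes"]
     + 0.5 * X["Value_N_tonnes"]
     + rng.normal(0, 5_000, n))

model = AdaBoostRegressor(random_state=0).fit(X, y)

# Importances are averaged over the boosted trees and sum to 1
importances = (pd.Series(model.feature_importances_, index=X.columns)
               .sort_values(ascending=False))
print(importances)
```

Impurity-based importances can favor high-cardinality or continuous features, so a ranking like this is best read as a first indication rather than a causal statement.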

Data Sources

FAO Data

Our dataset was derived from FAOSTAT (the Food and Agriculture Databank of the FAO). Dataset Link

FAO, the Food and Agriculture Organization of the United Nations, is a specialized agency of the United Nations that leads international efforts to defeat global hunger. With over 194 member states, FAO works in over 130 countries worldwide. About FAO

FAOSTAT provides free access to food and agriculture data for over 245 countries and territories and covers all FAO regional groupings from 1961 to the most recent year available. FAOSTAT data are organized within the following domains:

  • Production
  • Food Security and Nutrition
  • Food Balances
  • Trade
  • Prices
  • Land, Input and Sustainability
  • Population and Employment
  • Investment
  • Macro-Economic Indicators
  • Climate Change
  • Forestry

Data Dictionary

| Column | Type | Description | Example |
| --- | --- | --- | --- |
| Area_code | float64 | FAO code associated with the country | 1 |
| Country | object | Country name | Albania |
| Item_code | float64 | FAO code associated with the crop | 44 |
| Crop | object | Name of the crop | Wheat |
| Year | float64 | Calendar year | 1961 |
| Area_ha | float64 | Harvested area for the crop, in ha | 350000 |
| Yield_hg_ha | float64 | Yield of the crop, in hg/ha | 14000 |
| Value_N_tonnes | float64 | Total N applied in the country, in tonnes | 1000 |
| Value_P_tonnes | float64 | Total P applied in the country, in tonnes | 100 |
| Value_K_tonnes | float64 | Total K applied in the country, in tonnes | 50 |
| pop_unit | object | Unit of pop_value (1000 persons) | 1000 persons |
| pop_value | float64 | Population in thousands (multiply by 1000 to get persons) | 9169.41 |
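
Two of the units above are easy to misread: yield is stored in hectograms per hectare (1 hg = 100 g, so 10,000 hg = 1 tonne), and population is stored in thousands. The helper names below are hypothetical and the inputs are the example values from the data dictionary; this is a small sketch of the conversions, not code from the project.

```python
HG_PER_TONNE = 10_000  # 1 tonne = 1,000 kg = 10,000 hectograms


def yield_tonnes_per_ha(yield_hg_ha):
    """Convert a yield from hg/ha to tonnes/ha."""
    return yield_hg_ha / HG_PER_TONNE


def population_persons(pop_value):
    """Convert a population given in thousands to a whole number of persons."""
    return round(pop_value * 1000)


yield_t = yield_tonnes_per_ha(14000)   # example Yield_hg_ha value
persons = population_persons(9169.41)  # example pop_value

print(yield_t)   # 1.4
print(persons)   # 9169410
```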

Staple Crop Selection

A crop is a plant that can be grown and harvested for food or profit. By use, crops fall into six categories: food crops, feed crops, fiber crops, oil crops, ornamental crops, and industrial crops (Source). For our research we selected the most important food crops based on their share of global caloric intake from all sources. The ranking was based on data from WorldAtlas (Source), Wikipedia (Source), and FAO (Source). We also included barley, as it is the fourth most important cultivated cereal in the world (Source). The selected food crops are:

  • Maize
  • Potato
  • Rice, paddy
  • Wheat
  • Sorghum
  • Cassava
  • Barley
  • Soybeans
  • Yams

Fertilizer

For each crop, we downloaded harvested area and yield data from 1961 through 2019 for all the countries from which FAO collects data. Unfortunately, there are no data on the type and quantity of fertilizer used for each of the crops we selected. Since fertilizer is the most important input in crop production, we decided to use fertilizer data for the entire country as a proxy for the input to each crop. We used data for the three macronutrients: total nitrogen (N), total phosphate (P), and total potash (K). Data for K are not as complete as those for N and P; in many cases no data exist before 1970.

Population

Population data were downloaded for each selected country. Values are expressed in thousands of persons.

Data Import and Handling

All datasets were downloaded as CSV files. To merge the datasets, unique keys were created. When merging data for crop and yield, the key was “CountryYearCrop”. To merge fertilizer and population data, the key was “CountryYear”. After the import and merge, columns were renamed for ease of use and redundant columns were eliminated.
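
The key-based merge described above can be sketched with pandas as follows. The tiny data frames, their values, the `with_key` helper, and the choice of inner and left joins are all assumptions made for illustration; the source does not say which join type was used.

```python
import pandas as pd

# Made-up miniature extracts standing in for the downloaded CSV files
area = pd.DataFrame({"Country": ["Albania", "Albania"], "Year": [1961, 1962],
                     "Crop": ["Wheat", "Wheat"], "Area_ha": [350000.0, 340000.0]})
yields = pd.DataFrame({"Country": ["Albania", "Albania"], "Year": [1961, 1962],
                       "Crop": ["Wheat", "Wheat"], "Yield_hg_ha": [14000.0, 14500.0]})
fertilizer = pd.DataFrame({"Country": ["Albania", "Albania"], "Year": [1961, 1962],
                           "Value_N_tonnes": [1000.0, 1100.0]})
population = pd.DataFrame({"Country": ["Albania", "Albania"], "Year": [1961, 1962],
                           "pop_value": [1600.0, 1650.0]})


def with_key(df, cols, name):
    """Return a copy of df with a string key made by concatenating cols."""
    out = df.copy()
    key = out[cols[0]].astype(str)
    for col in cols[1:]:
        key = key + out[col].astype(str)
    out[name] = key
    return out


# Crop-level merge on "CountryYearCrop"
crop_cols = ["Country", "Year", "Crop"]
left = with_key(area, crop_cols, "CountryYearCrop")
right = with_key(yields, crop_cols, "CountryYearCrop")[["CountryYearCrop", "Yield_hg_ha"]]
crops = left.merge(right, on="CountryYearCrop", how="inner")

# Country-level merge on "CountryYear"
cy_cols = ["Country", "Year"]
crops = with_key(crops, cy_cols, "CountryYear")
fert = with_key(fertilizer, cy_cols, "CountryYear")[["CountryYear", "Value_N_tonnes"]]
pop = with_key(population, cy_cols, "CountryYear")[["CountryYear", "pop_value"]]
merged = crops.merge(fert, on="CountryYear", how="left").merge(pop, on="CountryYear", how="left")

# Drop the helper keys once the merge is done
merged = merged.drop(columns=["CountryYearCrop", "CountryYear"])
print(merged)
```

Selecting only the key and value columns from the right-hand frames before merging avoids duplicated `Country` and `Year` columns. Passing `on=["Country", "Year"]` to `merge` directly would give the same result without building string keys.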

Modeling

Modeling used the dataset created after the initial data cleaning and EDA, and centered on two feature sets for training and testing the models. The first feature set had crop and continent dummy columns; the second had crop, continent, and country dummy columns. The difference is clearest in total size: the first feature set had only 19 features, while the second, which included dummy columns for countries, had 189.
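
The two feature sets can be illustrated with `pandas.get_dummies`. The toy frame below is made up, and the `Continent` column is an assumption (it does not appear in the data dictionary), so the resulting column counts are much smaller than the 19 and 189 reported above.

```python
import pandas as pd

# Made-up rows; "Continent" is an assumed column name
df = pd.DataFrame({
    "Country": ["Albania", "Brazil", "Brazil", "Kenya"],
    "Continent": ["Europe", "Americas", "Americas", "Africa"],
    "Crop": ["Wheat", "Maize", "Soybeans", "Maize"],
    "Year": [1961, 1990, 1990, 2005],
    "Area_ha": [350000.0, 12000000.0, 9000000.0, 1500000.0],
    "Value_N_tonnes": [1000.0, 800000.0, 800000.0, 30000.0],
})

numeric = ["Year", "Area_ha", "Value_N_tonnes"]

# Feature set 1: numeric columns plus crop and continent dummies
X_small = pd.get_dummies(df[numeric + ["Crop", "Continent"]],
                         columns=["Crop", "Continent"])

# Feature set 2: the same, plus one dummy column per country
X_large = pd.get_dummies(df[numeric + ["Crop", "Continent", "Country"]],
                         columns=["Crop", "Continent", "Country"])

print(X_small.shape[1], X_large.shape[1])  # 9 12
```

Country dummies add one column per country, which is why the second feature set grows so quickly on the full dataset.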

We trained seven different models on each of these two feature sets: Linear Regression, K-Nearest Neighbors, Decision Tree Regressor, Bagging Regressor, Random Forest Regressor, AdaBoost Regressor, and Gradient Boosting Regressor. Through numerous trials, we determined that for both feature sets the AdaBoost Regressor had the best overall performance.
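
A comparison loop over the seven model families might look like the sketch below. It runs on synthetic data from `make_regression` with default hyperparameters, so the scores and the resulting ranking say nothing about the project's results. The train/test split ratio and the random seeds are likewise assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (AdaBoostRegressor, BaggingRegressor,
                              GradientBoostingRegressor, RandomForestRegressor)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the cleaned crop dataset
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

models = {
    "Linear Regression": LinearRegression(),
    "K-Nearest Neighbors": KNeighborsRegressor(),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Bagging": BaggingRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(random_state=42),
    "AdaBoost": AdaBoostRegressor(random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    results[name] = {
        "R2": r2_score(y_test, pred),
        "MAE": mean_absolute_error(y_test, pred),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),  # RMSE is the square root of MSE
    }

# Print the models from best to worst test-set R2
for name, s in sorted(results.items(), key=lambda kv: kv[1]["R2"], reverse=True):
    print(f"{name:<20} R2={s['R2']:.3f}  MAE={s['MAE']:.2f}  RMSE={s['RMSE']:.2f}")
```

Scoring on a held-out test set, rather than on the training data, is what makes the R2 values comparable across models with very different capacity.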

Conclusion

  • A machine learning model has value in predicting crop yield and total production

  • Our models can successfully isolate the most important factors for predicting crop yield

  • Crop yield is generally increasing for all major crops, even while harvested area decreases

  • Crop yield will need to be considered with other types of metrics (crop yield / capita, total production, total production per capita) to get a fuller picture of the global hunger crisis

  • More agronomic data will be necessary to correctly predict the yield of each individual crop at the local level
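
The complementary metrics named in the conclusions (total production and production per capita) follow directly from the columns in the data dictionary. The sketch below uses hypothetical function names, and the inputs are the example values from the data dictionary, which do not necessarily belong to the same country and year.

```python
HG_PER_TONNE = 10_000  # 10,000 hectograms in one tonne


def production_tonnes(area_ha, yield_hg_ha):
    """Total production in tonnes: harvested area times yield."""
    return area_ha * yield_hg_ha / HG_PER_TONNE


def production_kg_per_capita(production_t, pop_value):
    """Production per person in kg; pop_value is in thousands of persons."""
    persons = pop_value * 1000
    return production_t * 1000 / persons


total_t = production_tonnes(area_ha=350000, yield_hg_ha=14000)
per_capita_kg = production_kg_per_capita(total_t, pop_value=9169.41)

print(total_t)  # 490000.0
print(f"{per_capita_kg:.1f} kg per person")
```

A crop with a high yield per hectare can still contribute little per person if its harvested area is small, which is why yield alone does not give the full picture.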

Software Requirements

Programming language used: Python

Packages prominently used:

Pandas: Data structures and operations for manipulating numerical tables.

NumPy: Large, multi-dimensional arrays, matrices, and mathematical functions.

Seaborn: Data visualization library built on top of Matplotlib that integrates well with Pandas.

Matplotlib: The base data visualization and plotting library for Python; Seaborn is built on top of it.

Scikit-Learn: A free machine learning library for Python. The specific Scikit-Learn modules used are neighbors, ensemble, pipeline, model_selection, metrics, linear_model, and preprocessing.

Owner
Adam Muhammad Klesc
Hopeful data scientist. Currently in General Assembly and taking their data science immersive course!