How to use COG's (Cloud optimized GeoTIFFs) with Rasterio

Related tags

Geolocationcog_how_to
Overview

How to use COG's (Cloud optimized GeoTIFFs) with Rasterio

According to Cogeo.org:

A Cloud Opdtimized GeoTIFF (COG) is a regular GeoTIFF file, aimed at being hosted on a HTTP file server, with an internal organization that enables more efficient workflows on the cloud. It does this by leveraging the ability of clients issuing ​HTTP GET range requests to ask for just the parts of a file they need.

Think about the following case: You want to analyze the NDVI of your local 1km² park by using Sentinel 2 geoTIFF imaginery. Sentinel 2 satellite images cover very big regions. In the past, you had to download the whole file (100mb +) for band 4 (red) and the whole file for band 8 (near infrared) even that in fact, you need only a small portion of the data. That's why COG's (cloud optimized geoTIFFs) have been invented. With them, we ask the server to only send specific bytes of the image.

Cloud optimized geoTIFFs offer:

  • efficient imaginery data access
  • reduced duplication of data
  • legacy compatibility

COG's can be read just like normal geoTIFFs. In our example, we will use an AOI (area of interest), that is described in a geoJSON. We will also use sat-search to query the latest available Sentinel-2 satellite imaginery for our specific location. Then we will use Rasterio to perform a range request to download only the parts of the files we need. We will also use Pyproj to perform neccessary coordinate transformations. The cloud optimized Sentinel 2 imaginery is hosted in a AWS S3 repository.

Install libraries (matplotlib optional)

pip install rasterio pyproj sat-search matplotlib

Import libraries

from satsearch import Search
from datetime import datetime, timedelta
from pyproj import Transformer
from json import load

import rasterio
from rasterio.features import bounds

First, we need to open our geoJSON file and extract the geometry. To create a geoJSON, you can go to geojson.io. Do not make a very large geoJSON (a good size is 1x1km²), otherwise you might get an error later.

file_path = "path/to/your/file.geojson"
with open(file_path,"r") as fp:
    file_content = load(fp)
geometry = file_content["features"][0]["geometry"]

We will query for images not older than 60 days that contain less than 20% clouds.

# search last 60 days
current_date = datetime.now()
date_60_days_ago = current_date - timedelta(days=60)
current_date = current_date.strftime("%Y-%m-%d")
date_60_days_ago = date_60_days_ago.strftime("%Y-%m-%d")

# only request images with cloudcover less than 20%
query = {
    "eo:cloud_cover": {
        "lt": 20
        }
    }
search = Search(
    url='https://earth-search.aws.element84.com/v0',
    intersects=geometry,
    datetime=date_60_days_ago + "/" + current_date,
    collections=['sentinel-s2-l2a-cogs'],
    query=query
    )        
# grep latest red && nir
items = search.items()
latest_data = items.dates()[-1]
red = items[0].asset('red')["href"]
nir = items[0].asset('nir')["href"]
print(f"Latest data found that intersects geometry: {latest_data}")
print(f"Url red band: {red}")
print(f"Url nir band: {nir}")

Now we got the URLs of the most recent Sentinel 2 imaginery for our region. In the next step, we need to calculate which pixels to query from our geoTIFF server. The satellite image comes with 10980 x 10980 pixels. Every pixel represents 10 meter ground resolution. In order to calculate which pixels fall into our area of interest, we need to reproject our geoJSON coordinates into pixel row/col. With the recent Rasterio versions, we can read COGs by passing a rasterio.windows.Window (that specifies which row/col to query) to the read function. Before we can query, we need to open a virtual file(urls of a hosted file):

for geotiff_file in [red, nir]:
    with rasterio.open(geotiff_file) as geo_fp:

Then, we calculate the bounding box around our geometry and use the pyproj.Transformer to transform our geoJSON coordinates (EPSG 4326) into Sentinel Sat's EPSG 32633 projection.

        bbox = bounds(geometry)
        coord_transformer = Transformer.from_crs("epsg:4326", geo_fp.crs) 
        # calculate pixels to be streamed in cog 
        coord_upper_left = coord_transformer.transform(bbox[3], bbox[0])
        coord_lower_right = coord_transformer.transform(bbox[1], bbox[2]) 

Now that we have the right coordinates, we can calculate from coordinates to pixels in our geoTIFF file using rasterio.

        pixel_upper_left = geo_fp.index(
            coord_upper_left[0], 
            coord_upper_left[1]
            )
        pixel_lower_right = geo_fp.index(
            coord_lower_right[0], 
            coord_lower_right[1]
            )
        
        for pixel in pixel_upper_left + pixel_lower_right:
            # If the pixel value is below 0, that means that
            # the bounds are not inside of our available dataset.
            if pixel < 0:
                print("Provided geometry extends available datafile.")
                print("Provide a smaller area of interest to get a result.")
                exit()

Now we are ready for the desired range request.

        # make http range request only for bytes in window
        window = rasterio.windows.Window.from_slices(
            (
            pixel_upper_left[0], 
            pixel_lower_right[0]
            ), 
            (
            pixel_upper_left[1], 
            pixel_lower_right[1]
            )
        )
        subset = geo_fp.read(1, window=window)

The subset object contains the desired data. We can access and vizualize it with:

        import matplotlib.pyplot as plt
        plt.imshow(subset, cmap="seismic")
        plt.colorbar()

red nir

I hope, I was able to show you how COG's work and that you are ready now to access your cloud optimized geoTIFF images in seconds compared to minutes in the past. Have a great day!

All together:

from satsearch import Search
from datetime import datetime, timedelta
from pyproj import Transformer
from json import load

import rasterio
from rasterio.features import bounds

file_path = "path/to/your/file.geojson"
with open(file_path,"r") as fp:
    file_content = load(fp)
geometry = file_content["features"][0]["geometry"]

# search last 60 days
current_date = datetime.now()
date_60_days_ago = current_date - timedelta(days=60)
current_date = current_date.strftime("%Y-%m-%d")
date_60_days_ago = date_60_days_ago.strftime("%Y-%m-%d")

# only request images with cloudcover less than 20%
query = {
    "eo:cloud_cover": {
        "lt": 20
        }
    }
search = Search(
    url='https://earth-search.aws.element84.com/v0',
    intersects=geometry,
    datetime=date_60_days_ago + "/" + current_date,
    collections=['sentinel-s2-l2a-cogs'],
    query=query
    )        
# grep latest red && nir
items = search.items()
latest_data = items.dates()[-1]
red = items[0].asset('red')["href"]
nir = items[0].asset('nir')["href"]
print(f"Latest data found that intersects geometry: {latest_data}")
print(f"Url red band: {red}")
print(f"Url nir band: {nir}")

for geotiff_file in [red, nir]:
    with rasterio.open(geotiff_file) as geo_fp:
        bbox = bounds(geometry)
        coord_transformer = Transformer.from_crs("epsg:4326", geo_fp.crs) 
        # calculate pixels to be streamed in cog 
        coord_upper_left = coord_transformer.transform(bbox[3], bbox[0])
        coord_lower_right = coord_transformer.transform(bbox[1], bbox[2]) 
        pixel_upper_left = geo_fp.index(
            coord_upper_left[0], 
            coord_upper_left[1]
            )
        pixel_lower_right = geo_fp.index(
            coord_lower_right[0], 
            coord_lower_right[1]
            )
        
        for pixel in pixel_upper_left + pixel_lower_right:
            # If the pixel value is below 0, that means that
            # the bounds are not inside of our available dataset.
            if pixel < 0:
                print("Provided geometry extends available datafile.")
                print("Provide a smaller area of interest to get a result.")
                exit()
        
        # make http range request only for bytes in window
        window = rasterio.windows.Window.from_slices(
            (
            pixel_upper_left[0], 
            pixel_lower_right[0]
            ), 
            (
            pixel_upper_left[1], 
            pixel_lower_right[1]
            )
        )
        subset = geo_fp.read(1, window=window)

        # vizualize
        import matplotlib.pyplot as plt
        plt.imshow(subset, cmap="seismic")
        plt.colorbar()
        plt.show()
Owner
Marvin Gabler
specialized in climate, data & risk | interested in nature, rockets and outer space | The earth's data for our world's future
Marvin Gabler
A toolbox for processing earth observation data with Python.

eo-box eobox is a Python package with a small collection of tools for working with Remote Sensing / Earth Observation data. Package Overview So far, t

13 Jan 06, 2022
Simple CLI for Google Earth Engine Uploads

geeup: Simple CLI for Earth Engine Uploads with Selenium Support This tool came of the simple need to handle batch uploads of both image assets to col

Samapriya Roy 79 Nov 26, 2022
List of Land Cover datasets in the GEE Catalog

List of Land Cover datasets in the GEE Catalog A list of all the Land Cover (or discrete) datasets in Google Earth Engine. Values, Colors and Descript

David Montero Loaiza 5 Aug 24, 2022
🌐 Local tile server for viewing geospatial raster files with ipyleaflet

🌐 Local Tile Server for Geospatial Rasters Need to visualize a rather large raster (gigabytes) you have locally? This is for you. A Flask application

Bane Sullivan 192 Jan 04, 2023
A utility to search, download and process Landsat 8 satellite imagery

Landsat-util Landsat-util is a command line utility that makes it easy to search, download, and process Landsat imagery. Docs For full documentation v

Development Seed 681 Dec 07, 2022
Python library to visualize circular plasmid maps

Plasmidviewer Plasmidviewer is a Python library to visualize plasmid maps from GenBank. This library provides only the function to visualize circular

Mori Hideto 9 Dec 04, 2022
prettymaps - A minimal Python library to draw customized maps from OpenStreetMap data.

A small set of Python functions to draw pretty maps from OpenStreetMap data. Based on osmnx, matplotlib and shapely libraries.

Marcelo de Oliveira Rosa Prates 9k Jan 08, 2023
Construct and use map tile grids in different projection.

Morecantile +-------------+-------------+ ymax | | | | x: 0 | x: 1 | | y: 0 | y: 0

Development Seed 67 Dec 23, 2022
Create Siege configuration files from Cloud Optimized GeoTIFF.

cogeo-siege Documentation: Source Code: https://github.com/developmentseed/cogeo-siege Description Create siege configuration files from Cloud Optimiz

Development Seed 3 Dec 01, 2022
Track International space station with python

NASA-ISS-tracker Track International space station with python Modules import json import turtle import urllib.request import time import webbrowser i

Nikhil Yadav 8 Aug 12, 2021
A bot that tweets info and location map for new bicycle parking added to OpenStreetMap within a GeoJSON boundary.

Bike parking tweepy bot app A twitter bot app that searches for bicycle parking added to OpenStreetMap. Relies on AWS Lambda/S3, Python3, Tweepy, Flas

Angelo Trivisonno 1 Dec 19, 2021
GeoIP Legacy Python API

MaxMind GeoIP Legacy Python Extension API Requirements Python 2.5+ or 3.3+ GeoIP Legacy C Library 1.4.7 or greater Installation With pip: $ pip instal

MaxMind 230 Nov 10, 2022
FDTD simulator that generates s-parameters from OFF geometry files using a GPU

Emport Overview This repo provides a FDTD (Finite Differences Time Domain) simulator called emport for solving RF circuits. Emport outputs its simulat

4 Dec 15, 2022
Open Data Cube analyses continental scale Earth Observation data through time

Open Data Cube Core Overview The Open Data Cube Core provides an integrated gridded data analysis environment for decades of analysis ready earth obse

Open Data Cube 410 Dec 13, 2022
iNaturalist observations along hiking trails

This tool reads the route of a hike and generates a table of iNaturalist observations along the trails. It also shows the observations and the route of the hike on a map. Moreover, it saves waypoints

7 Nov 11, 2022
A ninja python package that unifies the Google Earth Engine ecosystem.

A Python package that unifies the Google Earth Engine ecosystem. EarthEngine.jl | rgee | rgee+ | eemont GitHub: https://github.com/r-earthengine/ee_ex

47 Dec 27, 2022
Python module to access the OpenCage geocoding API

OpenCage Geocoding Module for Python A Python module to access the OpenCage Geocoder. Build Status / Code Quality / etc Usage Supports Python 3.6 or n

OpenCage GmbH 57 Nov 01, 2022
PySAL: Python Spatial Analysis Library Meta-Package

Python Spatial Analysis Library PySAL, the Python spatial analysis library, is an open source cross-platform library for geospatial data science with

Python Spatial Analysis Library 1.1k Dec 18, 2022
Pure Python NetCDF file reader and writer

Pyncf Pure Python NetCDF file reading and writing. Introduction Inspired by the pyshp library, which provides simple pythonic and dependency free data

Karim Bahgat 14 Sep 30, 2022
Implementation of Trajectory classes and functions built on top of GeoPandas

MovingPandas MovingPandas implements a Trajectory class and corresponding methods based on GeoPandas. Visit movingpandas.org for details! You can run

Anita Graser 897 Jan 01, 2023