MapReader: A computer vision pipeline for the semantic exploration of maps at scale

Overview

MapReader

A computer vision pipeline for the semantic exploration of maps at scale

Continuous integration badge License

MapReader is an end-to-end computer vision (CV) pipeline designed by the Living with Machines project. It has two main components: preprocessing/annotation and training/inference:

MapReader pipeline

MapReader provides a set of tools to:

  • load images/maps stored locally or retrieve maps via web-servers (e.g., tileservers which can be used to retrieve maps from OpenStreetMap (OSM), the National Library of Scotland (NLS), or elsewhere). ⚠️ Refer to the credits and re-use terms section if you are using digitized maps or metadata provided by NLS.
  • preprocess images/maps (e.g., divide them into patches, resampling the images, removing borders outside the neatline or reprojecting the map).
  • annotate images/maps or their patches (i.e. slices of an image/map) using an interactive annotation tool.
  • train, fine-tune, and evaluate various CV models.
  • predict labels (i.e., model inference) on large sets of images/maps.
  • Other functionalities include:
    • various plotting tools using, e.g., matplotlib, cartopy, Google Earth, and kepler.gl.
    • compute mean/standard-deviation pixel intensity of image patches.

Below is an example of MapReader CV model output (see the paper on MapReader for more details):

British railspace and buildings as predicted by a MapReader computer vision model

British 'railspace' and buildings as predicted by a MapReader computer vision model. ~30.5M patches from ~16K nineteenth-century Ordnance Survey map sheets were used (courtesy of the National Library of Scotland). (a) Predicted railspace; (b) predicted buildings; (c) and (d) predicted railspace (red) and buildings (black) in and around Middlesbrough and London, respectively. MapReader extracts information from large images or a set of images at a patch level, as depicted in the insets. For both railspace and buildings, we removed those patches that had no other neighboring patches with the same label within a distance of 250 meters.

Table of contents

Installation

Set up a conda environment

We strongly recommend installation via Anaconda:

conda create -n mr_py38 python=3.8
  • Activate the environment:
conda activate mr_py38

Method 1

  • Install mapreader:
pip install git+https://github.com/Living-with-machines/MapReader.git
python -m ipykernel install --user --name mr_py38 --display-name "Python (mr_py38)"

Method 2

  • Clone mapreader source code:
git clone https://github.com/Living-with-machines/MapReader.git 
cd /path/to/MapReader
poetry install
poetry shell

How to cite MapReader

Please consider acknowledging MapReader if it helps you to obtain results and figures for publications or presentations, by citing:

Link: https://arxiv.org/abs/2111.15592

Kasra Hosseini, Daniel C. S. Wilson, Kaspar Beelen and Katherine McDonough (2021), MapReader: A Computer Vision Pipeline for the Semantic Exploration of Maps at Scale, arXiv:2111.15592.

and in BibTeX:

@misc{hosseini2021mapreader,
      title={MapReader: A Computer Vision Pipeline for the Semantic Exploration of Maps at Scale}, 
      author={Kasra Hosseini and Daniel C. S. Wilson and Kaspar Beelen and Katherine McDonough},
      year={2021},
      eprint={2111.15592},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Credits and re-use terms

Digitized maps

MapReader can retrieve maps from NLS (National Library of Scotland) via webservers. For all the digitized maps (retrieved or locally stored), please note the re-use terms:

⚠️ Use of the digitised maps for commercial purposes is currently restricted by contract. Use of these digitised maps for non-commercial purposes is permitted under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC-BY-NC-SA) licence. Please refer to https://maps.nls.uk/copyright.html#exceptions-os for details on copyright and re-use license.

Metadata

We have provided some metadata files in mapreader/persistent_data. For all these file, please note the re-use terms:

⚠️ Use of the metadata for commercial purposes is currently restricted by contract. Use of this metadata for non-commercial purposes is permitted under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC-BY-NC-SA) licence. Please refer to https://maps.nls.uk/copyright.html#exceptions-os for details on copyright and re-use license.

Acknowledgements

This work was supported by Living with Machines (AHRC grant AH/S01179X/1) and The Alan Turing Institute (EPSRC grant EP/N510129/1). Living with Machines, funded by the UK Research and Innovation (UKRI) Strategic Priority Fund, is a multidisciplinary collaboration delivered by the Arts and Humanities Research Council (AHRC), with The Alan Turing Institute, the British Library and the Universities of Cambridge, East Anglia, Exeter, and Queen Mary University of London.

Comments
  • Update README.md

    Update README.md

    • [x] TODOs: See https://github.com/Living-with-machines/MapReader/pull/38#issuecomment-1109569025
    • [x] Rename Maps / Non-maps to Geospatial / Non-geospatial.
    • [x] @kasra-hosseini Review the changes, check the links and merge.
    opened by kasra-hosseini 24
  • Testing `MapReader`

    Testing `MapReader`

    Hi All 👋🏼

    I will be testing MapReader install and the demo notebooks run to evecute the analysis. I will document my process here

    • [x] Installation
      • [x] Install git clone [email protected]:Living-with-machines/MapReader.git
      • [x] git branch -> * dev
      • [X] git pull origin dev
      • [X] poetry install
      • [X] poetry shell
        • this command was not included in the README.md (unlike conda activate ...)
    • [x] Notebooks code execution
      • [x] 001_retrieve_patchify_plot.ipynb
      • [x] 002_annotation.ipynb
      • [x] 003_train_classifier.ipynb
      • [x] 004_inference.ipynb
    opened by ChristinaLast 18
  • :bug: some errors in `binder` deployment.

    :bug: some errors in `binder` deployment.

    Tasks

    • [x] Fix 'great_circle' is not defined
    • [x] Fix simplekml needs to be installed to create KML outputs!

    Associated tracebacks

    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    /tmp/ipykernel_60/1620428857.py in <module>
          3 
          4 xmin, xmax, ymin, ymax, myimg_shape, size_in_m = \
    ----> 5         mymaps.calc_pixel_width_height(all_maps[0])
    
    /srv/conda/envs/notebook/lib/python3.7/site-packages/mapreader/loader/images.py in calc_pixel_width_height(self, parent_id, calc_size_in_m)
        349 
        350         elif calc_size_in_m in ['gc', 'great-circle']:
    --> 351             bottom = great_circle((ymin, xmin), (ymin, xmax)).meters
        352             right = great_circle((ymin, xmax), (ymax, xmax)).meters
        353             top = great_circle((ymax, xmax), (ymax, xmin)).meters
    
    NameError: name 'great_circle' is not defined
    
    ---------------------------------------------------------------------------
    ModuleNotFoundError                       Traceback (most recent call last)
    /srv/conda/envs/notebook/lib/python3.7/site-packages/mapreader/loader/images.py in _createKML(self, path2kml, value, coords, counter)
        817         try:
    --> 818             import simplekml
        819         except:
    
    ModuleNotFoundError: No module named 'simplekml'
    
    During handling of the above exception, another exception occurred:
    
    ImportError                               Traceback (most recent call last)
    /tmp/ipykernel_60/28836796.py in <module>
          4             save_kml_dir="./kml_tutorial",
          5             figsize=(20, 20),
    ----> 6             image_width_resolution=600)
    
    /srv/conda/envs/notebook/lib/python3.7/site-packages/mapreader/loader/images.py in show(self, image_ids, value, plot_parent, border, border_color, vmin, vmax, colorbar, alpha, discrete_colorbar, tree_level, grid_plot, plot_histogram, save_kml_dir, image_width_resolution, kml_dpi_image, **kwds)
        675                                     value=one_image_id,
        676                                     coords=self.images["parent"][one_image_id]["coord"],
    --> 677                                     counter=-1)
        678                 else:
        679                     plt.title(one_image_id)
    
    /srv/conda/envs/notebook/lib/python3.7/site-packages/mapreader/loader/images.py in _createKML(self, path2kml, value, coords, counter)
        818             import simplekml
        819         except:
    --> 820             raise ImportError("[ERROR] simplekml needs to be installed to create KML outputs!")
        821 
        822         (lon_min, lon_max, lat_min, lat_max) = coords
    
    ImportError: [ERROR] simplekml needs to be installed to create KML outputs!
    
    opened by ChristinaLast 6
  • 🐛 `LoadAnnotations` not returning annotation interface

    🐛 `LoadAnnotations` not returning annotation interface

    When using a local notebook to run through the annotation section of the quick_start notebook, I am unable to see the LoadAnnonations object returned in order to generate new labels! See screen shot below:

    Screenshot 2022-05-09 at 13 58 51

    opened by ChristinaLast 3
  • d actual edits to first para

    d actual edits to first para

    I've restored the order to 'maps' -> 'images' so we get a clearer narative as in the current existing repo; and shortened / combined a sentence, as it was repeating 'non-maps' and 'maps', so I used 'any images' instead to make it more intuitive to read.

    I was also going to add a few sentences giving the nice positive spin about interdisciplinary cross-pollination of image analysis, but not sure where this should go: I don't want to break the flow to the instructions, so perhaps it can go after the bullet points?

    opened by dcsw2 2
  • Deploying `MapReader` through `binder`

    Deploying `MapReader` through `binder`

    • [x] @ChristinaLast and @andrewphilipsmith to walk through binder deployment
      • [x] adding requirements.txt with no hashed libraries for binderhub deployment
    opened by ChristinaLast 2
  • Model inference in one step

    Model inference in one step

    Summary

    Currently, we first need to patchify an image and then do the model inference (in two separate steps). In this issue, we plan to have a method that does both steps, i.e.,

    # example interface
    my_classifier.inference(path2image, **kwds for the slice method, including patch size, ...)
    my_classifier.plot()
    

    TODO

    • Refer to https://github.com/alan-turing-institute/mapreader-plant-scivision. Here, we have a function/method called "predict" that does model inference on an image. Under the hood, it slices an image into patches, does model inference on the patches and then plot the results (and return the predicted labels).
    • It would be interesting to have a similar function/method in MapReader.
    opened by kasra-hosseini 2
  • Dev

    Dev

    Creating requirements.txt from pyproject.toml to generate package list needed for binderhub build

    Commands run:

    • to generate requirements.txt
    poetry export -f requirements.txt --output requirements.txt --without-hashes
    

    After doing this, I am required to add the github repo manually to the requirements.txt file to install MapReader such as:

    git+https://github.com/Living-with-machines/[email protected]#egg=mapreader
    
    opened by ChristinaLast 1
  • Bump ipython from 8.0.0 to 8.0.1

    Bump ipython from 8.0.0 to 8.0.1

    Bumps ipython from 8.0.0 to 8.0.1.

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 1
  • Add plant phenotyping example notebooks and data

    Add plant phenotyping example notebooks and data

    Add directory with cleaned and updated notebooks demonstrating classification of plant patches in images. Also includes examples of open access data that can be used in running these notebooks, annotation files to facilitate annotating plant vs. non-plant patches.

    opened by evangeline-corcoran 1
  • Bump pillow from 8.4.0 to 9.0.0

    Bump pillow from 8.4.0 to 9.0.0

    Bumps pillow from 8.4.0 to 9.0.0.

    Release notes

    Sourced from pillow's releases.

    9.0.0

    https://pillow.readthedocs.io/en/stable/releasenotes/9.0.0.html

    Changes

    ... (truncated)

    Changelog

    Sourced from pillow's changelog.

    9.0.0 (2022-01-02)

    • Restrict builtins for ImageMath.eval(). CVE-2022-22817 #5923 [radarhere]

    • Ensure JpegImagePlugin stops at the end of a truncated file #5921 [radarhere]

    • Fixed ImagePath.Path array handling. CVE-2022-22815, CVE-2022-22816 #5920 [radarhere]

    • Remove consecutive duplicate tiles that only differ by their offset #5919 [radarhere]

    • Improved I;16 operations on big endian #5901 [radarhere]

    • Limit quantized palette to number of colors #5879 [radarhere]

    • Fixed palette index for zeroed color in FASTOCTREE quantize #5869 [radarhere]

    • When saving RGBA to GIF, make use of first transparent palette entry #5859 [radarhere]

    • Pass SAMPLEFORMAT to libtiff #5848 [radarhere]

    • Added rounding when converting P and PA #5824 [radarhere]

    • Improved putdata() documentation and data handling #5910 [radarhere]

    • Exclude carriage return in PDF regex to help prevent ReDoS #5912 [hugovk]

    • Fixed freeing pointer in ImageDraw.Outline.transform #5909 [radarhere]

    • Added ImageShow support for xdg-open #5897 [m-shinder, radarhere]

    • Support 16-bit grayscale ImageQt conversion #5856 [cmbruns, radarhere]

    • Convert subsequent GIF frames to RGB or RGBA #5857 [radarhere]

    ... (truncated)

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 1
  • Satellite images (some references)

    Satellite images (some references)

    • I just had a talk with one of the REG members on https://github.com/urbangrammarai and they are using this tool to download satellite images: https://github.com/urbangrammarai/gee_pipeline/.
    • The other option is : https://planetarycomputer.microsoft.com/
    opened by kasra-hosseini 0
  • Add `min_std_pixel` and `max_std_pixel` to `prepare_annotation`

    Add `min_std_pixel` and `max_std_pixel` to `prepare_annotation`

    So that we can filter out black patches easier. We have trained some MapReader models using ~6K annotated patches (the plant phenotyping project), and now we need to extend the dataset, particularly for non-black patches.

    enhancement 
    opened by kasra-hosseini 1
  • Choose a tool to simplify diffs on .ipynb files.

    Choose a tool to simplify diffs on .ipynb files.

    Consider

    • https://www.reviewnb.com/
    • https://jupyter.org/enhancement-proposals/08-notebook-diff/notebook-diff.html
    • https://blog.ouseful.info/2017/01/27/displaying-differences-in-jupyter-notebooks-nbdime-nbdiff/

    and others

    Build into workflow using pre-commit/CI as appropriate.

    opened by andrewphilipsmith 1
  • Create CODE_OF_CONDUCT.md

    Create CODE_OF_CONDUCT.md

    @DavidBeavan Could you please review this PR? I am using "Contributor Covenant" of GitHub with the following edit:

    Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at https://livingwithmachines.ac.uk/contact-us/. All complaints will be reviewed and investigated promptly and fairly.

    opened by kasra-hosseini 0
  • Adding a notebook containing start of implementation for maps

    Adding a notebook containing start of implementation for maps

    This PR aims to implement the requirements of this issue https://github.com/Living-with-machines/MapReader/issues/36

    For details: See https://hackmd.io/bL3y2cWdT-y3qPGkyVzD5Q?both

    Tasks:

    • [ ] Get or create annotations for example map data
    • [ ] Complete text in HackMD above and transfer it into an appropriate place within the repo. (readme.md or quick_start.ipynb etc)
    • [ ] Resolve all of the questions in the HackMD (whether adding more detail or explicitly deciding to exclude from a quick start guide).
    • [ ] Give the quick_start.ipynb (maps) and quick_start.ipynb (plants) distinct names.
    • [ ] Complete the quick_start.ipynb (maps) to, at least, the same level of detail as the quick_start.ipynb (plants).
    opened by andrewphilipsmith 1
Releases(v0.3.3)
Owner
Living with Machines
A radical collaboration between computational linguists, curators, data scientists, software engineers, geographers and historians
Living with Machines
Employee Turnover Analysis

Employee Turnover Analysis Submission to the DataCamp competition "Can you help reduce employee turnover?"

Jannik Wiedenhaupt 1 Feb 13, 2022
Performance analysis of predictive (alpha) stock factors

Alphalens Alphalens is a Python Library for performance analysis of predictive (alpha) stock factors. Alphalens works great with the Zipline open sour

Quantopian, Inc. 2.5k Jan 09, 2023
Zipline, a Pythonic Algorithmic Trading Library

Zipline is a Pythonic algorithmic trading library. It is an event-driven system for backtesting. Zipline is currently used in production as the backte

Quantopian, Inc. 15.7k Jan 07, 2023
Udacity-api-reporting-pipeline - Udacity api reporting pipeline

udacity-api-reporting-pipeline In this exercise, you'll use portions of each of

Fabio Barbazza 1 Feb 15, 2022
Making the DAEN information accessible.

The purpose of this repository is to make the information on Australian COVID-19 adverse events accessible. The Therapeutics Goods Administration (TGA) keeps a database of adverse reactions to medica

10 May 10, 2022
SparseLasso: Sparse Solutions for the Lasso

SparseLasso: Sparse Solutions for the Lasso Introduction SparseLasso provides a Scikit-Learn based estimation of the Lasso with cross-validation tunin

Gabriel Okasa 1 Nov 08, 2021
cLoops2: full stack analysis tool for chromatin interactions

cLoops2: full stack analysis tool for chromatin interactions Introduction cLoops2 is an extension of our previous work, cLoops. From loop-calling base

YaqiangCao 25 Dec 14, 2022
MotorcycleParts DataAnalysis python

We work with the accounting department of a company that sells motorcycle parts. The company operates three warehouses in a large metropolitan area.

NASEEM A P 1 Jan 12, 2022
A python package which can be pip installed to perform statistics and visualize binomial and gaussian distributions of the dataset

GBiStat package A python package to assist programmers with data analysis. This package could be used to plot : Binomial Distribution of the dataset p

Rishikesh S 4 Oct 17, 2022
Find exposed data in Azure with this public blob scanner

BlobHunter A tool for scanning Azure blob storage accounts for publicly opened blobs. BlobHunter is a part of "Hunting Azure Blobs Exposes Millions of

CyberArk 250 Jan 03, 2023
A tax calculator for stocks and dividends activities.

Revolut Stocks calculator for Bulgarian National Revenue Agency Information Processing and calculating the required information about stock possession

Doino Gretchenliev 200 Oct 25, 2022
Basis Set Format Converter

Basis Set Format Converter Repository for the online tool that allows you to enter a basis set in the form of text input for a variety of Quantum Chem

Manas Sharma 3 Jun 27, 2022
A Pythonic introduction to methods for scaling your data science and machine learning work to larger datasets and larger models, using the tools and APIs you know and love from the PyData stack (such as numpy, pandas, and scikit-learn).

This tutorial's purpose is to introduce Pythonistas to methods for scaling their data science and machine learning work to larger datasets and larger models, using the tools and APIs they know and lo

Coiled 102 Nov 10, 2022
Exploratory data analysis

Exploratory data analysis An Exploratory data analysis APP TAPIWA CHAMBOKO 🚀 About Me I'm a full stack developer experienced in deploying artificial

tapiwa chamboko 1 Nov 07, 2021
A stock analysis app with streamlit

StockAnalysisApp A stock analysis app with streamlit. You select the ticker of the stock and the app makes a series of analysis by using the price cha

Antonio Catalano 50 Nov 27, 2022
ELFXtract is an automated analysis tool used for enumerating ELF binaries

ELFXtract ELFXtract is an automated analysis tool used for enumerating ELF binaries Powered by Radare2 and r2ghidra This is specially developed for PW

Monish Kumar 49 Nov 28, 2022
An easy-to-use feature store

A feature store is a data storage system for data science and machine-learning. It can store raw data and also transformed features, which can be fed straight into an ML model or training script.

ByteHub AI 48 Dec 09, 2022
Wafer Fault Detection - Wafer circleci with python

Wafer Fault Detection Problem Statement: Wafer (In electronics), also called a slice or substrate, is a thin slice of semiconductor, such as a crystal

Avnish Yadav 14 Nov 21, 2022
API>local_db>AWS_RDS - Disclaimer! All data used is for educational purposes only.

APIlocal_dbAWS_RDS Disclaimer! All data used is for educational purposes only. ETL pipeline diagram. Aim of project By creating a fully working pipe

0 Apr 25, 2022
Investigating EV charging data

Investigating EV charging data Introduction: Got an opportunity to work with a home monitoring technology company over the last 6 months whose goal wa

Yash 2 Apr 07, 2022