An application that maps an image of a LaTeX math equation to LaTeX code.

Overview

Image to LaTeX

Code style: black pre-commit License

An application that maps an image of a LaTeX math equation to LaTeX code.

Image to Latex streamlit app

Introduction

The problem of image-to-markup generation has been attempted by Deng et al. (2016). They provide the raw and preprocessed versions of im2latex-100K, a dataset consisting of about 100K LaTeX math equation images. Using their dataset, I trained a model that uses ResNet-18 as encoder (up to layer3) and a Transformer as decoder with cross-entropy loss.

Initially, I used the preprocessed dataset to train my model, but the preprocessing turned out to be a huge limitation. Although the model can achieve a reasonable performance on the test set, it performs poorly if the image quality, padding, or font size is different from the images in the dataset. This phenomenon has also been observed by others who have attempted the same problem using the same dataset (e.g., this project, this issue and this issue). This is most likely due to the rigid preprocessing for the dataset (e.g. heavy downsampling).

To this end, I used the raw dataset and included image augmentation (e.g. random scaling, small rotation) in my data processing pipeline to increase the diversity of the samples. Moreover, unlike Deng et al. (2016), I did not group images by size. Rather, I sampled them uniformly and padded them to the size of the largest image in the batch, to increase the generalizability of the model.

Additional problems that I found in the dataset:

  • Some latex code produces visually identical outputs (e.g. \left( and \right) look the same as ( and )), so I normalized them.
  • Some latex code is used to add space (e.g. \vspace{2px} and \hspace{0.3mm}). However, the length of the space is diffcult to judge. Also, I don't want the model generates code on blank images, so I removed them.

The best run has a character error rate (CER) of 0.17 in test set. Most errors seem to come from unnecessary horizontal spacing, e.g., \;, \, and \qquad. (I only removed \vspace and \hspace during preprocessing. I did not know that LaTeX has so many horizontal spacing commands.)

Possible improvements include:

  • Do a better job cleaning the data (e.g., removing spacing commands)
  • Train the model for more epochs (for the sake of time, I only trained the model for 15 epochs, but the validation loss is still going down)
  • Use beam search (I only implemented greedy search)
  • Use a larger model (e.g., use ResNet-34 instead of ResNet-18)
  • Do some hyperparameter tuning

I didn't do any of these, because I had limited computational resources (I was using Google Colab).

How To Use

Setup

Clone the repository to your computer and position your command line inside the repository folder:

git clone https://github.com/kingyiusuen/image-to-latex.git
cd image-to-latex

Then, create a virtual environment named venv and install required packages:

make venv
make install-dev

Data Preprocessing

Run the following command to download the im2latex-100k dataset and do all the preprocessing. (The image cropping step may take over an hour.)

python scripts/prepare_data.py

Model Training and Experiment Tracking

Model Training

An example command to start a training session:

python scripts/run_experiment.py trainer.gpus=1 data.batch_size=32

Configurations can be modified in conf/config.yaml or in command line. See Hydra's documentation to learn more.

Experiment Tracking using Weights & Biases

The best model checkpoint will be uploaded to Weights & Biases (W&B) automatically (you will be asked to register or login to W&B before the training starts). Here is an example command to download a trained model checkpoint from W&B:

python scripts/download_checkpoint.py RUN_PATH

Replace RUN_PATH with the path of your run. The run path should be in the format of //. To find the run path for a particular experiment run, go to the Overview tab in the dashboard.

For example, you can use the following command to download my best run

python scripts/download_checkpoint.py kingyiusuen/image-to-latex/1w1abmg1

The checkpoint will be downloaded to a folder named artifacts under the project directory.

Testing and Continuous Integration

The following tools are used to lint the codebase:

isort: Sorts and formats import statements in Python scripts.

black: A code formatter that adheres to PEP8.

flake8: A code linter that reports stylistic problems in Python scripts.

mypy: Performs static type checking in Python scripts.

Use the following command to run all the checkers and formatters:

make lint

See pyproject.toml and setup.cfg at the root directory for their configurations.

Similar checks are done automatically by the pre-commit framework when a commit is made. Check out .pre-commit-config.yaml for the configurations.

Deployment

An API is created to make predictions using the trained model. Use the following command to get the server up and running:

make api

You can explore the API via the generated documentation at http://0.0.0.0:8000/docs.

To run the Streamlit app, create a new terminal window and use the following command:

make streamlit

The app should be opened in your browser automatically. You can also open it by visiting http://localhost:8501. For the app to work, you need to download the artifacts of an experiment run (see above) and have the API up and running.

To create a Docker image for the API:

make docker

Acknowledgement

Remove Background from Image With Python

Install Library pypi $ pip3 install xremovebg

krypton 4 May 26, 2022
QR Code Generator

In this project, we'll be using some libraries to instantly generate authentic QR Codes and export them in various formats

Hassan Shahzad 3 Jun 02, 2022
A procedural Blender pipeline for photorealistic training image generation

BlenderProc2 A procedural Blender pipeline for photorealistic rendering. Documentation | Tutorials | Examples | ArXiv paper | Workshop paper Features

DLR-RM 1.8k Jan 02, 2023
Semi-hash-based Image Generator

pixel-planet Semi-hash-based Image Generator Utilizable for NFTs Generation Process Input is salted and hashed Colors (background, planet, stars) are

Bill Ni 3 Sep 01, 2022
MetaStalk is a tool that can be used to generate graphs from the metadata of JPEG, TIFF, and HEIC images

MetaStalk About MetaStalk is a tool that can be used to generate graphs from the metadata of JPEG, TIFF, and HEIC images, which are tested. More forma

Cyb3r Jak3 1 Jul 05, 2021
An python script to convert images to upscaled versions made out of one-colour emojis.

ABOUT This is an python script to convert png, jpg and gif(output isnt animated :( ) images to scaled versions made out of one-colour emojis. Please n

0 Oct 19, 2022
Short piece of code to create a rainbow gif of gradual contours from two shapefiles

rainbow-elevation-gif Short piece of code to create a rainbow gif of gradual con

Jess Roberts 6 Jan 17, 2022
An ascii art generator that's actually good. Does edge detection and selects the most appropriate characters.

Ascii Artist An ascii art generator that's actually good. Does edge detection and selects the most appropriate characters. Installing Installing with

18 Jan 03, 2023
Hello, this project is an example of how to generate a QR Code using python 😁

Hello, this project is an example of how to generate a QR Code using python 😁

Davi Antonaji 2 Oct 12, 2021
A little Python tool to convert a TrueType (ttf/otf) font into a PNG for use in demos.

font2png A little Python tool to convert a TrueType (ttf/otf) font into a PNG for use in demos. To use from command line it expects python3 to be at /

Rich Elmes 3 Dec 22, 2021
Plots is a graph plotting app for GNOME.

Plots is a graph plotting app for GNOME. Plots makes it easy to visualise mathematical formulae. In addition to basic arithmetic operations, it supports trigonometric, hyperbolic, exponential and log

Alex Huntley 138 Dec 14, 2022
A large-scale dataset of both raw MRI measurements and clinical MRI images

fastMRI is a collaborative research project from Facebook AI Research (FAIR) and NYU Langone Health to investigate the use of AI to make MRI scans faster. NYU Langone Health has released fully anonym

Facebook Research 907 Jan 04, 2023
Hacking github graph with a easy python script

Hacking-Github-Graph Hacking github graph with a easy python script Requirements git latest version installed. A text editor (eg: vs code, sublime tex

SENPAI LEGEND 1 Nov 01, 2021
Depix is a tool for recovering passwords from pixelized screenshots.

This implementation works on pixelized images that were created with a linear box filter. In this article I cover background information on pixelization and similar research.

23.1k Jan 04, 2023
A utility for quickly cropping large collections of images.

Crop Tool A utility for quickly cropping large collections of images. Inspired by Derrick Schultz's dataset-tools. Setup It's suggested that you use A

dusk (they/them) 6 Nov 14, 2021
Convert any binary data to a PNG image file and vice versa.

What is PngBin? The name PngBin comes from an image format file extension PNG (Portable Network Graphics) and the word Binary. An image produced by Pn

Nathan Young 87 Dec 22, 2022
ć€ččŽ‰å›¾ē‰‡ē®—ę³•ć€‘é«˜ęŸå›¾åƒåŽ‹ē¼©ē®—ę³•ļ¼ļ¼Ÿ

ć€ččŽ‰å›¾ē‰‡ē®—ę³•ć€‘é«˜ęŸå›¾åƒåŽ‹ē¼©ē®—ę³•ļ¼ļ¼Ÿ ęˆ‘åˆå‘ę˜Žå‡ŗę–°ē®—ę³•äŗ†ļ¼ čæ™ę¬”ęˆ‘å‘ę˜Žēš„ę˜Æę–°åž‹é«˜ęŸå›¾åƒåŽ‹ē¼©ē®—ę³•ā€”ā€”ččŽ‰å›¾ē‰‡ē®—ę³•ļ¼äøŗä»€ä¹ˆę˜ÆččŽ‰å›¾ē‰‡ļ¼Œčæ™ę˜Æå› äøŗå®ƒę˜Æä½æåŠØē”Øę³•ļ¼Œč®©å›¾ē‰‡å˜å°ę‰€ä»„ę˜ÆččŽ‰å›¾ē‰‡ļ¼Œå¤§å®¶äø€å®šč¦å­¦å„½čÆ­ę–‡å“¦ļ¼ åŽ‹ē¼©ę•ˆęžœ å¤Ŗē„žå„‡äŗ†ļ¼åŽ‹ē¼©ēŽ‡ē«Ÿē„¶é«˜č¾¾99.97%! äøŽåøøč§åŽ‹ē¼©ē®—ę³•åÆ¹ęÆ” åœØå›¾ē‰‡ęœ€ē»ˆå¤§å°äøŗ1KBēš„ęƒ…å†µ

黄巍 49 Oct 17, 2022
Python-based tools for document analysis and OCR

ocropy OCRopus is a collection of document analysis programs, not a turn-key OCR system. In order to apply it to your documents, you may need to do so

OCRopus 3.2k Jan 04, 2023
Change the image one color channel at a time.

Building-a-Contact-Sheet This hands-on Project is in Python 3 Programming Specialization offered by University of Michigan via Coursera. change the im

Eszter Pai 1 Jan 03, 2022
Quickly 'anonymize' all people in an image. This script will put a black bar over all eye-pairs in an image

Face-Detacher Quickly 'anonymize' all people in an image. This script will put a black bar over all eye-pairs in an image This is a small python scrip

Don Cato 1 Oct 29, 2021