[ICME 2021 Oral] CORE-Text: Improving Scene Text Detection with Contrastive Relational Reasoning

Last update: Aug 11, 2022

Overview

This repository is the official PyTorch implementation of CORE-Text, and contains demo training and evaluation scripts.

Requirements

mmdetection == 2.13.0
mmcv == 1.3.5
pyclipper == 1.3.0
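
Because the versions above are pinned exactly, it can help to check the environment before launching a run. The following is a minimal sketch, not part of the repository. It assumes that mmdetection is installed under the distribution name `mmdet` and that mmcv may appear as either `mmcv` or `mmcv-full`; the helper names `installed_version` and `check_requirements` are hypothetical.

```python
from importlib import metadata

# Pins copied from the Requirements list. The distribution names are
# assumptions: mmdetection is published as "mmdet", and mmcv may be
# installed as either "mmcv" or "mmcv-full".
REQUIRED = {
    "mmdet": "2.13.0",
    "mmcv": "1.3.5",
    "pyclipper": "1.3.0",
}
ALIASES = {"mmcv": ("mmcv", "mmcv-full")}


def installed_version(name):
    """Return the installed version of `name` (or one of its aliases), else None."""
    for candidate in ALIASES.get(name, (name,)):
        try:
            return metadata.version(candidate)
        except metadata.PackageNotFoundError:
            continue
    return None


def check_requirements(required=REQUIRED, lookup=installed_version):
    """Return a list of problems; an empty list means every pin matches."""
    problems = []
    for name, wanted in required.items():
        found = lookup(name)
        if found is None:
            problems.append(f"{name}: not installed (need {wanted})")
        elif found != wanted:
            problems.append(f"{name}: found {found}, need {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_requirements() or ["all pinned requirements satisfied"]:
        print(line)
```

The `lookup` parameter exists only so the comparison logic can be exercised without the packages installed.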

Training Demo

Base (Mask R-CNN)

To train Base (Mask R-CNN) on a single node with 4 GPUs, run:

#!/usr/bin/env bash

GPUS=4
PORT=${PORT:-29500}
PYTHON=${PYTHON:-"python"}

CONFIG=configs/icdar2017mlt/base.py
WORK_DIR=work_dirs/mask_rcnn_r50_fpn_train_base

$PYTHON -m torch.distributed.launch --nproc_per_node=$GPUS \
                                    --nnodes=1 --node_rank=0 --master_addr="localhost" \
                                    --master_port=$PORT \
                                    tools/train.py \
                                    $CONFIG \
                                    --no-validate \
                                    --launcher pytorch \
                                    --work-dir ${WORK_DIR} \
                                    --seed 0

VRM

To train VRM on a single node with 4 GPUs, run:

#!/usr/bin/env bash

GPUS=4
PORT=${PORT:-29500}
PYTHON=${PYTHON:-"python"}

CONFIG=configs/icdar2017mlt/vrm.py
WORK_DIR=work_dirs/mask_rcnn_r50_fpn_train_vrm

$PYTHON -m torch.distributed.launch --nproc_per_node=$GPUS \
                                    --nnodes=1 --node_rank=0 --master_addr="localhost" \
                                    --master_port=$PORT \
                                    tools/train.py \
                                    $CONFIG \
                                    --no-validate \
                                    --launcher pytorch \
                                    --work-dir ${WORK_DIR} \
                                    --seed 0

CORE

To train CORE (ours) on a single node with 4 GPUs, run:

#!/usr/bin/env bash

GPUS=4
PORT=${PORT:-29500}
PYTHON=${PYTHON:-"python"}

# pre-training
CONFIG=configs/icdar2017mlt/core_pretrain.py
WORK_DIR=work_dirs/mask_rcnn_r50_fpn_train_core_pretrain

$PYTHON -m torch.distributed.launch --nproc_per_node=$GPUS \
                                    --nnodes=1 --node_rank=0 --master_addr="localhost" \
                                    --master_port=$PORT \
                                    tools/train.py \
                                    $CONFIG \
                                    --no-validate \
                                    --launcher pytorch \
                                    --work-dir ${WORK_DIR} \
                                    --seed 0

# training
CONFIG=configs/icdar2017mlt/core.py
WORK_DIR=work_dirs/mask_rcnn_r50_fpn_train_core

$PYTHON -m torch.distributed.launch --nproc_per_node=$GPUS \
                                    --nnodes=1 --node_rank=0 --master_addr="localhost" \
                                    --master_port=$PORT \
                                    tools/train.py \
                                    $CONFIG \
                                    --no-validate \
                                    --launcher pytorch \
                                    --work-dir ${WORK_DIR} \
                                    --seed 0
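
The two CORE stages above could also be driven from Python, which makes it easy to stop when pre-training fails. This is only a sketch: `launch_command`, `run_stages`, and `CORE_STAGES` are hypothetical names, while the flags, config paths, and work directories are copied from the script above. It defaults to a dry run that prints the commands without executing them.

```python
import os
import subprocess


def launch_command(config, work_dir, gpus=4, port=None, python="python", seed=0):
    """Build the torch.distributed.launch command used in the shell script."""
    port = port or os.environ.get("PORT", "29500")
    return [
        python, "-m", "torch.distributed.launch",
        f"--nproc_per_node={gpus}",
        "--nnodes=1", "--node_rank=0", "--master_addr=localhost",
        f"--master_port={port}",
        "tools/train.py", config,
        "--no-validate",
        "--launcher", "pytorch",
        "--work-dir", work_dir,
        "--seed", str(seed),
    ]


# Stage order taken from the script above: pre-training first, then training.
CORE_STAGES = [
    ("configs/icdar2017mlt/core_pretrain.py",
     "work_dirs/mask_rcnn_r50_fpn_train_core_pretrain"),
    ("configs/icdar2017mlt/core.py",
     "work_dirs/mask_rcnn_r50_fpn_train_core"),
]


def run_stages(stages=CORE_STAGES, dry_run=True):
    """Print (dry run) or execute each stage in order; return the commands."""
    commands = [launch_command(cfg, work_dir) for cfg, work_dir in stages]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError, so a failed stage
            # prevents the next one from starting.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_stages(dry_run=True)
```

Unlike the shell script, which has no `set -e` and would proceed to the second stage regardless, `run_stages(dry_run=False)` stops at the first failing stage.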

Evaluation Demo

GPUS=4
PORT=${PORT:-29500}
CONFIG=path/to/config
CHECKPOINT=path/to/checkpoint

python -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT \
    ./tools/test.py $CONFIG $CHECKPOINT --launcher pytorch \
    --eval segm \
    --not-encode-mask \
    --eval-options "jsonfile_prefix=path/to/work_dir/results/eval" "gt_path=data/icdar2017mlt/icdar2017mlt_gt.zip"

Dataset Format

The dataset directory is organized as shown below. We provide the COCO-format labels (ICDAR2017_train.json and ICDAR2017_val.json) and the ground-truth zip file (icdar2017mlt_gt.zip) for training and evaluation.

data
└── icdar2017mlt
    ├── annotations
    │   ├── ICDAR2017_train.json
    │   └── ICDAR2017_val.json
    ├── icdar2017mlt_gt.zip
    └── image
        ├── train
        └── val
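
Before training, the layout above can be verified with a few lines of Python. This is a minimal sketch; `missing_entries` and `EXPECTED` are hypothetical names, and the default root `data/icdar2017mlt` is taken from the tree above, relative to the repository root.

```python
from pathlib import Path

# Relative paths taken from the directory tree above.
EXPECTED = [
    "annotations/ICDAR2017_train.json",
    "annotations/ICDAR2017_val.json",
    "icdar2017mlt_gt.zip",
    "image/train",
    "image/val",
]


def missing_entries(root="data/icdar2017mlt", expected=EXPECTED):
    """Return the expected files or directories that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print("missing under data/icdar2017mlt:")
        for rel in missing:
            print("  " + rel)
    else:
        print("dataset layout looks complete")
```

The check only tests for existence; it does not validate the annotation JSON or the contents of the zip file.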

Results

Our models achieve the following performance on the ICDAR 2017 MLT val set. Note that the results differ slightly (~0.1%) from those reported in the paper because we reimplemented the code on top of the open-source mmdetection.

| Method | Backbone | Training set | Test set | Hmean | Precision | Recall | Download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Base (Mask R-CNN) | ResNet50 | ICDAR 2017 MLT Train | ICDAR 2017 MLT Val | 0.800 | 0.828 | 0.773 | model \| log |
| VRM | ResNet50 | ICDAR 2017 MLT Train | ICDAR 2017 MLT Val | 0.812 | 0.853 | 0.774 | model \| log |
| CORE (ours) | ResNet50 | ICDAR 2017 MLT Train | ICDAR 2017 MLT Val | 0.821 | 0.872 | 0.777 | model \| log |
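
The Hmean column can be sanity-checked against the other two columns. The sketch below assumes that Hmean is the harmonic mean of precision and recall, which is the usual convention in ICDAR-style evaluation; the function name `hmean` is hypothetical. Because the table lists precision and recall rounded to three decimals, the recomputed value may differ from the reported one in the last digit.

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported Hmean) copied from the results table.
RESULTS = {
    "Base (Mask R-CNN)": (0.828, 0.773, 0.800),
    "VRM": (0.853, 0.774, 0.812),
    "CORE (ours)": (0.872, 0.777, 0.821),
}

for method, (p, r, reported) in RESULTS.items():
    print(f"{method}: recomputed {hmean(p, r):.3f}, reported {reported:.3f}")
```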

Citation

@inproceedings{9428457,
  author={Lin, Jingyang and Pan, Yingwei and Lai, Rongfeng and Yang, Xuehang and Chao, Hongyang and Yao, Ting},
  booktitle={2021 IEEE International Conference on Multimedia and Expo (ICME)},
  title={Core-Text: Improving Scene Text Detection with Contrastive Relational Reasoning},
  year={2021},
  pages={1-6},
  doi={10.1109/ICME51207.2021.9428457}
}

Owner

Jingyang Lin
