Data Competition: automated systems that can detect whether people are not wearing masks or are wearing masks incorrectly

Overview

Table of contents

  1. Introduction
  2. Dataset
  3. Model & Metrics
  4. How to Run

DATA COMPETITION

The COVID-19 pandemic, which is caused by the SARS-CoV-2 virus, is still continuing strong, infecting hundreds of millions of people and killing millions. Face masks reduce transmission by preventing aerosols and droplets from spreading too far into the atmosphere. As a result, there is a growing demand for automated systems that can detect whether people are not wearing masks or are wearing masks incorrectly. This competition was designed in order to solve the problem mentioned above. This competition is unlike any other that has come before it. With a fixed model, participants will receive model code and configuration code that organizers use to train models. The candidate's task is to use data processing and generation techniques to improve the model's performance, then submit the dataset to the organizing team for training and evaluation on the private test set. The winner is the team with the highest score on the private test set.

Dataset

  • A dataset of 1100 images will be sent to you. This is an object detection dataset consisting of employee images at the office. The dataset has been assigned 3 labels by us which are no mask, mask, and incorrect mask, with the numbers 0,1,2 corresponding to each.

  • The dataset has been divided into three parts for you: train, valid, and public test. We have prepared a private test to be able to evaluate the candidate's model. This private test will be made public after the contest ends. In the public test, you can get a basic idea of the private test. Download the dataset here

  • To improve the model's performance, you can re-label it and employ data augmentation to generate more images (up to 3000).

The number of each label in each part is shown below:

No mask Mask incorrect mask
Train 308 882 51
Val 97 190 9
Public_test 47 95 13

Model & Metrics

  • The challenge is defined as object detection challenge. In the competition, We use YOLOv5s and also use a pre-trained model trained with easy mask dataset to greatly reduce training time.

  • We fix all hyperparameters of the model and do not use any augmentation tips in the source code. Therefore, each participant need to build the best possible dataset by relabeling incorrect labels, splitting train/val, augmentation tips, adding new dataset, etc.

  • In training process, Early Stopping method with patience setten to 100 iterations is used to keep track of validation set's [email protected]. Detail about [email protected] metric:

[email protected] = [email protected] = 0.2 * AP50_w + 0.3 * AP50_nw + 0.5 * AP50_wi

Where,
AP50_w: AP50 on valid mask boxes
AP50_nw: AP50 on non-mask boxes
AP50_wi: AP50 on invalid mask boxes

  • The [email protected] metric is also used as the main metric to evaluate participant's submission on private testing set.

How to Run

QuickStart

Click the image below

Open In Colab

Install requirements

  • All requirements are included in requirements.txt

  • Run the script below to clone and install all requirements

git clone https://github.com/fsoft-ailab/Data-Competition
cd Data-Competition
pip3 install -r requirements.txt

Training

  • Put your dataset into the Data-Competition folder. The structure of dataset folder is followed as folder structure below:
folder-name
├── images
│   ├── train
│   │   ├── train_img1.jpg
│   │   ├── train_img2.jpg
│   │   └── ...
│   │   
│   └── val
│       ├── val_img1.jpg
│       ├── val_img2.jpg
│       └── ...
│   
└── labels
    ├── train
    │   ├── train_img1.txt
    │   ├── train_img2.txt
    │   └── ...
    │   
    └── val
        ├── val_img1.txt
        ├── val_img2.txt
        └── ...
  • Change relative paths to train and val images folder in config/data_cfg.yaml file

  • train_cfg.yaml where we set up the model during training. You should not change such hyperparameters because it will result in incorrect results. The training results are saved in the results/train/ .

  • Run the script below to train the model. Specify particular name to identify your experiment:

python3 train.py --batch-size 64 --device 0 --name 
    

   

Note: If you get out of memory error, you can decrease batch-size to multiple of 2 as 32, 16.

Evaluation

  • Run script below to evaluate on particular dataset.
  • The --task's value is only one of train, val, or test, respectively evaluating on the training set, validation set, or public testing set.
  • Note: Specify relative path to images folder which you evaluate in config/data_cfg.yaml file.
python3 val.py --weights 
   
     --task test --name 
    
      --batch-size 64 --device 0
                                                 val
                                                 train

    
   
  • Results are saved at results/evaluate/ / .

Detection

  • You can use this script to make inferences on particular folder

  • Results are saved at .

python3 detect.py --weights 
   
     --source 
    
      --dir 
     
       --device 0

     
    
   
  • You can find more default arguments at detect.py

References

Owner
Thanh Dat Vu
Thanh Dat Vu
Pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more.

weightedcalcs weightedcalcs is a pandas-based Python library for calculating weighted means, medians, standard deviations, and more. Features Plays we

Jeremy Singer-Vine 98 Dec 31, 2022
Retail-Sim is python package to easily create synthetic dataset of retaile store.

Retailer's Sale Data Simulation Retail-Sim is python package to easily create synthetic dataset of retaile store. Simulation Model Simulator consists

Corca AI 7 Sep 30, 2022
PyStan, a Python interface to Stan, a platform for statistical modeling. Documentation: https://pystan.readthedocs.io

PyStan PyStan is a Python interface to Stan, a package for Bayesian inference. Stan® is a state-of-the-art platform for statistical modeling and high-

Stan 229 Dec 29, 2022
The official pytorch implementation of ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias

ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias Introduction | Updates | Usage | Results&Pretrained Models | Statement | Intr

104 Nov 27, 2022
A collection of learning outcomes data analysis using Python and SQL, from DQLab.

Data Analyst with PYTHON Data Analyst berperan dalam menghasilkan analisa data serta mempresentasikan insight untuk membantu proses pengambilan keputu

6 Oct 11, 2022
Hangar is version control for tensor data. Commit, branch, merge, revert, and collaborate in the data-defined software era.

Overview docs tests package Hangar is version control for tensor data. Commit, branch, merge, revert, and collaborate in the data-defined software era

Tensorwerk 193 Nov 29, 2022
Ejercicios Panda usando Pandas

Readme Below we add configuration details to locally test your application To co

1 Jan 22, 2022
PyPSA: Python for Power System Analysis

1 Python for Power System Analysis Contents 1 Python for Power System Analysis 1.1 About 1.2 Documentation 1.3 Functionality 1.4 Example scripts as Ju

758 Dec 30, 2022
Find exposed data in Azure with this public blob scanner

BlobHunter A tool for scanning Azure blob storage accounts for publicly opened blobs. BlobHunter is a part of "Hunting Azure Blobs Exposes Millions of

CyberArk 250 Jan 03, 2023
fds is a tool for Data Scientists made by DAGsHub to version control data and code at once.

Fast Data Science, AKA fds, is a CLI for Data Scientists to version control data and code at once, by conveniently wrapping git and dvc

DAGsHub 359 Dec 22, 2022
NumPy aware dynamic Python compiler using LLVM

Numba A Just-In-Time Compiler for Numerical Functions in Python Numba is an open source, NumPy-aware optimizing compiler for Python sponsored by Anaco

Numba 8.2k Jan 07, 2023
Two phase pipeline + StreamlitTwo phase pipeline + Streamlit

Two phase pipeline + Streamlit This is an example project that demonstrates how to create a pipeline that consists of two phases of execution. In betw

Rick Lamers 1 Nov 17, 2021
COVID-19 deaths statistics around the world

COVID-19-Deaths-Dataset COVID-19 deaths statistics around the world This is a daily updated dataset of COVID-19 deaths around the world. The dataset c

Nisa Efendioğlu 4 Jul 10, 2022
Probabilistic Programming in Python: Bayesian Modeling and Probabilistic Machine Learning with Theano

PyMC3 is a Python package for Bayesian statistical modeling and Probabilistic Machine Learning focusing on advanced Markov chain Monte Carlo (MCMC) an

PyMC 7.2k Dec 30, 2022
Using approximate bayesian posteriors in deep nets for active learning

Bayesian Active Learning (BaaL) BaaL is an active learning library developed at ElementAI. This repository contains techniques and reusable components

ElementAI 687 Dec 25, 2022
Semi-Automated Data Processing

Perform semi automated exploratory data analysis, feature engineering and feature selection on provided dataset by visualizing every possibilities on each step and assisting the user to make a meanin

Arun Singh Babal 1 Jan 17, 2022
Detecting Underwater Objects (DUO)

Underwater object detection for robot picking has attracted a lot of interest. However, it is still an unsolved problem due to several challenges. We take steps towards making it more realistic by ad

27 Dec 12, 2022
Python implementation of Principal Component Analysis

Principal Component Analysis Principal Component Analysis (PCA) is a dimension-reduction algorithm. The idea is to use the singular value decompositio

Ignacio Darago 1 Nov 06, 2021