A PyTorch implementation of the architecture of Mask RCNN

Overview

EDIT (AS OF 4th NOVEMBER 2019):

  1. This implementation has multiple errors and as of the date 4th, November 2019 is insufficient to be utilized as a resource to understanding the architecture of Mask R-CNN. It has been pointed out to me through multiple emails and comments on HackerNews that such a faulty implementation is to the detriment of the research endeavors in the deep learning community. It was a project that I had put together quite early in my academic career and I did not realize the scale of my mistake

  2. I intend to take care of the issues (the issues filed in this repository are representative) and make this code more "readable" and embellish it with better documentation so that it fulfills the purpose for which it was made. Unfortunately, as of right now, I am busy with my academics and cannot attend to this project. I shall start working on bettering this repository by mid-January to early February 2020. Until then, I have provided links to other implementations of Mask R-CNN that I think could help serve your purpose

  3. PR's fixing any one of the issues listed are always welcome and will allow me to get a headstart on this particular task of making this repository more presentable.

Once again I would like to apologize for any inconvenience caused

LINKS

  1. https://github.com/facebookresearch/detectron2 (PyTorch implementation)
  2. https://github.com/matterport/Mask_RCNN (Tensorflow implementation). Much of this repository was built using this repository as a reference

Mask-RCNN

A PyTorch implementation of the architecture of Mask RCNN

Decription of folders

  1. model.py includes the models of ResNet and FPN which were already implemented by the authors of the papers and reproduced in this implementation
  2. nms and RoiAlign are taken from Robb Girshick's implementation of faster RCNN
  3. Focal loss has been added to this implementtaion on lieu of better results as evidenced by the paper on RetinaNets

Mask-RCNN model:

alt text

Features:

  1. The part of the network responsible for bounding box detection derives it's inspiration from the faster RCNN model having a RPN working in tandem with a ConvNet
  2. The pooling layers present in the ConvNet round down or round up to the nearest integer when the stride is not a divisor of the receptive field, which tends to either lose or assume "information" from the image respectively at the non integral points.
  3. ROI align was proposed to deal with this, wherein bilinear interpolation is used to detect the values at the non integral values of the pixels
  4. Using a more complex interpolation scheme( cubic interpolation -> 16 additional features) offers a slightly better result when this model was tested, however not enough to justify the additional complexity
  5. Cross entropy loss when summed over a huge number of proposals tends to take a huge value for proposals that have a high confidence metric thereby dwarfing the contribution from the proposals of interest. Focal Loss was proposed to do away with this problem
  6. However Focal loss gives much better results with single stage networks. This is because a two stage network has some discriminative policy to deal with this class imbalance something which the single stage networks don't enjoy.

If you find any issue in this repsoritory, feel free to fork this repository and submit a PR with the necessary changes

Owner
Sai Himal Allu
Research Assistant at CVIT-IIITH Ex: Undergrad at IIT Roorkee
Sai Himal Allu
A model which classifies reviews as positive or negative.

SentiMent Analysis In this project I built a model to classify movie reviews fromn the IMDB dataset of 50K reviews. WordtoVec : Neural networks only w

Rishabh Bali 2 Feb 09, 2022
A collection of resources on GAN Inversion.

This repo is a collection of resources on GAN inversion, as a supplement for our survey

A graph neural network (GNN) model to predict protein-protein interactions (PPI) with no sample features

A graph neural network (GNN) model to predict protein-protein interactions (PPI) with no sample features

2 Jul 25, 2022
nn_builder lets you build neural networks with less boilerplate code

nn_builder lets you build neural networks with less boilerplate code. You specify the type of network you want and it builds it. Install pip install n

Petros Christodoulou 157 Nov 20, 2022
Use unsupervised and supervised learning to predict stocks

AIAlpha: Multilayer neural network architecture for stock return prediction This project is meant to be an advanced implementation of stacked neural n

Vivek Palaniappan 1.5k Dec 26, 2022
Free-duolingo-plus - Duolingo account creator that uses your invite code to get you free duolingo plus

free-duolingo-plus duolingo account creator that uses your invite code to get yo

1 Jan 06, 2022
A Convolutional Transformer for Keyword Spotting

☢️ Audiomer ☢️ Audiomer: A Convolutional Transformer for Keyword Spotting [ arXiv ] [ Previous SOTA ] [ Model Architecture ] Results on SpeechCommands

49 Jan 27, 2022
Official implementation of our neural-network-based fast diffuse room impulse response generator (FAST-RIR)

This is the official implementation of our neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment.

12 Jan 13, 2022
The official implementation of CVPR 2021 Paper: Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation.

Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation This repository is the official implementation of CVPR 2021 paper:

9 Nov 14, 2022
Fake videos detection by tracing the source using video hashing retrieval.

Vision Transformer Based Video Hashing Retrieval for Tracing the Source of Fake Videos 🎉️ 📜 Directory Introduction VTL Trace Samples and Acc of Hash

56 Dec 22, 2022
Code for "Single-view robot pose and joint angle estimation via render & compare", CVPR 2021 (Oral).

Single-view robot pose and joint angle estimation via render & compare Yann Labbé, Justin Carpentier, Mathieu Aubry, Josef Sivic CVPR: Conference on C

Yann Labbé 51 Oct 14, 2022
CS50's Introduction to Artificial Intelligence Test Scripts

CS50's Introduction to Artificial Intelligence Test Scripts 🤷‍♂️ What's this? 🤷‍♀️ This repository contains Python scripts to automate tests for mos

Jet Kan 2 Dec 28, 2022
Learning Chinese Character style with conditional GAN

zi2zi: Master Chinese Calligraphy with Conditional Adversarial Networks Introduction Learning eastern asian language typefaces with GAN. zi2zi(字到字, me

Yuchen Tian 2.2k Jan 02, 2023
⚡ H2G-Net for Semantic Segmentation of Histopathological Images

H2G-Net This repository contains the code relevant for the proposed design H2G-Net, which was introduced in the manuscript "Hybrid guiding: A multi-re

André Pedersen 8 Nov 24, 2022
SatelliteSfM - A library for solving the satellite structure from motion problem

Satellite Structure from Motion Maintained by Kai Zhang. Overview This is a libr

Kai Zhang 190 Dec 08, 2022
RID-Noise: Towards Robust Inverse Design under Noisy Environments

This is code of RID-Noise. Reproduce RID-Noise Results Toy tasks Please refer to the notebook ridnoise.ipynb to view experiments on three toy tasks. B

Thyrix 2 Nov 23, 2022
Codes for 'Dual Parameterization of Sparse Variational Gaussian Processes'

Dual Parameterization of Sparse Variational Gaussian Processes Documentation | Notebooks | API reference Introduction This repository is the official

AaltoML 7 Dec 23, 2022
A deep neural networks for images using CNN algorithm.

Example-CNN-Project This is a simple project showing how to implement deep neural networks using CNN algorithm. The dataset is taken from this link: h

Mohammad Amin Dadgar 3 Sep 16, 2022
A high-performance anchor-free YOLO. Exceeding yolov3~v5 with ONNX, TensorRT, NCNN, and Openvino supported.

YOLOX is an anchor-free version of YOLO, with a simpler design but better performance! It aims to bridge the gap between research and industrial communities. For more details, please refer to our rep

7.7k Jan 06, 2023
[EMNLP 2020] Keep CALM and Explore: Language Models for Action Generation in Text-based Games

Contextual Action Language Model (CALM) and the ClubFloyd Dataset Code and data for paper Keep CALM and Explore: Language Models for Action Generation

Princeton Natural Language Processing 43 Dec 16, 2022