Computer vision scripts to recognize first-person motion, developed as the final project for the course "Machine Learning and Deep Learning"

Last update: Jul 16, 2022

Related tags

Deep Learning Imprinting-the-Motion

Overview

Overview of The Code

BaseColab/MLDL_FPAR.pdf: the full explanation of our work
Base Colab: the base Colab notebook used to run all the training required for the project
Script: the scripts used in Colab to call the correct module
module.py: the Python module files called by the scripts

Imprinting The Motion

Abstract

First-person action recognition (FPAR) is one of the most challenging tasks in the action recognition field. Most existing works address it with two-stream architectures that exploit both the visual appearance and the motion information of the object of interest. In this paper, we start from the Ego-RNN architecture extended with the Motion Segmentation (MS) auxiliary task. We propose injecting a new branch into the architecture to employ the motion information more effectively, which leads to better predictions.

Our Architecture

The Action Recognition Block extracts important spatial and temporal information from the video using a ResNet-34 (mustard), a Spatial Attention Layer (yellow) and a ConvLSTM (orange). Moreover, it takes advantage of the auxiliary task of the Motion Prediction Block by embedding its knowledge inside the first layers of the backbone. This is done with a feedback branch (blue) that takes as input the features of the motion segmentation (MS) task (green). The Motion Prediction Block takes as input the appearance features from layer 4 of the ResNet and tries to identify which parts of the image are going to move.
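The feedback idea described above can be illustrated with a small, dependency-free sketch. It is not the code from module.py: the function names (upsample_nearest, fuse_motion), the nearest-neighbour upsampling and the multiplicative (1 + motion) gate are all assumptions chosen for illustration, since the README does not state how the MS features are merged into the early layers. The sketch only shows the general shape of the operation: a low-resolution motion map is brought to the resolution of an early feature map and used to emphasize the regions predicted to move.

```python
from typing import List

Grid = List[List[float]]  # a single-channel 2D map stored as rows of floats


def upsample_nearest(grid: Grid, factor: int) -> Grid:
    """Nearest-neighbour upsampling: repeat every cell `factor` times per axis."""
    out: Grid = []
    for row in grid:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def fuse_motion(early_features: List[Grid], motion_map: Grid) -> List[Grid]:
    """Hypothetical feedback fusion: scale each channel by (1 + motion).

    Regions that the motion map marks as moving are amplified, while
    static regions pass through unchanged (the gate behaves like a
    residual connection). The real project may fuse features differently.
    """
    height = len(early_features[0])
    factor = height // len(motion_map)  # assumes an integer size ratio
    gate = upsample_nearest(motion_map, factor)
    return [
        [[f * (1.0 + g) for f, g in zip(f_row, g_row)]
         for f_row, g_row in zip(channel, gate)]
        for channel in early_features
    ]


# Toy example: one 4x4 early feature channel and a 2x2 motion map
# in which only the top-right quadrant is predicted to move.
features = [[[1.0] * 4 for _ in range(4)]]
motion = [[0.0, 1.0],
          [0.0, 0.0]]
fused = fuse_motion(features, motion)
for row in fused[0]:
    print(row)
```

In the toy example only the top-right quadrant of the feature map is doubled; the other values are left untouched. In a real model the same step would operate on tensors and would typically be learned (for instance with a convolution on the MS features) rather than fixed.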

Results

Our architecture further exploits the motion information provided by the motion segmentation task by merging it with the appearance features in the first layers of the backbone. The result is a model that focuses better on the elements relevant for action recognition, which leads to the correct prediction (shake tea cup instead of stir spoon cup).

Owner

Simone Papicchio
