Towards Long-Form Video Understanding

Last update: Dec 26, 2022

Chao-Yuan Wu, Philipp Krähenbühl, CVPR 2021

[Paper] [Project Page] [Dataset]

Citation

@inproceedings{lvu2021,
  Author    = {Chao-Yuan Wu and Philipp Kr\"{a}henb\"{u}hl},
  Title     = {{Towards Long-Form Video Understanding}},
  Booktitle = {{CVPR}},
  Year      = {2021}}

Overview

This repo implements Object Transformers for long-form video understanding.

Getting Started

Please organize data/ as follows:

data
|_ ava
|_ features
|_ instance_meta
|_ lvu_1.0

ava, features, and instance_meta can be found in this Google Drive folder. lvu_1.0 can be found here.

Please also download the pre-trained weights from this Google Drive folder and put them in pretrained_models/.
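
Before launching a run, it can help to confirm that everything is in place. The following is a minimal sketch, not part of the repo: the helper name missing_paths is hypothetical, and it assumes the script is run from the repo root and that pretrained_models/ sits next to data/.

```python
from pathlib import Path

# Entry names are copied from the layout above; the helper itself is hypothetical.
EXPECTED_DATA_ENTRIES = ("ava", "features", "instance_meta", "lvu_1.0")


def missing_paths(repo_root="."):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    expected = [root / "data" / name for name in EXPECTED_DATA_ENTRIES]
    # Assumption: pretrained_models/ lives at the repo root, next to data/.
    expected.append(root / "pretrained_models")
    return [str(p) for p in expected if not p.exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing paths:")
        for path in missing:
            print("  " + path)
    else:
        print("Data layout looks complete.")
```

The check only looks at the top-level names; it does not verify the contents of each folder.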

Pre-training

python3 -u run_pretrain.py

This pretrains on a small demo dataset, data/instance_meta/instance_meta_pretrain_demo.pkl, as an example. Please follow its file format if you'd like to pretrain on a larger dataset (e.g., the latest full version of MovieClips).
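
The file format is not described here, so one way to learn it is to load the demo file and look at its top-level structure. This is a rough sketch under assumptions: summarize_pickle is a hypothetical helper, and the file is assumed to unpickle with the standard library alone (if it stores custom classes, run it from inside the repo so they can be imported). Only unpickle files from a source you trust.

```python
import os
import pickle


def summarize_pickle(path):
    """Load a pickle file and return (type name, length or None)."""
    with open(path, "rb") as f:
        obj = pickle.load(f)
    # Report a length only for containers that define one.
    size = len(obj) if hasattr(obj, "__len__") else None
    return type(obj).__name__, size


if __name__ == "__main__":
    demo = "data/instance_meta/instance_meta_pretrain_demo.pkl"
    if os.path.exists(demo):
        print(summarize_pickle(demo))
    else:
        print("Demo file not found: " + demo)
```

From there, printing a few entries of the loaded object shows the fields a larger dataset would need to provide.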

Training and evaluating on AVA v2.2

python3 -u run_ava.py

This should achieve 31.0 mAP.

Training and evaluating on LVU tasks

python3 -u run.py [1-9]

The argument selects a task to run on. Please see run.py for details.
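
To run several tasks back to back, the command above can be wrapped in a small loop. This is a sketch, assuming the argument is a single integer from 1 to 9 as the [1-9] notation suggests; build_command and run_tasks are hypothetical helpers, not part of the repo.

```python
import subprocess


def build_command(task_id):
    """Build the command line for one LVU task."""
    # Assumption: valid task ids are the integers 1 through 9.
    if not 1 <= task_id <= 9:
        raise ValueError("task_id must be between 1 and 9")
    return ["python3", "-u", "run.py", str(task_id)]


def run_tasks(task_ids, dry_run=True):
    """Print each command and, unless dry_run is set, execute it in order."""
    for task_id in task_ids:
        cmd = build_command(task_id)
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the loop if a task exits with an error.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default: only prints the nine commands.
    run_tasks(range(1, 10), dry_run=True)
```

Set dry_run=False to launch the runs; they execute sequentially, so a failure in one task prevents the later ones from starting.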

Acknowledgment

This implementation borrows largely from Hugging Face Transformers. Please consider citing it if you use this repo.
