[CVPR2022] Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos

Overview

Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos

Created by Muheng Li, Lei Chen, Yueqi Duan, Zhilan Hu, Jianjiang Feng, Jie Zhou, Jiwen Lu

This repository contains PyTorch implementation for Bridge-Prompt (CVPR 2022).

We propose a prompt-based framework, Bridge-Prompt (Br-Prompt), to model the semantics across multiple adjacent correlated actions, so that it simultaneously exploits both out-of-context and contextual information from a series of ordinal actions in instructional videos. More specifically, we reformulate the individual action labels as integrated text prompts for supervision, which bridge the gap between individual action semantics. The generated text prompts are paired with corresponding video clips, and together co-train the text encoder and the video encoder via a contrastive approach. The learned vision encoder has a stronger capability for ordinal-action-related downstream tasks, e.g. action segmentation and human activity recognition.

intro

Our code is based on CLIP and ActionCLIP.

Prerequisites

Requirements

You may need ffmpeg for video data pre-processing.

The environment is also recorded in requirements.txt, which can be reproduced by

pip install -r requirements.txt

Pretrained models

We use the base model (ViT-B/16 for image encoder & text encoder) pre-trained by ActionCLIP based on Kinetics-400. The model can be downloaded in link (pwd:ilgw). The pre-trained model should be saved in ./models/.

Datasets

Raw video files are needed to train our framework. Please download the datasets with RGB videos from the official websites ( Breakfast / GTEA / 50Salads ) and save them under the folder ./data/(name_dataset). For convenience, we have used the extracted frames of the raw RGB videos as inputs. You can extract the frames from raw RGB datasets by running:

python preprocess/get_frames.py --dataset (name_dataset) --vpath (folder_to_your_videos) --fpath ./data/(name_dataset)/frames/

To be noticed, ffmpeg is needed here for frame extraction.

Furthermore, please also extract the .zip files to ./data/(name_dataset) respectively.

Training

  • To train Bridge-Prompt on Breakfast from Kinetics400 pretrained models, you can run:
bash scripts/run_train.sh  ./configs/breakfast/breakfast_ft.yaml
  • To train Bridge-Prompt on GTEA from Kinetics400 pretrained models, you can run:
bash scripts/run_train.sh  ./configs/gtea/gtea_ft.yaml
  • To train Bridge-Prompt on 50Salads from Kinetics400 pretrained models, you can run:
bash scripts/run_train.sh  ./configs/salads/salads_ft.yaml

Extracting frame features

We use the Bridge-Prompt pre-trained image encoders to extract frame-wise features for further downstream tasks (e.g. action segmentation). You can run the following command for each dataset respectively:

python extract_frame_features.py --config ./configs/(dataset_name)/(dataset_name)_exfm.yaml --dataset (dataset_name)

Since 50Salads/Breakfast are large scale datasets, we extract the frame features by window splits. To combine the splits, please run the following command:

python preprocess/combine_features.py

Please modify the variables dataset and feat_name in combine_features.py for each dataset.

Action segmentation

You can reproduce the action segmentation results using ASFormer by the previously extracted frame features.

Activity recognition

You can reproduce the activity recognition results using the command:

python ft_acti.py

based on the previously extracted frame features (Breakfast).

Ordinal action recognition

The ordinal action inferences are executed using the command:

bash scripts/run_test.sh  ./configs/(dataset_name)/(dataset_name)_test.yaml

and check the accuracies using:

bash preprocess/checknpy.py

Please modify the variables dataset in checknpy.py for each dataset.

Notes

Please modify pretrain in all config files according to your own working directions.

License

MIT License.

Owner
Graduate student of Tsinghua University. Major in Automation.
This repo generates the training data and the model for Morpheus-Deblend

Morpheus-Deblend This repo generates the training data and the model for Morpheus-Deblend. This is the active development repo for the project and as

Ryan Hausen 2 Apr 18, 2022
Official PyTorch Implementation of paper EAN: Event Adaptive Network for Efficient Action Recognition

Official PyTorch Implementation of paper EAN: Event Adaptive Network for Efficient Action Recognition

TianYuan 27 Nov 07, 2022
Code for Dual Contrastive Learning for Unsupervised Image-to-Image Translation, NTIRE, CVPRW 2021.

arXiv Dual Contrastive Learning Adversarial Generative Networks (DCLGAN) We provide our PyTorch implementation of DCLGAN, which is a simple yet powerf

119 Dec 04, 2022
Semantic Segmentation Suite in TensorFlow

Semantic Segmentation Suite in TensorFlow. Implement, train, and test new Semantic Segmentation models easily!

George Seif 2.5k Jan 06, 2023
Redash reset for python

redash-reset This will use a default REDASH_SECRET_KEY key of c292a0a3aa32397cdb050e233733900f this allows you to reset the password of the user ID bu

Robert Wiggins 5 Nov 14, 2022
AI that generate music

PianoGPT ai that generate music try it here https://share.streamlit.io/annasajkh/pianogpt/main/main.py or here https://huggingface.co/spaces/Annas/Pia

Annas 28 Nov 27, 2022
An implementation of the proximal policy optimization algorithm

PPO Pytorch C++ This is an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch. It uses a simple TestEnvironment t

Martin Huber 59 Dec 09, 2022
A basic neural network for image segmentation.

Unet_erythema_detection A basic neural network for image segmentation. 前期准备 1.在logs文件夹中下载h5权重文件,百度网盘链接在logs文件夹中 2.将所有原图 放置在“/dataset_1/JPEGImages/”文件夹

1 Jan 16, 2022
Distributed Arcface Training in Pytorch

Distributed Arcface Training in Pytorch

3 Nov 23, 2021
An implementation of the research paper "Retina Blood Vessel Segmentation Using A U-Net Based Convolutional Neural Network"

Retina Blood Vessels Segmentation This is an implementation of the research paper "Retina Blood Vessel Segmentation Using A U-Net Based Convolutional

Srijarko Roy 23 Aug 20, 2022
STBP is a way to train SNN with datasets by Backward propagation.

Spiking neural network (SNN), compared with depth neural network (DNN), has faster processing speed, lower energy consumption and more biological interpretability, which is expected to approach Stron

Ling Zhang 18 Dec 09, 2022
A production-ready, scalable Indexer for the Jina neural search framework, based on HNSW and PSQL

🌟 HNSW + PostgreSQL Indexer HNSWPostgreSQLIndexer Jina is a production-ready, scalable Indexer for the Jina neural search framework. It combines the

Jina AI 25 Oct 14, 2022
"Learning Free Gait Transition for Quadruped Robots vis Phase-Guided Controller"

PhaseGuidedControl The current version is developed based on the old version of RaiSim series, and possibly requires further modification. It will be

X-Mechanics 12 Oct 21, 2022
WiFi-based Multi-task Sensing

WiFi-based Multi-task Sensing Introduction WiFi-based sensing has aroused immense attention as numerous studies have made significant advances over re

zhangx289 6 Nov 24, 2022
GNN-based Recommendation Benchmark

GRecX A Fair Benchmark for GNN-based Recommendation Homepage and Documentation Homepage: Documentation: Paper: GRecX: An Efficient and Unified Benchma

73 Oct 17, 2022
A modification of Daniel Russell's notebook merged with Katherine Crowson's hq-skip-net changes

Edits made to this repo by Katherine Crowson I have added several features to this repository for use in creating higher quality generative art (featu

Paul Fishwick 10 May 07, 2022
GPU-Accelerated Deep Learning Library in Python

Hebel GPU-Accelerated Deep Learning Library in Python Hebel is a library for deep learning with neural networks in Python using GPU acceleration with

Hannes Bretschneider 1.2k Dec 21, 2022
Edge-oriented Convolution Block for Real-time Super Resolution on Mobile Devices, ACM Multimedia 2021

Codes for ECBSR Edge-oriented Convolution Block for Real-time Super Resolution on Mobile Devices Xindong Zhang, Hui Zeng, Lei Zhang ACM Multimedia 202

xindong zhang 236 Dec 26, 2022
This repository includes the official project for the paper: TransMix: Attend to Mix for Vision Transformers.

TransMix: Attend to Mix for Vision Transformers This repository includes the official project for the paper: TransMix: Attend to Mix for Vision Transf

Jie-Neng Chen 130 Jan 01, 2023