MARS: Learning Modality-Agnostic Representation for Scalable Cross-media Retrieval

Last update: Aug 24, 2022

Related tags

Deep Learning MARS_TCSVT2021

Overview

Introduction

This is the source code of our TCSVT 2021 paper "MARS: Learning Modality-Agnostic Representation for Scalable Cross-media Retrieval". Please cite the following paper if you use our code.

Yunbo Wang and Yuxin Peng, "MARS: Learning Modality-Agnostic Representation for Scalable Cross-media Retrieval", IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2021.

Preparation

We use Python 3.7.2, PyTorch 1.1.0, and CUDA 9.0, and evaluate on Ubuntu 16.04.12.

Download the Anaconda installer from https://repo.anaconda.com/archive, install it, and create a new environment:

    sh Anaconda3-2018.12-Linux-x86_64.sh
    conda create -n MARS python=3.7.2
    conda activate MARS

Then run the following commands to install PyTorch and the remaining dependencies:

    conda install pytorch==1.1.0 torchvision==0.3.0 cudatoolkit=9.0 -c pytorch
    pip install -r requirements.txt
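Once the environment is set up, it can help to confirm that the pinned versions are the ones actually importable. The snippet below is a minimal sketch, not part of the released code: the helper names installed_version and check_environment are hypothetical, and the expected versions are simply copied from the commands above.

```python
import importlib
import sys

# Versions copied from the install commands in this README.
EXPECTED = {"torch": "1.1.0", "torchvision": "0.3.0"}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check_environment(expected=EXPECTED):
    """Map each package name to a tuple (expected, found, ok).

    ok uses a prefix match, so a local build suffix after the version
    number does not count as a mismatch.
    """
    report = {}
    for name, want in expected.items():
        found = installed_version(name)
        ok = found is not None and found.startswith(want)
        report[name] = (want, found, ok)
    return report


if __name__ == "__main__":
    print("python", sys.version.split()[0])
    for name, (want, found, ok) in check_environment().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: expected {want}, found {found}, {status}")
```

Run it inside the activated MARS environment; a MISMATCH line usually means a different environment is active or the conda install step did not complete.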

Training and evaluation

We use the Wikipedia dataset as an example; its data is placed in ./datasets/Wiki. The XMedia and XMediaNet datasets can be obtained via http://59.108.48.34/tiki/XMediaNet/, and the NUS-WIDE dataset via https://lms.comp.nus.edu.sg/wp-content/uploads/2019/research/nuswide/NUS-WIDE.html.

Run the following command for training and evaluation; the configuration options can be found in main_MARS.py.

    python main_MARS.py --datasets wiki --output_shape 128 --batch_size 64 --epochs 50 --lr [1e-4, 5e-4] # for Wikipedia

The common representations can be found in folder "features".
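This README does not document the on-disk format of the saved features, so the following is only a sketch of how common representations could be scored once they are loaded as NumPy arrays. It assumes one feature matrix and one integer label vector per modality (single-label data, as in Wikipedia), ranks the gallery by cosine similarity, and reports mean average precision (mAP), a measure commonly used for cross-media retrieval. The function name and arguments are hypothetical, and the protocol may differ from the one in main_MARS.py.

```python
import numpy as np


def mean_average_precision(query_feats, query_labels, gallery_feats, gallery_labels):
    """mAP of a gallery ranked by cosine similarity to each query.

    Assumes single-label data: a gallery item is relevant when its label
    equals the query's label. Queries with no relevant item score 0.
    """
    # L2-normalise rows so that the dot product equals cosine similarity.
    q = query_feats / (np.linalg.norm(query_feats, axis=1, keepdims=True) + 1e-12)
    g = gallery_feats / (np.linalg.norm(gallery_feats, axis=1, keepdims=True) + 1e-12)
    sims = q @ g.T

    average_precisions = []
    for i in range(sims.shape[0]):
        order = np.argsort(-sims[i], kind="stable")  # most similar first
        relevant = (gallery_labels[order] == query_labels[i]).astype(np.float64)
        n_relevant = relevant.sum()
        if n_relevant == 0:
            average_precisions.append(0.0)
            continue
        # Precision at every rank, then averaged over the ranks of relevant items.
        precision = np.cumsum(relevant) / (np.arange(relevant.size) + 1)
        average_precisions.append(float((precision * relevant).sum() / n_relevant))
    return float(np.mean(average_precisions))


if __name__ == "__main__":
    # Toy stand-ins for image queries and a text gallery in a shared 2-D space.
    img = np.array([[1.0, 0.0], [0.0, 1.0]])
    img_labels = np.array([0, 1])
    txt = np.array([[1.0, 0.1], [0.1, 1.0], [0.9, 0.0]])
    txt_labels = np.array([0, 1, 0])
    print("image-to-text mAP:", mean_average_precision(img, img_labels, txt, txt_labels))
```

Swapping the query and gallery arguments gives the text-to-image direction; the two directions are usually reported separately.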

For any questions, feel free to contact us ([email protected]).

Welcome to our Laboratory Homepage for more information.
