Code release for paper: The Boombox: Visual Reconstruction from Acoustic Vibrations

Last update: Nov 30, 2022

The Boombox: Visual Reconstruction from Acoustic Vibrations

Boyuan Chen, Mia Chiquier, Hod Lipson, Carl Vondrick
Columbia University

Project Website | Video | Paper

Overview

This repo contains the PyTorch implementation of the paper "The Boombox: Visual Reconstruction from Acoustic Vibrations".

Installation

Our code has been tested on Ubuntu 18.04 with CUDA 11.0. Create a Python virtual environment and install the dependencies:

virtualenv -p /usr/bin/python3.6 env-boombox
source env-boombox/bin/activate
cd boombox
pip install -r requirements.txt

Data Preparation

Run the following commands to download the dataset (2.0 GB):

cd boombox
wget https://boombox.cs.columbia.edu/dataset/data.zip
unzip data.zip
rm -rf data.zip

After this step, you should see a folder named data. The video and audio data are in its cube, small_cuboid and large_cuboid subfolders.
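To confirm that the archive unpacked as described, a quick check along the following lines may help. This is a minimal sketch, not part of the released code: the helper name `missing_shape_folders` is hypothetical, and it only checks that the three subfolders named above exist, not what they contain.

```python
from pathlib import Path

# Shape subfolders named in this README.
EXPECTED_SHAPES = ("cube", "small_cuboid", "large_cuboid")


def missing_shape_folders(data_root):
    """Return the expected shape subfolders that are absent under data_root.

    Hypothetical helper for illustration; it does not inspect file contents.
    """
    root = Path(data_root)
    return [name for name in EXPECTED_SHAPES if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_shape_folders("data")
    if missing:
        print("Missing subfolders:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```

Run it from the directory that contains the data folder.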

About Configs and Logs

Before training and evaluation, we first introduce the configuration and logging structure.

Configs: all the parameters used for training and evaluation are specified in individual config files. Overall, we have two training paradigms: single-shape and multiple-shape.

For single-shape, we train and evaluate on each shape separately. The config folders are named after the shape: cube, large_cuboid and small_cuboid. For multiple-shape, we mix all the shapes together and perform training and evaluation while the shape is not known a priori. Its config folder is named all.

Within each config folder, we have config files for depth prediction and image prediction. The trailing number in each folder name is the random seed. For example, if you want to train our model with all the shapes mixed to output an RGB image with random seed 3, you should refer to the parameters in:
```
configs/all/2d_out_img_3
```
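The naming scheme above can be sketched as a small path builder. This is an illustrative sketch only: the function `config_path` is hypothetical, and the only output name confirmed by this README is `2d_out_img` (image prediction); the folder name used for depth prediction is not stated here, so it is left as a parameter. The `config.yaml` file name is taken from the training command in this README.

```python
from pathlib import Path

# Config folders named in this README: three single shapes plus the mixed set.
KNOWN_SHAPES = ("cube", "small_cuboid", "large_cuboid", "all")


def config_path(shape, output_name, seed, root="configs"):
    """Build <root>/<shape>/<output_name>_<seed>/config.yaml.

    Hypothetical helper. `output_name` is e.g. "2d_out_img"; other values
    are assumptions and should be checked against the configs folder.
    """
    if shape not in KNOWN_SHAPES:
        raise ValueError(f"unknown shape folder: {shape!r}")
    return Path(root) / shape / f"{output_name}_{seed}" / "config.yaml"


if __name__ == "__main__":
    print(config_path("all", "2d_out_img", 3).as_posix())
```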

Logs: both training and evaluation results are saved in the log folder for each experiment. The trailing number in the log folder name is the random seed. Inside the log folder, the structure and contents are:

\logs_True_False_False_image_conv2d-encoder-decoder_True_{output_representation}_{seed}
    \lightning_logs
        \checkpoints               [saved checkpoint]
        \version_0                 [training stats]
        \version_1                 [testing stats]
    \pred_visualizations           [predicted and ground-truth images]
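As a rough illustration of this layout, the checkpoint folder for a given run can be derived from the output representation and the seed. The helper `checkpoint_dir` below is hypothetical; the fixed prefix is copied from the folder name above, and `pixel` is the only output representation value that appears in this README (in the evaluation command).

```python
from pathlib import Path

# Fixed part of the log folder name, copied from the structure shown above.
LOG_PREFIX = "logs_True_False_False_image_conv2d-encoder-decoder_True"


def checkpoint_dir(output_representation, seed, root="."):
    """Return <root>/<log folder>/lightning_logs/checkpoints for one run.

    Hypothetical helper; `output_representation` is e.g. "pixel".
    """
    log_dir = Path(root) / f"{LOG_PREFIX}_{output_representation}_{seed}"
    return log_dir / "lightning_logs" / "checkpoints"


if __name__ == "__main__":
    print(checkpoint_dir("pixel", 1).as_posix())
```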

Training

Both training and evaluation are fast. We provide an example bash script for running our experiments in run_audio.sh. Specifically, to train our model on all shapes so that it outputs RGB image representations, with random seed 1 on GPU 0, run the following command:

CUDA_VISIBLE_DEVICES=0 python main.py ./configs/all/2d_out_img_1/config.yaml;

Evaluation

Again, we provide an example bash script for running our experiments in run_audio.sh. Following the above example, to evaluate the trained model, run the following command:

CUDA_VISIBLE_DEVICES=0 python eval.py ./configs/all/2d_out_img_1/config.yaml ./logs_True_False_False_image_conv2d-encoder-decoder_True_pixel_1/lightning_logs/checkpoints;
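When repeating the experiment over several seeds, the train and evaluate commands can be generated rather than typed by hand. The sketch below is not the repository's run_audio.sh; `build_commands` and `run_seed` are hypothetical names, and the defaults reproduce only the combination shown in this README (shape `all`, config `2d_out_img`, log suffix `pixel`). Whether other shapes or outputs follow the same pairing is an assumption to verify against the configs and logs folders.

```python
import os
import subprocess

# Fixed part of the log folder name, copied from the evaluation command above.
LOG_PREFIX = "logs_True_False_False_image_conv2d-encoder-decoder_True"


def build_commands(seed, shape="all", output_name="2d_out_img",
                   output_representation="pixel"):
    """Return (train_cmd, eval_cmd) argument lists for one seed."""
    config = f"./configs/{shape}/{output_name}_{seed}/config.yaml"
    checkpoints = (f"./{LOG_PREFIX}_{output_representation}_{seed}"
                   "/lightning_logs/checkpoints")
    train_cmd = ["python", "main.py", config]
    eval_cmd = ["python", "eval.py", config, checkpoints]
    return train_cmd, eval_cmd


def run_seed(seed, gpu="0"):
    """Train, then evaluate, one seed on the given GPU (stops on failure)."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    for cmd in build_commands(seed):
        subprocess.run(cmd, check=True, env=env)


if __name__ == "__main__":
    # Print the commands only; call run_seed(seed) to actually execute them.
    for seed in (1, 2, 3):
        for cmd in build_commands(seed):
            print(" ".join(cmd))
```

The seed values in the loop are placeholders; use the seeds for which config folders exist.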

License

This repository is released under the MIT license. See LICENSE for additional details.
