Project Page | Paper | ArXiv
Junbang Liang*1, Ruoshi Liu*1, Ege Ozguroglu1, Sruthi Sudhakar1, Achal Dave2, Pavel Tokmakov2, Shuran Song3, Carl Vondrick1
1Columbia University, 2Toyota Research Institute, 3Stanford University
*Equal Contribution
conda create -n dreamitate python=3.10
conda activate dreamitate
cd dreamitate
pip install -r requirements.txt
cd video_model
pip install .
pip install -e git+https://github.com/Stability-AI/datapipelines.git@main#egg=sdata
Download the image-conditioned Stable Video Diffusion checkpoint released by Stability AI and move checkpoints under the video_model folder:
wget https://dreamitate.cs.columbia.edu/assets/models/checkpoints.zip
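The archive is assumed here to unpack into a checkpoints directory; a minimal extraction sketch, assuming the zip was downloaded inside video_model:
unzip checkpoints.zip   # assumption: expands to checkpoints/ under video_model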
Download the finetuned rotation task checkpoint and move finetuned_models under the video_model folder:
wget https://dreamitate.cs.columbia.edu/assets/models/finetuned_models.zip
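Likewise, a hedged extraction step, assuming this archive unpacks to a finetuned_models directory inside video_model:
unzip finetuned_models.zip   # assumption: expands to finetuned_models/ under video_model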
Run our Gradio demo to generate videos of object rotation, using experiment photos from the video_model/rotation_examples directory as model inputs:
CUDA_VISIBLE_DEVICES=0 PYTHONPATH=. python scripts/sampling/simple_video_sample_gradio.py
Alternatively, you can use online images of an object against a black background as model inputs; this is less suitable but can work for this demonstration. Note that the app uses around 70 GB of VRAM, so it may not run on every GPU.
Download the image-conditioned Stable Video Diffusion checkpoint released by Stability AI and move checkpoints under the video_model folder:
wget https://dreamitate.cs.columbia.edu/assets/models/checkpoints.zip
Download the rotation task dataset and move dataset under the video_model folder:
wget https://dreamitate.cs.columbia.edu/assets/models/dataset.zip
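As with the checkpoints archive above, a hedged extraction step, assuming the zip was downloaded inside video_model and expands to a dataset directory:
unzip dataset.zip   # assumption: expands to dataset/ under video_model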
Run training command:
PYTHONPATH=. CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --base=configs/basile_svd_finetune.yaml --name=ft1 --seed=24 --num_nodes=1 --wandb=0 lightning.trainer.devices="0,1,2,3"
Note that this training script is configured for a 4-GPU system with 80 GB of VRAM per GPU. Empirically, a batch size of 4 produces good results for training our model, but training with a batch size of 1 can work as well.
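If fewer GPUs are available, the same Lightning-style override shown above should in principle restrict training to a single device; this is a sketch under that assumption (the batch size in configs/basile_svd_finetune.yaml may also need lowering to fit memory):
# hedged single-GPU variant of the training command above
PYTHONPATH=. CUDA_VISIBLE_DEVICES=0 python main.py --base=configs/basile_svd_finetune.yaml --name=ft1_single_gpu --seed=24 --num_nodes=1 --wandb=0 lightning.trainer.devices="0,"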
Download the pretrained MegaPose models and move megapose-models under the megapose/examples folder:
wget https://dreamitate.cs.columbia.edu/assets/models/megapose-models.zip
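A hedged extraction step, assuming the zip was downloaded inside megapose/examples and unpacks to a megapose-models directory:
unzip megapose-models.zip   # assumption: expands to megapose-models/ under megapose/examples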
Set environment variables:
cd dreamitate/megapose
export MEGAPOSE_DIR=$(pwd) && export MEGAPOSE_DATA_DIR=$(pwd)/examples && export megapose_directory_path=$(pwd)/src && export PYTHONPATH="$PYTHONPATH:$megapose_directory_path"
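As a quick sanity check (an assumption, not part of the original instructions), the following should import the megapose package from the src directory added to PYTHONPATH above:
python -c "import megapose; print(megapose.__file__)"   # hypothetical check; fails if dependencies are missing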
Run tracking on left end-effector:
CUDA_VISIBLE_DEVICES=0 python -m megapose.scripts.run_video_tracking_on_rotation_example_stereo_left --data_dir "experiments/rotation/demo_005"
Run tracking on right end-effector:
CUDA_VISIBLE_DEVICES=0 python -m megapose.scripts.run_video_tracking_on_rotation_example_stereo_right --data_dir "experiments/rotation/demo_005"
This repository is based on Stable Video Diffusion, Generative Camera Dolly, and MegaPose. We would like to thank the authors of these works for publicly releasing their code. We would like to thank Basile Van Hoorick and Kyle Sargent of Generative Camera Dolly for providing the video model training code and their helpful feedback.
We would like to thank Paarth Shah and Dian Chen for many helpful discussions. This research is based on work partially supported by the Toyota Research Institute and the NSF NRI Award #2132519.
@misc{liang2024dreamitate,
title={Dreamitate: Real-World Visuomotor Policy Learning via Video Generation},
author={Junbang Liang and Ruoshi Liu and Ege Ozguroglu and Sruthi Sudhakar and Achal Dave and Pavel Tokmakov and Shuran Song and Carl Vondrick},
year={2024},
eprint={2406.16862},
archivePrefix={arXiv},
primaryClass={cs.RO}
}