The official PyTorch implementation of the paper "Force Prompting: Video Generation Models Can
Learn and Generalize Physics-based Control Signals".
Please visit our webpage for more details.
Create conda environment
This has been tested on: Driver Version: 535.129.03, CUDA Version: 12.2.
Create conda environment:
CONDA_ENV_DIR=${PWD}/conda-env
conda create -p $CONDA_ENV_DIR python=3.11
conda activate $CONDA_ENV_DIR
# install torch
pip install --prefix=$CONDA_ENV_DIR torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 --index-url https://download.pytorch.org/whl/cu121
# it's a good idea to check that the torch installation was successful
python -c 'import torch; print(torch.cuda.is_available()); a = torch.zeros(5); a = a.to("cuda:0"); print(a)'
# install all the other requirements
pip install -r requirements.txt --prefix=$CONDA_ENV_DIR
(Optional) preprocess your own data
If you want to run inference with either the point force model or the wind force model on your own images, we recommend using the Flask app that we built for data preprocessing. This app provides a unified UI which takes care of details like generating a CSV with the relevant contents, taking a screenshot of the image at the correct resolution and aspect ratio, selecting the force magnitude and direction, placing files into the appropriate folders, and upscaling the prompt. If you're using the point force model, it also lets you select the pixel coordinates to poke.
In order to use the prompt upscaling part of our Flask app, you will need an OpenAI API key; we recommend creating a .env file and adding the line OPENAI_API_KEY=<your_key>.
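If you're unsure how the key gets picked up, the snippet below is a minimal sketch of the standard python-dotenv pattern; it's ours for illustration, and the Flask app may load the key differently, so check app_dataset_preprocessing.py for the actual behavior.
import os
from dotenv import load_dotenv  # requires the python-dotenv package

# Load variables from a .env file in the current working directory into os.environ.
load_dotenv()
api_key = os.environ.get("OPENAI_API_KEY")
if api_key is None:
    raise RuntimeError("OPENAI_API_KEY not found; add it to your .env file")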
More details in case you're curious:
- As our models are built on top of CogVideoX, the ideal input image resolution is 720x480.
- Additionally, you must specify a detailed text prompt during generation (this is because we use CogVideoX as our base model; for more details, check out their paper). For example, prompts like "the flower moves" don't work as well as detailed prompts such as "A lone dandelion stands tall against the backdrop of a vibrant sunset, its delicate seeds illuminated by the warm glow. The dandelion sways gracefully back and forth, its fluffy seeds trembling slightly with each movement. The sky transitions from deep blue to a fiery orange, casting a serene and magical atmosphere over the scene. The surrounding grass whispers softly, adding to the tranquil symphony of nature as the day slowly fades into night." Note that you'll need an OpenAI API key for prompt upscaling, unless you want to write your own very detailed prompts. Technically the model can run without this prompt upscaling step, but the results are likely to be worse because short prompts are out of domain for CogVideoX.
- If you want to skip this prompt upscaling step because you don't have an API key, your options are to 1) use a ChatGPT/Claude/etc web app to upscale for free, using the prompt in scripts/test_dataset_preprocessing/point_force/app_dataset_preprocessing.py; or 2) type your prompt directly into the "upscaled prompt" box in the Flask app.
- For the point force model, the pixel coordinates are expected to lie between $0$ and $719$ (horizontally) and between $0$ and $479$ (vertically). Our convention is that the lower left pixel is $(0,0)$ and the upper right pixel is $(719, 479)$.
- For both models, the force magnitude is normalized to $[0,1]$, and the force angle accepts degree values in the interval $[0,360)$, with $0$ indicating a force to the right, $90$ indicating upwards, etc. (These conventions are illustrated in the sketch after this list.)
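As a quick reference, here is a small Python sketch of these conventions; the helper functions are ours for illustration and are not part of the repo.
import math

# Convert a force angle in degrees (0 = rightward, 90 = upward) into a unit direction vector.
def force_direction(angle_deg: float):
    assert 0 <= angle_deg < 360, "angle must lie in [0, 360)"
    rad = math.radians(angle_deg)
    return math.cos(rad), math.sin(rad)

# Check that a poke location and normalized force magnitude follow the expected conventions.
def validate_point_force(coordx: int, coordy: int, force: float):
    assert 0 <= coordx <= 719 and 0 <= coordy <= 479, "pixel must lie inside the 720x480 frame"
    assert 0.0 <= force <= 1.0, "force magnitude is normalized to [0, 1]"

validate_point_force(coordx=360, coordy=240, force=0.5)  # a medium poke near the image center
print(force_direction(90.0))  # approximately (0.0, 1.0), i.e. straight up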
Tip: If you're running this on a server using VSCode, then port forwarding will happen automatically and the Flask app will work as intended. However, if you're preprocessing a lot of data you may find the latency burdensome; you can avoid latency issues by running the app locally.
Tip: The force prompting models tend to do well at modeling physical phenomena that the base CogVideoX model already handles well (e.g. swaying plants) and tend to do worse on things CogVideoX doesn't handle so well (e.g. collisions). If you find that the force prompting model doesn't do well on a given example, consider training a new Force Prompting model on a video generative model with a stronger physics prior, and let us know how it goes :)
Dataset preprocessing flask app, point force model
The following Flask app will output csvs to datasets/point-force/test/custom/*.csv and their corresponding images to datasets/point-force/test/custom/images/*.png. To run inference on one of these csvs, just use its path as the generated CSV in the inference script below.
Tip: Make sure there are no spaces in the file name for the image that you upload to the Flask app—only letters, numbers, dashes, or underscores.
python scripts/test_dataset_preprocessing/point_force/app_dataset_preprocessing.py
Dataset preprocessing flask app, wind force model
The following Flask app will output csvs to datasets/wind-force/test/custom/*.csv and their corresponding images to datasets/wind-force/test/custom/images/*.png. To run inference on one of these csvs, just use its path as the generated CSV in the inference script below.
python scripts/test_dataset_preprocessing/wind_force/app_dataset_preprocessing.py
Download model checkpoints
If you want to run inference using either of the pretrained models, then running the following script will download both checkpoints.
python scripts/download_files/download_checkpoints.py
If the download was successful, the checkpoints should be organized like this:
checkpoints/
├── step-5000-checkpoint-point-force.pt
└── step-5000-checkpoint-wind-force.pt
Point force model: inference
Running the following script will generate videos using your chosen checkpoint and image/text/force prompt. This script will output videos into the same directory as the input checkpoint. For example, if you use the checkpoint checkpoints/step-5000-checkpoint-point-force.pt, then the videos will be output into the directory checkpoints/step-5000-checkpoint-point-force/.
# this is our pretrained model; you can change to your own path
CHECKPOINT="checkpoints/step-5000-checkpoint-point-force.pt"
# you can change this to the list of csvs you want to run inference on.
IMAGE_CSVS=(
"datasets/point-force/test/benchmark/ornament/_ornament1_obj1_prompt1.csv"
"datasets/point-force/test/benchmark/ornament/_ornament1_obj1_prompt2.csv"
)
for image_csv in "${IMAGE_CSVS[@]}"; do
bash scripts/inference_1_gpu.sh \
--force_type "point_force" \
--model_type "controlnet_with_force_control_signal" \
--num_validation_videos 1 \
--csv_path_val "${image_csv}" \
--pretrained_controlnet_path "${CHECKPOINT}"
done
If you want to run inference on some preprocessed data, you can find the IMAGE_CSVS inside the directory datasets/point-force/test/benchmark/. This directory contains our benchmark test dataset, plus additional images and prompt configurations. The list of configurations for just our benchmark dataset can be found at datasets/point-force/test/benchmark/benchmark_details.csv.
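If you'd rather sweep the whole benchmark directory than list csvs by hand, the Python sketch below (our own; it assumes every per-example csv lives under datasets/point-force/test/benchmark/ and skips benchmark_details.csv) drives the same inference script:
import glob
import subprocess

CHECKPOINT = "checkpoints/step-5000-checkpoint-point-force.pt"

# Collect every benchmark csv, skipping the summary file benchmark_details.csv.
csv_paths = sorted(glob.glob("datasets/point-force/test/benchmark/**/*.csv", recursive=True))
csv_paths = [p for p in csv_paths if not p.endswith("benchmark_details.csv")]

for csv_path in csv_paths:
    subprocess.run(
        [
            "bash", "scripts/inference_1_gpu.sh",
            "--force_type", "point_force",
            "--model_type", "controlnet_with_force_control_signal",
            "--num_validation_videos", "1",
            "--csv_path_val", csv_path,
            "--pretrained_controlnet_path", CHECKPOINT,
        ],
        check=True,
    )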
Wind force model: inference
Running the following script will generate videos using your chosen checkpoint and image/text/force prompt. This script will output videos into the same directory as the input checkpoint. For example, if you use the checkpoint checkpoints/step-5000-checkpoint-wind-force.pt, then the videos will be output into the directory checkpoints/step-5000-checkpoint-wind-force/.
# this is our pretrained model; you can change to your own path
CHECKPOINT="checkpoints/step-5000-checkpoint-wind-force.pt"
# you can change this to the list of csvs you want to run inference on.
IMAGE_CSVS=(
"datasets/wind-force/test/benchmark/bubbles/_bubbles1_prompt1.csv"
)
for image_csv in "${IMAGE_CSVS[@]}"; do
bash scripts/inference_1_gpu.sh \
--force_type "wind_force" \
--model_type "controlnet_with_force_control_signal" \
--num_validation_videos 1 \
--csv_path_val "${image_csv}" \
--pretrained_controlnet_path "${CHECKPOINT}"
done
If you want to run inference on some preprocessed data, you can find the IMAGE_CSVS
inside the directory datasets/wind-force/test/benchmark/.
This directory contains our benchmark test dataset, plus additional images and prompt configurations.
The list of configurations for just our benchmark dataset can be found at datasets/wind-force/test/benchmark/benchmark_details.csv.
Download training datasets
If you want to train either model from scratch, then the following script will download all of our training data.
python scripts/download_files/download_datasets.py
If the download was successful, then the datasets should be organized like this:
datasets/
├── point-force/
│ └── train/
│ ├── point_force_23000/
│ │ ├── background_aerial_beach_01_4k_angle_0.4076_force_17.8989_coordx_159_coordy_407_bowling.mp4
│ │ ├── background_aerial_beach_01_4k_angle_0.4889_force_25.3516_coordx_448_coordy_186_bowling.mp4
│ │ └── ...
│ └── point_force_23000.csv
└── wind-force/
└── train/
├── wind_force_15359/
│ ├── flag_sample_0.1_0.0_321.3_0.0_background_qwantani_dusk_2_4k.mp4
│ ├── flag_sample_1.8_0.0_77.4_0.0_background_golden_gate_hills_4k.mp4
│ └── ...
└── wind_force_15359.csv
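The training filenames above encode the simulation parameters for each clip. The regex sketch below is ours, with the pattern inferred from the example point-force filenames shown above, so treat it as an assumption rather than a documented schema:
import re

# Pattern inferred from the example filenames:
# background_<hdri>_angle_<a>_force_<f>_coordx_<x>_coordy_<y>_<ball>.mp4
PATTERN = re.compile(
    r"background_(?P<background>.+)_angle_(?P<angle>[\d.]+)_force_(?P<force>[\d.]+)"
    r"_coordx_(?P<coordx>\d+)_coordy_(?P<coordy>\d+)_(?P<ball>\w+)\.mp4"
)

name = "background_aerial_beach_01_4k_angle_0.4076_force_17.8989_coordx_159_coordy_407_bowling.mp4"
match = PATTERN.match(name)
if match:
    print(match.groupdict())
    # {'background': 'aerial_beach_01_4k', 'angle': '0.4076', 'force': '17.8989',
    #  'coordx': '159', 'coordy': '407', 'ball': 'bowling'}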
Point force model: training
bash scripts/train_4_gpu.sh \
--force_type "point_force" \
--video_root_dir "datasets/point-force/train/point_force_23000" \
--csv_path "datasets/point-force/train/point_force_23000.csv"
# To resume training from an existing checkpoint, pass it via --pretrained_controlnet_path (replace with your own checkpoint path)
RESUME_FROM_CHECKPOINT="output/point_force/2025-05-08_03-46-47/step-4500-checkpoint.pt"
bash scripts/train_4_gpu.sh \
--force_type "point_force" \
--video_root_dir "datasets/point-force/train/point_force_23000" \
--csv_path "datasets/point-force/train/point_force_23000.csv" \
--pretrained_controlnet_path $RESUME_FROM_CHECKPOINT
Wind force model: training
bash scripts/train_4_gpu.sh \
--force_type "wind_force" \
--video_root_dir "datasets/wind-force/train/wind_force_15359" \
--csv_path "datasets/wind-force/train/wind_force_15359.csv"
# To resume training from an existing checkpoint, pass it via --pretrained_controlnet_path (replace with your own checkpoint path)
RESUME_FROM_CHECKPOINT="output/wind_force/2025-05-18_21-46-01/step-2000-checkpoint.pt"
bash scripts/train_4_gpu.sh \
--force_type "wind_force" \
--video_root_dir "datasets/wind-force/train/wind_force_15359" \
--csv_path "datasets/wind-force/train/wind_force_15359.csv" \
--pretrained_controlnet_path $RESUME_FROM_CHECKPOINT
Download Blender assets
If you want to generate synthetic training data yourself for either task, then you'll need to run the following script, which will download all of the assets that Blender needs to generate a diverse dataset.
python scripts/download_files/download_blender_textures.py
If the download was successful, then the assets should be organized like this:
.cache/
├── football_textures/
│ ├── 1/
│ ├── 2/
│ └── …
├── ground_textures/
│ ├── aerial_beach_01_4k.blend/
│ ├── aerial_grass_rock_4k.blend/
│ └── …
└── HDRIs/
├── acoustical_shell_4k.exr
├── air_museum_playground_4k.exr
└── …
Point force model: generate Blender ball rolling videos, and PhysDreamer plant swaying videos
These command line blender rendering scripts were tested on Blender 4.4.
Step 1: Generate ball rolling videos using Blender
This script renders video frames to pngs. Look inside the script before running it: you'll need to set the first two lines (the Blender executable and the output path).
sh scripts/build_synthetic_datasets/poke_model_rolling_balls/rolling_balls_render.sh
You might want to launch many of these renders in parallel because they can take a while. The following script then concatenates the pngs into mp4s.
RENDER_DIR=~/scratch/rolling_balls/pngs
python scripts/build_synthetic_datasets/poke_model_rolling_balls/rolling_balls_png_to_mp4.py $RENDER_DIR
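If you just want to spot-check a single render without the batch script, the conversion is conceptually a frame-to-video encode. Here is a minimal sketch of that idea (ours; the frame naming, frame rate, and directory are assumptions and may not match what rolling_balls_png_to_mp4.py actually does):
import os
import subprocess

# Hypothetical directory containing one render's frames, named frame_0001.png, frame_0002.png, ...
render_dir = os.path.expanduser("~/scratch/rolling_balls/pngs/some_render")

subprocess.run(
    [
        "ffmpeg", "-y",
        "-framerate", "24",        # assumed frame rate
        "-i", "frame_%04d.png",    # assumed frame naming
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",     # widely compatible pixel format
        "out.mp4",
    ],
    check=True,
    cwd=render_dir,
)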
Step 2: Generate plant swaying videos using PhysDreamer
We used the PhysDreamer repo to do this. Our main modifications to their codebase allowed us to generate data at scale. We don't plan to release this code, but if you need it for your work please open an issue and we'll consider cleaning it up and releasing it.
Step 3: Create the csv for the training data
We're assuming that we already have a directory of videos of soccer balls moving around, and another directory of videos of plants moving around. The goal is to preprocess both of them to create a joint dataset.
# this dir is already filled with mp4s
DIR_BALLS="/users/ngillman/scratch/rolling_balls/videos"
# this dir is already filled with mp4s and jsons
DIR_PLANTS="/oscar/data/superlab/users/nates_stuff/cogvideox-controlnet/data/2025-04-07-point-force-unified-model/videos-05-11-ablation-no-bowling-balls-temp-justflowers"
# we eventually want to create this
DIR_COMBINED="datasets/point-force/train/point_force_23000_05-09"
We'll make a CSV for each of our temp directories.
# make balls csv
python scripts/build_synthetic_datasets/poke_model_rolling_balls/generate_csv_for_plants_and_balls_from_dir.py \
--file_dir ${DIR_BALLS} \
--file_type video \
--output_path ${DIR_COMBINED}_balls.csv \
--backgrounds_json_path_soccer scripts/build_synthetic_datasets/poke_model_rolling_balls/backgrounds_soccer.json \
--backgrounds_json_path_bowling scripts/build_synthetic_datasets/poke_model_rolling_balls/backgrounds_bowling.json \
--take_subset_size 12000
# make plants csv
python scripts/build_synthetic_datasets/poke_model_rolling_balls/generate_csv_for_plants_and_balls_from_dir.py \
--file_dir ${DIR_PLANTS} \
--file_type video \
--output_path ${DIR_COMBINED}_plants.csv \
--take_subset_size 11000
The first script uses a backgrounds.json file which contains a unique text prompt for each HDRI background (regardless of how many balls are in the scene and what their designs are; we did, however, use different prompts for soccer balls and bowling balls). We generated these prompts using the gpt-4o API for prompt upscaling, conditioning on the last frame of each video and prompt seeds as simple as "the ball moves".
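To see what prompts your run will use, you can inspect the json directly. The sketch below is ours and assumes the file maps each HDRI background name to its upscaled prompt; check the file itself for the exact schema.
import json

with open("scripts/build_synthetic_datasets/poke_model_rolling_balls/backgrounds_soccer.json") as f:
    backgrounds = json.load(f)

# Assumed schema: {hdri_background_name: upscaled_text_prompt, ...}
for name, prompt in list(backgrounds.items())[:3]:
    print(f"{name}: {prompt[:80]}...")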
Next, we combine the two csvs into one csv.
EXP_DIR=2025-04-07-point-force-unified-model
python scripts/build_synthetic_datasets/poke_model_rolling_balls/concatenate_csvs.py \
--input_path_csv1 ${DIR_COMBINED}_balls.csv \
--input_path_csv2 ${DIR_COMBINED}_plants.csv \
--output_path_csv ${DIR_COMBINED}.csv
And finally copy all the mp4s into the combined directory.
mkdir -p ${DIR_COMBINED}
cp ${DIR_BALLS}/*.mp4 ${DIR_COMBINED}
cp ${DIR_PLANTS}/*.mp4 ${DIR_COMBINED}
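As a quick sanity check before training, you can confirm that the number of mp4s in the combined directory matches the number of rows in the combined csv. A small sketch (ours; it assumes the csv has a single header row):
import csv
import glob

combined_dir = "datasets/point-force/train/point_force_23000_05-09"

mp4_count = len(glob.glob(f"{combined_dir}/*.mp4"))
with open(f"{combined_dir}.csv") as f:
    row_count = sum(1 for _ in csv.reader(f)) - 1  # subtract the assumed header row

print(f"{mp4_count} videos, {row_count} csv rows")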
Wind force model: generate Blender flag waving videos
Step 1: Generate flag waving videos using Blender
This script renders video frames to pngs. Look inside the script before running it: you'll need to set the first two lines (the Blender executable and the output path).
sh scripts/build_synthetic_datasets/wind_model_waving_flags/waving_flags_render.sh
You might want to launch many of these renders in parallel because they can take a while. The following script then concatenates the pngs into mp4s.
RENDER_DIR=~/scratch/waving_flags/pngs
python scripts/build_synthetic_datasets/wind_model_waving_flags/waving_flags_png_to_mp4.py $RENDER_DIR
Step 2: Create the csv for the training data
RENDER_DIR=~/scratch/waving_flags
python scripts/build_synthetic_datasets/wind_model_waving_flags/generate_csv_from_dir.py \
--file_dir ${RENDER_DIR}/videos \
--file_type video \
--output_path ${RENDER_DIR}/waving-flags.csv \
--backgrounds_json_path scripts/build_synthetic_datasets/wind_model_waving_flags/backgrounds.json \
--subset_size 10000
This script uses a backgrounds.json file which contains a unique text prompt for each HDRI background (regardless of how many flags are in the scene and what colors they are). We generated these prompts using the gpt-4o API for prompt upscaling, conditioning on the last frame of each video and prompt seeds as simple as "the flag waves in the wind".
We thank the authors of the works we build upon.
If you find this code useful in your research, please cite:
@misc{gillman2025forcepromptingvideogeneration,
title={Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals},
author={Nate Gillman and Charles Herrmann and Michael Freeman and Daksh Aggarwal and Evan Luo and Deqing Sun and Chen Sun},
year={2025},
eprint={2505.19386},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2505.19386},
}