
EmbodiedGen: Towards a Generative 3D World Engine for Embodied Intelligence

๐ŸŒ Project Page ๐Ÿ“„ arXiv ๐ŸŽฅ Video ๐Ÿค— Hugging Face ๐Ÿค— Hugging Face ๐Ÿค— Hugging Face ไธญๆ–‡ไป‹็ป

๐Ÿค— Hugging Face

EmbodiedGen is a generative engine that creates diverse, interactive 3D worlds composed of high-quality 3D assets (mesh & 3DGS) with plausible physics, leveraging generative AI to address the generalization challenges of embodied intelligence research. It is composed of six key modules: Image-to-3D, Text-to-3D, Texture Generation, Articulated Object Generation, Scene Generation, and Layout Generation.

Overall Framework



🚀 Quick Start

✅ Setup Environment

git clone https://github.com/HorizonRobotics/EmbodiedGen.git
cd EmbodiedGen
git checkout v0.1.5
git submodule update --init --recursive --progress
conda create -n embodiedgen python=3.10.13 -y # recommended to use a new env.
conda activate embodiedgen
bash install.sh basic

✅ Starting from Docker

We provide a pre-built Docker image on Docker Hub with a fully configured environment. For more details, please refer to the Docker documentation.

Note: Model checkpoints are not included in the image; they are downloaded automatically on first run. You still need to set up the GPT Agent manually.

IMAGE=wangxinjie/embodiedgen:env_v0.1.x
CONTAINER=EmbodiedGen-docker-${USER}
docker pull ${IMAGE}
docker run -itd --shm-size="64g" --gpus all --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --privileged --net=host --name ${CONTAINER} ${IMAGE}
docker exec -it ${CONTAINER} bash

✅ Setup GPT Agent

Update the API key in embodied_gen/utils/gpt_config.yaml.

You can choose between two backends for the GPT agent:

  • gpt-4o (Recommended) – use this if you have access to Azure OpenAI.
  • qwen2.5-vl – a free alternative via OpenRouter: apply for a free key here and update api_key in embodied_gen/utils/gpt_config.yaml (50 free requests per day).
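
After editing, you can sanity-check that the config loads and your key is in place (a minimal sketch; only the api_key field is mentioned in this README, so the rest of the file's schema is unspecified):

import yaml  # pip install pyyaml

with open("embodied_gen/utils/gpt_config.yaml") as f:
    cfg = yaml.safe_load(f)
print(cfg)  # verify that api_key is set for your chosen backend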

📸 Directly Use EmbodiedGen All-Simulators-Ready Assets

Explore EmbodiedGen-generated assets on 🤗 Hugging Face that are ready for simulation across simulators (SAPIEN, Isaac Sim, MuJoCo, PyBullet, Genesis, Isaac Gym, etc.). Details in the any-simulators chapter.


๐Ÿ–ผ๏ธ Image-to-3D

🤗 Hugging Face | Generate physically plausible 3D asset URDFs from a single input image, offering high-quality support for digital twin systems. (The HF space is a simplified demo; for full functionality, refer to img3d-cli.)

Image to 3D

โ˜๏ธ Service

Run the image-to-3D generation service locally. Models are downloaded automatically on first run; please be patient.

# Run in foreground
python apps/image_to_3d.py
# Or run in the background
CUDA_VISIBLE_DEVICES=0 nohup python apps/image_to_3d.py > /dev/null 2>&1 &

⚡ API

Generate physically plausible 3D assets from image input via the command-line API.

img3d-cli --image_path apps/assets/example_image/sample_04.jpg apps/assets/example_image/sample_19.jpg \
    --n_retry 2 --output_root outputs/imageto3d

# See results (.urdf / mesh.obj / mesh.glb / gs.ply) in ${output_root}/sample_xx/result
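
To quickly sanity-check a generated asset, here is a minimal sketch using trimesh (a third-party library, not part of EmbodiedGen); the result path follows the ${output_root}/sample_xx/result pattern above:

import trimesh  # pip install trimesh

mesh = trimesh.load("outputs/imageto3d/sample_04/result/mesh.obj", force="mesh")
print("bounding box:", mesh.bounds)       # check the real-world scale
print("watertight:", mesh.is_watertight)  # closed surfaces suit collision geometry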

๐Ÿ“ Text-to-3D

🤗 Hugging Face | Create 3D assets from text descriptions, covering a wide range of geometries and styles. (The HF space is a simplified demo; for full functionality, refer to text3d-cli.)

Text to 3D

โ˜๏ธ Service

Deploy the text-to-3D generation service locally.

The text-to-image stage is based on the Kolors model and supports Chinese and English prompts. Models are downloaded automatically on first run; please be patient.

python apps/text_to_3d.py

⚡ API

The text-to-image stage is based on SD3.5 Medium and accepts English prompts only. Usage requires agreeing to the model license (click accept); models are downloaded automatically.

For large-scale 3D asset generation, set --n_pipe_retry=2 to ensure high end-to-end asset usability through automatic quality checks and retries. For more diverse results, do not set --seed_img.

text3d-cli --prompts "small bronze figurine of a lion" "A globe with wooden base" "wooden table with embroidery" \
    --n_image_retry 2 --n_asset_retry 2 --n_pipe_retry 1 --seed_img 0 \
    --output_root outputs/textto3d

Alternatively, use the Kolors-based text-to-image model (the Chinese prompt below reads: "orange electric hand drill, with worn details"):

bash embodied_gen/scripts/textto3d.sh \
    --prompts "small bronze figurine of a lion" "A globe with wooden base and latitude and longitude lines" "ๆฉ™่‰ฒ็”ตๅŠจๆ‰‹้’ป๏ผŒๆœ‰็ฃจๆŸ็ป†่Š‚" \
    --output_root outputs/textto3d_k

P.S. Models with more permissive licenses can be found in embodied_gen/models/image_comm_model.py.


🎨 Texture Generation

🤗 Hugging Face | Generate visually rich textures for 3D meshes.

Texture Gen

โ˜๏ธ Service

Run the texture generation service locally. Models are downloaded automatically on first run; see download_kolors_weights and geo_cond_mv.

python apps/texture_edit.py

⚡ API

Chinese and English prompts are supported (the Chinese prompt below reads: "a realistic-style robot holding a sign, with big eyes; the sign reads 'Hello'").

texture-cli --mesh_path "apps/assets/example_texture/meshes/robot_text.obj" \
"apps/assets/example_texture/meshes/horse.obj" \
--prompt "ไธพ็€็‰Œๅญ็š„ๅ†™ๅฎž้ฃŽๆ ผๆœบๅ™จไบบ๏ผŒๅคง็œผ็›๏ผŒ็‰ŒๅญไธŠๅ†™็€โ€œHelloโ€็š„ๆ–‡ๅญ—" \
"A gray horse head with flying mane and brown eyes" \
--output_root "outputs/texture_gen" \
--seed 0

๐ŸŒ 3D Scene Generation

scene3d

⚡ API

Run bash install.sh extra to install the additional requirements if you want to use scene3d-cli.

It takes ~30 min to generate a color mesh and 3DGS per scene.

CUDA_VISIBLE_DEVICES=0 scene3d-cli \
--prompts "Art studio with easel and canvas" \
--output_dir outputs/bg_scenes/ \
--seed 0 \
--gs3d.max_steps 4000 \
--disable_pano_check

โš™๏ธ Articulated Object Generation

๐Ÿšง Coming Soon

articulate


๐Ÿž๏ธ Layout(Interactive 3D Worlds) Generation

💬 Generate Layout from Task Description

layout1 layout2
layout3 layout4

The text-to-image stage is based on SD3.5 Medium; usage requires agreeing to the model license. All models are auto-downloaded on first run.

You can generate any desired room as a background using scene3d-cli. Since each scene takes approximately 30 minutes to generate, we recommend pre-generating scenes for efficiency and adding them to outputs/bg_scenes/scene_list.txt (a helper sketch follows the download note below).

We provide some sample background assets created with scene3d-cli. Download them (~4 GB) using hf download xinjjj/scene3d-bg --repo-type dataset --local-dir outputs.
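
A helper sketch for building scene_list.txt from a directory of pre-generated scenes (the one-path-per-line format is an assumption; adjust it to whatever layout-cli expects):

from pathlib import Path

# Collect every scene directory under outputs/bg_scenes into scene_list.txt.
root = Path("outputs/bg_scenes")
scenes = sorted(str(p) for p in root.iterdir() if p.is_dir())
(root / "scene_list.txt").write_text("\n".join(scenes) + "\n")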

Generating one interactive 3D scene from a task description with layout-cli takes approximately 30 minutes.

layout-cli --task_descs "Place the pen in the mug on the desk" "Put the fruit on the table on the plate" \
--bg_list "outputs/bg_scenes/scene_list.txt" --output_root "outputs/layouts_gen" --insert_robot
Iscene_demo1 Iscene_demo2

Run multiple tasks defined in task_list.txt in the background. Remove --insert_robot if you do not want the robot pose considered during layout generation.

CUDA_VISIBLE_DEVICES=0 nohup layout-cli \
--task_descs "apps/assets/example_layout/task_list.txt" \
--bg_list "outputs/bg_scenes/scene_list.txt" \
--output_root "outputs/layouts_gens" --insert_robot > layouts_gens.log &

Using compose_layout.py, you can recompose the layout of a generated interactive 3D scene.

python embodied_gen/scripts/compose_layout.py \
--layout_path "outputs/layouts_gens/task_0000/layout.json" \
--output_dir "outputs/layouts_gens/task_0000/recompose" --insert_robot

We provide sim-cli, which lets you easily load generated layouts into an interactive 3D simulation using the SAPIEN engine (support for more simulators will come in future updates).

sim-cli --layout_path "outputs/layouts_gen/task_0000/layout.json" \
--output_dir "outputs/layouts_gen/task_0000/sapien_render" --insert_robot

Example: generate multiple parallel simulation environments with gym.make and record sensor and trajectory data.

parallel_sim1 parallel_sim2
python embodied_gen/scripts/parallel_sim.py \
--layout_file "outputs/layouts_gen/task_0000/layout.json" \
--output_dir "outputs/parallel_sim/task_0000" \
--num_envs 16

๐Ÿ–ผ๏ธ Real-to-Sim Digital Twin

real2sim_mujoco


🎮 Any Simulators

Use EmbodiedGen-generated assets, with correct physical collisions and consistent visual appearance, in any simulator (Isaac Sim, MuJoCo, Genesis, PyBullet, Isaac Gym, SAPIEN). See the example in tests/test_examples/test_asset_converter.py.

Simulator                                 Conversion Class
Isaac Sim                                 MeshtoUSDConverter
MuJoCo                                    MeshtoMJCFConverter
Genesis / SAPIEN / Isaac Gym / PyBullet   EmbodiedGen-generated .urdf can be used directly
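
For the direct-URDF row above, a minimal PyBullet sketch (the asset filename is illustrative; point it at a .urdf produced under your result directory):

import pybullet as p
import pybullet_data

p.connect(p.DIRECT)  # headless; use p.GUI for a viewer
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)
p.loadURDF("plane.urdf")  # ground plane shipped with pybullet_data
# Illustrative path -- substitute an EmbodiedGen result .urdf.
asset_id = p.loadURDF("outputs/imageto3d/sample_04/result/asset.urdf", basePosition=[0, 0, 0.2])
for _ in range(240):  # one simulated second at the default 240 Hz
    p.stepSimulation()
p.disconnect()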

simulators_collision


For Developers

pip install -e .[dev] && pre-commit install
python -m pytest  # all unit tests must pass

📚 Citation

If you use EmbodiedGen in your research or projects, please cite:

@misc{wang2025embodiedgengenerative3dworld,
      title={EmbodiedGen: Towards a Generative 3D World Engine for Embodied Intelligence},
      author={Xinjie Wang and Liu Liu and Yu Cao and Ruiqi Wu and Wenkang Qin and Dehui Wang and Wei Sui and Zhizhong Su},
      year={2025},
      eprint={2506.10600},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2506.10600},
}

🙌 Acknowledgement

EmbodiedGen builds upon the following amazing projects and models: 🌟 Trellis | 🌟 Hunyuan-Delight | 🌟 Segment Anything | 🌟 Rembg | 🌟 RMBG-1.4 | 🌟 Stable Diffusion x4 | 🌟 Real-ESRGAN | 🌟 Kolors | 🌟 ChatGLM3 | 🌟 Aesthetic Score | 🌟 Pano2Room | 🌟 Diffusion360 | 🌟 Kaolin | 🌟 diffusers | 🌟 gsplat | 🌟 QWEN-2.5VL | 🌟 GPT4o | 🌟 SD3.5 | 🌟 ManiSkill


โš–๏ธ License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.
