The Text2Bricks project aims to create buildable LEGO sets in the 3D LEGO representation format (LDRAW) based on natural language input. A diffusion model first generates a 3D shape from the input, and a reinforcement learning (RL) model then constructs the LEGO set to match the generated shape.
- Natural Language Input: User provides a description of the desired LEGO model.
- Diffusion Model: Converts the input into a 3D shape.
- Text2Brick Reinforcement Learning Model: Builds the LEGO model by:
  - Utilizing a gym environment.
  - Leveraging a combination of CNN and GNN for processing.
  - Representing the LEGO world as a graph.
- Output: A buildable LEGO set in LDRAW format.
Pipeline:
Natural Language Input → Diffusion Model (3D Shape) → Text2Brick RL Model (Gym + CNN + GNN) → Buildable LEGO Set (LDRAW Format)
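
To make the "LEGO world as a graph" item more concrete, here is a minimal sketch in plain Python. It is not the project's implementation. It assumes a simplified 2D side view in which every brick is one stud deep, and the names `Brick`, `BrickGraph`, `neighbours`, `collides`, and `is_valid` are hypothetical. Bricks are nodes, and an edge links two bricks in adjacent layers that share at least one stud column.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Brick:
    """Hypothetical 1xN brick seen from the side (assumption, not project code)."""
    x: int      # left-most stud column
    layer: int  # vertical layer, 0 = ground
    width: int  # length in studs

    def columns(self):
        """Stud columns covered by this brick."""
        return set(range(self.x, self.x + self.width))


@dataclass
class BrickGraph:
    bricks: list = field(default_factory=list)  # nodes, indexed by insertion order
    edges: set = field(default_factory=set)     # (existing_index, new_index) pairs

    def neighbours(self, brick):
        """Indices of bricks one layer above or below that share a stud column."""
        return [i for i, other in enumerate(self.bricks)
                if abs(other.layer - brick.layer) == 1
                and other.columns() & brick.columns()]

    def collides(self, brick):
        """True if the brick would occupy cells already taken in its layer."""
        return any(other.layer == brick.layer
                   and other.columns() & brick.columns()
                   for other in self.bricks)

    def is_valid(self, brick):
        """A placement is legal if it does not collide and is not floating."""
        if brick.width < 1 or brick.layer < 0 or self.collides(brick):
            return False
        return brick.layer == 0 or bool(self.neighbours(brick))

    def add(self, brick):
        """Insert the brick as a new node and connect it; return False if illegal."""
        if not self.is_valid(brick):
            return False
        new_index = len(self.bricks)
        self.edges.update((i, new_index) for i in self.neighbours(brick))
        self.bricks.append(brick)
        return True


world = BrickGraph()
world.add(Brick(x=0, layer=0, width=4))  # rests on the ground
world.add(Brick(x=2, layer=1, width=2))  # sits on the first brick
```

A graph in this form could be handed to a GNN as a node list plus an edge list, while the same `is_valid` check could reject illegal actions inside the environment.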
As a first step, the goal is to rebuild MNIST digits in LEGO LDRAW format. This simplified approach focuses on 2D reconstruction (ignoring the z-dimension) to reduce complexity in the initial stages of the project.
- Target Image: MNIST digit to rebuild.
- Current Build LEGO Shape: Converted to a grayscale image at each epoch.
- Reward Function:
  - Reward = α * brick_validity + β * IoU, where α and β are weighting coefficients.
  - IoU: Intersection-over-Union between the target image and the current LEGO shape.
  - brick_validity: Boolean indicating whether the brick placement is legal (e.g., no flying bricks), counted as 1 or 0 in the reward.
- Model Type: TBD (Possibly Q-Learning).
- Components:
  - CNN: Processes the target and current build images (using a backbone from a pretrained model).
  - GNN: Processes the graph representation of the LEGO world.
  - Fusion and Attention Layer: TBD – should we include this?
- Output: Predicts the next LEGO node in the graph (Brick Class) or its x, y coordinates (to be determined).
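
The reward described above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: images are plain nested lists of intensities in [0, 1], both images are binarized at a hypothetical threshold of 0.5 before computing IoU, and the default weights `alpha` and `beta` are placeholders. The helper names `binarize`, `iou`, and `reward` are also hypothetical.

```python
def binarize(image, threshold=0.5):
    """Return the set of (row, col) pixels brighter than the threshold.

    The threshold is an assumption; the README does not specify one.
    """
    return {(r, c)
            for r, row in enumerate(image)
            for c, value in enumerate(row)
            if value > threshold}


def iou(target_pixels, build_pixels):
    """Intersection-over-Union of two pixel sets (0.0 if both are empty)."""
    union = target_pixels | build_pixels
    if not union:
        return 0.0
    return len(target_pixels & build_pixels) / len(union)


def reward(target_image, build_image, brick_valid, alpha=1.0, beta=1.0):
    """Reward = alpha * brick_validity + beta * IoU.

    brick_valid is a Boolean and counts as 1.0 or 0.0 in the sum.
    """
    overlap = iou(binarize(target_image), binarize(build_image))
    return alpha * float(brick_valid) + beta * overlap


# Tiny 2x3 example: two shared pixels out of four lit in total.
target = [[0, 1, 1],
          [0, 1, 0]]
build = [[0, 1, 0],
         [0, 1, 1]]
score = reward(target, build, brick_valid=True, alpha=0.5, beta=2.0)
```

With these weights, a legal placement contributes a fixed bonus of `alpha`, while the IoU term keeps growing as the build covers more of the digit. The relative size of α and β therefore decides how strongly illegal placements are discouraged compared with poor shape matching.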
LEGO sets are generated in LDRAW format. For details, see the official specification:
LDRAW File Format Documentation
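
As a rough illustration of what the output might look like, the sketch below writes a 2D build as LDraw text. It relies on the LDraw line type 1 layout (`1 <colour> x y z a b c d e f g h i <file>`), on 20 LDraw units per stud and 24 per brick layer, and on the convention that negative y points up. The helper names, the `(x, layer, width)` brick tuples, the choice of colour 4, the restriction to three 1xN parts, and the assumption that each part's origin sits at the centre of its top face are all illustrative and should be checked against the specification and the parts library.

```python
STUD = 20          # LDraw units (LDU) per stud
BRICK_HEIGHT = 24  # LDU per brick layer

# Brick length in studs -> LDraw part file (1x1, 1x2 and 1x4 bricks).
PARTS = {1: "3005.dat", 2: "3004.dat", 4: "3010.dat"}

# Identity rotation: the brick keeps its default orientation along x.
IDENTITY = "1 0 0 0 1 0 0 0 1"


def brick_to_ldraw(x, layer, width, colour=4):
    """Format one brick as an LDraw type 1 (sub-file reference) line.

    x is the left-most stud column, layer is 0 for the ground layer.
    Assumes the part origin is at the centre of the brick's top face.
    """
    centre_x = STUD * (x + width / 2)
    top_y = -BRICK_HEIGHT * (layer + 1)  # negative y is up in LDraw
    return f"1 {colour} {centre_x:g} {top_y:g} 0 {IDENTITY} {PARTS[width]}"


def build_to_ldraw(bricks, title="Text2Brick build"):
    """Return LDraw text for a list of (x, layer, width) tuples."""
    lines = [f"0 {title}"]  # line type 0 carries the title comment
    lines += [brick_to_ldraw(*brick) for brick in bricks]
    return "\n".join(lines) + "\n"


ldraw_text = build_to_ldraw([(0, 0, 2), (1, 1, 1)])
```

Saving `ldraw_text` to a `.ldr` file should let an LDraw viewer display the two bricks, provided the assumptions above about origins and units hold for the installed parts library.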
- Brick by Brick: NeurIPS 2021 Paper
- Learning to Build by Building Your Own Instructions: arXiv 2410.01111