2025.07.30 - #44 - Feed-forward 3D recon survey, ThinkAct, ROMAN, UA-MPC, TurboClique

# Interesting papers

## Advances in Feed-Forward 3D Reconstruction and View Synthesis: A Survey

- https://www.arxiv.org/abs/2507.14501
- NeRF, Pointmap, 3D Gaussian Splatting, Mesh / Occupancy / SDF, 3D representation-free 모델 기법들의 계보 설명

<img width="625" height="705" alt="Image" src="https://github.com/user-attachments/assets/c0743a40-278f-477b-8d04-b126b8737357" />

## ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning

- https://jasper0314-huang.github.io/thinkact-vla/
- 기존의 VLA 모델은 단순히 Input->output 맵핑 형태임
- ThinkAct는 Multi-modal reasoning LLM을 사용해서 trajectory/action에 대한 latent variable을 생성하는데, 이걸 downstream에 있는 action model에 넣으면 trajectory가 나온다.
- 여기서 Multi-modal reasoning LLM이 latent variable을 잘 생성해줘야 좋은 trajectory가 나올텐데, 이건 visual 데이터로 평가하는 goal reward + trajectory reward를 이용한 강화학습 (GRPO)를 이용해서 LLM을 학습한다.

![](https://jasper0314-huang.github.io/thinkact-vla/static/images/teaser.png)

<img width="1041" height="605" alt="Image" src="https://github.com/user-attachments/assets/ac4637bb-bf19-4ceb-bcbe-611f2da0f901" />

## ROMAN

- https://github.com/mit-acl/roman
- Open-set detection object들을 이용해서 localization 

![](https://github.com/mit-acl/roman/raw/main/media/opposite_view_loop_closure.jpg)

![](https://github.com/mit-acl/roman/raw/main/media/system_diagram.jpg)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

2025.07.30 - #44 - Feed-forward 3D recon survey, ThinkAct, ROMAN, UA-MPC, TurboClique #46

Interesting papers

Advances in Feed-Forward 3D Reconstruction and View Synthesis: A Survey

ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning

ROMAN

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

2025.07.30 - #44 - Feed-forward 3D recon survey, ThinkAct, ROMAN, UA-MPC, TurboClique #46

Description

Interesting papers

Advances in Feed-Forward 3D Reconstruction and View Synthesis: A Survey

ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning

ROMAN

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions