2024.12.18 - #20 - MASt3R-SLAM, DiTER++, CAT4D, pixelSplat, LiveScene, Talking to DINO, MV-DUSt3R, Diorama, MegaSaM, NaVILA

# Interesting papers

## CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models

- https://arxiv.org/pdf/2411.18613
- https://cat-4d.github.io/
- 4D scene generation, multi-view video diffusion model, deformable 3D Gaussian

![Screen Recording 2024-12-18 at 8 43 02 PM](https://github.com/user-attachments/assets/b14d8d99-08b1-471b-903a-93d21f294552)

## pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction

- https://openaccess.thecvf.com//content/CVPR2024/papers/Charatan_pixelSplat_3D_Gaussian_Splats_from_Image_Pairs_for_Scalable_Generalizable_CVPR_2024_paper.pdf
- https://davidcharatan.com/pixelsplat/

![Screen Recording 2024-12-18 at 8 45 58 PM](https://github.com/user-attachments/assets/6e641c52-f03b-41d2-83fe-19ae6febbf55)

## LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control

- https://openreview.net/pdf/db46ca38beed8e31670315500fdc6d0bf0bf5757.pdf

![Screen Recording 2024-12-18 at 8 47 48 PM](https://github.com/user-attachments/assets/19e17045-2f0c-478a-b2eb-d58ad66fe814)

## Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation

- https://arxiv.org/pdf/2411.19331
- https://lorebianchi98.github.io/Talk2DINO/

![image](https://github.com/user-attachments/assets/c136f97e-5a07-4cee-a951-962760c17e86)

## Navigation World Models

- https://www.amirbar.net/nwm/index.html

![teaser](https://github.com/user-attachments/assets/e670da38-3f4c-476a-b2b1-b814f36b0124)

## MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds

- https://arxiv.org/pdf/2412.06974
- https://mv-dust3rp.github.io/

<img width="729" alt="image" src="https://github.com/user-attachments/assets/36af61f1-9d32-4d4e-aa5a-e9cd0c22e419" />

## MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors

- https://edexheim.github.io/mast3r-slam/mast3r-slam.pdf
- https://edexheim.github.io/mast3r-slam/

![image](https://github.com/user-attachments/assets/fe2a5487-d98f-47a7-8eba-57a5d53ad257)

![office](https://github.com/user-attachments/assets/12dec543-1107-4513-8e59-20fbcbee0dd7)

## Diorama: Unleashing Zero-shot Single-view 3D Scene Modeling

- https://arxiv.org/pdf/2411.19492
- https://3dlg-hcvc.github.io/diorama/

<img width="672" alt="image" src="https://github.com/user-attachments/assets/598d316f-2862-4ba6-9999-b6bda1232527" />

![image](https://github.com/user-attachments/assets/a17bd8f7-d911-468f-96ee-8985ee00c815)

## MegaSaM: Accurate, Fast, and Robust Structure and Motion from Casual Dynamic Videos

- https://arxiv.org/pdf/2412.04463
- https://mega-sam.github.io/
- deep visual SLAM framework
- Check out the website for better examples

![Teaser-Website-wide](https://github.com/user-attachments/assets/667647c8-081d-4564-85e4-a4ceff75715e)

