Upload GeometryForcing files
- .gitattributes +1 -0
- DFoT_16f_state_dict.ckpt +3 -0
- README.md +91 -3
- geometry_forcing_state_dict.ckpt +3 -0
- geometry_forcing_with_dino_state_dict.ckpt +3 -0
- main.png +3 -0
.gitattributes
CHANGED
|
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
|
| 33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
| 36 |
+
main.png filter=lfs diff=lfs merge=lfs -text
|
DFoT_16f_state_dict.ckpt
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:601e395b40948d05ce446bb65352ceceb7ca46de0954823d032b000a6815d09e
|
| 3 |
+
size 1835447729
|
README.md
CHANGED
|
@@ -1,3 +1,91 @@
|
|
| 1 |
-
|
| 2 |
-
|
| 3 |
-
|
| 1 |
+
<div align="center">
|
| 2 |
+
|
| 3 |
+
<h1>Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling</h1>
|
| 4 |
+
<a href="https://www.arxiv.org/abs/2507.07982">
|
| 5 |
+
<img src='https://img.shields.io/badge/arxiv-geometryforcing-darkred' alt='Paper PDF'></a>
|
| 6 |
+
<a href="https://geometryforcing.github.io/">
|
| 7 |
+
<img src='https://img.shields.io/badge/Project-Website-orange' alt='Project Page'></a>
|
| 8 |
+
|
| 9 |
+
|
| 10 |
+
[Haoyu Wu](https://cintellifusion.github.io/)$^{1*}$, Diankun Wu $^{2*}$, Tianyu He $^{1†}$, Junliang Guo $^{1}$, Yang Ye $^{1}$, Yueqi Duan $^{2}$, Jiang Bian $^{1}$
|
| 11 |
+
|
| 12 |
+
$^1$ Microsoft Research $^2$ Tsinghua University
|
| 13 |
+
|
| 14 |
+
($^*$ Equal Contribution. † Project Lead)
|
| 15 |
+
|
| 16 |
+
</div>
|
| 17 |
+
|
| 18 |
+
# Reference
|
| 19 |
+
|
| 20 |
+
```bibtex
|
| 21 |
+
@article{wu2025geometryforcing,
|
| 22 |
+
title={Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling},
|
| 23 |
+
author={Wu, Haoyu and Wu, Diankun and He, Tianyu and Guo, Junliang and Ye, Yang and Duan, Yueqi and Bian, Jiang},
|
| 24 |
+
journal={arXiv preprint arXiv:2507.07982},
|
| 25 |
+
year={2025}
|
| 26 |
+
}
|
| 27 |
+
```
|
| 28 |
+
|
| 29 |
+
<!-- include asset/main.png -->
|
| 30 |
+
|
| 31 |
+
# Overview
|
| 32 |
+

|
| 33 |
+
**Geometry Forcing (GF) Overview.**
|
| 34 |
+
(a) Our proposed GF paradigm enhances video diffusion models by aligning their features with geometric features from VGGT (Wang et al., 2025).
|
| 35 |
+
(b) Compared to DFoT (Song et al., 2025), our method generates more temporally and geometrically consistent videos.
|
| 36 |
+
(c) While baseline features fail to reconstruct meaningful 3D geometry, GF-learned features enable accurate 3D reconstruction.
|
| 37 |
+
|
| 38 |
+
# 🚀News
|
| 39 |
+
|
| 40 |
+
- [2025/9/24] We release the code and checkpoints.
|
| 41 |
+
- [2025/9/22] [Geometry Forcing](https://geometryforcing.github.io/) was accepted to the [NeurIPS 2025 NextVid Workshop](https://what-makes-good-video.github.io/) as an oral presentation!
|
| 42 |
+
- [2025/7/10] We release the paper and the project.
|
| 43 |
+
|
| 44 |
+
# 💪Get Started
|
| 45 |
+
|
| 46 |
+
## Set Up the Environment
|
| 47 |
+
|
| 48 |
+
```shell
|
| 49 |
+
conda create -n geometryforcing python=3.10 -y
|
| 50 |
+
conda activate geometryforcing
|
| 51 |
+
pip install -r requirements.txt
|
| 52 |
+
```
|
| 53 |
+
|
| 54 |
+
## Connect to Weights & Biases:
|
| 55 |
+
|
| 56 |
+
We use Weights & Biases for logging. [Sign up](https://wandb.ai/login?signup=true) if you don't have an account, and *modify `wandb.entity` in `config.yaml` to your user/organization name*.
|
| 57 |
+
|
| 58 |
+
## Download Checkpoints and Data
|
| 59 |
+
1. Download the pretrained checkpoints from Hugging Face:
|
| 60 |
+
```shell
|
| 61 |
+
bash scripts/hf_download_checkpoints.sh
|
| 62 |
+
```
|
| 63 |
+
|
| 64 |
+
|
| 65 |
+
2. Alternatively, download the pretrained checkpoints from ModelScope:
|
| 66 |
+
|
| 67 |
+
```shell
|
| 68 |
+
bash scripts/ms_download_checkpoints.sh
|
| 69 |
+
```
|
| 70 |
+
|
| 71 |
+
3. Download and process the RealEstate10K dataset into `data/real-estate-10k`.
|
| 72 |
+
|
| 73 |
+
## Generating Videos with Pretrained Models
|
| 74 |
+
|
| 75 |
+
### 1. Single Image to Long Video (256 Frames):
|
| 76 |
+
|
| 77 |
+
```shell
|
| 78 |
+
bash scripts/eval_geometry_forcing.sh
|
| 79 |
+
```
|
| 80 |
+
|
| 81 |
+
### 2. Single Image to Rotation Video (16 Frames):
|
| 82 |
+
|
| 83 |
+
```shell
|
| 84 |
+
bash scripts/eval_geometry_forcing_rotation.sh
|
| 85 |
+
```
|
| 86 |
+
|
| 87 |
+
## Training Geometry Forcing
|
| 88 |
+
|
| 89 |
+
```shell
|
| 90 |
+
bash scripts/train_geometry_forcing.sh
|
| 91 |
+
```
|
geometry_forcing_state_dict.ckpt
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:b7b36898e6565abe2de281aa9acfba5eeade9c0071e01f35a90ae7fcae41efcd
|
| 3 |
+
size 1835451817
|
geometry_forcing_with_dino_state_dict.ckpt
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:025f20677ebd5b84041987c9e5594303f5feb1a89f76b301881f665d7b619a4a
|
| 3 |
+
size 1835456847
|
main.png
ADDED
|
|
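Each `.ckpt` entry in this commit is a Git LFS pointer that records the SHA-256 `oid` and the byte `size` of the real file. Those two values can be used to check a downloaded checkpoint for truncation or corruption. The following is a minimal sketch, not part of the repository: the function name `verify_lfs_file` is hypothetical, and it assumes `sha256sum`, `wc`, `tr`, and `cut` are available (as on most Linux systems) and that the checkpoint sits in the current directory.

```shell
# Check a downloaded file against the oid and size from its Git LFS pointer.
# Usage: verify_lfs_file <path> <sha256-hex> <size-in-bytes>
# (hypothetical helper, not shipped with the repository)
verify_lfs_file() {
    local path="$1"
    local expected_oid="$2"
    local expected_size="$3"
    local actual_size actual_oid

    # Compare the byte count first; it is much cheaper than hashing ~1.8 GB.
    actual_size=$(wc -c < "$path" | tr -d '[:space:]')
    if [ "$actual_size" != "$expected_size" ]; then
        echo "size mismatch for $path: got '$actual_size', expected $expected_size" >&2
        return 1
    fi

    # Hash the file and compare with the pointer's oid (without the "sha256:" prefix).
    actual_oid=$(sha256sum "$path" | cut -d ' ' -f 1)
    if [ "$actual_oid" != "$expected_oid" ]; then
        echo "sha256 mismatch for $path" >&2
        return 1
    fi

    echo "OK: $path"
}

# Example call with the pointer values for DFoT_16f_state_dict.ckpt from this commit:
# verify_lfs_file DFoT_16f_state_dict.ckpt \
#     601e395b40948d05ce446bb65352ceceb7ca46de0954823d032b000a6815d09e 1835447729
```

A failed check returns a non-zero status, so the function can gate later steps in a download script.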