Haoyuwu committed ae30709 (verified) · 1 Parent(s): c4d2c70

Upload GeometryForcing files

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ main.png filter=lfs diff=lfs merge=lfs -text
DFoT_16f_state_dict.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:601e395b40948d05ce446bb65352ceceb7ca46de0954823d032b000a6815d09e
+ size 1835447729
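
The three added lines above are a Git LFS pointer: the repository stores only the object's SHA-256 and byte size, while the checkpoint itself lives in LFS storage. As a minimal sketch (the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of this repository), a downloaded file can be checked against such a pointer like this:

```python
import hashlib
import os
import tempfile


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and hash the pointer records."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        # Hash in chunks so a multi-gigabyte checkpoint is never fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer text copied from the diff above.
DFOT_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:601e395b40948d05ce446bb65352ceceb7ca46de0954823d032b000a6815d09e
size 1835447729
"""
dfot = parse_lfs_pointer(DFOT_POINTER)

# Self-check on a small temporary file, so the sketch runs without the real checkpoint.
payload = b"geometry forcing"
demo_pointer = {
    "algo": "sha256",
    "digest": hashlib.sha256(payload).hexdigest(),
    "size": len(payload),
}
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
demo_ok = matches_pointer(tmp.name, demo_pointer)
os.remove(tmp.name)
```

To check a real download, call `matches_pointer("DFoT_16f_state_dict.ckpt", dfot)` from the directory that holds the file; the local path is an assumption about where the download scripts place it.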
README.md CHANGED
@@ -1,3 +1,91 @@
- ---
- license: mit
- ---
+ <div align="center">
+
+ <h1>Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling </h1>
+ <a href="https://www.arxiv.org/abs/2507.07982">
+ <img src='https://img.shields.io/badge/arxiv-geometryforcing-darkred' alt='Paper PDF'></a>
+ <a href="https://geometryforcing.github.io/">
+ <img src='https://img.shields.io/badge/Project-Website-orange' alt='Project Page'></a>
+
+
+ [Haoyu Wu](https://cintellifusion.github.io/)$^{1*}$, Diankun Wu $^{2*}$, Tianyu He $^{1†}$, Junliang Guo $^{1}$, Yang Ye $^{1}$, Yueqi Duan $^{2}$, Jiang Bian $^{1}$
+
+ $^1$ Microsoft Research $^2$ Tsinghua University
+
+ ($^*$ Equal Contribution. † Project Lead)
+
+ </div>
+
+ # Reference
+
+ ```
+ @article{wu2025geometryforcing,
+ title={Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling},
+ author={Wu, Haoyu and Wu, Diankun and He, Tianyu and Guo, Junliang and Ye, Yang and Duan, Yueqi and Bian, Jiang},
+ journal={arXiv preprint arXiv:2507.07982},
+ year={2025}
+ }
+ ```
+
+ <!-- include asset/main.png -->
+
+ # Overview
+ ![](main.png)
+ **Geometry Forcing (GF) Overview.**
+ (a) Our proposed GF paradigm enhances video diffusion models by aligning them with geometric features from VGGT (Wang et al., 2025).
+ (b) Compared to DFoT (Song et al., 2025), our method generates more temporally and geometrically consistent videos.
+ (c) While baseline features fail to reconstruct meaningful 3D geometry, GF-learned features enable accurate 3D reconstruction.
+
+ # 🚀 News
+
+ - [2025/9/24] We release the code and checkpoints.
+ - [2025/9/22] [Geometry Forcing](https://geometryforcing.github.io/) has been accepted to the [NeurIPS 2025 NextVid Workshop](https://what-makes-good-video.github.io/) as an Oral!
+ - [2025/7/10] We release the paper and the project.
+
+ # 💪 Get Started
+
+ ## Set Up the Environment
+
+ ```shell
+ conda create -n geometryforcing python=3.10 -y
+ conda activate geometryforcing
+ pip install -r requirements.txt
+ ```
+
+ ## Connect to Weights & Biases
+
+ We use Weights & Biases for logging. [Sign up](https://wandb.ai/login?signup=true) if you don't have an account, and *set `wandb.entity` in `config.yaml` to your user or organization name*.
+
+ ## Download Checkpoints and Data
+ 1. Download the pretrained checkpoints from Hugging Face:
+ ```shell
+ bash scripts/hf_download_checkpoints.sh
+ ```
+
+
+ 2. Alternatively, download the pretrained checkpoints from ModelScope:
+
+ ```shell
+ bash scripts/ms_download_checkpoints.sh
+ ```
+
+ 3. Download the RealEstate10K dataset and process it into `data/real-estate-10k`.
+
+ ## Generating Videos with Pretrained Models
+
+ ### 1. Single Image to Long Video (256 Frames)
+
+ ```shell
+ bash scripts/eval_geometry_forcing.sh
+ ```
+
+ ### 2. Single Image to Rotation Video (16 Frames)
+
+ ```shell
+ bash scripts/eval_geometry_forcing_rotation.sh
+ ```
+
+ ## Training Geometry Forcing
+
+ ```shell
+ bash scripts/train_geometry_forcing.sh
+ ```
geometry_forcing_state_dict.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b7b36898e6565abe2de281aa9acfba5eeade9c0071e01f35a90ae7fcae41efcd
+ size 1835451817
geometry_forcing_with_dino_state_dict.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:025f20677ebd5b84041987c9e5594303f5feb1a89f76b301881f665d7b619a4a
+ size 1835456847
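
For planning disk space, the `size` fields of the three checkpoint pointers in this commit can be added up. The short sketch below uses only the byte counts shown in the diff; the helper `to_gib` is a hypothetical name introduced for illustration.

```python
# Sizes in bytes, copied from the `size` lines of the three LFS pointers in this commit.
CHECKPOINT_SIZES = {
    "DFoT_16f_state_dict.ckpt": 1835447729,
    "geometry_forcing_state_dict.ckpt": 1835451817,
    "geometry_forcing_with_dino_state_dict.ckpt": 1835456847,
}


def to_gib(num_bytes):
    """Convert a byte count to binary gigabytes (1 GiB = 2**30 bytes)."""
    return num_bytes / 2**30


total_bytes = sum(CHECKPOINT_SIZES.values())

for name, size in CHECKPOINT_SIZES.items():
    print(f"{name}: {to_gib(size):.2f} GiB")
print(f"total: {total_bytes} bytes ({to_gib(total_bytes):.2f} GiB)")
```

Each checkpoint is about 1.71 GiB, so downloading all three takes roughly 5.1 GiB, not counting `main.png` or the dataset.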
main.png ADDED

Git LFS Details

  • SHA256: 276dfc332582d624dd52903c150c38cded7f673470a29b460b4d30b5580bcbfe
  • Pointer size: 132 Bytes
  • Size of remote file: 4.46 MB