Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds Paper • 2511.08892 • Published Nov 12
InternVL3.5 Collection This collection includes all released checkpoints of InternVL3.5, covering different training stages (e.g., Pretraining, SFT, MPO, Cascade RL). • 54 items • Updated Sep 28
DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction Paper • 2505.21473 • Published May 27
UniTok: A Unified Tokenizer for Visual Generation and Understanding Paper • 2502.20321 • Published Feb 27
Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis Paper • 2412.04431 • Published Dec 5, 2024
Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models Paper • 2406.09416 • Published Jun 13, 2024
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models Paper • 2404.13013 • Published Apr 19, 2024
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper • 2404.02905 • Published Apr 3, 2024