See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning Paper • 2512.22120 • Published 6 days ago • 12
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance Paper • 2512.08765 • Published 23 days ago • 128
From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model Paper • 2510.19871 • Published Oct 22, 2025 • 29
Generative Universal Verifier as Multimodal Meta-Reasoner Paper • 2510.13804 • Published Oct 15, 2025 • 25
LongLive: Real-time Interactive Long Video Generation Paper • 2509.22622 • Published Sep 26, 2025 • 184
Reconstruction Alignment Improves Unified Multimodal Models Paper • 2509.07295 • Published Sep 8, 2025 • 40
AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning Paper • 2507.12841 • Published Jul 17, 2025 • 41
SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation Paper • 2507.09862 • Published Jul 14, 2025 • 49
MMaDA: Multimodal Large Diffusion Language Models Paper • 2505.15809 • Published May 21, 2025 • 97
Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening Paper • 2502.12146 • Published Feb 17, 2025 • 16
HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation Paper • 2502.12148 • Published Feb 17, 2025 • 17
Improving Video Generation with Human Feedback Paper • 2501.13918 • Published Jan 23, 2025 • 52
EMO2: End-Effector Guided Audio-Driven Avatar Video Generation Paper • 2501.10687 • Published Jan 18, 2025 • 15
Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis Paper • 2412.04431 • Published Dec 5, 2024 • 17
IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation Paper • 2410.07171 • Published Oct 9, 2024 • 43
A Survey on the Honesty of Large Language Models Paper • 2409.18786 • Published Sep 27, 2024 • 31