StableVLA: Towards Robust Vision-Language-Action Models without Extra Data • Paper • 2605.18287 • Published 4 days ago • 13
Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation • Paper • 2605.15141 • Published 8 days ago • 91
AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation • Paper • 2605.13724 • Published 9 days ago • 96
WorldReasonBench: Human-Aligned Stress Testing of Video Generators as Future World-State Predictors • Paper • 2605.10434 • Published 11 days ago • 30
HumanNet: Scaling Human-centric Video Learning to One Million Hours • Paper • 2605.06747 • Published 15 days ago • 51
Efficient Training on Multiple Consumer GPUs with RoundPipe • Paper • 2604.27085 • Published 23 days ago • 40
Enhancing Spatial Understanding in Image Generation via Reward Modeling • Paper • 2602.24233 • Published Feb 27 • 60
Light Forcing: Accelerating Autoregressive Video Diffusion via Sparse Attention • Paper • 2602.04789 • Published Feb 4 • 3
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head • Paper • 2601.07832 • Published Jan 12 • 53