LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation Paper • 2605.18739 • Published 6 days ago • 108
MMSkills: Towards Multimodal Skills for General Visual Agents Paper • 2605.13527 • Published 10 days ago • 117
HyperEyes: Dual-Grained Efficiency-Aware Reinforcement Learning for Parallel Multimodal Search Agents Paper • 2605.07177 • Published 16 days ago • 61
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model Paper • 2604.20796 • Published Apr 22 • 240
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents Paper • 2604.07429 • Published Apr 8 • 121
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond Paper • 2604.22748 • Published 30 days ago • 226
VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models Paper • 2603.22003 • Published Mar 23 • 12
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling Paper • 2511.11793 • Published Nov 14, 2025 • 195
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Paper • 2507.13348 • Published Jul 17, 2025 • 79