GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning Paper • 2602.12099 • Published 9 days ago • 56
SALAD: Achieve High-Sparsity Attention via Efficient Linear Attention Tuning for Video Diffusion Transformer Paper • 2601.16515 • Published 30 days ago • 15
PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models Paper • 2601.11087 • Published Jan 16 • 11
VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice Paper • 2601.05175 • Published Jan 8 • 36
Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning Paper • 2601.06943 • Published Jan 11 • 212
VINCIE: Unlocking In-context Image Editing from Video Paper • 2506.10941 • Published Jun 12, 2025 • 4
view article Article Generalist Robot Policy Evaluation in Simulation with NVIDIA Isaac Lab-Arena and LeRobot Jan 5 • 19
ProEdit: Inversion-based Editing From Prompts Done Right Paper • 2512.22118 • Published Dec 26, 2025 • 18
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation Paper • 2512.23576 • Published Dec 29, 2025 • 65
Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models Paper • 2512.20557 • Published Dec 23, 2025 • 50
view article Article SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data +7 Jun 3, 2025 • 321