Actial: Activate Spatial Reasoning Ability of Multimodal Large Language Models Paper • 2511.01618 • Published Nov 3, 2025 • 10
Agentic Jigsaw Interaction Learning for Enhancing Visual Perception and Reasoning in Vision-Language Models Paper • 2510.01304 • Published Oct 1, 2025 • 10
Interleaving Reasoning for Better Text-to-Image Generation Paper • 2509.06945 • Published Sep 8, 2025 • 14
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models Paper • 2503.06749 • Published Mar 9, 2025 • 31
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning Paper • 2504.07956 • Published Apr 10, 2025 • 46