SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents Paper • 2512.22322 • Published 21 days ago • 38
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning Paper • 2511.22570 • Published Nov 27, 2025 • 88
V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models Paper • 2511.16668 • Published Nov 20, 2025 • 54
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning Paper • 2509.22601 • Published Sep 26, 2025 • 29
From Uniform to Heterogeneous: Tailoring Policy Optimization to Every Token's Nature Paper • 2509.16591 • Published Sep 20, 2025 • 2
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey Paper • 2509.02547 • Published Sep 2, 2025 • 228
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent Paper • 2508.05748 • Published Aug 7, 2025 • 141
OpenMathReasoning Collection Models and datasets from "AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset" • 7 items • Updated about 7 hours ago • 46