THINKSAFE: Self-Generated Safety Alignment for Reasoning Models Paper • 2601.23143 • Published 4 days ago • 38
ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas Paper • 2601.21558 • Published 5 days ago • 53
Beyond Imitation: Reinforcement Learning for Active Latent Planning Paper • 2601.21598 • Published 5 days ago • 9
Agent Lightning: Train ANY AI Agents with Reinforcement Learning Paper • 2508.03680 • Published Aug 5, 2025 • 131
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation Paper • 2601.20614 • Published 6 days ago • 115
daVinci-Dev: Agent-native Mid-training for Software Engineering Paper • 2601.18418 • Published 8 days ago • 123
Scaling Embeddings Outperforms Scaling Experts in Language Models Paper • 2601.21204 • Published 6 days ago • 96
ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation Paper • 2601.21420 • Published 5 days ago • 40
Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning Paper • 2601.19280 • Published 7 days ago • 9
Guided Self-Evolving LLMs with Minimal Human Supervision Paper • 2512.02472 • Published Dec 2, 2025 • 54
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability Paper • 2601.18778 • Published 8 days ago • 39
Towards Pixel-Level VLM Perception via Simple Points Prediction Paper • 2601.19228 • Published 8 days ago • 16