🧠Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19, 2025 • 181
view article Article Illustrating Reinforcement Learning from Human Feedback (RLHF) +2 Dec 9, 2022 • 390
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B Paper • 2511.06221 • Published Nov 9, 2025 • 132
nvidia/Nemotron-Research-Reasoning-Qwen-1.5B Text Generation • 2B • Updated Nov 21, 2025 • 1.21k • 235
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning Paper • 2509.25760 • Published Sep 30, 2025 • 55
Qwen/Qwen3-VL-30B-A3B-Instruct Image-Text-to-Text • 31B • Updated Nov 26, 2025 • 1.25M • • 501