Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering Paper • 2411.11504 • Published Nov 18, 2024 • 24
SAISA: Towards Multimodal Large Language Models with Both Training and Inference Efficiency Paper • 2502.02458 • Published Feb 4 • 1
Scalable Oversight for Superhuman AI via Recursive Self-Critiquing Paper • 2502.04675 • Published Feb 7 • 1
Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models Paper • 2503.18034 • Published Mar 23 • 1
ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers Paper • 2504.00502 • Published Apr 1 • 26
Coupled Variational Reinforcement Learning for Language Model General Reasoning Paper • 2512.12576 • Published 11 days ago • 2
SciCode: A Research Coding Benchmark Curated by Scientists Paper • 2407.13168 • Published Jul 18, 2024 • 16
Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face Article • Jul 29 • 205
PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning Paper • 2508.21104 • Published Aug 28 • 35
Critic-CoT: Boosting the reasoning abilities of large language model via Chain-of-thoughts Critic Paper • 2408.16326 • Published Aug 29, 2024 • 1
Cheems: A Practical Guidance for Building and Evaluating Chinese Reward Models from Scratch Paper • 2502.17173 • Published Feb 24
On-Policy Self-Alignment with Fine-grained Knowledge Feedback for Hallucination Mitigation Paper • 2406.12221 • Published Jun 18, 2024
Rethinking Reward Model Evaluation: Are We Barking up the Wrong Tree? Paper • 2410.05584 • Published Oct 8, 2024