Pre-Trained Policy Discriminators are General Reward Models Paper • 2507.05197 • Published Jul 7 • 39
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments Paper • 2406.04151 • Published Jun 6, 2024 • 24