PlanningBench: Generating Scalable and Verifiable Planning Data for Evaluating and Training Large Language Models Paper • 2605.20873 • Published 12 days ago • 31
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence Paper • 2604.18292 • Published Apr 20 • 85
MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning Paper • 2603.03379 • Published Mar 3 • 32
LaSER: Internalizing Explicit Reasoning into Latent Space for Dense Retrieval Paper • 2603.01425 • Published Mar 2 • 7
LaSER: Internalizing Explicit Reasoning into Latent Space for Dense Retrieval Paper • 2603.01425 • Published Mar 2 • 7
DeepImageSearch: Benchmarking Multimodal Agents for Context-Aware Image Retrieval in Visual Histories Paper • 2602.10809 • Published Feb 11 • 59
LawThinker: A Deep Research Legal Agent in Dynamic Environments Paper • 2602.12056 • Published Feb 12 • 35
GISA: A Benchmark for General Information-Seeking Assistant Paper • 2602.08543 • Published Feb 9 • 26