Collections
Collections including paper arxiv:2602.09856
-
SWE-Universe: Scale Real-World Verifiable Environments to Millions
Paper • 2602.02361 • Published • 60 -
LongCodeZip: Compress Long Context for Code Language Models
Paper • 2510.00446 • Published • 107 -
Code2World: A GUI World Model via Renderable Code Generation
Paper • 2602.09856 • Published • 200 -
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces
Paper • 2601.11868 • Published • 34
-
THINKSAFE: Self-Generated Safety Alignment for Reasoning Models
Paper • 2601.23143 • Published • 39 -
PaperBanana: Automating Academic Illustration for AI Scientists
Paper • 2601.23265 • Published • 218 -
Agentic Reasoning for Large Language Models
Paper • 2601.12538 • Published • 201 -
BabyVision: Visual Reasoning Beyond Language
Paper • 2601.06521 • Published • 200
-
Towards Scalable Pre-training of Visual Tokenizers for Generation
Paper • 2512.13687 • Published • 106 -
MMGR: Multi-Modal Generative Reasoning
Paper • 2512.14691 • Published • 119 -
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
Paper • 2512.23447 • Published • 98 -
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation
Paper • 2512.23576 • Published • 65
-
CoRAG: Collaborative Retrieval-Augmented Generation
Paper • 2504.01883 • Published • 9 -
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning
Paper • 2504.08600 • Published • 33 -
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL
Paper • 2503.23157 • Published • 10 -
AI Agents: Evolution, Architecture, and Real-World Applications
Paper • 2503.12687 • Published • 2
-
Code2World: A GUI World Model via Renderable Code Generation
Paper • 2602.09856 • Published • 200 -
How2Everything: Mining the Web for How-To Procedures to Evaluate and Improve LLMs
Paper • 2602.08808 • Published • 8 -
Thinking Makes LLM Agents Introverted: How Mandatory Thinking Can Backfire in User-Engaged Agents
Paper • 2602.07796 • Published • 7 -
QP-OneModel: A Unified Generative LLM for Multi-Task Query Understanding in Xiaohongshu Search
Paper • 2602.09901 • Published • 6
-
Towards Pixel-Level VLM Perception via Simple Points Prediction
Paper • 2601.19228 • Published • 18 -
Post-LayerNorm Is Back: Stable, ExpressivE, and Deep
Paper • 2601.19895 • Published • 25 -
Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision
Paper • 2601.19798 • Published • 42 -
OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models
Paper • 2601.21639 • Published • 50
-
The Debugging Decay Index: Rethinking Debugging Strategies for Code LLMs
Paper • 2506.18403 • Published • 3 -
ReCode: Updating Code API Knowledge with Reinforcement Learning
Paper • 2506.20495 • Published • 10 -
SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution
Paper • 2507.23348 • Published • 12 -
LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering
Paper • 2509.09614 • Published • 7
-
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Paper • 2501.18585 • Published • 61 -
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!
Paper • 2502.07374 • Published • 40 -
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Paper • 2502.06703 • Published • 152 -
S*: Test Time Scaling for Code Generation
Paper • 2502.14382 • Published • 63