Peng Wang

stillarrow

https://peter-peng-w.github.io/

AI & ML interests

None yet

Recent Activity


Organizations

None yet

upvoted a paper about 6 hours ago

TradingAgents: Multi-Agents LLM Financial Trading Framework

Paper • 2412.20138 • Published Dec 28, 2024 • 78

upvoted 2 papers about 8 hours ago

You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories

Paper • 2605.21468 • Published 2 days ago • 41

MetaAgent-X : Breaking the Ceiling of Automatic Multi-Agent Systems via End-to-End Reinforcement Learning

Paper • 2605.14212 • Published 8 days ago • 16

upvoted a paper 14 days ago

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Paper • 2601.05242 • Published Jan 8 • 232

upvoted a paper 16 days ago

Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning

Paper • 2602.10090 • Published Feb 10 • 53

upvoted 3 papers about 1 month ago

HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

Paper • 2604.14268 • Published Apr 15 • 121

Heterogeneous Agent Collaborative Reinforcement Learning

Paper • 2603.02604 • Published Mar 3 • 195

Self-Distilled RLVR

Paper • 2604.03128 • Published Apr 3 • 176

upvoted a collection about 2 months ago

Qwen2.5-Coder

Collection • Code-specific model series based on Qwen2.5 • 38 items • Updated Mar 2 • 367

upvoted 2 papers 2 months ago

PaperBanana: Automating Academic Illustration for AI Scientists

Paper • 2601.23265 • Published Jan 30 • 227

MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification

Paper • 2603.15726 • Published Mar 16 • 186

upvoted a collection 2 months ago

NeMo Gym

Collection • Collection of RL verifiable data for NeMo Gym • 22 items • Updated 2 days ago • 59

upvoted a collection 3 months ago

BFS-Prover

Collection • LLM Step-Provers in Lean4 • 5 items • Updated Oct 7, 2025 • 8

upvoted 3 papers 3 months ago

Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation

Paper • 2502.16707 • Published Feb 23, 2025 • 14

Learning to Repair Lean Proofs from Compiler Feedback

Paper • 2602.02990 • Published Feb 3 • 29

Experiential Reinforcement Learning

Paper • 2602.13949 • Published Feb 15 • 75

upvoted a paper 4 months ago

Scaling Embeddings Outperforms Scaling Experts in Language Models

Paper • 2601.21204 • Published Jan 29 • 103

upvoted an article 4 months ago

Open Responses: What you need to know

Article • evalstate, burtenshaw, merve, pcuenq • Jan 15 • 111

upvoted 2 papers 4 months ago

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Paper • 2502.14739 • Published Feb 20, 2025 • 110

Your Group-Relative Advantage Is Biased

Paper • 2601.08521 • Published Jan 13 • 158
