
Peng Wang

stillarrow

https://peter-peng-w.github.io/

AI & ML interests

None yet

Organizations

None yet

Recent Activity

liked a dataset about 3 hours ago

HuggingFaceH4/ultrafeedback_binarized

Viewer • Updated Oct 16, 2024 • 187k • 8.83k • 317

upvoted a collection about 9 hours ago

🧠 Reasoning datasets

Collection

Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19, 2025 • 181

liked a dataset 10 days ago

m-a-p/SuperGPQA

Viewer • Updated Apr 30, 2025 • 26.5k • 5.92k • 80

liked a dataset 14 days ago

LLM360/guru-RL-92k

Viewer • Updated Aug 20, 2025 • 91.9k • 1.14k • 42

upvoted an article 29 days ago

Article

From GRPO to DAPO and GSPO: What, Why, and How

Aug 9, 2025

upvoted an article about 2 months ago

Article

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Dec 9, 2022 • 390

liked a dataset about 2 months ago

zwhe99/DeepMath-103K

Viewer • Updated May 29, 2025 • 103k • 5.32k • 288

liked a model about 2 months ago

deepseek-ai/DeepSeek-Math-V2

Text Generation • 685B • Updated Nov 27, 2025 • 2.59k • 676

upvoted a paper about 2 months ago

Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B

Paper • 2511.06221 • Published Nov 9, 2025 • 132

liked a model about 2 months ago

WeiboAI/VibeThinker-1.5B

Text Generation • 2B • Updated Nov 24, 2025 • 2.01k • 509

liked a model 2 months ago

nvidia/Nemotron-Research-Reasoning-Qwen-1.5B

Text Generation • 2B • Updated Nov 21, 2025 • 1.21k • 235

liked a dataset 2 months ago

open-r1/DAPO-Math-17k-Processed

Viewer • Updated Nov 10, 2025 • 34.8k • 5.2k • 54

upvoted 2 papers 3 months ago

ExGRPO: Learning to Reason from Experience

Paper • 2510.02245 • Published Oct 2, 2025 • 80

TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning

Paper • 2509.25760 • Published Sep 30, 2025 • 55

liked 3 models 3 months ago

liked a dataset 3 months ago

jupyter-agent/jupyter-agent-dataset

Viewer • Updated Sep 10, 2025 • 95.8k • 807 • 155

liked a model 3 months ago

jinaai/jina-embeddings-v4

Visual Document Retrieval • 4B • Updated Sep 2, 2025 • 77.4k • 451

upvoted a collection 3 months ago

Qwen3-VL