wenxueru

Aunderline

https://github.com/wenxueru

AI & ML interests

None yet

Recent Activity

upvoted a paper 6 days ago

Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering

upvoted a paper 6 days ago

SAISA: Towards Multimodal Large Language Models with Both Training and Inference Efficiency

upvoted a paper 6 days ago

Scalable Oversight for Superhuman AI via Recursive Self-Critiquing

Organizations

None yet

upvoted 6 papers 6 days ago

Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering

Paper • 2411.11504 • Published Nov 18, 2024 • 24

SAISA: Towards Multimodal Large Language Models with Both Training and Inference Efficiency

Paper • 2502.02458 • Published Feb 4 • 1

Scalable Oversight for Superhuman AI via Recursive Self-Critiquing

Paper • 2502.04675 • Published Feb 7 • 1

Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models

Paper • 2503.18034 • Published Mar 23 • 1

ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers

Paper • 2504.00502 • Published Apr 1 • 26

Coupled Variational Reinforcement Learning for Language Model General Reasoning

Paper • 2512.12576 • Published 11 days ago • 2

authored a paper 6 days ago

Coupled Variational Reinforcement Learning for Language Model General Reasoning

Paper • 2512.12576 • Published 11 days ago • 2

submitted a paper to Daily Papers 6 days ago

Coupled Variational Reinforcement Learning for Language Model General Reasoning

Paper • 2512.12576 • Published 11 days ago • 2

New activity in nex-agi/agent-sft 17 days ago

Has the agent's trajectory data been verified/validated?

#1 opened 24 days ago by Aunderline

New activity in SciCode1/SciCode 20 days ago

why 341 subproblems?

#4 opened 20 days ago by Aunderline

upvoted a paper about 1 month ago

SciCode: A Research Coding Benchmark Curated by Scientists

Paper • 2407.13168 • Published Jul 18, 2024 • 16

upvoted a collection about 1 month ago

OLMo 2

Collection

Artifacts for the OLMo 2 release. • 35 items • Updated 1 day ago • 151

upvoted an article 3 months ago

Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face

Article • Jul 29 • 205

upvoted a paper 4 months ago

PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning

Paper • 2508.21104 • Published Aug 28 • 35

upvoted a paper 6 months ago

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 263

authored 5 papers 7 months ago

Critic-CoT: Boosting the reasoning abilities of large language model via Chain-of-thoughts Critic

Paper • 2408.16326 • Published Aug 29, 2024 • 1

Scalable Oversight for Superhuman AI via Recursive Self-Critiquing

Paper • 2502.04675 • Published Feb 7 • 1

Cheems: A Practical Guidance for Building and Evaluating Chinese Reward Models from Scratch

Paper • 2502.17173 • Published Feb 24

On-Policy Self-Alignment with Fine-grained Knowledge Feedback for Hallucination Mitigation

Paper • 2406.12221 • Published Jun 18, 2024

Rethinking Reward Model Evaluation: Are We Barking up the Wrong Tree?

Paper • 2410.05584 • Published Oct 8, 2024
