liu zh

morphism42

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper 1 day ago

JudgeRLVR: Judge First, Generate Second for Efficient Reasoning

Paper • 2601.08468 • Published 2 days ago • 6

upvoted a paper 4 months ago

On Predictability of Reinforcement Learning Dynamics for Large Language Models

Paper • 2510.00553 • Published Oct 1, 2025 • 8

upvoted a paper 5 months ago

Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference

Paper • 2508.02193 • Published Aug 4, 2025 • 134

liked a Space 8 months ago

The Ultimate Guide to LLM Training | The Ultra-Scale Playbook

🔥 • 252

Learn about every aspect of LLM training

upvoted a paper 11 months ago

Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search

Paper • 2502.02508 • Published Feb 4, 2025 • 22

upvoted 5 articles over 1 year ago

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

Article • Sep 18, 2024 • 272

How NuminaMath Won the 1st AIMO Progress Prize

Article • Jul 11, 2024 • 124

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Article • Dec 9, 2022 • 391

Fine-tune Llama 3 with ORPO

Article • Apr 22, 2024 • 241

Personal Copilot: Train Your Own Coding Assistant

Article • Oct 27, 2023
