Article: Mixture of Experts (MoEs) in Transformers • ariG23498, pcuenq, merve, IlyasMoutawwakil, ArthurZ, sergiopaniego, Molbap • Feb 26
Paper: Improving Data and Reward Design for Scientific Reasoning in Large Language Models • arXiv:2602.08321 • Published Feb 9
Article: makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch • AviSoori1x • May 7, 2024
Paper: Kimi Linear: An Expressive, Efficient Attention Architecture • arXiv:2510.26692 • Published Oct 30, 2025
Article: Diffusers welcomes FLUX.2 • YiYiXu, dg845, sayakpaul, OzzyGT, dn6, ariG23498, linoyts, multimodalart • Nov 25, 2025
Article: Hugging Face welcomes the Aya Expanse family of multilingual models • ariG23498 • Oct 24, 2024
Paper: Training Language Models to Self-Correct via Reinforcement Learning • arXiv:2409.12917 • Published Sep 19, 2024
Article: Improving Hugging Face Training Efficiency Through Packing with Flash Attention 2 • RQlee, ArthurZ, achikundu, lwtr, rganti, mayank-mishra • Aug 21, 2024
Paper: Understanding Reference Policies in Direct Preference Optimization • arXiv:2407.13709 • Published Jul 18, 2024
Article: RegMix: Data Mixture as Regression for Language Model Pre-training • SivilTaram • Jul 11, 2024