In a Training Loop 🔄

Quentin Gallouédec PRO

qgallouedec

Recent Activity

updated a dataset about 17 hours ago

hf-doc-build/doc-build

Updated 38 minutes ago • 306k • 38

updated a bucket about 17 hours ago

hf-doc-build/doc

138 GB

updated a model 1 day ago

qgallouedec/Qwen3-4B-Thinking-2507-noisy

Text Generation • 4B • Updated 1 day ago • 36

published a model 1 day ago

qgallouedec/Qwen3-4B-Thinking-2507-noisy

Text Generation • 4B • Updated 1 day ago • 36

updated a bucket 1 day ago

hf-doc-build/doc-dev

140 GB

liked a dataset 1 day ago

MathArena/aime_2026

Benchmark • Updated 8 days ago • 30 • 13.2k • 38

New activity in kernels-community/flash-attn2 1 day ago

`metadata.json` missing required fields on torch 2.7/2.8/2.9 build variants — breaks `kernels>=0.14`

#5 opened 1 day ago by

qgallouedec

liked a model 1 day ago

kernels-community/flash-attn2

Updated 5 days ago • 26.5k • 32

posted an update 3 days ago

Shipped hf-sandbox! 🥡

🧪 Running an eval that executes model-generated C on a few thousand prompts? You probably don't want any of that on your laptop.
Just shipped hf-sandbox, a Modal-style sandbox API on top of Hugging Face Jobs. Spin up an isolated, ephemeral container, run untrusted code, get the result back. No Docker on your laptop, no infra to manage.

Just `pip install hf-sandbox`.

Early days (v0.1); feedback and issues very welcome:
👉 https://github.com/huggingface/hf-sandbox

New activity in trl-internal-testing/tiny-Qwen2_5_VLForConditionalGeneration 4 days ago

Upload Qwen2_5_VLForConditionalGeneration

#11 opened 4 days ago by

qgallouedec

Upload Qwen2_5_VLForConditionalGeneration

#10 opened 4 days ago by

qgallouedec

Upload Qwen2_5_VLForConditionalGeneration

#9 opened 4 days ago by

qgallouedec

posted an update 4 days ago

**TRL v1.4 is out 🚀** Chunked NLL loss for SFT and a first-class **OpenReward** integration.

**Chunked NLL loss for SFT — drops peak VRAM by up to 14×**

Standard SFT materializes a full `[batch × seq × vocab]` logits tensor before computing cross-entropy, which dominates peak memory at long context lengths. The new `loss_type="chunked_nll"` path drops ignored-label tokens before the `lm_head` matmul and computes cross-entropy in checkpointed chunks of 256.
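To make the idea concrete, here is a minimal, framework-free sketch of the two steps described above: filter out ignored labels first, then compute the loss one chunk at a time so that only a `[chunk × vocab]` slice of logits exists at once. It is an illustration only, not TRL's implementation: the function names are hypothetical, the `-100` ignore index is the usual convention rather than something stated here, and gradient checkpointing is left out entirely.

```python
import math

IGNORE_INDEX = -100  # assumed: the conventional "masked out of the loss" label


def _logits(hidden, lm_head):
    """Project one hidden state onto the vocabulary (one row of lm_head per token id)."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in lm_head]


def _nll(logits, label):
    """Negative log-likelihood of `label`, using a numerically stable log-sum-exp."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_z - logits[label]


def full_nll(hiddens, lm_head, labels):
    """Reference path: build logits for every position, then mask ignored labels."""
    all_logits = [_logits(h, lm_head) for h in hiddens]  # full [seq x vocab]
    kept = [_nll(l, y) for l, y in zip(all_logits, labels) if y != IGNORE_INDEX]
    return sum(kept) / len(kept) if kept else 0.0


def chunked_nll(hiddens, lm_head, labels, chunk_size=256):
    """Drop ignored tokens before the projection, then reduce chunk by chunk."""
    kept = [(h, y) for h, y in zip(hiddens, labels) if y != IGNORE_INDEX]
    total = 0.0
    for start in range(0, len(kept), chunk_size):
        chunk = kept[start:start + chunk_size]
        # Only this chunk's logits are alive at any time: [chunk x vocab].
        chunk_logits = [_logits(h, lm_head) for h, _ in chunk]
        total += sum(_nll(l, y) for l, (_, y) in zip(chunk_logits, chunk))
    return total / len(kept) if kept else 0.0


# Toy inputs: 5 positions, hidden size 2, vocabulary of 3 tokens.
hiddens = [[0.5, -1.0], [1.5, 0.25], [0.0, 2.0], [-0.75, 0.5], [1.0, 1.0]]
lm_head = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.8]]
labels = [IGNORE_INDEX, 2, 0, IGNORE_INDEX, 1]

reference = full_nll(hiddens, lm_head, labels)
chunked = chunked_nll(hiddens, lm_head, labels, chunk_size=2)
```

Both paths return the same mean loss; the chunked one simply never holds more than `chunk_size` rows of logits.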

Peak GPU memory, AdamW fp32:
- Qwen3-14B, 8×H100 FSDP2, 16k seq: 58.9 GB → 38.9 GB
- Qwen3-4B, 1×H100 80GB, 16k seq: OOM → 63.8 GB
- Qwen3-32B, 8×H100 FSDP2, 8k seq: OOM → 71.2 GB
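A quick back-of-the-envelope estimate shows why the logits tensor matters at these lengths. The sketch below rests on assumptions not stated in the post: fp32 logits, "16k" meaning 16,384 tokens, and a vocabulary of 151,936 entries (check your model's config). `logits_bytes` is a hypothetical helper, not a TRL function.

```python
def logits_bytes(batch, seq_len, vocab_size, bytes_per_element=4):
    """Size in bytes of a dense [batch x seq x vocab] logits tensor."""
    return batch * seq_len * vocab_size * bytes_per_element


VOCAB = 151_936  # assumed vocabulary size; check your model's config
GIB = 2**30

full = logits_bytes(batch=1, seq_len=16_384, vocab_size=VOCAB)
chunk = logits_bytes(batch=1, seq_len=256, vocab_size=VOCAB)

print(f"full sequence:       {full / GIB:.2f} GiB")   # 9.27 GiB
print(f"one 256-token chunk: {chunk / GIB:.2f} GiB")  # 0.14 GiB
```

Under these assumptions a single 16k sequence needs roughly 9 GiB for the forward logits alone, before gradients, while a 256-token chunk needs 64 times less.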

End-to-end, it is consistently as fast as or faster than `nll`, and it unlocks sequence lengths that don't fit at all under the standard path.

```python
from trl import SFTConfig

config = SFTConfig(loss_type="chunked_nll")
```

Works with PEFT and VLMs out of the box.

**Open Reward Standard environment adapter**

The new `trl.experimental.openreward` adapter plugs any environment speaking the [Open Reward Standard](https://openrewardstandard.io) protocol into any TRL trainer that takes an `environment_factory`. One string (a catalog name or a URL) wires the `train_dataset`, `environment_factory`, and `reward_funcs` slots; tools are bound dynamically from JSON Schema, with no per-env wrapper code:

```python
from trl import GRPOTrainer
from trl.experimental.openreward import OpenRewardSpec

spec = OpenRewardSpec("Eigent/SETA", num_tasks=64)

trainer = GRPOTrainer(
    ...,
    train_dataset=spec.train_dataset,
    environment_factory=spec.environment_factory,
    reward_funcs=spec.reward_funcs,
)
```
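For readers curious what "bound dynamically from JSON Schema" can look like in general, here is a small standard-library sketch. It is not the adapter's code: `bind_tool`, the schema layout, and `fake_handler` are all hypothetical, chosen only to show how a callable with a real signature can be built from a schema at runtime instead of being hand-written per environment.

```python
import inspect

# Subset of JSON Schema primitive types mapped to Python annotations (illustrative).
_JSON_TO_PY = {"string": str, "integer": int, "number": float, "boolean": bool}


def bind_tool(schema, handler):
    """Build a keyword-only callable whose signature mirrors `schema`."""
    spec = schema["parameters"]
    required = set(spec.get("required", []))
    params = [
        inspect.Parameter(
            name,
            inspect.Parameter.KEYWORD_ONLY,
            # Required arguments get no default; optional ones default to None.
            default=inspect.Parameter.empty if name in required else None,
            annotation=_JSON_TO_PY.get(prop.get("type"), object),
        )
        for name, prop in spec.get("properties", {}).items()
    ]
    signature = inspect.Signature(params)

    def tool(**kwargs):
        # bind() raises TypeError on missing required or unknown arguments.
        bound = signature.bind(**kwargs)
        bound.apply_defaults()
        return handler(schema["name"], dict(bound.arguments))

    tool.__name__ = schema["name"]
    tool.__doc__ = schema.get("description", "")
    tool.__signature__ = signature
    return tool


# Hypothetical tool description and a stand-in for the environment call.
schema = {
    "name": "run_command",
    "description": "Run a shell command in the environment.",
    "parameters": {
        "type": "object",
        "properties": {"command": {"type": "string"}, "timeout": {"type": "integer"}},
        "required": ["command"],
    },
}
calls = []


def fake_handler(name, arguments):
    calls.append((name, arguments))
    return f"{name} ok"


run_command = bind_tool(schema, fake_handler)
result = run_command(command="ls")
```

Because the generated function carries a name, docstring, and signature, anything that introspects tools (for example to render them into a chat template) can treat it like a hand-written one.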

v1.4 also brings MFU helpers for dense + MoE models, GRPO support for Liger 0.8.0 (delta clipping + VESPO + KL bias correction), Tülu 3's length-normalized DPO loss, four more training chat templates (Cohere, Cohere2, Gemma 3, Qwen3-2507), and a 5+ GB CUDA memory leak fix in activation offloading.

Full release notes: https://github.com/huggingface/trl/releases/tag/v1.4.0