All HF Hub posts

qgallouedec
posted an update 2 days ago

TRL v1.3 ships day-one training support for Qwen 3.6 🚀

The new Qwen 3.6 family (Qwen/Qwen3.6-27B, Qwen/Qwen3.6-35B-A3B) reuses the Qwen3.5-MoE architecture but ships a slightly different chat template, so we updated the stack end-to-end: new training template with {% generation %} markers, tool-call response schema routing, tiny test models for the VLM matrix.
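
The {% generation %} markers are the part of the chat template that tells the tokenizer which spans are assistant tokens, which is what assistant_only_loss relies on. A minimal illustrative template (not the actual Qwen3.6 one, just the mechanism):

# Illustrative only: a toy chat template showing {% generation %} markers.
# Everything between {% generation %} and {% endgeneration %} is treated as
# assistant tokens when tokenizing with return_assistant_tokens_mask=True.
chat_template = (
    "{% for message in messages %}"
    "{% if message['role'] == 'assistant' %}"
    "<|assistant|>{% generation %}{{ message['content'] }}{% endgeneration %}<|end|>"
    "{% else %}"
    "<|{{ message['role'] }}|>{{ message['content'] }}<|end|>"
    "{% endif %}"
    "{% endfor %}"
)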

SFT with assistant-only loss works out of the box:

from trl import SFTConfig, SFTTrainer

trainer = SFTTrainer(
    model="Qwen/Qwen3.6-27B",
    args=SFTConfig(assistant_only_loss=True),  # compute loss only on assistant tokens
    train_dataset=dataset,  # any conversational dataset with a "messages" column
)
trainer.train()


GRPO tool-calling works too: just hand tools=[...] to GRPOTrainer.
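
A minimal sketch of what that looks like; the tool, reward function, and config values here are illustrative placeholders, not taken from the release notes:

from trl import GRPOConfig, GRPOTrainer

# Illustrative tool: a plain Python function with type hints and a docstring
# (the name and behavior are placeholders).
def get_weather(city: str) -> str:
    """Return a short weather description for a city."""
    return f"The weather in {city} is sunny."

# GRPO still needs at least one reward function; this toy one just prefers
# shorter completions and handles both string and chat-style completions.
def reward_short(completions, **kwargs):
    texts = [c if isinstance(c, str) else c[-1]["content"] for c in completions]
    return [-len(t) / 100 for t in texts]

trainer = GRPOTrainer(
    model="Qwen/Qwen3.6-27B",
    reward_funcs=reward_short,
    args=GRPOConfig(output_dir="qwen3.6-grpo"),
    train_dataset=dataset,   # a prompts dataset, as in the SFT example above
    tools=[get_weather],     # the new bit: hand your tools to the trainer
)
trainer.train()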

v1.3 also brings a new experimental TPO trainer (Triple Preference Optimization), speculative decoding in trl vllm-serve (Qwen3 MTP / Eagle3 drafts), 12 more KTO ↔ DPO alignment PRs (KTO promotion to stable is now in reach), three more {% generation %} chat templates (Gemma/Gemma 2, Phi-3, GLM-4-MoE), and a chunky SFT entropy bug fix.

Full release notes: https://github.com/huggingface/trl/releases/tag/v1.3.0
Enderchef
posted an update 1 day ago
Hi, everyone!
Please follow, like, and support the work of CompactAI-O!
Spread the word!
projectlosangeles
posted an update 2 days ago
🔥 Check out the first-of-its-kind SOTA Orpheus Morpheus preview! 🔥

projectlosangeles/Orpheus-Morpheus

Easily generate variations or similar compositions from any MIDI!

Please ❤️ if you enjoyed Orpheus Morpheus!

Sincerely,

Alex

SeaWolf-AI
posted an update 4 days ago
🧬 Introducing Darwin-9B-NEG: the first model with Native Entropy Gating (NEG)

🔗 Try it now: FINAL-Bench/Darwin-9B-NEG
🔗 Q4 (4-bit): FINAL-Bench/Darwin-9B-MFP4

We're thrilled to release Darwin-9B-NEG, a 9B-parameter reasoning model that embeds an architecturally internalised sense of self-confidence directly into the transformer via our proprietary Native Entropy Gating (NEG) technology.

📊 GPQA Diamond (198 PhD-level questions):

▸ Baseline Darwin-9B (no NEG) → 51.01 %
▸ Pure NEG (greedy · 1× cost) → 63.64 % 🔥 +12.63 %p
▸ + Permutation (4× cost) → 76.26 %
▸ + Ensemble Refinement (~20×) → 84.34 % 🏆

With only 9 billion parameters and 1× inference cost, Pure NEG jumps +12.63 %p over the same model without NEG. Going all-in with ensemble refinement pushes it to 84.34 %, surpassing the published Qwen3.5-9B leaderboard score (81.7 %) by +2.64 %p.

🔬 What makes NEG different from Multi-Turn Iteration (MTI)?

Classical MTI needs 3-8× extra inference passes. NEG instead lives INSIDE the single decoding loop. Two tiny modules ride with the transformer: NEG-Head predicts per-token entropy from the last hidden state, and NEG-Gate conditionally restricts the top-k choice when confidence is low. The gate activates on only 4.36 % of tokens, so it is essentially free at inference time.
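
A rough sketch of that mechanism as described above; this is a reconstruction for illustration, not the actual Darwin-9B-NEG code, and the module names, threshold, and restricted k are placeholders:

import torch
import torch.nn as nn
import torch.nn.functional as F

class NEGHead(nn.Module):
    """Predicts per-token entropy from the last hidden state (sketch)."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, 1)

    def forward(self, last_hidden: torch.Tensor) -> torch.Tensor:
        # softplus keeps the predicted entropy non-negative
        return F.softplus(self.proj(last_hidden)).squeeze(-1)

def neg_gate(logits, predicted_entropy, threshold=2.0, restricted_k=5):
    """When predicted entropy is high (low confidence), restrict the next-token
    distribution to its top-k logits; otherwise leave it untouched."""
    gated = logits.clone()
    low_confidence = predicted_entropy > threshold          # shape: (batch,)
    if low_confidence.any():
        topk = torch.topk(logits[low_confidence], k=restricted_k, dim=-1)
        masked = torch.full_like(logits[low_confidence], float("-inf"))
        masked.scatter_(-1, topk.indices, topk.values)
        gated[low_confidence] = masked
    return gated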

✨ Key differentiators
• Architecturally internalised: the model file *is* the feature
• 1× inference cost (vs. 3-8× for MTI)
• Drop-in with vLLM / SGLang / TGI / transformers, no extra engine
• +12.63 %p reasoning gain at zero latency overhead
• Single-file deployment, Apache 2.0 licensed

🧬 Lineage
Qwen/Qwen3.5-9B → Darwin-9B-Opus (V7 evolutionary merge) → Darwin-9B-NEG (V8 + NEG training)

#Darwin #NEG #NativeEntropyGating #GPQA #Reasoning #LLM #OpenSource #Apache2
prometechinc
posted an update about 3 hours ago
pthinc/BCE-Prettybird-Nano-Parrot-v0.2

This dataset is a bilingual (Turkish-English mixed) comedic text collection designed for training and fine-tuning conversational AI models with humor awareness, sarcasm detection, and cultural nuance understanding. It includes short joke-style prompts, observational comedy snippets, and absurd dialogue fragments that blend everyday Turkish expressions with English punchlines, reflecting real-world code-switching behavior. The dataset aims to improve model creativity, timing, and informal language fluency while capturing the rhythm of stand-up comedy and internet humor across multilingual contexts.

It is synthetically generated with AI. It contains irony and humor; some jokes might be a bit stale. 🤣

600 jokes and ironic snippets in different languages have been added. Styles of various comedians are included.
yuriyvnv
posted an update about 22 hours ago
🔊 Four Qwen3-ASR (0.6B and 1.7B) Fine-Tunes for Portuguese and Dutch.

Both the 1.7B and 0.6B variants of Alibaba's Qwen3-ASR, fine-tuned for European Portuguese and Dutch and bundled in a single collection.

πŸ”— Collection: https://huggingface.co/collections/yuriyvnv/qwen-asr-for-portuguese-and-dutch-17b-and-06b

Headline numbers on the Common Voice 22 test set, zero-shot baseline → fine-tuned:
🇵🇹 Qwen3-ASR-1.7B-PT: 12.91% → 8.50% WER (-34%)
🇵🇹 Qwen3-ASR-0.6B-PT: 18.26% → 11.85% WER (-35%)
🇳🇱 Qwen3-ASR-1.7B-NL: 6.68% → 5.28% WER (-21%)
🇳🇱 Qwen3-ASR-0.6B-NL: 12.46% → 8.31% WER (-33%)

The 0.6B variants are the more interesting half of the release. They give up only a few WER points compared to the 1.7B at a third of the parameters, which matters for edge hardware, CPU inference, or anywhere inference cost needs to stay down. The Dutch 0.6B in particular lands at 8.3% WER on CV22, competitive with much larger systems.

The Dutch 1.7B started from a strong 6.7% zero-shot baseline, so the absolute gain is smaller: Qwen already handles Dutch well, and the fine-tune mostly sharpens it on Common Voice's casing and punctuation conventions.

Training stuck close to Qwen's official SFT recipe (lr 2e-5, linear schedule, 2% warmup, bf16, gradient checkpointing on a single H100). The data is the differentiator: Common Voice 22 train + validation augmented with synthetic OpenAI-TTS speech, filtered by the WAVe multimodal embedding model that scores clips at the word level and drops the ones that don't align well with their transcripts.
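
For reference, that recipe maps onto a standard transformers setup roughly like this (an illustrative sketch; batch size and epochs are placeholders not stated in the post, and the real scripts live in the linked repo):

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen3-asr-finetune",
    learning_rate=2e-5,
    lr_scheduler_type="linear",
    warmup_ratio=0.02,               # 2% warmup
    bf16=True,
    gradient_checkpointing=True,
    per_device_train_batch_size=16,  # placeholder: not stated in the post
    num_train_epochs=3,              # placeholder: not stated in the post
)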

📦 Full pipeline (synthetic data generation, WAVe filtering, training scripts, evaluation protocol) is open source:
github.com/yuriyvnv/TTS-Augmented-ASR
@hf-audio
#asr #speech #parakeet #nvidia #nemo #multilingual #fine-tuning #commonvoice
HaChazal
posted an update about 23 hours ago
evalstate
posted an update 1 day ago
Hugging Face MCP Server v0.3.9
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Users with a bucket named mcp get an additional list_files tool that returns the public URLs of the files it contains. This is primarily intended for use with Gradio Spaces that need URLs as inputs.
mlabonne
posted an update 1 day ago
Big update to llm-datasets, my curated list of datasets and tools for post-training LLMs.

> Added many new datasets
> New "thinking" column
> Refreshed recommended tools.

Thanks to everyone who told me at ICLR that they used it for their research; you motivated this update!
kanaria007
posted an update 1 day ago
✅ Article highlight: *Continuous Audit Pipeline: Making Evidence Bundles Routine* (art-60-107, v0.1)

TL;DR:
This article argues that evidence bundles should not be an incident-only ritual.

If reconstructability matters only after something goes wrong, it is already too late. SI turns audit into a *continuous pipeline*: routine sealed bundles, immediate verification, retention-safe omissions, and automatic escalation when governance SLOs are breached.

Read:
kanaria007/agi-structural-intelligence-protocols

Why it matters:
• makes "courtroom-grade reconstructability" a routine byproduct of normal ops
• turns governance SLO breaches into explicit state transitions, not dashboard trivia
• separates the stable audit spine from the payload store, so erasure removes access without destroying proof
• prevents incident-time improvisation from breaking determinism, chain-of-custody, or export integrity

What's inside:
• the operating model: *Audit Spine vs Payload Store*
• three routine bundle tiers: daily governance bundles, weekly compliance bundles, and triggered incident-ready bundles
• trigger rules where CAS / ACR / RBL / EOH breaches automatically emit bundles and degrade governance state
• an end-to-end pipeline: collect → shape/omit → canonicalize → digest → resolve refs → seal → sign → verify → retain (see the sketch after this list)
• a governed run record for continuous audit itself, including policy, trust, canonicalization, reason-code-set, and registry snapshot bindings
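
A toy sketch of the canonicalize → digest → seal/sign → verify slice of that pipeline (illustrative only; the bundle format, key handling, and omission rules are defined in the protocols above, and HMAC-SHA256 here stands in for whatever signing scheme the article specifies):

import hashlib
import hmac
import json

def canonicalize(bundle: dict) -> bytes:
    # deterministic serialization: sorted keys, no whitespace variance
    return json.dumps(bundle, sort_keys=True, separators=(",", ":")).encode()

def seal_and_sign(bundle: dict, key: bytes) -> dict:
    payload = canonicalize(bundle)
    digest = hashlib.sha256(payload).hexdigest()
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"bundle": bundle, "digest": digest, "signature": signature}

def verify(sealed: dict, key: bytes) -> bool:
    payload = canonicalize(sealed["bundle"])
    digest_ok = hashlib.sha256(payload).hexdigest() == sealed["digest"]
    signature_ok = hmac.compare_digest(
        hmac.new(key, payload, hashlib.sha256).hexdigest(), sealed["signature"]
    )
    return digest_ok and signature_ok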

Key idea:
Do not wait until an incident to "prepare evidence."

Make evidence production continuous, sealed, and self-verifying, so when something breaks, you select the window instead of inventing the proof.

*Continuous audit is not paperwork. It is a control loop on admissibility and autonomy.*