AI & ML interests

Make all hub models available for conversion to ONNX format.

Recent Activity

IlyasMoutawwakil posted an update 21 days ago
Transformers v5 just landed! 🚀
It significantly unifies and reduces modeling code across architectures, while opening the door to a whole new class of performance optimizations.

My favorite new feature? 🤔
The new dynamic weight loader + converter. Here’s why 👇

Over the last few months, the core Transformers maintainers built an incredibly fast weight loader, capable of converting tensors on the fly while loading them in parallel threads. This means we’re no longer constrained by how parameters are laid out inside the safetensors weight files.

In practice, this unlocks two big things:
- Much more modular modeling code. You can now clearly see how architectures build on top of each other (DeepSeek v2 → v3, Qwen v2 → v3 → MoE, etc.). This makes shared bottlenecks obvious and lets us optimize the right building blocks once, for all model families.
- Performance optimizations beyond what torch.compile can do alone. torch.compile operates on the computation graph, but it can't change parameter layouts. With the new loader, we can restructure weights at load time: fusing MoE expert projections, merging attention QKV projections (see the sketch below), and enabling more compute-dense kernels that simply weren't possible before.
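
To make this concrete, here is a minimal, self-contained sketch (plain PyTorch, not the actual loader code) of why merging the Q/K/V weights at load time helps: three matmuls collapse into one larger, more compute-dense one.

```python
import torch

# Toy dimensions; real models fuse much larger projections.
hidden = 64
q_w = torch.randn(hidden, hidden)
k_w = torch.randn(hidden, hidden)
v_w = torch.randn(hidden, hidden)

# Restructure at load time: one [3 * hidden, hidden] weight,
# so attention runs a single matmul instead of three.
qkv_w = torch.cat([q_w, k_w, v_w], dim=0)

x = torch.randn(2, 8, hidden)  # (batch, seq_len, hidden)
q, k, v = (x @ qkv_w.T).chunk(3, dim=-1)

# Sanity check: the fused projection matches the three separate ones.
assert torch.allclose(q, x @ q_w.T, atol=1e-5)
assert torch.allclose(k, x @ k_w.T, atol=1e-5)
assert torch.allclose(v, x @ v_w.T, atol=1e-5)
```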

Personally, I'm honored to have contributed in this direction, including the work on optimizing MoE implementations and making modeling code more torch-exportable, so these optimizations can be ported cleanly across runtimes.

Overall, Transformers v5 is a strong signal of where the community and industry are converging: Modularity and Performance, without sacrificing Flexibility.

Transformers v5 makes its signature from_pretrained an entrypoint where you can mix and match (see the sketch after this list):
- Parallelism
- Quantization
- Custom kernels
- Flash/Paged attention
- Continuous batching
- ...
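
A hedged sketch of what that mixing looks like in practice; the exact kwargs depend on your transformers version and installed extras, and the model id here is just an example:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B-Instruct",   # example model id, not from the post
    dtype=torch.bfloat16,           # "torch_dtype" on older versions
    device_map="auto",              # device placement (requires accelerate)
    attn_implementation="sdpa",     # or "flash_attention_2" if installed
)
```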

Kudos to everyone involved! I highly recommend:
Release notes: https://github.com/huggingface/transformers/releases/tag/v5.0.0
Blog post: https://huggingface.co/blog/transformers-v5
IlyasMoutawwakil posted an update 26 days ago
After 2 months of refinement, I'm happy to announce that a lot of Transformers' modeling code is now significantly more torch-compile & export-friendly 🔥

Why it had to be done 👇
PyTorch's Dynamo compiler is increasingly becoming the default interoperability layer for ML systems. Anything that relies on torch.export or torch.compile, from model optimization to cross-framework integrations, benefits directly when models can be captured as a single Dynamo-traced graph!

Transformers models are now easier to:
⚙️ Compile end-to-end with torch.compile backends
📦 Export reliably via torch.export and torch.onnx.export
🚀 Deploy to ONNX / ONNX Runtime, Intel's OpenVINO, NVIDIA AutoDeploy (TRT-LLM), AMD's Quark, Meta's ExecuTorch, and more hardware-specific runtimes.

This work aims at unblocking entire TorchDynamo-based toolchains that rely on exporting Transformers across runtimes and accelerators.
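
A minimal sketch of the two capture paths, assuming a small causal LM (the model id is an example; some versions may need extra export options):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM2-135M"  # example; any export-friendly LM
model = AutoModelForCausalLM.from_pretrained(model_id).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("Hello, world!", return_tensors="pt")

# Full-graph capture: fails loudly if anything breaks the Dynamo graph.
compiled = torch.compile(model, fullgraph=True)

# Ahead-of-time capture for export-based toolchains (ONNX, ExecuTorch, ...).
exported = torch.export.export(
    model,
    args=(),
    kwargs={
        "input_ids": inputs["input_ids"],
        "attention_mask": inputs["attention_mask"],
    },
)
```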

We are doubling down on Transformers' commitment to be a first-class citizen of the PyTorch ecosystem: more exportable, more optimizable, and easier to deploy everywhere.

There are definitely some edge cases we haven't addressed yet, so don't hesitate to try compiling / exporting your favorite transformers and to open issues / PRs.

PR in the comments! More updates coming soon!
Aurelien-Morgan posted an update 2 months ago
louisbrulenaudet posted an update 6 months ago
Supercharge Apple’s Shortcuts using Cloudflare Workers and Gemini within minutes (and for free, up to 1,500 requests per day) ☁️✨

Hello everyone! Last week, while experimenting for fun, I created an API that lets you easily access AI models (in this case, Google's) from the Shortcuts app, so you can analyze data from your apps and make the most of it thanks to the generative capabilities of advanced models.

It costs me nothing, and I think it might be good to share it so that others can build on it.

In README.md, you will find everything you need to get started and put your own microservice into production, which you can call from the app’s HTTP request features.

All you need is a free Cloudflare account and an API key obtained from Google's AI Studio.
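
For illustration, a hedged sketch of calling such a deployed Worker from Python; the endpoint URL and payload shape here are hypothetical, so check README.md for the actual route and request format:

```python
import requests

# Hypothetical URL and payload; see the repo's README.md for the real ones.
response = requests.post(
    "https://your-worker.your-subdomain.workers.dev/",
    json={"prompt": "Summarize my meeting notes: ..."},
    timeout=30,
)
print(response.json())
```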

Feel free to take a look and get back to me if you encounter any problems during deployment.

Here is the GitHub repo where you can find all the source code and run it on your own: https://github.com/louisbrulenaudet/genai-api
louisbrulenaudet posted an update 6 months ago
Although more and more code editors are aligning with the AGENTS.md file standard, some still use their own specific nomenclatures, which can make it difficult to maintain separate configuration files when several people work on the same project with different agents.

Bodyboard addresses this by generating canonical instructions for code helpers from a single AGENTS.md file, thereby streamlining the production of adapter outputs for Gemini CLI, Copilot, Cline, Claude, Rules, Windsurf, and OpenAI Codex integrations.

You just have to:
npm install -g bodyboard

Then run, at the root of your project:
bodyboard all

Link to npm: https://www.npmjs.com/package/bodyboard
Link to the GitHub repo: https://github.com/louisbrulenaudet/bodyboard

It's a very simple project, but it addresses certain issues I've encountered, so why not make it available to everyone...

If you have other ideas for adapters to create, feel free to open a PR on the GitHub repo.
IlyasMoutawwakil posted an update 7 months ago
🚀 Optimum: The Last v1 Release 🚀
Optimum v1.27 marks the final major release in the v1 series. As we close this chapter, we're laying the groundwork for a more modular and community-driven future:
- Optimum v2: A lightweight core package for porting Transformers, Diffusers, or Sentence-Transformers to specialized AI hardware, software, and accelerators.
- Optimum‑ONNX: A dedicated package where the ONNX/ONNX Runtime ecosystem lives and evolves, faster-moving and decoupled from the Optimum core.

🎯 Why this matters:
- A clearer governance path for ONNX, fostering stronger community collaboration and an improved developer experience.
- Faster innovation in a more modular, open-source environment.

💡 What this means:
- More transparency, broader participation, and faster development driven by the community and key actors in the ONNX ecosystem (PyTorch, Microsoft, Joshua Lochner 👀, ...)
- A cleaner, more maintainable core Optimum, focused on extending HF libraries to specialized AI hardware/software/accelerator tooling and used by our partners (Intel, AWS, AMD, NVIDIA, FuriosaAI, ...)

🛠️ Major updates I worked on in this release:
✅ Added support for Transformers v4.53 and SmolLM3 in ONNX/ONNXRuntime.
✅ Solved batched inference/generation for all supported decoder model architectures (LLMs), as in the sketch below.
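
A hedged sketch of the export + batched generation path (the model id is an example, and padding setup may differ per tokenizer):

```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B"  # example id, not from the release notes
model = ORTModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for batched padding
tokenizer.padding_side = "left"  # decoder-only models pad on the left

batch = tokenizer(
    ["Hello!", "Batched generation works too."],
    return_tensors="pt",
    padding=True,
)
outputs = model.generate(**batch, max_new_tokens=20)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```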

✨ Big shoutout to @echarlaix for leading the refactoring work that cleanly separated ONNX exporter logic and enabled the creation of Optimum‑ONNX.

📝 Release Notes: https://lnkd.in/gXtE_qji
📦 Optimum: https://lnkd.in/ecAezNT6
🎁 Optimum-ONNX: https://lnkd.in/gzjyAjSi
#Optimum #ONNX #OpenSource #HuggingFace #Transformers #Diffusers
AtAndDev posted an update 7 months ago
Qwen 3 Coder is a personal attack on K2, and I love it.
It achieves near-SOTA on LCB while not having reasoning.
Finally people are understanding that reasoning isn't necessary for high benches...

Qwen ftw!

DECENTRALIZE DECENTRALIZE DECENTRALIZE
louisbrulenaudet posted an update 7 months ago
Because hackathons are often the starting point for AI projects, I've created a Python backend template, incorporating my lessons learned, to streamline collaboration and urgent deployments 🏎️

Within a year, I had the opportunity to participate in hackathons organized by Mistral, OpenAI, and DeepMind. This GitHub template is structured around several fundamental building blocks and recommendations I offer developers eager to participate in their first hackathon, whether as part of a team or individually. Its emphasis is on rapid setup and deployment through:
- uv as a package manager, simplifying usage via a series of pre-configured make commands.
- FastAPI for API management, structured in a modular architecture designed to minimize branch conflicts during merges to main (with minimal health-check and ping routes to verify that Docker runs properly and that the backend is reachable on the local network; see the sketch after this list).
- Pydantic for validation and type handling, which simplifies debugging and enhances understanding of data objects.
- A set of custom instructions tailored for agents (Cline and GitHub Copilot), aimed at improving overall comprehension of the application and optimizing the vibe-coding experience.
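
As mentioned above, here is a minimal sketch of the kind of health-check route the template relies on (the route name and response shape are assumptions; check the repo for the actual implementation):

```python
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
async def health() -> dict[str, str]:
    # Minimal liveness probe: confirms the container is running and the
    # backend is reachable on the local network.
    return {"status": "ok"}
```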

This template includes unit tests (all passing, with 100% coverage) as well as a minimal CI file ensuring that the FastAPI application runs correctly. Merging code that breaks the server into production thus becomes impossible ⛔️

In general, I would reiterate an essential piece of advice: your two main adversaries are branch conflicts (particularly when the same file is modified concurrently within a brief period, especially if your architecture isn't built for scalability) and deployment issues under time pressure ⏱️

Link to GitHub: https://github.com/louisbrulenaudet/hackathon-backend

Simply issue these commands and you can ship your code at the speed of light:
make init
make dev
louisbrulenaudet posted an update 8 months ago
🌐 Clinical Trials Dataset now available on Hugging Face! 🧬

I’ve just released a comprehensive, ML-ready dataset featuring 500,000+ clinical trial records sourced directly from ClinicalTrials.gov for biomedical NLP, healthcare analytics, and clinical research applications 🤗

I wanted to produce the most complete and up-to-date dump, with all raw data partially flattened to simplify extraction, self-querying, and processing.

Do you have any ideas about what we can do with it? Using descriptions to enhance specialized embedding models?

louisbrulenaudet/clinical-trials
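
A hedged loading sketch (the split name is an assumption; check the dataset card):

```python
from datasets import load_dataset

# Split name is an assumption; check the dataset card.
ds = load_dataset("louisbrulenaudet/clinical-trials", split="train")
print(ds.num_rows, "records")
print(ds[0])  # one partially flattened trial record
```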
AtAndDev posted an update 9 months ago
deepseek-ai/DeepSeek-R1-0528

This is the end
Aurelien-Morgan posted an update 9 months ago
mkluczek posted an update 10 months ago
Expansion of Global and Dense Open Embeddings Dataset of Earth 🌍

We updated our previous embeddings release with three new models (MMEarth, DeCUR-S2, and DeCUR-S1) for the Major TOM embeddings dataset, developed in collaboration with CloudFerro S.A., asterisk labs, and ESA's Φ-lab. Together with @mikonvergence and Jędrzej S. Bojanowski, we extend the open-access collection of Copernicus embeddings built at global scale, providing dense coverage across the entire acquisition area of the Sentinel-1 and Sentinel-2 sensors.

Total embedding resources after the update:
- 51 TB of AI-embeddings generated from processed Sentinel data,
- over 40 billion embedding vectors,
- processing of 147 TB of raw satellite data,
- analysis covering more than 15 million Sentinel-1 and Sentinel-2 scenes and more than 16 trillion pixels.

This project delivers open and free vectorized expansions of Major TOM datasets available on CREODIAS and Hugging Face, setting a new standard for embedding releases and enabling lightweight, scalable ingestion of Earth Observation (EO) data for countless applications.

Datasets (loading sketch below):
Major-TOM/Core-S2L2A-MMEarth
Major-TOM/Core-S2L1C-DeCUR
Major-TOM/Core-S1RTC-DeCUR
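
As noted above, a hedged sketch for sampling one of these collections; split names are assumptions, and streaming avoids downloading the multi-TB archives up front:

```python
from datasets import load_dataset

# Dataset id from the list above; the split name is an assumption.
ds = load_dataset("Major-TOM/Core-S2L2A-MMEarth", split="train", streaming=True)
print(next(iter(ds)).keys())
```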


#EarthObservation #AI #CloudFerro #asterisklabs #ESA
Aurelien-Morgan posted an update 10 months ago
The Almighty function-caller

How would you like to build smart GenAI infrastructure?
Give your edge agentic system an extensive tool memory,
and optimize the resources it takes to run a high-performance set of agents.

We came up with a novel approach to function-calling at scale for smart companies and corporate-grade use cases.

Read our full-fledged blog article here on Hugging Face:
https://huggingface.co/blog/Aurelien-Morgan/the-almighty-function-caller
Aurelien-Morgan posted an update 10 months ago
retrain-pipelines 0.1.2 finally dropped. It comes with a hot Hugging Face Hub integration. Go check it out. We have two articles about it coming up, one already fully written, so be on the lookout!
@retrain-pipelines

Also, I'll be volunteering at GOSIM AI Paris 2025. If you're interested in chatting, hmu.