# Model Overview

## Model Summary
SmolLM3 is a state-of-the-art, 3-billion-parameter open multilingual language model developed by Hugging Face, designed to deliver enterprise-grade capabilities in a lightweight, edge-optimized package. It pushes the efficiency frontier by outperforming comparable 3B models and rivaling larger 4B architectures in reasoning, coding, and multilingual tasks.
## Core Architecture & Innovations
Built on a highly optimized decoder-only transformer architecture, SmolLM3 incorporates several advanced design choices to maximize performance-per-parameter:
- Grouped Query Attention (GQA): Replaces standard multi-head attention with 4 key/value heads shared across 16 query heads. This significantly reduces the KV cache size, lowering memory-bandwidth requirements and speeding up inference, particularly for long sequences.
- No Positional Encoding (NoPE): Removes rotary position embeddings in specific layers to better generalize across long contexts, without the perplexity degradation often seen in standard RoPE implementations.
- Long-Context Processing: Natively trained with a 64k-token context window, it uses YaRN (Yet another RoPE extensioN) to extrapolate effectively up to 128k tokens. This allows it to process entire books, large codebases, and extensive retrieval documents in a single pass.
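To make the GQA saving concrete, the sketch below compares KV-cache size for full multi-head attention (16 K/V heads) against SmolLM3's GQA (4 K/V heads) at the 64k native context. The layer count (36) and head counts come from this card; the head dimension of 128 and 2-byte (bf16) storage are assumptions for illustration:

```python
def kv_cache_bytes(num_kv_heads, head_dim, num_layers, seq_len, dtype_bytes=2):
    """Bytes needed to cache K and V tensors for one sequence.

    Factor of 2 covers the separate K and V caches; dtype_bytes=2
    assumes bf16/fp16 storage.
    """
    return 2 * num_layers * seq_len * num_kv_heads * head_dim * dtype_bytes

# Hypothetical full MHA: all 16 heads carry their own K/V.
mha = kv_cache_bytes(num_kv_heads=16, head_dim=128, num_layers=36, seq_len=65536)
# SmolLM3's GQA: only 4 K/V heads are cached and shared by the 16 query heads.
gqa = kv_cache_bytes(num_kv_heads=4, head_dim=128, num_layers=36, seq_len=65536)

print(f"MHA: {mha / 2**30:.1f} GiB, GQA: {gqa / 2**30:.1f} GiB")  # → MHA: 18.0 GiB, GQA: 4.5 GiB
```

Under these assumptions the cache shrinks 4x (the 16:4 head ratio), which is the main lever behind the long-sequence inference speedups described above.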
## Dual-Mode Reasoning & Agentic Capabilities
A standout feature of the instruction-tuned variant is its Dual-Mode Reasoning, giving developers flexibility based on task complexity:
- Non-Thinking Mode: Optimized for speed and direct answers, suitable for chatbots and simple queries.
- Deep-Thinking Mode: Activates a chain-of-thought process for complex logic, math, and multi-step problem solving, allowing the model to reason before responding.
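In practice, mode selection is typically done through the chat template rather than a separate checkpoint. The sketch below is illustrative only: the `/think` and `/no_think` system-prompt flags follow SmolLM3's released chat-template convention, but the exact special tokens shown are placeholders, not the model's verbatim template:

```python
def build_prompt(user_msg: str, thinking: bool = False) -> str:
    # Toggle reasoning via a system-prompt flag; token markup below is a
    # simplified stand-in for the real chat template.
    flag = "/think" if thinking else "/no_think"
    system = f"You are a helpful assistant. {flag}"
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user_msg}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

fast = build_prompt("What is the capital of France?")            # direct answer
deep = build_prompt("Prove there are infinitely many primes.", thinking=True)
```

With this pattern an application can route cheap queries through non-thinking mode and reserve the slower chain-of-thought mode for hard problems.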
The model is also fine-tuned for robust Tool Calling, supporting both standard XML-based tool definitions and Pythonic function calls. This makes it an ideal backbone for lightweight autonomous agents that need to interact with external APIs or environments reliably.
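A minimal agent loop around tool calling just needs to parse the model's emitted call and dispatch it to a registered function. The sketch below assumes a JSON call format with `name` and `arguments` keys; the tool registry and `get_weather` function are hypothetical, and real SmolLM3 output formats (XML or Pythonic) would need their own parsers:

```python
import json

# Hypothetical tool registry; in a real agent these would wrap actual APIs.
TOOLS = {
    "get_weather": lambda city: {"city": city, "temp_c": 21},
}

def dispatch(tool_call: str):
    """Parse a JSON tool call emitted by the model and execute it."""
    call = json.loads(tool_call)
    return TOOLS[call["name"]](**call["arguments"])

result = dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}')
# result == {"city": "Paris", "temp_c": 21}
```

The tool result would then be fed back into the conversation so the model can compose its final answer.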
## Training & Multilingual Proficiency
SmolLM3 was trained on a massive 11.2-trillion-token corpus, using a rigorous three-stage curriculum that progressed from general web data to high-quality math, code, and synthetic reasoning datasets. It features native proficiency in six languages: English, French, Spanish, German, Italian, and Portuguese, ensuring consistent performance across diverse linguistic tasks.
## Use Cases & Deployment
With its compact size and permissive Apache 2.0 license, SmolLM3 is uniquely positioned for:
- Edge AI: Running entirely on-device (mobile phones, laptops) with low latency and high privacy.
- RAG Systems: Acting as a powerful reasoning engine for Retrieval-Augmented Generation without the cost of calling massive API-based models.
- Coding Assistants: Providing low-latency code completion and debugging support locally.
For more details, please refer to the SmolLM3 GitHub repository.

Weights and Keras model code are both released under the Apache 2.0 License.
## Links
- SmolLM3 Quickstart Notebook (coming soon)
- SmolLM3 API Documentation
- SmolLM3 Model Card
- KerasHub Beginner Guide
- KerasHub Model Publishing Guide
## Installation
Keras and KerasHub can be installed with:
```shell
pip install -U -q keras-hub
pip install -U -q keras
```
JAX, TensorFlow, and Torch come preinstalled in Kaggle Notebooks. For instructions on installing them in another environment, see the Keras Getting Started page.
## Available SmolLM3 Presets
The following model checkpoints are provided by the Keras team. Full code examples for each are available below.
| Preset | Parameters | Description |
|---|---|---|
| `smollm3_3b_en` | 3B | 3 billion total parameters; 36 transformer layers with 16 query and 4 key/value attention heads. |