---
language:
- en
license: apache-2.0
tags:
- math
- reasoning
- agent
- qwen
- grpo
- reinforcement-learning
base_model: Qwen/Qwen3-4B-Thinking-2507
datasets:
- nvidia/OpenMathReasoning
metrics:
- accuracy
library_name: transformers
pipeline_tag: text-generation
---
# DeepMath: A Lightweight Math Reasoning Agent
<img src="https://huggingface.co/proxy/cdn-uploads.huggingface.co/production/uploads/62d93cd728f9c86a4031562e/ndb_WmPavW1MONAjsGpYT.jpeg" style="width:600px" alt="An LLM is using a calculator to answer questions." />
## Model Description
**DeepMath** is a 4B parameter mathematical reasoning model that combines a fine-tuned LLM with a sandboxed Python executor. Built on [Qwen3-4B Thinking](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507) and trained with **GRPO (Group Relative Policy Optimization)**, DeepMath generates concise Python snippets for computational steps instead of verbose text explanations, significantly reducing errors and output length.
- **Developed by:** Intel AI Labs
- **Model type:** Causal language model with agent capabilities
- **Language:** English
- **Base model:** [Qwen3-4B Thinking](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507)
- **License:** Apache 2.0
- **Blog:** 📝 <https://huggingface.co/blog/intel-deepmath>
- **Repository:** 💻 [https://github.com/IntelLabs/DeepMath](https://github.com/IntelLabs/DeepMath)
## Key Features
✅ **Code-driven reasoning:** Generates short Python snippets for intermediate computational steps

✅ **Sandboxed execution:** No file I/O, no network calls, strict timeouts

✅ **Improved accuracy:** Offloading computation reduces arithmetic errors

✅ **Reduced verbosity:** Up to 66% shorter outputs compared to the baseline

✅ **Safe and auditable:** Deterministic execution with readable code snippets
## Model Architecture
DeepMath uses a LoRA adapter fine-tuned on top of Qwen3-4B Thinking with the following components:
- **Agent Interface:** Outputs special tokens for Python code execution during reasoning
- **Executor:** Sandboxed Python environment with allow-listed modules (a minimal sketch follows this list)
- **Safety Constraints:** Per-snippet timeouts, no file/network access
- **Training Method:** GRPO with accuracy and code generation rewards
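To make the executor's contract concrete, here is a minimal sketch of a guarded snippet runner. This is not the DeepMath implementation: the allow-list contents, timeout value, and helper names are assumptions for illustration; the actual executor lives in the repository.

```python
import ast
import subprocess
import sys

# Assumed allow-list; the real DeepMath list is defined in the repository.
ALLOWED_MODULES = {"math", "fractions", "itertools", "sympy"}

def check_imports(snippet: str) -> None:
    """Reject snippets that import modules outside the allow-list."""
    for node in ast.walk(ast.parse(snippet)):
        if isinstance(node, ast.Import):
            names = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [(node.module or "").split(".")[0]]
        else:
            continue
        for name in names:
            if name not in ALLOWED_MODULES:
                raise ValueError(f"module {name!r} is not allow-listed")

def run_snippet(snippet: str, timeout_s: float = 5.0) -> str:
    """Run a snippet in an isolated interpreter with a per-snippet timeout."""
    check_imports(snippet)
    result = subprocess.run(
        [sys.executable, "-I", "-c", snippet],  # -I: isolated interpreter mode
        capture_output=True, text=True, timeout=timeout_s,
    )
    return result.stdout

print(run_snippet("import math\nprint(math.comb(100, 2))"))  # 4950
```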
<figure>
<img src="https://huggingface.co/proxy/cdn-uploads.huggingface.co/production/uploads/62d93cd728f9c86a4031562e/zOcvJ2DY61QZyozarsKbT.png" style="width:400px" alt="Changes to vLLM client and server in TRL library." />
<figcaption><p><em>Figure 1: The TRL vLLM client and server were modified so that candidate completions are generated by the DeepMath agent while still running on the vLLM backend.</em></p></figcaption>
</figure>
## Training Details
### Training Data
- **Dataset:** [OpenMathReasoning](https://huggingface.co/datasets/nvidia/OpenMathReasoning) (tool-usage subset)
- **Note:** GRPO training uses only the problems, not the solutions
- **In-context Learning:** 4 solved examples demonstrating agent call syntax and patterns
### Training Procedure
**GRPO (Group Relative Policy Optimization)** fine-tuning with the following components; a hedged code sketch of the rewards and temperature schedule follows the list:
- **Accuracy Reward:** +1 for correct answers
- **Code Generation Reward:** +1 for using code snippets (weighted 10:1 vs. accuracy)
- **Length Constraint:** GRPO completions limited to 5k tokens
- **Temperature Scheduling:** Linear schedule from T=1.2 → T=0.7 during training
- **Infrastructure:** Modified TRL library's vLLM client and server
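A minimal sketch of the reward combination and temperature schedule. The function names are hypothetical, and the direction of the 10:1 weighting follows the card's wording literally; the exact values used in training are defined in the repository.

```python
def deepmath_reward(correct: bool, used_code: bool,
                    w_accuracy: float = 1.0, w_code: float = 10.0) -> float:
    """Weighted sum of the accuracy reward and the code-generation reward.

    Defaults read the card's "weighted 10:1 vs. accuracy" literally;
    swap the weights if the ratio is meant the other way around.
    """
    return w_accuracy * float(correct) + w_code * float(used_code)

def sampling_temperature(step: int, total_steps: int,
                         t_start: float = 1.2, t_end: float = 0.7) -> float:
    """Linear temperature schedule from T=1.2 down to T=0.7 over training."""
    frac = step / max(total_steps - 1, 1)
    return t_start + (t_end - t_start) * frac
```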
### Training Infrastructure
- Base inference engine: [vLLM](https://github.com/vllm-project/vllm)
- Agent framework: Based on [SmolAgents](https://github.com/huggingface/smolagents/)
- Training framework: Modified [TRL](https://github.com/huggingface/trl) GRPO trainer
## Performance
### Benchmark Results
We evaluated DeepMath on four mathematical reasoning datasets using **majority@16** (majority vote over 16 sampled answers; a short sketch follows the results figure) and mean output length:
<img src="https://huggingface.co/proxy/cdn-uploads.huggingface.co/production/uploads/62d93cd728f9c86a4031562e/mBuINzNvjDKdZEuIqzJeO.png" style="width:800px" alt="Main results table showing performance across MATH500, AIME, HMMT, and HLE datasets."/>
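For reference, majority@16 takes the most common final answer across 16 sampled completions. A minimal sketch, with made-up votes:

```python
from collections import Counter

def majority_at_k(final_answers: list[str], k: int = 16) -> str:
    """Most common answer among the first k sampled completions."""
    answer, _ = Counter(final_answers[:k]).most_common(1)[0]
    return answer

# Hypothetical final answers extracted from 16 sampled reasoning traces
samples = ["5050"] * 11 + ["5049"] * 3 + ["5051"] * 2
print(majority_at_k(samples))  # -> "5050"
```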
**Key Findings:**
- **Accuracy:** Improved performance on challenging datasets (AIME, HMMT, HLE)
- **Efficiency:** Up to **66% reduction** in output length
- **Robustness:** Consistent improvements when combining agent + GRPO training
### Evaluation Datasets
- **MATH500:** A 500-problem subset of the MATH dataset
- **AIME:** American Invitational Mathematics Examination problems
- **HMMT:** Harvard-MIT Mathematics Tournament problems
- **HLE:** Humanity's Last Exam problems
<figure>
<img src="https://huggingface.co/proxy/cdn-uploads.huggingface.co/production/uploads/62d93cd728f9c86a4031562e/a-kn3oHdlxTP_L-63N9LX.png" style="width:700px" alt="Output example showing Python code generation and execution." />
<figcaption><p><em>Figure 2: Example output where Python code is generated, evaluated, and the result is inserted into the reasoning trace.</em></p></figcaption>
</figure>
## Usage
### Installation
```bash
# Install uv package manager
curl -LsSf https://astral.sh/uv/install.sh | sh
# Clone repository
git clone https://github.com/IntelLabs/DeepMath.git
cd DeepMath
# Install dependencies
uv pip install -r requirements.txt
uv pip install -e .
```
### Basic Inference
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Intel/deepmath-v1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

# Example problem, formatted with the model's chat template
problem = "What is the sum of the first 100 positive integers?"
messages = [{"role": "user", "content": problem}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=3000)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
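Note that Qwen3 "Thinking" checkpoints emit a reasoning trace terminated by a `</think>` tag before the final answer, so downstream code typically strips everything up to that tag when extracting the result.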
### Inference with Agent
For full agent capabilities with sandboxed Python execution:
```bash
python inference.py \
+model.use_vllm=true \
+model.math_agent=true \
+model.examples=deep_math/fewshot.txt \
model.generation.max_new_tokens=3000 \
+model.max_agent_output=20000 \
+model.max_steps=50 \
model.model_name_or_path=Intel/deepmath-v1 \
hf_tag=HuggingFaceH4/MATH-500 \
generated_file=output.jsonl
```
See the [repository](https://github.com/IntelLabs/DeepMath) for complete usage examples.
## Limitations and Biases
### Limitations
- **Scope:** Optimized for mathematical reasoning tasks; may not generalize to other domains
- **Problem Types:** Evaluated on contest-style math problems; performance on open-ended mathematical creativity or formal proofs is unknown
- **Model Size:** 4B parameters may limit reasoning depth on extremely complex problems
- **Code Execution:** Requires sandboxed environment for full agent capabilities
### Safety Considerations
⚠️ **Code Execution Risk:** This model generates and executes Python code. While DeepMath uses strict sandboxing and resource limits, any deployment should (one isolation pattern is sketched after this list):
- Carefully manage attack surfaces
- Enforce rate limits
- Use proper isolation (containers, VMs)
- Monitor resource usage
- Validate generated code before execution in production
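As one illustration of the isolation point above (not the DeepMath sandbox itself; the function name and limit values are assumptions, and the `resource` module is POSIX-only), a child process can be given hard CPU-time and memory limits:

```python
import resource
import subprocess
import sys

def run_limited(snippet: str, cpu_seconds: int = 5,
                mem_bytes: int = 256 * 2**20) -> str:
    """Run a snippet with hard CPU-time and address-space limits (POSIX only)."""
    def set_limits() -> None:
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))

    result = subprocess.run(
        [sys.executable, "-I", "-c", snippet],
        capture_output=True, text=True,
        preexec_fn=set_limits,    # applied in the child before exec
        timeout=cpu_seconds + 1,  # wall-clock backstop
    )
    return result.stdout
```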
### Ethical Considerations
- The model is trained on mathematical problem-solving datasets and should not be used for decision-making in critical applications without human oversight
- Generated code should be reviewed before execution in production environments
- The model may reflect biases present in the training data
## Citation
If you use DeepMath in your research, please cite:
```bibtex
@software{deepmath2025,
author = {Fleischer, Daniel and Berchansky, Moshe and Wasserblat, Moshe},
title = {DeepMath: A Lightweight Math Reasoning Agent for LLMs},
year = {2025},
publisher = {Intel AI Labs},
url = {https://github.com/IntelLabs/DeepMath}
}
```
## Model Card Contact
For questions or issues, please open an issue on the [GitHub repository](https://github.com/IntelLabs/DeepMath).