Instructions for using unsloth/DeepScaleR-1.5B-Preview with libraries, inference providers, notebooks, and local apps. Follow the sections below to get started.
- Libraries
- Transformers
How to use unsloth/DeepScaleR-1.5B-Preview with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="unsloth/DeepScaleR-1.5B-Preview")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("unsloth/DeepScaleR-1.5B-Preview")
model = AutoModelForCausalLM.from_pretrained("unsloth/DeepScaleR-1.5B-Preview")

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use unsloth/DeepScaleR-1.5B-Preview with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "unsloth/DeepScaleR-1.5B-Preview"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "unsloth/DeepScaleR-1.5B-Preview",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

Use Docker
```bash
docker model run hf.co/unsloth/DeepScaleR-1.5B-Preview
```
- SGLang
How to use unsloth/DeepScaleR-1.5B-Preview with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "unsloth/DeepScaleR-1.5B-Preview" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "unsloth/DeepScaleR-1.5B-Preview",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

Use Docker images
```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "unsloth/DeepScaleR-1.5B-Preview" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "unsloth/DeepScaleR-1.5B-Preview",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

- Unsloth Studio
How to use unsloth/DeepScaleR-1.5B-Preview with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```bash
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for unsloth/DeepScaleR-1.5B-Preview to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for unsloth/DeepScaleR-1.5B-Preview to start chatting
```
Using HuggingFace Spaces for Unsloth
```
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for unsloth/DeepScaleR-1.5B-Preview to start chatting
```
Load model with FastModel
```bash
pip install unsloth
```

```python
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="unsloth/DeepScaleR-1.5B-Preview",
    max_seq_length=2048,
)
```

- Docker Model Runner
How to use unsloth/DeepScaleR-1.5B-Preview with Docker Model Runner:
```bash
docker model run hf.co/unsloth/DeepScaleR-1.5B-Preview
```
See our collection for versions of DeepSeek-R1 including GGUF & 4-bit formats.
Unsloth's DeepSeek-R1 1.58-bit + 2-bit Dynamic Quants are selectively quantized, greatly improving accuracy over standard 1-bit/2-bit quantization.
Finetune your own Reasoning model like R1 with Unsloth!
We have a free Google Colab notebook for turning Llama 3.1 (8B) into a reasoning model: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb
✨ Finetune for Free
All notebooks are beginner-friendly! Add your dataset, click "Run All", and you'll get a 2x faster finetuned model which can be exported to GGUF or vLLM, or uploaded to Hugging Face. (A rough code sketch of what these notebooks set up appears after the table below.)
| Unsloth supports | Free Notebooks | Performance | Memory use |
|---|---|---|---|
| GRPO with Phi-4 (14B) | ▶️ Start on Colab | 2x faster | 80% less |
| Llama-3.2 (3B) | ▶️ Start on Colab | 2.4x faster | 58% less |
| Llama-3.2 (11B vision) | ▶️ Start on Colab | 2x faster | 60% less |
| Qwen2 VL (7B) | ▶️ Start on Colab | 1.8x faster | 60% less |
| Qwen2.5 (7B) | ▶️ Start on Colab | 2x faster | 60% less |
| Llama-3.1 (8B) | ▶️ Start on Colab | 2.4x faster | 58% less |
| Phi-3.5 (mini) | ▶️ Start on Colab | 2x faster | 50% less |
| Gemma 2 (9B) | ▶️ Start on Colab | 2.4x faster | 58% less |
| Mistral (7B) | ▶️ Start on Colab | 2.2x faster | 62% less |
- This Llama 3.2 conversational notebook is useful for ShareGPT ChatML / Vicuna templates.
- This text completion notebook is for raw text. This DPO notebook replicates Zephyr.
- Kaggle has 2x T4s, but we use 1; due to overhead, 1x T4 is 5x faster.
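As a rough sketch of what these notebooks set up (hedged: API names follow current Unsloth/TRL examples, the hyperparameters are illustrative only, and `your_dataset` is a placeholder for any dataset with a `"text"` column):

```python
from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepScaleR-1.5B-Preview",
    max_seq_length=2048,
    load_in_4bit=True,  # fits on a free Colab T4
)
# Attach LoRA adapters so only a small fraction of weights are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=your_dataset,  # placeholder: your own HF dataset
    args=SFTConfig(per_device_train_batch_size=2, max_steps=60,
                   learning_rate=2e-4, output_dir="outputs"),
)
trainer.train()
model.save_pretrained_gguf("model", tokenizer)  # export to GGUF
```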
DeepScaleR Overview
DeepScaleR-1.5B-Preview is a language model fine-tuned from DeepSeek-R1-Distill-Qwen-1.5B using distributed reinforcement learning (RL) to scale up to long context lengths. The model achieves 43.1% Pass@1 accuracy on AIME 2024, a roughly 15-point absolute improvement over the base model (28.8%), surpassing OpenAI's O1-Preview performance with just 1.5B parameters.
Data
Our training dataset consists of approximately 40,000 unique problem-answer pairs compiled from:
- AIME problems (1984-2023)
- AMC problems (prior to 2023)
- Omni-MATH dataset
- Still dataset
Training Recipe
We employ DeepSeek's Group Relative Policy Optimization (GRPO), a simplified RL algorithm that extends PPO by:
- Normalizing the advantage function over all samples generated from the same prompt.
- Applying KL-divergence regularization on top of PPO's surrogate loss to prevent significant policy drift.
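In symbols (the standard GRPO form from the DeepSeek-Math paper; this simplified notation, with per-token indices omitted, is ours): for a group of $G$ responses sampled from the same prompt with rewards $r_1, \dots, r_G$, each response's advantage is normalized within the group,

$$\hat{A}_i = \frac{r_i - \operatorname{mean}(r_1, \dots, r_G)}{\operatorname{std}(r_1, \dots, r_G)},$$

and the objective adds a KL penalty against a reference policy to PPO's clipped surrogate,

$$\mathcal{J}(\theta) = \mathbb{E}\left[\min\left(\rho_i \hat{A}_i,\ \operatorname{clip}(\rho_i, 1-\varepsilon, 1+\varepsilon)\,\hat{A}_i\right)\right] - \beta\, D_{\mathrm{KL}}\!\left(\pi_\theta \,\|\, \pi_{\mathrm{ref}}\right),$$

where $\rho_i$ is the ratio of current to old policy probability for response $i$.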
Reward Function: Our reward function is simple but effective:
- 1 for correct answers that pass LaTeX/SymPy answer checks
- 0 for incorrect or improperly formatted answers
- Note: no partial credit or intermediate feedback (e.g., from process reward models, PRMs)
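A minimal sketch of this check in Python (our illustration, not the authors' code; the actual checker lives in their Verl fork, and the `\boxed{}` extractor here is deliberately simplistic):

```python
import re

from sympy import simplify
from sympy.parsing.latex import parse_latex  # needs antlr4-python3-runtime

def extract_boxed_answer(text: str) -> str | None:
    """Grab the contents of the last \\boxed{...} in the model output."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else None

def reward(model_output: str, reference_latex: str) -> float:
    answer = extract_boxed_answer(model_output)
    if answer is None:
        return 0.0  # improperly formatted -> no reward
    try:
        # Correct iff both expressions simplify to the same value.
        diff = simplify(parse_latex(answer) - parse_latex(reference_latex))
        return 1.0 if diff == 0 else 0.0
    except Exception:
        return 0.0  # unparseable -> treated as incorrect
```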
Iterative Context Lengthening: A key challenge in scaling RL for reasoning is compute cost. Our approach trains the model with progressively longer contexts as it improves, reducing both monetary cost and end-to-end training time:
- Initial 8K context (steps 0-1040):
  - 22.9% -> 33% Pass@1 on AIME 2024
  - Trained on 8 A100-80GB GPUs; batch size = (prompts) × (samples/prompt) = 128 × 8 = 1024
- Extended to 16K (steps 1040-1520):
  - 33% -> 38% Pass@1 on AIME 2024
  - Trained on 32 A100-80GB GPUs; batch size = (prompts) × (samples/prompt) = 128 × 16 = 2048
- Further extended to 24K (step 1520+):
  - 38% -> 43% Pass@1 on AIME 2024
  - Trained on 32 A100-80GB GPUs; batch size = (prompts) × (samples/prompt) = 128 × 16 = 2048
  - Significant improvements within <200 steps
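The schedule above can be summarized as a simple lookup (a hypothetical sketch; the field names and structure are ours, not the authors' actual training config):

```python
# (start_step, end_step, context_len, gpus, prompts, samples_per_prompt)
CONTEXT_SCHEDULE = [
    (0,    1040,  8_192,  8, 128,  8),   # batch = 128 * 8  = 1024
    (1040, 1520, 16_384, 32, 128, 16),   # batch = 128 * 16 = 2048
    (1520, None, 24_576, 32, 128, 16),   # batch = 128 * 16 = 2048
]

def context_len_for(step: int) -> int:
    """Return the max context length in effect at a given training step."""
    for start, end, ctx, *_ in CONTEXT_SCHEDULE:
        if step >= start and (end is None or step < end):
            return ctx
    raise ValueError(f"step {step} not covered by schedule")
```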
A more detailed description of the training recipe can be found in our blog post.
Evaluation
We report Pass@1 accuracy averaged over 16 samples for each problem.
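Concretely, the metric is the per-problem fraction of correct samples, averaged over all problems; a minimal sketch of the computation (our illustration, not the authors' evaluation harness):

```python
def pass_at_1(correct: list[list[bool]]) -> float:
    """correct[i][j]: whether sample j of problem i is right (16 samples here)."""
    per_problem = [sum(samples) / len(samples) for samples in correct]
    return 100 * sum(per_problem) / len(per_problem)

# Toy example with 2 problems and 4 samples each:
print(pass_at_1([[True, True, True, False], [False, True, False, False]]))  # 50.0
```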
| Model | AIME 2024 | MATH 500 | AMC 2023 | Minerva Math | OlympiadBench | Avg. |
|---|---|---|---|---|---|---|
| Qwen2.5-7B-Instruct | 13.3 | 79.8 | 50.6 | 34.6 | 40.7 | 43.8 |
| rStar-Math-7B | 26.7 | 78.4 | 47.5 | - | 47.1 | - |
| Eurus-2-7B-PRIME | 26.7 | 79.2 | 57.8 | 38.6 | 42.1 | 48.9 |
| Qwen2.5-7B-SimpleRL | 26.7 | 82.4 | 62.5 | 39.7 | 43.3 | 50.9 |
| DeepSeek-R1-Distill-Qwen-1.5B | 28.8 | 82.8 | 62.9 | 26.5 | 43.3 | 48.9 |
| Still-1.5B | 32.5 | 84.4 | 66.7 | 29.0 | 45.4 | 51.6 |
| DeepScaleR-1.5B-Preview | 43.1 | 87.8 | 73.6 | 30.2 | 50.0 | 57.0 |
| O1-Preview | 40.0 | 81.4 | - | - | - | - |
Serving DeepScaleR
Our model can be served using popular high-performance inference systems:
- vLLM
- Hugging Face Text Generation Inference (TGI)
- SGLang
- TensorRT-LLM
All these systems support the OpenAI Chat Completions API format.
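Because they share that API, one Python client works against any of them. A minimal example using the official `openai` package (the base URL assumes the vLLM server started in the section above; swap the port for SGLang or TGI):

```python
from openai import OpenAI

# Point the client at the local server; the API key is unused but required.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="unsloth/DeepScaleR-1.5B-Preview",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```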
License
This project is released under the MIT License, reflecting our commitment to open and accessible AI development. We believe in democratizing AI technology by making our work freely available for anyone to use, modify, and build upon. This permissive license ensures that researchers, developers, and enthusiasts worldwide can leverage and extend our work without restrictions, fostering innovation and collaboration in the AI community.
Acknowledgement
- Our training experiments are powered by our heavily modified fork of Verl, an open-source RLHF library.
- Our model is trained on top of DeepSeek-R1-Distill-Qwen-1.5B.
- Our work is done as part of the Berkeley Sky Computing Lab and Berkeley AI Research.
Citation
```bibtex
@misc{deepscaler2025,
  title={DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL},
  author={Michael Luo and Sijun Tan and Justin Wong and Xiaoxiang Shi and William Tang and Manan Roongta and Colin Cai and Jeffrey Luo and Tianjun Zhang and Erran Li and Raluca Ada Popa and Ion Stoica},
  year={2025},
  howpublished={\url{https://pretty-radio-b75.notion.site/DeepScaleR-Surpassing-O1-Preview-with-a-1-5B-Model-by-Scaling-RL-19681902c1468005bed8ca303013a4e2}},
  note={Notion Blog}
}
```