(Dataset based on Pinkstack/syngen-reasoning-0.6b-dataset)
This is a slightly refined version of qingy2024/SynGen-14B with DPO training on qingy2024/SynGen-Antiloop-DPO. This should reduce repetitions and improve quality of generated reasoning traces. See the original model card for a description of what it can do.
Notes:
- Everything (training configs, datasets, model weights) is open source.
- This model is specifically optimized for R1's reasoning style but GPT-OSS may still work fine (I haven't tested yet).
- It's not guaranteed that the model generates perfect CoT every time, but it should not be too hard of a task given that it knows the final answer already.
- For sampler settings:
temp = 0.7, top_p = 0.95, pretty much default works.
Prompt Format
System Message
<reasoning_style>deepseek_r1</reasoning_style> # Can replace deepseek_r1 with gpt_oss
<system_prompt>Original System Prompt</system_prompt>
Prompt Message
<user>User Message Here</user>
<assistant>Assistant Final Response Here (without reasoning)</assistant>
Output Format
<think>Generated Reasoning</think>
Training Details
- Base Model: qingy2024/SynGen-14B
- Training Epochs: 1
- Learning Rate: 2e-6
- Batch Size: 64
- Training Method: 16-bit LoRA (rank 64, alpha 128)
- Training Hardware: H200
- Training Platform: Modal
- Total Cost: $20.43 USD
- Seed: 42
- As of January 1, 2026, this is the biggest model ever trained for reasoning generation!
