(Dataset based on Pinkstack/syngen-reasoning-0.6b-dataset)

This is a slightly refined version of qingy2024/SynGen-14B with DPO training on qingy2024/SynGen-Antiloop-DPO. This should reduce repetitions and improve quality of generated reasoning traces. See the original model card for a description of what it can do.

Notes:

Everything (training configs, datasets, model weights) is open source.
This model is specifically optimized for R1's reasoning style but GPT-OSS may still work fine (I haven't tested yet).
It's not guaranteed that the model generates perfect CoT every time, but it should not be too hard of a task given that it knows the final answer already.
For sampler settings: temp = 0.7, top_p = 0.95, pretty much default works.

Prompt Format

System Message

<reasoning_style>deepseek_r1</reasoning_style> # Can replace deepseek_r1 with gpt_oss
<system_prompt>Original System Prompt</system_prompt>

Prompt Message

<user>User Message Here</user>
<assistant>Assistant Final Response Here (without reasoning)</assistant>

Output Format

<think>Generated Reasoning</think>

Training Details

Base Model: qingy2024/SynGen-14B
Training Epochs: 1
Learning Rate: 2e-6
Batch Size: 64
Training Method: 16-bit LoRA (rank 64, alpha 128)
Training Hardware: H200
Training Platform: Modal
Total Cost: $20.43 USD
Seed: 42
As of January 1, 2026, this is the biggest model ever trained for reasoning generation!