# Model Card for ealexeev/TheDrummer-Snowpiercer-15B-v4-NVFP4
This is an NVFP4 quantization of TheDrummer/Snowpiercer-15B-v4.
## Quantization Details
Quantized with the script from https://github.com/ealexeev/llm-quantization.

Calibration dataset size: 256 samples.

Calibration data:
- HuggingFaceH4/ultrachat_200k
- allenai/c4_en
- mrcedric98/fiction_books_v8
These were shuffled and mixed at a 3:2:3 ratio.
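
For illustration, here is a minimal sketch of one way to build such a mixture with the `datasets` library. The split names, text fields, and shuffling details are assumptions for the example, not the actual logic of the quantization script.

```python
# Hypothetical sketch of the 3:2:3 calibration mix (96/64/96 of 256 samples);
# split names and text fields are assumptions, not the script's actual logic.
import random
from itertools import islice

from datasets import load_dataset

NUM_SAMPLES = 256                                         # --size 256
SEED = 42                                                 # --seed 42
WEIGHTS = {"ultra_chat": 3, "c4_en": 2, "fiction_v8": 3}  # 3:2:3

total = sum(WEIGHTS.values())
counts = {k: NUM_SAMPLES * w // total for k, w in WEIGHTS.items()}

ultrachat = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")
fiction = load_dataset("mrcedric98/fiction_books_v8", split="train")
c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)  # C4 is huge; stream it

texts = []
# ultrachat stores conversations; flatten each one into a single string
for ex in ultrachat.shuffle(seed=SEED).select(range(counts["ultra_chat"])):
    texts.append("\n".join(turn["content"] for turn in ex["messages"]))
# take C4 samples from a shuffled streaming buffer
for ex in islice(c4.shuffle(seed=SEED, buffer_size=10_000), counts["c4_en"]):
    texts.append(ex["text"])
for ex in fiction.shuffle(seed=SEED).select(range(counts["fiction_v8"])):
    texts.append(ex["text"])

random.seed(SEED)
random.shuffle(texts)  # final shuffle of the mixed calibration set
```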
### Procedure
```bash
python ./quantize_nvfp4.py \
  --model TheDrummer/Snowpiercer-15B-v4 \
  --output ./TheDrummer/Snowpiercer-15B-v4-NVFP4 \
  --size 256 --seed 42 \
  --ultra_chat 3 --c4_en 2 --fiction_v8 3
```
The vLLM docs note that NVFP4 quantization needs very few calibration samples. I ran quants with 32, 64, 128, 256, and 512 samples, and this 256-sample version hit the sweet spot on the evals below.
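
The linked script is the source of truth for the procedure. For readers who want a self-contained starting point, below is a minimal sketch of NVFP4 post-training quantization with `llm-compressor`; the choice of tooling, the sequence length, and the `lm_head` exclusion are assumptions here, not a description of what `quantize_nvfp4.py` actually does.

```python
# Minimal NVFP4 PTQ sketch using llm-compressor. The tooling choice, sequence
# length, and lm_head exclusion are assumptions; quantize_nvfp4.py is authoritative.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "TheDrummer/Snowpiercer-15B-v4"
SAVE_DIR = "./TheDrummer/Snowpiercer-15B-v4-NVFP4"
MAX_SEQ_LEN = 2048  # assumption; the card does not state this

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# `texts` is the 256-sample 3:2:3 mix from the sketch above
calib = Dataset.from_dict({"text": texts})
calib = calib.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=MAX_SEQ_LEN),
    remove_columns=calib.column_names,
)

# Quantize every Linear layer to NVFP4, keeping the output head in high precision
recipe = QuantizationModifier(targets="Linear", scheme="NVFP4", ignore=["lm_head"])

oneshot(
    model=model,
    dataset=calib,
    recipe=recipe,
    max_seq_length=MAX_SEQ_LEN,
    num_calibration_samples=256,
)

model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```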
## Quantization Evals
| Metric | Base Model (BF16) | NVFP4 (Quantized) | Delta |
|---|---|---|---|
| ARC Challenge (Logic/Reasoning) | 58.19% | 58.28% | +0.09 pp |
| IFEval (Strict Instruction Following) | 36.04% | 38.45% | +2.41 pp |
| HellaSwag (Flow/Common Sense) | 81.39% | 80.65% | -0.74 pp |
| Winogrande (Ambiguity Resolution) | 73.95% | 72.93% | -1.02 pp |
| LAMBADA (Perplexity; lower is better) | 3.42 | 3.65 | +0.23 |
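
The card does not state which evaluation harness produced these numbers. As a hedged sketch, metrics like these can be measured with EleutherAI's lm-evaluation-harness; the task names and settings below are assumptions:

```python
# Hypothetical reproduction with lm-evaluation-harness; the harness, task
# variants (e.g. lambada_openai), and settings are assumptions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",
    model_args=(
        "pretrained=ealexeev/TheDrummer-Snowpiercer-15B-v4-NVFP4,"
        "tensor_parallel_size=1,gpu_memory_utilization=0.8"
    ),
    tasks=["arc_challenge", "ifeval", "hellaswag", "winogrande", "lambada_openai"],
)
for task, metrics in results["results"].items():
    print(task, metrics)
```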
## Bias, Risks, and Limitations
The base model is a creative fine-tune, and it was quantized with that use case in mind. It probably won't pass any LeetCode-style coding challenges.
## How To Use

Serve with vLLM:

```bash
# --tensor-parallel-size 1     -> single GPU
# --gpu-memory-utilization 0.8 -> otherwise vLLM reserves nearly all VRAM for the KV cache
vllm serve ealexeev/TheDrummer-Snowpiercer-15B-v4-NVFP4 \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.8
```
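
The server exposes an OpenAI-compatible API. A minimal client sketch (the port is vLLM's default; the prompt and sampling settings are purely illustrative):

```python
# Query the vLLM OpenAI-compatible server started above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # vLLM default port

resp = client.chat.completions.create(
    model="ealexeev/TheDrummer-Snowpiercer-15B-v4-NVFP4",
    messages=[
        {"role": "user", "content": "Write the opening paragraph of a frozen-train survival story."}
    ],
    temperature=0.8,  # illustrative sampling settings
    max_tokens=256,
)
print(resp.choices[0].message.content)
```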