# Model Card for ealexeev/TheDrummer-Snowpiercer-15B-v4-NVFP4
This is an NVFP4 quantization of TheDrummer/Snowpiercer-15B-v4.
## Quantization Details
Quantized with the script from https://github.com/ealexeev/llm-quantization.

Calibration dataset size: 256 samples.

Calibration data:
- HuggingFaceH4/ultrachat_200k
- allenai/c4_en
- mrcedric98/fiction_books_v8
These were shuffled and mixed at a 3:2:3 ratio.
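
For illustration, here is a minimal sketch of one way to build such a mixture with the `datasets` library. The split names, text fields, and shuffling details are assumptions for the example, not the actual logic of the quantization script.

```python
# Hypothetical sketch of the 3:2:3 calibration mix (96/64/96 of 256 samples);
# split names and text fields are assumptions, not the script's actual logic.
import random
from itertools import islice

from datasets import load_dataset

NUM_SAMPLES = 256                                         # --size 256
SEED = 42                                                 # --seed 42
WEIGHTS = {"ultra_chat": 3, "c4_en": 2, "fiction_v8": 3}  # 3:2:3

total = sum(WEIGHTS.values())
counts = {k: NUM_SAMPLES * w // total for k, w in WEIGHTS.items()}

ultrachat = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")
fiction = load_dataset("mrcedric98/fiction_books_v8", split="train")
c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)  # C4 is huge; stream it

texts = []
# ultrachat stores conversations; flatten each one into a single string
for ex in ultrachat.shuffle(seed=SEED).select(range(counts["ultra_chat"])):
    texts.append("\n".join(turn["content"] for turn in ex["messages"]))
# take C4 samples from a shuffled streaming buffer
for ex in islice(c4.shuffle(seed=SEED, buffer_size=10_000), counts["c4_en"]):
    texts.append(ex["text"])
for ex in fiction.shuffle(seed=SEED).select(range(counts["fiction_v8"])):
    texts.append(ex["text"])

random.seed(SEED)
random.shuffle(texts)  # final shuffle of the mixed calibration set
```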
### Procedure
```bash
python ./quantize_nvfp4.py \
  --model TheDrummer/Snowpiercer-15B-v4 \
  --output ./TheDrummer/Snowpiercer-15B-v4-NVFP4 \
  --size 256 --seed 42 \
  --ultra_chat 3 --c4_en 2 --fiction_v8 3
```
The vLLM docs note that NVFP4 quantization needs very few calibration samples. I ran quants with 32, 64, 128, 256, and 512 samples, and this 256-sample version hit the sweet spot on the evals below.
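
The linked script is the source of truth for the procedure. For readers who want a self-contained starting point, below is a minimal sketch of NVFP4 post-training quantization with `llm-compressor`; the choice of tooling, the sequence length, and the `lm_head` exclusion are assumptions here, not a description of what `quantize_nvfp4.py` actually does.

```python
# Minimal NVFP4 PTQ sketch using llm-compressor. The tooling choice, sequence
# length, and lm_head exclusion are assumptions; quantize_nvfp4.py is authoritative.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "TheDrummer/Snowpiercer-15B-v4"
SAVE_DIR = "./TheDrummer/Snowpiercer-15B-v4-NVFP4"
MAX_SEQ_LEN = 2048  # assumption; the card does not state this

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# `texts` is the 256-sample 3:2:3 mix from the sketch above
calib = Dataset.from_dict({"text": texts})
calib = calib.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=MAX_SEQ_LEN),
    remove_columns=calib.column_names,
)

# Quantize every Linear layer to NVFP4, keeping the output head in high precision
recipe = QuantizationModifier(targets="Linear", scheme="NVFP4", ignore=["lm_head"])

oneshot(
    model=model,
    dataset=calib,
    recipe=recipe,
    max_seq_length=MAX_SEQ_LEN,
    num_calibration_samples=256,
)

model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```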
## Quantization Evals
| Metric | Base Model (BF16) | NVFP4 (Quantized) | Delta |
|---|---|---|---|
| ARC Challenge (Logic/Reasoning) | 58.19% | 58.28% | +0.09 pp |
| IFEval (Strict Instruction Following) | 36.04% | 38.45% | +2.41 pp |
| HellaSwag (Flow/Common Sense) | 81.39% | 80.65% | -0.74 pp |
| Winogrande (Ambiguity Resolution) | 73.95% | 72.93% | -1.02 pp |
| LAMBADA (Perplexity; lower is better) | 3.42 | 3.65 | +0.23 |
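
The card does not state which evaluation harness produced these numbers. As a hedged sketch, metrics like these can be measured with EleutherAI's lm-evaluation-harness; the task names and settings below are assumptions:

```python
# Hypothetical reproduction with lm-evaluation-harness; the harness, task
# variants (e.g. lambada_openai), and settings are assumptions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",
    model_args=(
        "pretrained=ealexeev/TheDrummer-Snowpiercer-15B-v4-NVFP4,"
        "tensor_parallel_size=1,gpu_memory_utilization=0.8"
    ),
    tasks=["arc_challenge", "ifeval", "hellaswag", "winogrande", "lambada_openai"],
)
for task, metrics in results["results"].items():
    print(task, metrics)
```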
## Bias, Risks, and Limitations
The base model is a creative fine-tune, and it was quantized with that use case in mind. It probably won't pass any LeetCode-style coding challenges.
## How To Use

Serve with vLLM:

```bash
# --tensor-parallel-size 1     -> single GPU
# --gpu-memory-utilization 0.8 -> otherwise vLLM reserves nearly all VRAM for the KV cache
vllm serve ealexeev/TheDrummer-Snowpiercer-15B-v4-NVFP4 \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.8
```
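
The server exposes an OpenAI-compatible API. A minimal client sketch (the port is vLLM's default; the prompt and sampling settings are purely illustrative):

```python
# Query the vLLM OpenAI-compatible server started above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # vLLM default port

resp = client.chat.completions.create(
    model="ealexeev/TheDrummer-Snowpiercer-15B-v4-NVFP4",
    messages=[
        {"role": "user", "content": "Write the opening paragraph of a frozen-train survival story."}
    ],
    temperature=0.8,  # illustrative sampling settings
    max_tokens=256,
)
print(resp.choices[0].message.content)
```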