# Model Card for ealexeev/TheDrummer-Snowpiercer-15B-v4-NVFP4

This is an NVFP4 quantization of TheDrummer/Snowpiercer-15B-v4.

## Quantization Details

Quantized with the script from https://github.com/ealexeev/llm-quantization.

Calibration dataset size: 256

Calibration data:

- HuggingFaceH4/ultrachat_200k
- allenai/c4_en
- mrcedric98/fiction_books_v8

These were shuffled and mixed at a 3:2:3 ratio, in the order listed above; a sketch of the mixing is below.
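For reference, here is a minimal sketch of how a 3:2:3 mix of 256 samples could be assembled with the `datasets` library. The proportional counts (96/64/96), the `train_sft` split for UltraChat, and the `text` column on the fiction dataset are assumptions on my part; the actual sampling lives in the quantization script.

```python
import random

from datasets import load_dataset

SEED = 42
NUM_SAMPLES = 256
random.seed(SEED)

# 3:2:3 split of 256 samples -> 96 / 64 / 96.
counts = {"ultrachat": 96, "c4": 64, "fiction": 96}

# Streaming keeps memory low; we only need the first few hundred rows.
ultrachat = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft", streaming=True)
c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)  # "allenai/c4_en" above
fiction = load_dataset("mrcedric98/fiction_books_v8", split="train", streaming=True)

samples = []
# UltraChat stores conversations as a list of messages; flatten to plain text.
for ex in ultrachat.take(counts["ultrachat"]):
    samples.append("\n".join(m["content"] for m in ex["messages"]))
for ex in c4.take(counts["c4"]):
    samples.append(ex["text"])
for ex in fiction.take(counts["fiction"]):
    samples.append(ex["text"])  # assumes the dataset exposes a "text" column

random.shuffle(samples)
```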

## Procedure

```bash
python ./quantize_nvfp4.py \
    --model TheDrummer/Snowpiercer-15B-v4 \
    --output ./TheDrummer/Snowpiercer-15B-v4-NVFP4 \
    --size 256 \
    --seed 42 \
    --ultra_chat 3 \
    --c4_en 2 \
    --fiction_v8 3
```

The vLLM docs mention that NVFP4 quantization needs very few calibration samples, so I ran quants with 32, 64, 128, 256, and 512 samples. This 256-sample version hit the sweet spot on the evals below.
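If you want to reproduce the sweep, a loop along these lines works; the size-suffixed output directory is my own convention for keeping the runs apart, not something the script requires.

```python
import subprocess

# Run one quantization per calibration size, each to its own output directory.
for size in (32, 64, 128, 256, 512):
    subprocess.run(
        [
            "python", "./quantize_nvfp4.py",
            "--model", "TheDrummer/Snowpiercer-15B-v4",
            "--output", f"./TheDrummer/Snowpiercer-15B-v4-NVFP4-{size}",
            "--size", str(size),
            "--seed", "42",
            "--ultra_chat", "3",
            "--c4_en", "2",
            "--fiction_v8", "3",
        ],
        check=True,
    )
```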

## Quantization Evals

| Metric | Base Model (BF16) | NVFP4 (Quantized) | Delta |
|---|---|---|---|
| ARC Challenge (Logic/Reasoning) | 58.19% | 58.28% | +0.09% |
| IFEval (Strict Instruction Following) | 36.04% | 38.45% | +2.41% |
| HellaSwag (Flow/Common Sense) | 81.39% | 80.65% | -0.74% |
| Winogrande (Ambiguity Resolution) | 73.95% | 72.93% | -1.02% |
| Lambada (Perplexity, lower is better) | 3.42 | 3.65 | +0.23 |
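The table above was not produced by the snippet below, but a comparable run with lm-evaluation-harness would look roughly like this; the exact task variants (e.g. `lambada_openai` vs. other Lambada splits) and batch settings are assumptions.

```python
import lm_eval

# Evaluate the quantized checkpoint through the vLLM backend; the task names
# are the lm-evaluation-harness equivalents of the metrics in the table above.
results = lm_eval.simple_evaluate(
    model="vllm",
    model_args=(
        "pretrained=ealexeev/TheDrummer-Snowpiercer-15B-v4-NVFP4,"
        "tensor_parallel_size=1,gpu_memory_utilization=0.8"
    ),
    tasks=["arc_challenge", "ifeval", "hellaswag", "winogrande", "lambada_openai"],
    batch_size="auto",
)
for task, metrics in results["results"].items():
    print(task, metrics)
```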

## Bias, Risks, and Limitations

The base model is already a creative fine-tune, and it was quantized with that use case in mind. It's probably not going to pass any LeetCode challenges.

## How To Use

```bash
# --tensor-parallel-size 1: single GPU
# --gpu-memory-utilization 0.8: otherwise vLLM will take all VRAM for the KV cache
vllm serve ealexeev/TheDrummer-Snowpiercer-15B-v4-NVFP4 \
    --tensor-parallel-size 1 \
    --gpu-memory-utilization 0.8
```
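Once the server is up, it speaks the standard OpenAI-compatible API, so any OpenAI client works. For example (the prompt is just an illustration):

```python
from openai import OpenAI

# vLLM's OpenAI-compatible endpoint; the default port is 8000.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="ealexeev/TheDrummer-Snowpiercer-15B-v4-NVFP4",
    messages=[{"role": "user", "content": "Write the opening line of a frozen-train story."}],
    max_tokens=128,
    temperature=0.8,
)
print(resp.choices[0].message.content)
```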