Update MXFP4 format to compressed-tensors

#3
by mgoin - opened

Hey @GadflyII great work with all your checkpoints and features on vLLM!

I wanted to let you know, specifically for mxfp4, that we'd like to keep mxfp4.py specific to gpt-oss in upstream for a few reasons, but mostly because that model is the only one using it at the moment.

We have support for mxfp4 w4a16, the same as gpt-oss but generalized through the compressed-tensors pathway (https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a16_mxfp4.py). See this example for making a model of your own: https://github.com/vllm-project/llm-compressor/blob/main/examples/quantization_w4a16_fp4/mxfp4/qwen3_example.py
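For anyone unfamiliar with the format itself: MXFP4 (from the OCP Microscaling spec) stores each 32-element block as one shared power-of-two scale plus a 4-bit E2M1 value per element. A minimal pure-Python sketch of the per-block quantize/dequantize round trip — the scale choice below (map the block max onto E2M1's max of 6.0) is one common convention, not necessarily the exact one llm-compressor uses:

```python
import math

# Magnitudes representable by a 4-bit E2M1 element (sign handled separately)
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """Quantize one block (spec block size is 32) to MXFP4:
    a shared power-of-two scale plus one E2M1 value per element."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 1.0, [0.0] * len(block)
    # One common scale choice: smallest power of two so amax/scale <= 6.0
    scale = 2.0 ** math.ceil(math.log2(amax / 6.0))
    q = [
        math.copysign(min(E2M1_VALUES, key=lambda v: abs(abs(x) / scale - v)), x)
        for x in block
    ]
    return scale, q

def dequantize_block(scale, q):
    """Recover approximate values: element * shared scale."""
    return [scale * v for v in q]

scale, q = quantize_block([0.1, -0.2, 0.3, 0.75])
deq = dequantize_block(scale, q)
```

The shared scale being a pure power of two (E8M0 in the spec) is what keeps dequantization a cheap exponent add on hardware.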

See the uploaded model here, which is tested in CI: https://huggingface.co/nm-testing/Qwen3-30B-A3B-MXFP4A16

Remaking your checkpoint in that format should let you run the model on upstream vLLM as-is. LMK what you think!

Hey, thanks for reaching out. I will absolutely do that and see how it goes. I'm deep in a training run, so it will be a few days at least; I need more hardware 😞
