embeddinggemma-300m-GGUF-with-dense-modules

Original model: google/embeddinggemma-300m

Original research: numerical stability of embedding models between llama.cpp and Ollama

Motivation

When migrating our inference environment from Ollama to llama.cpp, we noticed that the currently available GGUF conversions of this model were missing the "dense modules" (the Sentence Transformers dense projection layers applied after pooling), resulting in vastly different output.
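For context, the dense modules sit between pooling and the final embedding. The following NumPy sketch shows where they fit in the pipeline; the layer dimensions are assumed from the upstream sentence-transformers configuration, and the weights are random stand-ins (the real weights ship with the model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Token embeddings from the transformer backbone (seq_len x hidden_dim).
hidden_dim = 768
tokens = rng.standard_normal((12, hidden_dim))

# Mean pooling over tokens -> one vector per input text.
pooled = tokens.mean(axis=0)

# The "dense modules": two linear projections applied after pooling
# (assumed 768 -> 3072 -> 768; random stand-in weights for illustration).
W1 = rng.standard_normal((hidden_dim, 3072)) * 0.02
W2 = rng.standard_normal((3072, hidden_dim)) * 0.02
embedding = pooled @ W1 @ W2

# A conversion that drops these layers returns `pooled` instead of
# `embedding`, which is a different vector entirely.
assert embedding.shape == (hidden_dim,)
assert not np.allclose(embedding, pooled)
```

This is why a conversion that silently skips the dense modules still "works" (the output has the right shape) while producing embeddings that do not match the original model.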

The original GGUF files shipped by Ollama were also incompatible with llama.cpp, as the model architecture deviated slightly.

We thus decided to create a custom model derived from the original model.

Process

uv run python ../llama.cpp/convert_hf_to_gguf.py ../embeddinggemma-300m --outfile ../embeddinggemma-300m-GGUF-with-dense-modules/embeddinggemma-300M-BF16-with-dense.gguf --outtype bf16 --sentence-transformers-dense-modules

Here, ../llama.cpp is a local clone of the llama.cpp repository, ../embeddinggemma-300m is a local clone of the original google/embeddinggemma-300m model repository, and ../embeddinggemma-300m-GGUF-with-dense-modules is the output directory for this repository.
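To verify that a conversion preserved the original model's behavior, one simple check is to embed the same text through both backends and compare the vectors by cosine similarity; a faithful conversion should score very close to 1.0. A minimal sketch (the two example vectors below are hypothetical stand-ins for backend outputs):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings for the same input text from two backends.
# With the dense modules intact, the similarity should be ~1.0;
# a conversion missing them scores noticeably lower.
reference_vec = np.array([0.10, 0.20, 0.30])
converted_vec = np.array([0.1001, 0.1999, 0.3002])

score = cosine_similarity(reference_vec, converted_vec)
assert score > 0.999
```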

Model details

Format: GGUF
Model size: 0.3B params
Architecture: gemma-embedding