embeddinggemma-300m-GGUF-with-dense-modules

Original model: google/embeddinggemma-300m

Original research: numerical stability of embedding models between llama.cpp and Ollama

Motivation

When migrating our inference environment from Ollama to llama.cpp, we noticed that the currently available GGUF conversions of this model were missing the "dense modules" (the Sentence Transformers dense projection layers applied after pooling), resulting in vastly different output.
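For context, the dense modules sit between pooling and the final embedding. The following NumPy sketch shows where they fit in the pipeline; the layer dimensions are assumed from the upstream sentence-transformers configuration, and the weights are random stand-ins (the real weights ship with the model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Token embeddings from the transformer backbone (seq_len x hidden_dim).
hidden_dim = 768
tokens = rng.standard_normal((12, hidden_dim))

# Mean pooling over tokens -> one vector per input text.
pooled = tokens.mean(axis=0)

# The "dense modules": two linear projections applied after pooling
# (assumed 768 -> 3072 -> 768; random stand-in weights for illustration).
W1 = rng.standard_normal((hidden_dim, 3072)) * 0.02
W2 = rng.standard_normal((3072, hidden_dim)) * 0.02
embedding = pooled @ W1 @ W2

# A conversion that drops these layers returns `pooled` instead of
# `embedding`, which is a different vector entirely.
assert embedding.shape == (hidden_dim,)
assert not np.allclose(embedding, pooled)
```

This is why a conversion that silently skips the dense modules still "works" (the output has the right shape) while producing embeddings that do not match the original model.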

The original GGUF files shipped by Ollama were also incompatible with llama.cpp, as the model architecture deviated slightly.

We thus decided to create a custom model derived from the original model.

Process

uv run python ../llama.cpp/convert_hf_to_gguf.py ../embeddinggemma-300m --outfile ../embeddinggemma-300m-GGUF-with-dense-modules/embeddinggemma-300M-BF16-with-dense.gguf --outtype bf16 --sentence-transformers-dense-modules

Here, ../llama.cpp is a local clone of the llama.cpp repository, ../embeddinggemma-300m is a local clone of the original google/embeddinggemma-300m model repository, and ../embeddinggemma-300m-GGUF-with-dense-modules is the output directory for this repository.
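To verify that a conversion preserved the original model's behavior, one simple check is to embed the same text through both backends and compare the vectors by cosine similarity; a faithful conversion should score very close to 1.0. A minimal sketch (the two example vectors below are hypothetical stand-ins for backend outputs):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings for the same input text from two backends.
# With the dense modules intact, the similarity should be ~1.0;
# a conversion missing them scores noticeably lower.
reference_vec = np.array([0.10, 0.20, 0.30])
converted_vec = np.array([0.1001, 0.1999, 0.3002])

score = cosine_similarity(reference_vec, converted_vec)
assert score > 0.999
```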

Model details

Format: GGUF
Model size: 0.3B params
Architecture: gemma-embedding