Help me run it with llama.cpp and 128K context size

#46
by JLouisBiz - opened

I have tried to run it as:

    echo "Starting LOCAL model: $MODEL"
    $LLAMA_SERVER --jinja -fa on -c "$CONTEXT" -ngl "$NGL" -v \
          --log-timestamps --host "$HOST" \
          --rope-scaling yarn --yarn-orig-ctx 65536 \
          -m "$MODEL" >> "$LOG" 2>&1 &

yet in the llama.cpp log all I see is this:

[screenshot: 2025-12-27_13-15]

Is there a way to run SmolLM3-3B with 128K context in llama.cpp?

I got it running with a 128K context using:

    ./llama-server -t 4 -ngl 0 -c 131072 -dev none --host 0.0.0.0 \
        -m /home/fjd/Documents/Primary/Resources/AI_Models/text_generation/SmolLM3-3B-128K.gguf \
        --no-jinja
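If you want to extend the 64K-native build with YaRN instead of using a pre-extended GGUF, a minimal sketch follows. It assumes SmolLM3-3B's native context is 65536 tokens and that your llama.cpp build supports the `--rope-scaling yarn`, `--rope-scale`, and `--yarn-orig-ctx` flags (flag names can differ between versions); the model path is a placeholder.

```shell
# Assumed values: SmolLM3-3B native context is 65536; target is 131072 (128K).
ORIG_CTX=65536
TARGET_CTX=131072

# YaRN needs the extension factor: target context / native context.
ROPE_SCALE=$((TARGET_CTX / ORIG_CTX))
echo "rope scale factor: $ROPE_SCALE"

# Hypothetical invocation (uncomment and adjust the model path to use):
# ./llama-server -c "$TARGET_CTX" \
#     --rope-scaling yarn --rope-scale "$ROPE_SCALE" --yarn-orig-ctx "$ORIG_CTX" \
#     -m /path/to/SmolLM3-3B.gguf
```

The point is that passing `--yarn-orig-ctx` alone may not be enough; the scale factor (here 2) ties the native window to the requested one.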
