TranslateGemma 4B IT — Android / Google AI Edge Bundles

On-device translation model for Android using Google AI Edge. Converts google/translategemma-4b-it (55 languages, 4B params) into formats that run locally on Android without internet or cloud APIs.

Google publishes TranslateGemma TFLite files for WebGPU only. This repo bridges that gap with CPU/XNNPACK-compatible .litertlm bundles (LiteRT-LM format) that embed a chat template, including a multimodal image+text Android bundle for PrivateAITranslate.

Files

File	Size	Notes
`artifacts/int4-generic/translategemma-4b-it-int4-generic.litertlm`	~2 GB	INT4 blockwise quant — faster, lower RAM
`artifacts/dynamic_int8-generic/translategemma-4b-it-dynamic_int8-generic.litertlm`	~4 GB	Dynamic INT8 — better quality
`artifacts/int4-multimodal/translategemma-4b-it-int4-multimodal.litertlm`	~2.76 GB	INT4 multimodal image+text bundle for structured text translation and LiteRT-LM vision image translation

Start with INT4 if you're unsure — it loads faster and uses less RAM. Use dynamic_int8 for better text translation quality.

The multimodal artifact is the bundle expected by PrivateAITranslate for image translation. If artifacts/int4-multimodal/translategemma-4b-it-int4-multimodal.litertlm is not present in this Hugging Face repo, the app download URL for that model will return 404.
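As a rough illustration of how the artifact paths above map to download URLs, the sketch below builds a Hugging Face `resolve` URL for each bundle. The repository id is the one that appears at the end of this page; that PrivateAITranslate requests exactly this URL form is an assumption, and `artifact_url` is a hypothetical helper name, not part of the repo's scripts.

```python
from urllib.parse import quote

# Assumption: repository id as shown at the end of this page.
REPO_ID = "barakplasma/translategemma-4b-it-android-task-quantized"

# Paths copied from the Files table.
ARTIFACTS = {
    "int4": "artifacts/int4-generic/translategemma-4b-it-int4-generic.litertlm",
    "dynamic_int8": "artifacts/dynamic_int8-generic/translategemma-4b-it-dynamic_int8-generic.litertlm",
    "int4-multimodal": "artifacts/int4-multimodal/translategemma-4b-it-int4-multimodal.litertlm",
}


def artifact_url(variant: str, revision: str = "main") -> str:
    """Build the direct-download URL for one bundle (hypothetical helper)."""
    path = ARTIFACTS[variant]  # raises KeyError for an unknown variant
    return (
        f"https://huggingface.co/{REPO_ID}/resolve/"
        f"{quote(revision, safe='')}/{quote(path)}"
    )


if __name__ == "__main__":
    for name in ARTIFACTS:
        print(name, artifact_url(name))
```

If the multimodal file has not been uploaded, a request to the URL built for `int4-multimodal` is the one that would return 404.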

Quick Start — Google AI Edge Gallery (Android)

Download a .litertlm file above
Open Google AI Edge Gallery
Import the model → select your .litertlm file
Use AI Chat mode

Input format

The embedded template supports structured input for any language pair:

<src>LANG</src><dst>LANG</dst><text>YOUR TEXT HERE</text>

Examples:

<src>he</src><dst>en</dst><text>שלום עולם</text>

<src>en</src><dst>he</dst><text>good morning</text>

<src>en</src><dst>fr</dst><text>hello world</text>

<src>ja</src><dst>en</dst><text>ありがとうございます</text>

Use standard ISO 639-1 language codes: en, he, fr, es, de, ar, zh, ja, ko, ru, pt, etc.

Plain text (no tags) is also accepted — the model will attempt translation based on context.
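For apps that assemble the structured input programmatically, a minimal sketch of a prompt builder follows. The function name `build_translation_prompt` is hypothetical, and the sketch assumes the template needs no escaping of the text body, which this README does not specify.

```python
def build_translation_prompt(src: str, dst: str, text: str) -> str:
    """Wrap text in the <src>/<dst>/<text> tags of the structured input format."""
    src = src.strip().lower()
    dst = dst.strip().lower()
    if not src or not dst:
        raise ValueError("source and target language codes are required")
    if not text.strip():
        raise ValueError("text to translate must not be empty")
    # Assumption: the text is passed through unchanged (no escaping).
    return f"<src>{src}</src><dst>{dst}</dst><text>{text}</text>"


if __name__ == "__main__":
    print(build_translation_prompt("en", "fr", "hello world"))
```

The resulting string can be pasted into AI Chat mode or passed as the user message to whichever runtime loads the bundle.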

Image translation

The multimodal .litertlm bundle uses LiteRT-LM vision support for image+text translation flows in PrivateAITranslate. It is intended for structured text translation and image translation, not general image captioning.

Device Requirements

Spec	Minimum
RAM	6 GB free (INT4) / 8 GB free (dynamic_int8)
Storage	~2 GB (INT4) / ~2.76 GB (INT4 multimodal) / ~4 GB (dynamic_int8)
OS	Android 10+
Runtime	Google AI Edge Gallery or LiteRT-LM SDK

CPU validation passed on real devices: Pixel 10 and Galaxy S22 / S22 Ultra-class targets. GPU execution currently fails at initialization during validation, so treat it as experimental and unvalidated.
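The RAM minimums above can be turned into a simple selection rule. The sketch below is illustrative only: `pick_text_variant` is a hypothetical helper, and how an app measures free RAM on the device is outside its scope.

```python
from typing import Optional

# Minimum free RAM per text bundle, taken from the requirements table.
MIN_FREE_RAM_GB = {"int4": 6.0, "dynamic_int8": 8.0}


def pick_text_variant(free_ram_gb: float, prefer_quality: bool = False) -> Optional[str]:
    """Return the bundle to load, or None if neither minimum is met."""
    if prefer_quality and free_ram_gb >= MIN_FREE_RAM_GB["dynamic_int8"]:
        return "dynamic_int8"
    if free_ram_gb >= MIN_FREE_RAM_GB["int4"]:
        return "int4"  # default choice: loads faster, uses less RAM
    return None


if __name__ == "__main__":
    print(pick_text_variant(7.0, prefer_quality=True))
```

The helper defaults to INT4 even when more RAM is available, matching the recommendation to start with INT4 when unsure.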

What's Different From Google's Official Files

Google's official TranslateGemma TFLite files target WebGPU only — they don't work with MediaPipe LLM inference on Android CPU.

This repo's files use native conversion via litert-torch with a custom build_translategemma_4b() builder that:

Produces proper prefill + decode signatures with KV cache (required by LiteRT-LM)
Uses the correct architecture: 34 layers, 2560 embedding dim, 8 attention heads, 4 KV heads, and sliding-window attention with a global-attention layer every 6th layer
Sets qkv_fused_interleaved=False (critical: the wrong default caused garbage output in all early builds)
Handles the language_model. weight prefix in TranslateGemma's multimodal safetensors
Embeds a generic Jinja chat template for any language pair via <src>/<dst>/<text> tags
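To make the layer pattern and the weight-prefix handling concrete, here is a small standalone sketch. It is not the code of build_translategemma_4b(): the names `DecoderConfig`, `attention_types`, and `strip_prefix` are hypothetical, and it assumes "every 6th layer" means layers 6, 12, 18, ... counted from 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderConfig:
    """Architecture numbers quoted in this README."""
    num_layers: int = 34
    embedding_dim: int = 2560
    num_heads: int = 8
    num_kv_heads: int = 4
    global_every: int = 6  # one global-attention layer per 6 layers


def attention_types(cfg: DecoderConfig) -> list:
    """Per-layer attention kind; assumes layers 6, 12, ... (1-based) are global."""
    return [
        "global" if (i + 1) % cfg.global_every == 0 else "sliding_window"
        for i in range(cfg.num_layers)
    ]


def strip_prefix(names, prefix: str = "language_model.") -> dict:
    """Map each checkpoint tensor name to its name without the prefix."""
    return {n: (n[len(prefix):] if n.startswith(prefix) else n) for n in names}


if __name__ == "__main__":
    kinds = attention_types(DecoderConfig())
    print(kinds.count("global"), "global layers of", len(kinds))
    # The tensor name below is made up for illustration.
    print(strip_prefix(["language_model.example.weight"]))
```

Under that 1-based assumption, 34 layers give 5 global layers and 29 sliding-window layers.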

Conversion Scripts

The scripts/ folder contains the full conversion pipeline:

Script	Purpose
`scripts/convert_translategemma_android.py`	Single-quant conversion via litert-torch native strategy
`scripts/bundle_litertlm.py`	Bundle a TFLite + SentencePiece tokenizer into `.litertlm` with embedded Jinja template
`scripts/multi_quant_build_upload.py`	Batch conversion + HuggingFace upload

Reproduce a build

Requirements: 96 GB minimum system RAM, 128 GB preferred, Python 3.12, litert-torch==0.8.0.

Observed multimodal export peak RSS was about 73.9 GiB, so 96 GB is the practical floor once Python, model cache, filesystem cache, and conversion overhead are included. Use 128 GB when running multiple quantization attempts or keeping extra build artifacts.
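The headroom behind the 96 GB figure can be checked with a short calculation. The sketch assumes the "96 GB" host figure is decimal gigabytes while the observed peak is in binary GiB; if the host figure is also binary, the headroom is larger.

```python
GIB = 2**30  # bytes in one gibibyte
GB = 10**9   # bytes in one decimal gigabyte


def gib_to_gb(gib: float) -> float:
    """Convert binary GiB to decimal GB."""
    return gib * GIB / GB


peak_gb = gib_to_gb(73.9)    # observed multimodal export peak RSS
headroom_gb = 96 - peak_gb   # assumption: 96 GB is a decimal figure

if __name__ == "__main__":
    print(f"peak RSS ~ {peak_gb:.1f} GB, headroom on a 96 GB host ~ {headroom_gb:.1f} GB")
```

That leaves roughly 17 GB for Python, the model cache, filesystem cache, and conversion overhead, which is why 96 GB is described as the practical floor.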

Ideal Vast.ai image:

Vast.ai PyTorch image or an Ubuntu-based NVIDIA/PyTorch CUDA image, not a bare CUDA runtime image.
Python 3.12 with uv/pip, Git, Git LFS, Hugging Face CLI, and build tools available.
CUDA/PyTorch wheel support matching the rented GPU architecture; use CUDA 12.8+ PyTorch wheels on Blackwell GPUs.
96+ GB RAM and at least 80 GB free disk; 150+ GB disk is safer to hold the source checkpoint, caches, exported TFLite files, and .litertlm bundles.
Persistent /workspace volume if the instance may be stopped/recycled before upload.
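Before starting a long conversion it can help to check the instance against the list above. The following is a minimal preflight sketch covering only free disk and the Python version. `preflight_problems` and `free_disk_gb` are hypothetical helpers, and the thresholds are the ones quoted in this section.

```python
import shutil
import sys

MIN_FREE_DISK_GB = 80          # "at least 80 GB free disk"
REQUIRED_PYTHON = (3, 12)      # "Python 3.12"


def free_disk_gb(path: str = ".") -> float:
    """Free space on the filesystem holding `path`, in decimal GB."""
    return shutil.disk_usage(path).free / 10**9


def preflight_problems(disk_gb: float, python_version: tuple) -> list:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    if disk_gb < MIN_FREE_DISK_GB:
        problems.append(f"only {disk_gb:.0f} GB free disk, need {MIN_FREE_DISK_GB} GB")
    if tuple(python_version[:2]) != REQUIRED_PYTHON:
        problems.append(f"Python {python_version[0]}.{python_version[1]} found, expected 3.12")
    return problems


if __name__ == "__main__":
    # Point free_disk_gb at the volume that will hold the build outputs.
    for problem in preflight_problems(free_disk_gb("."), sys.version_info[:2]):
        print("WARNING:", problem)
```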

# Clone LiteRT-LM builder (needed by bundle_litertlm.py)
git clone --depth=1 https://github.com/google-ai-edge/LiteRT-LM /tmp/litert-lm

pip install litert-torch==0.8.0 mediapipe transformers huggingface-hub

# Download model
huggingface-cli download google/translategemma-4b-it --local-dir ./translategemma-4b-it

# Convert to TFLite with KV cache (~30-60 min, needs 96 GB minimum RAM)
python scripts/convert_translategemma_android.py \
  --model-dir ./translategemma-4b-it \
  --tflite-dir ./tflite_output/dynamic_int8 \
  --output-dir ./output \
  --task-file ./output/translategemma-4b-it-dynamic_int8.task \
  --quantize dynamic_int8 \
  --prefill-seq-len 1024 --kv-cache-max-len 1024 --allow-no-token

# Bundle as .litertlm
python scripts/bundle_litertlm.py \
  --tflite ./tflite_output/dynamic_int8/*.tflite \
  --tokenizer ./translategemma-4b-it/tokenizer.model \
  --output ./output/translategemma-4b-it-dynamic_int8-generic.litertlm \
  --quant dynamic_int8
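The bundling command above passes a shell glob to --tflite. This README does not say how bundle_litertlm.py behaves if the glob matches several files, so a cautious wrapper might resolve the path first. The sketch below uses a hypothetical helper, `find_single_tflite`, that fails loudly unless exactly one file matches.

```python
from pathlib import Path


def find_single_tflite(directory) -> Path:
    """Return the only .tflite file in `directory`, or raise if there is not exactly one."""
    matches = sorted(Path(directory).glob("*.tflite"))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected exactly one .tflite file in {directory}, found {len(matches)}"
        )
    return matches[0]


if __name__ == "__main__":
    try:
        print(find_single_tflite("./tflite_output/dynamic_int8"))
    except FileNotFoundError as err:
        print("ERROR:", err)
```

The returned path can then be passed explicitly to --tflite instead of relying on shell expansion.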

Supported Languages

TranslateGemma supports 55 languages including Arabic, Chinese, French, German, Hebrew, Hindi, Japanese, Korean, Portuguese, Russian, Spanish, and more. See google/translategemma-4b-it for the full list.

License

Model weights: Google Gemma Terms of Use
Conversion scripts: Apache 2.0

Repository: barakplasma/translategemma-4b-it-android-task-quantized (base model: google/translategemma-4b-it)