TranslateGemma 4B IT: Android / Google AI Edge Bundles
On-device translation model for Android using Google AI Edge. Converts google/translategemma-4b-it (55 languages, 4B params) into formats that run locally on Android without internet or cloud APIs.
Google publishes only WebGPU TFLite files for this model. This repo bridges that gap with CPU/XNNPACK-compatible .litertlm bundles (LiteRT-LM format) that carry an embedded chat template, including a multimodal image+text Android bundle for PrivateAITranslate.
Files
| File | Size | Notes |
|---|---|---|
| artifacts/int4-generic/translategemma-4b-it-int4-generic.litertlm | ~2 GB | INT4 blockwise quant; faster, lower RAM |
| artifacts/dynamic_int8-generic/translategemma-4b-it-dynamic_int8-generic.litertlm | ~4 GB | Dynamic INT8; better quality |
| artifacts/int4-multimodal/translategemma-4b-it-int4-multimodal.litertlm | ~2.76 GB | INT4 multimodal image+text bundle for structured text translation and LiteRT-LM vision image translation |
Start with INT4 if you're unsure: it loads faster and uses less RAM. Use dynamic_int8 for better text translation quality.
The multimodal artifact is the bundle expected by PrivateAITranslate for image translation. If artifacts/int4-multimodal/translategemma-4b-it-int4-multimodal.litertlm is not present in this Hugging Face repo, the app download URL for that model will return 404.
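As a minimal sketch of how a client could derive a download URL for each artifact, the helper below joins the repo id and the file paths from the table above using the Hugging Face `resolve` URL pattern. The `main` revision and the short variant names used as dictionary keys are assumptions for illustration, not something this repo defines.

```python
from urllib.parse import quote

REPO_ID = "barakplasma/translategemma-4b-it-android-task-quantized"

# Artifact paths copied from the Files table; the dictionary keys are
# hypothetical short names chosen for this example.
ARTIFACTS = {
    "int4": "artifacts/int4-generic/translategemma-4b-it-int4-generic.litertlm",
    "dynamic_int8": "artifacts/dynamic_int8-generic/translategemma-4b-it-dynamic_int8-generic.litertlm",
    "int4-multimodal": "artifacts/int4-multimodal/translategemma-4b-it-int4-multimodal.litertlm",
}


def artifact_url(variant: str, revision: str = "main") -> str:
    """Build the direct-download URL for one artifact.

    Assumes the file lives on the given revision (default: "main").
    Raises KeyError for an unknown variant name.
    """
    path = ARTIFACTS[variant]
    return (
        f"https://huggingface.co/{REPO_ID}/resolve/"
        f"{quote(revision, safe='')}/{quote(path)}"
    )


if __name__ == "__main__":
    for name in ARTIFACTS:
        print(name, artifact_url(name))
```

A URL built this way still returns 404 if the file has not been uploaded, so a client may want to check the response status before starting a multi-gigabyte download.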
Quick Start: Google AI Edge Gallery (Android)
- Download a .litertlm file above
- Open Google AI Edge Gallery
- Import the model: select your .litertlm file
- Use AI Chat mode
Input format
The embedded template supports structured input for any language pair:
<src>LANG</src><dst>LANG</dst><text>YOUR TEXT HERE</text>
Examples:
<src>he</src><dst>en</dst><text>שלום עולם</text>
<src>en</src><dst>he</dst><text>good morning</text>
<src>en</src><dst>fr</dst><text>hello world</text>
<src>ja</src><dst>en</dst><text>ありがとうございます</text>
Use standard ISO 639-1 language codes: en, he, fr, es, de, ar, zh, ja, ko, ru, pt, etc.
Plain text (no tags) is also accepted; the model will attempt translation based on context.
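The tag format above is easy to assemble by hand, but a small helper avoids typos when the prompt is generated by app code. The function name `build_prompt` and the two-letter shape check are illustrative assumptions; the check only verifies that a code looks like an ISO 639-1 code, not that the model supports it, and the text is inserted as-is without any escaping.

```python
def build_prompt(src: str, dst: str, text: str) -> str:
    """Wrap text in the <src>/<dst>/<text> tags described above.

    src and dst are expected to be ISO 639-1 codes such as "en" or "he".
    """
    src = src.strip().lower()
    dst = dst.strip().lower()
    for code in (src, dst):
        # Shape check only: two ASCII letters. This does not confirm that
        # the language is one of the 55 the model supports.
        if not (len(code) == 2 and code.isascii() and code.isalpha()):
            raise ValueError(f"not a two-letter language code: {code!r}")
    return f"<src>{src}</src><dst>{dst}</dst><text>{text}</text>"


if __name__ == "__main__":
    print(build_prompt("en", "fr", "hello world"))
```

The resulting string is what would be pasted into AI Chat mode or passed as the user message to the runtime.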
Image translation
The multimodal .litertlm bundle uses LiteRT-LM vision support for image+text translation flows in PrivateAITranslate. It is intended for structured text translation and image translation, not general image captioning.
Device Requirements
| Spec | Minimum |
|---|---|
| RAM | 6 GB free (INT4) / 8 GB free (dynamic_int8) |
| Storage | 2 GB (INT4) / 4 GB (dynamic_int8) |
| OS | Android 10+ |
| Runtime | Google AI Edge Gallery or LiteRT-LM SDK |
CPU real-device validation passed on Pixel 10 and Galaxy S22 / S22 Ultra-class targets. GPU execution currently fails at initialization during validation; treat it as experimental and unvalidated.
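The RAM and storage minimums in the table can be turned into a simple selection rule. The sketch below is one possible policy, not something the repo ships: it prefers dynamic_int8 when the device meets both of its minimums and falls back to INT4 otherwise. The function name and the preference order are assumptions.

```python
from typing import Optional

# Minimum free resources in GB, copied from the Device Requirements table.
# Ordered from most to least demanding.
REQUIREMENTS = {
    "dynamic_int8": {"ram_gb": 8.0, "storage_gb": 4.0},
    "int4": {"ram_gb": 6.0, "storage_gb": 2.0},
}


def pick_variant(free_ram_gb: float, free_storage_gb: float) -> Optional[str]:
    """Return the highest-quality variant the device can hold, or None.

    Assumed policy: try dynamic_int8 first, then int4.
    """
    for name in ("dynamic_int8", "int4"):
        req = REQUIREMENTS[name]
        if free_ram_gb >= req["ram_gb"] and free_storage_gb >= req["storage_gb"]:
            return name
    return None


if __name__ == "__main__":
    print(pick_variant(free_ram_gb=7.0, free_storage_gb=10.0))
```

A more conservative app could invert the order and always start with INT4, which matches the "start with INT4 if you're unsure" advice in the Files section.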
What's Different From Google's Official Files
Google's official TranslateGemma TFLite files target WebGPU only; they don't work with MediaPipe LLM inference on Android CPU.
This repo's files use native conversion via litert-torch with a custom build_translategemma_4b() builder that:
- Produces proper prefill + decode signatures with KV cache (required by LiteRT-LM)
- Uses the correct architecture: 34 layers, 2560 dim, 8 heads, 4 KV heads, sliding-window + global every 6th layer
- Fixes qkv_fused_interleaved=False (critical: the wrong default caused garbage output in all early builds)
- Handles the "language_model." weight prefix in TranslateGemma's multimodal safetensors
- Embeds a generic Jinja chat template for any language pair via <src>/<dst>/<text> tags
Conversion Scripts
The scripts/ folder contains the full conversion pipeline:
| Script | Purpose |
|---|---|
| scripts/convert_translategemma_android.py | Single-quant conversion via litert-torch native strategy |
| scripts/bundle_litertlm.py | Bundle a TFLite + SentencePiece tokenizer into .litertlm with embedded Jinja template |
| scripts/multi_quant_build_upload.py | Batch conversion + HuggingFace upload |
Reproduce a build
Requirements: 96 GB minimum system RAM, 128 GB preferred, Python 3.12, litert-torch==0.8.0.
Observed multimodal export peak RSS was about 73.9 GiB, so 96 GB is the practical floor once Python, model cache, filesystem cache, and conversion overhead are included. Use 128 GB when running multiple quantization attempts or keeping extra build artifacts.
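The headroom behind the 96 GB floor is easier to see once the units match, since the peak is reported in GiB and the machine sizes in GB. The short calculation below is a sketch that assumes the 96 GB and 128 GB figures are decimal gigabytes; the source does not say which unit a given host reports.

```python
GIB = 2**30  # bytes in one gibibyte
GB = 10**9   # bytes in one decimal gigabyte

OBSERVED_PEAK_GIB = 73.9  # multimodal export peak RSS quoted above


def gib_to_gb(gib: float) -> float:
    """Convert gibibytes to decimal gigabytes."""
    return gib * GIB / GB


def headroom_gb(total_ram_gb: float, peak_rss_gib: float = OBSERVED_PEAK_GIB) -> float:
    """RAM left for the OS, caches, and other processes at peak export."""
    return total_ram_gb - gib_to_gb(peak_rss_gib)


if __name__ == "__main__":
    print(f"peak: {gib_to_gb(OBSERVED_PEAK_GIB):.1f} GB")
    for total in (96, 128):
        print(f"{total} GB host: {headroom_gb(total):.1f} GB headroom")
```

With these assumptions the peak is about 79.3 GB, which leaves roughly 16.7 GB of headroom on a 96 GB host and roughly 48.7 GB on a 128 GB host.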
Ideal Vast.ai image:
- Vast.ai PyTorch image or an Ubuntu-based NVIDIA/PyTorch CUDA image, not a bare CUDA runtime image.
- Python 3.12 with uv/pip, Git, Git LFS, Hugging Face CLI, and build tools available.
- CUDA/PyTorch wheel support matching the rented GPU architecture; use CUDA 12.8+ PyTorch wheels on Blackwell GPUs.
- 96+ GB RAM and at least 80 GB free disk; 150+ GB disk is safer for source checkpoint, caches, exported TFLite files, and .litertlm bundles.
- Persistent /workspace volume if the instance may be stopped/recycled before upload.
# Clone LiteRT-LM builder (needed by bundle_litertlm.py)
git clone --depth=1 https://github.com/google-ai-edge/LiteRT-LM /tmp/litert-lm
pip install litert-torch==0.8.0 mediapipe transformers huggingface-hub
# Download model
huggingface-cli download google/translategemma-4b-it --local-dir ./translategemma-4b-it
# Convert to TFLite with KV cache (~30-60 min, needs 96 GB minimum RAM)
python scripts/convert_translategemma_android.py \
  --model-dir ./translategemma-4b-it \
  --tflite-dir ./tflite_output/dynamic_int8 \
  --output-dir ./output \
  --task-file ./output/translategemma-4b-it-dynamic_int8.task \
  --quantize dynamic_int8 \
  --prefill-seq-len 1024 --kv-cache-max-len 1024 --allow-no-token
# Bundle as .litertlm
python scripts/bundle_litertlm.py \
  --tflite ./tflite_output/dynamic_int8/*.tflite \
  --tokenizer ./translategemma-4b-it/tokenizer.model \
  --output ./output/translategemma-4b-it-dynamic_int8-generic.litertlm \
  --quant dynamic_int8
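When the two commands above are driven from Python rather than a shell, the argument lists can be built per quantization, and the *.tflite glob has to be expanded explicitly because no shell does it. The sketch below uses only the flags shown above; the helper names are hypothetical, this is not how scripts/multi_quant_build_upload.py is implemented, and any --quantize value other than dynamic_int8 is an assumption.

```python
import glob
from typing import List

MODEL_DIR = "./translategemma-4b-it"
OUT_DIR = "./output"


def convert_cmd(quant: str, seq_len: int = 1024) -> List[str]:
    """Argument list mirroring the conversion command above."""
    return [
        "python", "scripts/convert_translategemma_android.py",
        "--model-dir", MODEL_DIR,
        "--tflite-dir", f"./tflite_output/{quant}",
        "--output-dir", OUT_DIR,
        "--task-file", f"{OUT_DIR}/translategemma-4b-it-{quant}.task",
        "--quantize", quant,
        "--prefill-seq-len", str(seq_len),
        "--kv-cache-max-len", str(seq_len),
        "--allow-no-token",
    ]


def find_tflite(quant: str) -> str:
    """Expand ./tflite_output/<quant>/*.tflite and require exactly one match."""
    matches = sorted(glob.glob(f"./tflite_output/{quant}/*.tflite"))
    if len(matches) != 1:
        raise FileNotFoundError(f"expected one .tflite file, found {len(matches)}")
    return matches[0]


def bundle_cmd(quant: str, tflite_path: str) -> List[str]:
    """Argument list mirroring the bundling command above."""
    return [
        "python", "scripts/bundle_litertlm.py",
        "--tflite", tflite_path,
        "--tokenizer", f"{MODEL_DIR}/tokenizer.model",
        "--output", f"{OUT_DIR}/translategemma-4b-it-{quant}-generic.litertlm",
        "--quant", quant,
    ]


if __name__ == "__main__":
    print(" ".join(convert_cmd("dynamic_int8")))
```

Each list can be passed to subprocess.run(cmd, check=True), calling find_tflite only after the conversion step has finished.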
Supported Languages
TranslateGemma supports 55 languages including Arabic, Chinese, French, German, Hebrew, Hindi, Japanese, Korean, Portuguese, Russian, Spanish, and more. See google/translategemma-4b-it for the full list.
License
Model weights: Google Gemma Terms of Use
Conversion scripts: Apache 2.0