Mana Persian Piper (fa-IR)
This repository hosts a Persian (fa-IR) Piper TTS model trained for low-latency, high-quality speech synthesis.
The model is a medium-sized Piper checkpoint, fine-tuned on the Mana-TTS dataset to produce natural and intelligible Persian speech while remaining suitable for real-time and on-device inference.
Model Description
- Architecture: Piper (medium)
- Language: Persian (fa-IR)
- Base Checkpoint: https://huggingface.co/SadeghK/persian-text-to-speech/tree/main/farsi/amir
- Fine-tuning: ~1000 epochs on Mana-TTS
- Training Dataset: https://huggingface.co/datasets/MahtaFetrat/Mana-TTS
This model was trained as part of a broader effort to build efficient Persian TTS systems that integrate well with lightweight and context-aware phonemization pipelines.
Inference
Install Piper
pip install piper-tts
Download the Model
git clone https://huggingface.co/MahtaFetrat/Mana-Persian-Piper
Run Inference (Python)
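import wave
from piper import PiperVoice

# Load the ONNX model; adjust the path to wherever the repository was cloned.
voice = PiperVoice.load("/content/Mana-Persian-Piper/fa_IR-mana-medium.onnx")

# Synthesize a Persian sentence ("Hello, everyone!") and write it to a WAV file.
with wave.open("test.wav", "wb") as wav_file:
    voice.synthesize_wav("سلام به همگی!", wav_file)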
This will generate a test.wav file containing synthesized Persian speech.
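To confirm that the output file is a valid WAV, its header can be read back with Python's standard `wave` module. The following is a minimal sketch; the helper name `describe_wav` is hypothetical and not part of Piper.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file as a dict."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "sample_rate_hz": rate,
            # Duration is the frame count divided by frames per second.
            "duration_s": frames / rate if rate else 0.0,
        }
```

Calling `describe_wav("test.wav")` after the inference step should report the channel count, sample width, sample rate, and duration of the synthesized audio.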
Model Files
- fa_IR-mana-medium.onnx: Piper acoustic model
- fa_IR-mana-medium.onnx.json: Model configuration and metadata
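The configuration file is plain JSON, so it can be inspected without loading the model. The sketch below assumes the key layout commonly found in Piper voice configs (`audio.sample_rate`, `num_speakers`, `inference.length_scale`, `inference.noise_scale`); this file's contents have not been checked here, so missing keys are returned as `None`. The function name `summarize_piper_config` is hypothetical.

```python
import json


def summarize_piper_config(config_path):
    """Read a Piper-style .onnx.json file and return a few common fields.

    The key names are assumptions based on typical Piper voice configs;
    any key that is absent yields None instead of raising.
    """
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    audio = config.get("audio", {})
    inference = config.get("inference", {})
    return {
        "sample_rate": audio.get("sample_rate"),
        "num_speakers": config.get("num_speakers"),
        "length_scale": inference.get("length_scale"),
        "noise_scale": inference.get("noise_scale"),
    }
```

For example, `summarize_piper_config("Mana-Persian-Piper/fa_IR-mana-medium.onnx.json")` would show the sample rate to expect in the synthesized WAV, provided the file follows that layout.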
Recommended Usage
This model is best used in conjunction with context-aware phonemization, as proposed in the paper:
Beyond Unified Models: A Service-Oriented Approach to Low-Latency, Context-Aware Phonemization for Real-Time TTS
In particular, combining this Piper model with:
- Lightweight G2P
- Ezafe-aware context disambiguation
results in improved pronunciation accuracy while preserving real-time performance.
The full system implementation is available in the companion repository associated with the paper.
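One common way to check whether a pipeline preserves real-time performance is the real-time factor (RTF): wall-clock synthesis time divided by the duration of the audio produced, where values below 1 mean synthesis runs faster than playback. The sketch below is illustrative and not taken from the paper; `real_time_factor` and the `synthesize` callable are hypothetical, and `synthesize` is assumed to return the generated audio duration in seconds.

```python
import time


def real_time_factor(synthesize, text):
    """Return elapsed wall-clock time divided by generated audio duration.

    `synthesize` is a hypothetical callable that runs the full pipeline
    (phonemization plus TTS) and returns the audio duration in seconds.
    """
    start = time.perf_counter()
    audio_seconds = synthesize(text)
    elapsed = time.perf_counter() - start
    if audio_seconds <= 0:
        raise ValueError("synthesize must report a positive audio duration")
    return elapsed / audio_seconds
```

Wrapping the phonemization and synthesis steps in a single callable makes it possible to compare the RTF with and without the context-aware components on the same text.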
Citation
If you use this model in your research or applications, please cite the following paper:
@misc{fetrat2025servicetts,
title={Beyond Unified Models: A Service-Oriented Approach to Low Latency, Context Aware Phonemization for Real Time TTS},
author={Mahta Fetrat and Donya Navabi and Zahra Dehghanian and Morteza Abolghasemi and Hamid R. Rabiee},
year={2025},
eprint={2512.08006},
archivePrefix={arXiv},
primaryClass={cs.SD},
url={https://arxiv.org/abs/2512.08006},
}