FuguMT
This is an English-to-Japanese translation model using Marian-NMT. For more details, please see my repository.
- source language: en
- target language: ja
How to use
This model uses transformers and sentencepiece.
!pip install transformers sentencepiece
You can use this model directly with a pipeline:
from transformers import pipeline
fugu_translator = pipeline('translation', model='staka/fugumt-en-ja')
fugu_translator('This is a cat.')
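Where the pipeline helper is not available, for example in a transformers release that no longer exposes the "translation" task, the model can be loaded directly instead. The following is a minimal sketch rather than the author's own method: the helper names `load_model` and `translate` and the `max_new_tokens` default are illustrative assumptions, and PyTorch is assumed to be installed so that `return_tensors="pt"` works.

```python
def load_model(name="staka/fugumt-en-ja"):
    """Load the tokenizer and seq2seq model (downloads weights on first use)."""
    # Imported lazily so that translate() can be used or tested on its own.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSeq2SeqLM.from_pretrained(name)
    return tokenizer, model


def translate(sentences, tokenizer, model, max_new_tokens=128):
    """Translate a list of English sentences and return a list of strings."""
    # Tokenize the whole list as one padded batch of PyTorch tensors.
    inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    # Generate target-language token ids; max_new_tokens is an assumed cap.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Convert ids back to text, dropping padding and end-of-sequence tokens.
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

A call would then look like `tokenizer, model = load_model()` followed by `translate(["This is a cat."], tokenizer, model)`.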
To translate text containing multiple sentences, we recommend splitting it into sentences first with pySBD.
!pip install transformers sentencepiece pysbd
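import pysbd
from transformers import pipeline

# Split English text into sentences; clean=False leaves the text unmodified.
seg_en = pysbd.Segmenter(language="en", clean=False)
fugu_translator = pipeline('translation', model='staka/fugumt-en-ja')

txt = 'This is a cat. It is very cute.'
# The pipeline accepts a list of strings and translates each sentence separately.
print(fugu_translator(seg_en.segment(txt)))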
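The pipeline returns one result per input sentence, each a dictionary with a `translation_text` key, so the translated sentences still have to be joined back into a single string. A possible sketch follows; `translate_text` and `build_default` are hypothetical helper names, and joining without a separator is an assumption based on Japanese not using spaces between sentences.

```python
def translate_text(text, segment, translate_batch):
    """Segment text, translate each sentence, and join the results.

    segment:         callable mapping a string to a list of sentences
    translate_batch: callable mapping a list of sentences to a list of
                     dicts with a "translation_text" key
    """
    # Drop the trailing whitespace pySBD keeps when clean=False, and skip empties.
    sentences = [s.strip() for s in segment(text) if s.strip()]
    if not sentences:
        return ""
    results = translate_batch(sentences)
    # Assumed design choice: concatenate with no separator.
    return "".join(r["translation_text"] for r in results)


def build_default(model_name="staka/fugumt-en-ja"):
    """Return (segment, translate_batch) backed by pySBD and the pipeline."""
    # Imported lazily so translate_text() has no hard dependencies.
    import pysbd
    from transformers import pipeline

    seg_en = pysbd.Segmenter(language="en", clean=False)
    translator = pipeline("translation", model=model_name)
    return seg_en.segment, translator
```

With both libraries installed, `segment, translator = build_default()` followed by `translate_text('This is a cat. It is very cute.', segment, translator)` would return the joined Japanese text.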
Eval results
Evaluation on 500 randomly selected sentences from Tatoeba gives the following results:
| source | target | BLEU(*1) |
|---|---|---|
| en | ja | 32.7 |
(*1) Computed with sacrebleu --tokenize ja-mecab
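A comparable evaluation could be sketched with the sacrebleu Python API, as below. This is not the author's evaluation script: the sampling procedure, the `seed` value, and the helper names `sample_pairs` and `bleu_ja` are assumptions, so the score in the table should not be expected to reproduce exactly. The `ja-mecab` tokenizer requires the optional Japanese dependencies (`pip install "sacrebleu[ja]"`).

```python
import random


def sample_pairs(pairs, k=500, seed=0):
    """Pick up to k (source, reference) pairs at random, reproducibly."""
    pairs = list(pairs)
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    return rng.sample(pairs, min(k, len(pairs)))


def bleu_ja(hypotheses, references):
    """Corpus-level BLEU with MeCab tokenization, one reference per sentence."""
    # Imported lazily so sample_pairs() works without sacrebleu installed.
    import sacrebleu

    # corpus_bleu expects a list of reference streams, hence [references].
    return sacrebleu.corpus_bleu(hypotheses, [references], tokenize="ja-mecab").score
```

Given a list of (English, Japanese) pairs, one would sample with `sample_pairs`, translate the English side with the model, and pass the outputs and the Japanese references to `bleu_ja`.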