CAT-Translate 🐱


Tiny Language Model For Japanese and English Bidirectional Translation

  • Purrs on your lap 🐱: Small and efficient! 0.8-7B models that run on edge devices.
  • Swift and Feline Sharp 🐾: Beats TranslateGemma-12B on text-to-text translation quality.
  • Adopt and adapt 🐈: Open source (MIT License) models you can customize and extend.

Models

All models are available on Hugging Face:

Evaluation

We conducted evaluation on the translation subsets of the following benchmarks:

We chose these tasks as benchmarks because (1) they are derived from real-world applications and (2) they are less over-optimized than popular datasets (e.g., WMT). The results are shown below. The CAT-Translate models achieved the best scores among all evaluated models (including closed-source ones) within their respective size classes, for both En-Ja and Ja-En translation.

| Model | Avg. BLEU | Avg. BLEU (Ja→En) | Avg. BLEU (En→Ja) | BSD (Ja-En) | Court (Ja-En) | JMed (Ja-En) | PFMT (Ja-En) | wat-pat-2025 (Ja-En) | BSD (En-Ja) | JMed (En-Ja) | PFMT (En-Ja) | wat-pat-2025 (En-Ja) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CyberAgent/CAT-Translate-7B | 37.68 | 41.06 | 34.31 | 33.75 | 45.29 | 30.65 | 49.86 | 45.74 | 16.29 | 29.62 | 52.94 | 38.37 |
| CyberAgent/CAT-Translate-3.3B | 36.16 | 37.51 | 34.80 | 26.51 | 42.44 | 24.47 | 49.93 | 44.23 | 17.21 | 28.67 | 53.88 | 39.44 |
| CyberAgent/CAT-Translate-1.4B | 33.73 | 33.26 | 34.19 | 31.28 | 43.84 | 24.08 | 36.55 | 30.57 | 15.71 | 26.92 | 51.53 | 42.58 |
| Unbabel/Tower-Plus-9B | 32.41 | 36.84 | 27.99 | 15.43 | 40.54 | 29.13 | 58.00 | 41.10 | 10.00 | 18.80 | 53.00 | 30.16 |
| google/translategemma-12b-it | 32.24 | 35.81 | 28.68 | 31.58 | 34.30 | 23.46 | 48.75 | 40.97 | 15.92 | 21.79 | 52.53 | 24.47 |
| CyberAgent/CAT-Translate-3.3B-beta | 30.60 | 30.32 | 30.88 | 17.20 | 38.65 | 23.96 | 40.58 | 31.22 | 16.63 | 26.68 | 53.40 | 26.80 |
| CyberAgent/CAT-Translate-0.8B | 30.42 | 29.71 | 30.68 | 29.63 | 33.19 | 22.96 | 32.51 | 30.56 | 14.60 | 26.22 | 50.62 | 32.87 |
| google/translategemma-4b-it | 28.09 | 29.41 | 26.76 | 28.86 | 25.89 | 21.50 | 42.65 | 28.16 | 14.14 | 20.68 | 51.99 | 20.23 |
| LiquidAI/LFM2.5-1.2B-JP | 25.47 | 24.51 | 26.43 | 19.06 | 29.99 | 22.10 | 43.61 | 7.80 | 14.57 | 23.85 | 54.77 | 12.54 |
| pfnet/plamo-2-translate | 25.24 | 25.92 | 24.57 | 25.55 | 28.63 | 22.90 | 29.02 | 23.48 | 17.35 | 24.98 | 32.04 | 23.89 |
| LiquidAI/LFM2-350M-ENJP-MT | 24.95 | 24.91 | 25.00 | 10.94 | 29.56 | 21.48 | 41.40 | 21.17 | 8.11 | 22.84 | 47.53 | 21.52 |
| mistralai/Ministral-8B-Instruct-2410 | 24.12 | 27.52 | 20.71 | 19.23 | 29.21 | 16.25 | 50.23 | 22.69 | 12.91 | 16.49 | 41.66 | 11.80 |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2-Japanese | 22.97 | 22.77 | 23.18 | 9.62 | 34.98 | 18.01 | 38.44 | 12.81 | 10.62 | 20.41 | 42.55 | 19.13 |
| Rakuten/RakutenAI-2.0-mini-instruct | 18.43 | 17.24 | 19.62 | 0.11 | 30.62 | 18.21 | 29.34 | 7.90 | 5.19 | 20.36 | 45.70 | 7.23 |
| SakanaAI/TinySwallow-1.5B-Instruct | 15.74 | 14.99 | 16.49 | 4.96 | 18.93 | 15.83 | 26.67 | 8.58 | 6.30 | 17.58 | 34.07 | 8.00 |
| llm-jp/llm-jp-3.1-1.8b-instruct4 | 15.18 | 16.26 | 14.11 | 18.82 | 2.44 | 15.67 | 30.65 | 13.72 | 15.38 | 4.91 | 25.47 | 10.65 |
| tencent/HY-MT1.5-1.8B | 14.49 | 8.95 | 20.04 | 5.50 | 4.59 | 4.00 | 15.67 | 14.98 | 6.33 | 18.13 | 37.75 | 17.96 |
| shisa-ai/shisa-v2.1-llama3.2-3b | 14.27 | 14.26 | 14.28 | 17.08 | 3.70 | 8.26 | 26.86 | 15.42 | 13.18 | 5.54 | 25.97 | 12.41 |
| google/gemma-2-2b-jpn-it | 14.15 | 16.98 | 11.32 | 20.04 | 8.08 | 11.27 | 31.49 | 14.01 | 12.37 | 4.48 | 16.24 | 12.21 |
| shisa-ai/shisa-v2.1-lfm2-1.2b | 13.08 | 14.02 | 12.14 | 20.93 | 4.95 | 7.68 | 26.72 | 9.80 | 12.11 | 5.54 | 17.60 | 13.30 |
| microsoft/phi-4 | 11.92 | 13.48 | 10.36 | 6.10 | 18.66 | 2.81 | 24.86 | 14.98 | 3.24 | 6.97 | 14.36 | 16.87 |
| tencent/HY-MT1.5-7B | 10.56 | 13.46 | 7.67 | 4.99 | 12.32 | 5.72 | 29.53 | 14.76 | 0.82 | 7.80 | 14.30 | 7.74 |
| tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.5 | 10.35 | 12.42 | 8.28 | 24.25 | 2.30 | 3.69 | 14.11 | 17.74 | 6.82 | 2.37 | 11.21 | 12.71 |
| Qwen/Qwen2.5-14B-Instruct | 8.39 | 9.88 | 6.89 | 10.81 | 4.70 | 4.27 | 11.18 | 18.46 | 4.01 | 3.69 | 13.42 | 6.42 |
| meta-llama/Llama-3.2-3B-Instruct | 6.06 | 9.90 | 2.23 | 18.60 | 0.41 | 2.72 | 16.62 | 11.17 | 1.44 | 1.10 | 4.50 | 1.87 |
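As a sanity check on how the table is aggregated: each direction average is the mean of that direction's per-benchmark BLEU scores, and the overall average is the mean of the two direction averages. A minimal sketch using the CAT-Translate-7B row (scores copied from the table above):

```python
# Per-benchmark BLEU scores for CyberAgent/CAT-Translate-7B, from the table above.
ja_en = [33.75, 45.29, 30.65, 49.86, 45.74]  # BSD, Court, JMed, PFMT, wat-pat-2025
en_ja = [16.29, 29.62, 52.94, 38.37]         # BSD, JMed, PFMT, wat-pat-2025

avg_ja_en = sum(ja_en) / len(ja_en)   # per-direction mean
avg_en_ja = sum(en_ja) / len(en_ja)
overall = (avg_ja_en + avg_en_ja) / 2  # mean of the two direction averages

print(f"{avg_ja_en:.2f} {avg_en_ja:.2f} {overall:.2f}")
```

Note that the overall average weights the two directions equally rather than pooling all nine benchmarks.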

A detailed experimental evaluation will be presented in a technical report.

Usage

The model supports English to Japanese and Japanese to English translation with the following prompt format:

```python
from transformers import pipeline

# Load the model
chat_pipeline = pipeline("text-generation", model="CyberAgent/CAT-Translate-7b")

# Define the prompt template
prompt = "Translate the following {src_lang} text into {tgt_lang}.\n\n{src_text}"

# Example: Japanese to English
src_lang = "Japanese"
tgt_lang = "English"
src_text = "🐈はとてもかわいいの。おててがまるくてふわふわなの。"

user_input = [{"role": "user", "content": prompt.format(src_lang=src_lang, tgt_lang=tgt_lang, src_text=src_text)}]

response = chat_pipeline(user_input)

print("-" * 20)
print("Source Text:")
print(src_text)
print("Translation:")
print(response[0]["generated_text"][-1]["content"])
```

Important: You need to apply the chat template for the model to run correctly. Note that the chat template of the 7B model differs from that of the other CAT-Translate models.
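For the reverse direction, only the language names in the prompt template change. A minimal sketch of the prompt construction for English-to-Japanese (no model download required; the source sentence below is our own illustrative example):

```python
# Same prompt template as the usage example, filled for English -> Japanese.
prompt = "Translate the following {src_lang} text into {tgt_lang}.\n\n{src_text}"

message = prompt.format(
    src_lang="English",
    tgt_lang="Japanese",
    src_text="The cat is very cute.",  # illustrative source sentence
)
# Chat-format request; pass this to the pipeline exactly as in the example above.
user_input = [{"role": "user", "content": message}]

print(message)
```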

Why Use Instructions?

Although the model is specialized for machine translation, an instruction prompt is required to invoke its translation capability. This design choice improves customizability: it makes the model easier to extend and merge. Since the model is open source, any extensions are welcome!
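Because translation is triggered purely by the instruction text, requests are easy to wrap in a small helper whose instruction can later be swapped or extended. A hypothetical sketch (the helper name and defaults are ours, not part of the released code):

```python
def build_translation_request(src_text, src_lang="Japanese", tgt_lang="English"):
    """Build a chat-format request using the model card's translation instruction.

    Hypothetical helper for illustration; not part of the released model code.
    """
    instruction = f"Translate the following {src_lang} text into {tgt_lang}.\n\n{src_text}"
    return [{"role": "user", "content": instruction}]

# The returned list can be passed directly to the text-generation pipeline.
request = build_translation_request("こんにちは、世界。")
print(request[0]["content"])
```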

Training

This 7B model is based on CyberAgent's in-house model, developed by Ryosuke Ishigami.

Our training process involved:

License

The model is licensed under the MIT License.

Citation

```bibtex
@misc{cat-translate-2026,
  title={CAT-Translate: Tiny Language Model For Japanese and English Bidirectional Translation},
  author={Yuu Jinnai},
  year={2026},
  url={https://huggingface.co/collections/cyberagent/cat-translate}
}
```

Acknowledgments

This project stands on the shoulders of giants. In particular, the following resources significantly helped us develop the model:
