Llama-3.1-8B Fine-tuned for Russian-Kazakh Translation

This model is a fine-tuned version of Meta's Llama-3.1-8B, optimized for bidirectional translation between Russian and Kazakh. It performs well in both directions and is particularly strong in Russian-to-Kazakh translation, where it outperforms the open-source baselines evaluated below.

Model Details

  • Base Model: unsloth/Meta-Llama-3.1-8B-bnb-4bit
  • Training Duration: 166 hours
  • Hardware: 1x NVIDIA A100 SXM GPU
  • Training Framework: Unsloth library for efficient fine-tuning (see the loading sketch after this list)
  • Model Size: 8B parameters
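
The training script itself is not published. Purely as an illustration, the sketch below shows how a 4-bit base checkpoint of this kind is typically loaded and wrapped with LoRA adapters in Unsloth; every hyperparameter here (sequence length, r, alpha, target modules) is a placeholder, not the setting used for this model.

```python
from unsloth import FastLanguageModel

# Load the 4-bit quantized base checkpoint listed above.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-bnb-4bit",
    max_seq_length=2048,  # placeholder; the actual training length is not published
    load_in_4bit=True,
)

# Attach LoRA adapters for parameter-efficient fine-tuning.
# r / lora_alpha / target_modules are common defaults, not the authors' values.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    use_gradient_checkpointing=True,
)
```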

Training Data

Data Sources and Distribution

| Source | Samples | Percentage |
|---|---:|---:|
| nu | 848,509 | 55.38% |
| kazparc (kk-ru) | 290,785 | 18.98% |
| kazparc (kk-en) | 290,875 | 18.99% |
| kaznu | 78,920 | 5.15% |
| News-Commentary | 9,075 | 0.59% |
| TED2020 | 6,887 | 0.45% |
| QED | 4,664 | 0.30% |
| Tatoeba | 2,301 | 0.15% |
| Total | 1,532,016 | 100% |
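
The exact instruction format used to serialize these parallel pairs for training is not documented. Purely as a hypothetical illustration, a mixture like the one above is usually rendered into single training strings along these lines (the PROMPT template and to_example helper are invented for this sketch):

```python
from datasets import Dataset

# Hypothetical prompt template -- the card does not publish the actual
# instruction format used during fine-tuning.
PROMPT = (
    "Translate the following text from {src_lang} to {tgt_lang}.\n\n"
    "{src_lang}: {src}\n{tgt_lang}: {tgt}"
)

def to_example(src, tgt, src_lang, tgt_lang):
    # Render one parallel pair as a single training string.
    return {"text": PROMPT.format(src_lang=src_lang, tgt_lang=tgt_lang,
                                  src=src, tgt=tgt)}

# Two toy pairs, one per translation direction.
pairs = [
    ("Привет, как дела?", "Сәлем, қалайсың?", "Russian", "Kazakh"),
    ("Сәлем, қалайсың?", "Привет, как дела?", "Kazakh", "Russian"),
]
train = Dataset.from_list([to_example(*p) for p in pairs])
print(train[0]["text"])
```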

Evaluation Results – Russian to Kazakh (ru-kk)

MT Metrics (100 samples)

| Model | Type | BLEU | COMET |
|---|---|---:|---:|
| PolynomeAI | Open-source | 13.31 | 0.85 |
| issai/LLama-3.1-KazLLM-1.0-8B | Open-source | 4.51 | 0.75 |
| meta/LLama-3.1-1.0-8B | Open-source | 4.66 | 0.65 |
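
The card does not state which BLEU implementation or COMET checkpoint produced these numbers. A minimal sketch using sacrebleu and Unbabel's wmt22-comet-da checkpoint (common defaults, assumed here) shows how comparable scores can be computed; the same procedure applies to the kk-ru results below.

```python
import sacrebleu
from comet import download_model, load_from_checkpoint

srcs = ["Я люблю читать книги."]              # Russian source
hyps = ["Мен кітап оқығанды жақсы көремін."]  # model output (Kazakh)
refs = ["Мен кітап оқуды жақсы көремін."]     # human reference

# Corpus-level BLEU via sacrebleu.
bleu = sacrebleu.corpus_bleu(hyps, [refs])
print(f"BLEU: {bleu.score:.2f}")

# COMET; checkpoint choice is an assumption, not stated in the card.
ckpt = download_model("Unbabel/wmt22-comet-da")
comet = load_from_checkpoint(ckpt)
out = comet.predict(
    [{"src": s, "mt": h, "ref": r} for s, h, r in zip(srcs, hyps, refs)],
    batch_size=8,  # pass gpus=0 to run on CPU
)
print(f"COMET: {out.system_score:.2f}")
```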

LLM Judge Evaluation (GPT-4o mini)

Comparison with Yandex:

  • Yandex Better: 78.0%
  • PolynomeAI Better: 16.5%
  • Both Good: 2.5%
  • Both Bad: 3.0%
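
The judging prompt and protocol are not published. Below is a minimal sketch of a pairwise GPT-4o mini judge; the prompt wording and the judge() helper are placeholders invented for this sketch, not the setup actually used.

```python
from openai import OpenAI

client = OpenAI()

# Placeholder judging prompt -- the actual prompt is not published.
JUDGE_PROMPT = """You are a translation quality judge.
Source (Russian): {src}
Translation A (Yandex): {a}
Translation B (PolynomeAI): {b}
Answer with exactly one of: "A better", "B better", "both good", "both bad"."""

def judge(src: str, a: str, b: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(src=src, a=a, b=b)}],
        temperature=0,  # deterministic verdicts
    )
    return resp.choices[0].message.content.strip()

print(judge("Я люблю читать книги.",
            "Мен кітап оқуды жақсы көремін.",
            "Мен кітап оқығанды жақсы көремін."))
```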

Evaluation Results – Kazakh to Russian (kk-ru)

MT Metrics (100 samples)

| Model | Type | BLEU | COMET |
|---|---|---:|---:|
| PolynomeAI | Open-source | 28.72 | 0.91 |
| issai/LLama-3.1-KazLLM-1.0-8B | Open-source | 28.06 | 0.91 |
| meta/LLama-3.1-1.0-8B | Open-source | 16.64 | 0.87 |

We do not include deepvk/kazRush-ru-kk in this evaluation because it was trained specifically for the ru-kk direction.

LLM Judge Evaluation (GPT-4o mini)

Comparison with Yandex:

  • Yandex Better: 52.0%
  • PolynomeAI Better: 41.0%
  • Both Good: 4.5%
  • Both Bad: 2.5%

Usage

To Do
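
Official usage instructions are still to do. Until they are added, here is a minimal inference sketch, assuming the checkpoint loads the same way as its Unsloth 4-bit base and using a hypothetical prompt template (the actual template is not documented):

```python
from unsloth import FastLanguageModel

# Load the fine-tuned checkpoint; 4-bit loading mirrors the base model.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="PolynomeAI/Llama-3.1-8B-kkru",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast generation path

# Hypothetical prompt format -- adjust once the official template is published.
prompt = ("Translate the following text from Russian to Kazakh.\n\n"
          "Russian: Я люблю читать книги.\nKazakh:")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```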

Limitations and Bias

  • The model's performance has been primarily evaluated on a test set of 100 samples
  • Performance may vary depending on domain and complexity of the input text
  • The model inherits any biases present in the Llama-3.1-8B base model and training data
  • The training data is heavily skewed towards the 'nu' source (55.4%, per the distribution table above)
  • Most of the training data (97.1%) falls in the moderate quality range based on COMET scores (0.4-0.6); see the scoring sketch below
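
The COMET-based quality figure above suggests the training pairs were scored without references. A minimal sketch of such reference-free scoring, assuming a quality-estimation checkpoint like Unbabel/wmt22-cometkiwi-da (an assumption, not confirmed by the card; this checkpoint is gated on the Hub):

```python
from comet import download_model, load_from_checkpoint

# Reference-free quality estimation; the checkpoint choice is an assumption,
# the card does not say how the training pairs were actually scored.
ckpt = download_model("Unbabel/wmt22-cometkiwi-da")
qe = load_from_checkpoint(ckpt)

pairs = [{"src": "Я люблю читать книги.", "mt": "Мен кітап оқуды жақсы көремін."}]
scores = qe.predict(pairs, batch_size=8).scores  # pass gpus=0 to run on CPU

# Keep only pairs at or above the moderate-quality band mentioned above.
kept = [p for p, s in zip(pairs, scores) if s >= 0.4]
print(len(kept), "of", len(pairs), "pairs kept")
```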