---
pipeline_tag: translation
library_name: comet
language:
- multilingual
- af
- am
- ar
- as
- az
- be
- bg
- bn
- br
- bs
- ca
- cs
- cy
- da
- de
- el
- en
- eo
- es
- et
- eu
- fa
- fi
- fr
- fy
- ga
- gd
- gl
- gu
- ha
- he
- hi
- hr
- hu
- hy
- id
- is
- it
- ja
- jv
- ka
- kk
- km
- kn
- ko
- ku
- ky
- la
- lo
- lt
- lv
- mg
- mk
- ml
- mn
- mr
- ms
- my
- ne
- nl
- 'no'
- om
- or
- pa
- pl
- ps
- pt
- ro
- ru
- sa
- sd
- si
- sk
- sl
- so
- sq
- sr
- su
- sv
- sw
- ta
- te
- th
- tl
- tr
- ug
- uk
- ur
- uz
- vi
- xh
- yi
- zh
license: apache-2.0
base_model:
- FacebookAI/xlm-roberta-large
---

# COMET-poly-base-wmt25

This model is based on [COMET-poly](https://github.com/zouharvi/COMET-poly), a fork of Unbabel's original COMET that is not compatible with it.
To run the model, first install this version of COMET, either directly from GitHub:
```bash
pip install "git+https://github.com/zouharvi/COMET-poly#egg=comet-poly&subdirectory=comet_poly"
```
or from a local clone in editable mode:
```bash
git clone https://github.com/zouharvi/COMET-poly.git
cd COMET-poly
pip3 install -e comet_poly
```
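
Before downloading the checkpoint, it can help to confirm that the package is importable in the current environment. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of COMET-poly, and `comet_poly` is assumed to be the import name because that is what the usage example in this README imports.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# "comet_poly" is the import name used in this README's usage example.
if is_installed("comet_poly"):
    print("comet_poly is installed")
else:
    print("comet_poly not found; rerun one of the install commands above")
```

If the check fails after an editable install, the active Python environment is likely not the one that `pip3` installed into.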

This model scores the translation `mt` given its source `src`. It is the baseline model against which other COMET-poly models are compared.
```python
import comet_poly

# Download the checkpoint and load it.
model = comet_poly.load_from_checkpoint(comet_poly.download_model("zouharvi/COMET-poly-base-wmt25"))

# Each item pairs a source segment ("src") with the machine translation ("mt") to score.
data = [
    {
        "src": "Iceberg lettuce got its name in the 1920s when it was shipped packed in ice to stay fresh.",
        "mt": "Eisbergsalat erhielt seinen Namen in den 1920er-Jahren, als er in Eis verpackt verschickt wurde, um frisch zu bleiben.",
    },
    {
        "src": "Goats have rectangular pupils, which give them a wide field of vision—up to 320 degrees!",
        "mt": "Kozy mají obdélníkové zornice, což jim umožňuje vidět skoro všude kolem sebe, aniž by musely otáčet hlavou.",
    },
    {
        "src": "This helps them spot predators from almost all directions without moving their heads.",
        "mt": "Điều này giúp chúng phát hiện kẻ săn mồi từ gần như mọi hướng mà không cần quay đầu.",
    },
]

# gpus=1 runs inference on a single GPU; one score is returned per item in `data`.
print("scores", model.predict(data, batch_size=8, gpus=1).scores)
```
Outputs:
```
scores [94.98790740966797, 77.56731414794922, 90.77655029296875]
```
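
The scores are per segment. This model card does not prescribe any aggregation, but a common convention for learned MT metrics is to average segment scores into a system-level score and to inspect the lowest-scoring segments. A minimal sketch, with the scores above hard-coded so that it runs without the model; the `labels` and the `summarize` helper are hypothetical and only serve the illustration:

```python
from statistics import mean

# Segment-level scores copied from the output above.
scores = [94.98790740966797, 77.56731414794922, 90.77655029296875]
# Hypothetical labels for the three example segments (not produced by the model).
labels = ["en-de lettuce", "en-cs goats", "en-vi predators"]


def summarize(scores, labels):
    """Return the mean score and the (label, score) pair with the lowest score."""
    system_score = mean(scores)
    worst_label, worst_score = min(zip(labels, scores), key=lambda pair: pair[1])
    return system_score, worst_label, worst_score


system_score, worst_label, worst_score = summarize(scores, labels)
print(f"system-level score: {system_score:.2f}")
print(f"lowest-scoring segment: {worst_label} ({worst_score:.2f})")
```

In a real evaluation the average would be taken over the segments of one system and one language pair, rather than over mixed language pairs as in this toy example.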

The model was trained on WMT data up to and including 2024, with DA, ESA, and MQM annotations merged onto a single scale.

This model is based on the work [TODO](TODO), which can be cited as:
```
@misc{zuefle2025comet,
  title={COMET-poly: Machine Translation Metric Grounded in Other Candidates},
  author={Maike Züfle and Vilém Zouhar and Tu Anh Dinh and Felipe Polo and Jan Niehues and Mrinmaya Sachan},
  year={2025},
}
```