---
pipeline_tag: translation
library_name: comet
language:
- multilingual
- af
- am
- ar
- as
- az
- be
- bg
- bn
- br
- bs
- ca
- cs
- cy
- da
- de
- el
- en
- eo
- es
- et
- eu
- fa
- fi
- fr
- fy
- ga
- gd
- gl
- gu
- ha
- he
- hi
- hr
- hu
- hy
- id
- is
- it
- ja
- jv
- ka
- kk
- km
- kn
- ko
- ku
- ky
- la
- lo
- lt
- lv
- mg
- mk
- ml
- mn
- mr
- ms
- my
- ne
- nl
- 'no'
- om
- or
- pa
- pl
- ps
- pt
- ro
- ru
- sa
- sd
- si
- sk
- sl
- so
- sq
- sr
- su
- sv
- sw
- ta
- te
- th
- tl
- tr
- ug
- uk
- ur
- uz
- vi
- xh
- yi
- zh
license: apache-2.0
base_model:
- FacebookAI/xlm-roberta-large
---

# COMET-poly-base-wmt25

This model is based on [COMET-poly](https://github.com/zouharvi/COMET-poly), a fork that is not compatible with the original Unbabel COMET. To run the model, you first need to install this version of COMET, either with:

```bash
pip install "git+https://github.com/zouharvi/COMET-poly#egg=comet-poly&subdirectory=comet_poly"
```

or in editable mode:

```bash
git clone https://github.com/zouharvi/COMET-poly.git
cd COMET-poly
pip3 install -e comet_poly
```

This model scores a translation `mt` given its source `src`. It serves as the baseline against which the other COMET-poly models are compared.

```python
import comet_poly

model = comet_poly.load_from_checkpoint(comet_poly.download_model("zouharvi/COMET-poly-base-wmt25"))
data = [
    {
        "src": "Iceberg lettuce got its name in the 1920s when it was shipped packed in ice to stay fresh.",
        "mt": "Eisbergsalat erhielt seinen Namen in den 1920er-Jahren, als er in Eis verpackt verschickt wurde, um frisch zu bleiben.",
    },
    {
        "src": "Goats have rectangular pupils, which give them a wide field of vision—up to 320 degrees!",
        "mt": "Kozy mají obdélníkové zornice, což jim umožňuje vidět skoro všude kolem sebe, aniž by musely otáčet hlavou.",
    },
    {
        "src": "This helps them spot predators from almost all directions without moving their heads.",
        "mt": "Điều này giúp chúng phát hiện kẻ săn mồi từ gần như mọi hướng mà không cần quay đầu.",
    },
]
print("scores", model.predict(data, batch_size=8, gpus=1).scores)
```

Outputs:
```
scores [94.98790740966797, 77.56731414794922, 90.77655029296875]
```

The training data is WMT up to 2024 (inclusive), with DA/ESA/MQM scores merged onto a single scale.

This model is based on the work [TODO](TODO), which can be cited as:
```
@misc{zuefle2025comet,
  title={COMET-poly: Machine Translation Metric Grounded in Other Candidates},
  author={Maike Züfle and Vilém Zouhar and Tu Anh Dinh and Felipe Polo and Jan Niehues and Mrinmaya Sachan},
  year={2025},
}
```
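
If no GPU is available, prediction can also run on CPU. The sketch below is a minimal usage note, assuming this fork keeps the upstream Unbabel COMET `predict()` interface, where `gpus=0` selects CPU and the returned object also carries a corpus-level `system_score`:

```python
# Minimal sketch; assumes the fork keeps the upstream Unbabel COMET
# predict() signature (batch_size, gpus) and its Prediction output.
# `model` and `data` are defined as in the example above.
output = model.predict(data, batch_size=8, gpus=0)  # gpus=0 -> run on CPU
print("segment scores:", output.scores)        # one score per input example
print("system score:", output.system_score)    # average over all segments
```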