---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:40000
- loss:MSELoss
base_model: sentence-transformers/paraphrase-multilingual-mpnet-base-v2
widget:
- source_sentence: Who is filming along?
sentences:
- Wién filmt mat?
- Weider huet den Tatarescu drop higewisen, datt Rumänien durch seng krichsbedélegong
op de 6eite vun den allie'erten 110.000 mann verluer hätt.
- Brambilla 130.08.03 St.
- source_sentence: 'Four potential scenarios could still play out: Jean Asselborn.'
sentences:
- Dann ass nach eng Antenne hei um Kierchbierg virgesi Richtung RTL Gebai, do gëtt
jo een ganz neie Wunnquartier gebaut.
- D'bedélegong un de wählen wir ganz stärk gewiéscht a munche ge'genden wor re eso'gucr
me' we' 90 prozent.
- Jean Asselborn gesäit 4 Méiglechkeeten, wéi et kéint virugoen.
- source_sentence: Non-profit organisation Passerell, which provides legal council
to refugees in Luxembourg, announced that it has to make four employees redundant
in August due to a lack of funding.
sentences:
- Oetringen nach Remich....8.20» 215»
- D'ASBL Passerell, déi sech ëm d'Berodung vu Refugiéeën a Saache Rechtsfroe këmmert,
wäert am August mussen hir véier fix Salariéen entloossen.
- D'Regierung huet allerdéngs "just" 180.041 Doudeger verzeechent.
- source_sentence: This regulation was temporarily lifted during the Covid pandemic.
sentences:
- Six Jours vu New-York si fir d’équipe Girgetti — Debacco
- Dës Reegelung gouf wärend der Covid-Pandemie ausgesat.
- ING-Marathon ouni gréisser Tëschefäll ofgelaf - 18 Leit hospitaliséiert.
- source_sentence: The cross-border workers should also receive more wages.
sentences:
- D'grenzarbechetr missten och me' lo'n kre'en.
- 'De Néckel: Firun! Dât ass jo ailes, wèll ''t get dach neischt un der Bréck gemâcht!'
- D'Grande-Duchesse Josephine Charlotte an hir Ministeren hunn d'Land verlooss,
et war den Optakt vun der Zäit am Exil.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- negative_mse
- src2trg_accuracy
- trg2src_accuracy
- mean_accuracy
model-index:
- name: SentenceTransformer based on sentence-transformers/paraphrase-multilingual-mpnet-base-v2
results:
- task:
type: knowledge-distillation
name: Knowledge Distillation
dataset:
name: lb en
type: lb-en
metrics:
- type: negative_mse
value: -0.47610557079315186
name: Negative Mse
- task:
type: translation
name: Translation
dataset:
name: lb en
type: lb-en
metrics:
- type: src2trg_accuracy
value: 0.9861111111111112
name: Src2Trg Accuracy
- type: trg2src_accuracy
value: 0.9861111111111112
name: Trg2Src Accuracy
- type: mean_accuracy
value: 0.9861111111111112
name: Mean Accuracy
---
# SentenceTransformer based on sentence-transformers/paraphrase-multilingual-mpnet-base-v2
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/paraphrase-multilingual-mpnet-base-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-mpnet-base-v2) on the lb-en dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/paraphrase-multilingual-mpnet-base-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-mpnet-base-v2)
- **Maximum Sequence Length:** 128 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
- lb-en
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
(0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
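The Pooling module averages the token embeddings produced by the Transformer, weighted by the attention mask, into a single 768-dimensional sentence vector. A minimal sketch of that step, using the base checkpoint for illustration (the fine-tuned model has the same architecture):
```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative only: reproduces the mean pooling performed by module (1) above
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/paraphrase-multilingual-mpnet-base-v2")
model = AutoModel.from_pretrained("sentence-transformers/paraphrase-multilingual-mpnet-base-v2")

inputs = tokenizer(["Wién filmt mat?"], padding=True, truncation=True, max_length=128, return_tensors="pt")
with torch.no_grad():
    token_embeddings = model(**inputs).last_hidden_state  # (batch, seq_len, 768)

# Mean pooling: sum the unmasked token vectors, divide by the number of real tokens
mask = inputs["attention_mask"].unsqueeze(-1).float()     # (batch, seq_len, 1)
sentence_embedding = (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
print(sentence_embedding.shape)  # torch.Size([1, 768])
```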
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("aloizidis/make-multilingual-en-lb-2025-02-28_01-09-55")
# Run inference
sentences = [
'The cross-border workers should also receive more wages.',
"D'grenzarbechetr missten och me' lo'n kre'en.",
"De Néckel: Firun! Dât ass jo ailes, wèll 't get dach neischt un der Bréck gemâcht!",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
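Because English and Luxembourgish sentences share one embedding space, the model can also be used for cross-lingual retrieval. A minimal sketch, reusing the widget examples above as query and candidates:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("aloizidis/make-multilingual-en-lb-2025-02-28_01-09-55")

# Rank Luxembourgish candidates against an English query
query = "This regulation was temporarily lifted during the Covid pandemic."
candidates = [
    "Dës Reegelung gouf wärend der Covid-Pandemie ausgesat.",
    "ING-Marathon ouni gréisser Tëschefäll ofgelaf - 18 Leit hospitaliséiert.",
]

query_embedding = model.encode(query)
candidate_embeddings = model.encode(candidates)
scores = model.similarity(query_embedding, candidate_embeddings)  # shape [1, 2]
# The matching translation should receive the highest cosine similarity
print(candidates[scores.argmax().item()])
```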
## Evaluation
### Metrics
#### Knowledge Distillation
* Dataset: `lb-en`
* Evaluated with [MSEEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.MSEEvaluator)
| Metric | Value |
|:-----------------|:------------|
| **negative_mse** | **-0.4761** |
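The metric is the mean squared error between the teacher's and the student's embeddings, scaled by 100 and negated so that values closer to 0 are better. A minimal sketch of the computation:
```python
import numpy as np

def negative_mse(teacher_embeddings: np.ndarray, student_embeddings: np.ndarray) -> float:
    # Mean squared error over all embedding dimensions, scaled by 100 and
    # negated; -0.4761 here corresponds to a raw MSE of about 0.0048
    mse = float(((teacher_embeddings - student_embeddings) ** 2).mean())
    return -mse * 100
```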
#### Translation
* Dataset: `lb-en`
* Evaluated with [TranslationEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TranslationEvaluator)
| Metric | Value |
|:------------------|:-----------|
| src2trg_accuracy | 0.9861 |
| trg2src_accuracy | 0.9861 |
| **mean_accuracy** | **0.9861** |
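The evaluator treats the 504 sentence pairs as a retrieval task: for each English sentence, the paired Luxembourgish sentence must be its nearest neighbour by cosine similarity among all candidates (src2trg), and vice versa (trg2src); mean_accuracy averages the two directions. A sketch of one direction:
```python
import numpy as np

def src2trg_accuracy(src_embeddings: np.ndarray, trg_embeddings: np.ndarray) -> float:
    # Normalise rows so the dot product equals cosine similarity
    src = src_embeddings / np.linalg.norm(src_embeddings, axis=1, keepdims=True)
    trg = trg_embeddings / np.linalg.norm(trg_embeddings, axis=1, keepdims=True)
    # Pair i is correct if its nearest target is the paired translation i
    nearest = (src @ trg.T).argmax(axis=1)
    return float((nearest == np.arange(len(src))).mean())
```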
## Training Details
### Training Dataset
#### lb-en
* Dataset: lb-en
* Size: 40,000 training samples
* Columns: `english`, `non_english`, and `label`
* Approximate statistics based on the first 1000 samples:
  |      | english | non_english | label |
  |:-----|:--------|:------------|:------|
  | type | string  | string      | list  |
* Samples:
  | english | non_english | label |
  |:--------|:------------|:------|
  | A lesson for the next year | Eng le’er fir dat anert joer | [0.08891881257295609, 0.20895496010780334, -0.10672671347856522, -0.03302554786205292, 0.049002278596162796, ...] |
  | On Easter, the Maquisards' northern section organizes their big spring ball in Willy Pintsch's hall at the station. | Op O'schteren organisieren d'Maquisard'eiii section Nord, hire gro'sse fre'joersbal am sali Willy Pintsch op der gare. | [-0.08668982982635498, -0.06969941407442093, -0.0036096556577831507, 0.1605304628610611, -0.041704729199409485, ...] |
  | The happiness, the peace is long gone now, | V ergângen ass nu läng dat gléck, de' fréd, | [0.07229219377040863, 0.3288629353046417, -0.012548360042273998, 0.06720984727144241, -0.02617395855486393, ...] |
* Loss: [MSELoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#mseloss)
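The `label` column holds a teacher model's embedding of the English sentence; MSELoss trains the student to reproduce that vector for both the English and the Luxembourgish text. A minimal sketch of how such labels are produced (the teacher checkpoint is an assumption; the card does not state it):
```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MSELoss

# Assumed teacher: the base model itself, as in the standard
# make-multilingual distillation recipe
teacher = SentenceTransformer("sentence-transformers/paraphrase-multilingual-mpnet-base-v2")
student = SentenceTransformer("sentence-transformers/paraphrase-multilingual-mpnet-base-v2")

english = ["A lesson for the next year"]
non_english = ["Eng le’er fir dat anert joer"]

# Teacher embeddings of the English side become the `label` column; during
# training, both the English and the Luxembourgish inputs are pushed towards
# this same target vector
labels = teacher.encode(english)

loss = MSELoss(model=student)  # minimises ||student(text) - label||^2
```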
### Evaluation Dataset
#### lb-en
* Dataset: lb-en
* Size: 504 evaluation samples
* Columns: `english`, `non_english`, and `label`
* Approximate statistics based on the first 504 samples:
  |      | english | non_english | label |
  |:-----|:--------|:------------|:------|
  | type | string  | string      | list  |
* Samples:
  | english | non_english | label |
  |:--------|:------------|:------|
  | But he was not the instigator of the mass murders of the Jews, his lawyer explained, and he bore no more responsibility than the others. | Mé hié wir net den ustêfter vun de massemuerden un de judden, erklärt sein affekot, an hicn hätt net me' verantwortong ze droen we' de' aner. | [0.021159790456295013, 0.11144042760133743, 0.00869293138384819, 0.004551620222628117, -0.09236127883195877, ...] |
  | The Romanian automotive industry * For the first time in its history, Romania has started car production. | D’rumänesch autoindustrie * Fir d'c'schte ke'er an senger geschieht huet Rumänien d'fabrikalio'n vun'den autoen opgeholl. | [-0.16835248470306396, 0.14826826751232147, 0.01772368885576725, -0.027855699881911278, 0.04770198464393616, ...] |
  | The drugs were confiscated along with the dealer's car, mobile phones and cash. | D'Drogen, den Auto, d'Boergeld an d'Handye si saiséiert ginn. | [-0.05122023820877075, 0.01204440463334322, -0.025424882769584656, 0.1286350041627884, 0.034633491188287735, ...] |
* Loss: [MSELoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#mseloss)
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `learning_rate`: 2e-05
- `num_train_epochs`: 5
- `warmup_ratio`: 0.1
- `bf16`: True
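For reproduction, these values map directly onto the trainer's arguments. A minimal sketch (`output_dir` is a placeholder):
```python
from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder
    eval_strategy="steps",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    learning_rate=2e-5,
    num_train_epochs=5,
    warmup_ratio=0.1,
    bf16=True,
)
```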
#### All Hyperparameters