---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:40000
- loss:MSELoss
base_model: sentence-transformers/paraphrase-multilingual-mpnet-base-v2
widget:
- source_sentence: Who is filming along?
  sentences:
  - Wién filmt mat?
  - Weider huet den Tatarescu drop higewisen, datt Rumänien durch seng krichsbedélegong op de 6eite vun den allie'erten 110.000 mann verluer hätt.
  - Brambilla 130.08.03 St.
- source_sentence: 'Four potential scenarios could still play out: Jean Asselborn.'
  sentences:
  - Dann ass nach eng Antenne hei um Kierchbierg virgesi Richtung RTL Gebai, do gëtt jo een ganz neie Wunnquartier gebaut.
  - D'bedélegong un de wählen wir ganz stärk gewiéscht a munche ge'genden wor re eso'gucr me' we' 90 prozent.
  - Jean Asselborn gesäit 4 Méiglechkeeten, wéi et kéint virugoen.
- source_sentence: Non-profit organisation Passerell, which provides legal counsel to refugees in Luxembourg, announced that it has to make four employees redundant in August due to a lack of funding.
  sentences:
  - Oetringen nach Remich....8.20» 215»
  - D'ASBL Passerell, déi sech ëm d'Berodung vu Refugiéeën a Saache Rechtsfroe këmmert, wäert am August mussen hir véier fix Salariéen entloossen.
  - D'Regierung huet allerdéngs "just" 180.041 Doudeger verzeechent.
- source_sentence: This regulation was temporarily lifted during the Covid pandemic.
  sentences:
  - Six Jours vu New-York si fir d’équipe Girgetti — Debacco
  - Dës Reegelung gouf wärend der Covid-Pandemie ausgesat.
  - ING-Marathon ouni gréisser Tëschefäll ofgelaf - 18 Leit hospitaliséiert.
- source_sentence: The cross-border workers should also receive more wages.
  sentences:
  - D'grenzarbechetr missten och me' lo'n kre'en.
  - 'De Néckel: Firun! Dât ass jo ailes, wèll ''t get dach neischt un der Bréck gemâcht!'
  - D'Grande-Duchesse Josephine Charlotte an hir Ministeren hunn d'Land verlooss, et war den Optakt vun der Zäit am Exil.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- negative_mse
- src2trg_accuracy
- trg2src_accuracy
- mean_accuracy
model-index:
- name: SentenceTransformer based on sentence-transformers/paraphrase-multilingual-mpnet-base-v2
  results:
  - task:
      type: knowledge-distillation
      name: Knowledge Distillation
    dataset:
      name: lb en
      type: lb-en
    metrics:
    - type: negative_mse
      value: -0.47610557079315186
      name: Negative MSE
  - task:
      type: translation
      name: Translation
    dataset:
      name: lb en
      type: lb-en
    metrics:
    - type: src2trg_accuracy
      value: 0.9861111111111112
      name: Src2Trg Accuracy
    - type: trg2src_accuracy
      value: 0.9861111111111112
      name: Trg2Src Accuracy
    - type: mean_accuracy
      value: 0.9861111111111112
      name: Mean Accuracy
---

# SentenceTransformer based on sentence-transformers/paraphrase-multilingual-mpnet-base-v2

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/paraphrase-multilingual-mpnet-base-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-mpnet-base-v2) on the lb-en dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/paraphrase-multilingual-mpnet-base-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-mpnet-base-v2)
- **Maximum Sequence Length:** 128 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
  - lb-en

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("aloizidis/make-multilingual-en-lb-2025-02-28_01-09-55")
# Run inference
sentences = [
    'The cross-border workers should also receive more wages.',
    "D'grenzarbechetr missten och me' lo'n kre'en.",
    "De Néckel: Firun! Dât ass jo ailes, wèll 't get dach neischt un der Bréck gemâcht!",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
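Because English and Luxembourgish sentences are mapped into the same vector space, the model can also be used for cross-lingual retrieval: scoring Luxembourgish candidates directly against an English query. A minimal sketch (the candidate sentences are taken from the examples in this card):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("aloizidis/make-multilingual-en-lb-2025-02-28_01-09-55")

query = "This regulation was temporarily lifted during the Covid pandemic."
candidates = [
    "Dës Reegelung gouf wärend der Covid-Pandemie ausgesat.",
    "Jean Asselborn gesäit 4 Méiglechkeeten, wéi et kéint virugoen.",
    "ING-Marathon ouni gréisser Tëschefäll ofgelaf - 18 Leit hospitaliséiert.",
]

# Encode the English query and the Luxembourgish candidates into the shared space
query_emb = model.encode([query])
cand_embs = model.encode(candidates)

# Cosine similarity matrix of shape [1, 3]; the true translation should score highest
scores = model.similarity(query_emb, cand_embs)
best = int(scores.argmax())
print(candidates[best], float(scores[0, best]))
```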
## Evaluation

### Metrics

#### Knowledge Distillation

* Dataset: `lb-en`
* Evaluated with [MSEEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.MSEEvaluator)

| Metric           | Value       |
|:-----------------|:------------|
| **negative_mse** | **-0.4761** |

#### Translation

* Dataset: `lb-en`
* Evaluated with [TranslationEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TranslationEvaluator)

| Metric            | Value      |
|:------------------|:-----------|
| src2trg_accuracy  | 0.9861     |
| trg2src_accuracy  | 0.9861     |
| **mean_accuracy** | **0.9861** |

## Training Details

### Training Dataset

#### lb-en

* Dataset: lb-en
* Size: 40,000 training samples
* Columns: `english`, `non_english`, and `label`
* Approximate statistics based on the first 1000 samples:
  |         | english | non_english | label |
  |:--------|:--------|:------------|:------|
  | type    | string  | string      | list  |
  | details |         |             |       |
* Samples:
  | english | non_english | label |
  |:--------|:------------|:------|
  | A lesson for the next year | Eng le’er fir dat anert joer | [0.08891881257295609, 0.20895496010780334, -0.10672671347856522, -0.03302554786205292, 0.049002278596162796, ...] |
  | On Easter, the Maquisards' northern section organizes their big spring ball in Willy Pintsch's hall at the station. | Op O'schteren organisieren d'Maquisard'eiii section Nord, hire gro'sse fre'joersbal am sali Willy Pintsch op der gare. | [-0.08668982982635498, -0.06969941407442093, -0.0036096556577831507, 0.1605304628610611, -0.041704729199409485, ...] |
  | The happiness, the peace is long gone now, | V ergângen ass nu läng dat gléck, de' fréd, | [0.07229219377040863, 0.3288629353046417, -0.012548360042273998, 0.06720984727144241, -0.02617395855486393, ...] |
* Loss: [MSELoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#mseloss)

### Evaluation Dataset

#### lb-en

* Dataset: lb-en
* Size: 504 evaluation samples
* Columns: `english`, `non_english`, and `label`
* Approximate statistics based on the first 504 samples:
  |         | english | non_english | label |
  |:--------|:--------|:------------|:------|
  | type    | string  | string      | list  |
  | details |         |             |       |
* Samples:
  | english | non_english | label |
  |:--------|:------------|:------|
  | But he was not the instigator of the mass murders of the Jews, his lawyer explained, and he bore no more responsibility than the others. | Mé hié wir net den ustêfter vun de massemuerden un de judden, erklärt sein affekot, an hicn hätt net me' verantwortong ze droen we' de' aner. | [0.021159790456295013, 0.11144042760133743, 0.00869293138384819, 0.004551620222628117, -0.09236127883195877, ...] |
  | The Romanian automotive industry * For the first time in its history, Romania has started car production. | D’rumänesch autoindustrie * Fir d'c'schte ke'er an senger geschieht huet Rumänien d'fabrikalio'n vun'den autoen opgeholl. | [-0.16835248470306396, 0.14826826751232147, 0.01772368885576725, -0.027855699881911278, 0.04770198464393616, ...] |
  | The drugs were confiscated along with the dealer's car, mobile phones and cash. | D'Drogen, den Auto, d'Boergeld an d'Handye si saiséiert ginn. | [-0.05122023820877075, 0.01204440463334322, -0.025424882769584656, 0.1286350041627884, 0.034633491188287735, ...] |
* Loss: [MSELoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#mseloss)
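In each sample the `label` column holds a teacher embedding rather than a class label: MSELoss regresses the student's embeddings of both the English sentence and its Luxembourgish translation onto this vector. A minimal sketch of how such rows can be built, assuming the teacher is the base model itself, as in the standard make-multilingual distillation recipe (the card does not name the teacher explicitly):

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer

# Assumed teacher: the base model itself (not stated in this card)
teacher = SentenceTransformer("sentence-transformers/paraphrase-multilingual-mpnet-base-v2")

english = ["A lesson for the next year"]
non_english = ["Eng le'er fir dat anert joer"]

# The label is the teacher's embedding of the English sentence; during training,
# MSELoss pulls the student's embeddings of BOTH columns towards this vector.
labels = teacher.encode(english)
train_dataset = Dataset.from_dict({
    "english": english,
    "non_english": non_english,
    "label": [vec.tolist() for vec in labels],
})
print(train_dataset)
```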
### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `learning_rate`: 2e-05
- `num_train_epochs`: 5
- `warmup_ratio`: 0.1
- `bf16`: True
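These values map one-to-one onto `SentenceTransformerTrainingArguments`; a minimal sketch of the corresponding configuration (the `output_dir` is illustrative, not the path used for this run):

```python
from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="make-multilingual-en-lb",  # illustrative output path
    eval_strategy="steps",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    learning_rate=2e-5,
    num_train_epochs=5,
    warmup_ratio=0.1,
    bf16=True,
)
```

Together with `MSELoss` from `sentence_transformers.losses` and a dataset shaped like the one above, these arguments would be passed to a `SentenceTransformerTrainer`.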
#### All Hyperparameters

<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 5
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>
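The `lb-en loss`, `lb-en_negative_mse`, and `lb-en_mean_accuracy` columns in the log below are produced by the two evaluators listed under Evaluation. A minimal sketch of re-running them, with two illustrative pairs standing in for the 504 evaluation samples and the teacher again assumed to be the base model:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import MSEEvaluator, TranslationEvaluator

student = SentenceTransformer("aloizidis/make-multilingual-en-lb-2025-02-28_01-09-55")
teacher = SentenceTransformer("sentence-transformers/paraphrase-multilingual-mpnet-base-v2")  # assumed teacher

english = [
    "The cross-border workers should also receive more wages.",
    "This regulation was temporarily lifted during the Covid pandemic.",
]
luxembourgish = [
    "D'grenzarbechetr missten och me' lo'n kre'en.",
    "Dës Reegelung gouf wärend der Covid-Pandemie ausgesat.",
]

# negative_mse: MSE between the teacher's English embeddings and the
# student's Luxembourgish embeddings, reported with a flipped sign
mse_eval = MSEEvaluator(source_sentences=english, target_sentences=luxembourgish,
                        teacher_model=teacher, name="lb-en")

# mean_accuracy: can each sentence retrieve its own translation by
# cosine similarity, averaged over both directions?
translation_eval = TranslationEvaluator(source_sentences=english,
                                        target_sentences=luxembourgish, name="lb-en")

print(mse_eval(student))
print(translation_eval(student))
```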
### Training Logs

| Epoch | Step | Training Loss | lb-en loss | lb-en_negative_mse | lb-en_mean_accuracy |
|:-----:|:----:|:-------------:|:----------:|:------------------:|:-------------------:|
| 0.08  | 100  | 0.0056        | 0.0048     | -0.7796            | 0.7887              |
| 0.16  | 200  | 0.0051        | 0.0046     | -0.7330            | 0.8373              |
| 0.24  | 300  | 0.0049        | 0.0044     | -0.6992            | 0.8740              |
| 0.32  | 400  | 0.0047        | 0.0043     | -0.6763            | 0.8889              |
| 0.40  | 500  | 0.0046        | 0.0042     | -0.6584            | 0.8988              |
| 0.48  | 600  | 0.0045        | 0.0041     | -0.6377            | 0.9067              |
| 0.56  | 700  | 0.0044        | 0.0040     | -0.6209            | 0.9206              |
| 0.64  | 800  | 0.0043        | 0.0040     | -0.6087            | 0.9266              |
| 0.72  | 900  | 0.0043        | 0.0039     | -0.5984            | 0.9395              |
| 0.80  | 1000 | 0.0042        | 0.0038     | -0.5887            | 0.9385              |
| 0.88  | 1100 | 0.0042        | 0.0038     | -0.5799            | 0.9425              |
| 0.96  | 1200 | 0.0041        | 0.0038     | -0.5725            | 0.9474              |
| 1.04  | 1300 | 0.0040        | 0.0037     | -0.5690            | 0.9524              |
| 1.12  | 1400 | 0.0039        | 0.0037     | -0.5602            | 0.9554              |
| 1.20  | 1500 | 0.0038        | 0.0037     | -0.5545            | 0.9603              |
| 1.28  | 1600 | 0.0038        | 0.0036     | -0.5501            | 0.9673              |
| 1.36  | 1700 | 0.0038        | 0.0036     | -0.5459            | 0.9643              |
| 1.44  | 1800 | 0.0037        | 0.0036     | -0.5411            | 0.9702              |
| 1.52  | 1900 | 0.0038        | 0.0036     | -0.5360            | 0.9722              |
| 1.60  | 2000 | 0.0037        | 0.0035     | -0.5326            | 0.9683              |
| 1.68  | 2100 | 0.0037        | 0.0035     | -0.5310            | 0.9732              |
| 1.76  | 2200 | 0.0036        | 0.0035     | -0.5264            | 0.9752              |
| 1.84  | 2300 | 0.0037        | 0.0035     | -0.5224            | 0.9792              |
| 1.92  | 2400 | 0.0036        | 0.0035     | -0.5205            | 0.9792              |
| 2.00  | 2500 | 0.0036        | 0.0034     | -0.5166            | 0.9782              |
| 2.08  | 2600 | 0.0033        | 0.0034     | -0.5137            | 0.9782              |
| 2.16  | 2700 | 0.0034        | 0.0034     | -0.5121            | 0.9812              |
| 2.24  | 2800 | 0.0033        | 0.0034     | -0.5093            | 0.9802              |
| 2.32  | 2900 | 0.0034        | 0.0034     | -0.5063            | 0.9821              |
| 2.40  | 3000 | 0.0034        | 0.0034     | -0.5051            | 0.9802              |
| 2.48  | 3100 | 0.0034        | 0.0034     | -0.5030            | 0.9812              |
| 2.56  | 3200 | 0.0033        | 0.0033     | -0.5002            | 0.9851              |
| 2.64  | 3300 | 0.0034        | 0.0033     | -0.4962            | 0.9831              |
| 2.72  | 3400 | 0.0034        | 0.0033     | -0.4936            | 0.9831              |
| 2.80  | 3500 | 0.0033        | 0.0033     | -0.4916            | 0.9841              |
| 2.88  | 3600 | 0.0033        | 0.0033     | -0.4892            | 0.9841              |
| 2.96  | 3700 | 0.0033        | 0.0033     | -0.4871            | 0.9841              |
| 3.04  | 3800 | 0.0032        | 0.0033     | -0.4863            | 0.9861              |
| 3.12  | 3900 | 0.0031        | 0.0033     | -0.4864            | 0.9841              |
| 3.20  | 4000 | 0.0031        | 0.0033     | -0.4859            | 0.9841              |
| 3.28  | 4100 | 0.0031        | 0.0033     | -0.4848            | 0.9871              |
| 3.36  | 4200 | 0.0031        | 0.0033     | -0.4838            | 0.9881              |
| 3.44  | 4300 | 0.0031        | 0.0032     | -0.4837            | 0.9861              |
| 3.52  | 4400 | 0.0031        | 0.0032     | -0.4817            | 0.9851              |
| 3.60  | 4500 | 0.0031        | 0.0032     | -0.4812            | 0.9841              |
| 3.68  | 4600 | 0.0031        | 0.0032     | -0.4792            | 0.9861              |
| 3.76  | 4700 | 0.0031        | 0.0032     | -0.4793            | 0.9851              |
| 3.84  | 4800 | 0.0031        | 0.0032     | -0.4779            | 0.9871              |
| 3.92  | 4900 | 0.0031        | 0.0032     | -0.4771            | 0.9861              |
| 4.00  | 5000 | 0.0031        | 0.0032     | -0.4761            | 0.9861              |

### Framework Versions

- Python: 3.11.11
- Sentence Transformers: 3.4.1
- Transformers: 4.49.0
- PyTorch: 2.6.0
- Accelerate: 1.4.0
- Datasets: 3.3.2
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MSELoss

```bibtex
@inproceedings{reimers-2020-multilingual-sentence-bert,
    title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2020",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2004.09813",
}
```