SentenceTransformer based on intfloat/multilingual-e5-base

This is a sentence-transformers model finetuned from intfloat/multilingual-e5-base on the mnlp_encoder_data dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/multilingual-e5-base
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: mnlp_encoder_data

Model Sources

  • Documentation: https://www.sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
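
Because the final Normalize() module L2-normalizes the pooled token embeddings, every output vector has unit length, so the cosine similarity between two embeddings reduces to a plain dot product. A quick sanity check (a minimal sketch, assuming the model is loaded as in the Usage section below):

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ngkan146/test-encoder-st")
embedding = model.encode(["a short test sentence"])[0]

# Normalize() guarantees unit-length vectors ...
print(np.linalg.norm(embedding))  # ~1.0
# ... so the dot product of two embeddings equals their cosine similarity.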

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ngkan146/test-encoder-st")
# Run inference
sentences = [
    'What is the main purpose of chain coding in image segmentation?  \nA. To enhance the color depth of images  \nB. To compress binary images by tracing contours  \nC. To convert images into three-dimensional models  \nD. To increase the size of image files',
    'A chain code is a lossless compression based image segmentation method for binary images based upon tracing image contours. The basic principle of chain coding, like other contour codings, is to separately encode each connected component, or "blob", in the image.\n\nFor each such region, a point on the boundary is selected and its coordinates are transmitted. The encoder then moves along the boundary of the region and, at each step, transmits a symbol representing the direction of this movement.\n\nThis continues until the encoder returns to the starting position, at which point the blob has been completely described, and encoding continues with the next blob in the image.\n\nThis encoding method is particularly effective for images consisting of a reasonably small number of large connected components.\n\nVariations \nSome popular chain codes include:\n the Freeman Chain Code of Eight Directions (FCCE)\n Directional Freeman Chain Code of Eight Directions (DFCCE)\n Vertex Chain Code (VCC)\n Three OrThogonal symbol chain code (3OT)\n Unsigned Manhattan Chain Code (UMCC)\n Ant Colonies Chain Code (ACCC)\n Predator-Prey System Chain Code (PPSCC)\n Beaver Territories Chain Code (BTCC)\n Biological Reproduction Chain Code (BRCC)\n Agent-Based Modeling Chain Code (ABMCC)\n\nIn particular, FCCE, VCC, 3OT and DFCCE can be transformed from one to another\n\nA related blob encoding method is crack code. Algorithms exist to convert between chain code, crack code, and run-length encoding.\n\nA new trend of chain codes involve the utilization of biological behaviors. This started by the work of Mouring et al. who developed an algorithm that takes advantage of the pheromone of ants to track image information. An ant releases a pheromone when they find a piece of food. Other ants use the pheromone to track the food. In their algorithm, an image is transferred into a virtual environment that consists of food and paths according to the distribution of the pixels in the original image. Then, ants are distributed and their job is to move around while releasing pheromone when they encounter food items. This helps other ants identify information, and therefore, encode information.\n\nIn use \nRecently, the combination of move-to-front transform and adaptive run-length encoding accomplished efficient compression of the popular chain codes.\nChain codes also can be used to obtain high levels of compression for image documents, outperforming standards such as DjVu and JBIG2.',
    'Meripilus sumstinei, commonly known as the giant polypore or the black-staining polypore, is a species of fungus in the family Meripilaceae.\n\nTaxonomy \nOriginally described in 1905 by William Alphonso Murrill as Grifola sumstinei, the species was transferred to Meripilus in 1988.\n\nDescription \nThe cap of this polypore is  wide, with folds of flesh up to  thick. It has white to brownish concentric zones and tapers toward the base; the stipe is indistinct.\n\nDistribution and habitat \nIt is found in eastern North America from June to September. It grows in large clumps on the ground around hardwood (including oak) trunks, stumps, and logs.\n\nUses \nThe mushroom is edible.\n\nReferences',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
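
Since the card lists semantic search among the supported tasks, here is a hedged sketch of a small retrieval loop; the corpus and query strings below are invented for illustration. model.similarity defaults to cosine similarity, which matches the normalized embeddings this model produces.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ngkan146/test-encoder-st")

# Hypothetical corpus and query, for illustration only
corpus = [
    "A chain code compresses binary images by tracing the contours of connected components.",
    "The labor force in Japan numbered 65.9 million people in 2010.",
    "Meripilus sumstinei is a species of fungus in the family Meripilaceae.",
]
query = "How does chain coding compress binary images?"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode([query])

# Cosine similarity between the query and every corpus entry; shape [1, 3]
scores = model.similarity(query_embedding, corpus_embeddings)
best = scores.argmax().item()
print(corpus[best])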

Training Details

Training Dataset

mnlp_encoder_data

  • Dataset: mnlp_encoder_data at 39af5de
  • Size: 8,000 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    anchor (string): min 23 tokens, mean 65.95 tokens, max 171 tokens
    positive (string): min 19 tokens, mean 413.21 tokens, max 512 tokens
    negative (string): min 14 tokens, mean 405.39 tokens, max 512 tokens
  • Samples:
    Sample 1
    anchor:
    What are the two key processes that relative nonlinearity depends on for maintaining species diversity?
    A. Species must differ in their resource consumption and reproductive rates.
    B. Species must differ in their responses to resource density and affect competition differently.
    C. Species must have identical growth rates and resource requirements.
    D. Species must compete for the same resources and have similar responses to competition.
    positive:
    Relative nonlinearity is a coexistence mechanism that maintains species diversity via differences in the response to and effect on variation in resource density or some other factor mediating competition. Relative nonlinearity depends on two processes: 1) species have to differ in the curvature of their responses to resource density and 2) the patterns of resource variation generated by each species must favor the relative growth of another species. In its most basic form, one species grows best under equilibrium competitive conditions and another performs better under variable competitive conditions. Like all coexistence mechanisms, relative nonlinearity maintains species diversity by concentrating intraspecific competition relative to interspecific competition. Because resource density can be variable, intraspecific competition is the reduction of per-capita growth rate under variable resources generated by conspecifics (i.e. individuals of the same species). Interspecific competitio...
    negative:
    Muellerella lichenicola is a species of lichenicolous fungus in the family Verrucariaceae. It was first formally described as a new species in 1826 by Søren Christian Sommerfelt, as Sphaeria lichenicola. David Leslie Hawksworth transferred it to the genus Muellerella in 1979.

    It has been reported growing on Caloplaca aurantia, Caloplaca saxicola and Physcia aipolia in Sicily, and on an unidentified crustose lichen in Iceland. In Mongolia, it has been reported growing on the thallus of a Biatora-lichen at elevation in the Bulgan district and on Aspicilia at elevation in the Altai district. In Victoria Land, Antarctica, it has been reported from multiple hosts, including members of the Teloschistaceae and Physciaceae.

    References

    Sample 2
    anchor:
    What was the unemployment rate in Japan in 2010?
    A. 3.1%
    B. 4.2%
    C. 5.1%
    D. 6.0%
    positive:
    The labor force in Japan numbered 65.9 million people in 2010, which was 59.6% of the population of 15 years old and older, and amongst them, 62.57 million people were employed, whereas 3.34 million people were unemployed which made the unemployment rate 5.1%. The structure of Japan's labor market experienced gradual change in the late 1980s and continued this trend throughout the 1990s. The structure of the labor market is affected by: 1) shrinking population, 2) replacement of postwar baby boom generation, 3) increasing numbers of women in the labor force, and 4) workers' rising education level. Also, an increase in the number of foreign nationals in the labor force is foreseen.

    As of 2019, Japan's unemployment rate was the lowest in the G7. Its employment rate for the working-age population (15-64) was the highest in the G7.

    By 2021 the size of the labor force changed to 68.60 million, a decrease of 0.08 million from the previous year. Viewing by sex, the male labor force was 38.0...
    negative:
    The Aircraft Classification Rating (ACR) - Pavement Classification Rating (PCR) method is a standardized international airport pavement rating system developed by ICAO in 2022. The method is scheduled to replace the ACN-PCN method as the official ICAO pavement rating system by November 28, 2024. The method uses similar concepts as the ACN-PCN method, however, the ACR-PCR method is based on layered elastic analysis, uses standard subgrade categories for both flexible and rigid pavement, and eliminates the use of alpha factor and layer equivalency factors.

    The method relies on the comparison of two numbers:

    The ACR, a number defined as two times the derived single wheel load (expressed in hundreds of kilograms) conveying the relative effect on an airplane of a given weight on a pavement structure for a specified standard subgrade strength;
    The PCR, a number (and series of letters) representing the pavement bearing strength (on the same scale as ACR) of a given pavement section (runwa...

    Sample 3
    anchor:
    What was the original name of WordMARC before it was changed due to a trademark conflict?
    A. MUSE
    B. WordPerfect
    C. Document Assembly
    D. Primeword
    positive:
    WordMARC Composer was a scientifically oriented word processor developed by MARC Software, an offshoot of MARC Analysis Research Corporation (which specialized in high end Finite Element Analysis software for mechanical engineering). It ran originally on minicomputers such as Prime and Digital Equipment Corporation VAX. When the IBM PC emerged as the platform of choice for word processing, WordMARC allowed users to easily move documents from a minicomputer (where they could be easily shared) to PCs.

    WordMARC was the creation of Pedro Marcal, who pioneered work in finite element analysis and needed a technical word processor that both supported complex notations and was capable of running on minicomputers and other high-end machines such as Alliant and AT&T.

    WordMARC was originally known as MUSE (MARC Universal Screen Editor), but the name was changed because of a trademark conflict with another company when the product was ported to the IBM PC.

    Features
    In comparison with WordPerf...
    negative:
    Parametric stereo (abbreviated as PS) is an audio compression algorithm used as an audio coding format for digital audio. It is considered an Audio Object Type of MPEG-4 Part 3 (MPEG-4 Audio) that serves to enhance the coding efficiency of low bandwidth stereo audio media. Parametric Stereo digitally codes a stereo audio signal by storing the audio as monaural alongside a small amount of extra information. This extra information (defined as "parametric overhead") describes how the monaural signal will behave across both stereo channels, which allows for the signal to exist in true stereo upon playback.

    History

    Background
    Advanced Audio Coding Low Complexity (AAC LC) combined with Spectral Band Replication (SBR) and Parametric Stereo (PS) was defined as HE-AAC v2. A HE-AAC v1 decoder will only give a mono output when decoding a HE-AAC v2 bitstream. Parametric Stereo performs sparse coding in the spatial domain, somewhat similar to what SBR does in the frequency domain. An AAC HE v2 b...
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
        "triplet_margin": 5
    }
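
With these settings, the loss for a triplet (anchor a, positive p, negative n) is max(d(a, p) - d(a, n) + 5, 0), where d is the Euclidean distance: the anchor is pulled toward its positive and pushed away from its negative until the negative is at least a margin of 5 further away. A toy computation of that formula (a minimal sketch with hypothetical embedding values, unrelated to the model):

import torch
import torch.nn.functional as F

# Hypothetical 2-d embeddings for one (anchor, positive, negative) triplet
anchor = torch.tensor([1.0, 0.0])
positive = torch.tensor([0.9, 0.1])
negative = torch.tensor([-1.0, 0.0])

margin = 5.0
d_ap = torch.dist(anchor, positive)  # Euclidean distance anchor -> positive
d_an = torch.dist(anchor, negative)  # Euclidean distance anchor -> negative

# TripletLoss objective: hinge on (d_ap - d_an + margin)
loss = F.relu(d_ap - d_an + margin)
print(loss)  # nonzero until the negative is margin-further away than the positive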
    

Training Hyperparameters

Non-Default Hyperparameters

  • learning_rate: 2e-05
  • weight_decay: 0.01
  • num_train_epochs: 1
  • warmup_steps: 10
  • remove_unused_columns: False
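
Putting the dataset, the TripletLoss configuration, and the non-default hyperparameters above together, the training run can be reconstructed roughly as follows. This is a hedged sketch using the standard Sentence Transformers trainer API; the dataset repository id ngkan146/mnlp_encoder_data and the output directory are assumptions, since the card only names the dataset as mnlp_encoder_data.

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

model = SentenceTransformer("intfloat/multilingual-e5-base")

# Dataset id is an assumption; the card only names "mnlp_encoder_data" at revision 39af5de
train_dataset = load_dataset("ngkan146/mnlp_encoder_data", split="train")

# TripletLoss with Euclidean distance and margin 5, as stated in the card
loss = losses.TripletLoss(
    model=model,
    distance_metric=losses.TripletDistanceMetric.EUCLIDEAN,
    triplet_margin=5,
)

# Non-default hyperparameters from this section
args = SentenceTransformerTrainingArguments(
    output_dir="test-encoder-st",  # hypothetical output directory
    learning_rate=2e-5,
    weight_decay=0.01,
    num_train_epochs=1,
    warmup_steps=10,
    remove_unused_columns=False,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()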

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 10
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: False
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
0.1 100 4.2263
0.2 200 3.9742
0.3 300 3.9605
0.4 400 3.9198
0.5 500 3.8953
0.6 600 3.8793
0.7 700 3.8918
0.8 800 3.8691
0.9 900 3.8747
1.0 1000 3.8523

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 4.1.0
  • Transformers: 4.51.3
  • PyTorch: 2.7.0+cu126
  • Accelerate: 1.7.0
  • Datasets: 3.6.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}