SentenceTransformer based on intfloat/multilingual-e5-base
This is a sentence-transformers model finetuned from intfloat/multilingual-e5-base on the mnlp_encoder_data dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: intfloat/multilingual-e5-base
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: mnlp_encoder_data
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
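The three modules run in sequence: the XLM-RoBERTa transformer produces token embeddings, the pooling layer mean-averages them over non-padding tokens, and the final module L2-normalizes the result so that dot products equal cosine similarities. As a rough sketch of what this pipeline computes (not the library's internal code; the helper name embed is ours), the same embedding can be approximated with the plain transformers API:

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("ngkan146/test-encoder-st")
encoder = AutoModel.from_pretrained("ngkan146/test-encoder-st")

def embed(texts):
    # (0) Transformer: tokenize (max 512 tokens) and encode
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    with torch.no_grad():
        tokens = encoder(**batch).last_hidden_state   # (batch, seq_len, 768)
    # (1) Pooling: mean over non-padding positions
    mask = batch["attention_mask"].unsqueeze(-1).float()
    pooled = (tokens * mask).sum(dim=1) / mask.sum(dim=1)
    # (2) Normalize: unit-length vectors
    return F.normalize(pooled, p=2, dim=1)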
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("ngkan146/test-encoder-st")
# Run inference
sentences = [
'What is the main purpose of chain coding in image segmentation? \nA. To enhance the color depth of images \nB. To compress binary images by tracing contours \nC. To convert images into three-dimensional models \nD. To increase the size of image files',
'A chain code is a lossless compression based image segmentation method for binary images based upon tracing image contours. The basic principle of chain coding, like other contour codings, is to separately encode each connected component, or "blob", in the image.\n\nFor each such region, a point on the boundary is selected and its coordinates are transmitted. The encoder then moves along the boundary of the region and, at each step, transmits a symbol representing the direction of this movement.\n\nThis continues until the encoder returns to the starting position, at which point the blob has been completely described, and encoding continues with the next blob in the image.\n\nThis encoding method is particularly effective for images consisting of a reasonably small number of large connected components.\n\nVariations \nSome popular chain codes include:\n the Freeman Chain Code of Eight Directions (FCCE)\n Directional Freeman Chain Code of Eight Directions (DFCCE)\n Vertex Chain Code (VCC)\n Three OrThogonal symbol chain code (3OT)\n Unsigned Manhattan Chain Code (UMCC)\n Ant Colonies Chain Code (ACCC)\n Predator-Prey System Chain Code (PPSCC)\n Beaver Territories Chain Code (BTCC)\n Biological Reproduction Chain Code (BRCC)\n Agent-Based Modeling Chain Code (ABMCC)\n\nIn particular, FCCE, VCC, 3OT and DFCCE can be transformed from one to another\n\nA related blob encoding method is crack code. Algorithms exist to convert between chain code, crack code, and run-length encoding.\n\nA new trend of chain codes involve the utilization of biological behaviors. This started by the work of Mouring et al. who developed an algorithm that takes advantage of the pheromone of ants to track image information. An ant releases a pheromone when they find a piece of food. Other ants use the pheromone to track the food. In their algorithm, an image is transferred into a virtual environment that consists of food and paths according to the distribution of the pixels in the original image. Then, ants are distributed and their job is to move around while releasing pheromone when they encounter food items. This helps other ants identify information, and therefore, encode information.\n\nIn use \nRecently, the combination of move-to-front transform and adaptive run-length encoding accomplished efficient compression of the popular chain codes.\nChain codes also can be used to obtain high levels of compression for image documents, outperforming standards such as DjVu and JBIG2.',
'Meripilus sumstinei, commonly known as the giant polypore or the black-staining polypore, is a species of fungus in the family Meripilaceae.\n\nTaxonomy \nOriginally described in 1905 by William Alphonso Murrill as Grifola sumstinei, the species was transferred to Meripilus in 1988.\n\nDescription \nThe cap of this polypore is wide, with folds of flesh up to thick. It has white to brownish concentric zones and tapers toward the base; the stipe is indistinct.\n\nDistribution and habitat \nIt is found in eastern North America from June to September. It grows in large clumps on the ground around hardwood (including oak) trunks, stumps, and logs.\n\nUses \nThe mushroom is edible.\n\nReferences',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
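Since the output vectors are unit-normalized, the scores above are cosine similarities in [-1, 1]. A minimal follow-on step, reusing the variables from the snippet, picks the passage that best matches the question:

# Rank the two passages against the question (sentences[0])
scores = similarities[0, 1:]
best = 1 + int(scores.argmax())
print(sentences[best][:60])  # expected: the chain-code passage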
Training Details
Training Dataset
mnlp_encoder_data
- Dataset: mnlp_encoder_data at 39af5de
- Size: 8,000 training samples
- Columns: anchor, positive, and negative
- Approximate statistics based on the first 1000 samples:

| | anchor | positive | negative |
|---|---|---|---|
| type | string | string | string |
| min | 23 tokens | 19 tokens | 14 tokens |
| mean | 65.95 tokens | 413.21 tokens | 405.39 tokens |
| max | 171 tokens | 512 tokens | 512 tokens |
- Samples (each sample pairs a multiple-choice question as anchor with a relevant passage as positive and an unrelated passage as negative):

  Sample 1
  - anchor: What are the two key processes that relative nonlinearity depends on for maintaining species diversity? A. Species must differ in their resource consumption and reproductive rates. B. Species must differ in their responses to resource density and affect competition differently. C. Species must have identical growth rates and resource requirements. D. Species must compete for the same resources and have similar responses to competition.
  - positive: Relative nonlinearity is a coexistence mechanism that maintains species diversity via differences in the response to and effect on variation in resource density or some other factor mediating competition. Relative nonlinearity depends on two processes: 1) species have to differ in the curvature of their responses to resource density and 2) the patterns of resource variation generated by each species must favor the relative growth of another species. In its most basic form, one species grows best under equilibrium competitive conditions and another performs better under variable competitive conditions. Like all coexistence mechanisms, relative nonlinearity maintains species diversity by concentrating intraspecific competition relative to interspecific competition. Because resource density can be variable, intraspecific competition is the reduction of per-capita growth rate under variable resources generated by conspecifics (i.e. individuals of the same species). Interspecific competitio...
  - negative: Muellerella lichenicola is a species of lichenicolous fungus in the family Verrucariaceae. It was first formally described as a new species in 1826 by Søren Christian Sommerfelt, as Sphaeria lichenicola. David Leslie Hawksworth transferred it to the genus Muellerella in 1979. It has been reported growing on Caloplaca aurantia, Caloplaca saxicola and Physcia aipolia in Sicily, and on an unidentified crustose lichen in Iceland. In Mongolia, it has been reported growing on the thallus of a Biatora-lichen at elevation in the Bulgan district and on Aspicilia at elevation in the Altai district. In Victoria Land, Antarctica, it has been reported from multiple hosts, including members of the Teloschistaceae and Physciaceae. References

  Sample 2
  - anchor: What was the unemployment rate in Japan in 2010? A. 3.1% B. 4.2% C. 5.1% D. 6.0%
  - positive: The labor force in Japan numbered 65.9 million people in 2010, which was 59.6% of the population of 15 years old and older, and amongst them, 62.57 million people were employed, whereas 3.34 million people were unemployed which made the unemployment rate 5.1%. The structure of Japan's labor market experienced gradual change in the late 1980s and continued this trend throughout the 1990s. The structure of the labor market is affected by: 1) shrinking population, 2) replacement of postwar baby boom generation, 3) increasing numbers of women in the labor force, and 4) workers' rising education level. Also, an increase in the number of foreign nationals in the labor force is foreseen. As of 2019, Japan's unemployment rate was the lowest in the G7. Its employment rate for the working-age population (15-64) was the highest in the G7. By 2021 the size of the labor force changed to 68.60 million, a decrease of 0.08 million from the previous year. Viewing by sex, the male labor force was 38.0...
  - negative: The Aircraft Classification Rating (ACR) - Pavement Classification Rating (PCR) method is a standardized international airport pavement rating system developed by ICAO in 2022. The method is scheduled to replace the ACN-PCN method as the official ICAO pavement rating system by November 28, 2024. The method uses similar concepts as the ACN-PCN method, however, the ACR-PCR method is based on layered elastic analysis, uses standard subgrade categories for both flexible and rigid pavement, and eliminates the use of alpha factor and layer equivalency factors. The method relies on the comparison of two numbers: The ACR, a number defined as two times the derived single wheel load (expressed in hundreds of kilograms) conveying the relative effect on an airplane of a given weight on a pavement structure for a specified standard subgrade strength; The PCR, a number (and series of letters) representing the pavement bearing strength (on the same scale as ACR) of a given pavement section (runwa...

  Sample 3
  - anchor: What was the original name of WordMARC before it was changed due to a trademark conflict? A. MUSE B. WordPerfect C. Document Assembly D. Primeword
  - positive: WordMARC Composer was a scientifically oriented word processor developed by MARC Software, an offshoot of MARC Analysis Research Corporation (which specialized in high end Finite Element Analysis software for mechanical engineering). It ran originally on minicomputers such as Prime and Digital Equipment Corporation VAX. When the IBM PC emerged as the platform of choice for word processing, WordMARC allowed users to easily move documents from a minicomputer (where they could be easily shared) to PCs. WordMARC was the creation of Pedro Marcal, who pioneered work in finite element analysis and needed a technical word processor that both supported complex notations and was capable of running on minicomputers and other high-end machines such as Alliant and AT&T. WordMARC was originally known as MUSE (MARC Universal Screen Editor), but the name was changed because of a trademark conflict with another company when the product was ported to the IBM PC. Features In comparison with WordPerf...
  - negative: Parametric stereo (abbreviated as PS) is an audio compression algorithm used as an audio coding format for digital audio. It is considered an Audio Object Type of MPEG-4 Part 3 (MPEG-4 Audio) that serves to enhance the coding efficiency of low bandwidth stereo audio media. Parametric Stereo digitally codes a stereo audio signal by storing the audio as monaural alongside a small amount of extra information. This extra information (defined as "parametric overhead") describes how the monaural signal will behave across both stereo channels, which allows for the signal to exist in true stereo upon playback. History Background Advanced Audio Coding Low Complexity (AAC LC) combined with Spectral Band Replication (SBR) and Parametric Stereo (PS) was defined as HE-AAC v2. A HE-AAC v1 decoder will only give a mono output when decoding a HE-AAC v2 bitstream. Parametric Stereo performs sparse coding in the spatial domain, somewhat similar to what SBR does in the frequency domain. An AAC HE v2 b...

- Loss: TripletLoss with these parameters:
  { "distance_metric": "TripletDistanceMetric.EUCLIDEAN", "triplet_margin": 5 }
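In the sentence_transformers API, this loss configuration corresponds roughly to the following sketch; the base-model handle is taken from the Model Details above, and this is not the author's published training script:

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import TripletLoss, TripletDistanceMetric

model = SentenceTransformer("intfloat/multilingual-e5-base")
# Euclidean distance with margin 5, matching the parameters listed above
loss = TripletLoss(
    model=model,
    distance_metric=TripletDistanceMetric.EUCLIDEAN,
    triplet_margin=5,
)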
Training Hyperparameters
Non-Default Hyperparameters
- learning_rate: 2e-05
- weight_decay: 0.01
- num_train_epochs: 1
- warmup_steps: 10
- remove_unused_columns: False
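Wired together with the loss sketch above, a training run consistent with these non-default values might look like this; the identifier passed to load_dataset is a placeholder, since the card only names mnlp_encoder_data at revision 39af5de:

from datasets import load_dataset
from sentence_transformers import SentenceTransformerTrainer, SentenceTransformerTrainingArguments

# Placeholder dataset handle; substitute the actual Hub path of mnlp_encoder_data
train_dataset = load_dataset("mnlp_encoder_data", split="train")

args = SentenceTransformerTrainingArguments(
    output_dir="test-encoder-st",
    learning_rate=2e-5,
    weight_decay=0.01,
    num_train_epochs=1,
    warmup_steps=10,
    remove_unused_columns=False,
)
trainer = SentenceTransformerTrainer(
    model=model,              # SentenceTransformer from the loss sketch above
    args=args,
    train_dataset=train_dataset,
    loss=loss,                # TripletLoss configured above
)
trainer.train()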
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.01
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 10
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: False
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- tp_size: 0
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
Training Logs
| Epoch | Step | Training Loss |
|:------|:-----|:--------------|
| 0.1   | 100  | 4.2263        |
| 0.2   | 200  | 3.9742        |
| 0.3   | 300  | 3.9605        |
| 0.4   | 400  | 3.9198        |
| 0.5   | 500  | 3.8953        |
| 0.6   | 600  | 3.8793        |
| 0.7   | 700  | 3.8918        |
| 0.8   | 800  | 3.8691        |
| 0.9   | 900  | 3.8747        |
| 1.0   | 1000 | 3.8523        |
Framework Versions
- Python: 3.10.12
- Sentence Transformers: 4.1.0
- Transformers: 4.51.3
- PyTorch: 2.7.0+cu126
- Accelerate: 1.7.0
- Datasets: 3.6.0
- Tokenizers: 0.21.1
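To reproduce this environment, pinning the Python packages to the listed versions should be sufficient (the PyTorch CUDA build may require the matching extra index URL):

pip install sentence-transformers==4.1.0 transformers==4.51.3 accelerate==1.7.0 datasets==3.6.0 tokenizers==0.21.1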
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
TripletLoss
@misc{hermans2017defense,
title={In Defense of the Triplet Loss for Person Re-Identification},
author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
year={2017},
eprint={1703.07737},
archivePrefix={arXiv},
primaryClass={cs.CV}
}