SentenceTransformer based on nomic-ai/modernbert-embed-base
This is a sentence-transformers model finetuned from nomic-ai/modernbert-embed-base on the finqalab_embedding_finetune dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: nomic-ai/modernbert-embed-base
- Maximum Sequence Length: 8192 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: finqalab_embedding_finetune
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
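For intuition, the three modules above can be reproduced with plain transformers: take the ModernBERT token embeddings, mean-pool them over the attention mask, then L2-normalize. The following is a minimal sketch of that pipeline; it should match model.encode up to numerical precision, but verify before relying on it.

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Ch333tah/modernbert-finqalab-embeddings")
bert = AutoModel.from_pretrained("Ch333tah/modernbert-finqalab-embeddings")

batch = tokenizer(["example sentence"], padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    token_embeddings = bert(**batch).last_hidden_state       # (batch, seq, 768)

mask = batch["attention_mask"].unsqueeze(-1).float()          # (batch, seq, 1)
mean_pooled = (token_embeddings * mask).sum(1) / mask.sum(1)  # module (1): mean pooling
embedding = F.normalize(mean_pooled, p=2, dim=1)              # module (2): Normalize()
print(embedding.shape)  # torch.Size([1, 768])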
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Ch333tah/modernbert-finqalab-embeddings")
# Run inference
sentences = [
"Question: I'm experiencing issues with logging in to the app. What should I do? Answer: In case you are facing any issues, Try closing the app and opening it again. Try clearing the cache or updating to the latest version. If the issue still persists, contact our customer support through Whatsapp (+923003672522). Email your query at [email protected]",
"My app won't let me log in! Help!",
"Question: The app is not loading properly on my device. What could be the problem? Answer: If the app isn't loading properly: Please check if you have a stable internet connection. Try refreshing the screen 2-3 times. Close the app and open it again. Try clearing the cache, check for app updates or reinstall the app. If the issue still persists, contact our customer support through Whatsapp (+923003672522) or email your query at [email protected]",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
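Beyond pairwise similarity, the model can rank a small FAQ corpus against a user query. A minimal retrieval sketch follows; the corpus strings are shortened stand-ins for real FAQ entries.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Ch333tah/modernbert-finqalab-embeddings")

# Shortened stand-ins for real FAQ entries
corpus = [
    "Question: I did not receive a verification email? What should I do? ...",
    "Question: What does Minimum Lot Size mean? ...",
    "Question: How do I receive bonus shares? ...",
]
query = "My verification email didn't arrive, any ideas?"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode(query)

# similarity() returns cosine scores, since the model normalizes its embeddings
scores = model.similarity(query_embedding, corpus_embeddings)  # shape [1, 3]
best = scores.argmax().item()
print(corpus[best])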
Evaluation
Metrics
Triplet
- Datasets: ai-job-train, ai-job-valid and ai-job-test
- Evaluated with TripletEvaluator
| Metric | ai-job-train | ai-job-valid | ai-job-test |
|---|---|---|---|
| cosine_accuracy | 0.9604 | 0.9524 | 0.9130 |
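The table above can be reproduced with TripletEvaluator, which reports the fraction of triplets where the anchor sits closer to the positive than to the negative. A minimal sketch, with hypothetical triplets standing in for the actual ai-job-* splits:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TripletEvaluator

model = SentenceTransformer("Ch333tah/modernbert-finqalab-embeddings")

# Hypothetical triplets; the real evaluation uses the ai-job-* splits
# with anchor = Query, positive = Pos_Context, negative = Neg_Context
evaluator = TripletEvaluator(
    anchors=["My verification email didn't arrive, any ideas?"],
    positives=["Question: I did not receive a verification email? ..."],
    negatives=["Question: What does Minimum Lot Size mean? ..."],
    name="ai-job-test",
)
results = evaluator(model)
print(results)  # {'ai-job-test_cosine_accuracy': ...}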
Training Details
Training Dataset
finqalab_embedding_finetune
- Dataset: finqalab_embedding_finetune at 144dee2
- Size: 101 training samples
- Columns: Pos_Context, Query, and Neg_Context
- Approximate statistics based on the first 101 samples:

| | Pos_Context | Query | Neg_Context |
|---|---|---|---|
| type | string | string | string |
| details | min: 27 tokens, mean: 74.1 tokens, max: 418 tokens | min: 7 tokens, mean: 15.19 tokens, max: 30 tokens | min: 25 tokens, mean: 72.42 tokens, max: 231 tokens |
- Samples:

| Pos_Context | Query | Neg_Context |
|---|---|---|
| Question: I did not receive a verification email? What should I do? Answer: Please check your spam folder and double-check your registered email address. If you still don't see a verification email, contact our customer support department at [email protected]. | My verification email didn't arrive, any ideas? | Question: I did not receive my instant transfer in Finqalab account within 10 minutes. What should I do? Answer: If your instant transfer hasn't been credited to your Finqalab account within 10 minutes, please email us at [email protected] with a screenshot of the receipt or send it to us on Whatsapp (+923003672522). Our team will review the issue, escalate it to the bank by sending the transaction receipt, and follow up to ensure your funds are credited promptly. |
| Question: What are the applicable CGT rates for RDA Account Holders? Answer: Filer rates are applied to RDA account holders irrespective of their status (Filer or Non-filer). | How are capital gains taxes handled for someone with an RDA account? | Question: What does Minimum Lot Size mean? Answer: This means that you need to buy a minimum quantity for a share. In case of ETFs the minimum lot size is 500 or in multiples of 500 shares. Whereas, for non-ETF stocks the minimum lot size is 1 share. |
| Question: How do I receive bonus shares? Answer: Bonus shares are distributed to shareholders based on a ratio announced by the company. For example, if a company declares a 20% bonus issue, you will receive 2 additional shares for every 10 shares you already own. | What's the deal with getting extra shares? | Question: Do I have to pay for bonus shares? Answer: No, bonus shares are issued free of charge. They are typically paid for by utilizing the company's retained earnings or reserves. |
- Loss: MultipleNegativesRankingLoss with these parameters:
  { "scale": 20.0, "similarity_fct": "cos_sim" }
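A training sketch under these settings follows. The dataset id passed to load_dataset is hypothetical (the card only names the dataset "finqalab_embedding_finetune"), and the column reordering is an assumption based on the column names.

from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, losses

model = SentenceTransformer("nomic-ai/modernbert-embed-base")

# Hypothetical dataset id
train_dataset = load_dataset("finqalab_embedding_finetune", split="train")
# The trainer reads columns positionally as (anchor, positive, negative),
# so reorder them accordingly
train_dataset = train_dataset.select_columns(["Query", "Pos_Context", "Neg_Context"])

# scale=20.0 and cosine similarity match the parameters listed above
# (cos_sim is already the default similarity_fct)
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0)

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()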
Evaluation Dataset
finqalab_embedding_finetune
- Dataset: finqalab_embedding_finetune at 144dee2
- Size: 21 evaluation samples
- Columns: Pos_Context, Query, and Neg_Context
- Approximate statistics based on the first 21 samples:

| | Pos_Context | Query | Neg_Context |
|---|---|---|---|
| type | string | string | string |
| details | min: 23 tokens, mean: 78.67 tokens, max: 152 tokens | min: 12 tokens, mean: 16.67 tokens, max: 27 tokens | min: 27 tokens, mean: 91.9 tokens, max: 418 tokens |
- Samples:

| Pos_Context | Query | Neg_Context |
|---|---|---|
| Question: How will I know if my biometric verification is pending or close to the deadline? Answer: We receive reports every Monday and Friday regarding users with pending biometric verifications. If you have less than 7 days remaining, we will contact you to remind you to complete the process or provide alternate solutions if necessary. | What happens if my biometric verification is about to expire? | Question: I entered my CNIC in the Bioverify app and got a "CNIC not eligible for this service" message. What does this mean? Answer: This means that the biometric verification is not required. |
| Question: How long do I have to complete the biometric verification? Answer: Once you receive the OTP, you must finish the biometric process within 45 days. An automated email will remind you to complete it. | What's the deadline for that fingerprint thing after I get the code? | Question: Can I complete the biometric verification from outside Pakistan? Answer: Yes, if you are currently abroad, you can complete the biometric process either online or by visiting an NCCPL office when you return to Pakistan, provided it's within 45 days of account activation. |
| Question: Is historical price data available for stocks in the app? Answer: Yes, it is. | Can I see stock prices from the past in this app? | Question: How often is the stock data updated in the app? Answer: The stock data is updated in real time. |
- Loss: MultipleNegativesRankingLoss with these parameters:
  { "scale": 20.0, "similarity_fct": "cos_sim" }
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- learning_rate: 2e-05
- num_train_epochs: 1
- warmup_ratio: 0.1
- batch_sampler: no_duplicates
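Expressed in code, these map onto SentenceTransformerTrainingArguments roughly as follows; output_dir is a placeholder, all other values come from the list above.

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="modernbert-finqalab-embeddings",  # placeholder
    eval_strategy="steps",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=1,
    warmup_ratio=0.1,
    # no_duplicates keeps repeated texts out of a batch, which matters
    # for in-batch negatives with MultipleNegativesRankingLoss
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)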
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
Training Logs
| Epoch | Step | ai-job-train_cosine_accuracy | ai-job-valid_cosine_accuracy | ai-job-test_cosine_accuracy |
|---|---|---|---|---|
| -1 | -1 | 0.9604 | 0.9524 | 0.9130 |
Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.4.0
- Transformers: 4.48.1
- PyTorch: 2.5.1+cu121
- Accelerate: 1.2.1
- Datasets: 3.2.0
- Tokenizers: 0.21.0
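To reproduce this environment, pinning the versions listed above should work; note that torch==2.5.1 below installs the default wheel rather than the exact +cu121 build used for training.

pip install sentence-transformers==3.4.0 transformers==4.48.1 torch==2.5.1 accelerate==1.2.1 datasets==3.2.0 tokenizers==0.21.0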
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}