---
model-index:
- name: SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: FT label
      type: FT_label
    metrics:
    - type: pearson_cosine
      value: 0.40571243927086686
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.4157655660967662
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.4294377953337607
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.41636474785618866
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.4293067637823527
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.41576593946890283
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.4057124337715868
      name: Pearson Dot
    - type: spearman_dot
      value: 0.4157663124606592
      name: Spearman Dot
    - type: pearson_max
      value: 0.4294377953337607
      name: Pearson Max
    - type: spearman_max
      value: 0.41636474785618866
      name: Spearman Max
---

# SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) <!-- at revision e4ce9877abf3edfe10b0d82785e83bdcb973e22e -->
- **Maximum Sequence Length:** 256 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
    SentenceTransformer(
      (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
      (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    )
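
To make the last two modules concrete, here is a minimal pure-Python sketch of what mean pooling followed by L2 normalization computes, assuming the Transformer stage has already produced one vector per token. The helper names `mean_pool`, `l2_normalize`, and `dot`, and the toy 3-dimensional vectors, are illustrative only and are not part of the library.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average token vectors, skipping padding positions (mask == 0)."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, x in enumerate(vec):
                total[i] += x
    return [t / max(count, 1) for t in total]

def l2_normalize(vec):
    """Scale a vector to unit length, mirroring the Normalize() stage."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Toy input (hypothetical): three token vectors, the last one is padding.
tokens = [[1.0, 2.0, 2.0], [3.0, 0.0, 4.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)      # mean of the two unmasked tokens
embedding = l2_normalize(pooled)      # unit-length sentence embedding
```

Because the resulting embedding has unit length, its dot product with another normalized embedding equals their cosine similarity, which is consistent with the near-identical `pearson_cosine` and `pearson_dot` values reported under Evaluation.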
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:

    pip install -U sentence-transformers

Then you can load this model and run inference.

    from sentence_transformers import SentenceTransformer

    # Download from the 🤗 Hub
    model = SentenceTransformer("Hgkang00/FT-label-consent-10")

    # Run inference
    sentences = [
        'I engage in risky behaviors like reckless driving or reckless sexual encounters.',
        'Symptoms during a manic episode include inflated self-esteem or grandiosity,increased goal-directed activity, or excessive involvement in risky activities.',
        'Marked decrease in functioning in areas like work, interpersonal relations, or self-care since the onset of the disturbance.',
    ]
    embeddings = model.encode(sentences)
    print(embeddings.shape)
    # [3, 384]

    # Get the similarity scores for the embeddings
    similarities = model.similarity(embeddings, embeddings)
    print(similarities.shape)
    # [3, 3]

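
The `[3, 3]` result holds the score of every sentence against every other one, with the diagonal being each sentence compared to itself. As a hedged illustration of how such a matrix might be read, the sketch below ranks the off-diagonal pairs of a similarity matrix given as nested lists. The function name `top_pairs` is hypothetical, and the numbers in `example` are made up for illustration; they are not outputs of this model.

```python
def top_pairs(matrix, k=3):
    """Return the k highest-scoring (i, j, score) pairs with i < j."""
    pairs = []
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):  # skip the diagonal and mirrored entries
            pairs.append((i, j, matrix[i][j]))
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs[:k]

# Illustrative values only (NOT produced by the model).
example = [
    [1.00, 0.62, 0.18],
    [0.62, 1.00, 0.25],
    [0.18, 0.25, 1.00],
]
ranked = top_pairs(example)
```

With the real model, the tensor returned by `model.similarity` can be converted with `similarities.tolist()` before being passed to a helper like this.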
### Downstream Usage (Sentence Transformers)
You can finetune this model on your own dataset.
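
Fine-tuning on scored sentence pairs, as was done for this model, means optimizing a ranking objective such as CoSENTLoss (the loss listed under Training Details, with `scale` 20.0). The following is a minimal pure-Python sketch of that objective as it is commonly described: every pair of examples whose labels differ contributes a penalty when the lower-labeled example receives the higher cosine score. The function name `cosent_loss` and the sample scores are assumptions for illustration; this is not the library's implementation.

```python
import math

def cosent_loss(cos_scores, labels, scale=20.0):
    """log(1 + sum of exp(scale * (s_low - s_high))) over ordered pairs.

    A pair (low, high) contributes when labels[low] < labels[high],
    i.e. when the model should score `high` above `low`.
    """
    terms = [0.0]  # exp(0) = 1 supplies the "1 +" inside the log
    for s_low, y_low in zip(cos_scores, labels):
        for s_high, y_high in zip(cos_scores, labels):
            if y_low < y_high:
                terms.append(scale * (s_low - s_high))
    # Numerically stable log-sum-exp over all collected terms
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

# Hypothetical batch: one positive pair (label 1.0) and two negatives (0.0).
labels = [1.0, 0.0, 0.0]
well_ordered = cosent_loss([0.8, 0.1, 0.2], labels)  # positive scored highest
mis_ordered = cosent_loss([0.1, 0.8, 0.2], labels)   # positive scored lowest
```

The loss is close to zero when the positive pair already has the highest cosine score and grows roughly linearly with the size of the ordering violation, scaled by `scale`.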
## Evaluation
### Metrics
#### Semantic Similarity
* Dataset: `FT_label`
* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric | Value |
|:---|:---|
| pearson_cosine | 0.4057 |
| **spearman_cosine** | **0.4158** |
| pearson_manhattan | 0.4294 |
| spearman_manhattan | 0.4164 |
| pearson_euclidean | 0.4293 |
| spearman_euclidean | 0.4158 |
| pearson_dot | 0.4057 |
| spearman_dot | 0.4158 |
| pearson_max | 0.4294 |
| spearman_max | 0.4164 |
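
Each metric above is a correlation between the model's similarity scores and the gold `score` labels: Pearson measures linear agreement, and Spearman measures agreement of rankings. The sketch below is an illustrative pure-Python re-implementation of both, not the evaluator's own code; the helper names and the sample scores are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical cosine scores and binary gold labels for four pairs.
cosine_scores = [0.9, 0.2, 0.6, 0.1]
gold_labels = [1.0, 0.0, 1.0, 0.0]
rho = spearman(cosine_scores, gold_labels)
```

With binary labels such as those in this dataset, many gold values tie, so the tie handling in `average_ranks` matters for the Spearman figure.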
## Training Details
### Training Dataset
#### Unnamed Dataset
* Size: 33,800 training samples
* Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>score</code>
* Approximate statistics based on the first 1000 samples:

| | sentence1 | sentence2 | score |
|:---|:---|:---|:---|
| type | string | string | float |
| details | <ul><li>min: 29 tokens</li><li>mean: 29.0 tokens</li><li>max: 29 tokens</li></ul> | <ul><li>min: 14 tokens</li><li>mean: 25.15 tokens</li><li>max: 43 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.06</li><li>max: 1.0</li></ul> |
* Samples:

| sentence1 | sentence2 | score |
|:---|:---|:---|
| <code>Presence of delusions, hallucinations or disorganized speech, for a significant portion of time within a 1-month period</code> | <code>I often hear voices telling me things that are not real, even when I'm alone in my room.</code> | <code>1.0</code> |
| <code>Presence of delusions, hallucinations or disorganized speech, for a significant portion of time within a 1-month period</code> | <code>I have strong beliefs that people are plotting against me and trying to harm me, which makes it hard for me to trust anyone.</code> | <code>1.0</code> |
| <code>Presence of delusions, hallucinations or disorganized speech, for a significant portion of time within a 1-month period</code> | <code>Sometimes, I see things that others around me don't see, like strange figures or objects.</code> | <code>1.0</code> |
* Loss: [<code>CoSENTLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosentloss) with these parameters: `"scale": 20.0` and `"similarity_fct": "pairwise_cos_sim"`
### Evaluation Dataset
#### Unnamed Dataset
* Size: 4,225 evaluation samples
* Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>score</code>
* Approximate statistics based on the first 1000 samples:

| | sentence1 | sentence2 | score |
|:---|:---|:---|:---|
| type | string | string | float |
| details | <ul><li>min: 18 tokens</li><li>mean: 31.8 tokens</li><li>max: 60 tokens</li></ul> | <ul><li>min: 15 tokens</li><li>mean: 24.59 tokens</li><li>max: 41 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.06</li><li>max: 1.0</li></ul> |
* Samples:

| sentence1 | sentence2 | score |
|:---|:---|:---|
| <code>Presence of delusions, hallucinations or disorganized speech, for a significant portion of time within a 1-month period</code> | <code>People around me have noticed that my behavior is becoming more erratic and unpredictable.</code> | <code>1.0</code> |
| <code>Presence of delusions, hallucinations or disorganized speech, for a significant portion of time within a 1-month period</code> | <code>There are times when I repeat certain actions or words without any clear purpose, almost like being stuck in a loop.</code> | <code>0.0</code> |
| <code>Presence of delusions, hallucinations or disorganized speech, for a significant portion of time within a 1-month period</code> | <code>I feel detached from reality at times and have trouble distinguishing between what is real and what is not.</code> | <code>0.0</code> |
* Loss: [<code>CoSENTLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosentloss) with these parameters: `"scale": 20.0` and `"similarity_fct": "pairwise_cos_sim"`
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: epoch
- `per_device_train_batch_size`: 256
- `per_device_eval_batch_size`: 128
- `num_train_epochs`: 10
- `warmup_ratio`: 0.1
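
As a rough worked example, these settings can be turned into step counts for the 33,800-sample training set. The sketch assumes a single device, no gradient accumulation, and that the final partial batch is kept; rounding the warmup length up with `ceil` is also an assumption about how the scheduler interprets `warmup_ratio`.

```python
import math

train_samples = 33_800   # training set size reported above
batch_size = 256         # per_device_train_batch_size (one device assumed)
epochs = 10              # num_train_epochs
warmup_ratio = 0.1       # fraction of total steps used for warmup

steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)
```

Under these assumptions the run takes 133 optimizer steps per epoch and 1,330 in total, with the first 133 steps spent on learning-rate warmup.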
### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.0.0
- Transformers: 4.41.1
- PyTorch: 2.3.0+cu121
- Accelerate: 0.30.1
- Datasets: 2.19.1
- Tokenizers: 0.19.1
## Citation
### BibTeX
#### Sentence Transformers
    @inproceedings{reimers-2019-sentence-bert,
        title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
        author = "Reimers, Nils and Gurevych, Iryna",
        booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
        month = "11",
        year = "2019",
        publisher = "Association for Computational Linguistics",
        url = "https://arxiv.org/abs/1908.10084",
    }

#### CoSENTLoss
    @online{kexuefm-8847,
        title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
        author={Su Jianlin},
    }

