---
language:
- code
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:197351
- loss:MultipleNegativesRankingLoss
base_model: NeuML/pubmedbert-base-embeddings
widget:
- source_sentence: ABCB7
  sentences:
  - This gene encodes a tetrameric mitochondrial flavoprotein, which is a member of
    the acyl-CoA dehydrogenase family. This enzyme catalyzes the initial step of the
    mitochondrial fatty acid beta-oxidation pathway. Mutations in this gene have been
    associated with short-chain acyl-CoA dehydrogenase (SCAD) deficiency. Alternative
    splicing results in two variants which encode different isoforms. [provided by
    RefSeq, Oct 2014]
  - The membrane-associated protein encoded by this gene is a member of the superfamily
    of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules
    across extra- and intra-cellular membranes. ABC genes are divided into seven distinct
    subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member
    of the MDR/TAP subfamily. Members of the MDR/TAP subfamily are involved in multidrug
    resistance as well as antigen presentation. This gene encodes a half-transporter
    involved in the transport of heme from the mitochondria to the cytosol. With iron/sulfur
    cluster precursors as its substrates, this protein may play a role in metal homeostasis.
    Mutations in this gene have been associated with mitochondrial iron accumulation
    and isodicentric (X)(q13) and sideroblastic anemia. Alternatively spliced transcript
    variants encoding multiple isoforms have been observed for this gene. [provided
    by RefSeq, Nov 2012]
  - The membrane-associated protein encoded by this gene is a member of the superfamily
    of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules
    across extra- and intracellular membranes. ABC genes are divided into seven distinct
    subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, and White). This encoded protein
    is a member of the ABC1 subfamily. Members of the ABC1 subfamily comprise the
    only major ABC subfamily found exclusively in multicellular eukaryotes. This gene
    is clustered among 4 other ABC1 family members on 17q24, but neither the substrate
    nor the function of this gene is known. Alternative splicing of this gene results
    in several transcript variants; however, not all variants have been fully described.
    [provided by RefSeq, Jul 2008]
- source_sentence: ABCC8
  sentences:
  - The protein encoded by this gene is a member of the superfamily of ATP-binding
    cassette (ABC) transporters. ABC proteins transport various molecules across extra-
    and intra-cellular membranes. ABC genes are divided into seven distinct subfamilies
    (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the
    MRP subfamily which is involved in multi-drug resistance. This protein functions
    as a modulator of ATP-sensitive potassium channels and insulin release. Mutations
    in the ABCC8 gene and deficiencies in the encoded protein have been observed in
    patients with hyperinsulinemic hypoglycemia of infancy, an autosomal recessive
    disorder of unregulated and high insulin secretion. Mutations have also been associated
    with non-insulin-dependent diabetes mellitus type II, an autosomal dominant disease
    of defective insulin secretion. Alternatively spliced transcript variants have
    been found for this gene. [provided by RefSeq, Jul 2020]
  - Predicted to enable GTPase activator activity and zinc ion binding activity. Predicted
    to be involved in protein transport. Located in membrane. [provided by Alliance
    of Genome Resources, Jul 2025]
  - The protein encoded by this gene is a member of the superfamily of ATP-binding
    cassette (ABC) transporters. ABC proteins transport various molecules across extra-
    and intra-cellular membranes. ABC genes are divided into seven distinct subfamilies
    (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This ABC full transporter is a
    member of the MRP subfamily which is involved in multi-drug resistance. The product
    of this gene participates in physiological processes involving bile acids, conjugated
    steroids, and cyclic nucleotides. In addition, a SNP in this gene is responsible
    for determination of human earwax type. This gene and family member ABCC12 are
    determined to be derived by duplication and are both localized to chromosome 16q12.1.
    Multiple alternatively spliced transcript variants have been described for this
    gene. [provided by RefSeq, Jul 2008]
- source_sentence: MALAT1 TMSB4X MT-CO1 MT-CO2 ACTB RPL13 MT-CO3 TPT1 EEF1A1 RPLP1
    MT-ATP6 S100A10 LGALS1 MT-ND4 VIM RPL18A SH3BGRL3 RPS24 S100A4 FTL PTMA RPS13
    RPS12 SRGN MT-ND1 RPL11 TMSB10 RPL28 RPS23 RPL10 CYBA RPS6 RPL32 RPL15 RPL19 RPS27A
    RPS4X RPL9 MT-ND2 RPL21 RPL30 RPS14 GAPDH RPL8 MT-CYB MT-ND3 RPL13A RPL10A CD74
    TAGLN2 RPL41 FTH1 RPS8 RPS7 RPL6 RPL34 RPS15A S100A6 RPL18 UBA52 YBX1 MYL6 RPL3
    RPS16
  sentences:
  - MALAT1 PTMA TMSB10 LGALS1 ACTB PRDX1 S100A4 MT-CO1 H3-3B TMSB4X MT-ATP6 MT-CO3
    VIM TPT1 LMO4 HNRNPA2B1 RPL10 SH3BGRL3 TAGLN2 MT-CO2 HNRNPU DDIT4 RPL28 PFN1 RPS3
    IGFBP7 RPS14 HMGB1 MT-ND4 RPS24 RPL13A FTH1 RPS7 CFL1 CD74 RPL30 RPL13 RPLP2 RPS4X
    RPL19 SOX4 KLF2 BST2 S100A11 RPL23A RACK1 RPL41 PSMA4 DDX5 NCL RSRP1 IRF1 RPS15A
    SERF2 EEF1A1 RPS23 CALM1 MT-CYB UBA52 CYBA HSP90AA1 MYL12A AHNAK ITM2B
  - This measurement was conducted with 10x 3' v3. This sample is derived from a 3-month-old
    male patient with KMT2A-rearranged (KMT2A-r) infant acute lymphoblastic leukemia
    (ALL) with a CD8_Cytotoxic T cell type, specifically T/NK cells, and a presumed
    MLL-AF4 fusion.
  - This measurement was conducted with 10x 3' v3. Blast cells derived from a 1-month-old
    human with a presumed MLL-AF10 fusion, projected as cDC-like cells.
- source_sentence: MALAT1 MT-ATP6 MT-CO3 MT-ND3 MT-CO1 MT-CYB CXCL14 MT-ND4 EEF1A1
    MT-ND1 VIM IGFBP7 RPL41 MT-CO2 COL1A2 FTH1 TPT1 RPL10 RPLP1 RPS12 RPS27 S100A6
    TMSB4X A2M APOE DCN RPL13 PTGDS TMSB10 LGALS1 RPL3 RPS19 MT-ND2 ACTB FBLN1 FTL
    RPL34 RPL21 RPS27A RPL30 RPS14 RPS28 RPL19 RPL11 RPS8 RPL32 RPL26 RARRES2 RPL28
    RPL37 RPS4X RPS15A RPL13A RPL35A RPS23 RPS16 RPL18A CD81 CALD1 CD63 RPS6 COL6A2
    RPL12 RPL6
  sentences:
  - MALAT1 MT-ATP6 MT-ND3 MT-CO1 MT-CO3 MT-CYB MT-CO2 MT-ND4 TMSB10 A2M FABP5 PTMA
    MT-ND1 VIM ACTB RPL13 RPS28 CAV1 SPARCL1 MT-ND2 CD74 EEF1A1 RPL41 KLF2 IFITM3
    RPL10 RPS27 CLDN5 TMSB4X TPT1 RPL15 ENPP2 RPLP1 RPL32 RPS14 TM4SF1 FOS EIF1 S100A6
    CALM1 RPL3 CD81 HES1 SRGN ID1 GNG11 IGFBP4 RPS27A STOM GSN TAGLN2 RPL8 IGFBP7
    CD320 FTH1 MCAM HSP90AA1 GNAS MYL6 TIMP3 RPL19 RPL34 RPS12 EPAS1
  - This measurement was conducted with 10x 3' v3. Fibroblasts derived from the terminal
    ileum of a female individual in her fourth decade, exhibiting Crohn's disease
    (CD) related changes.
  - This measurement was conducted with 10x 3' v3. Glial cells derived from the ileal
    epithelium of a female in her fourth decade.
- source_sentence: MALAT1 RPS27 RPL41 RPL10 RPL13 RPL21 TMSB4X RPL34 RPL13A RPS12
    RPLP1 EEF1A1 RPL32 RPS6 RPS14 RPS27A RPS4X RPS29 RPLP2 RPS19 RPL11 RPL23A RPL31
    RPS15A RPS3 RPL28 RPL27A RPL18A RPS23 RPL19 RPS28 RPS15 TMSB10 RPL7 RPL30 RPL3
    RPS8 RPL35A RPS13 RPL26 RPL15 RPL9 RPL12 RPL10A RPL37 RPS20 RPS16 RPL18 RPS5 RPL36
    RPS24 RPL8 RPL6 TPT1 RPL35 FAU RPL29 RPL37A RPSA RPL14 MT-CO3 RPL27 RPS7 RPL38
  sentences:
  - This measurement was conducted with 10x 3' v2. Classical monocyte cell sample
    from blood of a 64-year old female Asian individual with managed systemic lupus
    erythematosus (SLE).
  - This measurement was conducted with 10x 3' v2. Sample is a CD8-positive, alpha-beta
    T cell from a 29-year old Asian female with managed systemic lupus erythematosus
    (SLE). The cell was isolated from peripheral blood mononuclear cells.
  - MALAT1 RPS27 TMSB4X RPS29 RPL41 RPL10 RPL21 RPL13A RPL34 RPS12 RPL13 RPLP2 RPS15A
    EEF1A1 RPS27A RPS14 RPLP1 RPL32 RPS28 RPS6 RPS19 RPL27A RPL23A RPL11 RPL28 RPS8
    RPS3 RPL18A RPS15 RPL37 RPS4X MT-CO1 RPL31 RPL3 RPL19 RPL30 RPL26 RPL35A RPS23
    MT-CO3 RPS20 TMSB10 RPL36 RPL7 RPL9 RPS26 RPL37A TPT1 MT-CO2 MT-ND2 MT-ND3 RPS16
    RPS13 RPL35 RPS24 RPL38 RPL12 RPL18 RPL6 RPL15 PTMA RPL14 RPL10A MT-ND4
datasets:
- jo-mengr/cellxgene_pseudo_bulk_100k_multiplets_natural_language_annotation
- jo-mengr/descriptions_genes
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy
model-index:
- name: SentenceTransformer based on NeuML/pubmedbert-base-embeddings
  results:
  - task:
      type: triplet
      name: Triplet
    dataset:
      name: cellxgene pseudo bulk 100k multiplets natural language annotation cell
        sentence 2
      type: cellxgene_pseudo_bulk_100k_multiplets_natural_language_annotation_cell_sentence_2
    metrics:
    - type: cosine_accuracy
      value: 0.7920319437980652
      name: Cosine Accuracy
  - task:
      type: triplet
      name: Triplet
    dataset:
      name: gene description
      type: gene_description
    metrics:
    - type: cosine_accuracy
      value: 0.8550000190734863
      name: Cosine Accuracy
---

# SentenceTransformer based on NeuML/pubmedbert-base-embeddings

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [NeuML/pubmedbert-base-embeddings](https://huggingface.co/NeuML/pubmedbert-base-embeddings) on the [cellxgene_pseudo_bulk_100k_multiplets_natural_language_annotation](https://huggingface.co/datasets/jo-mengr/cellxgene_pseudo_bulk_100k_multiplets_natural_language_annotation) and [gene_description](https://huggingface.co/datasets/jo-mengr/descriptions_genes) datasets. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [NeuML/pubmedbert-base-embeddings](https://huggingface.co/NeuML/pubmedbert-base-embeddings) <!-- at revision d6eaca8254bc229f3ca42749a5510ae287eb3486 -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Datasets:**
    - [cellxgene_pseudo_bulk_100k_multiplets_natural_language_annotation](https://huggingface.co/datasets/jo-mengr/cellxgene_pseudo_bulk_100k_multiplets_natural_language_annotation)
    - [gene_description](https://huggingface.co/datasets/jo-mengr/descriptions_genes)
- **Language:** code
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): MMContextEncoder(
    (text_encoder): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(30522, 768, padding_idx=0)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSdpaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
    (pooling): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  )
)
```
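The `Pooling` configuration above uses mean pooling (`pooling_mode_mean_tokens: True`): token embeddings from the BERT encoder are averaged, with padded positions excluded via the attention mask. A minimal sketch of masked mean pooling on toy numpy tensors (illustrative only, not the library's implementation):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over the sequence axis, ignoring padded positions."""
    mask = attention_mask[..., None].astype(float)   # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)   # (batch, 1), avoid division by zero
    return summed / counts

# Toy batch: 2 sequences, 3 tokens, 4 dims; the second sequence has one padded token
tokens = np.ones((2, 3, 4))
mask = np.array([[1, 1, 1], [1, 1, 0]])
pooled = mean_pool(tokens, mask)
print(pooled.shape)  # (2, 4)
```

Because padding is masked out before averaging, the pooled vector depends only on real tokens, regardless of sequence length.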

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("jo-mengr/mmcontext-pubmedbert-scvi_fm-v3")
# Run inference
sentences = [
    'MALAT1 RPS27 RPL41 RPL10 RPL13 RPL21 TMSB4X RPL34 RPL13A RPS12 RPLP1 EEF1A1 RPL32 RPS6 RPS14 RPS27A RPS4X RPS29 RPLP2 RPS19 RPL11 RPL23A RPL31 RPS15A RPS3 RPL28 RPL27A RPL18A RPS23 RPL19 RPS28 RPS15 TMSB10 RPL7 RPL30 RPL3 RPS8 RPL35A RPS13 RPL26 RPL15 RPL9 RPL12 RPL10A RPL37 RPS20 RPS16 RPL18 RPS5 RPL36 RPS24 RPL8 RPL6 TPT1 RPL35 FAU RPL29 RPL37A RPSA RPL14 MT-CO3 RPL27 RPS7 RPL38',
    "This measurement was conducted with 10x 3' v2. Sample is a CD8-positive, alpha-beta T cell from a 29-year old Asian female with managed systemic lupus erythematosus (SLE). The cell was isolated from peripheral blood mononuclear cells.",
    "This measurement was conducted with 10x 3' v2. Classical monocyte cell sample from blood of a 64-year old female Asian individual with managed systemic lupus erythematosus (SLE).",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7308, 0.4065],
#         [0.7308, 1.0000, 0.5852],
#         [0.4065, 0.5852, 1.0000]])
```

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Triplet

* Datasets: `cellxgene_pseudo_bulk_100k_multiplets_natural_language_annotation_cell_sentence_2` and `gene_description`
* Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)

| Metric              | cellxgene_pseudo_bulk_100k_multiplets_natural_language_annotation_cell_sentence_2 | gene_description |
|:--------------------|:----------------------------------------------------------------------------------|:-----------------|
| **cosine_accuracy** | **0.792**                                                                         | **0.855**        |
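Cosine accuracy here is the fraction of evaluation triplets for which the anchor embedding is closer (by cosine similarity) to its positive than to its negative. A minimal sketch of the metric on toy embeddings (illustrative only; the actual evaluation uses `TripletEvaluator`):

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def triplet_cosine_accuracy(anchors, positives, negatives) -> float:
    """Fraction of triplets where the anchor is closer to its positive than its negative."""
    hits = sum(
        cosine_sim(a, p) > cosine_sim(a, n)
        for a, p, n in zip(anchors, positives, negatives)
    )
    return hits / len(anchors)

# Toy embeddings: 3 triplets in 2 dimensions
anchors   = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
positives = np.array([[0.9, 0.1], [0.1, 0.9], [1.0, 0.9]])
negatives = np.array([[0.0, 1.0], [1.0, 0.0], [-1.0, 1.0]])
acc = triplet_cosine_accuracy(anchors, positives, negatives)
print(acc)  # 1.0
```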

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Datasets

#### cellxgene_pseudo_bulk_100k_multiplets_natural_language_annotation

* Dataset: [cellxgene_pseudo_bulk_100k_multiplets_natural_language_annotation](https://huggingface.co/datasets/jo-mengr/cellxgene_pseudo_bulk_100k_multiplets_natural_language_annotation) at [d518eb2](https://huggingface.co/datasets/jo-mengr/cellxgene_pseudo_bulk_100k_multiplets_natural_language_annotation/tree/d518eb24af305653b43acd9e26f9502632059e7c)
* Size: 81,143 training samples
* Columns: <code>anchor</code>, <code>positive</code>, <code>negative_1</code>, and <code>negative_2</code>
* Approximate statistics based on the first 1000 samples:
  |         | anchor                                                                                            | positive                                                                                         | negative_1                                                                                         | negative_2                                                                                        |
  |:--------|:--------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------|
  | type    | string                                                                                            | string                                                                                           | string                                                                                             | string                                                                                            |
  | details | <ul><li>min: 365 characters</li><li>mean: 389.52 characters</li><li>max: 450 characters</li></ul> | <ul><li>min: 92 characters</li><li>mean: 216.13 characters</li><li>max: 900 characters</li></ul> | <ul><li>min: 103 characters</li><li>mean: 212.72 characters</li><li>max: 1186 characters</li></ul> | <ul><li>min: 358 characters</li><li>mean: 389.11 characters</li><li>max: 433 characters</li></ul> |
* Samples:
  | anchor                                                                                                                                                                                                                                                                                                                                                                                                                           | positive                                                                                                                                                                                                                                                     | negative_1                                                                                                                                                                                                                                                   | negative_2                                                                                                                                                                                                                                                                                                                                                                                                          |
  |:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
  | <code>TMSB4X TMSB10 ACTB RPL13A MT-CO3 MALAT1 GNLY RPS15A RPS27 NKG7 IFITM2 RPL12 RPL23A MT-CO2 RPS19 RPS3 RPLP2 RPL28 RPL6 LGALS1 RPL21 RPS6 RPLP1 GZMA EEF1A1 RPL26 RPL37A RPS29 PFN1 RPL34 RPS15 RPS24 RPL11 RPL32 HMGB2 FTH1 RPS23 PTMA MT-CO1 RPL39 RPS20 HSP90AA1 GZMB RPL19 ARHGDIB HNRNPA2B1 PLAAT4 RPS8 RPL37 RPL10 FAU CMC1 RPL41 VIM RPL31 RPL3 MYL12A RPS16 RPL5 CBX3 ATP5F1E HCST RPL27 RPL35</code>                | <code>This measurement was conducted with 10x 3' v2. A proliferating lymphocyte cell sample, obtained from a 34-year-old female Asian individual, derived from peripheral blood mononuclear cells.</code>                                                    | <code>This measurement was conducted with 10x 3' v2. Sample is a CD8-positive, alpha-beta T cell derived from a 31-year-old Asian female's peripheral blood mononuclear cells.</code>                                                                        | <code>MALAT1 RPS27 RPL41 RPL34 RPL21 RPL10 TMSB4X RPL13 RPL13A RPL32 RPS12 RPLP1 RPS29 RPS14 RPS6 EEF1A1 RPS27A RPLP2 RPS19 RPS4X RPS28 RPL39 RPS15A RPL11 RPL27A RPL23A RPS15 RPL18A RPL12 RPL31 RPL26 RPL28 RPL19 RPS8 RPS3 RPL3 RPL36 RPL7 RPL30 RPS23 TMSB10 RPL37 RPL35A RPS13 RPL15 RPL10A MT-CO3 RPS20 RPL18 RPL35 RPL9 RPS16 RPS24 RPS21 RPL37A MT-CO2 RPL29 RPS5 RPL6 RPL8 RPL38 RPL14 MT-CO1 RPL27</code> |
  | <code>EEF1A1 MALAT1 RPL10 RPS27 RPS12 RPLP1 MT2A RPL41 RPL39 RPL30 MT-ND4L FTH1 RPL13 MT-CO2 RPL32 JUNB RPL28 RPS19 RPL34 TPT1 RPS28 RPS15A RPS27A MT-CYB RPS3 RPS23 RPS4X RPL11 RPS8 RPS14 RPS15 RPL37 RPL5 RPS21 RPS13 FOS RPL19 MT-ND3 RPS29 RPL26 RPL3 RPL18A RPL8 MT-CO1 TMSB10 RPL35A RPL14 RPS6 RPL29 MT-ATP8 RPLP2 RPL36 BTG1 RPL23A RPL18 RPL6 RPSA TMSB4X ZFP36L2 NACA PABPC1 ACTB RPS7 MT-CO3</code>                  | <code>This measurement was conducted with 10x 5' v1. Sample is a cell from the omentum tissue, specifically an effector memory CD4-positive, alpha-beta T cell, from a female in her sixth decade.</code>                                                    | <code>This measurement was conducted with 10x 5' v1. Sample is a CD4-positive helper T cell, specifically Trm_Th1/Th17 subset, derived from the duodenum tissue of a male individual in his sixth decade.</code>                                             | <code>MALAT1 MT-ATP6 MT-CO2 RPLP1 MT-CO1 TPT1 MT-CO3 RPS27 MT-CYB MT-ND3 RPL41 RPL10 MT-ND4 EEF1A1 VIM JUND TMSB4X RPS12 RPL13 PTMA RPL39 FTH1 RPS27A RPL30 RPS29 RPL32 RPL34 RPS19 RPL28 RPS15A RPL21 RPL37 MT-ND2 CRIP1 ANXA1 RPL11 RPS14 RPS28 RPS6 RPS8 RPS3 EIF1 RPS23 RPS13 RPS24 UBC MT-ND1 RPL19 RPS15 H3-3B RPL26 RPL9 RPS21 RPLP2 RPL35A RPL37A RPL12 RPS4X ACTB RPL3 RPS16 SRGN RPL36 RPL13A</code>      |
  | <code>MALAT1 GRIK1 SYT1 PCDH9 RORA NRG1 CADPS ZFPM2 LRRC4C LINGO2 RALYL PTPRD SPHKAP CNTNAP5 SLC8A1 CCSER1 HDAC9 CELF2 R3HDM1 CNTN4 RBMS3 PCDH7 GALNT13 UNC5D ROBO1 SYNPR SNAP25 GPM6A ANK3 FRMPD4 CHRM2 RYR2 KHDRBS2 CADM1 CACNA1D RGS6 PDE4D DOCK4 UNC13C CDH18 FAT3 MEG3 NR2F2-AS1 HMCN1 GULP1 CAMK2D ZEB1 SYN2 DYNC1I1 OXR1 DPP10 OSBPL6 FRAS1 PPP3CA ZNF385D ZMAT4 PCBP3 HS6ST3 ERC2 PLEKHA5 CDK14 MAP2 NCOA1 ATP8A2</code> | <code>This measurement was conducted with 10x 3' v3. Neuron cell type from a 29-year-old male, specifically from the thalamic complex, specifically the thalamus (THM) - posterior nuclear complex of thalamus (PoN) - medial geniculate nuclei (MG).</code> | <code>This measurement was conducted with 10x 3' v3. Astrocyte cell type from the thalamic complex, specifically from the thalamus (THM) - posterior nuclear complex of thalamus (PoN) - medial geniculate nuclei (MG) region, of a 42-year-old male.</code> | <code>MALAT1 PCDH9 PLP1 MBP ST18 QKI PDE4B RNF220 PTPRD SEPTIN7 TTLL7 NCKAP5 GPM6B PIP4K2A MOBP SLC44A1 PTGDS PLCL1 MAP7 ELMO1 SIK3 FTH1 TMTC2 ZBTB20 MAN2A1 TMEM165 DOCK10 TCF12 EDIL3 ZEB2 DPYD MAP4K4 PHLPP1 TF GAB1 TRIM2 FRMD4B DNAJC6 MARCHF1 ANK3 DST AGAP1 TMEM144 NEAT1 PLEKHH1 DLG1 CRYAB ERBIN RTN4 SPP1 ATP8A1 DOCK4 SLAIN1 APP DOCK5 APBB2 SAMD12 SHTN1 ZNF536 ZFYVE16 ARAP2 LIMCH1 HIPK2 BCAS1</code> |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```
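MultipleNegativesRankingLoss treats every other positive in the batch as an in-batch negative: it builds a (batch × batch) matrix of cosine similarities, multiplies by the scale (20.0 here), and applies cross-entropy with the diagonal (each anchor's own positive) as the target. A minimal numpy sketch on toy embeddings, under the parameters shown above (not the library's implementation):

```python
import numpy as np

def mnrl(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """Multiple-negatives ranking loss: cross-entropy over scaled cosine
    similarities, with each anchor's own positive (the diagonal) as target."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = scale * (a @ p.T)                    # (batch, batch) scaled cosine similarities
    scores -= scores.max(axis=1, keepdims=True)   # numerically stable softmax
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-log_probs.diagonal().mean())

# Toy batch of 2 anchor/positive pairs; each off-diagonal pair serves as a negative
anchors   = np.array([[1.0, 0.0], [0.0, 1.0]])
positives = np.array([[0.9, 0.1], [0.1, 0.9]])
loss = mnrl(anchors, positives)
```

With well-separated pairs like these, the diagonal dominates after scaling and the loss approaches zero; mismatched pairs drive it up, which is what pushes anchors toward their positives during training.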

#### gene_description

* Dataset: [gene_description](https://huggingface.co/datasets/jo-mengr/descriptions_genes) at [dd22363](https://huggingface.co/datasets/jo-mengr/descriptions_genes/tree/dd22363de0a7c501f41ba324fb3b8d6ecdd14dc7)
* Size: 116,208 training samples
* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative_1</code>
* Approximate statistics based on the first 1000 samples:
  |         | anchor                                                                                       | positive                                                                                          | negative_1                                                                                        |
  |:--------|:---------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------|
  | type    | string                                                                                       | string                                                                                            | string                                                                                            |
  | details | <ul><li>min: 3 characters</li><li>mean: 5.88 characters</li><li>max: 12 characters</li></ul> | <ul><li>min: 16 characters</li><li>mean: 367.09 characters</li><li>max: 1375 characters</li></ul> | <ul><li>min: 13 characters</li><li>mean: 167.33 characters</li><li>max: 1375 characters</li></ul> |
* Samples:
  | anchor            | positive                                                                                                                                                                                                                                          | negative_1                        |
  |:------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------|
  | <code>A1BG</code> | <code>The protein encoded by this gene is a plasma glycoprotein of unknown function. The protein shows sequence similarity to the variable regions of some immunoglobulin supergene family member proteins. [provided by RefSeq, Jul 2008]</code> | <code>A1BG antisense RNA 1</code> |
  | <code>A1BG</code> | <code>The protein encoded by this gene is a plasma glycoprotein of unknown function. The protein shows sequence similarity to the variable regions of some immunoglobulin supergene family member proteins. [provided by RefSeq, Jul 2008]</code> | <code>G antigen 12D</code>        |
  | <code>A1BG</code> | <code>The protein encoded by this gene is a plasma glycoprotein of unknown function. The protein shows sequence similarity to the variable regions of some immunoglobulin supergene family member proteins. [provided by RefSeq, Jul 2008]</code> | <code>G antigen 12B</code>        |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```
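MultipleNegativesRankingLoss is the standard in-batch-negatives objective: for each anchor, its paired positive is scored against every other candidate in the batch (the positives of other anchors, plus any explicit negatives), and a softmax cross-entropy is taken over the cosine similarities scaled by `scale` (20.0 here). A minimal pure-Python sketch of that computation — illustrative only, not the sentence-transformers implementation:

```python
import math

def cos_sim(u, v):
    # Cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mnr_loss(anchors, positives, scale=20.0):
    """Multiple-negatives ranking loss over a batch of (anchor, positive)
    embedding pairs: the positives of the other anchors serve as
    in-batch negatives for each anchor."""
    total = 0.0
    for i, a in enumerate(anchors):
        # Scaled similarities of anchor i against every candidate positive.
        logits = [scale * cos_sim(a, p) for p in positives]
        # Softmax cross-entropy with the matching positive (index i) as label.
        log_z = math.log(sum(math.exp(l) for l in logits))
        total += log_z - logits[i]
    return total / len(anchors)
```

With well-aligned pairs the matching similarity dominates the softmax and the loss approaches zero; swapping the pairings drives it toward `scale` times the similarity gap.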

### Evaluation Datasets

#### cellxgene_pseudo_bulk_100k_multiplets_natural_language_annotation

* Dataset: [cellxgene_pseudo_bulk_100k_multiplets_natural_language_annotation](https://huggingface.co/datasets/jo-mengr/cellxgene_pseudo_bulk_100k_multiplets_natural_language_annotation) at [d518eb2](https://huggingface.co/datasets/jo-mengr/cellxgene_pseudo_bulk_100k_multiplets_natural_language_annotation/tree/d518eb24af305653b43acd9e26f9502632059e7c)
* Size: 9,011 evaluation samples
* Columns: <code>anchor</code>, <code>positive</code>, <code>negative_1</code>, and <code>negative_2</code>
* Approximate statistics based on the first 1000 samples:
  |         | anchor                                                                                            | positive                                                                                         | negative_1                                                                                       | negative_2                                                                                        |
  |:--------|:--------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------|
  | type    | string                                                                                            | string                                                                                           | string                                                                                           | string                                                                                            |
  | details | <ul><li>min: 363 characters</li><li>mean: 390.19 characters</li><li>max: 437 characters</li></ul> | <ul><li>min: 99 characters</li><li>mean: 209.99 characters</li><li>max: 941 characters</li></ul> | <ul><li>min: 101 characters</li><li>mean: 208.8 characters</li><li>max: 728 characters</li></ul> | <ul><li>min: 356 characters</li><li>mean: 390.44 characters</li><li>max: 433 characters</li></ul> |
* Samples:
  | anchor                                                                                                                                                                                                                                                                                                                                                                                                                     | positive                                                                                                                                                                                                                                                                    | negative_1                                                                                                                                                                                                                                                | negative_2                                                                                                                                                                                                                                                                                                                                                                                                                        |
  |:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
  | <code>MT-CO1 MT-CO2 MT-ND3 MT-CO3 MT-ND4 MT-ATP6 MT-CYB MT-ND2 MALAT1 MT-ND1 RPL41 EEF1A1 RPS12 RPLP1 MT-ND4L RPS24 MT-ND5 RPL34 RPS27 RPL10 RPL32 RPL13 RPL11 RPS28 RPL28 RPS15 RPL36 RPL30 RPS27A RPL26 FTH1 TMSB4X ACTB FTL RPL24 RTN4 RPS15A RPS14 RPL3 RPS19 ATP6V0B TPT1 RPS8 FAU RPL8 RPS23 S100A6 RPL18 RPL18A RPL19 RPS13 RPL35 RPS6 RPL37 RPS3 RPL35A NDUFA4 RPS4X RPL21 ATP5F1E COX7C ITM2B RPL29 IGFBP7</code> | <code>This measurement was conducted with 10x 3' v3. Cell sample from the cortex of kidney, taken from a 43-year-old male of European ethnicity with a reported history of kidney cancer. The cell type is identified as a kidney collecting duct intercalated cell.</code> | <code>This measurement was conducted with 10x 3' v3. Cell sample from the cortex of kidney, taken from a 72-year-old male of European ethnicity, identified as a kidney collecting duct intercalated cell, and preserved through cryopreservation.</code> | <code>MALAT1 MT-CO1 MT-ND3 MT-ATP6 MT-CO2 MT-CO3 MT-ND4 MT-CYB TMSB4X MT-ND2 RPS27 TMSB10 RPL10 MT-ND1 RPL41 ACTB TXNIP RPS27A RPS12 EEF1A1 RPL13 RPS19 RPL32 RPS3 RPL28 RPS15A RPLP1 RPS24 RPL30 RPS29 RPL34 RPL11 RPL26 RPS23 MT-ND5 RPL19 RPS15 RPL18 RPL3 TPT1 RPL37 RPS14 RPS28 PFN1 BTG1 RPS6 FAU RPL9 RPL15 PTMA S100A4 MT-ND4L ATP5F1E RPS8 RPL27A RPS21 RPS7 EIF1 RPL14 RPL12 RPS4X RPL23A FTL RPL18A</code>             |
  | <code>MALAT1 KCND2 NRXN1 CDH18 NRXN3 ZNF385D CADM2 RALYL NKAIN2 CADPS2 RIMS1 FSTL5 GRID2 TRPM3 CHN2 DPP6 JMJD1C RORA PDE1A UNC13C TIAM1 NRG1 SNAP25 ZFPM2 CALN1 LSAMP CNTN1 ABLIM1 SYNE1 ANK3 CA10 NFIA ZBTB20 NTM CADM1 OPCML RELN DNM3 MT-CO3 NEBL ERC1 SCN2A PPP3CA CACNA1A GALNT13 LRRC4C GPM6A RABGAP1L RIT2 CAMK4 GRIA4 PTPRD RBFOX3 MCTP1 LHFPL6 PCLO MEG3 PDE10A NOVA1 RTN1 ZNF385B CNTN4 GABRB2 SPOCK1</code>     | <code>This measurement was conducted with 10x 3' v3. Neuron cell type from a 29-year-old male cerebellum, specifically from the Cerebellar Vermis - CBV region, with European self-reported ethnicity, analyzed at the nucleus level.</code>                                | <code>This measurement was conducted with 10x 3' v3. Sample is an oligodendrocyte precursor cell taken from the cerebellum tissue of a 42-year-old human male, specifically from the Cerebellum (CB) - Cerebellar Vermis - CBV dissection.</code>         | <code>MALAT1 NRXN3 SNTG1 UNC5C GRIA4 NRG1 RORA INPP4B CLSTN2 NKAIN2 FRMD4A DPP6 GRID2 NRXN1 LSAMP JMJD1C HS6ST3 NXPH1 MIR99AHG LRRC4C NTM CCNH NFIA ZFPM2 AFF3 OPCML PTPRT CADM2 ZBTB20 OLFM3 SLC22A3 CNTNAP5 CACNA2D3 CNTN4 KCND2 ADARB2 XKR4 GPM6A IL1RAPL1 ALK ANKRD36C UBE2E2 SYN3 GARNL3 PTPRG DAB1 TCF4 LINC00461 PRANCR GRIN2B TNRC6B MAPK10 NOVA1 NFIB ANK3 KCNMA1 KCNQ5 SPON1 TRIM9 VWA8 GDAP1 GABRG2 AHI1 ATP1B1</code> |
  | <code>EEF1A1 RPL28 RPLP1 RPS8 RPL10 ACTB RPL41 RPS4X GAPDH RPS27 RPS15A RPS23 RPS12 RPS3 RPLP0 RPS7 RPL11 RPL32 RPS24 RPL12 HMGN2 RPS19 RPL34 RPS28 RPL8 PTMA RPS13 RPL19 RPL37 RPL30 RPL6 RPS14 RPL15 SERF2 RPL18A RPLP2 TMSB4X RPS6 CD74 RPL29 RPL13 RPL18 RPS15 RPSA RPL26 PABPC1 RPS27A FTH1 RPL5 TMSB10 RPS21 RPL14 FAU RPL23A PFN1 RPL35A RPS5 RPS16 HMGN1 OAZ1 HMGB1 TPT1 PPIA NACA</code>                          | <code>This measurement was conducted with 10x 5' v1. Cell sample from the tonsil of a 9-year-old female with recurrent tonsillitis, characterized as a centroblast B cell with IGLC2, IGLV7-43, IGLJ3 immunoglobulin genes expressed.</code>                                | <code>This measurement was conducted with 10x 5' v1. Germinal center B cell derived from the tonsil tissue of a 3-year-old male with recurrent tonsillitis.</code>                                                                                        | <code>CD74 RPL10 MALAT1 EEF1A1 RPLP1 RPL28 RPL41 RPL13 RPS8 SSR4 TPT1 RPLP0 RPS15A RPL18A UBC RPL37 RPS12 EEF2 RPL19 RPS4X RPL3 RPS27 RPS23 RPL11 RPS28 SAT1 RPS3 RPL34 RPS13 RACK1 RPL29 RPL32 RPS7 RPS19 RPL18 RPL8 RPL30 RPL12 RPS15 RPS14 RPS6 SEC11C RPL15 RPS5 ATP5MG RPL23A RPL35A RPS27A FAU TSC22D3 RPL6 PPIB XBP1 FTL GAPDH RPL5 HLA-DRB5 RPL14 HERPUD1 RGS2 HSPA8 RPL36 RPL26 RPL9</code>                              |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```

#### gene_description

* Dataset: [gene_description](https://huggingface.co/datasets/jo-mengr/descriptions_genes) at [dd22363](https://huggingface.co/datasets/jo-mengr/descriptions_genes/tree/dd22363de0a7c501f41ba324fb3b8d6ecdd14dc7)
* Size: 1,000 evaluation samples
* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative_1</code>
* Approximate statistics based on the first 1000 samples:
  |         | anchor                                                                                       | positive                                                                                          | negative_1                                                                                        |
  |:--------|:---------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------|
  | type    | string                                                                                       | string                                                                                            | string                                                                                            |
  | details | <ul><li>min: 3 characters</li><li>mean: 5.88 characters</li><li>max: 12 characters</li></ul> | <ul><li>min: 16 characters</li><li>mean: 367.09 characters</li><li>max: 1375 characters</li></ul> | <ul><li>min: 13 characters</li><li>mean: 167.33 characters</li><li>max: 1375 characters</li></ul> |
* Samples:
  | anchor            | positive                                                                                                                                                                                                                                          | negative_1                        |
  |:------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------|
  | <code>A1BG</code> | <code>The protein encoded by this gene is a plasma glycoprotein of unknown function. The protein shows sequence similarity to the variable regions of some immunoglobulin supergene family member proteins. [provided by RefSeq, Jul 2008]</code> | <code>A1BG antisense RNA 1</code> |
  | <code>A1BG</code> | <code>The protein encoded by this gene is a plasma glycoprotein of unknown function. The protein shows sequence similarity to the variable regions of some immunoglobulin supergene family member proteins. [provided by RefSeq, Jul 2008]</code> | <code>G antigen 12D</code>        |
  | <code>A1BG</code> | <code>The protein encoded by this gene is a plasma glycoprotein of unknown function. The protein shows sequence similarity to the variable regions of some immunoglobulin supergene family member proteins. [provided by RefSeq, Jul 2008]</code> | <code>G antigen 12B</code>        |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 128
- `per_device_eval_batch_size`: 128
- `learning_rate`: 2e-05
- `num_train_epochs`: 4
- `warmup_ratio`: 0.1
- `bf16`: True
- `gradient_checkpointing`: True
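Assuming the sentence-transformers v3+ training API, the non-default values above correspond to a training-arguments setup along these lines (a hedged sketch; the `output_dir` is a placeholder, not taken from this run):

```python
from sentence_transformers import SentenceTransformerTrainingArguments

# Sketch of the non-default hyperparameters listed above.
args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    learning_rate=2e-5,
    num_train_epochs=4,
    warmup_ratio=0.1,
    bf16=True,
    gradient_checkpointing=True,
)
```

All other fields (listed under "All Hyperparameters" below) keep their library defaults.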

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 128
- `per_device_eval_batch_size`: 128
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 4
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: True
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional
- `router_mapping`: {}
- `learning_rate_mapping`: {}

</details>

### Training Logs
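The cosine-accuracy columns in the log table report, for each evaluation triplet, whether the anchor embedding is closer (by cosine similarity) to its positive than to its negative. A minimal sketch of that metric — illustrative, not the evaluator's actual source:

```python
import math

def cosine_accuracy(triplets):
    """Fraction of (anchor, positive, negative) embedding triplets where
    the anchor is more cosine-similar to the positive than to the negative."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u))
                      * math.sqrt(sum(b * b for b in v)))
    hits = sum(1 for a, p, n in triplets if cos(a, p) > cos(a, n))
    return hits / len(triplets)
```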
<details><summary>Click to expand</summary>

| Epoch  | Step | Training Loss | cellxgene pseudo bulk 100k multiplets natural language annotation loss | gene description loss | cellxgene_pseudo_bulk_100k_multiplets_natural_language_annotation_cell_sentence_2_cosine_accuracy | gene_description_cosine_accuracy |
|:------:|:----:|:-------------:|:----------------------------------------------------------------------:|:---------------------:|:-------------------------------------------------------------------------------------------------:|:--------------------------------:|
| 0.0324 | 50   | 11.1118       | 19.5753                                                                | 5.9302                | 0.5109                                                                                            | 0.1640                           |
| 0.0649 | 100  | 8.5954        | 18.3310                                                                | 5.4544                | 0.5140                                                                                            | 0.1800                           |
| 0.0973 | 150  | 9.2422        | 15.3028                                                                | 4.9547                | 0.5157                                                                                            | 0.2050                           |
| 0.1297 | 200  | 7.4027        | 12.1164                                                                | 4.6149                | 0.5179                                                                                            | 0.3010                           |
| 0.1621 | 250  | 6.3683        | 8.7628                                                                 | 4.4274                | 0.5128                                                                                            | 0.3680                           |
| 0.1946 | 300  | 4.8876        | 7.1226                                                                 | 4.3115                | 0.5173                                                                                            | 0.4290                           |
| 0.2270 | 350  | 4.2794        | 6.1769                                                                 | 4.1100                | 0.5200                                                                                            | 0.5230                           |
| 0.2594 | 400  | 3.9819        | 5.5841                                                                 | 4.0913                | 0.5491                                                                                            | 0.5470                           |
| 0.2918 | 450  | 3.3978        | 5.4411                                                                 | 3.9073                | 0.5835                                                                                            | 0.5910                           |
| 0.3243 | 500  | 3.382         | 5.3812                                                                 | 3.7190                | 0.6073                                                                                            | 0.6380                           |
| 0.3567 | 550  | 3.3258        | 5.1994                                                                 | 3.6217                | 0.6317                                                                                            | 0.6570                           |
| 0.3891 | 600  | 3.1445        | 5.0669                                                                 | 3.5130                | 0.6525                                                                                            | 0.6790                           |
| 0.4215 | 650  | 2.821         | 5.1486                                                                 | 3.4302                | 0.6592                                                                                            | 0.6960                           |
| 0.4540 | 700  | 3.1259        | 5.1082                                                                 | 3.3893                | 0.6745                                                                                            | 0.6940                           |
| 0.4864 | 750  | 2.5501        | 5.0555                                                                 | 3.3233                | 0.6844                                                                                            | 0.6980                           |
| 0.5188 | 800  | 2.7482        | 4.8770                                                                 | 3.2845                | 0.6963                                                                                            | 0.7350                           |
| 0.5512 | 850  | 3.0687        | 4.8827                                                                 | 3.2678                | 0.7028                                                                                            | 0.7250                           |
| 0.5837 | 900  | 2.7547        | 4.7859                                                                 | 3.2468                | 0.7092                                                                                            | 0.7140                           |
| 0.6161 | 950  | 2.5732        | 4.7323                                                                 | 3.2219                | 0.7160                                                                                            | 0.7300                           |
| 0.6485 | 1000 | 2.5944        | 4.8185                                                                 | 3.1714                | 0.7169                                                                                            | 0.7540                           |
| 0.6809 | 1050 | 2.5687        | 4.6262                                                                 | 3.1597                | 0.7253                                                                                            | 0.7360                           |
| 0.7134 | 1100 | 2.8425        | 4.6943                                                                 | 3.1093                | 0.7343                                                                                            | 0.7560                           |
| 0.7458 | 1150 | 2.3715        | 4.6413                                                                 | 3.1107                | 0.7327                                                                                            | 0.7480                           |
| 0.7782 | 1200 | 2.6028        | 4.5452                                                                 | 3.1065                | 0.7397                                                                                            | 0.7490                           |
| 0.8106 | 1250 | 2.6916        | 4.5529                                                                 | 3.0629                | 0.7426                                                                                            | 0.7710                           |
| 0.8431 | 1300 | 2.5536        | 4.5393                                                                 | 3.0937                | 0.7442                                                                                            | 0.7730                           |
| 0.8755 | 1350 | 2.3964        | 4.5170                                                                 | 3.0533                | 0.7474                                                                                            | 0.7750                           |
| 0.9079 | 1400 | 2.5294        | 4.4737                                                                 | 3.0284                | 0.7514                                                                                            | 0.7810                           |
| 0.9403 | 1450 | 2.3428        | 4.5252                                                                 | 3.0048                | 0.7523                                                                                            | 0.7860                           |
| 0.9728 | 1500 | 2.2832        | 4.4570                                                                 | 3.0046                | 0.7546                                                                                            | 0.7880                           |
| 1.0052 | 1550 | 2.4838        | 4.4645                                                                 | 2.9743                | 0.7560                                                                                            | 0.7870                           |
| 1.0376 | 1600 | 2.2069        | 4.4958                                                                 | 2.9889                | 0.7565                                                                                            | 0.7900                           |
| 1.0700 | 1650 | 2.1644        | 4.4804                                                                 | 2.9450                | 0.7577                                                                                            | 0.8090                           |
| 1.1025 | 1700 | 2.2339        | 4.4097                                                                 | 2.9550                | 0.7584                                                                                            | 0.7950                           |
| 1.1349 | 1750 | 2.3097        | 4.4550                                                                 | 2.9476                | 0.7589                                                                                            | 0.7940                           |
| 1.1673 | 1800 | 2.0396        | 4.4098                                                                 | 2.9459                | 0.7584                                                                                            | 0.7960                           |
| 1.1997 | 1850 | 2.2754        | 4.3819                                                                 | 2.9214                | 0.7630                                                                                            | 0.8090                           |
| 1.2322 | 1900 | 2.2027        | 4.4073                                                                 | 2.8998                | 0.7635                                                                                            | 0.8160                           |
| 1.2646 | 1950 | 2.233         | 4.3522                                                                 | 2.9309                | 0.7637                                                                                            | 0.7990                           |
| 1.2970 | 2000 | 2.1282        | 4.3914                                                                 | 2.9119                | 0.7657                                                                                            | 0.8030                           |
| 1.3294 | 2050 | 2.2827        | 4.4009                                                                 | 2.9088                | 0.7621                                                                                            | 0.8100                           |
| 1.3619 | 2100 | 2.1032        | 4.3749                                                                 | 2.9004                | 0.7674                                                                                            | 0.8090                           |
| 1.3943 | 2150 | 2.1256        | 4.3401                                                                 | 2.8910                | 0.7680                                                                                            | 0.8090                           |
| 1.4267 | 2200 | 2.2666        | 4.3397                                                                 | 2.8794                | 0.7701                                                                                            | 0.8150                           |
| 1.4591 | 2250 | 2.3281        | 4.3065                                                                 | 2.8766                | 0.7704                                                                                            | 0.8150                           |
| 1.4916 | 2300 | 2.1385        | 4.3089                                                                 | 2.8434                | 0.7718                                                                                            | 0.8220                           |
| 1.5240 | 2350 | 2.2675        | 4.3343                                                                 | 2.8382                | 0.7732                                                                                            | 0.8210                           |
| 1.5564 | 2400 | 2.2412        | 4.2702                                                                 | 2.8591                | 0.7741                                                                                            | 0.8270                           |
| 1.5888 | 2450 | 2.0092        | 4.2566                                                                 | 2.8626                | 0.7739                                                                                            | 0.8250                           |
| 1.6213 | 2500 | 2.2628        | 4.2259                                                                 | 2.8382                | 0.7771                                                                                            | 0.8320                           |
| 1.6537 | 2550 | 2.3358        | 4.2075                                                                 | 2.8568                | 0.7775                                                                                            | 0.8250                           |
| 1.6861 | 2600 | 2.139         | 4.2953                                                                 | 2.8455                | 0.7786                                                                                            | 0.8290                           |
| 1.7185 | 2650 | 2.2749        | 4.2156                                                                 | 2.8392                | 0.7807                                                                                            | 0.8330                           |
| 1.7510 | 2700 | 2.1997        | 4.2526                                                                 | 2.8198                | 0.7792                                                                                            | 0.8370                           |
| 1.7834 | 2750 | 2.1923        | 4.3028                                                                 | 2.8413                | 0.7809                                                                                            | 0.8310                           |
| 1.8158 | 2800 | 2.3840        | 4.1491                                                                 | 2.8303                | 0.7823                                                                                            | 0.8370                           |
| 1.8482 | 2850 | 2.2110        | 4.2045                                                                 | 2.8420                | 0.7840                                                                                            | 0.8300                           |
| 1.8807 | 2900 | 2.1251        | 4.1696                                                                 | 2.8533                | 0.7868                                                                                            | 0.8290                           |
| 1.9131 | 2950 | 2.1539        | 4.1611                                                                 | 2.8321                | 0.7842                                                                                            | 0.8380                           |
| 1.9455 | 3000 | 2.1108        | 4.1235                                                                 | 2.8206                | 0.7870                                                                                            | 0.8420                           |
| 1.9780 | 3050 | 2.2329        | 4.1143                                                                 | 2.8159                | 0.7873                                                                                            | 0.8370                           |
| 2.0104 | 3100 | 2.1070        | 4.1063                                                                 | 2.8296                | 0.7856                                                                                            | 0.8510                           |
| 2.0428 | 3150 | 2.0815        | 4.0980                                                                 | 2.8250                | 0.7880                                                                                            | 0.8510                           |
| 2.0752 | 3200 | 2.1147        | 4.1009                                                                 | 2.8179                | 0.7862                                                                                            | 0.8520                           |
| 2.1077 | 3250 | 2.1254        | 4.0894                                                                 | 2.8121                | 0.7877                                                                                            | 0.8540                           |
| 2.1401 | 3300 | 2.2891        | 4.1078                                                                 | 2.8076                | 0.7857                                                                                            | 0.8540                           |
| 2.1725 | 3350 | 1.9332        | 4.1062                                                                 | 2.8099                | 0.7877                                                                                            | 0.8520                           |
| 2.2049 | 3400 | 2.0915        | 4.0826                                                                 | 2.8105                | 0.7884                                                                                            | 0.8500                           |
| 2.2374 | 3450 | 2.1009        | 4.0940                                                                 | 2.8079                | 0.7884                                                                                            | 0.8490                           |
| 2.2698 | 3500 | 1.9798        | 4.0965                                                                 | 2.8021                | 0.7885                                                                                            | 0.8490                           |
| 2.3022 | 3550 | 1.9953        | 4.0991                                                                 | 2.8020                | 0.7871                                                                                            | 0.8500                           |
| 2.3346 | 3600 | 2.0243        | 4.0925                                                                 | 2.8069                | 0.7881                                                                                            | 0.8490                           |
| 2.3671 | 3650 | 1.9352        | 4.0702                                                                 | 2.8065                | 0.7878                                                                                            | 0.8470                           |
| 2.3995 | 3700 | 2.0431        | 4.0910                                                                 | 2.8070                | 0.7877                                                                                            | 0.8510                           |
| 2.4319 | 3750 | 2.1696        | 4.0813                                                                 | 2.7993                | 0.7898                                                                                            | 0.8530                           |
| 2.4643 | 3800 | 1.9443        | 4.0904                                                                 | 2.8072                | 0.7873                                                                                            | 0.8480                           |
| 2.4968 | 3850 | 2.2002        | 4.0618                                                                 | 2.8043                | 0.7886                                                                                            | 0.8490                           |
| 2.5292 | 3900 | 2.1554        | 4.0779                                                                 | 2.8028                | 0.7894                                                                                            | 0.8510                           |
| 2.5616 | 3950 | 2.0185        | 4.0936                                                                 | 2.8081                | 0.7896                                                                                            | 0.8510                           |
| 2.5940 | 4000 | 1.9604        | 4.0973                                                                 | 2.8034                | 0.7895                                                                                            | 0.8530                           |
| 2.6265 | 4050 | 2.1299        | 4.0703                                                                 | 2.7996                | 0.7903                                                                                            | 0.8530                           |
| 2.6589 | 4100 | 1.9768        | 4.0632                                                                 | 2.7984                | 0.7885                                                                                            | 0.8550                           |
| 2.6913 | 4150 | 2.1236        | 4.0532                                                                 | 2.7967                | 0.7894                                                                                            | 0.8490                           |
| 2.7237 | 4200 | 2.1007        | 4.0455                                                                 | 2.7914                | 0.7885                                                                                            | 0.8530                           |
| 2.7562 | 4250 | 2.0482        | 4.0679                                                                 | 2.7918                | 0.7904                                                                                            | 0.8470                           |
| 2.7886 | 4300 | 1.9541        | 4.0671                                                                 | 2.7904                | 0.7906                                                                                            | 0.8490                           |
| 2.8210 | 4350 | 2.0531        | 4.0699                                                                 | 2.7902                | 0.7900                                                                                            | 0.8500                           |
| 2.8534 | 4400 | 1.9997        | 4.0799                                                                 | 2.7870                | 0.7885                                                                                            | 0.8500                           |
| 2.8859 | 4450 | 1.9374        | 4.0731                                                                 | 2.7884                | 0.7886                                                                                            | 0.8480                           |
| 2.9183 | 4500 | 2.0898        | 4.0449                                                                 | 2.7937                | 0.7895                                                                                            | 0.8510                           |
| 2.9507 | 4550 | 2.0351        | 4.0502                                                                 | 2.8023                | 0.7901                                                                                            | 0.8470                           |
| 2.9831 | 4600 | 1.9308        | 4.0406                                                                 | 2.7928                | 0.7896                                                                                            | 0.8510                           |
| 3.0156 | 4650 | 2.3701        | 4.0345                                                                 | 2.7892                | 0.7910                                                                                            | 0.8510                           |
| 3.0480 | 4700 | 1.9955        | 4.0689                                                                 | 2.7872                | 0.7888                                                                                            | 0.8510                           |
| 3.0804 | 4750 | 1.9005        | 4.0190                                                                 | 2.7872                | 0.7925                                                                                            | 0.8510                           |
| 3.1128 | 4800 | 2.1007        | 4.0551                                                                 | 2.7921                | 0.7897                                                                                            | 0.8500                           |
| 3.1453 | 4850 | 1.9132        | 4.0367                                                                 | 2.7896                | 0.7916                                                                                            | 0.8500                           |
| 3.1777 | 4900 | 1.9924        | 4.0449                                                                 | 2.7923                | 0.7905                                                                                            | 0.8490                           |
| 3.2101 | 4950 | 2.1460        | 4.0392                                                                 | 2.7914                | 0.7901                                                                                            | 0.8510                           |
| 3.2425 | 5000 | 2.0803        | 4.0458                                                                 | 2.7926                | 0.7913                                                                                            | 0.8470                           |
| 3.2750 | 5050 | 2.0173        | 4.0445                                                                 | 2.7913                | 0.7908                                                                                            | 0.8490                           |
| 3.3074 | 5100 | 2.0224        | 4.0406                                                                 | 2.7910                | 0.7933                                                                                            | 0.8480                           |
| 3.3398 | 5150 | 1.9332        | 4.0337                                                                 | 2.7837                | 0.7917                                                                                            | 0.8530                           |
| 3.3722 | 5200 | 2.0368        | 4.0272                                                                 | 2.7780                | 0.7905                                                                                            | 0.8530                           |
| 3.4047 | 5250 | 2.1804        | 4.0399                                                                 | 2.7819                | 0.7926                                                                                            | 0.8500                           |
| 3.4371 | 5300 | 2.0873        | 4.0408                                                                 | 2.7759                | 0.7926                                                                                            | 0.8580                           |
| 3.4695 | 5350 | 2.1205        | 4.0551                                                                 | 2.7746                | 0.7894                                                                                            | 0.8560                           |
| 3.5019 | 5400 | 1.9450        | 4.0467                                                                 | 2.7791                | 0.7917                                                                                            | 0.8540                           |
| 3.5344 | 5450 | 2.1594        | 4.0339                                                                 | 2.7767                | 0.7929                                                                                            | 0.8540                           |
| 3.5668 | 5500 | 2.2175        | 4.0215                                                                 | 2.7804                | 0.7917                                                                                            | 0.8520                           |
| 3.5992 | 5550 | 1.9389        | 4.0251                                                                 | 2.7794                | 0.7916                                                                                            | 0.8480                           |
| 3.6316 | 5600 | 1.8196        | 4.0301                                                                 | 2.7805                | 0.7917                                                                                            | 0.8500                           |
| 3.6641 | 5650 | 1.8026        | 4.0289                                                                 | 2.7784                | 0.7908                                                                                            | 0.8500                           |
| 3.6965 | 5700 | 1.9885        | 4.0219                                                                 | 2.7775                | 0.7917                                                                                            | 0.8530                           |
| 3.7289 | 5750 | 2.1370        | 4.0052                                                                 | 2.7749                | 0.7926                                                                                            | 0.8550                           |
| 3.7613 | 5800 | 2.1400        | 4.0050                                                                 | 2.7785                | 0.7917                                                                                            | 0.8550                           |
| 3.7938 | 5850 | 2.1486        | 4.0081                                                                 | 2.7785                | 0.7923                                                                                            | 0.8530                           |
| 3.8262 | 5900 | 2.0139        | 4.0139                                                                 | 2.7787                | 0.7916                                                                                            | 0.8540                           |
| 3.8586 | 5950 | 2.1015        | 4.0230                                                                 | 2.7789                | 0.7925                                                                                            | 0.8520                           |
| 3.8911 | 6000 | 1.7910        | 4.0231                                                                 | 2.7764                | 0.7925                                                                                            | 0.8540                           |
| 3.9235 | 6050 | 1.9892        | 4.0208                                                                 | 2.7763                | 0.7924                                                                                            | 0.8540                           |
| 3.9559 | 6100 | 2.0315        | 4.0217                                                                 | 2.7762                | 0.7923                                                                                            | 0.8560                           |
| 3.9883 | 6150 | 2.0294        | 4.0220                                                                 | 2.7764                | 0.7920                                                                                            | 0.8550                           |

</details>

### Framework Versions
- Python: 3.11.6
- Sentence Transformers: 5.0.0
- Transformers: 4.55.0.dev0
- PyTorch: 2.5.1+cu121
- Accelerate: 1.9.0
- Datasets: 2.19.1
- Tokenizers: 0.21.4
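
To reproduce this environment, the versions above can be pinned; a sketch of a matching `requirements.txt` (note that the listed Transformers version is a development build, so it is not available as a PyPI release, and the PyTorch build shown was compiled against CUDA 12.1):

```
sentence-transformers==5.0.0
torch==2.5.1
accelerate==1.9.0
datasets==2.19.1
tokenizers==0.21.4
# transformers==4.55.0.dev0 is a dev build; install from the GitHub repository instead
```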

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
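
For reference, MultipleNegativesRankingLoss treats each anchor's paired positive as the target and every other positive in the batch as a negative, applying cross-entropy over scaled cosine similarities. A minimal NumPy sketch (the function name is illustrative; the scale of 20 mirrors the Sentence Transformers default):

```python
import numpy as np

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch negatives ranking loss over scaled cosine similarities.

    Row i of `positives` is the positive for row i of `anchors`; every
    other row in the batch acts as a negative for that anchor.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = scale * (a @ p.T)  # (n, n) cosine-similarity matrix
    # Numerically stable log-softmax over each row, then take the diagonal
    # entry (the matching positive) as the cross-entropy target.
    row_max = scores.max(axis=1, keepdims=True)
    log_z = row_max + np.log(np.exp(scores - row_max).sum(axis=1, keepdims=True))
    log_probs = scores - log_z
    n = len(scores)
    return -log_probs[np.arange(n), np.arange(n)].mean()

# Orthogonal anchor/positive pairs: each anchor matches only its own
# positive, so the loss is near zero.
emb = np.eye(3)
print(round(mnr_loss(emb, emb), 4))  # prints 0.0
```

Because the negatives come for free from the rest of the batch, larger batch sizes generally make this loss harder and the resulting embeddings stronger.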
