metadata
language:
- code
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:2825
- loss:MultipleNegativesRankingLoss
base_model: NeuML/pubmedbert-base-embeddings
widget:
- source_sentence: >-
MALAT1 IGFBP7 S100A6 KRT19 EEF1A1 TMSB4X ACTB TMSB10 WFDC2 GSTP1 ANXA2
TPM1 APP ATP1A1 BCAM ITGB1 CD24 KRT18 AHNAK KRT8 GAPDH SPINT2 COL4A2 PTMA
EPCAM IGFBP2 SYTL2 SERF2 CD151 S100A13 CD9 PKM S100A11 PPIA NME2 PFN1 TPT1
ATP1B1 COL4A1 HSPA5 SYNE2 MYH9 COX6A1 DBN1 NPC2 PPDPF ITM2B BEX3 PDIA3
MALT1 RNF213 RACK1 VIM RCN1 ENO1 MYL6 MDK CTSD ROMO1 KTN1 H3-3B ITM2C
ARHGAP29 MGST3
sentences:
- >-
This measurement was conducted with 10x 5' transcription profiling.
Epithelial cells derived from the proximal tubule of a male human
kidney, taken from an individual in their sixth decade. The cells are
identified as tumor-normal, suggesting they were taken from a
non-cancerous region.
- >-
This measurement was conducted with 10x 3' v3. Central nervous system
macrophage (microglia) derived from the cerebellum of a 29-year-old
male.
- >-
MALAT1 RORA PCDH9 AUTS2 PDE4D RBMS3 TCF4 FTX ZEB2 MAST4 MEF2C SLC9A9
SIPA1L1 TOX RASAL2 SPARCL1 FBXL17 AHI1 FAM110B JMJD1C QKI TCF12 ATP1B1
IMMP2L NF1 MYCBP2 ERC1 PHACTR1 HIVEP3 AFF3 SSBP2 DST TNIK CLASP2 OXR1
NEAT1 SAMD4A R3HDM1 ARID1B HERC1 BIRC6 SYNE1 TBC1D5 MAP1B APP ADK TTC3
MBD5 RBM6 OSBPL3 TNRC6B KLHL5 EHBP1 MACF1 RTN3 TUT4 MAP7 TTLL7 LPP PAM
HDAC8 ANK3 SPIDR PRKCE
- source_sentence: >-
MALAT1 NEAT1 RORA LHFPL6 ZBTB20 RBMS3 ZEB1 TACC1 RGS5 PARD3 EPS8 DCN
ARHGAP29 IGFBP7 NRDC COBLL1 ERC1 PTGDS FOXP1 ADK JAK1 TCF4 ARHGEF12 ZFHX3
NSMCE2 CALM1 FTX MYCBP2 CRY1 MAST4 PDE8A PDZRN3 EML2 HP1BP3 JUND AHI1
TSPAN3 BRD4 ARHGAP26 LUC7L2 UTRN PITPNC1 SNRK SRRM2 FTH1 HECTD4 PDE4B
IMMP2L HES4 N4BP2L2 UBR2 VIM MAP4 RSF1 EIF4B MYO9A ROCK1 ARHGAP10 PICALM
ACTB PAG1 GNB1 MLLT10 HSP90AA1
sentences:
- >-
This measurement was conducted with 10x 3' v3. Pericyte cells from the
cerebral cortex, specifically the Lingual gyrus and Primary Visual
Cortex (V1C), of a 50-year-old male human. These cells belong to the
vascular supercluster.
- >-
MALAT1 PCDH9 QKI PDE4B MBP FMNL2 SLC44A1 RNF220 SIK3 DLG1 ELMO1 MAP7
DOCK10 MAN2A1 DST CLASP2 PHLPP1 SEPTIN7 TCF12 WWOX DPYD TTLL7 PIP4K2A
ZBTB20 FHIT MAP4K4 PDE8A SHTN1 FRMD4B ATP8A1 AUTS2 ZEB2 ERBIN RTN4 ANK3
AGAP1 PTK2 FRYL ZSWIM6 TMEM165 MBNL2 FTX MAP4K5 FUT8 PDE4D SGK1 SSH2 APP
ALCAM ANKIB1 KIF1B PICALM PTGDS WSB1 PTBP2 BAZ2B LPGAT1 MACF1 CREB5
AOPEP TSPAN5 SLCO3A1 ZFYVE16 ARID1B
- >-
This measurement was conducted with 10x 3' v3. Neuron cell type from a
29-year-old male cerebral cortex, specifically from the Cingulate gyrus,
retrosplenial (CgGrs) - A29-A30 region, with European self-reported
ethnicity, analyzed at the nucleus level.
- source_sentence: >-
MALAT1 EEF1A1 HNRNPA1 TAGLN GAPDH VIM RACK1 TMSB10 MYLK TPT1 PTMA EEF2
TMSB4X NPM1 CIRBP CALD1 FTH1 TPM1 ACTB NACA HSP90AB1 H3-3A UBA52 MYL6
EIF3E FAU DSTN COX7C TPM2 JUN JUNB MYL9 BTF3 EIF1 LDHB ACTA2 H3-3B EIF4A2
EIF3F TSC22D1 STMN1 SOX4 ATP5MC2 ZFP36L1 BEX3 FOS MDK CFL1 AP1S2 PPIA
EIF3L ATP5F1E COMMD6 HSPB1 CCNI COX4I1 GADD45B SERBP1 UQCRB GNG5 NAP1L1
PABPC1 SLC25A3 HSPA8
sentences:
- >-
This measurement was conducted with 10x 3' v2. Sample contains enteric
smooth muscle cells derived from a colon tissue at Carnegie stage 23
(F8.4).
- >-
MALAT1 EEF1A1 FTL TMSB10 RACK1 PTMA TPT1 AGR2 TMSB4X GAPDH FTH1 HNRNPA1
S100A10 NPM1 NACA ACTB FAU HSP90AB1 H3-3B LGALS3 BTF3 H3-3A EEF2
HSP90AA1 GSTP1 SERF2 UBA52 HINT1 EIF1 TXN ZFAS1 COX7C YBX1 MYL6 APOE
PABPC1 HMGB1 HMGA1 HSPD1 PPIA SELENOP EPCAM UBB PPDPF NDUFA4 CHCHD2 LDHB
ATP5MC2 CFL1 NAP1L1 EIF3E ID2 ATP5F1E ENO1 PEBP1 PFN1 TPI1 RAN MARCKSL1
ZFP36L2 ATP5MC3 HSP90B1 HMGN1 TMA7
- >-
This measurement was conducted with 10x 3' v2. Erythroblasts derived
from the duodeno-jejunal junction of a human fetus at Carnegie stage 23
(F9.9).
- source_sentence: >-
MALAT1 GAPDH HSP90AA1 EEF1A1 PKM TPI1 CCNI H3-3B ALDOA VAMP2 FTL PGK1 FTH1
PGAM1 PTMA ENO1 HSP90AB1 HSPB1 UBC APLP2 CPE RTN4 BTG1 ITM2B EIF1 BNIP3
GPX3 VDAC1 NEAT1 YBX1 PEBP1 CIRBP BNIP3L OAZ1 UNC119 HSPA8 STMN1 COX7C
TPT1 CALM2 FAU UQCRB EIF4A2 EEF2 PPP1CC TUBB4B NDUFA4 SEC62 CLK1 GABARAPL2
HSPA5 ACTB TPD52 GSTP1 DNAJA1 DYNLL1 HNRNPC MYL6 ST13 SRSF5 ANKRD12
TSC22D1 EIF3E DNAJB6
sentences:
- >-
MALAT1 PCDH9 QKI MBP FMNL2 SLC44A1 SEPTIN7 PIP4K2A DST PDE4B RNF220
MAN2A1 FTX DLG1 ZSWIM6 TCF12 ZEB2 FRYL PHLPP1 UBE2E2 ATP8A1 ELMO1 DPYD
NEAT1 MAP7 DOCK10 TTLL7 AUTS2 ERBIN FRMD4B RTN4 TMEM165 ZBTB20 CADM1
FHIT LIMCH1 MAP4K4 PDE8A SYNJ2 HBS1L SGK1 RERE GSN CLASP2 MAPRE2 ALCAM
SHTN1 ANKRD28 SLC25A13 FUT8 ZFYVE16 ARAP2 HIPK2 POLR2F DDX17 SPTLC2
PDE4D RAB30 CREB5 AGAP1 SIK3 PXK ACYP2 GNG7
- >-
This measurement was conducted with 10x 3' v2. Cell type: OFF-bipolar
cell, derived from the peripheral region of retina of a 60-year old
European male.
- >-
This measurement was conducted with 10x 3' v3. Retinal bipolar neuron
from a 77-year-old female, with nucleus suspension type.
- source_sentence: >-
MALAT1 CD74 EEF1A1 TMSB4X TSC22D3 HSPE1 CD69 FTH1 DNAJB1 JUNB HSP90AA1
BTG1 SARAF ACTB FTL DUSP1 CD37 FAU HLA-DRB5 RACK1 YPEL5 RAC2 TPT1 EIF1
TMSB10 PTMA NACA H3-3B HSPD1 MS4A1 CD52 CYBA TCL1A HNRNPA1 DUSP2 NCF1
CD79B SYPL1 GDI2 YBX1 RBM39 ITM2B PNRC1 BTG2 EEF2 JUN ZFP36L1 TOMM7
SLC25A5 RNASET2 CD44 NOP58 PABPC1 ICAM3 MEF2C FXYD5 HSP90AB1 CD79A PFN1
TMEM109 EIF1B APH1A SERP1 CXCR4
sentences:
- >-
MALAT1 SSR4 EEF1A1 HERPUD1 FTL TPT1 ISG20 XBP1 CD79A SEC11C NEAT1 NPC2
EEF2 SPCS1 PRDX4 FAU EIF1 OST4 SEC61B COX7C FKBP2 CD74 ERLEC1 GSTP1
RABAC1 DDX5 SAT1 FKBP11 SRP9 ATP5PF COX6C TMED10 METTL7A PTMA NDUFA4
TMSB10 HSPA5 SEL1L MGLL RGS1 ATP5F1D OAZ1 SUB1 CCDC88A SPCS2 HMGN3 SERP1
HNRNPA2B1 ERGIC3 UQCR11 SELENOS TMEM258 SERF2 SEC13 EIF5B C4orf3 ATP5MG
GABARAP PGAM1 ATP5MK COX5A SOCS3 NACA TOMM7
- >-
This measurement was conducted with 10x 5' v1. Naive B cell sample from
the tonsil tissue of a 9-year old female with recurrent tonsillitis.
- >-
This measurement was conducted with 10x 5' v1. Naive B cell sample taken
from a 5-year-old female individual with obstructive sleep apnea and
recurrent tonsillitis, originating from tonsil tissue.
datasets:
- jo-mengr/cellxgene_pseudo_bulk_3_5k_multiplets_natural_language_annotation
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy
model-index:
- name: SentenceTransformer based on NeuML/pubmedbert-base-embeddings
results:
- task:
type: triplet
name: Triplet
dataset:
name: >-
cellxgene pseudo bulk 3 5k multiplets natural language annotation
cell sentence 2
type: >-
cellxgene_pseudo_bulk_3_5k_multiplets_natural_language_annotation_cell_sentence_2
metrics:
- type: cosine_accuracy
value: 0.46979865431785583
name: Cosine Accuracy
SentenceTransformer based on NeuML/pubmedbert-base-embeddings
This is a sentence-transformers model fine-tuned from NeuML/pubmedbert-base-embeddings on the cellxgene_pseudo_bulk_3_5k_multiplets_natural_language_annotation dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: NeuML/pubmedbert-base-embeddings
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: cellxgene_pseudo_bulk_3_5k_multiplets_natural_language_annotation
- Language: code
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): MMContextEncoder(
(text_encoder): BertModel(
(embeddings): BertEmbeddings(
(word_embeddings): Embedding(30522, 768, padding_idx=0)
(position_embeddings): Embedding(512, 768)
(token_type_embeddings): Embedding(2, 768)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(encoder): BertEncoder(
(layer): ModuleList(
(0-11): 12 x BertLayer(
(attention): BertAttention(
(self): BertSdpaSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
(intermediate_act_fn): GELUActivation()
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
)
(pooler): BertPooler(
(dense): Linear(in_features=768, out_features=768, bias=True)
(activation): Tanh()
)
)
(pooling): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
)
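The Pooling module above has `pooling_mode_mean_tokens` set to True, so the sentence embedding is an average over token embeddings. The following is a minimal pure-Python sketch of that idea, assuming padded positions are excluded through the attention mask. The function name `mean_pool` and the toy 3-dimensional vectors are illustrative only; the real model works on 768-dimensional token vectors.

```python
from typing import List, Sequence


def mean_pool(token_embeddings: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average token vectors, skipping positions where the mask is 0."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("one mask entry is required per token")
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("attention mask excludes every token")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy input: three real tokens followed by one padding token.
tokens = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, 2.0, 2.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)
```

The padding vector does not influence the result because its mask entry is 0.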
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("jo-mengr/mmcontext-pubmedbert-v3")
# Run inference
sentences = [
'MALAT1 CD74 EEF1A1 TMSB4X TSC22D3 HSPE1 CD69 FTH1 DNAJB1 JUNB HSP90AA1 BTG1 SARAF ACTB FTL DUSP1 CD37 FAU HLA-DRB5 RACK1 YPEL5 RAC2 TPT1 EIF1 TMSB10 PTMA NACA H3-3B HSPD1 MS4A1 CD52 CYBA TCL1A HNRNPA1 DUSP2 NCF1 CD79B SYPL1 GDI2 YBX1 RBM39 ITM2B PNRC1 BTG2 EEF2 JUN ZFP36L1 TOMM7 SLC25A5 RNASET2 CD44 NOP58 PABPC1 ICAM3 MEF2C FXYD5 HSP90AB1 CD79A PFN1 TMEM109 EIF1B APH1A SERP1 CXCR4',
"This measurement was conducted with 10x 5' v1. Naive B cell sample from the tonsil tissue of a 9-year old female with recurrent tonsillitis.",
"This measurement was conducted with 10x 5' v1. Naive B cell sample taken from a 5-year-old female individual with obstructive sleep apnea and recurrent tonsillitis, originating from tonsil tissue.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 1.0000, 1.0000],
# [1.0000, 1.0000, 1.0000],
# [1.0000, 1.0000, 1.0000]])
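The similarity function listed for this model is cosine similarity, so a typical retrieval step ranks candidate embeddings by their cosine score against a query embedding. The sketch below shows that ranking in pure Python on toy 2-dimensional vectors. The helpers `cosine_similarity` and `rank_candidates` are hypothetical names, not part of the Sentence Transformers API; in practice the vectors would come from `model.encode`.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_candidates(query: Sequence[float],
                    candidates: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    """Return (candidate index, score) pairs sorted from best to worst."""
    scores = [(i, cosine_similarity(query, c)) for i, c in enumerate(candidates)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy embeddings standing in for one gene-list query and three descriptions.
query = [1.0, 0.0]
candidates = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
ranking = rank_candidates(query, candidates)
print([index for index, _ in ranking])
```

A ranking like this is only informative when the scores differ. If every pairwise score is identical, as in the printed matrix above where all entries show 1.0000, the sort simply preserves the input order.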
Evaluation
Metrics
Triplet
- Dataset: cellxgene_pseudo_bulk_3_5k_multiplets_natural_language_annotation_cell_sentence_2
- Evaluated with TripletEvaluator
Metric | Value |
---|---|
cosine_accuracy | 0.4698 |
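As a rough illustration of what a triplet cosine accuracy measures, the sketch below counts the fraction of (anchor, positive, negative) triplets in which the anchor is more similar to the positive than to the negative. This is an assumption about the metric's usual definition, not code taken from the evaluator, which may also apply a margin. The function names and the toy vectors are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def triplet_cosine_accuracy(triplets: Sequence[Tuple[Vector, Vector, Vector]]) -> float:
    """Share of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    if not triplets:
        raise ValueError("at least one triplet is required")
    correct = sum(
        1 for anchor, positive, negative in triplets
        if cosine_similarity(anchor, positive) > cosine_similarity(anchor, negative)
    )
    return correct / len(triplets)


triplets = [
    ([1.0, 0.0], [1.0, 0.1], [0.0, 1.0]),   # positive is closer: correct
    ([1.0, 0.0], [0.0, 1.0], [1.0, 0.2]),   # negative is closer: wrong
    ([0.0, 1.0], [0.1, 1.0], [1.0, 0.0]),   # positive is closer: correct
    ([1.0, 1.0], [1.0, -1.0], [1.0, 0.9]),  # negative is closer: wrong
]
accuracy = triplet_cosine_accuracy(triplets)
print(accuracy)
```

Under this definition, scores assigned at random would land near 0.5. The reported 0.4698 would correspond to 140 of 298 triplets, assuming each of the 298 evaluation samples contributes exactly one triplet.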
Training Details
Training Dataset
cellxgene_pseudo_bulk_3_5k_multiplets_natural_language_annotation
- Dataset: cellxgene_pseudo_bulk_3_5k_multiplets_natural_language_annotation at 8d2de45
- Size: 2,825 training samples
- Columns: anchor, positive, negative_1, and negative_2
- Approximate statistics based on the first 1000 samples:
| | anchor | positive | negative_1 | negative_2 |
|---|---|---|---|---|
| type | string | string | string | string |
| min | 355 characters | 90 characters | 87 characters | 354 characters |
| mean | 384.6 characters | 211.95 characters | 212.94 characters | 384.03 characters |
| max | 432 characters | 939 characters | 775 characters | 433 characters |
- Samples:
| anchor | positive | negative_1 | negative_2 |
|---|---|---|---|
| MALAT1 TMSB4X EEF1A1 CD74 TPT1 PTMA TMSB10 EIF1 H3-3B FAU FTH1 RACK1 PABPC1 ACTB BTG1 UBC UBA52 FTL EIF3E COX4I1 NBEAL1 NPM1 NACA MYL6 HSP90AB1 DDX5 SKP1 SF3B1 SEPTIN7 ATP5F1E SBDS COX7C SERF2 PNRC1 ATP5MG UBB NAP1L1 TOMM7 CALM1 TMA7 SEC62 POU2F2 ADAM28 TPR CYBA YBX1 KLF6 ATRX ERP29 HNRNPC ATP5F1D POLR2F NFKBIA SMCHD1 EIF3J LSM5 PFN1 LUC7L3 EIF4G2 ARPC3 GAPDH NCL RALGPS2 CAPZA1 | This measurement was conducted with 10x 3' v2. B cell sample taken from a 30-year-old female with blood tissue, exhibiting elevated expression of type 1 interferon-stimulated genes (ISGs) in monocytes, reduction of naïve CD4+ T cells correlating with monocyte ISG expression, and expansion of repertoire-restricted cytotoxic GZMH+ CD8+ T cells. | This measurement was conducted with 10x 3' v2. CD8-positive, alpha-beta T cell derived from the blood tissue of a 29-year-old female Asian individual with managed systemic lupus erythematosus (SLE). | MALAT1 EEF1A1 TMSB4X TPT1 PTMA TMSB10 FAU UBA52 EIF1 ACTB FTL BTG1 ZFP36L2 FTH1 H3-3B NACA CALM1 RACK1 HSP90AB1 ID2 COX7C S100A6 PABPC1 HSP90AA1 MYL6 CIRBP PFN1 DDX5 SRSF7 CXCR4 ATP5F1E SNRPD2 COX4I1 ITM2B SERF2 SH3BGRL3 BTF3 PNRC1 UBC UQCRB SON ATP5MG EEF2 CFL1 SSR4 NPM1 TOMM7 TMA7 SEC62 CD74 VIM CYBA SYNE2 YBX1 TRAM1 RORA CDC42 PPP2R5C GADD45B EIF3L SRSF5 NFKBIA STK4 RBM3 |
| GAPDH EEF1A1 TPT1 MALAT1 ACTB PTMA NACA TMSB10 HSPA8 RACK1 HSP90AA1 PKM HSP90AB1 UBA52 ENO1 TUBA1B NHP2 ALDOA SF3B2 NOP56 TPI1 SFPQ PRDX1 RBM39 WARS1 SERF2 FAU ARPC2 UQCRQ EEF2 HNRNPF SSR4 NPM1 SLC25A5 POU2F2 HSPA5 YBX1 RAB27A PABPC1 SRI XRCC5 FTL DDX18 SEL1L3 SNU13 ACIN1 PSMA7 C19orf53 FKBP8 LSM5 MLEC COX7A2 TCERG1 CBLB NCL HNRNPA2B1 SEM1 JUND SRRM1 EPRS1 SH3BGRL3 SERBP1 PFDN2 CIAO1 | This measurement was conducted with 10x 5' v1. Gamma-delta T cell derived from blood of a 26-year old male, activated with CD3. | This measurement was conducted with 10x 5' v1. Activated memory B cell from a 26-year-old male individual, activated using CD3. | GAPDH ACTB PTMA TMSB4X HSP90AA1 NPM1 RAN EEF1A1 HSP90AB1 TPT1 PPIA FTL YBX1 HSPA8 PFN1 MALAT1 NCL HSPE1 HMGB1 CHCHD2 NACA TPI1 PRDX1 HSPD1 H2AZ1 CFL1 BTF3 FABP5 RACK1 FAU EIF1 PGAM1 ALDOA RHOA HNRNPA2B1 H3-3B ATP5MC3 TMSB10 PTGES3 SH3BGRL3 CYCS UBA52 SLC25A5 NAP1L1 PARK7 FTH1 P4HB CALM1 MYL6 ZNF706 NDUFB9 YWHAB PSMA7 LDHB SNRPD2 COX7C TXN TPM3 SRSF2 HINT1 PRR13 PABPC1 ENO1 GSTP1 |
| MALAT1 DOCK4 PLXDC2 ARHGAP24 FRMD4A LRMDA QKI NEAT1 MEF2A ELMO1 SFMBT2 DOCK8 HSP90AA1 SYNDIG1 SORL1 SLC8A1 NAV3 RASAL2 APBB1IP MEF2C ITPR2 CYRIB SRGAP2 CELF2 ST6GAL1 EPB41L2 GRID2 MBNL1 ST6GALNAC3 ANKRD44 MGAT4A ABCC4 ARHGAP22 SSH2 OXR1 LDLRAD4 SH3RF3 UBE2E2 MAML2 MAML3 CD74 TBC1D22A FYB1 ATP8B4 HSPH1 SAT1 PCNX2 FCHSD2 ETV6 PTPRJ GNAQ DISC1 CHST11 SLC9A9 KCNQ3 LINC02798 MYCBP2 HDAC9 DIP2B PICALM EIF4G3 SLC1A3 FTL SUSD6 | This measurement was conducted with 10x 3' v3. A central nervous system macrophage (microglia) cell type, derived from the hypothalamus tissue, specifically the preoptic region of HTH (HTHpo), of a 50-year-old male individual with European ethnicity. | This measurement was conducted with 10x 3' v3. Neuron cell type from a 50-year-old male human, specifically from the preoptic region of the hypothalamus (HTHpo). The cell was analyzed using single-nucleus RNA sequencing. | MALAT1 NALF1 ROBO1 CDH12 LSAMP NRXN1 RALYL CACNA2D1 MGAT4C DPYD LRRC4C EGFEM1P DPP10 EPHA6 TENM2 ANK3 MACROD2 RYR2 ANK2 GRID2 KAZN SNTG1 MIR99AHG DMD MEG3 HDAC9 DPP6 MARCHF1 GRIK1 CADM1 SYT1 KCNH7 FRMD4A RABGAP1L GRM1 MAP2 IL1RAPL1 RORA FRAS1 GABRB1 CDH8 TAFA1 FGF14 NCAM2 PARD3 SLC8A1 AHI1 SMYD3 EDIL3 OPCML KCND2 TNRC6A SGCZ DST PDE1A SLC35F1 LINC03051 TRPM3 GRIA4 MEG8 MEIS2 HMGCLL1 PRANCR DCC |
- Loss: MultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
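To make the two loss parameters concrete, here is a minimal pure-Python sketch of a multiple-negatives ranking objective as it is commonly described: each anchor is scored against every candidate with cosine similarity, the scores are multiplied by `scale`, and a cross-entropy term rewards the anchor's own positive. The function name, the optional `hard_negatives` argument (standing in for columns such as negative_1 and negative_2), and the toy vectors are assumptions for illustration. This is not the library implementation.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def multiple_negatives_ranking_loss(anchors: Sequence[Vector],
                                    positives: Sequence[Vector],
                                    hard_negatives: Sequence[Vector] = (),
                                    scale: float = 20.0) -> float:
    """Mean cross-entropy where anchor i should pick positive i among all candidates."""
    if not anchors or len(anchors) != len(positives):
        raise ValueError("anchors and positives must be non-empty and aligned")
    candidates = list(positives) + list(hard_negatives)
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine_similarity(anchor, c) for c in candidates]
        # Numerically stable log-sum-exp over all candidate scores.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(v - top) for v in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Well-separated pairs: each anchor matches only its own positive.
aligned = multiple_negatives_ranking_loss([[1.0, 0.0], [0.0, 1.0]],
                                          [[1.0, 0.0], [0.0, 1.0]])
# Collapsed embeddings: every candidate looks identical to every anchor.
collapsed = multiple_negatives_ranking_loss([[1.0, 1.0], [1.0, 1.0]],
                                            [[1.0, 1.0], [1.0, 1.0]])
print(aligned, collapsed)
```

In this sketch the loss approaches 0 when positives are clearly separated and equals the logarithm of the number of candidates when all embeddings coincide.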
Evaluation Dataset
cellxgene_pseudo_bulk_3_5k_multiplets_natural_language_annotation
- Dataset: cellxgene_pseudo_bulk_3_5k_multiplets_natural_language_annotation at 8d2de45
- Size: 298 evaluation samples
- Columns: anchor, positive, negative_1, and negative_2
- Approximate statistics based on the first 298 samples:
| | anchor | positive | negative_1 | negative_2 |
|---|---|---|---|---|
| type | string | string | string | string |
| min | 361 characters | 105 characters | 110 characters | 361 characters |
| mean | 383.43 characters | 214.29 characters | 219.1 characters | 383.64 characters |
| max | 419 characters | 809 characters | 809 characters | 419 characters |
- Samples:
| anchor | positive | negative_1 | negative_2 |
|---|---|---|---|
| MALAT1 RORA PTGDS RBMS3 RNF220 GSN DCN ZEB1 SPARCL1 LHFPL6 PTN ZBTB20 NEAT1 TIMP3 AFF3 TCF4 CHD9 FTX DDX17 CST3 CD81 PPFIBP1 SPTBN1 ARHGAP29 AOPEP PARD3 MAML2 IMMP2L WWOX N4BP2L2 FYN VEZT PIAS1 MYO9A TGFBR3 FNDC3B ACTB FBLN1 ITGA8 UBE2K DOCK9 WAC TNRC6B QKI FOXP1 RTN4 IGFBP2 IGFBP5 OSBPL9 STAG1 FOXO3 PLEKHG1 PDZRN3 MKLN1 AFDN TBC1D5 ITM2B FCHSD2 UACA TCF12 NCOA2 TPM1 ZFHX3 COL6A1 | This measurement was conducted with 10x 3' v3. Fibroblast cells from the cerebral cortex, specifically the postcentral gyrus, primary somatosensory cortex, S1C, of a 42-year-old male with European ethnicity. | This measurement was conducted with 10x 3' v3. Neuron cell type from a 29-year-old male human cerebral nuclei, specifically from the Extended amygdala (EXA) - Bed nucleus of stria terminalis and nearby - BNST region. | MALAT1 PCDH9 PDE4D CELF2 PHACTR1 R3HDM1 FTX PPP3CA SIPA1L1 DMD ANK3 PRKCB JMJD1C GPC6 AHI1 AUTS2 ATP2B1 MEF2C ARHGAP26 PTK2 KIAA1217 OXR1 SMYD3 CHN1 SYNE1 UBE2E2 CAMK1D EIF4G3 TNRC6A CLASP2 TTC3 N4BP2L2 LIMCH1 RORA FBXW7 RTN4 AFF3 FBXL17 RABGAP1L PBX1 MRTFB MBD5 ARID1B RASAL2 DLG1 EPB41L2 ERC1 TNRC6B ZFAND3 FMNL2 DPYD CCPG1 MYCBP2 MAST4 HERC1 RTN3 MAP7 APP RERE DIAPH2 ITPR1 DST CAMK4 ASAP1 |
| MALAT1 CD74 EEF1A1 ACTB TMSB4X BTG1 TPT1 CD37 TMSB10 PFN1 PTMA TXNIP MS4A1 TSC22D3 FAU CORO1A RACK1 FTL EIF1 GAPDH SERF2 PPIA TCL1A CD79A DUSP1 NCF1 ARPC2 NACA SP100 ZFP36 BTF3 ZFP36L2 LAPTM5 SELENOH PABPC1 MEF2C ERP29 MYL12A CD69 ATP5F1E COX7C UQCRB CFL1 SELL XIST FXYD5 DDX5 ARHGDIB CCNI CXCR4 PPDPF H3-3B LIMD2 IFITM3 SRRM2 CD52 NPM1 IFITM2 PTPRCAP UBA52 EBLN3P CYBA CNN2 RASGRP2 | This measurement was conducted with 10x 5' v1. Naive B cell sample taken from a 6-year old female tonsil. | This measurement was conducted with 10x 5' v1. Plasmablast cells derived from a 3-year-old male tonsil tissue sample, with IGH + IGL, IGHV5-5103, IGHD6-601, IGHJ4*02, IGLV2-14, IGLC3, IGLJ3, and IgG3 isotype. | CD74 ACTB PTMA EEF1A1 TMSB4X PFN1 HMGB1 MALAT1 TPT1 H4C3 PPIA CFL1 GAPDH TUBA1B FAU HMGN2 SERF2 H3-3A ARPC2 MS4A1 HMGN1 H2AZ1 ANP32B STMN1 HSP90AA1 SLC25A5 HNRNPA1 NPM1 TMSB10 HSP90AB1 RACK1 OAZ1 ARHGDIB LAPTM5 CORO1A NACA CALM2 BTF3 HSPA8 TPI1 H3-3B PCBP2 PKM MYL6 ARPC3 HNRNPA2B1 ATP5F1E ARPC5 COX5A RHOA CD79A PRDX1 COX4I1 RAN LCP1 SRP14 SNRPG HMGB2 ATP5MG EEF2 EIF1 SUMO2 HLA-DRB5 PABPC1 |
| MALAT1 PCDH9 PDE4D PHACTR1 KIAA1217 CELF2 FTX PDE4B ANK3 ATP2B1 DMD PPP3CA PRKCB TUBA1B MAP1B CHN1 AFF3 ARHGAP26 RABGAP1L OXR1 DPYD RASAL2 SMYD3 BICD1 CLASP2 FTH1 PTK2 HIVEP2 HERC1 TLE4 SPARCL1 FMNL2 EIF4G3 FBXW7 PLXDC2 APP DST TNIK AGAP1 AUTS2 PRKCE JMJD1C R3HDM1 MEF2C TNRC6A AEBP2 NFIB BASP1 TTC3 CAMK2D NRGN TSHZ2 CAMK1D LHFPL6 MBD5 N4BP2L2 ARID1B RTN4 MACF1 SYNE1 NDFIP1 RTN3 SARAF ARGLU1 | This measurement was conducted with 10x 3' v3. Neuron cell type from the primary motor cortex (M1C) of a 29-year-old male with European ethnicity, belonging to the deep-layer corticothalamic and 6b subcluster. | This measurement was conducted with 10x 3' v3. Neuron cell type from a 29-year old male human, specifically from the pons tissue and more specifically, the Pontine reticular formation - PnRF dissection. The cell falls under the Midbrain-derived inhibitory supercluster term. | MALAT1 PCDH9 OXR1 KIAA1217 TCF4 PAM AHI1 GPC6 DMD RBMS3 ANK3 FTX PDE4B AUTS2 MACF1 LIMCH1 CLASP2 CELF2 BACH2 ITFG1 FARP1 RORA SMYD3 TNRC6A OSBPL8 PDE4D AAK1 SBF2 FBXL17 AGAP1 PRKCE NF1 EIF4G3 ERC1 PHF20L1 SIK3 TTC3 WWOX MBD5 MYCBP2 ARID1B MEF2C RIPOR2 LRRFIP1 RTN3 DST RABGAP1L PITPNC1 SATB1 WLS SNRPN APP PRKACB UBR3 IQSEC1 SSBP2 AIG1 JAZF1 TNIK ZEB2 CERS6 ANKRD36C CADM1 PBX1 |
- Loss: MultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 128
- per_device_eval_batch_size: 128
- learning_rate: 0.05
- warmup_ratio: 0.1
- gradient_checkpointing: True
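The batch size, warmup ratio and learning rate above, together with the 2,825 training samples, 3 epochs and linear scheduler reported elsewhere in this card, determine the learning-rate schedule. The sketch below works through that arithmetic under some assumptions: a single device, no gradient accumulation, a final partial batch that is kept, and warmup steps obtained by rounding the ratio up. `linear_schedule_lr` is a hypothetical helper that mirrors the usual linear warmup and decay shape, not the trainer's own code.

```python
import math


def linear_schedule_lr(step: int, total_steps: int, warmup_steps: int,
                       peak_lr: float) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


train_samples = 2825   # training set size from this card
batch_size = 128       # per_device_train_batch_size, assuming one device
epochs = 3             # num_train_epochs
peak_lr = 0.05         # learning_rate
warmup_ratio = 0.1     # warmup_ratio

steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(warmup_ratio * total_steps)

print(steps_per_epoch, total_steps, warmup_steps)
for step in (0, warmup_steps, 50, total_steps):
    print(step, linear_schedule_lr(step, total_steps, warmup_steps, peak_lr))
```

With these assumptions an epoch has 23 optimizer steps, which is consistent with the training log entry that places step 50 at epoch 2.1739 (50 / 23).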
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 128
- per_device_eval_batch_size: 128
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 0.05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: True
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}
Training Logs
Epoch | Step | Training Loss | cellxgene pseudo bulk 3 5k multiplets natural language annotation loss | cellxgene_pseudo_bulk_3_5k_multiplets_natural_language_annotation_cell_sentence_2_cosine_accuracy |
---|---|---|---|---|
2.1739 | 50 | 6.383 | 5.7937 | 0.4698 |
Framework Versions
- Python: 3.12.9
- Sentence Transformers: 5.0.0
- Transformers: 4.55.0.dev0
- PyTorch: 2.8.0
- Accelerate: 1.9.0
- Datasets: 2.19.1
- Tokenizers: 0.21.4
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}