SentenceTransformer based on NeuML/pubmedbert-base-embeddings
This is a sentence-transformers model finetuned from NeuML/pubmedbert-base-embeddings on the cellxgene_pseudo_bulk_3_5k_multiplets_natural_language_annotation dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: NeuML/pubmedbert-base-embeddings
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: cellxgene_pseudo_bulk_3_5k_multiplets_natural_language_annotation
- Language: code
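Cosine similarity, the similarity function listed above, compares the direction of two embeddings and ignores their magnitude. The NumPy sketch below illustrates the computation; the helper name `cosine_similarity_matrix` and the random stand-in vectors are assumptions for illustration and are not part of the model's API.

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Return pairwise cosine similarities between rows of a and rows of b."""
    # Normalise each row to unit length; eps guards against zero vectors.
    eps = 1e-12
    a_norm = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), eps)
    b_norm = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), eps)
    return a_norm @ b_norm.T


# Stand-in embeddings with the model's output dimensionality (768).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(3, 768))
scores = cosine_similarity_matrix(embeddings, embeddings)
print(scores.shape)
```

Each vector has similarity 1 with itself, so the diagonal of `scores` is all ones and the matrix is symmetric.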
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
  (0): MMContextEncoder(
    (text_encoder): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(30522, 768, padding_idx=0)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSdpaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
    (pooling): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  )
)
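The Pooling module above has only `pooling_mode_mean_tokens` enabled, so a sentence embedding is the average of the token embeddings, with padding positions excluded through the attention mask. The NumPy sketch below illustrates that idea on toy data; the function name `mean_pool` and the toy shapes are assumptions for illustration, and this is not the library's own implementation.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over non-padding positions.

    token_embeddings: (batch, seq_len, hidden)
    attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    # Clamp the token count so an all-padding row does not divide by zero.
    counts = np.maximum(mask.sum(axis=1), 1e-9)
    return summed / counts


# Toy batch: 2 sequences, 4 token slots, hidden size 768.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 4, 768))
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])
pooled = mean_pool(tokens, mask)
print(pooled.shape)
```

The first row of `pooled` averages three real tokens and the second averages two; the padded slots do not contribute.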
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("jo-mengr/mmcontext-pubmedbert-v3")
# Run inference
sentences = [
'MALAT1 CD74 EEF1A1 TMSB4X TSC22D3 HSPE1 CD69 FTH1 DNAJB1 JUNB HSP90AA1 BTG1 SARAF ACTB FTL DUSP1 CD37 FAU HLA-DRB5 RACK1 YPEL5 RAC2 TPT1 EIF1 TMSB10 PTMA NACA H3-3B HSPD1 MS4A1 CD52 CYBA TCL1A HNRNPA1 DUSP2 NCF1 CD79B SYPL1 GDI2 YBX1 RBM39 ITM2B PNRC1 BTG2 EEF2 JUN ZFP36L1 TOMM7 SLC25A5 RNASET2 CD44 NOP58 PABPC1 ICAM3 MEF2C FXYD5 HSP90AB1 CD79A PFN1 TMEM109 EIF1B APH1A SERP1 CXCR4',
"This measurement was conducted with 10x 5' v1. Naive B cell sample from the tonsil tissue of a 9-year old female with recurrent tonsillitis.",
"This measurement was conducted with 10x 5' v1. Naive B cell sample taken from a 5-year-old female individual with obstructive sleep apnea and recurrent tonsillitis, originating from tonsil tissue.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 1.0000, 1.0000],
# [1.0000, 1.0000, 1.0000],
# [1.0000, 1.0000, 1.0000]])
Evaluation
Metrics
Triplet
- Dataset: cellxgene_pseudo_bulk_3_5k_multiplets_natural_language_annotation_cell_sentence_2
- Evaluated with TripletEvaluator
Metric | Value |
---|---|
cosine_accuracy | 0.4698 |
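As a rough guide to reading the metric, cosine accuracy on triplets can be understood as the fraction of (anchor, positive, negative) triplets in which the anchor is more similar to the positive than to the negative. The sketch below computes that fraction on toy vectors; the function name `triplet_cosine_accuracy` and the toy data are assumptions for illustration, and the evaluator in the library may differ in details such as margins.

```python
import numpy as np


def triplet_cosine_accuracy(anchors, positives, negatives) -> float:
    """Fraction of triplets whose anchor is closer (by cosine) to the positive."""
    def row_cosine(x, y):
        x = x / np.linalg.norm(x, axis=1, keepdims=True)
        y = y / np.linalg.norm(y, axis=1, keepdims=True)
        return (x * y).sum(axis=1)

    pos_sim = row_cosine(anchors, positives)
    neg_sim = row_cosine(anchors, negatives)
    return float((pos_sim > neg_sim).mean())


# Toy 2-D embeddings: the first triplet is ranked correctly, the second is not.
anchors = np.array([[1.0, 0.0], [0.0, 1.0]])
positives = np.array([[0.9, 0.1], [1.0, 0.0]])
negatives = np.array([[0.0, 1.0], [0.1, 0.9]])
accuracy = triplet_cosine_accuracy(anchors, positives, negatives)
print(accuracy)  # 0.5
```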
Training Details
Training Dataset
cellxgene_pseudo_bulk_3_5k_multiplets_natural_language_annotation
- Dataset: cellxgene_pseudo_bulk_3_5k_multiplets_natural_language_annotation at 8d2de45
- Size: 2,825 training samples
- Columns: anchor, positive, negative_1, and negative_2
- Approximate statistics based on the first 1000 samples:
| | anchor | positive | negative_1 | negative_2 |
|---|---|---|---|---|
| type | string | string | string | string |
| details | min: 355 characters, mean: 384.6 characters, max: 432 characters | min: 90 characters, mean: 211.95 characters, max: 939 characters | min: 87 characters, mean: 212.94 characters, max: 775 characters | min: 354 characters, mean: 384.03 characters, max: 433 characters |
- Samples:
anchor | positive | negative_1 | negative_2 |
---|---|---|---|
MALAT1 TMSB4X EEF1A1 CD74 TPT1 PTMA TMSB10 EIF1 H3-3B FAU FTH1 RACK1 PABPC1 ACTB BTG1 UBC UBA52 FTL EIF3E COX4I1 NBEAL1 NPM1 NACA MYL6 HSP90AB1 DDX5 SKP1 SF3B1 SEPTIN7 ATP5F1E SBDS COX7C SERF2 PNRC1 ATP5MG UBB NAP1L1 TOMM7 CALM1 TMA7 SEC62 POU2F2 ADAM28 TPR CYBA YBX1 KLF6 ATRX ERP29 HNRNPC ATP5F1D POLR2F NFKBIA SMCHD1 EIF3J LSM5 PFN1 LUC7L3 EIF4G2 ARPC3 GAPDH NCL RALGPS2 CAPZA1 | This measurement was conducted with 10x 3' v2. B cell sample taken from a 30-year-old female with blood tissue, exhibiting elevated expression of type 1 interferon-stimulated genes (ISGs) in monocytes, reduction of naïve CD4+ T cells correlating with monocyte ISG expression, and expansion of repertoire-restricted cytotoxic GZMH+ CD8+ T cells. | This measurement was conducted with 10x 3' v2. CD8-positive, alpha-beta T cell derived from the blood tissue of a 29-year-old female Asian individual with managed systemic lupus erythematosus (SLE). | MALAT1 EEF1A1 TMSB4X TPT1 PTMA TMSB10 FAU UBA52 EIF1 ACTB FTL BTG1 ZFP36L2 FTH1 H3-3B NACA CALM1 RACK1 HSP90AB1 ID2 COX7C S100A6 PABPC1 HSP90AA1 MYL6 CIRBP PFN1 DDX5 SRSF7 CXCR4 ATP5F1E SNRPD2 COX4I1 ITM2B SERF2 SH3BGRL3 BTF3 PNRC1 UBC UQCRB SON ATP5MG EEF2 CFL1 SSR4 NPM1 TOMM7 TMA7 SEC62 CD74 VIM CYBA SYNE2 YBX1 TRAM1 RORA CDC42 PPP2R5C GADD45B EIF3L SRSF5 NFKBIA STK4 RBM3 |
GAPDH EEF1A1 TPT1 MALAT1 ACTB PTMA NACA TMSB10 HSPA8 RACK1 HSP90AA1 PKM HSP90AB1 UBA52 ENO1 TUBA1B NHP2 ALDOA SF3B2 NOP56 TPI1 SFPQ PRDX1 RBM39 WARS1 SERF2 FAU ARPC2 UQCRQ EEF2 HNRNPF SSR4 NPM1 SLC25A5 POU2F2 HSPA5 YBX1 RAB27A PABPC1 SRI XRCC5 FTL DDX18 SEL1L3 SNU13 ACIN1 PSMA7 C19orf53 FKBP8 LSM5 MLEC COX7A2 TCERG1 CBLB NCL HNRNPA2B1 SEM1 JUND SRRM1 EPRS1 SH3BGRL3 SERBP1 PFDN2 CIAO1 | This measurement was conducted with 10x 5' v1. Gamma-delta T cell derived from blood of a 26-year old male, activated with CD3. | This measurement was conducted with 10x 5' v1. Activated memory B cell from a 26-year-old male individual, activated using CD3. | GAPDH ACTB PTMA TMSB4X HSP90AA1 NPM1 RAN EEF1A1 HSP90AB1 TPT1 PPIA FTL YBX1 HSPA8 PFN1 MALAT1 NCL HSPE1 HMGB1 CHCHD2 NACA TPI1 PRDX1 HSPD1 H2AZ1 CFL1 BTF3 FABP5 RACK1 FAU EIF1 PGAM1 ALDOA RHOA HNRNPA2B1 H3-3B ATP5MC3 TMSB10 PTGES3 SH3BGRL3 CYCS UBA52 SLC25A5 NAP1L1 PARK7 FTH1 P4HB CALM1 MYL6 ZNF706 NDUFB9 YWHAB PSMA7 LDHB SNRPD2 COX7C TXN TPM3 SRSF2 HINT1 PRR13 PABPC1 ENO1 GSTP1 |
MALAT1 DOCK4 PLXDC2 ARHGAP24 FRMD4A LRMDA QKI NEAT1 MEF2A ELMO1 SFMBT2 DOCK8 HSP90AA1 SYNDIG1 SORL1 SLC8A1 NAV3 RASAL2 APBB1IP MEF2C ITPR2 CYRIB SRGAP2 CELF2 ST6GAL1 EPB41L2 GRID2 MBNL1 ST6GALNAC3 ANKRD44 MGAT4A ABCC4 ARHGAP22 SSH2 OXR1 LDLRAD4 SH3RF3 UBE2E2 MAML2 MAML3 CD74 TBC1D22A FYB1 ATP8B4 HSPH1 SAT1 PCNX2 FCHSD2 ETV6 PTPRJ GNAQ DISC1 CHST11 SLC9A9 KCNQ3 LINC02798 MYCBP2 HDAC9 DIP2B PICALM EIF4G3 SLC1A3 FTL SUSD6 | This measurement was conducted with 10x 3' v3. A central nervous system macrophage (microglia) cell type, derived from the hypothalamus tissue, specifically the preoptic region of HTH (HTHpo), of a 50-year-old male individual with European ethnicity. | This measurement was conducted with 10x 3' v3. Neuron cell type from a 50-year-old male human, specifically from the preoptic region of the hypothalamus (HTHpo). The cell was analyzed using single-nucleus RNA sequencing. | MALAT1 NALF1 ROBO1 CDH12 LSAMP NRXN1 RALYL CACNA2D1 MGAT4C DPYD LRRC4C EGFEM1P DPP10 EPHA6 TENM2 ANK3 MACROD2 RYR2 ANK2 GRID2 KAZN SNTG1 MIR99AHG DMD MEG3 HDAC9 DPP6 MARCHF1 GRIK1 CADM1 SYT1 KCNH7 FRMD4A RABGAP1L GRM1 MAP2 IL1RAPL1 RORA FRAS1 GABRB1 CDH8 TAFA1 FGF14 NCAM2 PARD3 SLC8A1 AHI1 SMYD3 EDIL3 OPCML KCND2 TNRC6A SGCZ DST PDE1A SLC35F1 LINC03051 TRPM3 GRIA4 MEG8 MEIS2 HMGCLL1 PRANCR DCC |
- Loss: MultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
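To make the loss parameters concrete, the sketch below shows the general idea behind a multiple-negatives ranking objective: cosine similarities between each anchor and every candidate are multiplied by `scale` (20.0 here) and passed through a cross-entropy in which the matching positive is the target class. The function name `mnrl_loss`, the random stand-in embeddings, and the way hard negatives are appended are assumptions for illustration; this is a simplified NumPy rendering, not the library's code.

```python
import numpy as np


def mnrl_loss(anchors: np.ndarray, candidates: np.ndarray, scale: float = 20.0) -> float:
    """Cross-entropy over scaled cosine similarities.

    Row i of `candidates` must be the positive for row i of `anchors`;
    every other row (other positives, plus any appended hard negatives)
    acts as a negative.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    logits = scale * (a @ c.T)                            # (batch, n_candidates)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(anchors))
    return float(-log_probs[idx, idx].mean())


rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 768))
positives = anchors + 0.01 * rng.normal(size=(4, 768))   # near-duplicates of the anchors
hard_negatives = rng.normal(size=(4, 768))
candidates = np.concatenate([positives, hard_negatives], axis=0)

aligned = mnrl_loss(anchors, candidates)
shuffled = mnrl_loss(anchors, np.roll(candidates, 1, axis=0))
print(aligned < shuffled)
```

When each anchor is paired with its true positive the loss is close to zero; misaligning the pairs makes it much larger, which is the signal the optimizer follows.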
Evaluation Dataset
cellxgene_pseudo_bulk_3_5k_multiplets_natural_language_annotation
- Dataset: cellxgene_pseudo_bulk_3_5k_multiplets_natural_language_annotation at 8d2de45
- Size: 298 evaluation samples
- Columns: anchor, positive, negative_1, and negative_2
- Approximate statistics based on the first 298 samples:
| | anchor | positive | negative_1 | negative_2 |
|---|---|---|---|---|
| type | string | string | string | string |
| details | min: 361 characters, mean: 383.43 characters, max: 419 characters | min: 105 characters, mean: 214.29 characters, max: 809 characters | min: 110 characters, mean: 219.1 characters, max: 809 characters | min: 361 characters, mean: 383.64 characters, max: 419 characters |
- Samples:
anchor | positive | negative_1 | negative_2 |
---|---|---|---|
MALAT1 RORA PTGDS RBMS3 RNF220 GSN DCN ZEB1 SPARCL1 LHFPL6 PTN ZBTB20 NEAT1 TIMP3 AFF3 TCF4 CHD9 FTX DDX17 CST3 CD81 PPFIBP1 SPTBN1 ARHGAP29 AOPEP PARD3 MAML2 IMMP2L WWOX N4BP2L2 FYN VEZT PIAS1 MYO9A TGFBR3 FNDC3B ACTB FBLN1 ITGA8 UBE2K DOCK9 WAC TNRC6B QKI FOXP1 RTN4 IGFBP2 IGFBP5 OSBPL9 STAG1 FOXO3 PLEKHG1 PDZRN3 MKLN1 AFDN TBC1D5 ITM2B FCHSD2 UACA TCF12 NCOA2 TPM1 ZFHX3 COL6A1 | This measurement was conducted with 10x 3' v3. Fibroblast cells from the cerebral cortex, specifically the postcentral gyrus, primary somatosensory cortex, S1C, of a 42-year-old male with European ethnicity. | This measurement was conducted with 10x 3' v3. Neuron cell type from a 29-year-old male human cerebral nuclei, specifically from the Extended amygdala (EXA) - Bed nucleus of stria terminalis and nearby - BNST region. | MALAT1 PCDH9 PDE4D CELF2 PHACTR1 R3HDM1 FTX PPP3CA SIPA1L1 DMD ANK3 PRKCB JMJD1C GPC6 AHI1 AUTS2 ATP2B1 MEF2C ARHGAP26 PTK2 KIAA1217 OXR1 SMYD3 CHN1 SYNE1 UBE2E2 CAMK1D EIF4G3 TNRC6A CLASP2 TTC3 N4BP2L2 LIMCH1 RORA FBXW7 RTN4 AFF3 FBXL17 RABGAP1L PBX1 MRTFB MBD5 ARID1B RASAL2 DLG1 EPB41L2 ERC1 TNRC6B ZFAND3 FMNL2 DPYD CCPG1 MYCBP2 MAST4 HERC1 RTN3 MAP7 APP RERE DIAPH2 ITPR1 DST CAMK4 ASAP1 |
MALAT1 CD74 EEF1A1 ACTB TMSB4X BTG1 TPT1 CD37 TMSB10 PFN1 PTMA TXNIP MS4A1 TSC22D3 FAU CORO1A RACK1 FTL EIF1 GAPDH SERF2 PPIA TCL1A CD79A DUSP1 NCF1 ARPC2 NACA SP100 ZFP36 BTF3 ZFP36L2 LAPTM5 SELENOH PABPC1 MEF2C ERP29 MYL12A CD69 ATP5F1E COX7C UQCRB CFL1 SELL XIST FXYD5 DDX5 ARHGDIB CCNI CXCR4 PPDPF H3-3B LIMD2 IFITM3 SRRM2 CD52 NPM1 IFITM2 PTPRCAP UBA52 EBLN3P CYBA CNN2 RASGRP2 | This measurement was conducted with 10x 5' v1. Naive B cell sample taken from a 6-year old female tonsil. | This measurement was conducted with 10x 5' v1. Plasmablast cells derived from a 3-year-old male tonsil tissue sample, with IGH + IGL, IGHV5-5103, IGHD6-601, IGHJ4*02, IGLV2-14, IGLC3, IGLJ3, and IgG3 isotype. | CD74 ACTB PTMA EEF1A1 TMSB4X PFN1 HMGB1 MALAT1 TPT1 H4C3 PPIA CFL1 GAPDH TUBA1B FAU HMGN2 SERF2 H3-3A ARPC2 MS4A1 HMGN1 H2AZ1 ANP32B STMN1 HSP90AA1 SLC25A5 HNRNPA1 NPM1 TMSB10 HSP90AB1 RACK1 OAZ1 ARHGDIB LAPTM5 CORO1A NACA CALM2 BTF3 HSPA8 TPI1 H3-3B PCBP2 PKM MYL6 ARPC3 HNRNPA2B1 ATP5F1E ARPC5 COX5A RHOA CD79A PRDX1 COX4I1 RAN LCP1 SRP14 SNRPG HMGB2 ATP5MG EEF2 EIF1 SUMO2 HLA-DRB5 PABPC1 |
MALAT1 PCDH9 PDE4D PHACTR1 KIAA1217 CELF2 FTX PDE4B ANK3 ATP2B1 DMD PPP3CA PRKCB TUBA1B MAP1B CHN1 AFF3 ARHGAP26 RABGAP1L OXR1 DPYD RASAL2 SMYD3 BICD1 CLASP2 FTH1 PTK2 HIVEP2 HERC1 TLE4 SPARCL1 FMNL2 EIF4G3 FBXW7 PLXDC2 APP DST TNIK AGAP1 AUTS2 PRKCE JMJD1C R3HDM1 MEF2C TNRC6A AEBP2 NFIB BASP1 TTC3 CAMK2D NRGN TSHZ2 CAMK1D LHFPL6 MBD5 N4BP2L2 ARID1B RTN4 MACF1 SYNE1 NDFIP1 RTN3 SARAF ARGLU1 | This measurement was conducted with 10x 3' v3. Neuron cell type from the primary motor cortex (M1C) of a 29-year-old male with European ethnicity, belonging to the deep-layer corticothalamic and 6b subcluster. | This measurement was conducted with 10x 3' v3. Neuron cell type from a 29-year old male human, specifically from the pons tissue and more specifically, the Pontine reticular formation - PnRF dissection. The cell falls under the Midbrain-derived inhibitory supercluster term. | MALAT1 PCDH9 OXR1 KIAA1217 TCF4 PAM AHI1 GPC6 DMD RBMS3 ANK3 FTX PDE4B AUTS2 MACF1 LIMCH1 CLASP2 CELF2 BACH2 ITFG1 FARP1 RORA SMYD3 TNRC6A OSBPL8 PDE4D AAK1 SBF2 FBXL17 AGAP1 PRKCE NF1 EIF4G3 ERC1 PHF20L1 SIK3 TTC3 WWOX MBD5 MYCBP2 ARID1B MEF2C RIPOR2 LRRFIP1 RTN3 DST RABGAP1L PITPNC1 SATB1 WLS SNRPN APP PRKACB UBR3 IQSEC1 SSBP2 AIG1 JAZF1 TNIK ZEB2 CERS6 ANKRD36C CADM1 PBX1 |
- Loss: MultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 128
- per_device_eval_batch_size: 128
- learning_rate: 0.05
- warmup_ratio: 0.1
- gradient_checkpointing: True
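The `learning_rate` and `warmup_ratio` values above, together with the `linear` scheduler type listed among the full hyperparameters, describe a learning rate that ramps up over the first 10% of training and then decays linearly to zero. The sketch below reproduces that shape in plain Python. The function name `linear_warmup_lr`, the single-device assumption behind the step count, and rounding the warmup length up with `ceil` are assumptions for illustration, not details confirmed by this card.

```python
import math


def linear_warmup_lr(step: int, base_lr: float, total_steps: int, warmup_ratio: float) -> float:
    """Learning rate after `step` optimizer updates: linear warmup, then linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumed rounding
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Assumed run length: ceil(2825 / 128) = 23 steps per epoch, 3 epochs, one device.
total_steps = 3 * math.ceil(2825 / 128)
schedule = [linear_warmup_lr(s, 0.05, total_steps, 0.1) for s in range(total_steps + 1)]
print(total_steps, max(schedule))
```

Under these assumptions the run has 69 optimizer steps and the peak rate of 0.05 is reached at step 7, after which it decays to zero.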
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 128
- per_device_eval_batch_size: 128
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 0.05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: True
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}
Training Logs
Epoch | Step | Training Loss | cellxgene pseudo bulk 3 5k multiplets natural language annotation loss | cellxgene_pseudo_bulk_3_5k_multiplets_natural_language_annotation_cell_sentence_2_cosine_accuracy |
---|---|---|---|---|
2.1739 | 50 | 6.383 | 5.7937 | 0.4698 |
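The fractional epoch in the log row can be cross-checked against the dataset size and batch size reported earlier. The short calculation below is a sketch of that arithmetic; the single-device value for `num_devices` is an assumption, since the card does not state how many devices were used.

```python
import math

train_samples = 2825       # training set size reported above
batch_size = 128           # per_device_train_batch_size
num_devices = 1            # assumption: the card does not state the device count
grad_accum = 1             # gradient_accumulation_steps

steps_per_epoch = math.ceil(train_samples / (batch_size * num_devices * grad_accum))
epoch_at_step_50 = 50 / steps_per_epoch
print(steps_per_epoch, round(epoch_at_step_50, 4))
```

With 23 steps per epoch, step 50 falls at epoch 2.1739, which matches the logged value and is consistent with the single-device assumption.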
Framework Versions
- Python: 3.12.9
- Sentence Transformers: 5.0.0
- Transformers: 4.55.0.dev0
- PyTorch: 2.8.0
- Accelerate: 1.9.0
- Datasets: 2.19.1
- Tokenizers: 0.21.4
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}