splade-co-condenser-marco trained on MS MARCO hard negatives with distillation

This is a SPLADE Sparse Encoder model finetuned from Luyu/co-condenser-marco on the msmarco dataset using the sentence-transformers library. It maps sentences & paragraphs to a 30522-dimensional sparse vector space and can be used for semantic search and sparse retrieval.

Model Details

Model Description

Model Type: SPLADE Sparse Encoder
Base model: Luyu/co-condenser-marco
Maximum Sequence Length: 256 tokens
Output Dimensionality: 30522 dimensions
Similarity Function: Dot Product
Training Dataset:
- msmarco
Language: en
License: apache-2.0

Model Sources

Documentation: Sentence Transformers Documentation
Documentation: Sparse Encoder Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sparse Encoders on Hugging Face

Full Model Architecture

SparseEncoder(
  (0): MLMTransformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertForMaskedLM'})
  (1): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
)
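The SpladePooling step above can be illustrated with a minimal pure-Python sketch. It assumes the usual SPLADE formulation, in which each vocabulary weight is the maximum over token positions of log(1 + relu(logit)); the function name `splade_pool` and the toy logits are hypothetical and are not taken from the library.

```python
import math

def splade_pool(mlm_logits, attention_mask):
    """Max-pool log(1 + relu(logit)) over the non-padding tokens of one sequence."""
    vocab_size = len(mlm_logits[0])
    pooled = [0.0] * vocab_size
    for token_logits, keep in zip(mlm_logits, attention_mask):
        if not keep:  # skip padding positions
            continue
        for j, logit in enumerate(token_logits):
            weight = math.log1p(max(logit, 0.0))  # relu, then log saturation
            pooled[j] = max(pooled[j], weight)    # 'max' pooling strategy
    return pooled

# Toy input: 3 token positions (the last one is padding), vocabulary of 4 terms.
logits = [
    [2.0, -1.0, 0.0, 0.5],
    [0.1, -3.0, 4.0, -0.2],
    [9.0, 9.0, 9.0, 9.0],  # padding, ignored because its mask entry is 0
]
mask = [1, 1, 0]

vector = splade_pool(logits, mask)
active_dims = sum(1 for w in vector if w > 0)
print(vector, active_dims)
```

In the real model the vocabulary has 30522 entries, so the same pooling yields one weight per vocabulary term, most of which are zero.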

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SparseEncoder

# Download from the 🤗 Hub
model = SparseEncoder("arthurbresnu/splade-co-condenser-marco-msmarco-qwen3-reranker-0.6B-margin-mse")
# Run inference
queries = [
    "what town is grand lake st mary near",
]
documents = [
    'Grand Lake St. Marys State Park. Grand Lake St. Marys State Park is an American state park, west of St. Marys, and south-east of Celina, 23 miles (37 km) south-west of Lima in the north-western part of Ohio. Grand Lake covers 13,500 acres (5,500 ha) in Mercer and Auglaize counties.',
    'Lake Poinsett. Home > Florida Lakes > Lake Poinsett. Lake Poinsett BASS ONLINE 2016-10-18T14:26:01+00:00. Lake Poinsett Fishing. As the St. Johns River snakes out of Lake Washington and through the lush, green marshes, it eventually forms a â\x80\x98minorâ\x80\x99 wide spot in its trace some eight miles to the North.',
    'Slavery in America began when the first African slaves were brought to the North American colony of Jamestown, Virginia, in 1619, to aid in the production of such lucrative crops as tobacco.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 30522] [3, 30522]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[23.0166, 12.0320,  1.7877]])
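Because the similarity function is a dot product over sparse vectors, only terms that are active in both the query and the document contribute to a score. The sketch below illustrates this with plain dictionaries; the term weights are made up for illustration and are not outputs of this model.

```python
def sparse_dot(query_weights, doc_weights):
    """Dot product of two sparse vectors stored as {term: weight} dicts."""
    # Iterate over the smaller dict so the cost depends on the sparser side.
    if len(query_weights) > len(doc_weights):
        query_weights, doc_weights = doc_weights, query_weights
    return sum(w * doc_weights[t] for t, w in query_weights.items() if t in doc_weights)

# Hypothetical term weights, for illustration only.
query = {"grand": 1.5, "lake": 2.0, "town": 0.5}
doc_a = {"grand": 1.0, "lake": 1.5, "ohio": 0.8}
doc_b = {"slavery": 2.1, "virginia": 1.2}

scores = [sparse_dot(query, d) for d in (doc_a, doc_b)]
print(scores)
```

A document that shares no active terms with the query scores zero, which is why unrelated passages receive much lower similarities than on-topic ones.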

Evaluation

Metrics

Sparse Information Retrieval

Datasets: NanoMSMARCO, NanoNFCorpus, NanoNQ, NanoClimateFEVER, NanoDBPedia, NanoFEVER, NanoFiQA2018, NanoHotpotQA, NanoQuoraRetrieval, NanoSCIDOCS, NanoArguAna, NanoSciFact and NanoTouche2020
Evaluated with SparseInformationRetrievalEvaluator

Metric	NanoMSMARCO	NanoNFCorpus	NanoNQ	NanoClimateFEVER	NanoDBPedia	NanoFEVER	NanoFiQA2018	NanoHotpotQA	NanoQuoraRetrieval	NanoSCIDOCS	NanoArguAna	NanoSciFact	NanoTouche2020
dot_accuracy@1	0.38	0.36	0.5	0.32	0.78	0.76	0.4	0.84	0.82	0.42	0.14	0.52	0.6939
dot_accuracy@3	0.68	0.5	0.74	0.48	0.86	0.9	0.5	0.94	1.0	0.58	0.46	0.72	0.8163
dot_accuracy@5	0.76	0.62	0.78	0.58	0.9	0.92	0.62	0.96	1.0	0.66	0.58	0.74	0.9184
dot_accuracy@10	0.82	0.7	0.82	0.72	0.96	0.96	0.74	0.98	1.0	0.82	0.74	0.84	0.9796
dot_precision@1	0.38	0.36	0.5	0.32	0.78	0.76	0.4	0.84	0.82	0.42	0.14	0.52	0.6939
dot_precision@3	0.2267	0.32	0.2533	0.1733	0.68	0.3067	0.2133	0.4733	0.3867	0.2733	0.1533	0.26	0.6054
dot_precision@5	0.152	0.328	0.16	0.14	0.608	0.192	0.168	0.32	0.252	0.228	0.116	0.164	0.5551
dot_precision@10	0.082	0.264	0.088	0.098	0.516	0.104	0.106	0.172	0.134	0.164	0.074	0.094	0.4714
dot_recall@1	0.38	0.024	0.46	0.159	0.0805	0.7167	0.2261	0.42	0.744	0.0877	0.14	0.5	0.0472
dot_recall@3	0.68	0.0792	0.69	0.2357	0.1699	0.8533	0.2985	0.71	0.938	0.1697	0.46	0.71	0.1243
dot_recall@5	0.76	0.0983	0.73	0.2957	0.2371	0.8833	0.4133	0.8	0.9687	0.2347	0.58	0.725	0.189
dot_recall@10	0.82	0.1246	0.79	0.3847	0.3389	0.9433	0.527	0.86	0.99	0.3357	0.74	0.83	0.3121
dot_ndcg@10	0.6108	0.3105	0.6426	0.325	0.634	0.8472	0.4163	0.7969	0.9198	0.326	0.4314	0.6742	0.534
dot_mrr@10	0.5427	0.4639	0.6181	0.4319	0.834	0.8369	0.4877	0.895	0.9033	0.5275	0.3336	0.6251	0.7833
dot_map@100	0.551	0.1358	0.5913	0.2561	0.4791	0.8096	0.3523	0.7206	0.8869	0.2485	0.3456	0.6257	0.394
query_active_dims	47.4	46.64	55.28	110.88	48.7	76.32	52.36	81.46	50.98	77.1	184.38	93.36	47.4898
query_sparsity_ratio	0.9984	0.9985	0.9982	0.9964	0.9984	0.9975	0.9983	0.9973	0.9983	0.9975	0.994	0.9969	0.9984
corpus_active_dims	140.1564	215.3624	160.6441	192.7664	181.0865	218.5546	134.6856	189.121	54.8137	191.1029	172.4982	216.2216	139.9833
corpus_sparsity_ratio	0.9954	0.9929	0.9947	0.9937	0.9941	0.9928	0.9956	0.9938	0.9982	0.9937	0.9943	0.9929	0.9954
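The sparsity ratios in the table are consistent with the fraction of the 30522 output dimensions that are zero on average. A small sketch of that relationship, using the NanoMSMARCO values from the table (the helper name `sparsity_ratio` is illustrative):

```python
VOCAB_SIZE = 30522  # output dimensionality of the model

def sparsity_ratio(active_dims, dim=VOCAB_SIZE):
    """Fraction of dimensions that are zero, given the mean number of active ones."""
    return 1.0 - active_dims / dim

# NanoMSMARCO values from the table above.
query_ratio = round(sparsity_ratio(47.4), 4)
corpus_ratio = round(sparsity_ratio(140.1564), 4)
print(query_ratio, corpus_ratio)
```

Both results round to the values reported in the query_sparsity_ratio and corpus_sparsity_ratio rows for NanoMSMARCO.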

Sparse Nano BEIR

Dataset: NanoBEIR_mean

Evaluated with SparseNanoBEIREvaluator with these parameters:

{
    "dataset_names": [
        "msmarco",
        "nfcorpus",
        "nq"
    ]
}

Metric	Value
dot_accuracy@1	0.4133
dot_accuracy@3	0.6267
dot_accuracy@5	0.7133
dot_accuracy@10	0.7867
dot_precision@1	0.4133
dot_precision@3	0.2644
dot_precision@5	0.2107
dot_precision@10	0.1447
dot_recall@1	0.2879
dot_recall@3	0.4693
dot_recall@5	0.5227
dot_recall@10	0.5778
dot_ndcg@10	0.5177
dot_mrr@10	0.5369
dot_map@100	0.4213
query_active_dims	49.6067
query_sparsity_ratio	0.9984
corpus_active_dims	164.0576
corpus_sparsity_ratio	0.9946
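For readers unfamiliar with the metric names, the following sketch computes accuracy, precision, recall, MRR and NDCG at a cutoff k for a single query. It assumes binary relevance and is a simplified illustration, not the evaluator's own code; the document ids are hypothetical.

```python
import math

def metrics_at_k(ranked_ids, relevant_ids, k):
    """Per-query retrieval metrics at cutoff k, assuming binary relevance."""
    hits = [1 if doc_id in relevant_ids else 0 for doc_id in ranked_ids[:k]]
    # Reciprocal rank of the first relevant document, or 0 if none is in the top k.
    mrr = next((1.0 / (rank + 1) for rank, hit in enumerate(hits) if hit), 0.0)
    dcg = sum(hit / math.log2(rank + 2) for rank, hit in enumerate(hits))
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return {
        "accuracy": 1.0 if any(hits) else 0.0,
        "precision": sum(hits) / k,
        "recall": sum(hits) / len(relevant_ids),
        "mrr": mrr,
        "ndcg": dcg / idcg if idcg > 0 else 0.0,
    }

# Hypothetical ranking for one query; three documents are relevant.
ranked = ["d7", "d2", "d9", "d4", "d1"]
relevant = {"d2", "d1", "d5"}
result = metrics_at_k(ranked, relevant, k=3)
print(result)
```

At k = 1 accuracy and precision coincide under these definitions, which matches the identical dot_accuracy@1 and dot_precision@1 values in the tables.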

Sparse Nano BEIR

Dataset: NanoBEIR_mean

Evaluated with SparseNanoBEIREvaluator with these parameters:

{
    "dataset_names": [
        "climatefever",
        "dbpedia",
        "fever",
        "fiqa2018",
        "hotpotqa",
        "msmarco",
        "nfcorpus",
        "nq",
        "quoraretrieval",
        "scidocs",
        "arguana",
        "scifact",
        "touche2020"
    ]
}

Metric	Value
dot_accuracy@1	0.5334
dot_accuracy@3	0.7059
dot_accuracy@5	0.7722
dot_accuracy@10	0.8523
dot_precision@1	0.5334
dot_precision@3	0.3327
dot_precision@5	0.2602
dot_precision@10	0.1821
dot_recall@1	0.3065
dot_recall@3	0.4707
dot_recall@5	0.5319
dot_recall@10	0.6151
dot_ndcg@10	0.5745
dot_mrr@10	0.6371
dot_map@100	0.492
query_active_dims	74.8382
query_sparsity_ratio	0.9975
corpus_active_dims	164.4836
corpus_sparsity_ratio	0.9946
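The NanoBEIR_mean figures appear to be plain arithmetic means of the per-dataset values. As a check, averaging the thirteen dot_ndcg@10 scores from the per-dataset table reproduces the mean reported here:

```python
# dot_ndcg@10 per dataset, copied from the Sparse Information Retrieval table.
ndcg_at_10 = {
    "NanoMSMARCO": 0.6108,
    "NanoNFCorpus": 0.3105,
    "NanoNQ": 0.6426,
    "NanoClimateFEVER": 0.325,
    "NanoDBPedia": 0.634,
    "NanoFEVER": 0.8472,
    "NanoFiQA2018": 0.4163,
    "NanoHotpotQA": 0.7969,
    "NanoQuoraRetrieval": 0.9198,
    "NanoSCIDOCS": 0.326,
    "NanoArguAna": 0.4314,
    "NanoSciFact": 0.6742,
    "NanoTouche2020": 0.534,
}

mean_ndcg = round(sum(ndcg_at_10.values()) / len(ndcg_at_10), 4)
print(mean_ndcg)  # 0.5745
```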

Training Details

Training Dataset

msmarco

Dataset: msmarco at 9e329ed
Size: 90,000 training samples
Columns: query, positive, negative, and score

Approximate statistics based on the first 1000 samples:

	query	positive	negative	score
type	string	string	string	float
details	min: 4 tokens mean: 9.05 tokens max: 22 tokens	min: 17 tokens mean: 79.74 tokens max: 228 tokens	min: 14 tokens mean: 77.68 tokens max: 256 tokens	min: -3.38 mean: 10.51 max: 21.0

Samples:

query	positive	negative	score
`journal entries for standard cost variances`	1 Fiber Optic, Inc., investigates all variances above 10 percent of the flexible budget. 2 The flexible budget for direct materials is $50,000. 3 The direct materials price variance is $4,000 unfavorable and the direct materials quantity variance is $(6,000) favorable. Assuming a standard price of $5 per yard, prepare a journal entry to record the purchase of raw materials for the month. 2 The company used 39,000 yards of material in production for the month, and the flexible budget shows the company expected to use 40,800 yards.	`In accounting the monthly close is the processing of transactions, journal entries and financial statements at the end of each month.`	`9.375`
`what county in pana, il in?`	`Pana /ˈpeɪnə/ is a city in Christian County, Illinois, United States. The population was 5,614 at the 2000 census.`	`Burr Ridge, IL is currently using an area code overlay in which area codes 331 and 630 serve the same geographic area. Ten digit dialing (area code + seven digit number) is necessary. In addition to Burr Ridge, IL area code information read more about area codes 331 and 630 details and Illinois area codes. Burr Ridge, IL is located in DuPage County and observes the Central Time Zone. View our Times by Area Code tool.`	`13.75`
`when was keep on loving you released`	`Share this page. REO's first Top 40 appearance proved to be a fruitful one, with the group taking Keep on Loving You to the number one spot in December of 1980.`	Description: “If Loving You Is Wrong” is the new dramatic series created for television by writer/director Tyler Perry, premiering this fall on OWN. “If Loving You Is Wrong” is the compelling story of several women from very different walks of life.ack to the Have and The Have Nots, the scenes are too long and the characters are one dimensional. If it wasn't for Tika Sumpter, the show would be unbearable to watch. Love Thy Neighbor is the worst show ever. It is a throwback to blackface mistrel shows.	`8.875`

Loss: SpladeLoss with these parameters:

{
    "loss": "SparseMarginMSELoss",
    "document_regularizer_weight": 0.08,
    "query_regularizer_weight": 0.1
}
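The loss configuration above combines a distillation term with sparsity regularizers. The sketch below shows one way such an objective can be written, assuming that the score column holds the teacher margin between positive and negative, that the distillation term is the mean squared error between student and teacher margins, and that the regularizer is the FLOPS penalty (sum over dimensions of the squared mean absolute activation). The function names and toy vectors are hypothetical, and the library's exact implementation may differ.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def flops(batch):
    """FLOPS regularizer: sum over dimensions of (mean absolute activation)^2."""
    n, dim = len(batch), len(batch[0])
    return sum((sum(abs(vec[j]) for vec in batch) / n) ** 2 for j in range(dim))

def splade_margin_mse(queries, positives, negatives, teacher_margins,
                      query_weight=0.1, document_weight=0.08):
    # Student margin: score(query, positive) - score(query, negative).
    student = [dot(q, p) - dot(q, n) for q, p, n in zip(queries, positives, negatives)]
    mse = sum((s - t) ** 2 for s, t in zip(student, teacher_margins)) / len(student)
    # Assumption: documents (positives and negatives) are regularized together.
    return mse + query_weight * flops(queries) + document_weight * flops(positives + negatives)

# Toy 3-dimensional "sparse" vectors for a batch of one triplet.
queries = [[1.0, 0.0, 2.0]]
positives = [[2.0, 0.0, 1.0]]
negatives = [[0.0, 1.0, 0.5]]
loss = splade_margin_mse(queries, positives, negatives, teacher_margins=[3.0])
print(loss)
```

Here the student margin already equals the teacher margin, so the remaining loss comes entirely from the two regularizers, which push query and document vectors toward fewer and smaller active dimensions.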

Evaluation Dataset

msmarco

Dataset: msmarco at 9e329ed
Size: 10,000 evaluation samples
Columns: query, positive, negative, and score

Approximate statistics based on the first 1000 samples:

	query	positive	negative	score
type	string	string	string	float
details	min: 4 tokens mean: 9.22 tokens max: 43 tokens	min: 19 tokens mean: 80.19 tokens max: 209 tokens	min: 14 tokens mean: 77.78 tokens max: 239 tokens	min: -9.0 mean: 10.74 max: 21.75

Samples:

query	positive	negative	score
`what trump said about obama playing golf during campaign`	`Obama also has played golf with Woods during his presidency, though typically the president’s golf partners are personal friends and select aides, as opposed to celebrities. At a campaign rally in December 2015, Trump ripped into Obama for playing hundreds of rounds of golf as president. “He played more golf last year than Tiger Woods,” Trump said suggestively. “We don’t have time for this. We have to work.”.`	`Trump slams Obama, Clinton for 'politically correct' war against ISIS, warns of more attacks. Republican presidential nominee Donald Trump has accused the Obama administration of waging a 'politically correct' war against the ISIS terror group and warned that more terror attacks would take place.`	`8.421875`
`how much volume is a gram`	`One gram is equal to 0.0353 ounces. A gram of sugar is approximately 1/4 teaspoon of sugar. A regular paper clip weighs about 1 gram. The gram and kilogram are units of mass in the metric system of measurement. The metric system was invented in France in 1799. It was improved in 1960 and named the System of International Units, or SI.`	Divide the object's mass by its volume. This value is the object's density and expresses it in units of mass per unit of volume. For example, for a 20-gram mass that takes up a volume of 5 cubic centimeters, the density is 4 grams per cubic centimeter.Ad.ivide the object's mass by its volume. This value is the object's density and expresses it in units of mass per unit of volume. For example, for a 20-gram mass that takes up a volume of 5 cubic centimeters, the density is 4 grams per cubic centimeter. Ad.	`2.65625`
`differences between the sexes`	`sexual dimorphism in birds can be manifested in size or plumage differences between the sexes sexual size dimorphism varies among taxa with males typically being larger though this is not always the case i e birds of prey hummingbirds and some species of flightless birds`	`Caribou are the only species of deer in which both sexes have antlers. Mature bulls can carry enormous and complex antlers, whereas cows and young animals generally have smaller and simpler ones. Mature bulls usually shed their antlers shortly after the rut whereas cows can keep theirs until spring.`	`10.21875`

Loss: SpladeLoss with these parameters:

{
    "loss": "SparseMarginMSELoss",
    "document_regularizer_weight": 0.08,
    "query_regularizer_weight": 0.1
}

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
learning_rate: 2e-05
num_train_epochs: 1
warmup_ratio: 0.1
bf16: True
load_best_model_at_end: True

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 1
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: True
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}

Training Logs

Epoch	Step	Training Loss	Validation Loss	NanoMSMARCO_dot_ndcg@10	NanoNFCorpus_dot_ndcg@10	NanoNQ_dot_ndcg@10	NanoBEIR_mean_dot_ndcg@10	NanoClimateFEVER_dot_ndcg@10	NanoDBPedia_dot_ndcg@10	NanoFEVER_dot_ndcg@10	NanoFiQA2018_dot_ndcg@10	NanoHotpotQA_dot_ndcg@10	NanoQuoraRetrieval_dot_ndcg@10	NanoSCIDOCS_dot_ndcg@10	NanoArguAna_dot_ndcg@10	NanoSciFact_dot_ndcg@10	NanoTouche2020_dot_ndcg@10
0.0178	100	576200.8	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.0356	200	2635.0334	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.0533	300	70.7781	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.0711	400	46.7365	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.0889	500	33.3391	46.8835	0.5158	0.2778	0.6192	0.4709	-	-	-	-	-	-	-	-	-	-
0.1067	600	29.4815	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.1244	700	27.123	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.1422	800	22.7267	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.16	900	22.2125	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.1778	1000	23.7129	22.1341	0.5768	0.2807	0.5689	0.4754	-	-	-	-	-	-	-	-	-	-
0.1956	1100	23.1061	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.2133	1200	23.3015	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.2311	1300	19.0495	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.2489	1400	20.465	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.2667	1500	19.5227	18.4953	0.5447	0.2930	0.5663	0.4680	-	-	-	-	-	-	-	-	-	-
0.2844	1600	19.7019	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.3022	1700	20.2723	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.32	1800	18.644	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.3378	1900	17.8863	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.3556	2000	17.824	21.6579	0.5722	0.2951	0.5739	0.4804	-	-	-	-	-	-	-	-	-	-
0.3733	2100	18.2091	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.3911	2200	17.9996	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.4089	2300	15.7506	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.4267	2400	17.8921	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.4444	2500	16.3761	20.0396	0.5493	0.2811	0.6257	0.4854	-	-	-	-	-	-	-	-	-	-
0.4622	2600	18.1791	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.48	2700	15.3429	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.4978	2800	14.9936	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.5156	2900	15.364	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.5333	3000	15.6449	17.3149	0.5672	0.3030	0.6095	0.4932	-	-	-	-	-	-	-	-	-	-
0.5511	3100	15.6673	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.5689	3200	15.0578	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.5867	3300	15.906	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.6044	3400	15.6495	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.6222	3500	13.6636	14.5839	0.5683	0.2978	0.6191	0.4951	-	-	-	-	-	-	-	-	-	-
0.64	3600	14.7215	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.6578	3700	15.1004	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.6756	3800	13.7198	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.6933	3900	13.9975	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.7111	4000	13.5657	14.8618	0.5983	0.3042	0.6183	0.5069	-	-	-	-	-	-	-	-	-	-
0.7289	4100	13.8326	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.7467	4200	14.5209	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.7644	4300	13.4064	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.7822	4400	13.7625	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.8	4500	13.2154	14.3594	0.5734	0.3266	0.6345	0.5115	-	-	-	-	-	-	-	-	-	-
0.8178	4600	13.7091	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.8356	4700	12.5913	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.8533	4800	12.433	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.8711	4900	13.0404	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.8889	5000	12.409	14.0825	0.6108	0.3105	0.6426	0.5213	-	-	-	-	-	-	-	-	-	-
0.9067	5100	12.4556	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.9244	5200	12.4219	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.9422	5300	12.4269	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.96	5400	12.5363	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.9778	5500	12.4979	13.8156	0.6024	0.3101	0.6405	0.5177	-	-	-	-	-	-	-	-	-	-
0.9956	5600	11.9616	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
-1	-1	-	-	0.6108	0.3105	0.6426	0.5745	0.3250	0.6340	0.8472	0.4163	0.7969	0.9198	0.3260	0.4314	0.6742	0.5340

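The saved checkpoint corresponds to step 5000 (epoch 0.8889), whose NanoMSMARCO, NanoNFCorpus and NanoNQ scores match the final evaluation row (epoch -1, step -1). In that final row, NanoBEIR_mean averages all 13 datasets; in the earlier rows it averages only NanoMSMARCO, NanoNFCorpus and NanoNQ.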

Framework Versions

Python: 3.13.3
Sentence Transformers: 4.2.0.dev0
Transformers: 4.53.0
PyTorch: 2.7.1+cu126
Accelerate: 1.8.1
Datasets: 3.6.0
Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

SpladeLoss

@misc{formal2022distillationhardnegativesampling,
      title={From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective},
      author={Thibault Formal and Carlos Lassance and Benjamin Piwowarski and Stéphane Clinchant},
      year={2022},
      eprint={2205.04733},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2205.04733},
}

SparseMarginMSELoss

@misc{hofstätter2021improving,
    title={Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation},
    author={Sebastian Hofstätter and Sophia Althammer and Michael Schröder and Mete Sertkan and Allan Hanbury},
    year={2021},
    eprint={2010.02666},
    archivePrefix={arXiv},
    primaryClass={cs.IR}
}

FlopsLoss

@article{paria2020minimizing,
    title={Minimizing flops to learn efficient sparse representations},
    author={Paria, Biswajit and Yeh, Chih-Kuan and Yen, Ian EH and Xu, Ning and Ravikumar, Pradeep and P{\'o}czos, Barnab{\'a}s},
    journal={arXiv preprint arXiv:2004.05665},
    year={2020}
}
