BERT-tiny trained on GooAQ

This is a Cross Encoder model finetuned from prajjwal1/bert-tiny using the sentence-transformers library. It computes a relevance score for each pair of texts (here, a question and a candidate answer), which can be used for text reranking and semantic search.

This model was trained using train_script.py.

Model Details

Model Description

  • Model Type: Cross Encoder
  • Base model: prajjwal1/bert-tiny
  • Maximum Sequence Length: 512 tokens
  • Number of Output Labels: 1 label
  • Language: en
  • License: apache-2.0

Model Sources

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("cross-encoder-testing/reranker-bert-tiny-gooaq-bce")
# Get scores for pairs of texts
pairs = [
    ['are javascript developers in demand?', "JavaScript is the skill that is most in-demand for IT in 2020, according to a report from developer skills tester DevSkiller. The report, “Top IT Skills report 2020: Demand and Hiring Trends,” has JavaScript switching places with Java when compared to last year's report, with Java in third place this year, behind SQL."],
    ['are javascript developers in demand?', 'In one line difference between the two is: JavaScript is the programming language where as AngularJS is a framework based on JavaScript. ... It is also the basic for all java script based technologies like jquery, angular JS, bootstrap JS and so on. Angular JS is a framework written in javascript and uses MVC architecture.'],
    ['are javascript developers in demand?', 'Java applications are run in a virtual machine or web browser while JavaScript is run on a web browser. Java code is compiled whereas while JavaScript code is in text and in a web page. JavaScript is an OOP scripting language, whereas Java is an OOP programming language.'],
    ['are javascript developers in demand?', 'Things in the body tag are the things that should be displayed: the actual content. Javascript in the body is executed as it is read and as the page is rendered. Javascript in the head is interpreted before anything is rendered.'],
    ['are javascript developers in demand?', 'Web apps tend to be built using JavaScript, CSS and HTML5. Unlike mobile apps, there is no standard software development kit for building web apps. However, developers do have access to templates. Compared to mobile apps, web apps are usually quicker and easier to build — but they are much simpler in terms of features.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank the candidate texts by their relevance to a single query
ranks = model.rank(
    'are javascript developers in demand?',
    [
        "JavaScript is the skill that is most in-demand for IT in 2020, according to a report from developer skills tester DevSkiller. The report, “Top IT Skills report 2020: Demand and Hiring Trends,” has JavaScript switching places with Java when compared to last year's report, with Java in third place this year, behind SQL.",
        'In one line difference between the two is: JavaScript is the programming language where as AngularJS is a framework based on JavaScript. ... It is also the basic for all java script based technologies like jquery, angular JS, bootstrap JS and so on. Angular JS is a framework written in javascript and uses MVC architecture.',
        'Java applications are run in a virtual machine or web browser while JavaScript is run on a web browser. Java code is compiled whereas while JavaScript code is in text and in a web page. JavaScript is an OOP scripting language, whereas Java is an OOP programming language.',
        'Things in the body tag are the things that should be displayed: the actual content. Javascript in the body is executed as it is read and as the page is rendered. Javascript in the head is interpreted before anything is rendered.',
        'Web apps tend to be built using JavaScript, CSS and HTML5. Unlike mobile apps, there is no standard software development kit for building web apps. However, developers do have access to templates. Compared to mobile apps, web apps are usually quicker and easier to build — but they are much simpler in terms of features.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
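As a rough mental model, `rank` scores every (query, document) pair and sorts the documents by score, highest first, returning `corpus_id` as the index into the input list. The sketch below reproduces only that ordering in plain Python. The scores are made-up placeholders, not real outputs of this model, and `rank_by_scores` is a hypothetical helper; the real `CrossEncoder.rank` also accepts options such as `top_k` and `return_documents`, and its internals may differ.

```python
def rank_by_scores(scores: list[float]) -> list[dict]:
    """Hypothetical helper: sort document indices by descending score,
    mimicking the shape of CrossEncoder.rank output."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [{"corpus_id": i, "score": scores[i]} for i in order]


# Placeholder scores for the five documents above (NOT model output).
placeholder_scores = [2.1, -0.4, 0.3, -1.7, 0.9]
sketch_ranks = rank_by_scores(placeholder_scores)
print([r["corpus_id"] for r in sketch_ranks])  # [0, 4, 2, 1, 3]
```

Higher scores mean "more relevant", so only the relative order matters for reranking; the absolute values are not calibrated probabilities unless an activation such as a sigmoid is applied.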

Evaluation

Metrics

Cross Encoder Reranking

Values in parentheses are the change relative to the ranking of the candidates before reranking.

Metric    gooaq-dev         NanoMSMARCO       NanoNFCorpus      NanoNQ
map       0.5677 (+0.0366)  0.4280 (-0.0616)  0.3397 (+0.0787)  0.4149 (-0.0047)
mrr@10    0.5558 (+0.0318)  0.4129 (-0.0646)  0.5196 (+0.0198)  0.4132 (-0.0135)
ndcg@10   0.6157 (+0.0245)  0.4772 (-0.0632)  0.3308 (+0.0058)  0.4859 (-0.0147)
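For readers unfamiliar with the three metrics, here is a minimal sketch of their textbook binary-relevance definitions for a single query; the reported numbers are averages over all queries. This is an illustration, not the evaluator's code: AP here divides by the relevant documents present in the ranked list, and library implementations may handle cut-offs and ties differently.

```python
import math


def average_precision(relevant: list[bool]) -> float:
    """AP over the ranked list: mean of precision@k at each relevant hit."""
    hits, precisions = 0, []
    for k, is_rel in enumerate(relevant, start=1):
        if is_rel:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mrr_at_k(relevant: list[bool], k: int = 10) -> float:
    """Reciprocal rank of the first relevant document within the top k, else 0."""
    for rank, is_rel in enumerate(relevant[:k], start=1):
        if is_rel:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(relevant: list[bool], k: int = 10) -> float:
    """nDCG@k with binary gains and a log2(rank + 1) discount."""
    dcg = sum(1.0 / math.log2(r + 1) for r, rel in enumerate(relevant[:k], start=1) if rel)
    ideal_hits = min(sum(relevant), k)
    idcg = sum(1.0 / math.log2(r + 1) for r in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# One query whose only relevant answer was reranked to position 2.
ranking = [False, True, False, False, False]
print(average_precision(ranking), mrr_at_k(ranking), ndcg_at_k(ranking))
```

With a single relevant document, AP and MRR coincide (both 0.5 here), while nDCG@10 penalizes position 2 less harshly (1 / log2(3), about 0.63).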

Cross Encoder Nano BEIR

Mean of the three NanoBEIR datasets in the reranking table (NanoMSMARCO, NanoNFCorpus, NanoNQ).

Metric    Value
map       0.3942 (+0.0041)
mrr@10    0.4486 (-0.0194)
ndcg@10   0.4313 (-0.0241)
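The Nano BEIR figures can be checked against the per-dataset scores in the reranking table. This is a quick sanity check on the published numbers, not part of the evaluation code:

```python
from statistics import mean

# Per-dataset scores copied from the reranking table (NanoMSMARCO, NanoNFCorpus, NanoNQ).
per_dataset = {
    "map": [0.4280, 0.3397, 0.4149],
    "mrr@10": [0.4129, 0.5196, 0.4132],
    "ndcg@10": [0.4772, 0.3308, 0.4859],
}

for metric, values in per_dataset.items():
    print(metric, round(mean(values), 4))
# prints one line per metric: map 0.3942, mrr@10 0.4486, ndcg@10 0.4313
```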

Training Details

Training Dataset

Unnamed Dataset

  • Size: 578,402 training samples
  • Columns: question, answer, and label
  • Approximate statistics based on the first 1000 samples:
    • question: string; min 21, mean 43.81, max 96 characters
    • answer: string; min 51, mean 252.46, max 405 characters
    • label: int; 0: ~82.90%, 1: ~17.10%
  • Samples:
    • question: are javascript developers in demand?
      answer: JavaScript is the skill that is most in-demand for IT in 2020, according to a report from developer skills tester DevSkiller. The report, “Top IT Skills report 2020: Demand and Hiring Trends,” has JavaScript switching places with Java when compared to last year's report, with Java in third place this year, behind SQL.
      label: 1
    • question: are javascript developers in demand?
      answer: In one line difference between the two is: JavaScript is the programming language where as AngularJS is a framework based on JavaScript. ... It is also the basic for all java script based technologies like jquery, angular JS, bootstrap JS and so on. Angular JS is a framework written in javascript and uses MVC architecture.
      label: 0
    • question: are javascript developers in demand?
      answer: Java applications are run in a virtual machine or web browser while JavaScript is run on a web browser. Java code is compiled whereas while JavaScript code is in text and in a web page. JavaScript is an OOP scripting language, whereas Java is an OOP programming language.
      label: 0
  • Loss: BinaryCrossEntropyLoss with these parameters:
    {
        "activation_fct": "torch.nn.modules.linear.Identity",
        "pos_weight": 5
    }
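With `activation_fct` set to `Identity`, the loss is computed on the raw logit, and `pos_weight` multiplies the loss term of positive pairs. A value of 5 roughly matches the label imbalance in the statistics above (82.90 / 17.10, about 4.85 negatives per positive). The sketch below illustrates this weighted objective with PyTorch's `torch.nn.BCEWithLogitsLoss`; treating it as equivalent to the library's `BinaryCrossEntropyLoss` is an assumption, and the logits are made-up placeholders.

```python
import torch

# Label distribution from the dataset statistics: ~82.90% negatives, ~17.10% positives.
neg_share, pos_share = 0.8290, 0.1710
print(f"negatives per positive: {neg_share / pos_share:.2f}")  # 4.85

# Hypothetical logits for four (question, answer) pairs and their gold labels.
logits = torch.tensor([2.0, -1.0, 0.5, -3.0])
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])

pos_weight = torch.tensor(5.0)
loss_fn = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
loss = loss_fn(logits, labels)

# The same quantity written out: positive terms weighted by 5, negatives by 1.
log_p = torch.nn.functional.logsigmoid(logits)       # log(sigmoid(x))
log_not_p = torch.nn.functional.logsigmoid(-logits)  # log(1 - sigmoid(x))
manual = -(pos_weight * labels * log_p + (1 - labels) * log_not_p).mean()
print(loss.item(), manual.item())
```

Up-weighting positives this way keeps the rare relevant pairs from being drowned out by the roughly five times more frequent negatives.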
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 2048
  • per_device_eval_batch_size: 2048
  • learning_rate: 0.0005
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • seed: 12
  • bf16: True
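With one GPU, no gradient accumulation and `dataloader_drop_last: False`, these settings imply ceil(578,402 / 2048) = 283 optimizer steps, which agrees with the epoch fractions in the training logs (step 280 at epoch 0.9894). The sketch below computes that step count and the shape of the linear schedule with 10% warmup in plain Python. It mirrors the usual linear warmup-then-decay rule; rounding the warmup step count up is an assumption about how `warmup_ratio` is converted, and `linear_lr` is a hypothetical helper.

```python
import math

num_samples = 578_402   # training samples
batch_size = 2048       # per_device_train_batch_size (1 GPU, no accumulation)
num_epochs = 1
peak_lr = 5e-4          # learning_rate
warmup_ratio = 0.1

steps_per_epoch = math.ceil(num_samples / batch_size)  # last partial batch is kept
total_steps = steps_per_epoch * num_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)    # assumption: rounded up


def linear_lr(step: int) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


print(total_steps, warmup_steps)        # 283 29
print(round(280 / steps_per_epoch, 4))  # 0.9894, the logged epoch at step 280
print(linear_lr(0), linear_lr(warmup_steps), linear_lr(total_steps))
```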

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 2048
  • per_device_eval_batch_size: 2048
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 0.0005
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 12
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Rows with epoch and step -1 are evaluations run outside the training loop: the first before training, the last on the final model. Values in parentheses are the change relative to the ranking of the candidates before reranking.

Epoch   Step  Training Loss  gooaq-dev_ndcg@10  NanoMSMARCO_ndcg@10  NanoNFCorpus_ndcg@10  NanoNQ_ndcg@10    NanoBEIR_R100_mean_ndcg@10
-1      -1    -              0.0887 (-0.5025)   0.0063 (-0.5341)     0.3262 (+0.0012)      0.0000 (-0.5006)  0.1108 (-0.3445)
0.0035  1     1.1945         -                  -                    -                     -                 -
0.0707  20    1.1664         0.4082 (-0.1830)   0.1805 (-0.3600)     0.3168 (-0.0083)      0.2243 (-0.2763)  0.2405 (-0.2149)
0.1413  40    1.1107         0.5260 (-0.0652)   0.3453 (-0.1951)     0.3335 (+0.0085)      0.3430 (-0.1576)  0.3406 (-0.1147)
0.2120  60    1.022          0.5623 (-0.0289)   0.3929 (-0.1475)     0.3512 (+0.0262)      0.3472 (-0.1535)  0.3638 (-0.0916)
0.2827  80    0.973          0.5691 (-0.0221)   0.4048 (-0.1356)     0.3530 (+0.0280)      0.3833 (-0.1174)  0.3804 (-0.0750)
0.3534  100   0.963          0.5814 (-0.0098)   0.4385 (-0.1019)     0.3471 (+0.0221)      0.4227 (-0.0779)  0.4028 (-0.0526)
0.4240  120   0.9419         0.5963 (+0.0050)   0.4106 (-0.1298)     0.3540 (+0.0289)      0.4843 (-0.0163)  0.4163 (-0.0391)
0.4947  140   0.9331         0.5953 (+0.0041)   0.4310 (-0.1094)     0.3367 (+0.0117)      0.4163 (-0.0843)  0.3947 (-0.0607)
0.5654  160   0.9263         0.6070 (+0.0158)   0.4626 (-0.0778)     0.3443 (+0.0193)      0.4823 (-0.0184)  0.4297 (-0.0256)
0.6360  180   0.9212         0.6069 (+0.0156)   0.4602 (-0.0802)     0.3391 (+0.0141)      0.4782 (-0.0224)  0.4258 (-0.0295)
0.7067  200   0.901          0.6126 (+0.0214)   0.4602 (-0.0803)     0.3413 (+0.0162)      0.4780 (-0.0227)  0.4265 (-0.0289)
0.7774  220   0.8997         0.6136 (+0.0224)   0.4801 (-0.0604)     0.3349 (+0.0098)      0.4903 (-0.0103)  0.4351 (-0.0203)
0.8481  240   0.9021         0.6132 (+0.0220)   0.4850 (-0.0554)     0.3438 (+0.0188)      0.4855 (-0.0151)  0.4381 (-0.0173)
0.9187  260   0.9013         0.6188 (+0.0276)   0.4820 (-0.0584)     0.3387 (+0.0137)      0.4851 (-0.0156)  0.4353 (-0.0201)
0.9894  280   0.8996         0.6157 (+0.0245)   0.4772 (-0.0632)     0.3305 (+0.0054)      0.4859 (-0.0147)  0.4312 (-0.0242)
-1      -1    -              0.6157 (+0.0245)   0.4772 (-0.0632)     0.3308 (+0.0058)      0.4859 (-0.0147)  0.4313 (-0.0241)
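Because `load_best_model_at_end` is `False`, the published weights come from the end of training rather than from the best-scoring evaluation step; the final row matches the Evaluation tables. A small sketch that scans the logged rows (values copied from the training log) shows where each headline metric peaked:

```python
# (step, gooaq-dev ndcg@10, NanoBEIR R100 mean ndcg@10) copied from the training log.
log_rows = [
    (20, 0.4082, 0.2405), (40, 0.5260, 0.3406), (60, 0.5623, 0.3638),
    (80, 0.5691, 0.3804), (100, 0.5814, 0.4028), (120, 0.5963, 0.4163),
    (140, 0.5953, 0.3947), (160, 0.6070, 0.4297), (180, 0.6069, 0.4258),
    (200, 0.6126, 0.4265), (220, 0.6136, 0.4351), (240, 0.6132, 0.4381),
    (260, 0.6188, 0.4353), (280, 0.6157, 0.4312),
]

best_gooaq = max(log_rows, key=lambda row: row[1])
best_nanobeir = max(log_rows, key=lambda row: row[2])
print("best gooaq-dev ndcg@10:", best_gooaq[1], "at step", best_gooaq[0])       # 0.6188 at step 260
print("best NanoBEIR ndcg@10:", best_nanobeir[2], "at step", best_nanobeir[0])  # 0.4381 at step 240
```

Both peaks are within 0.007 of the final scores, so the choice of checkpoint makes little difference for this run.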

Environmental Impact

Carbon emissions were measured using CodeCarbon.

  • Energy Consumed: 0.019 kWh
  • Carbon Emitted: 0.007 kg of CO2
  • Hours Used: 0.099 hours

Training Hardware

  • On Cloud: No
  • GPU Model: 1 x NVIDIA GeForce RTX 3090
  • CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
  • RAM Size: 31.78 GB

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 3.5.0.dev0
  • Transformers: 4.48.3
  • PyTorch: 2.5.0+cu121
  • Accelerate: 1.3.0
  • Datasets: 2.20.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
Safetensors

  • Model size: 4.39M params
  • Tensor type: F32