SetFit with BAAI/bge-small-en-v1.5

This is a SetFit model for Text Classification. It uses BAAI/bge-small-en-v1.5 as the Sentence Transformer embedding model and a MultiOutputClassifier instance as the classification head, so it can predict multiple labels per input.

The model has been trained using an efficient few-shot learning technique that involves:

  1. Fine-tuning a Sentence Transformer with contrastive learning.
  2. Training a classification head with features from the fine-tuned Sentence Transformer (a simplified sketch of this step follows below).
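
Under the hood, the second step amounts to fitting a scikit-learn head on embeddings produced by the fine-tuned body. A simplified, illustrative sketch of that step only (step 1, the contrastive fine-tuning, is handled internally by the SetFit trainer; the texts and labels below are placeholders, not the actual training data):

from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

# For illustration we load the base model; in real SetFit training the body
# has already been fine-tuned with contrastive pairs at this point.
body = SentenceTransformer("BAAI/bge-small-en-v1.5")

train_texts = ["illustrative abstract one", "illustrative abstract two"]
train_labels = [[1, 0, 1], [0, 1, 0]]  # one binary column per label

embeddings = body.encode(train_texts)  # shape (n_samples, 384) for bge-small
head = MultiOutputClassifier(LogisticRegression())
head.fit(embeddings, train_labels)     # fits one binary classifier per label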

Model Details

Model Description

  • Model Type: SetFit
  • Sentence Transformer body: BAAI/bge-small-en-v1.5
  • Classification head: a MultiOutputClassifier instance
  • Maximum Sequence Length: 512 tokens
  • Model Size: 33.4M parameters (F32 safetensors)
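
Both components are exposed as attributes of the loaded model; a quick way to inspect them (attribute names as in recent SetFit versions):

from setfit import SetFitModel

model = SetFitModel.from_pretrained("TheoLvs/wsl-prescreening-multi-v0.0")
print(model.model_body)                 # SentenceTransformer wrapping BAAI/bge-small-en-v1.5
print(model.model_body.max_seq_length)  # 512
print(model.model_head)                 # sklearn MultiOutputClassifier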

Evaluation

Metrics

Label   Accuracy
all     0.8485

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("TheoLvs/wsl-prescreening-multi-v0.0")
# Run inference
preds = model("Wetland carbon sequestration capacity shows non-linear response to restoration technique and hydrological regime This study examines carbon sequestration outcomes from 124 wetland restoration projects across North America, Europe, and Asia over a 15-year monitoring period. Using standardized carbon flux measurements and sediment coring, we quantified how restoration approach and hydrological management influence carbon accumulation rates. Results demonstrate that restoration technique explained 53% of variance in carbon sequestration outcomes, with significant interaction effects between technique and hydroperiod. Projects restoring natural hydrological fluctuations achieved 2.7 times higher carbon accumulation rates than those maintaining static water levels. Vegetation community composition emerged as a significant mediating variable, with diverse native assemblages sequestering 34% more carbon than simplified or non-native communities. Our findings indicate that wetland restoration prioritizing hydrological dynamism and diverse vegetation delivers superior climate mitigation benefits while simultaneously enhancing habitat value and water quality functions.")
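
Because the head is a MultiOutputClassifier, each prediction is a vector of binary indicators, one per label, rather than a single class. A sketch of batch inference and per-label probabilities (the texts are placeholders; predict_proba availability depends on the underlying estimators):

# Batch inference: one row of label indicators per input text
texts = [
    "Illustrative abstract number one.",
    "Illustrative abstract number two.",
]
preds = model.predict(texts)        # shape (n_texts, n_labels)
probs = model.predict_proba(texts)  # per-label probabilities, useful for custom thresholds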

Training Details

Training Set Metrics

Training set    Min    Median      Max
Word count      90     191.8561    348

Training Hyperparameters

  • batch_size: (8, 8)
  • num_epochs: (5, 5)
  • max_steps: 5000
  • sampling_strategy: oversampling
  • body_learning_rate: (2e-05, 1e-05)
  • head_learning_rate: 0.01
  • loss: CosineSimilarityLoss
  • distance_metric: cosine_distance
  • margin: 0.25
  • end_to_end: False
  • use_amp: False
  • warmup_proportion: 0.1
  • l2_weight: 0.01
  • seed: 42
  • eval_max_steps: -1
  • load_best_model_at_end: False
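
These values map directly onto the SetFit TrainingArguments. A hedged reproduction sketch (the dataset below is illustrative; the card does not state the multi_target_strategy used, but the "multi-output" strategy is what produces a MultiOutputClassifier head):

from datasets import Dataset
from setfit import SetFitModel, Trainer, TrainingArguments

# Illustrative multi-label dataset: "text" plus one binary vector per example
train_dataset = Dataset.from_dict({
    "text": ["illustrative abstract one", "illustrative abstract two"],
    "label": [[1, 0, 1], [0, 1, 0]],
})

model = SetFitModel.from_pretrained(
    "BAAI/bge-small-en-v1.5",
    multi_target_strategy="multi-output",  # wraps the head in MultiOutputClassifier
)

args = TrainingArguments(
    batch_size=(8, 8),                # (embedding phase, classifier phase)
    num_epochs=(5, 5),
    max_steps=5000,
    sampling_strategy="oversampling",
    body_learning_rate=(2e-5, 1e-5),
    head_learning_rate=0.01,
    warmup_proportion=0.1,
    l2_weight=0.01,
    seed=42,
    use_amp=False,
    end_to_end=False,
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()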

Training Results

Epoch Step Training Loss Validation Loss
0.0006 1 0.158 -
0.0288 50 0.2511 -
0.0575 100 0.215 -
0.0863 150 0.1883 -
0.1151 200 0.165 -
0.1438 250 0.1274 -
0.1726 300 0.0801 -
0.2014 350 0.0635 -
0.2301 400 0.0427 -
0.2589 450 0.0355 -
0.2877 500 0.0337 -
0.3165 550 0.0271 -
0.3452 600 0.0069 -
0.3740 650 0.0032 -
0.4028 700 0.0033 -
0.4315 750 0.0027 -
0.4603 800 0.0022 -
0.4891 850 0.002 -
0.5178 900 0.0019 -
0.5466 950 0.0017 -
0.5754 1000 0.0017 -
0.6041 1050 0.0015 -
0.6329 1100 0.0015 -
0.6617 1150 0.0013 -
0.6904 1200 0.0013 -
0.7192 1250 0.0014 -
0.7480 1300 0.0012 -
0.7768 1350 0.0012 -
0.8055 1400 0.0011 -
0.8343 1450 0.0012 -
0.8631 1500 0.0011 -
0.8918 1550 0.0011 -
0.9206 1600 0.0011 -
0.9494 1650 0.001 -
0.9781 1700 0.001 -
1.0069 1750 0.001 -
1.0357 1800 0.001 -
1.0644 1850 0.0009 -
1.0932 1900 0.0009 -
1.1220 1950 0.0009 -
1.1507 2000 0.0009 -
1.1795 2050 0.0009 -
1.2083 2100 0.0009 -
1.2371 2150 0.0008 -
1.2658 2200 0.0009 -
1.2946 2250 0.0008 -
1.3234 2300 0.0008 -
1.3521 2350 0.0008 -
1.3809 2400 0.0008 -
1.4097 2450 0.0008 -
1.4384 2500 0.0008 -
1.4672 2550 0.0007 -
1.4960 2600 0.0007 -
1.5247 2650 0.0007 -
1.5535 2700 0.0007 -
1.5823 2750 0.0007 -
1.6110 2800 0.0007 -
1.6398 2850 0.0007 -
1.6686 2900 0.0007 -
1.6974 2950 0.0007 -
1.7261 3000 0.0006 -
1.7549 3050 0.0007 -
1.7837 3100 0.0007 -
1.8124 3150 0.0007 -
1.8412 3200 0.0007 -
1.8700 3250 0.0007 -
1.8987 3300 0.0006 -
1.9275 3350 0.0006 -
1.9563 3400 0.0006 -
1.9850 3450 0.0006 -
2.0138 3500 0.0006 -
2.0426 3550 0.0006 -
2.0713 3600 0.0006 -
2.1001 3650 0.0006 -
2.1289 3700 0.0006 -
2.1577 3750 0.0006 -
2.1864 3800 0.0006 -
2.2152 3850 0.0006 -
2.2440 3900 0.0006 -
2.2727 3950 0.0006 -
2.3015 4000 0.0006 -
2.3303 4050 0.0006 -
2.3590 4100 0.0006 -
2.3878 4150 0.0006 -
2.4166 4200 0.0005 -
2.4453 4250 0.0006 -
2.4741 4300 0.0005 -
2.5029 4350 0.0006 -
2.5316 4400 0.0006 -
2.5604 4450 0.0005 -
2.5892 4500 0.0005 -
2.6180 4550 0.0005 -
2.6467 4600 0.0005 -
2.6755 4650 0.0005 -
2.7043 4700 0.0005 -
2.7330 4750 0.0005 -
2.7618 4800 0.0005 -
2.7906 4850 0.0005 -
2.8193 4900 0.0005 -
2.8481 4950 0.0005 -
2.8769 5000 0.0005 -

Framework Versions

  • Python: 3.11.12
  • SetFit: 1.1.2
  • Sentence Transformers: 4.1.0
  • Transformers: 4.45.2
  • PyTorch: 2.6.0+cu124
  • Datasets: 3.6.0
  • Tokenizers: 0.20.3

Citation

BibTeX

@article{tunstall2022efficient,
    title = {Efficient Few-Shot Learning Without Prompts},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    year = {2022},
    publisher = {arXiv},
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    copyright = {Creative Commons Attribution 4.0 International}
}