Menon nb-bert relevance scorer

Binary classifier built on top of NbAiLab/nb-bert-base. Used by Menon Economics to score procurement notices as RELEVANT or NOT_RELEVANT for Menon's lead pipeline.

How it works

The model takes the Norwegian-language project description (kort_beskrivelse) and returns a relevance score in [0, 1]. A tuned threshold (saved in threshold.json) converts the score into a binary label.
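The score-to-label step described above can be sketched in a few lines. This is an illustration, not the code in score.py: the helper names, the JSON key "threshold", and the choice to treat a score exactly equal to the threshold as RELEVANT are all assumptions.

```python
import json
from pathlib import Path


def load_threshold(path="threshold.json", key="threshold"):
    """Read the tuned threshold from disk (the JSON key name is an assumption)."""
    return float(json.loads(Path(path).read_text(encoding="utf-8"))[key])


def to_label(score, threshold):
    """Map a relevance score in [0, 1] to a binary label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    # Assumption: a score equal to the threshold counts as RELEVANT.
    return "RELEVANT" if score >= threshold else "NOT_RELEVANT"
```

With the published threshold of 0.2594, a score of 0.83 maps to RELEVANT and a score of 0.10 maps to NOT_RELEVANT.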

The training pipeline used:

  • description-only input (no tittel, no oppdragsgiver, no portal/country features) to avoid client- or country-identity shortcuts;
  • per-row weighting that downweights near-duplicate templated negatives and upweights international positives;
  • a stratified train/validation/test split with a held-out test set the model never saw during threshold tuning.
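To make the per-row weighting concrete, here is a minimal sketch of a weighted binary cross-entropy written in plain Python. It only illustrates how row weights change each example's contribution to the loss. The function name and the normalisation by the sum of weights are assumptions; the actual training code may differ.

```python
import math


def weighted_bce(probs, labels, weights, eps=1e-7):
    """Weighted mean binary cross-entropy over a batch.

    probs   -- predicted probability of the positive class, per row
    labels  -- 0/1 ground truth, per row
    weights -- non-negative per-row weights (hypothetical values)
    """
    total = 0.0
    for p, y, w in zip(probs, labels, weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / sum(weights)  # assumption: normalise by total weight
```

A badly misclassified negative with weight 0.5 pulls the loss up less than the same row with weight 1.0, which is the intended effect of downweighting templated near-duplicates.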

Empty, too-short, placeholder, or non-Norwegian inputs are routed to needs_review rather than being scored, so the model only commits to a label on inputs it can reasonably judge.
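The gate described above could look roughly like the sketch below. It is not the code in inference_rules.py: the function name review_reason, the minimum length of 30 characters, and the accepted language codes are assumptions, and the placeholder check is left out. The language detector is passed in as a callable so the sketch has no dependencies; in practice langdetect.detect could fill that role.

```python
def review_reason(text, detect_lang, min_len=30, accepted=("no", "da", "sv")):
    """Return a reason string if the text should go to review, else None.

    detect_lang -- callable mapping text to a language code, e.g. langdetect.detect
    min_len     -- hypothetical minimum length; the real cutoff is not documented here
    accepted    -- language codes allowed through (Norwegian, Danish, Swedish)
    """
    cleaned = (text or "").strip()
    if not cleaned:
        return "empty"
    if len(cleaned) < min_len:
        return f"too_short(len={len(cleaned)})"
    lang = detect_lang(cleaned)
    if lang not in accepted:
        return f"non_norwegian({lang})"
    return None  # safe to score
```

Only when this returns None would the text be passed to the classifier; any other value becomes the reason field of a needs_review result.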

Held-out test results (n = 1,214)

subset                        precision  recall  F1
overall                       0.76       0.89    0.82
international subset (n=8)    0.86       1.00    0.92

Decision threshold: 0.2594, tuned on the validation set to reach recall ≥ 0.90 (saved in threshold.json).
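One common way to tune a threshold for a recall target is to take the highest threshold whose validation recall still meets the target, since recall can only fall as the threshold rises. The sketch below shows that idea; it is an assumption about the general approach, not a record of the exact procedure used for this model, and the function names are hypothetical.

```python
def recall_at(scores, labels, threshold):
    """Recall when every score >= threshold is predicted positive."""
    positives = sum(labels)
    if positives == 0:
        return 0.0
    hits = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    return hits / positives


def pick_threshold(scores, labels, target_recall=0.90):
    """Highest observed score that, used as threshold, meets the recall target."""
    # Walk candidates from high to low; the first one that meets the target
    # is the highest such threshold because recall is monotone in the threshold.
    for candidate in sorted(set(scores), reverse=True):
        if recall_at(scores, labels, candidate) >= target_recall:
            return candidate
    return 0.0  # no candidate met the target (e.g. no positives in the data)
```

Choosing the highest qualifying threshold keeps precision as high as the recall constraint allows. The threshold is then frozen before the test set is evaluated.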

Usage

from score import score_lead

# Norwegian input — gets a real score
score_lead("Anskaffelse av samfunnsøkonomisk analyse for evaluering...")
# → {"label": "RELEVANT", "score": 0.83, "threshold": 0.2594, "reason": "ok"}

# Empty / placeholder / non-Norwegian input — routed to review, not scored
score_lead("")
# → {"label": "needs_review", "score": None, "reason": "empty"}

score_lead("Se konkurransegrunnlag")
# → {"label": "needs_review", "score": None, "reason": "too_short(len=22)"}

score_lead("TRANSQ is a joint qualification system for transport suppliers.")
# → {"label": "needs_review", "score": None, "reason": "non_norwegian(en)"}

Important: input must be in Norwegian

The model assumes incoming descriptions are already in Norwegian Bokmål. The lead-scraper translates non-Norwegian leads upstream, so by the time a lead reaches this model in production it is in Norwegian.

If a description in another language slips through, it is intentionally flagged needs_review so a human can fetch a correct translation, rather than the model returning a low-confidence guess. For one-off ad-hoc scoring of raw foreign text, translate it first with any translation tool (for example DeepL, an OpenAI GPT model, or Google Translate) and then call score_lead.

Requires:

  • transformers, torch, langdetect
  • No API keys needed.

Files in this repo

file                               purpose
model.safetensors, config.json     Model weights + config
tokenizer.json, vocab.txt, etc.    Tokenizer
threshold.json                     Tuned decision threshold
inference_rules.py                 needs_review() gate (empty / short / placeholder / non-Norwegian)
score.py                           End-to-end scoring function (use this)

Training data

Roughly 13,000 labeled procurement leads from doffin / mercell / TED / Nordisk ministerråd / hilma / FHF, with per-row weights encoding class balance, cluster-based deduplication of near-duplicate negatives, and an upweight on international positives. After filtering inputs that the needs_review gate would catch, about 12,100 rows were used for training.
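As a rough illustration of how such per-row weights might be built, the sketch below divides each negative's weight by the size of its near-duplicate cluster and multiplies international positives by a boost factor. The field names, the 1/cluster-size rule, and the boost value of 2.0 are all hypothetical, and the class-balance term is omitted for brevity.

```python
from collections import Counter


def row_weights(rows, intl_boost=2.0):
    """Compute one weight per row.

    rows -- dicts with 'label' (0/1), 'cluster' (hashable id), 'international' (bool)
    """
    neg_cluster_sizes = Counter(r["cluster"] for r in rows if r["label"] == 0)
    weights = []
    for r in rows:
        w = 1.0
        if r["label"] == 0:
            # A cluster of near-duplicate negatives shares a total weight of 1.
            w /= neg_cluster_sizes[r["cluster"]]
        elif r["international"]:
            # Hypothetical upweight for rare international positives.
            w *= intl_boost
        weights.append(w)
    return weights
```

Under this scheme a template that appears 50 times among the negatives counts as much in total as a single unique negative.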

The dataset was split 80 / 10 / 10 (train / validation / test), stratified by (Is_relevant, international) so the rare international examples are represented in every split.
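A stratified 80 / 10 / 10 split of this kind can be sketched with the standard library alone. The function below is illustrative rather than the script that produced the published splits. The rule that any stratum with at least three rows contributes at least one row to validation and test is an assumption, added so that rare strata appear in every split.

```python
import random
from collections import defaultdict


def stratified_split(rows, key, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split rows into (train, val, test), stratum by stratum.

    key -- callable returning the stratum of a row,
           e.g. lambda r: (r["Is_relevant"], r["international"])
    """
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for stratum in sorted(groups, key=repr):  # fixed order for reproducibility
        members = groups[stratum]
        rng.shuffle(members)
        n = len(members)
        # Assumption: strata with >= 3 rows get at least one val and one test row.
        n_val = max(1, round(n * fractions[1])) if n >= 3 else 0
        n_test = max(1, round(n * fractions[2])) if n >= 3 else 0
        n_train = n - n_val - n_test
        train += members[:n_train]
        val += members[n_train:n_train + n_val]
        test += members[n_train + n_val:]
    return train, val, test
```

With scikit-learn available, two successive calls to train_test_split with the stratify argument give a comparable result.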

Caveats

  • The international evaluation subset is small (n = 8 in the held-out test set), so the 100% recall on it is encouraging but high-variance.
  • The needs_review gate is lenient toward Danish and Swedish: both are largely mutually intelligible with Norwegian Bokmål and the underlying model handles them well, so such inputs are scored rather than sent to review.
  • Production assumption: leads arrive translated. Historically, about 5–7 non-Norwegian leads per month have slipped past the scraper; with this model they are routed to human review.