Menon nb-bert relevance scorer
Binary classifier built on top of NbAiLab/nb-bert-base.
Used by Menon Economics to score procurement notices as RELEVANT or NOT_RELEVANT for Menon's lead pipeline.
How it works
The model takes the Norwegian-language project description (kort_beskrivelse) and returns a relevance score in [0, 1]. A tuned threshold (saved in threshold.json) converts the score into a binary label.
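As a rough illustration of that last step, the sketch below turns a score into a label. The function names, the `"threshold"` JSON key, and the choice of `>=` (rather than `>`) are assumptions made for the example; the actual `score.py` and `threshold.json` may differ.

```python
import json


def load_threshold(raw_json: str, key: str = "threshold") -> float:
    """Parse the contents of a threshold file. The key name is an assumption."""
    return float(json.loads(raw_json)[key])


def label_from_score(score: float, threshold: float) -> str:
    """Map a relevance score in [0, 1] to a binary label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must lie in [0, 1], got {score}")
    # Assumption: a score exactly at the threshold counts as RELEVANT.
    return "RELEVANT" if score >= threshold else "NOT_RELEVANT"


threshold = load_threshold('{"threshold": 0.2594}')
print(label_from_score(0.83, threshold))
print(label_from_score(0.10, threshold))
```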
The training pipeline used:
- description-only input (no tittel, no oppdragsgiver, no portal/country features) to avoid client- or country-identity shortcuts;
- per-row weighting that downweights near-duplicate templated negatives and upweights international positives;
- a stratified train/validation/test split with a held-out test set the model never saw during threshold tuning.
Empty, too-short, placeholder, and non-Norwegian inputs are routed to needs_review rather than being scored, so the model only commits to a label on inputs it can reasonably judge.
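A gate of this kind could look roughly like the sketch below. It is not the code in `inference_rules.py`: the minimum length, the placeholder list, the accepted language codes, and the function name are all illustrative assumptions. The language detector is passed in as a callable so the example runs without extra dependencies; in practice `langdetect.detect` could fill that role.

```python
from typing import Callable, Optional

# Every constant here is a hypothetical value chosen for the example.
MIN_CHARS = 30
PLACEHOLDERS = {"se konkurransegrunnlaget for en nærmere beskrivelse"}
ACCEPTED_LANGS = {"no", "da", "sv"}


def needs_review_reason(
    text: Optional[str], detect_language: Callable[[str], str]
) -> Optional[str]:
    """Return a reason string if the input should skip scoring, else None."""
    cleaned = (text or "").strip()
    if not cleaned:
        return "empty"
    if len(cleaned) < MIN_CHARS:
        return f"too_short(len={len(cleaned)})"
    if cleaned.lower() in PLACEHOLDERS:
        return "placeholder"
    lang = detect_language(cleaned)
    if lang not in ACCEPTED_LANGS:
        return f"non_norwegian({lang})"
    return None


def fake_detect(text: str) -> str:
    # Stand-in detector for the demo only; not a real language identifier.
    return "en" if " is " in text else "no"


for sample in [
    "",
    "Se konkurransegrunnlag",
    "TRANSQ is a joint qualification system for transport suppliers.",
    "Anskaffelse av samfunnsøkonomisk analyse for evaluering av tiltak",
]:
    print(repr(sample[:25]), "->", needs_review_reason(sample, fake_detect))
```

Running the cheap checks (empty, length, placeholder) before language detection avoids calling the detector on strings too short for it to be reliable.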
Held-out test results (n = 1,214)
| split | precision | recall | F1 |
|---|---|---|---|
| overall | 0.76 | 0.89 | 0.82 |
| international subset (n=8) | 0.86 | 1.00 | 0.92 |
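The decision threshold was tuned on the validation split to reach recall ≥ 0.90, giving 0.2594 (saved in threshold.json). Recall on the held-out test set is 0.89, slightly under that validation target.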
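One way to tune a threshold against a recall target is sketched below. The selection rule used here (take the highest threshold whose validation recall still meets the target) is an assumption; the text above only states that the threshold was tuned on validation for recall ≥ 0.90. The scores and labels are toy values.

```python
from typing import Sequence


def recall_at(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Recall of the positive class when predicting positive for score >= threshold."""
    positives = sum(labels)
    if positives == 0:
        return 0.0
    true_positives = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    return true_positives / positives


def tune_threshold(scores: Sequence[float], labels: Sequence[int], min_recall: float = 0.90) -> float:
    """Highest candidate threshold whose recall is at least min_recall."""
    # Recall can only grow as the threshold drops, so scan from the top down.
    for candidate in sorted(set(scores), reverse=True):
        if recall_at(scores, labels, candidate) >= min_recall:
            return candidate
    return 0.0  # reached only when there are no positive examples


# Toy validation data: 4 positives, 3 negatives.
val_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.1]
val_labels = [1, 1, 1, 0, 1, 0, 0]
print(tune_threshold(val_scores, val_labels, min_recall=0.90))
```

Picking the highest qualifying threshold keeps precision as high as the recall constraint allows.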
Usage
from score import score_lead
# Norwegian input — gets a real score
score_lead("Anskaffelse av samfunnsøkonomisk analyse for evaluering...")
# → {"label": "RELEVANT", "score": 0.83, "threshold": 0.2594, "reason": "ok"}
# Empty / placeholder / non-Norwegian input — routed to review, not scored
score_lead("")
# → {"label": "needs_review", "score": None, "reason": "empty"}
score_lead("Se konkurransegrunnlag")
# → {"label": "needs_review", "score": None, "reason": "too_short(len=22)"}
score_lead("TRANSQ is a joint qualification system for transport suppliers.")
# → {"label": "needs_review", "score": None, "reason": "non_norwegian(en)"}
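When scoring many leads at once, the three possible labels can be used to route results into separate queues. The helper below is a hypothetical sketch, not part of this repo; it takes the scoring function as a parameter, and the demo uses a stand-in so it runs without the model (in practice `score_lead` would be passed instead).

```python
from typing import Any, Callable, Dict, Iterable, List


def route_leads(
    descriptions: Iterable[str],
    scorer: Callable[[str], Dict[str, Any]],
) -> Dict[str, List[Dict[str, Any]]]:
    """Group scorer outputs by label so needs_review items reach a human."""
    queues: Dict[str, List[Dict[str, Any]]] = {
        "RELEVANT": [],
        "NOT_RELEVANT": [],
        "needs_review": [],
    }
    for text in descriptions:
        result = scorer(text)
        queues.setdefault(result["label"], []).append({"text": text, **result})
    return queues


def fake_score_lead(text: str) -> Dict[str, Any]:
    # Stand-in for score.score_lead; the scores are made up for the demo.
    if not text.strip():
        return {"label": "needs_review", "score": None, "reason": "empty"}
    score = 0.83 if "analyse" in text.lower() else 0.05
    label = "RELEVANT" if score >= 0.2594 else "NOT_RELEVANT"
    return {"label": label, "score": score, "threshold": 0.2594, "reason": "ok"}


queues = route_leads(
    ["Anskaffelse av samfunnsøkonomisk analyse", "", "Kjøp av kontormøbler"],
    fake_score_lead,
)
print({label: len(items) for label, items in queues.items()})
```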
Important: input must be in Norwegian
The model assumes incoming descriptions are already in Norwegian Bokmål. The lead-scraper translates non-Norwegian leads upstream, so by the time a lead reaches this model in production it is in Norwegian.
If a description in another language slips through, it is intentionally flagged needs_review so a human can fetch a correct translation rather than the model returning a low-confidence guess. For one-off ad-hoc scoring of raw foreign text, translate it with any tool (DeepL / OpenAI / GPT / Google) before calling score_lead.
Requires: transformers, torch, langdetect. No API keys needed.
Files in this repo
| file | purpose |
|---|---|
| model.safetensors, config.json | Model weights + config |
| tokenizer.json, vocab.txt, etc. | Tokenizer |
| threshold.json | Tuned decision threshold |
| inference_rules.py | needs_review() gate (empty / short / placeholder / non-Norwegian) |
| score.py | End-to-end scoring function (use this) |
Training data
Roughly 13,000 labeled procurement leads from doffin / mercell / TED / Nordisk ministerråd / hilma / FHF, with per-row weights encoding class balance, cluster-based deduplication of near-duplicate negatives, and an upweight on international positives. After filtering inputs that the needs_review gate would catch, about 12,100 rows were used for training.
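The exact weighting formula is not given here, so the sketch below is only one plausible way to combine the three ingredients: a balanced class weight, division by cluster size for negatives in the same near-duplicate cluster, and a multiplicative boost for international positives. The field names, the `intl_boost` value, and the formula itself are assumptions.

```python
from collections import Counter
from typing import Dict, List


def row_weights(rows: List[Dict], intl_boost: float = 3.0) -> List[float]:
    """Per-row training weights under the assumed scheme described above."""
    n = len(rows)
    class_counts = Counter(r["is_relevant"] for r in rows)
    negative_cluster_sizes = Counter(
        r["cluster_id"] for r in rows if not r["is_relevant"]
    )
    weights = []
    for r in rows:
        # Balanced class weight: n / (number of classes * rows in this class).
        w = n / (len(class_counts) * class_counts[r["is_relevant"]])
        if not r["is_relevant"]:
            # Templated negatives share one cluster and split its weight.
            w /= negative_cluster_sizes[r["cluster_id"]]
        elif r["international"]:
            w *= intl_boost
        weights.append(w)
    return weights


rows = [
    {"is_relevant": False, "international": False, "cluster_id": "a"},
    {"is_relevant": False, "international": False, "cluster_id": "a"},
    {"is_relevant": False, "international": False, "cluster_id": "b"},
    {"is_relevant": True, "international": False, "cluster_id": "c"},
    {"is_relevant": True, "international": True, "cluster_id": "d"},
]
print([round(w, 3) for w in row_weights(rows)])
```

With this scheme a cluster of near-identical negatives contributes about as much total weight as a single unique negative.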
The dataset was split 80 / 10 / 10 (train / validation / test), stratified by (Is_relevant, international) so the rare international examples are represented in every split.
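A stratified 80 / 10 / 10 split of this kind can be sketched with the standard library alone, as below. This is an illustration rather than the original pipeline: the function name, the seed, and the rounding rule are assumptions, and the rows are synthetic.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Tuple


def stratified_split(
    rows: List[Dict],
    key: Callable[[Dict], Hashable],
    val_frac: float = 0.1,
    test_frac: float = 0.1,
    seed: int = 0,
) -> Tuple[List[Dict], List[Dict], List[Dict]]:
    """Split each stratum separately so every split keeps the same mix."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    train, val, test = [], [], []
    for group in strata.values():
        rng.shuffle(group)
        n_val = round(len(group) * val_frac)
        n_test = round(len(group) * test_frac)
        val.extend(group[:n_val])
        test.extend(group[n_val:n_val + n_test])
        train.extend(group[n_val + n_test:])
    return train, val, test


# Synthetic rows: 60 domestic negatives, 30 domestic positives, 10 international positives.
rows = (
    [{"Is_relevant": 0, "international": 0} for _ in range(60)]
    + [{"Is_relevant": 1, "international": 0} for _ in range(30)]
    + [{"Is_relevant": 1, "international": 1} for _ in range(10)]
)
train, val, test = stratified_split(
    rows, key=lambda r: (r["Is_relevant"], r["international"])
)
print(len(train), len(val), len(test))
```

Note that with this rounding rule a stratum of fewer than about five rows would contribute nothing to validation or test, so very rare groups need a minimum-count rule on top.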
Caveats
- The international evaluation subset is small (~8 held-out positives). The 100% recall on that subset is encouraging but high-variance.
- The needs_review gate accepts Danish and Swedish leniently: those languages are mutually intelligible with Norwegian Bokmål and the underlying model handles them well, so they pass through.
- Production assumption: leads arrive translated. Historically about 5–7 non-Norwegian leads/month slip past the scraper; under this model they are routed to human review.
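To put a number on how high-variance the international recall figure is, the sketch below computes a 95% Wilson score interval, assuming exactly 8 of 8 held-out positives were recovered. The Wilson interval is a choice made for this example, not something the evaluation above reports.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


low, high = wilson_interval(8, 8)
print(f"recall 8/8: 95% interval {low:.2f} to {high:.2f}")  # 0.68 to 1.00
```

Under that assumption, a true international recall as low as roughly 0.68 would still be consistent with the observed 100%.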