URL-TITLE-classifier-preview
Model Overview
This is a preview version of a multi-label web classification model fine-tuned from Alibaba-NLP/gte-modernbert-base. It classifies websites into one or more categories based on their URLs and titles.
The model supports 11 labels: Uncategorized, News, Entertainment, Shop, Chat, Education, Government, Health, Technology, Work, and Travel.
- Developed by: Taimur Hasan
- Model Type: Multi-label Text Classification
- Status: Preview (under active development)
Architecture
- Fine-tuning Strategy: Unfroze the last 4 encoder layers and the pooler
- Problem Type: Multi-label classification
- Output Labels: News, Entertainment, Shop, Chat, Education, Government, Health, Technology, Work, Travel, Uncategorized
- Input Format: Concatenated string: "{url}:{title}"
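The input and output conventions above can be sketched in a few lines of Python. This is a minimal illustration, not code from the model repository: the helper names `build_input` and `decode_labels` are hypothetical, the 0.5 decision threshold is an assumption, and it is also an assumption that the model's logits follow the same order as the Output Labels list.

```python
import math

# Label order copied from the "Output Labels" list above.
# ASSUMPTION: the model's output logits use this same index order.
LABELS = [
    "News", "Entertainment", "Shop", "Chat", "Education", "Government",
    "Health", "Technology", "Work", "Travel", "Uncategorized",
]


def build_input(url: str, title: str) -> str:
    """Join URL and title into the "{url}:{title}" string the model expects."""
    return f"{url}:{title}"


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decode_labels(logits, threshold: float = 0.5):
    """Multi-label decoding: each label gets its own sigmoid and is kept
    independently when its probability reaches the threshold.
    ASSUMPTION: 0.5 is an illustrative threshold, not a documented one."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    return [label for label, z in zip(LABELS, logits) if _sigmoid(z) >= threshold]


# Example with made-up logits (not real model output).
text = build_input("https://example.com/laptops", "Buy laptops online")
predicted = decode_labels([-3.0, -2.0, 1.5, -4.0, -3.0, -5.0, -4.0, 0.8, -2.0, -3.0, -6.0])
```

Because each label is thresholded on its own, a single page can receive several labels or none, which is what distinguishes this setup from single-label softmax classification.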
Evaluation Metrics (Validation Data)
Metric | Value |
---|---|
Loss | 0.207 |
Hamming Loss | 0.083 |
Exact Match | 0.445 |
Precision (Micro) | 0.917 |
Recall (Micro) | 0.917 |
F1 Score (Micro) | 0.917 |
Precision (Macro) | 0.795 |
Recall (Macro) | 0.598 |
F1 Score (Macro) | 0.677 |
Precision (Weighted) | 0.798 |
Recall (Weighted) | 0.647 |
F1 Score (Weighted) | 0.711 |
ROC AUC (Micro) | 0.941 |
ROC AUC (Macro) | 0.928 |
PR AUC (Micro) | 0.815 |
PR AUC (Macro) | 0.765 |
Jaccard (Micro) | 0.848 |
Jaccard (Macro) | 0.520 |
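To make the gap between Exact Match and the per-decision metrics easier to read, here is a small sketch of how Hamming loss, exact match, and micro F1 are defined for multi-label outputs. The toy labels are invented for illustration and this is not the author's evaluation code; the function names are hypothetical.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label decisions that are wrong."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        int(t != p)
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def exact_match(y_true, y_pred):
    """Fraction of samples whose full label set is predicted correctly."""
    hits = sum(int(t == p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def micro_f1(y_true, y_pred):
    """F1 computed from true/false positives pooled over all labels."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data: 4 samples, 3 labels each (made up for illustration).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 1, 1]]

print(hamming_loss(y_true, y_pred))  # 2 wrong decisions out of 12
print(exact_match(y_true, y_pred))   # 2 of 4 samples fully correct
print(micro_f1(y_true, y_pred))      # tp=5, fp=1, fn=1
```

In this toy case only 2 of 12 label decisions are wrong, yet half of the samples fail exact match, since one wrong label is enough to miss a sample. The same effect is one plausible reading of a low Exact Match alongside a low Hamming Loss.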
Per-Label F1 Scores
Label | F1 Score |
---|---|
News | 0.605 |
Entertainment | 0.764 |
Shop | 0.704 |
Chat | 0.875 |
Education | 0.763 |
Government | 0.667 |
Health | 0.574 |
Technology | 0.738 |
Work | 0.527 |
Travel | 0.571 |
Uncategorized | 0.657 |
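Macro F1 is the unweighted mean of the per-label F1 scores, so the table above can be checked against the reported F1 Score (Macro) of 0.677. A short sketch using only the values listed in this card:

```python
# Per-label F1 scores copied from the table above.
per_label_f1 = {
    "News": 0.605,
    "Entertainment": 0.764,
    "Shop": 0.704,
    "Chat": 0.875,
    "Education": 0.763,
    "Government": 0.667,
    "Health": 0.574,
    "Technology": 0.738,
    "Work": 0.527,
    "Travel": 0.571,
    "Uncategorized": 0.657,
}

# Macro average: every label counts equally, regardless of its frequency.
macro_f1 = sum(per_label_f1.values()) / len(per_label_f1)
print(round(macro_f1, 3))  # 0.677

# Labels with the lowest and highest F1.
weakest = min(per_label_f1, key=per_label_f1.get)
strongest = max(per_label_f1, key=per_label_f1.get)
print(weakest, strongest)  # Work Chat
```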
Note: This model is in preview and may not generalize well outside of its training dataset. Feedback and contributions are welcome.
Model tree for firefoxrecap/URL-TITLE-classifier
- Base model: answerdotai/ModernBERT-base
- Finetuned: Alibaba-NLP/gte-modernbert-base