FineWeb2-RoEdu-Classifier

FineWeb2-RoEdu-Classifier is a lightweight quality classifier for the Romanian language. It is designed to distinguish high-quality educational content from generic web text. The model was trained on data annotated by Gemma3 12B. More details can be found here.

Key Features

  • Educational Quality Scoring: The model assigns a scalar score (typically 0-5) to text, reflecting its educational value and coherence.
  • Topic, Format and Educational Level: The model also predicts additional signals that could be used for diversity filtering.
  • Distilled Knowledge: It is trained on Romanian web samples annotated by Gemma3 12B, effectively distilling the frontier model's judgment into a more efficient architecture.
  • Proven Effectiveness: We showed that using data curated by this classifier improved several benchmark metrics (ARC, HellaSwag).
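
As a rough illustration of how these signals might be combined during data curation, the sketch below keeps documents above an educational-score threshold and caps how many are retained per predicted topic. It does not call the classifier itself: the `ScoredDoc` type, the `curate` function, the field names, and the default threshold and cap are all hypothetical, and the scores are assumed to have been computed beforehand.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ScoredDoc:
    """Hypothetical container for one document and its predicted signals."""
    text: str
    edu_score: float  # assumed educational quality score, roughly 0-5
    topic: str        # assumed predicted topic label


def curate(docs, min_score=3.0, max_per_topic=2):
    """Keep high-scoring documents while limiting each topic's share.

    min_score and max_per_topic are illustrative defaults, not values
    recommended by the model authors.
    """
    kept = []
    per_topic = defaultdict(int)
    # Visit the best documents first so the per-topic cap keeps the top ones.
    for doc in sorted(docs, key=lambda d: d.edu_score, reverse=True):
        if doc.edu_score < min_score:
            break  # every remaining document scores lower still
        if per_topic[doc.topic] >= max_per_topic:
            continue  # topic already has enough documents
        per_topic[doc.topic] += 1
        kept.append(doc)
    return kept


if __name__ == "__main__":
    sample = [
        ScoredDoc("a", 4.5, "science"),
        ScoredDoc("b", 4.0, "science"),
        ScoredDoc("c", 3.5, "science"),
        ScoredDoc("d", 3.2, "history"),
        ScoredDoc("e", 1.0, "history"),
    ]
    print([d.text for d in curate(sample)])
```

Sorting by score before applying the cap means that, within each topic, the highest-scoring documents are the ones retained; a real pipeline would likely tune both parameters against downstream evaluation.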

Usage

You can find a demo here.
