metadata
license: mit
datasets:
- nesemenpolkov/bookMeta-ru
language:
- ru
metrics:
- accuracy
base_model:
- deepvk/bert-base-uncased
- deepvk/USER-base
new_version: nesemenpolkov/ruGliner-bookMeta
pipeline_tag: token-classification
library_name: transformers
GLiNER-based Book Metadata Extraction Model
Model Description
This model is a fine-tuned GLiNER (Generalist and Lightweight model for Named Entity Recognition) model that extracts structured metadata from Russian-language book references and citations. It is built on:
- The GLiNER framework for zero-shot named entity recognition
- deepvk/bert-base-uncased as the base transformer encoder
- deepvk/USER-base as an additional pretrained component
- Fine-tuning on the nesemenpolkov/bookMeta-ru dataset for book metadata extraction
Intended Use
The model is specifically designed to extract the following entities from book references and academic citations:
- authors: Book authors or editors
- title: Book or article title
- publisher: Publishing house or organization
- year: Publication year
- pages: Page numbers or page count
How to Use
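from gliner import GLiNER

# Load the fine-tuned model from the Hugging Face Hub
model = GLiNER.from_pretrained("nesemenpolkov/ruGliner-bookMeta")

# Example reference (roughly: "Morse Code for Dummies // Ivanov P.P., Gushchina I.A. 1999. 3 pp.")
text = "Азбука Морзе для чайников // Иванов П.П., Гущина И. А. 1999. 3 с."

# Target labels: the five entity types listed under Intended Use
labels = ["authors", "title", "publisher", "year", "pages"]

# Predict entities; each result is a dict describing one extracted span
entities = model.predict_entities(text, labels)

# Display each extracted span with its label
for entity in entities:
    print(f"{entity['text']} => {entity['label']}")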
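The model returns a flat list of spans, so a reference with two authors produces two `authors` entries, and a label may occasionally be predicted more than once. A minimal sketch of collapsing that list into one record per reference, assuming each prediction is a dict with `text`, `label` and `score` keys and that only `authors` is legitimately multi-valued. The helper `to_record`, the `SINGLE_VALUED` tuple and the 0.5 threshold are hypothetical choices, not part of the model or the gliner package.

```python
from typing import Dict, List, Optional, Union

# Hypothetical helper; it is not part of the gliner package.
SINGLE_VALUED = ("title", "publisher", "year", "pages")


def to_record(entities: List[dict], threshold: float = 0.5) -> Dict[str, Union[List[str], Optional[str]]]:
    """Collapse a flat list of predicted spans into one metadata record.

    Assumes each prediction is a dict with "text", "label" and "score" keys.
    All "authors" spans are kept in order; for every other label only the
    highest-scoring span at or above the threshold is kept.
    """
    authors: List[str] = []
    best: Dict[str, tuple] = {}  # label -> (score, text) of the best span so far
    for ent in entities:
        if ent["score"] < threshold:
            continue  # drop low-confidence spans
        label = ent["label"]
        if label == "authors":
            authors.append(ent["text"])
        elif label in SINGLE_VALUED:
            if label not in best or ent["score"] > best[label][0]:
                best[label] = (ent["score"], ent["text"])
    record: Dict[str, Union[List[str], Optional[str]]] = {"authors": authors}
    for label in SINGLE_VALUED:
        record[label] = best[label][1] if label in best else None
    return record


# Made-up predictions in the assumed format (not real model output)
sample = [
    {"text": "Азбука Морзе для чайников", "label": "title", "score": 0.93},
    {"text": "Иванов П.П.", "label": "authors", "score": 0.88},
    {"text": "Гущина И. А.", "label": "authors", "score": 0.85},
    {"text": "1999", "label": "year", "score": 0.97},
    {"text": "3 с.", "label": "pages", "score": 0.42},
]
print(to_record(sample))
```

With the default threshold the low-scoring `pages` span is dropped and that field stays `None`; lowering `threshold` trades precision for recall, so tune it on held-out references rather than relying on the 0.5 placeholder.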
Training Data
The model was fine-tuned on the nesemenpolkov/bookMeta-ru dataset of annotated book references, which has the following characteristics:
Dataset Structure
- Total samples: 10,000
- Train/test split: 80%/20% (8,000 / 2,000 samples)
- Average entities per sample: 4.3
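An 80%/20% split of 10,000 samples yields 8,000 training and 2,000 test references. The card does not say how the split was drawn, so the following is only one common way to reproduce a split of that size (a seeded shuffle of indices), not the procedure actually used; the function name `train_test_split_indices` and the seed value are hypothetical.

```python
import random
from typing import List, Tuple


def train_test_split_indices(n: int, train_frac: float = 0.8, seed: int = 42) -> Tuple[List[int], List[int]]:
    """Shuffle sample indices with a fixed seed and cut them at train_frac."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # local RNG, so global random state is untouched
    cut = round(n * train_frac)
    return indices[:cut], indices[cut:]


train_idx, test_idx = train_test_split_indices(10_000)
print(len(train_idx), len(test_idx))  # 8000 2000
```

Fixing the seed keeps the split reproducible across runs, which matters if test scores are compared between model versions.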
Annotation Guidelines
- authors ::= "Иванов А.А., Петров Б.Б."
- title ::= "Введение в машинное обучение"
- publisher ::= "Издательство МГУ" | "Springer"
- year ::= "2020" | "1999 г."
- pages ::= "с. 123-145" | "pp. 45-67"
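The examples above show that extracted spans keep their surface form: several authors share one comma-separated span, a year may carry the Russian suffix "г." (год, "year"), and page ranges may be prefixed with "с." (Russian for "pages") or "pp.". A minimal sketch of normalizing such spans after extraction, assuming they look like these examples; the helpers `split_authors`, `parse_year` and `parse_page_range` are hypothetical and not part of the model.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical normalizers for spans shaped like the annotation examples.


def split_authors(span: str) -> List[str]:
    """Split 'Иванов А.А., Петров Б.Б.' into individual author strings."""
    return [name.strip() for name in span.split(",") if name.strip()]


def parse_year(span: str) -> Optional[int]:
    """Extract a four-digit year from '2020' or '1999 г.'."""
    match = re.search(r"\b(\d{4})\b", span)
    return int(match.group(1)) if match else None


def parse_page_range(span: str) -> Optional[Tuple[int, int]]:
    """Extract (first, last) from 'с. 123-145' or 'pp. 45-67'; accepts hyphen or en dash."""
    match = re.search(r"(\d+)\s*[-\u2013]\s*(\d+)", span)
    return (int(match.group(1)), int(match.group(2))) if match else None


print(split_authors("Иванов А.А., Петров Б.Б."))
print(parse_year("1999 г."), parse_page_range("pp. 45-67"))
```

Comma splitting assumes the "Surname I.I." order shown above; a reference written as "Ivanov, I.I." would be split in the wrong place, and a bare page count such as "3 с." returns `None` from `parse_page_range` because it is not a range.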