metadata

license: mit
datasets:
  - nesemenpolkov/bookMeta-ru
language:
  - ru
metrics:
  - accuracy
base_model:
  - deepvk/bert-base-uncased
  - deepvk/USER-base
new_version: nesemenpolkov/ruGliner-bookMeta
pipeline_tag: token-classification
library_name: transformers

GLiNER-based Book Metadata Extraction Model

Model Description

This model is a fine-tuned GLiNER (Generalist and Lightweight Model for Named Entity Recognition) model that extracts structured metadata from book references and citations. The model is built on top of:

GLiNER framework for zero-shot named entity recognition
deepvk/bert-base-uncased as the base transformer encoder
USER-base (from deepvk) as an additional pretrained component
Fine-tuned on the bookMeta dataset for book metadata extraction

Intended Use

The model is specifically designed to extract the following entities from book references and academic citations:

authors - Book authors or editors
title - Book or article title
publisher - Publishing house or organization
year - Publication year
pages - Page numbers or page count

How to Use

from gliner import GLiNER

# Load the model
model = GLiNER.from_pretrained("nesemenpolkov/ruGliner-bookMeta")

# Example text
text = "Азбука Морзе для чайников // Иванов П.П., Гущина И. А. 1999. 3 с."

# Define target labels
labels = ["authors", "title", "publisher", "year", "pages"]

# Predict entities
entities = model.predict_entities(text, labels)

# Display results
for entity in entities:
    print(f"{entity['text']} => {entity['label']}")
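
For downstream use it is often handier to collect the predicted spans into one record per reference. The following is a minimal sketch, not part of the model's API: `group_entities` and `min_score` are hypothetical names, the `sample` list is hand-written to mimic the shape of the `predict_entities` output (it is not real model output), and the presence of a `score` key in each entity is an assumption.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.0):
    """Collect predicted spans into a {label: [text, ...]} mapping.

    Hypothetical helper: keeps spans whose score is at least min_score.
    """
    grouped = defaultdict(list)
    for entity in entities:
        # Assumption: each entity carries a "score"; treat it as 1.0 if absent.
        if entity.get("score", 1.0) >= min_score:
            grouped[entity["label"]].append(entity["text"])
    return dict(grouped)


# Hand-written stand-in for model.predict_entities(text, labels); not real predictions.
sample = [
    {"text": "Азбука Морзе для чайников", "label": "title", "score": 0.97},
    {"text": "Иванов П.П.", "label": "authors", "score": 0.91},
    {"text": "Гущина И. А.", "label": "authors", "score": 0.88},
    {"text": "1999", "label": "year", "score": 0.95},
    {"text": "3 с.", "label": "pages", "score": 0.40},
]

record = group_entities(sample, min_score=0.5)
print(record)
```

Labels with several spans, such as multiple authors, stay as lists in the order they were predicted, and a label with no span above the cutoff is simply absent from the record.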

Training Data

The model was fine-tuned on the bookMeta dataset containing annotated book references with the following characteristics:

Dataset Structure

Total samples: 10,000
Train/Test split: 80%/20%
Average entities per sample: 4.3
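
As a quick sanity check on these figures, the split sizes and the approximate number of annotated entities can be derived from the three numbers above. This sketch assumes the 80%/20% split applies to the full 10,000 samples and that the average holds across the whole dataset; the variable names are illustrative.

```python
total_samples = 10_000
train_fraction = 0.80          # from the 80%/20% train/test split
avg_entities_per_sample = 4.3

# Assumption: the split is taken over all 10,000 samples.
train_size = round(total_samples * train_fraction)
test_size = total_samples - train_size

# Rough total of annotated entity spans implied by the average.
approx_total_entities = round(total_samples * avg_entities_per_sample)

print(train_size, test_size, approx_total_entities)
```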

Annotation Guidelines

authors ::= "Иванов А.А., Петров Б.Б."
title ::= "Введение в машинное обучение"
publisher ::= "Издательство МГУ" | "Springer"
year ::= "2020" | "1999 г."
pages ::= "с. 123-145" | "pp. 45-67"
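
Because the annotated spans keep surface markers such as "г.", "с." and "pp.", extracted values may need normalizing before they are stored. The following is a minimal post-processing sketch based only on the examples above; `normalize_year` and `normalize_pages` are hypothetical helpers, not part of the model or the dataset tooling.

```python
import re


def normalize_year(raw):
    """Return the first standalone four-digit number as an int, or None."""
    match = re.search(r"(?<!\d)(\d{4})(?!\d)", raw)
    return int(match.group(1)) if match else None


def normalize_pages(raw):
    """Return (first, last) page numbers found in the span, or None.

    A span with a single number yields (n, n). Whether that number is a
    page or a page count cannot be told from the digits alone, so this
    sketch does not try to distinguish the two.
    """
    numbers = [int(n) for n in re.findall(r"\d+", raw)]
    if not numbers:
        return None
    return (numbers[0], numbers[-1])


for raw_year in ["2020", "1999 г."]:
    print(raw_year, "->", normalize_year(raw_year))

for raw_pages in ["с. 123-145", "pp. 45-67"]:
    print(raw_pages, "->", normalize_pages(raw_pages))
```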