GLiNER-based Book Metadata Extraction Model

Model Description

This model is a fine-tuned GLiNER (Generalist and Lightweight Model for Named Entity Recognition) model designed to extract structured metadata from book references and citations. It builds on:

  • GLiNER framework for zero-shot named entity recognition
  • bert-base-uncased as the base transformer architecture
  • USER-base (from deepvk) as an additional pretrained component
  • the bookMeta dataset, used as fine-tuning data for book metadata extraction

Intended Use

The model is specifically designed to extract the following entities from book references and academic citations:

  • authors - Book authors or editors
  • title - Book or article title
  • publisher - Publishing house or organization
  • year - Publication year
  • pages - Page numbers or page count

How to Use

from gliner import GLiNER

# Load the model
model = GLiNER.from_pretrained("nesemenpolkov/ruGliner-bookMeta")

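# Example text (a Russian-language reference; roughly "Morse Code for Dummies // Ivanov P.P., Gushchina I.A. 1999. 3 p.")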
text = "Азбука Морзе для чайников // Иванов П.П., Гущина И. А. 1999. 3 с."

# Define target labels
labels = ["authors", "title", "publisher", "year", "pages"]

# Predict entities
entities = model.predict_entities(text, labels)

# Display results
for entity in entities:
    print(f"{entity['text']} => {entity['label']}")
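Downstream code usually wants one record per reference rather than a flat list of spans. The following is a minimal sketch of that grouping step, assuming each prediction is a dict with `start`, `end`, `text`, `label`, and `score` keys. The helper names `make_span` and `group_entities`, the sample spans, and the scores are all hypothetical and hand-written for illustration; they are not real model output.

```python
from collections import defaultdict

TEXT = "Азбука Морзе для чайников // Иванов П.П., Гущина И. А. 1999. 3 с."


def make_span(text, substring, label, score):
    """Hypothetical helper: build a prediction-shaped dict for a substring."""
    start = text.index(substring)
    return {
        "start": start,
        "end": start + len(substring),
        "text": substring,
        "label": label,
        "score": score,
    }


# Hand-written stand-in for model output; spans and scores are illustrative only.
sample_entities = [
    make_span(TEXT, "1999", "year", 0.95),
    make_span(TEXT, "Азбука Морзе для чайников", "title", 0.90),
    make_span(TEXT, "Иванов П.П., Гущина И. А.", "authors", 0.85),
    make_span(TEXT, "3 с.", "pages", 0.40),
]


def group_entities(entities, min_score=0.0):
    """Group span texts by label, in reading order, dropping low-score spans."""
    record = defaultdict(list)
    for ent in sorted(entities, key=lambda e: e["start"]):
        if ent["score"] >= min_score:
            record[ent["label"]].append(ent["text"])
    return dict(record)


print(group_entities(sample_entities, min_score=0.5))
```

Values are kept as lists because a label such as authors may be predicted as several separate spans. The `min_score` cutoff is an arbitrary choice and would need tuning on real predictions.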

Training Data

The model was fine-tuned on the bookMeta dataset containing annotated book references with the following characteristics:

Dataset Structure

  • Total samples: 10,000
  • Train/Test split: 80%/20%
  • Average entities per sample: 4.3
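For a sense of scale, the figures above can be turned into absolute counts. This is a small arithmetic sketch; the total entity count is only an estimate, under the assumption that the 4.3 average applies to all 10,000 samples.

```python
TOTAL_SAMPLES = 10_000
TRAIN_PERCENT = 80
AVG_ENTITIES_PER_SAMPLE = 4.3

# Integer arithmetic keeps the split sizes exact.
train_size = TOTAL_SAMPLES * TRAIN_PERCENT // 100
test_size = TOTAL_SAMPLES - train_size

# Estimate only: assumes the average holds across the whole dataset.
approx_total_entities = round(TOTAL_SAMPLES * AVG_ENTITIES_PER_SAMPLE)

print(train_size, test_size, approx_total_entities)  # 8000 2000 43000
```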

Annotation Guidelines

  • authors ::= "Иванов А.А., Петров Б.Б."
  • title ::= "Введение в машинное обучение"
  • publisher ::= "Издательство МГУ" | "Springer"
  • year ::= "2020" | "1999 г."
  • pages ::= "с. 123-145" | "pp. 45-67"
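
Because the guidelines allow surface variants such as "1999 г." or "с. 123-145", extracted spans may need normalizing before they are stored. The sketch below shows one possible post-processing step. The function names and regular expressions are assumptions made for illustration and are not part of the model or the dataset tooling.

```python
import re

# Four-digit year between 1500 and 2099; the range is an arbitrary assumption.
YEAR_RE = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")
# Page range such as "123-145", with a hyphen or an en dash.
PAGE_RANGE_RE = re.compile(r"(\d+)\s*[-–]\s*(\d+)")


def normalize_year(span):
    """Return the year as an int, or None if no plausible year is found."""
    match = YEAR_RE.search(span)
    return int(match.group(1)) if match else None


def normalize_pages(span):
    """Return (first, last) for a page range, or None otherwise.

    A lone number (e.g. "3 с.") is ambiguous between a page count and a
    single page, so it is deliberately left unresolved here.
    """
    match = PAGE_RANGE_RE.search(span)
    if match:
        return int(match.group(1)), int(match.group(2))
    return None


for raw in ["2020", "1999 г."]:
    print(raw, "->", normalize_year(raw))
for raw in ["с. 123-145", "pp. 45-67", "3 с."]:
    print(raw, "->", normalize_pages(raw))
```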