metadata
license: mit
datasets:
  - nesemenpolkov/bookMeta-ru
language:
  - ru
metrics:
  - accuracy
base_model:
  - deepvk/bert-base-uncased
  - deepvk/USER-base
new_version: nesemenpolkov/ruGliner-bookMeta
pipeline_tag: token-classification
library_name: transformers

GLiNER-based Book Metadata Extraction Model

Model Description

This model is a fine-tuned GLiNER (Generalist and Lightweight model for Named Entity Recognition) model that extracts structured metadata from Russian-language book references and citations. It combines the following components:

  • GLiNER framework for zero-shot named entity recognition
  • deepvk/bert-base-uncased as the base transformer encoder
  • USER-base (from deepvk) as an additional pretrained component
  • The bookMeta dataset (nesemenpolkov/bookMeta-ru), used to fine-tune the model for book metadata extraction

Intended Use

The model is specifically designed to extract the following entities from book references and academic citations:

  • authors - Book authors or editors
  • title - Book or article title
  • publisher - Publishing house or organization
  • year - Publication year
  • pages - Page numbers or page count

How to Use

from gliner import GLiNER

# Load the model
model = GLiNER.from_pretrained("nesemenpolkov/ruGliner-bookMeta")

# Example text
text = "Азбука Морзе для чайников // Иванов П.П., Гущина И. А. 1999. 3 с."

# Define target labels
labels = ["authors", "title", "publisher", "year", "pages"]

# Predict entities
entities = model.predict_entities(text, labels)

# Display results
for entity in entities:
    print(f"{entity['text']} => {entity['label']}")
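The flat list of predictions can be folded into one record per reference. The following is a minimal sketch, not part of the model's API: `group_entities` and `min_score` are hypothetical names, the sample predictions are made up for illustration, and it assumes each entity dict carries a "score" key alongside "text" and "label".

```python
from collections import defaultdict

LABELS = ["authors", "title", "publisher", "year", "pages"]

def group_entities(entities, min_score=0.5):
    """Collect predicted spans into {label: [text, ...]}, dropping low-score spans."""
    record = defaultdict(list)
    for entity in entities:
        # Assumption: a missing "score" key is treated as full confidence.
        if entity.get("score", 1.0) >= min_score:
            record[entity["label"]].append(entity["text"])
    # Return every expected label, even when nothing was found for it.
    return {label: record.get(label, []) for label in LABELS}

# Hypothetical predictions for illustration only, not real model output.
sample_entities = [
    {"text": "Азбука Морзе для чайников", "label": "title", "score": 0.93},
    {"text": "Иванов П.П., Гущина И. А.", "label": "authors", "score": 0.88},
    {"text": "1999", "label": "year", "score": 0.97},
    {"text": "3 с.", "label": "pages", "score": 0.41},
]

metadata = group_entities(sample_entities, min_score=0.5)
print(metadata)
```

Returning an empty list for labels with no surviving span keeps the record shape stable, so downstream code can tell "no publisher found" apart from a missing key.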

Training Data

The model was fine-tuned on the bookMeta dataset containing annotated book references with the following characteristics:

Dataset Structure

  • Total samples: 10,000
  • Train/Test split: 80%/20%
  • Average entities per sample: 4.3
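In absolute numbers, these figures work out to 8,000 training and 2,000 test samples, and roughly 43,000 annotated entities overall. The sketch below reproduces that arithmetic with a seeded shuffle. `split_indices` and the seed are hypothetical; the card does not say how the actual split was drawn.

```python
import random

def split_indices(n_samples, train_fraction=0.8, seed=42):
    """Shuffle sample indices reproducibly and cut them into train and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(n_samples * train_fraction)
    return indices[:cut], indices[cut:]

train_idx, test_idx = split_indices(10_000)
print(len(train_idx), len(test_idx))  # 8000 2000

# Rough total of annotated entities implied by the per-sample average.
approx_entities = round(10_000 * 4.3)
print(approx_entities)  # 43000
```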

Annotation Guidelines

  • authors ::= "Иванов А.А., Петров Б.Б."
  • title ::= "Введение в машинное обучение"
  • publisher ::= "Издательство МГУ" | "Springer"
  • year ::= "2020" | "1999 г."
  • pages ::= "с. 123-145" | "pp. 45-67"
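Because year and pages spans keep their surrounding markers ("г.", "с.", "pp."), extracted values usually need normalizing before they are stored. A minimal sketch, assuming only the formats listed above plus the page-count form "3 с." from the usage example; `parse_year` and `parse_pages` are hypothetical helpers, not part of the model or the GLiNER library.

```python
import re

def parse_year(value):
    """Return the four-digit year from strings like "2020" or "1999 г.", else None."""
    match = re.search(r"\b(\d{4})\b", value)
    return int(match.group(1)) if match else None

def parse_pages(value):
    """Return (first, last) for a page range, (count,) for a page count, else None."""
    # Accept a hyphen or an en dash between the two page numbers.
    range_match = re.search(r"(\d+)\s*[-\u2013]\s*(\d+)", value)
    if range_match:
        return int(range_match.group(1)), int(range_match.group(2))
    count_match = re.search(r"(\d+)", value)
    return (int(count_match.group(1)),) if count_match else None

for raw in ["2020", "1999 г."]:
    print(raw, "->", parse_year(raw))

for raw in ["с. 123-145", "pp. 45-67", "3 с."]:
    print(raw, "->", parse_pages(raw))
```

Returning a one-element tuple for a page count keeps it distinguishable from a range without introducing a separate type.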