---
license: mit
datasets:
- nesemenpolkov/bookMeta-ru
language:
- ru
metrics:
- accuracy
base_model:
- deepvk/bert-base-uncased
- deepvk/USER-base
new_version: nesemenpolkov/ruGliner-bookMeta
pipeline_tag: token-classification
library_name: transformers
---

# GLiNER-based Book Metadata Extraction Model

## Model Description

This model is a fine-tuned **GLiNER** (Generalist and Lightweight Model for Named Entity Recognition) model for extracting structured metadata from book references and citations. It is built on top of:

- **GLiNER framework** for zero-shot named entity recognition
- **bert-base-uncased** as the base transformer architecture
- **USER-base** (from deepvk) as an additional pretrained component

The model was fine-tuned on the **bookMeta** dataset for book metadata extraction.

## Intended Use

The model extracts the following entities from book references and academic citations:

- `authors` - Book authors or editors
- `title` - Book or article title
- `publisher` - Publishing house or organization
- `year` - Publication year
- `pages` - Page numbers or page count

## How to Use

```python
from gliner import GLiNER

# Load the model
model = GLiNER.from_pretrained("nesemenpolkov/ruGliner-bookMeta")

# Example text
text = "Азбука Морзе для чайников // Иванов П.П., Гущина И. А. 1999. 3 с."

# Define target labels
labels = ["authors", "title", "publisher", "year", "pages"]

# Predict entities
entities = model.predict_entities(text, labels)

# Display results
for entity in entities:
    print(f"{entity['text']} => {entity['label']}")
```

## Training Data

The model was fine-tuned on the **bookMeta** dataset, which contains annotated book references.

### Dataset Structure

- **Total samples**: 10,000
- **Train/Test split**: 80%/20%
- **Average entities per sample**: 4.3

### Annotation Guidelines

Each label is shown with example spans; alternatives are separated by `|`.

- authors ::= "Иванов А.А., Петров Б.Б."
- title ::= "Введение в машинное обучение"
- publisher ::= "Издательство МГУ" | "Springer"
- year ::= "2020" | "1999 г."
- pages ::= "с. 123-145" | "pp. 45-67"
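
The annotation examples above show that `year` and `pages` spans can keep surrounding markers such as "г.", "с." or "pp.". If downstream code needs plain numbers, a small post-processing step can help. The following is a minimal sketch, not part of the model or of the GLiNER library: the helper names `normalize_year`, `normalize_pages` and `build_record` are hypothetical, the sample entities are hand-written from the examples above rather than real model output, and the only assumption about the prediction format is that each entity is a dict with `text` and `label` keys, as in the usage snippet.

```python
import re
from typing import Optional


def normalize_year(span: str) -> Optional[int]:
    """Return the first standalone four-digit number in the span, if any."""
    match = re.search(r"(?<!\d)(\d{4})(?!\d)", span)
    return int(match.group(1)) if match else None


def normalize_pages(span: str) -> tuple:
    """Return every integer found in the span, in order of appearance.

    Whether one number means a page count or a single page is left
    to the caller; this helper only strips the markers.
    """
    return tuple(int(number) for number in re.findall(r"\d+", span))


def build_record(entities: list) -> dict:
    """Group entity dicts by label, normalizing `year` and `pages` values."""
    record = {}
    for entity in entities:
        label = entity["label"]
        text = entity["text"].strip()
        if label == "year":
            value = normalize_year(text)
        elif label == "pages":
            value = normalize_pages(text)
        else:
            value = text
        # A label may occur more than once, so values are collected in lists.
        record.setdefault(label, []).append(value)
    return record


# Hand-written sample (an assumption for illustration, not model output).
sample = [
    {"text": "Иванов А.А., Петров Б.Б.", "label": "authors"},
    {"text": "1999 г.", "label": "year"},
    {"text": "с. 123-145", "label": "pages"},
]
print(build_record(sample))
```

In practice, the list returned by `model.predict_entities(text, labels)` could be passed to `build_record` in place of `sample`. Keeping every value in a list avoids silently dropping a second span when a label is predicted more than once.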