---
license: apache-2.0
language:
- tr
base_model:
- dbmdz/bert-base-turkish-cased
pipeline_tag: token-classification
tags:
- e-commerce
- ner
- named-entity-recognition
- bert
- nlp
---

# Turkish BERT for Aspect Term Extraction

This model is a fine-tuned version of [dbmdz/bert-base-turkish-cased](https://huggingface.co/dbmdz/bert-base-turkish-cased), trained specifically for aspect term extraction from Turkish e-commerce product reviews.

## Model Description

- **Base Model**: dbmdz/bert-base-turkish-cased
- **Task**: Token Classification (Aspect Term Extraction)
- **Language**: Turkish
- **Domain**: E-commerce product reviews

## Model Performance

- **F1 Score**: 83% on the test set
- **Test Set Size**: 2,000 samples
- **Training Set Size**: ~16,000 samples

## Training Details

### Training Data

- **Dataset Size**: 16,000 reviews
- **Data Source**: Private e-commerce product review dataset
- **Domain**: E-commerce product reviews in Turkish
- **Coverage**: Over 500 product categories

### Training Configuration

- **Epochs**: 5
- **Task Type**: Token Classification
- **Label Scheme**: BIO tagging
  - `B-ASPECT`: Beginning of an aspect term
  - `I-ASPECT`: Inside/continuation of an aspect term
  - `O`: Outside (not an aspect term)

### Training Loss

Training loss decreased at every epoch, though only marginally between epochs 1 and 2:

| Epoch | Loss   |
|-------|--------|
| 1     | 0.1758 |
| 2     | 0.1749 |
| 3     | 0.1217 |
| 4     | 0.1079 |
| 5     | 0.0699 |

## Usage

### Option 1: Using Pipeline

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")
model = AutoModelForTokenClassification.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")

# Create pipeline; "simple" aggregation groups consecutive B-/I- token predictions into one aspect span
aspect_extractor = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

# Example usage
text = "Bu telefonun kamerası çok iyi ama bataryası yetersiz."
results = aspect_extractor(text)
print(results)
```

**Expected Output:**

```python
[{'entity_group': 'ASPECT', 'score': 0.99498886, 'word': 'kamerası', 'start': 13, 'end': 21},
 {'entity_group': 'ASPECT', 'score': 0.9970175, 'word': 'bataryası', 'start': 34, 'end': 43}]
```

### Option 2: Manual Inference

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")
model = AutoModelForTokenClassification.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")

# Example text
text = "Bu telefonun kamerası çok iyi ama bataryası yetersiz."

# Tokenize input
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Get predictions: softmax gives per-label probabilities, argmax picks the most likely label per token
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class_ids = predictions.argmax(dim=-1)

# Convert predicted class ids to label names
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
predicted_labels = [model.config.id2label[class_id.item()] for class_id in predicted_class_ids[0]]

# Display results, skipping special tokens
for token, label in zip(tokens, predicted_labels):
    if token not in ['[CLS]', '[SEP]', '[PAD]']:
        print(f"{token}: {label}")
```

**Expected Output:**

```
Bu: O
telefonun: O
kamerası: B-ASPECT
çok: O
iyi: O
ama: O
batarya: B-ASPECT
##sı: I-ASPECT
yetersiz: O
.: O
```

### Option 3: Batch Inference

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")
model = AutoModelForTokenClassification.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")

# Example texts for batch processing
texts = [
    "Bu telefonun kamerası çok iyi ama bataryası yetersiz.",
    "Ürünün fiyatı uygun ancak kalitesi düşük.",
    "Teslimat hızı mükemmel, ambalaj da gayet sağlam."
]

# Tokenize all texts; padding=True pads shorter texts to the longest one in the batch
inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True)

# Get predictions for all texts
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class_ids = predictions.argmax(dim=-1)

# Process results for each text
for i, text in enumerate(texts):
    print(f"\nText {i+1}: {text}")
    print("-" * 50)

    # Get tokens and labels for this specific text
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][i])
    predicted_labels = [model.config.id2label[class_id.item()] for class_id in predicted_class_ids[i]]

    # Display results, skipping special and padding tokens
    for token, label in zip(tokens, predicted_labels):
        if token not in ['[CLS]', '[SEP]', '[PAD]']:
            print(f"{token}: {label}")
```

**Expected Output:**

**Text 1:** Bu telefonun kamerası çok iyi ama bataryası yetersiz.

```
Bu: O
telefonun: O
kamerası: B-ASPECT
çok: O
iyi: O
ama: O
batarya: B-ASPECT
##sı: I-ASPECT
yetersiz: O
.: O
```

**Text 2:** Ürünün fiyatı uygun ancak kalitesi düşük.

```
Ürünün: O
fiyatı: B-ASPECT
uygun: O
ancak: O
kalitesi: B-ASPECT
düşük: O
.: O
```

**Text 3:** Teslimat hızı mükemmel, ambalaj da gayet sağlam.
```
Teslim: B-ASPECT
##at: I-ASPECT
hızı: I-ASPECT
mükemmel: O
,: O
ambalaj: B-ASPECT
da: O
gayet: O
sağlam: O
.: O
```

## Label Mapping

```python
id2label = {
    0: "O",
    1: "B-ASPECT",
    2: "I-ASPECT"
}

label2id = {
    "O": 0,
    "B-ASPECT": 1,
    "I-ASPECT": 2
}
```

## Intended Use

This model is designed for:

- Extracting aspect terms from Turkish e-commerce product reviews
- Identifying product features and attributes mentioned in reviews
- Supporting aspect-based sentiment analysis pipelines

## Limitations

- Trained specifically on e-commerce domain data
- Performance may vary on other domains or text types
- Supports Turkish only
- Trained on a private dataset, so results may be difficult to reproduce

## Citation

If you use this model, please cite:

```
@misc{turkish-bert-aspect-extraction,
  title={Turkish BERT for Aspect Term Extraction},
  author={Abdullah Koçak},
  year={2025},
  url={https://huggingface.co/opdullah/bert-turkish-ecomm-aspect-extraction}
}
```

## Base Model Citation

```
@misc{schweter2020bertbase,
  title={BERTurk - BERT models for Turkish},
  author={Stefan Schweter},
  year={2020},
  publisher={Hugging Face},
  url={https://huggingface.co/dbmdz/bert-base-turkish-cased}
}
```
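## Grouping Token Labels into Aspect Terms

The manual and batch inference examples print one BIO label per WordPiece token. As a result, a multi-piece aspect such as `Teslim` / `##at` / `hızı` appears as three separate lines.

The sketch below groups those labels back into aspect strings. It makes the following assumptions:

- The special tokens are `[CLS]`, `[SEP]` and `[PAD]`, and subwords use the `##` continuation prefix, both as shown in the expected outputs.
- `bio_to_aspects` is a hypothetical helper written for illustration. It is not part of `transformers` or this model's repository.
- It is a plain-Python alternative to `aggregation_strategy="simple"`, not a reimplementation of it. For example, it computes no scores or character offsets.

```python
from typing import List

# Special tokens to skip, matching the filter used in the inference examples.
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}


def bio_to_aspects(tokens: List[str], labels: List[str]) -> List[str]:
    """Group WordPiece tokens tagged B-ASPECT/I-ASPECT into aspect strings.

    Hypothetical helper for illustration; not part of transformers.
    """
    aspects: List[str] = []
    current: List[str] = []  # words of the aspect span currently being built

    def flush() -> None:
        # Close the open span (if any) and store it as one space-joined string.
        if current:
            aspects.append(" ".join(current))
            current.clear()

    for token, label in zip(tokens, labels):
        if token in SPECIAL_TOKENS:
            continue
        is_piece = token.startswith("##")  # WordPiece continuation of the previous word
        text = token[2:] if is_piece else token
        if label == "B-ASPECT" or (label == "I-ASPECT" and not current):
            # B- always opens a new span; a stray I- with no open span is leniently treated as B-.
            flush()
            current.append(text)
        elif label == "I-ASPECT":
            if is_piece:
                current[-1] += text  # glue the subword onto the previous word, no space
            else:
                current.append(text)
        else:
            # "O" closes any open span.
            flush()
    flush()  # close a span that runs to the end of the sequence
    return aspects


if __name__ == "__main__":
    # Tokens and labels copied from the Text 3 expected output above.
    tokens = ["Teslim", "##at", "hızı", "mükemmel", ",", "ambalaj", "da", "gayet", "sağlam", "."]
    labels = ["B-ASPECT", "I-ASPECT", "I-ASPECT", "O", "O", "B-ASPECT", "O", "O", "O", "O"]
    print(bio_to_aspects(tokens, labels))  # ['Teslimat hızı', 'ambalaj']
```

With the variables from Option 2, you can call `bio_to_aspects(tokens, predicted_labels)`. Given that example's expected output, it should return `['kamerası', 'bataryası']`.

Treating an `I-ASPECT` that follows `O` as the start of a new span is a lenient design choice. A strict BIO decoder would discard such tokens instead.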