Turkish BERT for Aspect Term Extraction

This model is a fine-tuned version of dbmdz/bert-base-turkish-cased specifically trained for aspect term extraction from Turkish e-commerce product reviews.

Model Description

  • Base Model: dbmdz/bert-base-turkish-cased
  • Task: Token Classification (Aspect Term Extraction)
  • Language: Turkish
  • Domain: E-commerce product reviews
  • Parameters: ~110M (F32, safetensors)

Model Performance

  • F1 Score: 83% on the test set (see the metric sketch below)
  • Test Set Size: 2,000 samples
  • Training Set Size: ~16,000 samples
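
The exact evaluation protocol is not published with this card. For reference, entity-level F1 over BIO tags is conventionally computed with seqeval; a minimal sketch with hypothetical gold and predicted label sequences:

from seqeval.metrics import f1_score

# Hypothetical sequences: entity-level F1 only credits exact span matches,
# so the truncated prediction below scores 0.0 despite token overlap.
gold = [["O", "B-ASPECT", "I-ASPECT", "O"]]
pred = [["O", "B-ASPECT", "O", "O"]]
print(f1_score(gold, pred))  # 0.0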

Training Details

Training Data

  • Dataset Size: 16,000 reviews
  • Data Source: Private e-commerce product review dataset
  • Domain: E-commerce product reviews in Turkish
  • Coverage: Over 500 product categories

Training Configuration

  • Epochs: 5
  • Task Type: Token Classification
  • Label Scheme: BIO tagging (see the example after this list)
    • B-ASPECT: Beginning of an aspect term
    • I-ASPECT: Inside/continuation of an aspect term
    • O: Outside (not an aspect term)
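
For illustration, here is a word-level BIO tagging of a phrase that appears in the batch example later in this card (the model itself predicts over subword tokens):

# Word-level BIO tags for "Teslimat hızı mükemmel" ("delivery speed is excellent"):
# the two-word aspect "Teslimat hızı" takes B-ASPECT then I-ASPECT.
tagged = [("Teslimat", "B-ASPECT"), ("hızı", "I-ASPECT"), ("mükemmel", "O")]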

Training Loss

The training loss decreased at every epoch:

Epoch   Loss
1       0.1758
2       0.1749
3       0.1217
4       0.1079
5       0.0699

Usage

Option 1: Using a Pipeline

from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")
model = AutoModelForTokenClassification.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")

# Create pipeline
aspect_extractor = pipeline("token-classification", 
                           model=model, 
                           tokenizer=tokenizer,
                           aggregation_strategy="simple")

# Example usage
text = "Bu telefonun kamerası çok iyi ama bataryası yetersiz."
results = aspect_extractor(text)
print(results)

Expected Output:

[{'entity_group': 'ASPECT', 'score': 0.99498886, 'word': 'kamerası', 'start': 13, 'end': 21}, 
 {'entity_group': 'ASPECT', 'score': 0.9970175, 'word': 'bataryası', 'start': 34, 'end': 43}]
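
If you only need the aspect strings, the pipeline output can be reduced directly; a small convenience sketch reusing the results variable from above:

# Keep only the surface forms of the detected aspect terms
aspects = [r["word"] for r in results]
print(aspects)  # ['kamerası', 'bataryası']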

Option 2: Manual Inference

import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")
model = AutoModelForTokenClassification.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")

# Example text
text = "Bu telefonun kamerası çok iyi ama bataryası yetersiz."

# Tokenize input
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class_ids = predictions.argmax(dim=-1)

# Convert predictions to labels
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
predicted_labels = [model.config.id2label[class_id.item()] for class_id in predicted_class_ids[0]]

# Display results
for token, label in zip(tokens, predicted_labels):
    if token not in ['[CLS]', '[SEP]', '[PAD]']:
        print(f"{token}: {label}")

Expected Output:

Bu: O
telefonun: O
kamerası: B-ASPECT
çok: O
iyi: O
ama: O
batarya: B-ASPECT
##sı: I-ASPECT
yetersiz: O
.: O
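
Note that manual inference labels subword pieces (batarya + ##sı above). If you need full surface strings, one approach is to merge B-/I-ASPECT pieces via character offsets; a minimal sketch, assuming the fast tokenizer (which supports return_offsets_mapping):

import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")
model = AutoModelForTokenClassification.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")

text = "Bu telefonun kamerası çok iyi ama bataryası yetersiz."
enc = tokenizer(text, return_tensors="pt", return_offsets_mapping=True, truncation=True)
offsets = enc.pop("offset_mapping")[0].tolist()  # remove before the forward pass

with torch.no_grad():
    logits = model(**enc).logits
labels = [model.config.id2label[i] for i in logits[0].argmax(dim=-1).tolist()]

# Merge consecutive B-/I-ASPECT offsets into character spans
aspects, current = [], None
for (start, end), label in zip(offsets, labels):
    if start == end:  # special tokens ([CLS], [SEP]) have empty offsets
        continue
    if label == "B-ASPECT":
        if current:
            aspects.append(text[current[0]:current[1]])
        current = [start, end]
    elif label == "I-ASPECT" and current:
        current[1] = end
    else:
        if current:
            aspects.append(text[current[0]:current[1]])
        current = None
if current:
    aspects.append(text[current[0]:current[1]])

print(aspects)  # e.g. ['kamerası', 'bataryası']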

Option 3: Batch Inference

import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")
model = AutoModelForTokenClassification.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")

# Example texts for batch processing
texts = [
    "Bu telefonun kamerası çok iyi ama bataryası yetersiz.",
    "Ürünün fiyatı uygun ancak kalitesi düşük.",
    "Teslimat hızı mükemmel, ambalaj da gayet sağlam."
]

# Tokenize all texts
inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True)

# Get predictions for all texts
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class_ids = predictions.argmax(dim=-1)

# Process results for each text
for i, text in enumerate(texts):
    print(f"\nText {i+1}: {text}")
    print("-" * 50)
    
    # Get tokens for this specific text
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][i])
    predicted_labels = [model.config.id2label[class_id.item()] for class_id in predicted_class_ids[i]]
    
    # Display results
    for token, label in zip(tokens, predicted_labels):
        if token not in ['[CLS]', '[SEP]', '[PAD]']:
            print(f"{token}: {label}")

Expected Output:

Text 1: Bu telefonun kamerası çok iyi ama bataryası yetersiz.

Bu: O
telefonun: O
kamerası: B-ASPECT
çok: O
iyi: O
ama: O
batarya: B-ASPECT
##sı: I-ASPECT
yetersiz: O
.: O

Text 2: Ürünün fiyatı uygun ancak kalitesi düşük.

Ürünün: O
fiyatı: B-ASPECT
uygun: O
ancak: O
kalitesi: B-ASPECT
düşük: O
.: O

Text 3: Teslimat hızı mükemmel, ambalaj da gayet sağlam.

Teslim: B-ASPECT
##at: I-ASPECT
hızı: I-ASPECT
mükemmel: O
,: O
ambalaj: B-ASPECT
da: O
gayet: O
sağlam: O
.: O
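
The string check against '[CLS]'/'[SEP]'/'[PAD]' works with this tokenizer; a slightly more robust sketch filters padding through the attention mask instead, so it does not depend on the pad token's string form:

import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")
model = AutoModelForTokenClassification.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")

texts = ["Bu telefonun kamerası çok iyi.", "Teslimat hızı mükemmel."]
inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True)

with torch.no_grad():
    predicted_class_ids = model(**inputs).logits.argmax(dim=-1)

for i, text in enumerate(texts):
    keep = inputs["attention_mask"][i].bool()  # True for real tokens, False for padding
    ids = inputs["input_ids"][i][keep].tolist()
    labels = predicted_class_ids[i][keep].tolist()
    print(f"\n{text}")
    for token, label_id in zip(tokenizer.convert_ids_to_tokens(ids), labels):
        if token not in ("[CLS]", "[SEP]"):
            print(f"{token}: {model.config.id2label[label_id]}")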

Label Mapping

id2label = {
    0: "O",
    1: "B-ASPECT", 
    2: "I-ASPECT"
}

label2id = {
    "O": 0,
    "B-ASPECT": 1,
    "I-ASPECT": 2
}
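
These mappings are stored in the checkpoint's config, so they can be verified on the loaded model:

from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")
print(model.config.id2label)  # {0: 'O', 1: 'B-ASPECT', 2: 'I-ASPECT'}
print(model.config.label2id)  # {'O': 0, 'B-ASPECT': 1, 'I-ASPECT': 2}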

Intended Use

This model is designed for:

  • Extracting aspect terms from Turkish e-commerce product reviews
  • Identifying product features and attributes mentioned in reviews
  • Supporting aspect-based sentiment analysis pipelines

Limitations

  • Trained specifically on e-commerce domain data
  • Performance may vary on other domains or text types
  • Limited to the Turkish language
  • Trained on a private dataset, so results may be difficult to reproduce independently

Citation

If you use this model, please cite:

@misc{turkish-bert-aspect-extraction,
  title={Turkish BERT for Aspect Term Extraction},
  author={Abdullah Koçak},
  year={2025},
  url={https://huggingface.co/opdullah/bert-turkish-ecomm-aspect-extraction}
}

Base Model Citation

@misc{schweter2020bertbase,
  title={BERTurk - BERT models for Turkish},
  author={Stefan Schweter},
  year={2020},
  publisher={Hugging Face},
  url={https://huggingface.co/dbmdz/bert-base-turkish-cased}
}