---
license: apache-2.0
language:
- tr
base_model:
- dbmdz/bert-base-turkish-cased
pipeline_tag: token-classification
tags:
- e-commerce
- ner
- named-entity-recognition
- bert
- nlp
---
# Turkish BERT for Aspect Term Extraction

This model is a fine-tuned version of [dbmdz/bert-base-turkish-cased](https://huggingface.co/dbmdz/bert-base-turkish-cased) for aspect term extraction from Turkish e-commerce product reviews.

## Model Description

- **Base Model**: dbmdz/bert-base-turkish-cased
- **Task**: Token Classification (Aspect Term Extraction)
- **Language**: Turkish
- **Domain**: E-commerce product reviews

## Model Performance

- **F1 Score**: 83% on the test set
- **Test Set Size**: 2,000 samples
- **Training Set Size**: ~16,000 samples
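
The card does not state whether the F1 score is computed per token or per aspect span. As a point of reference only, here is a minimal sketch of exact-match span-level F1, a common convention for sequence labeling. The `span_f1` helper and the toy `gold`/`pred` sets are hypothetical and are not taken from the model's evaluation code or data.

```python
from typing import Set, Tuple

# (sentence index, start token, end token); end is exclusive
Span = Tuple[int, int, int]

def span_f1(gold: Set[Span], pred: Set[Span]) -> float:
    """Exact-match F1: a predicted span counts only if it matches a gold span exactly."""
    true_positives = len(gold & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)

# Made-up spans for illustration: two of three predictions are correct
gold = {(0, 2, 3), (0, 6, 7), (1, 1, 2)}
pred = {(0, 2, 3), (1, 1, 2), (1, 4, 5)}
print(round(span_f1(gold, pred), 3))  # 0.667
```

A token-level F1 would typically be higher than a span-level one on the same predictions, so the two should not be compared directly.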

## Training Details

### Training Data
- **Dataset Size**: 16,000 reviews
- **Data Source**: Private e-commerce product review dataset
- **Domain**: E-commerce product reviews in Turkish
- **Coverage**: Over 500 product categories

### Training Configuration
- **Epochs**: 5
- **Task Type**: Token Classification
- **Label Scheme**: BIO tagging
  - `B-ASPECT`: Beginning of an aspect term
  - `I-ASPECT`: Inside/continuation of an aspect term
  - `O`: Outside (not an aspect term)
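
To illustrate how BIO tags map back to aspect terms, the following sketch groups a label sequence into spans. It assumes word-level labels written by hand for the example; `bio_to_spans` is a hypothetical helper, not part of this model or of `transformers`.

```python
from typing import List, Tuple

def bio_to_spans(labels: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, one per aspect term."""
    spans = []
    start = None
    for i, label in enumerate(labels):
        if label == "B-ASPECT":
            # A new term begins; close any term that is still open
            if start is not None:
                spans.append((start, i))
            start = i
        elif label == "I-ASPECT":
            # Assumption: a stray I-ASPECT without a preceding B-ASPECT opens a new term
            if start is None:
                start = i
        else:  # "O"
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(labels)))
    return spans

# Hand-labeled example sentence (word level)
words = ["Teslimat", "hızı", "mükemmel", ",", "ambalaj", "da", "gayet", "sağlam", "."]
labels = ["B-ASPECT", "I-ASPECT", "O", "O", "B-ASPECT", "O", "O", "O", "O"]

spans = bio_to_spans(labels)
aspects = [" ".join(words[s:e]) for s, e in spans]
print(aspects)  # ['Teslimat hızı', 'ambalaj']
```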

### Training Loss
Training loss decreased at every epoch, although the drop between epochs 1 and 2 was marginal:

| Epoch | Loss   |
|-------|--------|
| 1     | 0.1758 |
| 2     | 0.1749 |
| 3     | 0.1217 |
| 4     | 0.1079 |
| 5     | 0.0699 |
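
The epoch-to-epoch change is easier to compare as a relative difference. This short sketch uses only the values from the table above:

```python
# Loss per epoch, copied from the table
losses = [0.1758, 0.1749, 0.1217, 0.1079, 0.0699]

# Relative change (in percent) from each epoch to the next
changes = [
    (curr - prev) / prev * 100
    for prev, curr in zip(losses, losses[1:])
]

for epoch, change in enumerate(changes, start=1):
    print(f"Epoch {epoch} -> {epoch + 1}: {change:+.1f}%")
# Epoch 1 -> 2: -0.5%
# Epoch 2 -> 3: -30.4%
# Epoch 3 -> 4: -11.3%
# Epoch 4 -> 5: -35.2%
```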

## Usage

### Option 1: Using Pipeline

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")
model = AutoModelForTokenClassification.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")

# Create pipeline
aspect_extractor = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",  # merge word pieces into whole-entity groups
)

# Example usage
text = "Bu telefonun kamerası çok iyi ama bataryası yetersiz."
results = aspect_extractor(text)
print(results)
```

**Expected Output:**
```python
[{'entity_group': 'ASPECT', 'score': 0.99498886, 'word': 'kamerası', 'start': 13, 'end': 21}, 
 {'entity_group': 'ASPECT', 'score': 0.9970175, 'word': 'bataryası', 'start': 34, 'end': 43}]
```
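
The `start` and `end` fields are character offsets into the input string, so the aspect terms can be recovered by slicing the original text. The sketch below works on the output shown above without loading the model; `MIN_SCORE` is a hypothetical confidence threshold chosen for illustration, not a recommended value.

```python
text = "Bu telefonun kamerası çok iyi ama bataryası yetersiz."

# Pipeline output copied from the example above
results = [
    {"entity_group": "ASPECT", "score": 0.99498886, "word": "kamerası", "start": 13, "end": 21},
    {"entity_group": "ASPECT", "score": 0.9970175, "word": "bataryası", "start": 34, "end": 43},
]

MIN_SCORE = 0.5  # hypothetical threshold; tune on your own data

# Slice the original text so the surface form is preserved exactly
aspects = [
    text[r["start"]:r["end"]]
    for r in results
    if r["entity_group"] == "ASPECT" and r["score"] >= MIN_SCORE
]
print(aspects)  # ['kamerası', 'bataryası']
```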

### Option 2: Manual Inference

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")
model = AutoModelForTokenClassification.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")

# Example text
text = "Bu telefonun kamerası çok iyi ama bataryası yetersiz."

# Tokenize input
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class_ids = predictions.argmax(dim=-1)

# Convert predictions to labels
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
predicted_labels = [model.config.id2label[class_id.item()] for class_id in predicted_class_ids[0]]

# Display results
for token, label in zip(tokens, predicted_labels):
    if token not in ['[CLS]', '[SEP]', '[PAD]']:
        print(f"{token}: {label}")
```

**Expected Output:**
```
Bu: O
telefonun: O
kamerası: B-ASPECT
çok: O
iyi: O
ama: O
batarya: B-ASPECT
##sı: I-ASPECT
yetersiz: O
.: O
```
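
The output above is at the word-piece level, so `bataryası` appears as `batarya` plus `##sı`. A minimal sketch for merging word pieces back into words follows. It assumes that a word takes the label of its first piece, which is a common convention but not something this card specifies; `merge_wordpieces` is a hypothetical helper.

```python
from typing import List, Tuple

def merge_wordpieces(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Join '##' continuation pieces onto the previous token, keeping the first piece's label."""
    words: List[Tuple[str, str]] = []
    for token, label in zip(tokens, labels):
        if token.startswith("##") and words:
            prev_word, prev_label = words[-1]
            words[-1] = (prev_word + token[2:], prev_label)
        else:
            words.append((token, label))
    return words

# Tokens and labels copied from the expected output above
tokens = ["Bu", "telefonun", "kamerası", "çok", "iyi", "ama", "batarya", "##sı", "yetersiz", "."]
labels = ["O", "O", "B-ASPECT", "O", "O", "O", "B-ASPECT", "I-ASPECT", "O", "O"]

merged = merge_wordpieces(tokens, labels)
for word, label in merged:
    print(f"{word}: {label}")
```

With this input, the two pieces collapse into a single `bataryası: B-ASPECT` entry and the other tokens are unchanged.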

### Option 3: Batch Inference

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")
model = AutoModelForTokenClassification.from_pretrained("opdullah/bert-turkish-ecomm-aspect-extraction")

# Example texts for batch processing
texts = [
    "Bu telefonun kamerası çok iyi ama bataryası yetersiz.",
    "Ürünün fiyatı uygun ancak kalitesi düşük.",
    "Teslimat hızı mükemmel, ambalaj da gayet sağlam."
]

# Tokenize all texts
inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True)

# Get predictions for all texts
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class_ids = predictions.argmax(dim=-1)

# Process results for each text
for i, text in enumerate(texts):
    print(f"\nText {i+1}: {text}")
    print("-" * 50)
    
    # Get tokens for this specific text
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][i])
    predicted_labels = [model.config.id2label[class_id.item()] for class_id in predicted_class_ids[i]]
    
    # Display results
    for token, label in zip(tokens, predicted_labels):
        if token not in ['[CLS]', '[SEP]', '[PAD]']:
            print(f"{token}: {label}")
```

**Expected Output** (the printed `Text N:` headers and separator lines are shown as headings below):

**Text 1:** Bu telefonun kamerası çok iyi ama bataryası yetersiz.
```
Bu: O
telefonun: O
kamerası: B-ASPECT
çok: O
iyi: O
ama: O
batarya: B-ASPECT
##sı: I-ASPECT
yetersiz: O
.: O
```

**Text 2:** Ürünün fiyatı uygun ancak kalitesi düşük.
```
Ürünün: O
fiyatı: B-ASPECT
uygun: O
ancak: O
kalitesi: B-ASPECT
düşük: O
.: O
```

**Text 3:** Teslimat hızı mükemmel, ambalaj da gayet sağlam.
```
Teslim: B-ASPECT
##at: I-ASPECT
hızı: I-ASPECT
mükemmel: O
,: O
ambalaj: B-ASPECT
da: O
gayet: O
sağlam: O
.: O
```

## Label Mapping

```python
id2label = {
    0: "O",
    1: "B-ASPECT",
    2: "I-ASPECT"
}

label2id = {
    "O": 0,
    "B-ASPECT": 1,
    "I-ASPECT": 2
}
```

## Intended Use

This model is designed for:
- Extracting aspect terms from Turkish e-commerce product reviews
- Identifying product features and attributes mentioned in reviews
- Supporting aspect-based sentiment analysis pipelines

## Limitations

- Trained specifically on e-commerce domain data
- Performance may vary on other domains or text types
- Supports Turkish only
- Trained on a private dataset, which limits reproducibility

## Citation

If you use this model, please cite:

```bibtex
@misc{turkish-bert-aspect-extraction,
  title={Turkish BERT for Aspect Term Extraction},
  author={Abdullah Koçak},
  year={2025},
  url={https://huggingface.co/opdullah/bert-turkish-ecomm-aspect-extraction}
}
```

## Base Model Citation

```bibtex
@misc{schweter2020bertbase,
  title={BERTurk - BERT models for Turkish},
  author={Stefan Schweter},
  year={2020},
  publisher={Hugging Face},
  url={https://huggingface.co/dbmdz/bert-base-turkish-cased}
}
```