Model Card

This model is a domain-adapted version of dbmdz/bert-base-turkish-cased. We continued masked language modeling (MLM) pre-training on the open-source yeniguno/turkish_agriculture_corpus dataset to bias the model toward Turkish agricultural vocabulary and discourse while retaining its general-language abilities.
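To make the masked language modeling objective concrete, here is a minimal, dependency-free sketch of how training inputs are typically corrupted. It assumes the standard BERT recipe (select about 15% of tokens; replace 80% of those with [MASK], 10% with a random token, and leave 10% unchanged). This card does not state which masking settings were actually used, and the helper name `mask_tokens`, the toy vocabulary, and the whitespace tokenization are illustrative assumptions rather than part of the training code.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Corrupt a token list for MLM training (assumed standard BERT recipe).

    Returns (corrupted, labels). labels[i] is the original token at positions
    the model must predict, and None at positions excluded from the loss.
    """
    if rng is None:
        rng = random.Random()
    corrupted, labels = [], []
    for token in tokens:
        # Most tokens are left untouched and do not contribute to the loss.
        if rng.random() >= mask_prob:
            corrupted.append(token)
            labels.append(None)
            continue
        # This position was selected: the model has to recover `token` here.
        labels.append(token)
        roll = rng.random()
        if roll < 0.8:
            corrupted.append(MASK_TOKEN)         # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted.append(rng.choice(vocab))  # 10%: replace with a random token
        else:
            corrupted.append(token)              # 10%: keep the original token
    return corrupted, labels


# Toy example with whole words; a real tokenizer would produce WordPiece ids.
tokens = "buğday hasadı bu yıl erken başladı".split()
toy_vocab = ["tarla", "sulama", "gübre", "tohum"]  # hypothetical vocabulary
corrupted, labels = mask_tokens(tokens, toy_vocab, rng=random.Random(0))
print(corrupted)
print(labels)
```

In practice the transformers library handles this step on token ids through its language-modeling data collator, so a helper like this is only useful for building intuition.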

How to Get Started with the Model

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_checkpoint = "yeniguno/bert-turkish-agriculture-mlm"

model = AutoModelForMaskedLM.from_pretrained(model_checkpoint)
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

text = "Sabah kahvaltıda babam, köyde bu hafta [MASK] hazırlığının başlayacağını söyledi."

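# Tokenize the sentence and run a forward pass; gradients are not needed for inference
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    token_logits = model(**inputs).logits

# Locate the [MASK] position and take its logits over the whole vocabulary
mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
mask_token_logits = token_logits[0, mask_token_index, :]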

# Pick the [MASK] candidates with the highest logits
top_5_tokens = torch.topk(mask_token_logits, 5, dim=1).indices[0].tolist()

for token in top_5_tokens:
    print(f"'>>> {text.replace(tokenizer.mask_token, tokenizer.decode([token]))}'")
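As a shorter alternative to the manual logits handling above, the same prediction can probably be obtained with the transformers fill-mask pipeline, which also reports a probability for each candidate. The sketch below is an illustration under assumptions, not the author's method: the helper names `load_fill_mask` and `format_predictions` are hypothetical, and the usage lines are left as comments because they download the model weights.

```python
def load_fill_mask(model_checkpoint="yeniguno/bert-turkish-agriculture-mlm"):
    """Build a fill-mask pipeline for the checkpoint (downloads weights on first use)."""
    # Imported lazily so the formatting helper below works without transformers installed.
    from transformers import pipeline

    return pipeline("fill-mask", model=model_checkpoint)


def format_predictions(predictions):
    """Turn fill-mask results into 'token (probability)' strings, best first.

    Each prediction is expected to be a dict with at least the keys
    'token_str' and 'score', as returned by the pipeline for a single [MASK].
    """
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [f"{p['token_str']} ({p['score']:.3f})" for p in ranked]


# Usage (requires network access to fetch the model):
# fill_mask = load_fill_mask()
# text = "Sabah kahvaltıda babam, köyde bu hafta [MASK] hazırlığının başlayacağını söyledi."
# for line in format_predictions(fill_mask(text, top_k=5)):
#     print(line)
```

The pipeline applies a softmax internally, so the scores are probabilities rather than raw logits, which makes candidates easier to compare across sentences.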

Model size: 111M params (Safetensors, tensor type F32)