# PhoBERT Model for Vietnamese Poem Analysis
This model was fine-tuned from vinai/phobert-base on kienhoang123/Vietnamese_Poem_Analysis_VN to analyze Vietnamese poetry across multiple dimensions.
## Model Details
- Base Model: vinai/phobert-base
- Training Data: Vietnamese poem analysis dataset
- Architecture: Custom PhoBERT with multiple classification heads
- Tasks: Multi-label classification for:
  - Emotion detection
  - Metaphor identification
  - Setting analysis
  - Motion detection
  - Prompt presence
## Model Architecture
The model extends PhoBERT with five binary classification heads, one per poetic element, each predicting the presence or absence of that element in the input.
## Usage
⚠️ **Important**: This model uses a custom architecture, so you need to define the model class before loading the weights:
```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel


class PhoBERTForPoetryAnalysis(nn.Module):
    def __init__(self, bert_model_name):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(bert_model_name)
        hidden_size = self.encoder.config.hidden_size

        # One binary classification head per poetic element
        self.emotion_classifier = nn.Linear(hidden_size, 1)
        self.metaphor_classifier = nn.Linear(hidden_size, 1)
        self.setting_classifier = nn.Linear(hidden_size, 1)
        self.motion_classifier = nn.Linear(hidden_size, 1)
        self.prompt_classifier = nn.Linear(hidden_size, 1)
        self.dropout = nn.Dropout(0.1)

    def forward(self, input_ids=None, attention_mask=None, labels=None, return_dict=None):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask, return_dict=True)
        # Use the first ([CLS]) token's hidden state as the pooled representation
        pooled_output = outputs.last_hidden_state[:, 0]
        pooled_output = self.dropout(pooled_output)

        emotion_logits = self.emotion_classifier(pooled_output)
        metaphor_logits = self.metaphor_classifier(pooled_output)
        setting_logits = self.setting_classifier(pooled_output)
        motion_logits = self.motion_classifier(pooled_output)
        prompt_logits = self.prompt_classifier(pooled_output)

        all_logits = torch.cat([
            emotion_logits, metaphor_logits, setting_logits,
            motion_logits, prompt_logits
        ], dim=1)
        return {"logits": all_logits}


# Load the tokenizer and rebuild the architecture
tokenizer = AutoTokenizer.from_pretrained("kienhoang123/PhoBERT_Poem_Analysis_Instruct")
model = PhoBERTForPoetryAnalysis("vinai/phobert-base")

# Load the fine-tuned weights (download pytorch_model.bin from this repo first)
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()

# Example usage
poem = "Your Vietnamese poem here"
# Instruction (Vietnamese): "Task: Generate emotion, metaphor, setting,
# motion and prompt for the following content. Content: ..."
instruction = "Nhiệm vụ: Tạo cảm xúc, ẩn dụ, bối cảnh, chuyển động và gợi ý cho nội dung sau.\nNội dung: " + poem
inputs = tokenizer(instruction, return_tensors="pt", padding=True, truncation=True, max_length=128)

with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs["logits"]
    predictions = torch.sigmoid(logits) > 0.5

# Interpret results: one boolean per poetic element
fields = ["emotion", "metaphor", "setting", "motion", "prompt"]
results = {field: predictions[0][i].item() for i, field in enumerate(fields)}
print(results)
```
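The decision rule in the snippet above is simply: a head's element is "present" when its sigmoid probability exceeds 0.5, which is equivalent to the raw logit being positive. A minimal pure-Python sketch of that step, using made-up logit values for illustration:

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function mapping a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical logits for the 5 heads (illustrative values, not model output)
fields = ["emotion", "metaphor", "setting", "motion", "prompt"]
logits = [2.3, -0.7, 1.1, -1.9, 0.2]

# Probability > 0.5 means "present"; note this is the same as logit > 0
results = {f: sigmoid(z) > 0.5 for f, z in zip(fields, logits)}
print(results)
# → {'emotion': True, 'metaphor': False, 'setting': True, 'motion': False, 'prompt': True}
```

If you need a confidence score rather than a hard yes/no, report `sigmoid(z)` itself instead of thresholding it.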
## Training Details
- Base Model: vinai/phobert-base
- Fine-tuning approach: Multi-task learning with binary classification heads
- Input format: Instruction + poem content
- Output: Binary predictions for 5 poetic elements
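The training code is not included in this card, but the multi-task setup described above is typically optimized with binary cross-entropy on each head's logit and the per-head losses averaged. A hedged, pure-Python sketch of that objective for one example (the logits and labels are invented for illustration):

```python
import math

def bce_with_logits(logit: float, target: float) -> float:
    """Numerically stable binary cross-entropy on a raw logit.
    Equivalent to -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))]."""
    return max(logit, 0.0) - logit * target + math.log(1.0 + math.exp(-abs(logit)))

# One training example: hypothetical logits from the 5 heads and gold labels
logits = [1.2, -0.4, 0.8, -2.0, 0.1]
labels = [1.0, 0.0, 1.0, 0.0, 1.0]  # emotion, metaphor, setting, motion, prompt

# Multi-task objective: average the per-head BCE losses
loss = sum(bce_with_logits(z, y) for z, y in zip(logits, labels)) / len(labels)
print(round(loss, 4))
```

In PyTorch this corresponds to `nn.BCEWithLogitsLoss()` applied to the concatenated 5-dimensional logits and a float label vector.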
## Citation
If you use this model, please cite the original PhoBERT paper:
```bibtex
@inproceedings{phobert,
  title     = {PhoBERT: Pre-trained language models for Vietnamese},
  author    = {Dat Quoc Nguyen and Anh Tuan Nguyen},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2020},
  year      = {2020},
  pages     = {1037--1042}
}
```