# PhoBERT Model for Vietnamese Poem Analysis
This model was fine-tuned from vinai/phobert-base on kienhoang123/Vietnamese_Poem_Analysis_VN to analyze Vietnamese poetry across multiple dimensions.
## Model Details
- Base Model: vinai/phobert-base
- Training Data: Vietnamese poem analysis dataset
- Architecture: Custom PhoBERT with multiple classification heads
- Tasks: Multi-label classification for:
  - Emotion detection
  - Metaphor identification
  - Setting analysis
  - Motion detection
  - Prompt presence
## Model Architecture
The model extends PhoBERT with five binary classification heads, one per poetic element, each predicting the presence or absence of that element in the input.
## Usage
⚠️ **Important**: This model uses a custom architecture, so you need to define the model class before loading the weights:
```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel


class PhoBERTForPoetryAnalysis(nn.Module):
    def __init__(self, bert_model_name):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(bert_model_name)
        hidden_size = self.encoder.config.hidden_size

        # One binary classification head per poetic element
        self.emotion_classifier = nn.Linear(hidden_size, 1)
        self.metaphor_classifier = nn.Linear(hidden_size, 1)
        self.setting_classifier = nn.Linear(hidden_size, 1)
        self.motion_classifier = nn.Linear(hidden_size, 1)
        self.prompt_classifier = nn.Linear(hidden_size, 1)
        self.dropout = nn.Dropout(0.1)

    def forward(self, input_ids=None, attention_mask=None, labels=None, return_dict=None):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask, return_dict=True)
        # Use the first ([CLS]) token's hidden state as the pooled representation
        pooled_output = outputs.last_hidden_state[:, 0]
        pooled_output = self.dropout(pooled_output)

        emotion_logits = self.emotion_classifier(pooled_output)
        metaphor_logits = self.metaphor_classifier(pooled_output)
        setting_logits = self.setting_classifier(pooled_output)
        motion_logits = self.motion_classifier(pooled_output)
        prompt_logits = self.prompt_classifier(pooled_output)

        all_logits = torch.cat([
            emotion_logits, metaphor_logits, setting_logits,
            motion_logits, prompt_logits
        ], dim=1)
        return {"logits": all_logits}


# Load the tokenizer and rebuild the architecture
tokenizer = AutoTokenizer.from_pretrained("kienhoang123/PhoBERT_Poem_Analysis_Instruct")
model = PhoBERTForPoetryAnalysis("vinai/phobert-base")

# Load the fine-tuned weights (download pytorch_model.bin from this repo first)
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()

# Example usage
poem = "Your Vietnamese poem here"
# Instruction (Vietnamese): "Task: Generate emotion, metaphor, setting,
# motion and prompt for the following content. Content: ..."
instruction = "Nhiệm vụ: Tạo cảm xúc, ẩn dụ, bối cảnh, chuyển động và gợi ý cho nội dung sau.\nNội dung: " + poem
inputs = tokenizer(instruction, return_tensors="pt", padding=True, truncation=True, max_length=128)

with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs["logits"]
    predictions = torch.sigmoid(logits) > 0.5

# Interpret results: one boolean per poetic element
fields = ["emotion", "metaphor", "setting", "motion", "prompt"]
results = {field: predictions[0][i].item() for i, field in enumerate(fields)}
print(results)
```
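The decision rule in the snippet above is simply: a head's element is "present" when its sigmoid probability exceeds 0.5, which is equivalent to the raw logit being positive. A minimal pure-Python sketch of that step, using made-up logit values for illustration:

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function mapping a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical logits for the 5 heads (illustrative values, not model output)
fields = ["emotion", "metaphor", "setting", "motion", "prompt"]
logits = [2.3, -0.7, 1.1, -1.9, 0.2]

# Probability > 0.5 means "present"; note this is the same as logit > 0
results = {f: sigmoid(z) > 0.5 for f, z in zip(fields, logits)}
print(results)
# → {'emotion': True, 'metaphor': False, 'setting': True, 'motion': False, 'prompt': True}
```

If you need a confidence score rather than a hard yes/no, report `sigmoid(z)` itself instead of thresholding it.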
## Training Details
- Base Model: vinai/phobert-base
- Fine-tuning approach: Multi-task learning with binary classification heads
- Input format: Instruction + poem content
- Output: Binary predictions for 5 poetic elements
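The training code is not included in this card, but the multi-task setup described above is typically optimized with binary cross-entropy on each head's logit and the per-head losses averaged. A hedged, pure-Python sketch of that objective for one example (the logits and labels are invented for illustration):

```python
import math

def bce_with_logits(logit: float, target: float) -> float:
    """Numerically stable binary cross-entropy on a raw logit.
    Equivalent to -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))]."""
    return max(logit, 0.0) - logit * target + math.log(1.0 + math.exp(-abs(logit)))

# One training example: hypothetical logits from the 5 heads and gold labels
logits = [1.2, -0.4, 0.8, -2.0, 0.1]
labels = [1.0, 0.0, 1.0, 0.0, 1.0]  # emotion, metaphor, setting, motion, prompt

# Multi-task objective: average the per-head BCE losses
loss = sum(bce_with_logits(z, y) for z, y in zip(logits, labels)) / len(labels)
print(round(loss, 4))
```

In PyTorch this corresponds to `nn.BCEWithLogitsLoss()` applied to the concatenated 5-dimensional logits and a float label vector.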
## Citation
If you use this model, please cite the original PhoBERT paper:
```bibtex
@inproceedings{phobert,
  title     = {PhoBERT: Pre-trained language models for Vietnamese},
  author    = {Dat Quoc Nguyen and Anh Tuan Nguyen},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2020},
  year      = {2020},
  pages     = {1037--1042}
}
```