Model Card for Sinanmz/Movie_Genre_Classifier
This fine-tuned BERT model is a multilabel classifier that predicts the genre(s) of a movie from its summary. It has been trained to assign each movie one or more of five genres: Drama, Action, Comedy, Animation, and Crime. The model leverages the BERT architecture to capture the nuances of movie summaries and can return several genre predictions for a single movie.
Model Details
Model Description
- Developed by: Sinanmz
- Model type: Multilabel text classifier
- Language(s) (NLP): English
- License: MIT
- Finetuned from model: google-bert/bert-base-uncased
Model Sources
- Repository: https://github.com/Sinanmz/MIR
Uses
This BERT-based multilabel classifier predicts the genre(s) of a movie from its summary. It can be used in a variety of applications, including but not limited to:
- Content Recommendation Systems: Enhancing the accuracy of movie recommendation engines by predicting genres from summaries, allowing for better personalization.
- Movie Cataloging: Assisting in the organization and tagging of movies in large databases or streaming platforms.
- Search Optimization: Improving search results by classifying movies into multiple genres, thereby providing more relevant hits for user queries.
- Content Filtering: Helping users find movies that match their preferences by identifying and categorizing movies into multiple genres.
Foreseeable Users:
- Streaming Services: To enhance content recommendation algorithms and search functionalities.
- Movie Database Administrators: To automate the process of tagging and organizing movies.
- Developers: Building applications that require genre classification from textual summaries.
Affected Parties:
- Viewers/Consumers: Benefiting from improved content recommendations and search results.
- Content Creators: Gaining better visibility through accurate classification and tagging of their work.
- Platform Operators: Improving user engagement and satisfaction with more personalized and accurate content delivery.
How to Get Started with the Model
Use the code below to get started with the model.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained('Sinanmz/Movie_Genre_Classifier')
model = AutoModelForSequenceClassification.from_pretrained('Sinanmz/Movie_Genre_Classifier')
model.eval()  # make sure the model is in inference mode

# Example movie summary (summary of Dune: Part Two)
movie_summary = """Paul Atreides unites with Chani and the Fremen while on a warpath of
revenge against the conspirators who destroyed his family. Facing a choice between the
love of his life and the fate of the known universe, he endeavors to prevent a terrible
future only he can foresee."""

# Tokenize the input
inputs = tokenizer(movie_summary, return_tensors="pt", truncation=True, padding=True)

# Get model predictions without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits

# Convert logits to independent per-genre probabilities with a sigmoid
# (multilabel setting: each genre is scored on its own, unlike softmax)
probs = torch.sigmoid(logits)

# Keep every genre whose probability clears the 0.5 threshold
genre_labels = ["Action", "Drama", "Comedy", "Animation", "Crime"]
predicted_genres = [genre_labels[i] for i in range(len(genre_labels)) if probs[0][i] >= 0.5]
print(f"Predicted genres: {predicted_genres}")
# Output:
# Predicted genres: ['Action', 'Drama']
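To score many summaries at once, the same pieces can be batched. The helper below is a minimal sketch that reuses the tokenizer, model, and genre_labels defined above; the predict_genres name and the fixed 0.5 threshold are illustrative choices, not part of the released model.

from typing import List

def predict_genres(summaries: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Return the predicted genres for each summary in a batch (illustrative helper)."""
    # Tokenize the whole batch at once; padding makes the tensors rectangular
    inputs = tokenizer(summaries, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.sigmoid(logits)  # one independent probability per genre
    return [
        [genre_labels[i] for i in range(len(genre_labels)) if row[i] >= threshold]
        for row in probs
    ]

# Example usage:
# predict_genres(["A detective hunts a serial killer in 1990s Los Angeles.",
#                 "Two best friends road-trip across Europe after graduation."])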
Evaluation
Metrics
The evaluation metrics used for this model include precision, recall, and F1-score. These metrics were chosen because they provide a comprehensive view of the model's performance, particularly in a multilabel classification setting where it is important to understand not only how many correct predictions were made but also the balance between precision (accuracy of the positive predictions) and recall (the ability to find all positive instances).
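For reference, per-genre reports like the ones below can be produced with scikit-learn's classification_report, which accepts multilabel indicator arrays. This is only a sketch with toy data, not the actual evaluation script for this model:

import numpy as np
from sklearn.metrics import classification_report

genre_labels = ["Action", "Drama", "Comedy", "Animation", "Crime"]

# Toy multilabel indicator arrays: one row per movie, one column per genre,
# with 1 marking that the genre applies. Values are illustrative only.
y_true = np.array([[1, 1, 0, 0, 0],
                   [0, 1, 0, 0, 1],
                   [0, 0, 1, 1, 0]])
y_pred = np.array([[1, 1, 0, 0, 0],
                   [0, 1, 0, 0, 0],
                   [0, 0, 1, 0, 0]])

print(classification_report(y_true, y_pred, target_names=genre_labels, zero_division=0))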
Results
Below are the classification reports for the train, validation, and test splits of the dataset.
Classification Report for Train Split:
precision recall f1-score support
Action 1.00 1.00 1.00 1655
Drama 1.00 1.00 1.00 4109
Comedy 1.00 1.00 1.00 2094
Animation 1.00 1.00 1.00 669
Crime 1.00 1.00 1.00 1284
micro avg 1.00 1.00 1.00 9811
macro avg 1.00 1.00 1.00 9811
weighted avg 1.00 1.00 1.00 9811
samples avg 1.00 1.00 1.00 9811
Classification Report for Val Split:
precision recall f1-score support
Action 0.70 0.73 0.71 220
Drama 0.77 0.84 0.80 507
Comedy 0.69 0.54 0.61 260
Animation 0.59 0.44 0.50 80
Crime 0.72 0.66 0.69 165
micro avg 0.73 0.71 0.72 1232
macro avg 0.70 0.64 0.66 1232
weighted avg 0.72 0.71 0.71 1232
samples avg 0.75 0.74 0.71 1232
Classification Report for Test Split:
precision recall f1-score support
Action 0.62 0.66 0.64 191
Drama 0.80 0.85 0.82 520
Comedy 0.69 0.58 0.63 260
Animation 0.60 0.49 0.54 78
Crime 0.65 0.67 0.66 154
micro avg 0.72 0.71 0.72 1203
macro avg 0.67 0.65 0.66 1203
weighted avg 0.72 0.71 0.71 1203
samples avg 0.75 0.75 0.72 1203
Summary
The model achieves perfect precision, recall, and F1-scores on the training split, which indicates it has essentially memorized the training data. Performance drops noticeably on the validation and test splits (a micro-averaged F1 of roughly 0.72 on both), so the model overfits and there is room to improve its generalization to unseen data.
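One inexpensive way to narrow this gap without retraining is to tune the decision threshold per genre on the validation split instead of using a fixed 0.5. The sketch below is a hypothetical illustration and assumes validation probabilities and ground-truth labels are already available as NumPy arrays; the tune_thresholds name and the candidate grid are assumptions, not part of this model's training code.

import numpy as np
from sklearn.metrics import f1_score

def tune_thresholds(val_probs: np.ndarray, val_labels: np.ndarray) -> np.ndarray:
    """Pick, for each genre, the threshold that maximizes F1 on validation data.

    val_probs:  (n_samples, n_genres) sigmoid probabilities from the model
    val_labels: (n_samples, n_genres) 0/1 ground-truth indicator matrix
    """
    candidates = np.linspace(0.1, 0.9, 17)
    best = np.full(val_probs.shape[1], 0.5)
    for g in range(val_probs.shape[1]):
        scores = [f1_score(val_labels[:, g], val_probs[:, g] >= t, zero_division=0)
                  for t in candidates]
        best[g] = candidates[int(np.argmax(scores))]
    return best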
Model Card Authors
Sina Namazi
Model Card Contact
- GitHub: github.com/Sinanmz
- Hugging Face: huggingface.co/Sinanmz