---
language:
- en
metrics:
- accuracy
tags:
- sklearn
- machine learning
- movie-genre-prediction
- multi-class classification
---

## Model Details

### Model Description

The goal of the competition is to design a predictive model that accurately classifies movies into their respective genres based on their titles and synopses.

The model takes the movie_name and synopsis fields, concatenated into a single string, as input and outputs the predicted genre of the movie.
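
The card does not say how the two fields are joined. A minimal sketch, assuming a single space as the separator and a hypothetical helper named `combine_text`:

```python
def combine_text(movie_name: str, synopsis: str) -> str:
    """Hypothetical helper: join a title and its synopsis into one input string.

    The single-space separator is an assumption; the card only states that
    movie_name and synopsis are used together as one string.
    """
    # strip() avoids a stray leading/trailing space when one field is empty
    return f"{movie_name} {synopsis}".strip()


print(combine_text("Alien", "A crew encounters a deadly creature in space."))
# → Alien A crew encounters a deadly creature in space.
```

With a pandas DataFrame (here a hypothetical `df`), the equivalent column would be `df["movie_name"] + " " + df["synopsis"]`.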

- **Developed by:** Shalaka Thorat
- **Shared by:** Data Driven Science Movie Genre Prediction Contest: competitions/movie-genre-prediction
- **Language:** Python
- **Tags:** Python, NLP, Sklearn, NLTK, Machine Learning, Multi-class Classification, Supervised Learning

### Model Sources

- **Repository:** competitions/movie-genre-prediction

## Training Details

We used the Multinomial Naive Bayes algorithm because it works well with sparse vectorized text data; here the sparse vectors are built from the movie_name and synopsis text.

The model outputs one of 10 genre classes.
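
A minimal sketch of this setup, assuming scikit-learn's `TfidfVectorizer` feeding `MultinomialNB` through `make_pipeline`. The toy texts and the three genre labels are invented for illustration (the real task has 10 classes), and the hyperparameters are library defaults, not the values used for the submitted model:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for combined "movie_name synopsis" strings.
texts = [
    "haunted house ghost scream night",
    "ghost terror scream dark cellar",
    "love wedding romance kiss",
    "romance love letter heart",
    "car chase explosion gun fight",
    "explosion fight gun hero chase",
]
labels = ["horror", "horror", "romance", "romance", "action", "action"]

model = make_pipeline(
    TfidfVectorizer(stop_words="english"),  # sparse, non-negative TF-IDF features
    MultinomialNB(alpha=1.0),              # default Laplace smoothing (assumed, not tuned)
)
model.fit(texts, labels)

print(model.predict(["a ghost in the dark house"]))  # → ['horror']
```

Because `TfidfVectorizer` returns a sparse matrix and `MultinomialNB` accepts sparse, non-negative input directly, no dense conversion is needed.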

### Training Data

All training and test data can be found here: competitions/movie-genre-prediction

#### Preprocessing

1) Label Encoding
2) Tokenization
3) TF-IDF Vectorization
4) Removal of digits, special characters, symbols, extra spaces and stop words from the textual data
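
The preprocessing steps above can be sketched as follows. This is an illustrative sketch under assumptions: the regular expressions are examples, tokenization is plain whitespace splitting, and scikit-learn's built-in `ENGLISH_STOP_WORDS` stands in for the NLTK stop-word list the tags suggest (NLTK's list needs a separate download). The helper `clean_text`, the sample texts and the genre names are hypothetical:

```python
import re

from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.preprocessing import LabelEncoder


def clean_text(text: str) -> str:
    """Hypothetical cleaner: lowercase, drop digits/symbols/stop words, collapse spaces."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)       # remove digits
    text = re.sub(r"[^a-z\s]", " ", text)  # remove special characters and symbols
    tokens = text.split()                  # whitespace tokenization; also drops extra spaces
    # sklearn's stop-word list is a stand-in for NLTK's (assumption)
    return " ".join(t for t in tokens if t not in ENGLISH_STOP_WORDS)


# Hypothetical inputs: combined "movie_name synopsis" strings and their genres.
texts = ["The 2 Heists: a THRILLER!!   in Paris", "Love & Letters: a romance in 1999"]
genres = ["action", "romance"]

# Step 1: label encoding maps genre strings to integer ids (classes sorted alphabetically).
encoder = LabelEncoder()
y = encoder.fit_transform(genres)

# Steps 2 and 4: tokenize and clean; step 3: sparse TF-IDF features.
cleaned = [clean_text(t) for t in texts]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(cleaned)

print(cleaned)     # ['heists thriller paris', 'love letters romance']
print(y.tolist())  # [0, 1]
print(X.shape)     # (2, 6)
```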

## Evaluation

The evaluation metric is accuracy, as specified by the competition.
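
Accuracy is the number of correct predictions divided by the total number of predictions. A minimal sketch with scikit-learn's `accuracy_score`, using made-up labels rather than competition results:

```python
from sklearn.metrics import accuracy_score

# Hypothetical true and predicted genres for four movies.
y_true = ["action", "horror", "romance", "horror"]
y_pred = ["action", "horror", "action", "horror"]

score = accuracy_score(y_true, y_pred)  # 3 of 4 predictions match
print(score)  # 0.75
```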