Update README.md
README.md CHANGED
@@ -86,6 +86,10 @@ Compared to the 4.76M model, this larger variant shows significant improvements:
 - **MIDI-Only**: Limited to MIDI format; cannot process audio recordings or sheet music
 - **Cultural Bias**: Training data may reflect Western classical music traditions
 
+Below is the confusion matrix for the best-performing checkpoint, visually highlighting these misclassifications (click to enlarge):
+
+[<img src="confusion_matrix_best.png" alt="Confusion Matrix" width="500"/>](confusion_matrix_best.png)
+
 ### Recommendations for Use
 - Validate results with musicological expertise, especially for Classical period identification
 - Use confidence thresholds to filter low-confidence predictions
@@ -172,7 +176,11 @@ The following hyperparameters were used during training:
 | 0.5582 | 2.9911 | 58000 | 1.0269 | 0.6593 | 0.5103 |
 
 ### Training Analysis
-
+Below is the full training metrics plot, showing loss, accuracy, and F1-score trends over the entire training process (click to enlarge):
+
+[<img src="training_metrics.png" alt="Training Metrics" width="500"/>](training_metrics.png)
+
+The training shows decent convergence, with the model reaching its best performance around step 40,000 (epoch 2.06). The larger model capacity allows for faster learning and better final performance compared to the 4.76M variant. The training loss decreases more rapidly while validation metrics show stable improvement, indicating effective use of the increased model capacity without overfitting. The model achieves its peak F1 score of 0.5121 at step 40,000, which was selected as the best checkpoint.
 
 ### Framework versions
 
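The "confidence thresholds" recommendation above could be implemented along these lines. This is a minimal sketch, not the model card's own code: it assumes the classifier exposes per-class logits, and the `filter_by_confidence` helper, its 0.7 threshold, and the example logits are all illustrative.

```python
import numpy as np

def filter_by_confidence(logits, threshold=0.7):
    """Keep only predictions whose top softmax probability meets the threshold.

    Returns (indices, labels, confidences) for the retained examples.
    The threshold value is illustrative, not taken from the model card.
    """
    logits = np.asarray(logits, dtype=float)
    # Numerically stable softmax over the class axis
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
    confidences = probs.max(axis=-1)
    labels = probs.argmax(axis=-1)
    keep = confidences >= threshold
    return np.flatnonzero(keep), labels[keep], confidences[keep]

# Example: three MIDI excerpts, four period classes (logits are made up)
logits = [[4.0, 0.5, 0.2, 0.1],   # confident
          [1.0, 0.9, 0.8, 0.7],   # ambiguous -> filtered out
          [0.1, 0.2, 3.5, 0.3]]   # confident
idx, labels, conf = filter_by_confidence(logits, threshold=0.7)
print(idx, labels)  # the ambiguous excerpt is dropped
```

Low-confidence examples filtered this way can then be routed to manual review by a musicologist, matching the first recommendation.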