Update README.md
README.md
@@ -58,7 +58,7 @@ This model was trained on the following datasets:
 
 This model was trained in two main phases:
 - Knesset based pre-training - over all ~4700h of data - 3 epochs, ~54h run
-- Mixed post-training over all crowd-transcribe-v5 (
+- Mixed post-training over all crowd-transcribe-v5 (300h), crowd-recital-whisper-training (50h) and highest-quality filtered knessets data (150h) - 1 epoch
 - Interleaving of datasets with sampling probs: (0.9, 0.025, 0.075) respectively
 - Note that crowd-transcribe-v5 has about 5x shorter samples on average thus the over-sampling.
 
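The probability-weighted interleaving described in this hunk can be sketched as follows. This is a minimal, hypothetical pure-Python illustration, not the project's training code. The README does not say which tooling did the interleaving or what happens when one dataset runs out, so the function name `interleave`, the stop-at-first-exhausted behavior, and the string stand-ins for the three datasets are all assumptions. (Hugging Face `datasets.interleave_datasets` exposes a comparable `probabilities` argument.)

```python
import random
from typing import Iterable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def interleave(sources: Sequence[Iterable[T]], probs: Sequence[float],
               seed: int = 0) -> Iterator[T]:
    """Yield items by repeatedly drawing a source index with the given probabilities.

    Assumption: iteration stops as soon as the drawn source is exhausted
    (a "first exhausted" policy); the README does not specify this.
    """
    if len(sources) != len(probs):
        raise ValueError("need exactly one probability per source")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    rng = random.Random(seed)  # seeded so the mixing order is reproducible
    iterators = [iter(s) for s in sources]
    indices = range(len(iterators))
    while True:
        # The weights control how many *samples* each source contributes,
        # not how many hours of audio.
        i = rng.choices(indices, weights=probs, k=1)[0]
        try:
            item = next(iterators[i])
        except StopIteration:
            return
        yield item


# Hypothetical stand-ins for the three post-training sets, in the README's order:
# crowd-transcribe-v5, crowd-recital-whisper-training, filtered Knesset data.
crowd_transcribe = (f"ct-{i}" for i in range(100_000))
crowd_recital = (f"cr-{i}" for i in range(100_000))
knesset_hq = (f"kn-{i}" for i in range(100_000))

mixed = interleave([crowd_transcribe, crowd_recital, knesset_hq],
                   probs=[0.9, 0.025, 0.075], seed=42)
first = [next(mixed) for _ in range(10_000)]
share_ct = sum(s.startswith("ct-") for s in first) / len(first)
print(f"crowd-transcribe-v5 share of first 10k samples: {share_ct:.3f}")
```

Because the draws are counted in samples rather than audio hours, a 0.9 sample share for crowd-transcribe-v5 corresponds to a much smaller share of audio time when its samples are about 5x shorter. That is the imbalance the README's note says the over-sampling compensates for.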