ivrit-ai/whisper-large-v3

Automatic Speech Recognition

yoad committed on May 22

Commit 766847c · verified · 1 Parent(s): 2e368ff

Update README.md

Files changed (1)

README.md +1 -1

@@ -58,7 +58,7 @@ This model was trained on the following datasets:
 This model was trained in two main phases:
 - Knesset based pre-training - over all ~4700h of data - 3 epochs, ~54h run
-- Mixed post-training over all crowd-transcribe-v5 (~300h), crowd-recital-whisper-training (~50h) and highest-quality filtered knessets data (~150h) - 1 epoch
  - Interleaving of datasets with sampling probs: (0.9, 0.025, 0.075) respectively
  - Note that crowd-transcribe-v5 has about 5x shorter samples on average thus the over-sampling.

 This model was trained in two main phases:
 - Knesset based pre-training - over all ~4700h of data - 3 epochs, ~54h run
+- Mixed post-training over all crowd-transcribe-v5 (300h), crowd-recital-whisper-training (50h) and highest-quality filtered knessets data (150h) - 1 epoch
  - Interleaving of datasets with sampling probs: (0.9, 0.025, 0.075) respectively
  - Note that crowd-transcribe-v5 has about 5x shorter samples on average thus the over-sampling.
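
The interleaving described in the README text can be illustrated with a minimal sketch. It is not the authors' training code: the `interleave` and `time_shares` helpers are hypothetical, and the relative sample durations `(1, 5, 5)` are an assumption read off the "about 5x shorter samples" note. Under that assumption, the per-sample probabilities (0.9, 0.025, 0.075) translate into audio-time shares close to the 300h / 50h / 150h split of the data.

```python
import random
from typing import Iterable, Iterator, List, Sequence


def interleave(sources: Sequence[Iterable], probs: Sequence[float], seed: int = 0) -> Iterator:
    """Yield items by picking a source at random (weighted by probs) at each step.

    Stops as soon as the chosen source has no items left.
    """
    rng = random.Random(seed)
    iters = [iter(s) for s in sources]
    while True:
        i = rng.choices(range(len(iters)), weights=probs, k=1)[0]
        try:
            yield next(iters[i])
        except StopIteration:
            return


def time_shares(probs: Sequence[float], rel_durations: Sequence[float]) -> List[float]:
    """Convert per-sample sampling probabilities into expected audio-time shares."""
    weighted = [p * d for p, d in zip(probs, rel_durations)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Sampling probabilities from the README, in the order:
# crowd-transcribe-v5, crowd-recital-whisper-training, filtered knesset data.
PROBS = (0.9, 0.025, 0.075)

# ASSUMPTION: crowd-transcribe-v5 samples are 5x shorter than the other two sets.
REL_DURATIONS = (1.0, 5.0, 5.0)

shares = time_shares(PROBS, REL_DURATIONS)
hour_shares = [h / 500 for h in (300, 50, 150)]  # 0.6, 0.1, 0.3

# Toy stand-ins for the three datasets, tagged by name.
mixed = list(interleave([["v5"] * 1000, ["recital"] * 1000, ["knesset"] * 1000], PROBS))
```

With these assumed durations, `shares` comes out at roughly 0.64, 0.09 and 0.27, against 0.6, 0.1 and 0.3 by hours, which is consistent with the README's remark that the over-sampling compensates for shorter samples. For real datasets, the Hugging Face `datasets` library provides `interleave_datasets` with a `probabilities` argument; whether the authors used it is not stated in this commit.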