There are so many pitfalls that I've been debugging this for two days now; I may find more later.
For the time being, these are the ones I found:
1)
```python
tokenizer = Wav2Vec2CTCTokenizer.from_pretrained("./", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|")
```
is wrong. This raises an exception if you don't already have a tokenizer saved in that directory.
2) You don't pass the vocab_file as input; the tokenizer is not pretrained (at that point you don't have one yet). It should be
```python
# TODO: make sure that the tokenizer is restored if pretrained
tokenizer = transformers.Wav2Vec2CTCTokenizer(
    vocab_file=vocab_path,
    unk_token="[UNK]",
    pad_token="[PAD]",
    word_delimiter_token="|",
)
```
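For completeness, here's a minimal sketch of producing the vocab file that gets passed in as `vocab_file` (the toy transcripts and character set are my own assumptions, not from the original post):

```python
import json
import os
import tempfile

# Toy transcript corpus -- stands in for the real training text.
transcripts = ["a cab", "b a"]

# CTC vocabularies for Wav2Vec2 are usually character-level,
# with the space character represented by the word delimiter "|".
chars = sorted(set("".join(transcripts)) - {" "})
vocab = {c: i for i, c in enumerate(chars)}
vocab["|"] = len(vocab)
vocab["[UNK]"] = len(vocab)
vocab["[PAD]"] = len(vocab)

# Write it out; this is the vocab_path the tokenizer constructor needs.
vocab_path = os.path.join(tempfile.mkdtemp(), "vocab.json")
with open(vocab_path, "w") as f:
    json.dump(vocab, f)
```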
but what do I know.
3) Passing
```python
tokenizer=processor.feature_extractor
```
to the Trainer is deprecated; `processing_class` is needed instead.

4) There is no fine-tuning here: we're simply retraining everything (which is not the most intelligent thing to do: we're training the CTC head from scratch, but fine-tuning the whole model with the same learning rate).
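On that last point, here's a minimal sketch of what partial freezing looks like, using a toy torch model as a stand-in (the layer sizes are arbitrary assumptions; on the real `Wav2Vec2ForCTC` you'd also use `freeze_feature_encoder()` and/or per-parameter-group learning rates):

```python
import torch

# Toy stand-in: a pretrained "backbone" plus a freshly initialized "CTC head".
backbone = torch.nn.Linear(10, 10)
head = torch.nn.Linear(10, 5)
model = torch.nn.Sequential(backbone, head)

# Fine-tuning setup: freeze the pretrained backbone so only the new head
# is trained, instead of retraining the whole model at one learning rate.
for p in backbone.parameters():
    p.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable)  # only the head's 10*5 + 5 = 55 parameters remain trainable
```

An alternative to outright freezing is giving the backbone a much smaller learning rate than the head via optimizer parameter groups.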
Overall, it's not entirely your fault, since you're working with the badly organized and terribly documented HuggingFace library. But you clearly didn't test the code, which is kind of a bummer: maybe some of the errors are not due to your code, but you could have caught them by running it before posting.