token limit - warning
Hi
as mentioned in Issue #2
"the model can't handle inputs of longer than 512 tokens "
Is there a warning I can get when I exceed the limit?
I split the text into sentences, and in most cases it's well within the limit, but there are exceptions. Is there any way to flag these exceptions before I run the "dictabert-morph" model?
Maybe by running the tokenizer only (without the morphology), and if I reach 512 tokens I know I probably need to split before running the morph model?
Right now the code automatically truncates the sentence to 512 tokens if it exceeds that length.
A good solution would be to run the tokenizer on its own and check whether the number of tokens exceeds 512.
Alternatively, if you have a preferred way which would need to be added into the interface, feel free to make the modifications and open a PR, we welcome contributions :)
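For reference, a minimal sketch of that tokenizer-only pre-check, assuming the checkpoint is loaded via Hugging Face `transformers` (the `"dicta-il/dictabert-morph"` model id in the demo is an assumption; substitute whatever checkpoint you actually use):

```python
MAX_TOKENS = 512  # the model's input limit mentioned in Issue #2

def flag_long_sentences(sentences, tokenizer, max_tokens=MAX_TOKENS):
    """Return indices of sentences whose tokenized length exceeds max_tokens."""
    flagged = []
    for i, sent in enumerate(sentences):
        # truncation=False so we see the true length, including any
        # special tokens ([CLS]/[SEP]) the tokenizer adds.
        n_tokens = len(tokenizer(sent, truncation=False)["input_ids"])
        if n_tokens > max_tokens:
            flagged.append(i)
    return flagged

if __name__ == "__main__":
    # Requires the `transformers` package; model id is an assumption.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("dicta-il/dictabert-morph")
    sentences = ["..."]  # your pre-split sentences
    print(flag_long_sentences(sentences, tokenizer))
```

Any sentence index returned by the check can then be split further before being passed to the morph model; the unflagged sentences can go through unchanged.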