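See the article "Handling multiple tokenizers with a single processor in transformers" (original title: transformers で複数のトークナイザーを一つのプロセッサーで扱う):
https://zenn.dev/platina/articles/732feb7c3e9852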
Example usage
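from transformers import AutoProcessor

# trust_remote_code=True runs the processor code shipped in the repository,
# so pin it to a known commit. `revision` accepts a branch, tag, or commit hash.
processor = AutoProcessor.from_pretrained(
    "p1atdev/multi-tokenizers-processor-sample",
    trust_remote_code=True,
    revision="111e8a30609fb5bc13e16d08f7c49196b23d5056",
)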
print(processor(
    text_1="テキスト1",
    text_2="テキスト2",
))
# {'input_ids': tensor([[ 1, 43412, 28745]]), 'attention_mask': tensor([[1, 1, 1]]), 'input_ids_2': tensor([[56833, 61803, 70534, 17]]), 'attention_mask_2': tensor([[1, 1, 1, 1]])}
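
The output above holds one set of keys per tokenizer, with the second tokenizer's keys carrying a `_2` suffix. The following is a minimal sketch of how such a merge could work. It is not the repository's implementation: `toy_tokenizer_a`, `toy_tokenizer_b`, and `ToyMultiTokenizerProcessor` are hypothetical names, and the toy tokenizers produce made-up ids and plain lists instead of tensors.

```python
from typing import Callable, Dict, List

Encoding = Dict[str, List[int]]


# Hypothetical stand-ins for two real tokenizers. Each returns a dict with
# "input_ids" and "attention_mask", mirroring the keys seen in the output above.
def toy_tokenizer_a(text: str) -> Encoding:
    ids = [ord(ch) % 1000 for ch in text]  # fake ids, one per character
    return {"input_ids": ids, "attention_mask": [1] * len(ids)}


def toy_tokenizer_b(text: str) -> Encoding:
    ids = [len(word) for word in text.split()]  # fake ids, one per word
    return {"input_ids": ids, "attention_mask": [1] * len(ids)}


class ToyMultiTokenizerProcessor:
    """Illustrative only: routes each text to its own tokenizer and merges the results."""

    def __init__(
        self,
        tokenizer_1: Callable[[str], Encoding],
        tokenizer_2: Callable[[str], Encoding],
    ) -> None:
        self.tokenizer_1 = tokenizer_1
        self.tokenizer_2 = tokenizer_2

    def __call__(self, text_1: str, text_2: str) -> Encoding:
        # Keys from the first tokenizer are kept as-is.
        merged = dict(self.tokenizer_1(text_1))
        # Keys from the second tokenizer get a "_2" suffix to avoid collisions.
        for key, value in self.tokenizer_2(text_2).items():
            merged[f"{key}_2"] = value
        return merged


processor = ToyMultiTokenizerProcessor(toy_tokenizer_a, toy_tokenizer_b)
batch = processor(text_1="abc", text_2="hello wide world")
print(sorted(batch))
```

Suffixing the keys keeps the merged result a flat dict, so a model's forward method can take each tokenizer's ids as a separate keyword argument.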