IMHO a dense layer is not necessary for token classification, as described in https://qiita.com/KoichiYasuoka/items/751c02216a65d105d3d2
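To make the proposal concrete, here is a minimal PyTorch sketch of the two head variants being compared. It is not the code from this pull request: the class name `TokenHead`, the `use_dense` flag, the tanh activation, the dropout rate, and the sizes are all assumptions chosen for illustration.

```python
import torch
from torch import nn


class TokenHead(nn.Module):
    """Hypothetical token classification head with an optional dense layer."""

    def __init__(self, hidden_size, num_labels, use_dense=True, dropout=0.1):
        super().__init__()
        # The dense layer keeps the hidden size; it is skipped when use_dense=False.
        self.dense = nn.Linear(hidden_size, hidden_size) if use_dense else None
        self.dropout = nn.Dropout(dropout)
        # Final projection from hidden states to per-token label logits.
        self.out_proj = nn.Linear(hidden_size, num_labels)

    def forward(self, hidden_states):
        x = self.dropout(hidden_states)
        if self.dense is not None:
            x = torch.tanh(self.dense(x))  # assumed activation
            x = self.dropout(x)
        return self.out_proj(x)


# Dummy encoder output: batch of 2 sequences, 8 tokens each, hidden size 32.
hidden = torch.randn(2, 8, 32)
with_dense = TokenHead(32, 5, use_dense=True)
no_dense = TokenHead(32, 5, use_dense=False)

logits_a = with_dense(hidden)
logits_b = no_dense(hidden)
print(logits_a.shape, logits_b.shape)  # both are (batch, tokens, num_labels)
```

Both variants produce logits of the same shape; the only difference is the extra hidden-to-hidden projection, so the "no dense" variant has fewer parameters in the head.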

Thanks, I will test it today :)

I ran some experiments, and I am seeing a performance degradation with this solution:

| Configuration | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Avg. |
| --- | --- | --- | --- | --- | --- | --- |
| bs=16,e=10,lr=1e-05 | 95.71 | 95.42 | 95.53 | 95.56 | 95.43 | 95.53 |
| bs=16,e=10,lr=1e-05 - No Dense | 95.24 | 95.23 | 95.01 | 94.98 | 95.18 | 95.12 |
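For anyone who wants to check the comparison, the per-run scores above can be summarized with a few lines of standard-library Python. This is only a sketch: the dictionary keys `dense` and `no_dense` are labels made up here, and since the inputs are the rounded scores from the table, a recomputed mean may differ from the reported Avg. in the last digit.

```python
from statistics import mean, stdev

# Per-run scores copied from the table (bs=16, e=10, lr=1e-05).
runs = {
    "dense": [95.71, 95.42, 95.53, 95.56, 95.43],
    "no_dense": [95.24, 95.23, 95.01, 94.98, 95.18],
}

# Mean and sample standard deviation for each configuration.
summary = {name: (mean(scores), stdev(scores)) for name, scores in runs.items()}
for name, (avg, sd) in summary.items():
    print(f"{name}: mean={avg:.3f} stdev={sd:.3f}")

# Difference between the two averages.
gap = summary["dense"][0] - summary["no_dense"][0]
print(f"gap={gap:.3f}")
```

The gap comes out at roughly 0.4 points in favor of the dense layer, and every run with the dense layer scores higher than every run without it.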

Thank you for trying. Now I understand that the dense layer makes it better...

KoichiYasuoka changed pull request status to closed
