Try this little model on the problems in this repository: https://github.com/cpldcpu/MisguidedAttention

#3
by maxgreco - opened

It performs much better than bigger models (reasoning and non-reasoning alike) and is very fast at fp16 (over 65 tokens/s) on a 24 GB Tesla P40! I'm impressed!
