Llama 4 Maverick scored worse than Llama 3.1 405B on human alignment.
I ran inference on CPU for a model of this size (402B), and it ran fast. Being a mixture of experts, it may be useful for CPU inference, and its big context may be useful for RAG. For beneficial answers, there are other options.
Still, it managed to beat Grok 3. I had high expectations for Grok 3 because, in my opinion, X holds more beneficial ideas.
It got worse health scores than 3.1 but better bitcoin scores. I could post some comparisons of answers between the two. Which model should I publish comparisons with: Llama 3.1, Grok 3, or something else?
https://sheet.zohopublic.com/sheet/published/mz41j09cc640a29ba47729fed784a263c1d08