Diffusion LLMs are coming for autoregressive LLMs ⚡️⚡️ Inception Labs' new diffusion model demolishes all leading LLMs on generation speed, with equal quality!
Inception Labs was founded a few months ago, and they're not sleeping: after dropping a code model, they just published Mercury chat, a diffusion-based chat model that reaches 1,000 tokens/second on an H100, i.e. 10x faster than models of equivalent quality on the same hardware!
What's the breakthrough? Well, instead of generating tokens left-to-right like the more common autoregressive LLMs, diffusion models generate whole blocks of text at once, with successive steps refining the entire text.
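To make the idea concrete, here's a toy Python sketch of that kind of decoding loop (purely illustrative, nothing from Inception Labs' actual code): start from a fully masked block, predict every position in parallel at each step, and only commit the most confident guesses while the rest keep getting refined. The `predict_all_positions` function is a made-up stand-in for the real denoising model.

```python
import random

VOCAB = ["the", "cat", "sat", "on", "mat", "hello", "world"]
MASK = "<mask>"

def predict_all_positions(tokens):
    """Stand-in for the denoising model: returns (token, confidence) for every position."""
    return [(random.choice(VOCAB), random.random()) if t == MASK else (t, 1.0)
            for t in tokens]

def diffusion_generate(length=8, steps=4):
    tokens = [MASK] * length                   # start from a fully masked block
    per_step = max(1, length // steps)         # positions to commit at each refinement step
    for _ in range(steps):
        preds = predict_all_positions(tokens)  # one parallel forward pass over the whole block
        # commit only the most confident still-masked positions, refine the rest next step
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:per_step]:
            tokens[i] = preds[i][0]
    # final pass fills anything still masked
    tokens = [p[0] for p in predict_all_positions(tokens)]
    return " ".join(tokens)

print(diffusion_generate())
```

The speedup intuition is right there: each step is one forward pass that touches the whole block, so a handful of refinement steps can replace hundreds of one-token-at-a-time autoregressive steps.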
Diffusion models being really fast isn't new: Google already showed promising results back in May with Gemini Diffusion, and Inception Labs themselves had already published their coding model a few months ago.
But reaching that quality is new - and Inception Labs has now shown that their models work well in chat too, which could have been challenging given that streaming generation naturally suits left-to-right decoding.
They have a playground available at chat.inceptionlabs.ai, and I recommend giving it a try!