
Celina

celinah

AI & ML interests

inference, on-device and image generation

Recent Activity

reacted to m-ric's post with πŸ”₯ 6 days ago
upvoted a paper 6 days ago
Qwen2.5 Technical Report

Organizations

Hugging Face, Hugging Face OSS Metrics, Blog-explorers, Hugging Face for Computer Vision, MLX Community, Social Post Explorers, open/ acc, DDUF

celinah's activity

reacted to m-ric's post with πŸ”₯ 6 days ago
After 6 years, BERT, the workhorse of encoder models, finally gets a replacement: π—ͺ𝗲𝗹𝗰𝗼𝗺𝗲 π— π—Όπ—±π—²π—Ώπ—»π—•π—˜π—₯𝗧! πŸ€—

We talk a lot about ✨Generative AI✨, meaning "the decoder version of the Transformer architecture", but this is only one way to build LLMs: encoder models, which turn a sentence into a vector, are maybe even more widely used in industry than generative models.

The workhorse for this category has been BERT since its release in 2018 (that's prehistory for LLMs).

It's not a fancy 100B-parameter supermodel (just a few hundred million parameters), but it's an excellent workhorse, kind of a Honda Civic of LLMs.

Many applications use BERT-family models - the top models in this category accumulate millions of downloads on the Hub.

➑️ Now a collaboration between Answer.AI and LightOn just introduced BERT's replacement: ModernBERT.

π—§π—Ÿ;𝗗π—₯:
πŸ›οΈ Architecture changes:
β‡’ First, standard modernizations:
- Rotary positional embeddings (RoPE)
- Replace GeLU with GeGLU
- Use Flash Attention 2
✨ The team also introduced innovative techniques like alternating attention instead of full attention, and sequence packing to get rid of padding overhead.

πŸ₯‡ As a result, the model tops the game of encoder models:
It beats previous standard DeBERTaV3 for 1/5th the memory footprint, and runs 4x faster!

Read the blog post πŸ‘‰ https://huggingface.co/blog/modernbert
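
The "sentence in, vector out" idea above can be sketched in a few lines. A common recipe (used by many encoder-model pipelines, though the post doesn't specify one) is to run the encoder, then mean-pool the per-token embeddings while ignoring padding. The model call itself is omitted here; this hedged sketch only shows the pooling step on toy arrays, with shapes as assumptions:

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token embeddings into one sentence vector, skipping padding.

    token_embeddings: (seq_len, hidden) array, e.g. an encoder's last hidden state
    attention_mask:   (seq_len,) array of 1s (real tokens) and 0s (padding)
    """
    mask = attention_mask[:, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=0)   # sum only real tokens
    count = mask.sum()                               # number of real tokens
    return summed / count

# Toy example: 4 token embeddings with hidden size 3; the last row is padding.
emb = np.array([[1.0, 2.0, 3.0],
                [3.0, 2.0, 1.0],
                [2.0, 2.0, 2.0],
                [9.0, 9.0, 9.0]])  # padding row, ignored by the mask
mask = np.array([1, 1, 1, 0])
vec = mean_pool(emb, mask)  # -> array([2., 2., 2.])
```

The resulting fixed-size vector is what downstream tasks (search, classification, clustering) consume, which is why encoder models like BERT and ModernBERT are so widely deployed in industry.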
reacted to FranckAbgrall's post with πŸ”₯ 6 days ago
πŸ†• It should now be easier to identify discussions or pull requests where repository owners are participating on HF, let us know it that helps πŸ’¬πŸ€—