Makar Vlasov
Makar7
5 followers · 28 following
AI & ML interests
None yet
Recent Activity
Replied to mlabonne's post (about 12 hours ago):
LiquidAI (https://huggingface.co/LiquidAI) open-sources a new generation of edge LLMs! 🥳

Based on a new hybrid architecture, these 350M, 700M, and 1.2B models are both fast and performant, ideal for on-device deployment. I recommend fine-tuning them to power your next edge application. We already provide Colab notebooks to guide you. More to come soon!

📝 Blog post: https://www.liquid.ai/blog/liquid-foundation-models-v2-our-second-series-of-generative-ai-models
🤗 Models: https://huggingface.co/collections/LiquidAI/lfm2-686d721927015b2ad73eaa38
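As a minimal sketch of trying one of these checkpoints, the snippet below loads a model from the linked collection with the Transformers AutoModel API. The model id "LiquidAI/LFM2-1.2B" is an assumption based on the sizes named in the post, and a recent transformers release that supports the architecture is assumed as well.

```python
# Sketch: load an LFM2 checkpoint and generate a short completion.
# "LiquidAI/LFM2-1.2B" is an assumed id (the post also mentions 350M and 700M variants);
# a transformers version with support for this architecture is assumed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2-1.2B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

prompt = "Why are small language models a good fit for on-device use?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For fine-tuning, the post points to the provided Colab notebooks rather than a specific recipe.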
Reacted to Ruurd's post with 🔥 (about 1 month ago):
The past year I have been trying to get diffusion models to work for language generation, without having to retrain an LLM from scratch. And recently, we finally succeeded: we introduce "LAD: LoRA-Adapted Denoiser", a method to convert a LLaMA model into a text diffusion model using LoRA finetuning and structured input corruption.

🎯 Try the demo and read the write-up here! https://ruurdkuiper.github.io/tini-lad/

Unlike autoregressive (word-for-word) models like ChatGPT, diffusion models iteratively refine a noised sequence. However, most current diffusion approaches rely on all-parameter retraining and repeatedly remasking tokens, which is costly and slow during both training and inference!

🧠 With LAD:
- We can finetune an autoregressive model for diffusive generation in just 10 hours on a single GPU.
- Test-time compute is fully adjustable: fewer steps means faster outputs, while more steps improve output quality.
- Due to our unique noising schedule, remasking is not always needed during inference. All tokens are attended to in each iteration!

🔍 LAD is built using:
– A frozen LLaMA-8B backbone
– Structured noising: token swaps, duplications, replacements, span shifts
– Modified attention masks for bidirectional decoding

💡 We show that even small, fast-trained models can perform diffusive generation, with competitive benchmark performance, perplexity, and more flexible test-time behavior than traditional transformers.
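The post names the corruption types used for structured noising (token swaps, duplications, replacements, span shifts). As a rough illustration only, the toy function below applies those four operations to a sequence of token ids; it is an assumption for illustration, not the authors' LAD implementation, and the probabilities and span length are made up.

```python
import random

def corrupt(tokens, vocab_size, p=0.15, rng=None):
    """Toy structured corruption of a token-id sequence: random replacements,
    adjacent swaps, one duplication, and one span shift (illustrative only)."""
    rng = rng or random.Random(0)
    out = list(tokens)
    # replacements: each position replaced by a random token with probability p
    out = [rng.randrange(vocab_size) if rng.random() < p else t for t in out]
    # swaps: occasionally swap adjacent tokens
    for i in range(len(out) - 1):
        if rng.random() < p:
            out[i], out[i + 1] = out[i + 1], out[i]
    # duplication: duplicate one randomly chosen token
    if out and rng.random() < p:
        j = rng.randrange(len(out))
        out.insert(j, out[j])
    # span shift: move a short span to a new position
    if len(out) > 4 and rng.random() < p:
        start = rng.randrange(len(out) - 3)
        span = out[start:start + 3]
        del out[start:start + 3]
        k = rng.randrange(len(out) + 1)
        out[k:k] = span
    return out

print(corrupt(list(range(12)), vocab_size=100))
```

In the actual method, corruption like this would be applied to training inputs so the LoRA-adapted, frozen LLaMA-8B backbone learns to denoise them iteratively; the details of the noising schedule are in the linked write-up.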
Liked a model (about 1 month ago): deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
Organizations
None yet
Models: 0 (none public yet)
Datasets: 0 (none public yet)