Granite Data Collection This collection has a set of artifacts which are related to curating and evaluating datasets used for Granite models • 16 items • Updated 16 minutes ago • 3
Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence Paper • 2502.09927 • Published 14 days ago
Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping Paper • 2501.06589 • Published Jan 11
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published 24 days ago • 195
Post: Tried my hand at simplifying the derivations of Direct Preference Optimization. I cover how one can reformulate RLHF into DPO. The idea of implicit reward modeling is chef's kiss. Blog: https://huggingface.co/blog/ariG23498/rlhf-to-dpo
Post: Timm ❤️ Transformers. With the latest version of transformers you can now use any timm model with the familiar transformers API. Blog post: https://huggingface.co/blog/timm-transformers Repository with examples: https://github.com/ariG23498/timm-wrapper-examples Collection: ariG23498/timmwrapper-6777b85f1e8d085d3f1374a1
Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models Paper • 2409.04787 • Published Sep 7, 2024 • 1
Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations Paper • 2403.06009 • Published Mar 9, 2024
Post: VLMs are going through quite an open revolution AND on-device friendly sizes:
1. Google DeepMind w/ PaliGemma2 - 3B, 10B & 28B: google/paligemma-2-release-67500e1e1dbfdd4dee27ba48
2. OpenGVLab w/ InternVL 2.5 - 1B, 2B, 4B, 8B, 26B, 38B & 78B: https://huggingface.co/collections/OpenGVLab/internvl-25-673e1019b66e2218f68d7c1c
3. Qwen w/ Qwen 2 VL - 2B, 7B & 72B: Qwen/qwen2-vl-66cee7455501d7126940800d
4. Microsoft w/ FlorenceVL - 3B & 8B: https://huggingface.co/jiuhai
5. Moondream2 w/ 0.5B: https://huggingface.co/vikhyatk/
What a time to be alive! 🔥