Article: The New and Fresh analytics in Inference Endpoints • by erikkaum and 4 others • 13 days ago
Article: Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM • 22 days ago
Space (Running): The Ultra-Scale Playbook 🌌 • The ultimate guide to training LLMs on large GPU clusters
Article: From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub • Feb 12
Article: Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference • Jan 16
Article: Train 400x faster Static Embedding Models with Sentence Transformers • Jan 15
Post: A while ago I started experimenting with compiling the Python interpreter to WASM to build a secure, fast, and lightweight sandbox for code execution, ideal for running LLM-generated Python code.
- Send code simply as a POST request
- 1-2 ms startup times
Hack away: https://github.com/ErikKaum/runner
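The post only says that code is sent as a POST request, so here is a minimal sketch of what such a call could look like, assuming the runner listens locally and accepts the source code in a JSON body. The host, port, path, and payload field names below are illustrative assumptions, not taken from the repository.

```python
import requests

# Hypothetical endpoint: adjust host, port, and path to match the runner's README.
RUNNER_URL = "http://localhost:8080/run"

# The Python snippet we want the sandbox to execute.
code = "print(sum(range(10)))"

# Assumed payload shape: a JSON object carrying the source code as a string.
response = requests.post(RUNNER_URL, json={"code": code}, timeout=5)
response.raise_for_status()

# Assumed response shape: JSON containing the sandboxed program's output.
print(response.json())
```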
Space (Running): Scaling FineWeb to 1000+ languages: Step 1: finding signal in 100s of evaluation tasks 📝 • Evaluate multilingual models using FineTasks
Article: Releasing Outlines-core 0.1.0: structured generation in Rust and Python • by erikkaum and 6 others • Oct 22, 2024
Post: This week in Inference Endpoints - thanks @erikkaum for the update! 👀 https://huggingface.co/blog/erikkaum/endpoints-changelog