Code Llama

company
Activity Feed

AI & ML interests

None defined yet.

Recent Activity

Narsil posted an update 13 days ago
Me: This function is too slow. Find a faster algorithm.
Cursor: Hold my beer.

Me: *Slacking off with colleagues*
Cursor: Ping.

Me: 🤯

loubnabnl posted an update about 1 month ago
andrewrreed posted an update 6 months ago
🚀 Supercharge your LLM apps with Langfuse on Hugging Face Spaces!

Langfuse brings end-to-end observability and tooling to accelerate your dev workflow from experiments through production.

Now available as a Docker Space directly on the HF Hub! 🤗

🔍 Trace everything: monitor LLM calls, retrieval, and agent actions with popular frameworks
1️⃣ One-click deployment: on Spaces with persistent storage and integrated OAuth
🛠 Simple Prompt Management: version, edit, and update without redeployment
✅ Intuitive Evals: collect user feedback, run model/prompt evaluations, and improve quality
📊 Dataset Creation: build datasets directly from production data to enhance future performance

Kudos to the Langfuse team for this collab and the awesome, open-first product they're building! 👏 @marcklingen @Clemo @MJannik

🔗 Space: langfuse/langfuse-template-space
🔗 Docs: https://huggingface.co/docs/hub/spaces-sdks-docker-langfuse
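To make this concrete, here is a minimal, hedged sketch of routing an LLM call through Langfuse so it shows up as a trace in a deployed Space. It assumes the Langfuse Python SDK's OpenAI drop-in client and the usual LANGFUSE_* environment variables; the Space URL and model name are placeholders, so defer to the docs linked above for the exact setup.

```python
# Hedged sketch: log an OpenAI-compatible call as a Langfuse trace.
# Assumes the Langfuse Python SDK is installed and LANGFUSE_* keys are configured.
import os
from langfuse.openai import OpenAI  # drop-in wrapper that records calls as Langfuse traces

os.environ["LANGFUSE_HOST"] = "https://<your-langfuse-space>.hf.space"  # placeholder Space URL
# LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY must also be set (created in the Langfuse UI)

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Why does observability matter for LLM apps?"}],
)
print(response.choices[0].message.content)  # the call now appears as a trace in the Space
```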
Narsil posted an update 6 months ago
Performance leap: TGI v3 is out. It processes 3x more tokens and is 13x faster than vLLM on long prompts. Zero config!

3x more tokens.

By reducing our memory footprint, we're able to ingest many more tokens, and more dynamically, than before. A single L4 (24GB) can handle 30k tokens on Llama 3.1-8B, while vLLM barely reaches 10k. A lot of work went into reducing the footprint of the runtime, and its effects are most visible in smaller, constrained environments.

13x faster

On long prompts (200k+ tokens), conversation replies take 27.5s in vLLM, while they take only 2s in TGI. How so? We keep the initial conversation around, so when a new reply comes in, we can answer almost instantly. The overhead of the lookup is ~5us. Thanks @Daniël de Kok for the beast data structure.

Zero config

That's it. Remove all the flags you are using and you're likely to get the best performance. By evaluating the hardware and model, TGI carefully selects automatic values that give the best performance. In production, we no longer set any flags in our deployments. We kept all existing flags around; they may come in handy in niche scenarios.

Read more: https://huggingface.co/docs/text-generation-inference/conceptual/chunking
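For context, here is a hedged sketch of how a client might query a TGI v3 server that was started with no extra flags. The local URL is a placeholder, and the example assumes huggingface_hub's InferenceClient pointed at a running instance; it is an illustration, not the only way to call TGI.

```python
# Hedged sketch: query a TGI v3 server launched with zero config (only --model-id).
# Assumes a server reachable at the placeholder URL below.
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")  # placeholder: your TGI endpoint

# Long conversations benefit from TGI keeping the previous prefix around,
# so follow-up replies in the same conversation are served almost instantly.
response = client.chat_completion(
    messages=[{"role": "user", "content": "Explain why prefix caching speeds up long chats."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```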
loubnabnl posted an update 7 months ago
Making SmolLM2 reproducible: open-sourcing our training & evaluation toolkit 🛠️ https://github.com/huggingface/smollm/

- Pre-training code with nanotron
- Evaluation suite with lighteval
- Synthetic data generation using distilabel (powers our new SFT dataset HuggingFaceTB/smoltalk)
- Post-training scripts with TRL & the alignment handbook
- On-device tools with llama.cpp for summarization, rewriting & agents

Apache 2.0 licensed. V2 pre-training data mix coming soon!

Which other tools should we add next?
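As a small illustration, here is a hedged sketch of streaming a few rows of the smoltalk SFT dataset mentioned above. The config name "all" and the "messages" column are assumptions, so check the dataset card for the exact schema.

```python
# Hedged sketch: peek at the smoltalk SFT dataset without downloading it fully.
# (Config "all" and the "messages" column are assumptions; see the dataset card.)
from datasets import load_dataset

smoltalk = load_dataset("HuggingFaceTB/smoltalk", "all", split="train", streaming=True)

for row in smoltalk.take(2):   # stream a couple of examples
    print(row["messages"])     # chat-formatted conversations used for SFT
```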
andrewrreed posted an update 7 months ago
Trace LLM calls with Arize AI's Phoenix observability dashboards on Hugging Face Spaces! 🚀

✨ I just added a new recipe to the Open-Source AI Cookbook that shows you how to:
1️⃣ Deploy Phoenix on HF Spaces with persistent storage in a few clicks
2️⃣ Configure LLM tracing with the Serverless Inference API
3️⃣ Observe multi-agent application runs with the CrewAI integration

Observability is crucial for building robust LLM apps.

Phoenix makes it easy to visualize trace data, evaluate performance, and track down issues. Give it a try!

🔗 Cookbook recipe: https://huggingface.co/learn/cookbook/en/phoenix_observability_on_hf_spaces
🔗 Phoenix docs: https://docs.arize.com/phoenix
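Roughly following the recipe, here is a hedged sketch of what the tracing setup might look like. The package entry points (arize-phoenix-otel's register and the openinference CrewAI instrumentor) and the Space URL are assumptions on my part, so defer to the cookbook recipe for the exact steps.

```python
# Hedged sketch: point traces at a Phoenix Space and instrument CrewAI runs.
# (Function names assumed from arize-phoenix-otel and openinference-instrumentation-crewai;
#  the Space URL and project name are placeholders.)
from phoenix.otel import register
from openinference.instrumentation.crewai import CrewAIInstrumentor

tracer_provider = register(
    project_name="crewai-demo",  # illustrative project name
    endpoint="https://<your-phoenix-space>.hf.space/v1/traces",  # Phoenix OTLP endpoint on the Space
)
CrewAIInstrumentor().instrument(tracer_provider=tracer_provider)
# Any CrewAI crew kicked off after this point should emit traces to the Phoenix dashboard.
```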
ArthurZ posted an update 7 months ago
Native tensor parallelism has landed in transformers (https://github.com/huggingface/transformers/pull/34184)! Thanks a lot to the torch team for their support!

Contributions are welcome to support more models! 🔥
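For a sense of the API, here is a hedged sketch of using the native tensor-parallel path when launched with torchrun. The tp_plan argument and the model id are assumptions drawn from the direction of the linked PR, so check the transformers docs for the exact interface.

```python
# Hedged sketch: native tensor parallelism in transformers (tp_plan="auto" is assumed).
# Launch across GPUs with: torchrun --nproc-per-node 4 run_tp.py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # illustrative model id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    tp_plan="auto",  # shard weights across the GPUs in the torchrun process group
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```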
loubnabnl posted an update about 1 year ago
🍷 The FineWeb technical report is out, and so is 📚 FineWeb-Edu, a 1.3 trillion token dataset that outperforms all other open web datasets, with remarkable improvements on educational benchmarks such as MMLU, ARC, and OpenBookQA.

Technical report: HuggingFaceFW/blogpost-fineweb-v1
Dataset: HuggingFaceFW/fineweb-edu

We used Llama 3 generations to train an educational quality classifier, filtering the 15 trillion tokens of FineWeb to select only those with high educational value (an approach also used in Llama 3 and Phi-3 training datasets). We're releasing both FineWeb-Edu and the classifier, along with a larger, less heavily filtered version containing 5.4 trillion tokens.

You can find more details about the dataset and the experiments we ran in the FineWeb technical report. It's a 45-minute read, but it contains all the secret sauce for building high-quality web datasets.

Enjoy!
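If you want to poke at the data, here is a minimal, hedged sketch of streaming a slice of FineWeb-Edu. The "sample-10BT" config name is an assumption, so check the dataset card for the configs that are actually available.

```python
# Hedged sketch: stream a few documents from FineWeb-Edu without downloading the
# full dataset (the "sample-10BT" config name is an assumption; see the dataset card).
from datasets import load_dataset

ds = load_dataset(
    "HuggingFaceFW/fineweb-edu",
    name="sample-10BT",
    split="train",
    streaming=True,
)

for doc in ds.take(3):
    print(doc["text"][:200], "...")  # each row is a web document with its text and metadata
```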