Nathan Habib

SaylorTwift

AI & ML interests

None yet

Recent Activity

Organizations

Hugging Face, Evaluation datasets, Hugging Test Lab, HuggingFaceGECLM, BigCode, Hugging Face H4, BigCode Data, Hugging Face Smol Cluster, Open LLM Leaderboard, huggingPartyParis, Qwen, gg-hf, Nanotron Research, FineData, HF-contamination-detection, Top Contributors: Dataset Downloads, hsramall, La Leaderboard, gg-tt, HuggingFaceEval, Novel Challenge, LLHF, SLLHF, lbhf, Lighteval testing org, Coordination Nationale pour l'IA, open-llm-leaderboard-react, Prompt Leaderboard, wut?, Your Bench, Open R1, gg-hf-g, OpenEvals, arc-agi-community, LightEval Internal Testing

SaylorTwift's activity

reacted to loubnabnl's post with ❤️ 16 days ago
reacted to AdinaY's post with 🚀 16 days ago
Data quality is the new frontier for LLM performance.

Ultra-FineWeb 📊, a high-quality bilingual dataset released by OpenBMB

openbmb/Ultra-FineWeb

✨ MIT License
✨ 1T English + 120B Chinese tokens
✨ Efficient model-driven filtering
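Model-driven filtering of this kind is usually a score-and-threshold loop: a quality model scores every document and only high scorers are kept. Below is a toy sketch; the scorer is a stub heuristic standing in for a trained quality classifier, not OpenBMB's actual pipeline.

```python
# Toy sketch of model-driven filtering: score each document with a quality
# model and keep only those above a threshold. quality_score is a stub
# (length + keyword heuristic) standing in for a trained classifier.

def quality_score(doc):
    score = 0.0
    if len(doc.split()) >= 5:       # reward substantive documents
        score += 0.5
    if "tutorial" in doc.lower():   # reward educational content
        score += 0.5
    return score

def filter_corpus(docs, threshold=0.5):
    """Keep documents whose quality score exceeds the threshold."""
    return [d for d in docs if quality_score(d) > threshold]

docs = [
    "Buy now!!!",
    "A step-by-step tutorial on training language models with curated data.",
    "short spam",
]
print(filter_corpus(docs))  # only the tutorial sentence survives
```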
reacted to codelion's post with 🚀 16 days ago
Introducing Pivotal Token Search (PTS): A new technique for targeted LLM alignment

Excited to share Pivotal Token Search (PTS), a technique for identifying and optimizing critical decision points in LLM generations!

GitHub repository: https://github.com/codelion/pts

What is PTS?
PTS helps identify specific "pivotal tokens" that dramatically shift the probability of a successful generation. Unlike traditional DPO, which treats all tokens equally, PTS focuses optimization on the tokens that actually matter for success.

Inspired by Microsoft's recent Phi-4 paper (which used this technique to achieve SOTA reasoning with only 14B parameters), PTS is especially effective for:
- Mathematical reasoning
- Coding tasks
- Multi-step problem solving
- Any domain where specific decision points strongly impact outcomes
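The core criterion is simple to sketch. The toy below is a hypothetical illustration, not the implementation from the repo: `success_prob` is a stub standing in for estimating P(success | prefix) (in the real technique, by sampling completions from the model and checking them), and tokens whose addition shifts that estimate beyond a threshold are flagged as pivotal.

```python
# Minimal sketch of the pivotal-token criterion: a token is "pivotal" when
# appending it to the prefix shifts the estimated chance of a successful
# generation by at least `threshold`. success_prob returns an integer
# percentage here to keep the toy exact.

def find_pivotal_tokens(tokens, success_prob, threshold=20):
    """Return (index, token, delta) for tokens that shift success probability."""
    pivotal = []
    prefix = []
    p_before = success_prob(prefix)
    for i, tok in enumerate(tokens):
        prefix.append(tok)
        p_after = success_prob(prefix)
        delta = p_after - p_before
        if abs(delta) >= threshold:
            pivotal.append((i, tok, delta))
        p_before = p_after
    return pivotal

def toy_success_prob(prefix):
    # Pretend "therefore" sharply raises the success rate and "maybe" lowers it.
    p = 50
    for tok in prefix:
        if tok == "therefore":
            p = min(100, p + 30)
        elif tok == "maybe":
            p = max(0, p - 30)
    return p

print(find_pivotal_tokens(["the", "answer", "therefore", "is", "maybe", "42"],
                          toy_success_prob))
# → [(2, 'therefore', 30), (4, 'maybe', -30)]
```

The flagged (prefix, token) pairs are exactly what you would turn into preference pairs for DPO-style training.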

What we're releasing today: codelion/pivotal-token-search-68241145d8b8502122f3ce4f

1. Open-source code:
- Complete implementation of the PTS algorithm
- Data generation pipelines
- Usage examples and documentation

2. Hugging Face resources:
- Datasets collection: https://huggingface.co/datasets?other=pts
* Pre-generated preference pairs for various domains
* Ready to use in your DPO training pipelines

- Models collection: https://huggingface.co/models?other=pts
* Pre-trained models fine-tuned with PTS
* Specialized versions for different reasoning tasks

The algorithm is straightforward to implement and can significantly improve your model's reasoning capabilities. Check out the repository for details on getting started!

We welcome feedback, contributions, and collaborations. Let us know if you use PTS in your projects!
reacted to fdaudens's post with 👀 19 days ago
Hey! I built an AI Agent to query the FOIA API for a workshop at the Hacks/Hackers Summit in Baltimore and you can do it too!

It's a quick proof of concept I did for a workshop at the Hacks/Hackers Summit in Baltimore, demonstrating what agents can do, how to design workflows, and approaches to coding them.

- Slides https://docs.google.com/presentation/d/1lbf5K0yi213N7uxGnVKJdGWq2i0GayWj4vIcLkVlwD8/edit?usp=sharing
- Colab notebook https://colab.research.google.com/drive/1iw0qZyTni_6BcK0jj1x6gTfjm85NlaGv
- Gradio app: https://huggingface.co/spaces/JournalistsonHF/foia-agent
- MCP version to plug into Claude, Cursor, etc: https://huggingface.co/spaces/JournalistsonHF/foia-mcp-tools

Feel free to use the Gradio app for real FOIA requests, but also to improve it (I'm far from being a good coder) or adapt it for other countries.
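One way to structure such an agent is to expose the FOIA query as a plain function the agent framework can call as a tool. The endpoint and parameters below are illustrative assumptions, not the real FOIA API schema; the point is the tool shape, not the URL.

```python
from urllib.parse import urlencode

# Sketch of an agent tool wrapping a FOIA search endpoint. The base URL and
# parameter names are hypothetical placeholders, not the actual API.

def build_foia_search_url(query, agency=None,
                          base="https://api.example.gov/foia/search"):
    """Build the request URL the agent's FOIA tool would fetch."""
    params = {"q": query}
    if agency:
        params["agency"] = agency
    return f"{base}?{urlencode(params)}"

print(build_foia_search_url("drone contracts", agency="FAA"))
# → https://api.example.gov/foia/search?q=drone+contracts&agency=FAA
```

Keeping the tool a pure URL builder makes it easy to unit-test the agent's query construction separately from the network call.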

And shout-out to everyone who powered through the workshop! 😅
reacted to clem's post with 🚀🤗 about 1 month ago
Just crossed half a million public apps on Hugging Face. A new public app is created every minute these days 🤯🤯🤯

What's your favorite? http://hf.co/spaces
reacted to clem's post with 👍🔥 3 months ago
Super happy to welcome Nvidia as our latest enterprise hub customer. They have almost 2,000 team members using Hugging Face, and close to 20,000 followers of their org. Can't wait to see what they'll open-source for all of us in the coming months!

Nvidia's org: nvidia
Enterprise hub: https://huggingface.co/enterprise
reacted to elliesleightholm's post with 🤗 6 months ago
posted an update 6 months ago
reacted to Symbol-LLM's post with 🔥 6 months ago
🥳 Thrilled to introduce our recent efforts on bootstrapping VLMs for multi-modal chain-of-thought reasoning!

📕 Title: Vision-Language Models Can Self-Improve Reasoning via Reflection

🔗 Link: Vision-Language Models Can Self-Improve Reasoning via Reflection (2411.00855)

😇 Takeaways:

- We found that VLMs can self-improve reasoning performance through a reflection mechanism, and importantly, this approach can scale through test-time computing.

- Evaluations on comprehensive and diverse vision-language reasoning tasks are included!
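The reflection mechanism can be sketched abstractly as a generate-critique-revise loop. The `generate` and `critique` functions below are stubs standing in for VLM calls (not the paper's implementation), and the round budget is the knob that scales test-time compute.

```python
# Toy sketch of a reflection loop: generate an answer, let a critic reflect
# on it, revise, and repeat. More rounds = more test-time compute.

def generate(question):
    # Stub for the VLM's initial answer; "quality" is a stand-in signal.
    return {"answer": "draft", "quality": 1}

def critique(question, answer):
    # Stub for the VLM reflecting on its own answer: flag weak answers
    # and propose an improved one, or return None when satisfied.
    if answer["quality"] < 3:
        return {"answer": "revised", "quality": answer["quality"] + 1}
    return None

def self_improve(question, max_rounds=4):
    answer = generate(question)
    for _ in range(max_rounds):   # test-time compute scales with rounds
        revision = critique(question, answer)
        if revision is None:      # critic found no remaining issues
            break
        answer = revision
    return answer

print(self_improve("What is shown in the image?"))
# → {'answer': 'revised', 'quality': 3}
```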
reacted to cfahlgren1's post with ❤️ 6 months ago
You can clean and format datasets entirely in the browser with a few lines of SQL.

In this post, I replicate the process @mlabonne used to clean the new microsoft/orca-agentinstruct-1M-v1 dataset.

The cleaning process consists of:
- Joining the separate splits together and adding a split column
- Converting string messages into lists of structs
- Removing empty system prompts

https://huggingface.co/blog/cfahlgren1/the-beginners-guide-to-cleaning-a-dataset
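The same three steps can be sketched outside the browser with Python's stdlib `sqlite3` and `json` modules. The table and column names below are illustrative stand-ins, not the actual dataset schema.

```python
import json
import sqlite3

# Toy reproduction of the three cleaning steps (the post itself uses
# in-browser SQL). Schema and data here are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE train (system TEXT, messages TEXT);
    CREATE TABLE test  (system TEXT, messages TEXT);
    INSERT INTO train VALUES ('You are helpful.', '[{"role":"user","content":"hi"}]');
    INSERT INTO train VALUES ('', '[{"role":"user","content":"empty system"}]');
    INSERT INTO test  VALUES ('Be concise.', '[{"role":"user","content":"yo"}]');
""")

# Step 1: join the splits together and add a split column.
rows = conn.execute("""
    SELECT 'train' AS split, system, messages FROM train
    UNION ALL
    SELECT 'test'  AS split, system, messages FROM test
""").fetchall()

# Step 2: parse string messages into lists of structs.
# Step 3: drop rows with an empty system prompt.
cleaned = [
    {"split": split, "system": system, "messages": json.loads(messages)}
    for split, system, messages in rows
    if system.strip() != ""
]
print(len(cleaned))  # → 2 (the empty-system row is dropped)
```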

Here's his new cleaned dataset: mlabonne/orca-agentinstruct-1M-v1-cleaned
replied to rizzware's post 9 months ago

Hi! Lighteval makes it easy to compare model enhancements, such as different prompting strategies or fine-tunes. You can change the prompts for a given task, or even create a new task with a different prompt, generation size, stop words, etc.
Everything you need to create a new task is listed in the Lighteval README.
Do you have a more specific use case in mind, so that we can help you further?

reacted to yunusserhat's post with 🚀 10 months ago
Hello everyone,

I am pleased to announce that I have founded the University of Glasgow organization on Hugging Face. If you are affiliated with the University of Glasgow, or have a relative who is, you can join through the relevant link.

UniversityofGlasgow
reacted to fdaudens's post with ❤️ 11 months ago
Look at that 👀

Current benchmarks have become too easy for recent models, much like grading high school students on middle-school problems makes little sense. So the team worked on a new version of the Open LLM Leaderboard with new benchmarks.

Stellar work from @clefourrier @SaylorTwift and the team!

👉 Read the blog post: open-llm-leaderboard/blog
👉 Explore the leaderboard: open-llm-leaderboard/open_llm_leaderboard
reacted to fdaudens's post with 👍 12 months ago
Finally, a good handwriting recognition tool?

I'm impressed by Microsoft's latest vision model, Florence-2 microsoft/Florence-2-large

The results are really good, boasting a remarkably low error rate, as you can see with this letter from George W. Bush to Bill Clinton!

🚀🔒 What's even better? You can run it locally on your device, ensuring your data stays 100% safe.

👉 Try it out here: gokaygokay/Florence-2
reacted to MrOvkill's post with 🔥 12 months ago
Hello!

I've made a little evaluation dataset for LLMs that requires advanced and convoluted logical reasoning. It's composed of 81 unique paradoxes, though admittedly a couple fall in the same category (absolutes). It's available here: MrOvkill/pdox

**Update**: I have upgraded the dataset to v3 (don't worry about v2, it can be forgotten) and placed it in a separate repo here:
MrOvkill/pdox-reversed

Enjoy & Have fun!
-<3
reacted to thomwolf's post with 🚀🔥 about 1 year ago
[New crazy blog post alert] We are releasing an extensive blog post on the science of creating high-quality web-scale datasets, detailing all the steps and learnings behind our recent 15-trillion-token 🍷FineWeb release

Inspired by the distill.pub interactive-graphics papers, we set out to write the most extensive, enjoyable, and in-depth tech report we could, so prepare for a 45-min read with interactive graphics and all.

And that's not all: in this article we also introduce 📚FineWeb-Edu, a filtered subset of Common Crawl with 1.3T tokens containing only web pages with very high educational content. To our knowledge, FineWeb-Edu outperforms all openly released web-scale datasets by a significant margin on knowledge- and reasoning-intensive benchmarks like MMLU, ARC, and OpenBookQA.

We also make a number of surprising observations on the "quality" of the internet itself, which may challenge some general assumptions about web data (not saying more, I'll let you draw your own conclusions ;))

HuggingFaceFW/blogpost-fineweb-v1
reacted to clem's post with 🤗 about 1 year ago