Nathan Habib

SaylorTwift

AI & ML interests

None yet

Recent Activity

Organizations

Hugging Face, Evaluation datasets, Hugging Test Lab, HuggingFaceGECLM, BigCode, Hugging Face H4, BigCode Data, Hugging Face Smol Cluster, Open LLM Leaderboard, huggingPartyParis, Qwen, gg-hf, Nanotron Research, FineData, HF-contamination-detection, Top Contributors: Dataset Downloads, hsramall, La Leaderboard, gg-tt, HuggingFaceEval, Novel Challenge, LLHF, SLLHF, lbhf, Lighteval testing org, Coordination Nationale pour l'IA, open-llm-leaderboard-react, Prompt Leaderboard, wut?, Your Bench, Open R1, gg-hf-g, OpenEvals, arc-agi-community, LightEval Internal Testing

SaylorTwift's activity

reacted to loubnabnl's post with ❤️ 16 days ago
reacted to AdinaY's post with 🚀 16 days ago
Data quality is the new frontier for LLM performance.

Ultra-FineWeb 📊, a high-quality bilingual dataset released by OpenBMB

openbmb/Ultra-FineWeb

✨ MIT License
✨ 1T English + 120B Chinese tokens
✨ Efficient model-driven filtering
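Model-driven filtering of this kind is usually a score-and-threshold loop: a quality model scores every document and only high scorers are kept. Below is a toy sketch; the scorer is a stub heuristic standing in for a trained quality classifier, not OpenBMB's actual pipeline.

```python
# Toy sketch of model-driven filtering: score each document with a quality
# model and keep only those above a threshold. quality_score is a stub
# (length + keyword heuristic) standing in for a trained classifier.

def quality_score(doc):
    score = 0.0
    if len(doc.split()) >= 5:       # reward substantive documents
        score += 0.5
    if "tutorial" in doc.lower():   # reward educational content
        score += 0.5
    return score

def filter_corpus(docs, threshold=0.5):
    """Keep documents whose quality score exceeds the threshold."""
    return [d for d in docs if quality_score(d) > threshold]

docs = [
    "Buy now!!!",
    "A step-by-step tutorial on training language models with curated data.",
    "short spam",
]
print(filter_corpus(docs))  # only the tutorial sentence survives
```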
reacted to codelion's post with 🚀 16 days ago
Introducing Pivotal Token Search (PTS): A new technique for targeted LLM alignment

Excited to share Pivotal Token Search (PTS), a technique for identifying and optimizing critical decision points in LLM generations!

GitHub repository: https://github.com/codelion/pts

What is PTS?
PTS helps identify specific "pivotal tokens" that dramatically shift the probability of a successful generation. Unlike traditional DPO, which treats all tokens equally, PTS focuses optimization on the tokens that actually matter for success.

Inspired by Microsoft's recent Phi-4 paper (which used this technique to achieve SOTA reasoning with only 14B parameters), PTS is especially effective for:
- Mathematical reasoning
- Coding tasks
- Multi-step problem solving
- Any domain where specific decision points strongly impact outcomes
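The core criterion is simple to sketch. The toy below is a hypothetical illustration, not the implementation from the repo: `success_prob` is a stub standing in for estimating P(success | prefix) (in the real technique, by sampling completions from the model and checking them), and tokens whose addition shifts that estimate beyond a threshold are flagged as pivotal.

```python
# Minimal sketch of the pivotal-token criterion: a token is "pivotal" when
# appending it to the prefix shifts the estimated chance of a successful
# generation by at least `threshold`. success_prob returns an integer
# percentage here to keep the toy exact.

def find_pivotal_tokens(tokens, success_prob, threshold=20):
    """Return (index, token, delta) for tokens that shift success probability."""
    pivotal = []
    prefix = []
    p_before = success_prob(prefix)
    for i, tok in enumerate(tokens):
        prefix.append(tok)
        p_after = success_prob(prefix)
        delta = p_after - p_before
        if abs(delta) >= threshold:
            pivotal.append((i, tok, delta))
        p_before = p_after
    return pivotal

def toy_success_prob(prefix):
    # Pretend "therefore" sharply raises the success rate and "maybe" lowers it.
    p = 50
    for tok in prefix:
        if tok == "therefore":
            p = min(100, p + 30)
        elif tok == "maybe":
            p = max(0, p - 30)
    return p

print(find_pivotal_tokens(["the", "answer", "therefore", "is", "maybe", "42"],
                          toy_success_prob))
# → [(2, 'therefore', 30), (4, 'maybe', -30)]
```

The flagged (prefix, token) pairs are exactly what you would turn into preference pairs for DPO-style training.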

What we're releasing today: codelion/pivotal-token-search-68241145d8b8502122f3ce4f

1. Open-source code:
- Complete implementation of the PTS algorithm
- Data generation pipelines
- Usage examples and documentation

2. Hugging Face resources:
- Datasets collection: https://huggingface.co/datasets?other=pts
* Pre-generated preference pairs for various domains
* Ready to use in your DPO training pipelines

- Models collection: https://huggingface.co/models?other=pts
* Pre-trained models fine-tuned with PTS
* Specialized versions for different reasoning tasks

The algorithm is straightforward to implement and can significantly improve your model's reasoning capabilities. Check out the repository for details on getting started!

We welcome feedback, contributions, and collaborations. Let us know if you use PTS in your projects!
reacted to fdaudens's post with 👀 19 days ago
Hey! I built an AI Agent to query the FOIA API for a workshop at the Hacks/Hackers Summit in Baltimore and you can do it too!

It's a quick proof of concept I did for a workshop at the Hacks/Hackers Summit in Baltimore, demonstrating what agents can do, how to design workflows, and approaches to coding them.

- Slides https://docs.google.com/presentation/d/1lbf5K0yi213N7uxGnVKJdGWq2i0GayWj4vIcLkVlwD8/edit?usp=sharing
- Colab notebook https://colab.research.google.com/drive/1iw0qZyTni_6BcK0jj1x6gTfjm85NlaGv
- Gradio app: https://huggingface.co/spaces/JournalistsonHF/foia-agent
- MCP version to plug into Claude, Cursor, etc: https://huggingface.co/spaces/JournalistsonHF/foia-mcp-tools

Feel free to use the Gradio app for real FOIA requests, but also to improve it (I'm far from being a good coder) or adapt it for other countries.
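One way to structure such an agent is to expose the FOIA query as a plain function the agent framework can call as a tool. The endpoint and parameters below are illustrative assumptions, not the real FOIA API schema; the point is the tool shape, not the URL.

```python
from urllib.parse import urlencode

# Sketch of an agent tool wrapping a FOIA search endpoint. The base URL and
# parameter names are hypothetical placeholders, not the actual API.

def build_foia_search_url(query, agency=None,
                          base="https://api.example.gov/foia/search"):
    """Build the request URL the agent's FOIA tool would fetch."""
    params = {"q": query}
    if agency:
        params["agency"] = agency
    return f"{base}?{urlencode(params)}"

print(build_foia_search_url("drone contracts", agency="FAA"))
# → https://api.example.gov/foia/search?q=drone+contracts&agency=FAA
```

Keeping the tool a pure URL builder makes it easy to unit-test the agent's query construction separately from the network call.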

And shout-out to everyone who powered through the workshop! 😅
reacted to clem's post with 🚀🤗 about 1 month ago
Just crossed half a million public apps on Hugging Face. A new public app is created every minute these days 🤯🤯🤯

What's your favorite? http://hf.co/spaces
reacted to clem's post with 👍🔥 3 months ago
Super happy to welcome Nvidia as our latest enterprise hub customer. They have almost 2,000 team members using Hugging Face, and close to 20,000 followers of their org. Can't wait to see what they'll open-source for all of us in the coming months!

Nvidia's org: nvidia
Enterprise hub: https://huggingface.co/enterprise
reacted to elliesleightholm's post with 🤗 6 months ago
posted an update 6 months ago
reacted to Symbol-LLM's post with 🔥 6 months ago
🥳 Thrilled to introduce our recent efforts on bootstrapping VLMs for multi-modal chain-of-thought reasoning!

📕 Title: Vision-Language Models Can Self-Improve Reasoning via Reflection

🔗 Link: Vision-Language Models Can Self-Improve Reasoning via Reflection (2411.00855)

😇 Takeaways:

- We found that VLMs can self-improve reasoning performance through a reflection mechanism, and importantly, this approach can scale through test-time computing.

- Evaluations on comprehensive and diverse vision-language reasoning tasks are included!
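The reflection mechanism can be sketched abstractly as a generate-critique-revise loop. The `generate` and `critique` functions below are stubs standing in for VLM calls (not the paper's implementation), and the round budget is the knob that scales test-time compute.

```python
# Toy sketch of a reflection loop: generate an answer, let a critic reflect
# on it, revise, and repeat. More rounds = more test-time compute.

def generate(question):
    # Stub for the VLM's initial answer; "quality" is a stand-in signal.
    return {"answer": "draft", "quality": 1}

def critique(question, answer):
    # Stub for the VLM reflecting on its own answer: flag weak answers
    # and propose an improved one, or return None when satisfied.
    if answer["quality"] < 3:
        return {"answer": "revised", "quality": answer["quality"] + 1}
    return None

def self_improve(question, max_rounds=4):
    answer = generate(question)
    for _ in range(max_rounds):   # test-time compute scales with rounds
        revision = critique(question, answer)
        if revision is None:      # critic found no remaining issues
            break
        answer = revision
    return answer

print(self_improve("What is shown in the image?"))
# → {'answer': 'revised', 'quality': 3}
```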
reacted to cfahlgren1's post with ❤️ 6 months ago
You can clean and format datasets entirely in the browser with a few lines of SQL.

In this post, I replicate the process @mlabonne used to clean the new microsoft/orca-agentinstruct-1M-v1 dataset.

The cleaning process consists of:
- Joining the separate splits together and adding a split column
- Converting string messages into lists of structs
- Removing empty system prompts

https://huggingface.co/blog/cfahlgren1/the-beginners-guide-to-cleaning-a-dataset
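The same three steps can be sketched outside the browser with Python's stdlib `sqlite3` and `json` modules. The table and column names below are illustrative stand-ins, not the actual dataset schema.

```python
import json
import sqlite3

# Toy reproduction of the three cleaning steps (the post itself uses
# in-browser SQL). Schema and data here are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE train (system TEXT, messages TEXT);
    CREATE TABLE test  (system TEXT, messages TEXT);
    INSERT INTO train VALUES ('You are helpful.', '[{"role":"user","content":"hi"}]');
    INSERT INTO train VALUES ('', '[{"role":"user","content":"empty system"}]');
    INSERT INTO test  VALUES ('Be concise.', '[{"role":"user","content":"yo"}]');
""")

# Step 1: join the splits together and add a split column.
rows = conn.execute("""
    SELECT 'train' AS split, system, messages FROM train
    UNION ALL
    SELECT 'test'  AS split, system, messages FROM test
""").fetchall()

# Step 2: parse string messages into lists of structs.
# Step 3: drop rows with an empty system prompt.
cleaned = [
    {"split": split, "system": system, "messages": json.loads(messages)}
    for split, system, messages in rows
    if system.strip() != ""
]
print(len(cleaned))  # → 2 (the empty-system row is dropped)
```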

Here's his new cleaned dataset: mlabonne/orca-agentinstruct-1M-v1-cleaned
replied to rizzware's post 9 months ago

Hi! Lighteval makes it easy to compare model enhancements, such as different prompting strategies or fine-tunes. You can change the prompts for a given task, or even create a new task with a different prompt, generation size, stop words, etc.
Everything you need to create a new task is listed in the Lighteval README.
Do you have a more specific use case in mind, so that we can help you further?

reacted to yunusserhat's post with 🚀 10 months ago
Hello everyone,

I am pleased to announce that I have founded the University of Glasgow organization on Hugging Face. If you are affiliated with the University of Glasgow, or have a relative who is, you can join through the relevant link.

UniversityofGlasgow
reacted to fdaudens's post with ❤️ 11 months ago
Look at that 👀

Current benchmarks have become too easy for recent models, much like grading high school students on middle-school problems makes little sense. So the team worked on a new version of the Open LLM Leaderboard with new benchmarks.

Stellar work from @clefourrier @SaylorTwift and the team!

👉 Read the blog post: open-llm-leaderboard/blog
👉 Explore the leaderboard: open-llm-leaderboard/open_llm_leaderboard
reacted to fdaudens's post with 👍 12 months ago
Finally, a good handwriting recognition tool?

I'm impressed by Microsoft's latest vision model, Florence-2 microsoft/Florence-2-large

The results are really good, boasting a remarkably low error rate, as you can see with this letter from George W. Bush to Bill Clinton!

🚀🔒 What's even better? You can run it locally on your device, ensuring your data stays 100% safe.

👉 Try it out here: gokaygokay/Florence-2
reacted to MrOvkill's post with 🔥 12 months ago
Hello!

I've made a little evaluation dataset for LLMs that requires advanced and convoluted logical reasoning. It's composed of 81 unique paradoxes, though admittedly a couple fall in the same category (absolutes). It's available here: MrOvkill/pdox

**Update**: I have upgraded the dataset to v3 (don't worry about v2, it can be forgotten) and placed it in a separate repo here:
MrOvkill/pdox-reversed

Enjoy & Have fun!
-<3
reacted to thomwolf's post with 🚀🔥 about 1 year ago
[New crazy blog post alert] We are releasing an extensive blog post on the science of creating high-quality web-scale datasets, detailing all the steps and learnings behind our recent 15-trillion-token 🍷FineWeb release

Inspired by the distill.pub interactive-graphics papers, we set out to write the most extensive, enjoyable, and in-depth tech report we could, so prepare for a 45-min read with interactive graphics and all.

And that's not all: in this article we also introduce 📚FineWeb-Edu, a filtered subset of Common Crawl with 1.3T tokens containing only web pages with very high educational content. To our knowledge, FineWeb-Edu outperforms all openly released web-scale datasets by a significant margin on knowledge- and reasoning-intensive benchmarks like MMLU, ARC, and OpenBookQA.

We also make a number of surprising observations on the "quality" of the internet itself, which may challenge some general assumptions about web data (not saying more, I'll let you draw your own conclusions ;))

HuggingFaceFW/blogpost-fineweb-v1
reacted to clem's post with 🤗 about 1 year ago