AdinaY
posted an update 5 days ago
Data quality is the new frontier for LLM performance.

Ultra-FineWeb 📊 a high-quality bilingual dataset released by OpenBMB

openbmb/Ultra-FineWeb

✨ MIT License
✨ 1T English + 120B Chinese tokens
✨ Efficient model-driven filtering

Where's the data in the dataset... lol


coming soon