David Berenstein

davidberenstein1957

AI & ML interests

Everything NLP and knowledge graphs

davidberenstein1957's activity

reacted to their post with πŸ‘€ 5 days ago
view post
Post
663
The Synthetic Data Generator now directly integrates with Argilla, so you can generate and curate your own high-quality datasets from pure natural language!

Up next -> include dataset generation for text classification.
Other suggestions? Let us know.

Space: argilla/synthetic-data-generator


reacted to their post with βž•β€οΈ 5 days ago
view post
Post
1663
You can now build a custom text classifier without days of human labeling!

πŸ‘ LLMs work reasonably well as text classifiers.
πŸ‘Ž They are expensive to run at scale and their performance drops in specialized domains.

πŸ‘ Purpose-built classifiers have low latency and can potentially run on CPU.
πŸ‘Ž They require labeled training data.

Combine the best of both worlds: the automatic labeling capabilities of LLMs and the high-quality annotations from human experts to train and deploy a specialized model.

Blog: https://huggingface.co/blog/sdiazlor/custom-text-classifier-ai-human-feedback
posted an update 5 days ago
Import any dataset from the Hub and configure your labeling tasks without needing any code!

Really excited about extending the Hugging Face Hub integration with many more streamlined features and workflows. We would love to hear your feedback and ideas, so don't be shy and reach out 🫶🏽

https://huggingface.co/blog/argilla-ui-hub
reacted to their post with πŸ‘€πŸš€πŸ€— 6 days ago
view post
Post
2942
Vector Search (most) datasets on the Hugging Face Hub πŸ”¦

Powered by: Polars, DuckDB, Gradio and model2vec (lightning-fast embeddings by StΓ©phan Tulkens).

Should work fast enough for datasets up to 100K.

davidberenstein1957/vectorsearch-hub-datasets
posted an update 6 days ago
Vector Search (most) datasets on the Hugging Face Hub πŸ”¦

Powered by: Polars, DuckDB, Gradio and model2vec (lightning-fast embeddings by StΓ©phan Tulkens).

Should work fast enough for datasets of up to 100K rows.

davidberenstein1957/vectorsearch-hub-datasets
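
A minimal sketch of the same idea, assuming a parquet-backed Hub dataset with a `text` column; the dataset path and the `minishlab/potion-base-8M` model choice are illustrative, not necessarily what the Space uses:

```python
import numpy as np
import polars as pl
from model2vec import StaticModel

# Load a dataset straight from the Hub via Polars' hf:// support
# (path and column name are placeholders).
df = pl.read_parquet("hf://datasets/<user>/<dataset>/**/*.parquet")
docs = df["text"].to_list()

# Static model2vec embeddings: no GPU needed, very fast on CPU.
model = StaticModel.from_pretrained("minishlab/potion-base-8M")
doc_emb = model.encode(docs)  # (n_docs, dim) numpy array

def search(query: str, k: int = 5):
    # Brute-force cosine similarity; fine for datasets up to ~100K rows.
    q = model.encode([query])[0]
    sims = doc_emb @ q / (np.linalg.norm(doc_emb, axis=1) * np.linalg.norm(q) + 1e-9)
    return [(docs[i], float(sims[i])) for i in np.argsort(-sims)[:k]]

print(search("machine learning"))
```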
posted an update 11 days ago
⚡️ LLMs do a good job at NER, but don't you want to learn how to do more with less?

Go from 🐒 -> πŸ‡

If you want a small model to perform well on your problem, you need to fine-tune it.

Bootstrap with a teacher model (see the sketch below).

Correct potential mistakes to get high-quality data.

Fine-tune your student model.

End up with a model that is more accurate and more efficient.

Free signup: https://lu.ma/zx2t7irs
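
A rough sketch of the bootstrap step, assuming GLiNER as the zero-shot teacher (the post doesn't name one); the checkpoint and entity types are illustrative:

```python
from gliner import GLiNER

# Zero-shot NER teacher; any GLiNER checkpoint works here.
teacher = GLiNER.from_pretrained("urchade/gliner_base")
labels = ["person", "organization", "location"]

texts = ["Tim Cook announced new products at Apple's Cupertino campus."]
for text in texts:
    # Teacher suggestions, to be corrected by annotators before
    # fine-tuning a smaller student model on the cleaned labels.
    for ent in teacher.predict_entities(text, labels, threshold=0.5):
        print(ent["start"], ent["end"], ent["text"], "->", ent["label"])
```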
posted an update 23 days ago
You can now build a custom text classifier without days of human labeling!

πŸ‘ LLMs work reasonably well as text classifiers.
πŸ‘Ž They are expensive to run at scale and their performance drops in specialized domains.

πŸ‘ Purpose-built classifiers have low latency and can potentially run on CPU.
πŸ‘Ž They require labeled training data.

Combine the best of both worlds: the automatic labeling capabilities of LLMs and the high-quality annotations from human experts to train and deploy a specialized model.

Blog: https://huggingface.co/blog/sdiazlor/custom-text-classifier-ai-human-feedback
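
A minimal sketch of the final training step, assuming the LLM-suggested, human-corrected labels have already been exported; the texts and labels below are toy placeholders:

```python
from datasets import Dataset
from setfit import SetFitModel, Trainer, TrainingArguments

# Toy training set standing in for the curated LLM + human labels
# (0 = negative, 1 = positive).
train_ds = Dataset.from_dict({
    "text": [
        "great battery life",
        "fast shipping, works as advertised",
        "screen cracked after a week",
        "support never answered my emails",
    ],
    "label": [1, 1, 0, 0],
})

# A small sentence-transformer body keeps latency low, even on CPU.
model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
trainer = Trainer(model=model, args=TrainingArguments(num_epochs=1), train_dataset=train_ds)
trainer.train()

print(model.predict(["the battery died within days"]))
```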
reacted to nroggendorff's post with 😎 23 days ago
100 followers? When did that happen?
reacted to m-ric's post with πŸ‘€ 23 days ago
By far the coolest release of the day!
> The Open LLM Leaderboard, the most comprehensive suite for comparing open LLMs on many benchmarks, just released a comparator tool that lets you dig into the details of the differences between any two models.

Here's me checking how the new Llama-3.1-Nemotron-70B that we've heard so much about compares to the original Llama-3.1-70B. 🤔🔎

Try it out here πŸ‘‰ open-llm-leaderboard/comparator
posted an update 25 days ago
The Synthetic Data Generator now directly integrates with Argilla, so you can generate and curate your own high-quality datasets from pure natural language!

Up next -> dataset generation for text classification.
Other suggestions? Let us know.

Space: argilla/synthetic-data-generator


posted an update 25 days ago
Don't use an LLM when you can use a much cheaper model.

The problem is that no one tells you how to actually do it.

Just picking a pre-trained model (e.g., BERT) and throwing it at your problem won't work!

If you want a small model to perform well on your problem, you need to fine-tune it.

And to fine-tune it, you need data.

The good news is that you don't need a lot of data, just high-quality data for your specific problem.

In the latest livestream, I showed you guys how to get started with Argilla on the Hub! Hope to see you at the next one.

https://www.youtube.com/watch?v=BEe7shiG3rY
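
A minimal sketch of getting started, using the Argilla Python SDK (2.x) and assuming an Argilla Space is already deployed on the Hub; the URL, API key, and label names are placeholders:

```python
import argilla as rg

# Connect to an Argilla instance running on a Hugging Face Space.
client = rg.Argilla(api_url="https://<user>-argilla.hf.space", api_key="<api-key>")

# Configure the labeling task: one text field, one label question.
settings = rg.Settings(
    fields=[rg.TextField(name="text")],
    questions=[rg.LabelQuestion(name="sentiment", labels=["positive", "negative"])],
)
dataset = rg.Dataset(name="my-classification-task", settings=settings, client=client)
dataset.create()

# Log a few records to start labeling in the UI.
dataset.records.log([
    {"text": "I love this product"},
    {"text": "Support was terrible"},
])
```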
posted an update about 1 month ago
Thursday 10 October at 17:00 CEST, I will show a good way to get started with a text classification project on the Hugging Face Hub with Argilla and SetFit.

Sign up here: https://lu.ma/31mecp34
reacted to their post with πŸ”₯πŸ€— about 1 month ago
posted an update about 1 month ago
posted an update about 1 month ago
We've got a number of great community meetups coming up again where we'll discuss the basics of getting started and using Argilla for TextCat, TokenCat/NER, and RAG. We will walk you through common scenarios and everything you might need to know to get your projects started.

The first meetup coming up: setting up a text classification project using Argilla and SetFit!

Deploy Argilla on Spaces
Vibe check your dataset
Configure and create an Argilla dataset
Add records
Add zero-shot suggestions (see the sketch after this list)
Evaluate model suggestions in Argilla
Train a SetFit model

Hope to see you all there; we're looking forward to your questions and AI use cases. Don't be shy about bringing your own issues and questions to the table. We would love to answer them.

Sign up here: https://lu.ma/31mecp34
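
A hedged sketch of the zero-shot suggestion step, assuming the dataset configured as in the earlier sketch and a generic transformers zero-shot pipeline; the model and label names are illustrative:

```python
import argilla as rg
from transformers import pipeline

# Generic zero-shot classifier used as the "suggester".
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
labels = ["positive", "negative"]

records = []
for text in ["The onboarding flow was painless.", "The app crashes constantly."]:
    pred = classifier(text, candidate_labels=labels)
    records.append(rg.Record(
        fields={"text": text},
        # Pre-fill the UI with the model's top label for annotators to verify.
        suggestions=[rg.Suggestion(question_name="sentiment", value=pred["labels"][0])],
    ))

# dataset.records.log(records)  # push to the Argilla dataset created earlier
```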
reacted to Jaward's post with πŸ”₯ about 1 month ago
This is supercool!!
LLaVA-3D: adds 3D-awareness to LMMs (large multimodal models) without compromising their 2D understanding capabilities.

Method: they developed a unified architecture that maps 2D CLIP patch features to their corresponding positions in 3D space, enabling joint 2D and 3D vision-language instruction tuning.

Project: https://zcmax.github.io/projects/LLaVA-3D/
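
An illustrative sketch (not the authors' code) of that mapping: unproject each 2D patch center into 3D camera coordinates using a depth map and camera intrinsics, carrying the patch feature along; all shapes and names are assumptions:

```python
import numpy as np

def lift_patches_to_3d(patch_feats, depth, K, patch=14):
    """patch_feats: (H//patch, W//patch, D) 2D CLIP patch features,
    depth: (H, W) depth map, K: 3x3 camera intrinsics."""
    hp, wp, _ = patch_feats.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    points, feats = [], []
    for i in range(hp):
        for j in range(wp):
            # Pixel coordinates of the patch center.
            v, u = i * patch + patch // 2, j * patch + patch // 2
            z = depth[v, u]
            # Pinhole unprojection to camera-frame 3D coordinates.
            points.append([(u - cx) * z / fx, (v - cy) * z / fy, z])
            feats.append(patch_feats[i, j])
    return np.asarray(points), np.asarray(feats)  # (N, 3), (N, D)
```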