de-Rodrigo

AI & ML interests

Synthetic Datasets, Multimodal LLMs, Computer Vision

Recent Activity

updated a model about 10 hours ago
de-Rodrigo/idefics2-merit
updated a model 1 day ago
de-Rodrigo/donut-merit
updated a model 1 day ago
de-Rodrigo/idefics2-merit

Organizations

The Hidden Gallery, CICLAB Comillas ICAI

de-Rodrigo's activity

posted an update 16 days ago
MERIT Dataset πŸŽ’πŸ“ƒπŸ† Updates: The Token Classification Version is Now Live on the Hub!

This new version extends the previous dataset by providing richer labels that include word bounding boxes alongside the already available images. πŸš€

We can't wait to see how you use this update! Give it a try, and let us know your thoughts, questions, or any cool projects you build with it. πŸ’‘

Resources:

- Dataset: de-Rodrigo/merit
- Code and generation pipeline: https://github.com/nachoDRT/MERIT-Dataset
- Paper: The MERIT Dataset: Modelling and Efficiently Rendering Interpretable Transcripts (2409.00447)
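As an illustration of one way the new word boxes could be consumed (a hedged sketch — the dataset's exact field names aren't shown here), token-classification models in the LayoutLM family expect boxes normalized to a 0-1000 coordinate range:

```python
def normalize_box(box, width, height):
    """Scale a pixel-space (x1, y1, x2, y2) word box to the 0-1000
    range that LayoutLM-style token-classification models expect."""
    x1, y1, x2, y2 = box
    return [
        int(1000 * x1 / width),
        int(1000 * y1 / height),
        int(1000 * x2 / width),
        int(1000 * y2 / height),
    ]

# A word box on a hypothetical 850x1100 page scan
print(normalize_box([85, 110, 170, 220], 850, 1100))  # [100, 100, 200, 200]
```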
reacted to rwightman's post with πŸ”₯ 5 months ago
The timm leaderboard timm/leaderboard has been updated with the ability to select different hardware benchmark sets: RTX4090, RTX3090, and two different CPUs, along with some NCHW/NHWC layout and torch.compile (dynamo) variations.

Also worth pointing out, there are three rather newish 'test' models that you'll see at the top of any samples/sec comparison:
* test_vit ( timm/test_vit.r160_in1k)
* test_efficientnet ( timm/test_efficientnet.r160_in1k)
* test_byobnet ( timm/test_byobnet.r160_in1k, a mix of resnet, darknet, effnet/regnet like blocks)

They are < 0.5M params, insanely fast, and originally intended for unit testing with real weights. They have awful ImageNet top-1 accuracy; it's rare for anyone to bother training a model this small on ImageNet (the classifier is roughly 30-70% of the param count!). However, they are FAST on very limited hardware, and you can fine-tune them well on small data. Could be the model you're looking for?
reacted to yjernite's post with ❀️ 5 months ago
πŸ‘·πŸ½β€β™€οΈπŸ“šπŸ”¨ Announcing the Foundation Model Development Cheatsheet!

My first 🤗Post🤗 ever to announce the release of a fantastic collaborative resource to support model developers across the full development stack: the FM Development Cheatsheet, available here: https://fmcheatsheet.org/

The cheatsheet is a growing database of the many crucial resources coming from open research and development efforts to support the responsible development of models. This new resource highlights essential yet often underutilized tools to make it as easy as possible for developers to adopt best practices, covering, among other aspects:
πŸ§‘πŸΌβ€πŸ€β€πŸ§‘πŸΌ data selection, curation, and governance;
πŸ“– accurate and limitations-aware documentation;
⚑ energy efficiency throughout the training phase;
πŸ“Š thorough capability assessments and risk evaluations;
🌏 environmentally and socially conscious deployment strategies.

We strongly encourage developers working on creating and improving models to make full use of the tools listed here, and to help keep the resource up to date by adding the resources that you yourself have developed or found useful in your own practice πŸ€—

Congrats to all the participants in this effort for the release! Read more about it from:
@Shayne - https://twitter.com/ShayneRedford/status/1763215814860186005
@hails and @stellaathena - https://blog.eleuther.ai/fm-dev-cheatsheet/
@alon-albalak - http://nlp.cs.ucsb.edu/blog/a-new-guide-for-the-responsible-development-of-foundation-models.html

And also to @gabrielilharco @sayashk @kklyman @kylel @mbrauh @fauxneticien @avi-skowron @Bertievidgen Laura Weidinger, Arvind Narayanan, @VictorSanh @Davlan @percyliang Rishi Bommasani, @breakend @sasha πŸ”₯
reacted to maxiw's post with πŸ‘ 5 months ago
The new Qwen2-VL models seem to perform quite well at object detection. You can prompt them to respond with bounding boxes in a reference frame of 1k x 1k pixels and scale those boxes to the original image size.

You can try it out with my space maxiw/Qwen2-VL-Detection
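The rescaling step described above is straightforward; a minimal sketch, assuming the model returns `(x1, y1, x2, y2)` boxes in the 1000x1000 reference frame:

```python
def scale_box(box, img_w, img_h, ref=1000):
    """Map a box from the model's ref x ref frame to original image pixels."""
    x1, y1, x2, y2 = box
    sx, sy = img_w / ref, img_h / ref
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)

# e.g. a predicted box mapped onto a 1920x1080 image
print(scale_box((250, 100, 750, 900), 1920, 1080))
```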

Β·
reacted to andito's post with 😎 5 months ago
πŸš€ Introducing Hugging Face's Multilingual Speech-to-Speech! 🎀
πŸ’¬Our modular, cross-platform pipeline to run GPT4o-like experiences on device can now seamlessly switch languages mid-conversation with an imperceptible 100ms delay.

🌟 Building on an amazing early reception with 2600 stars on GitHub 🌟
πŸš€ We are expanding the library to support multiple languages
πŸ”₯ Try it out with a flag: --language fr
🀯 Or don't set the flag and let the system detect the language

πŸ’‘ What feature should we add next?
posted an update 5 months ago
A few weeks ago, we uploaded the MERIT Dataset πŸŽ’πŸ“ƒπŸ† into Hugging Face πŸ€—!

Now, we are excited to share the MERIT Dataset paper via arXiv! 📃💫
The MERIT Dataset: Modelling and Efficiently Rendering Interpretable Transcripts (2409.00447)

The MERIT Dataset is a fully synthetic, labeled dataset created for training and benchmarking LLMs on Visually Rich Document Understanding tasks. It is also designed to help detect biases and improve interpretability in LLMs, areas where we are actively working. 🔧🔨

MERIT contains synthetically rendered students' transcripts of records from different schools, in English and Spanish. We plan to expand the dataset into different contexts (synthetic medical/insurance documents, synthetic IDs, etc.). Want to collaborate? Do you have any feedback? 🧐

Resources:

- Dataset: de-Rodrigo/merit
- Code and generation pipeline: https://github.com/nachoDRT/MERIT-Dataset

PS: We are grateful to Hugging Face 🤗 for providing the fantastic tools and resources we find on the platform and, more specifically, to @nielsr for sharing the fine-tuning/inference scripts we have used in our benchmark.
reacted to danaaubakirova's post with πŸš€ 5 months ago