de-Rodrigo

AI & ML interests

Synthetic Datasets, Multimodal LLMs, Computer Vision

Recent Activity

updated a model about 10 hours ago
de-Rodrigo/idefics2-merit
updated a model 1 day ago
de-Rodrigo/donut-merit
updated a model 1 day ago
de-Rodrigo/idefics2-merit

Organizations

The Hidden Gallery, CICLAB Comillas ICAI

de-Rodrigo's activity

posted an update 16 days ago
MERIT Dataset πŸŽ’πŸ“ƒπŸ† Updates: The Token Classification Version is Now Live on the Hub!

This new version extends the previous dataset by providing richer labels that include word bounding boxes alongside the already available images. πŸš€

We can't wait to see how you use this update! Give it a try, and let us know your thoughts, questions, or any cool projects you build with it. πŸ’‘

Resources:

- Dataset: de-Rodrigo/merit
- Code and generation pipeline: https://github.com/nachoDRT/MERIT-Dataset
- Paper: The MERIT Dataset: Modelling and Efficiently Rendering Interpretable Transcripts (2409.00447)
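As an illustration of one way the new word boxes could be consumed (a hedged sketch — the dataset's exact field names aren't shown here), token-classification models in the LayoutLM family expect boxes normalized to a 0-1000 coordinate range:

```python
def normalize_box(box, width, height):
    """Scale a pixel-space (x1, y1, x2, y2) word box to the 0-1000
    range that LayoutLM-style token-classification models expect."""
    x1, y1, x2, y2 = box
    return [
        int(1000 * x1 / width),
        int(1000 * y1 / height),
        int(1000 * x2 / width),
        int(1000 * y2 / height),
    ]

# A word box on a hypothetical 850x1100 page scan
print(normalize_box([85, 110, 170, 220], 850, 1100))  # [100, 100, 200, 200]
```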
reacted to rwightman's post with πŸ”₯ 5 months ago
The timm leaderboard timm/leaderboard has been updated with the ability to select different hardware benchmark sets: RTX4090, RTX3090, and two different CPUs, along with some NCHW/NHWC layout and torch.compile (dynamo) variations.

Also worth pointing out, there are three rather newish 'test' models that you'll see at the top of any samples/sec comparison:
* test_vit ( timm/test_vit.r160_in1k)
* test_efficientnet ( timm/test_efficientnet.r160_in1k)
* test_byobnet ( timm/test_byobnet.r160_in1k, a mix of resnet, darknet, effnet/regnet like blocks)

They are < 0.5M params, insanely fast, and originally intended for unit testing with real weights. They have awful ImageNet top-1 accuracy; it's rare for anyone to bother training a model this small on ImageNet (the classifier is roughly 30-70% of the param count!). However, they are FAST on very limited hardware, and you can fine-tune them well on small data. Could be the model you're looking for?
reacted to yjernite's post with ❀️ 5 months ago
πŸ‘·πŸ½β€β™€οΈπŸ“šπŸ”¨ Announcing the Foundation Model Development Cheatsheet!

My first 🤗Post🤗 ever to announce the release of a fantastic collaborative resource to support model developers across the full development stack: the FM Development Cheatsheet, available here: https://fmcheatsheet.org/

The cheatsheet is a growing database of the many crucial resources coming from open research and development efforts to support the responsible development of models. This new resource highlights essential yet often underutilized tools to make it as easy as possible for developers to adopt best practices, covering, among other aspects:
πŸ§‘πŸΌβ€πŸ€β€πŸ§‘πŸΌ data selection, curation, and governance;
πŸ“– accurate and limitations-aware documentation;
⚑ energy efficiency throughout the training phase;
πŸ“Š thorough capability assessments and risk evaluations;
🌏 environmentally and socially conscious deployment strategies.

We strongly encourage developers working on creating and improving models to make full use of the tools listed here, and to help keep the resource up to date by adding the resources that you yourself have developed or found useful in your own practice πŸ€—

Congrats to all the participants in this effort for the release! Read more about it from:
@Shayne - https://twitter.com/ShayneRedford/status/1763215814860186005
@hails and @stellaathena - https://blog.eleuther.ai/fm-dev-cheatsheet/
@alon-albalak - http://nlp.cs.ucsb.edu/blog/a-new-guide-for-the-responsible-development-of-foundation-models.html

And also to @gabrielilharco @sayashk @kklyman @kylel @mbrauh @fauxneticien @avi-skowron @Bertievidgen Laura Weidinger, Arvind Narayanan, @VictorSanh @Davlan @percyliang Rishi Bommasani, @breakend @sasha πŸ”₯
reacted to maxiw's post with πŸ‘ 5 months ago
The new Qwen2-VL models seem to perform quite well at object detection. You can prompt them to respond with bounding boxes in a reference frame of 1k x 1k pixels and scale those boxes to the original image size.

You can try it out with my space maxiw/Qwen2-VL-Detection
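The rescaling step described above is straightforward; a minimal sketch, assuming the model returns `(x1, y1, x2, y2)` boxes in the 1000x1000 reference frame:

```python
def scale_box(box, img_w, img_h, ref=1000):
    """Map a box from the model's ref x ref frame to original image pixels."""
    x1, y1, x2, y2 = box
    sx, sy = img_w / ref, img_h / ref
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)

# e.g. a predicted box mapped onto a 1920x1080 image
print(scale_box((250, 100, 750, 900), 1920, 1080))
```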

Β·
reacted to andito's post with 😎 5 months ago
πŸš€ Introducing Hugging Face's Multilingual Speech-to-Speech! 🎀
πŸ’¬Our modular, cross-platform pipeline to run GPT4o-like experiences on device can now seamlessly switch languages mid-conversation with an imperceptible 100ms delay.

🌟 Building on an amazing early reception with 2600 stars on GitHub 🌟
πŸš€ We are expanding the library to support multiple languages
πŸ”₯ Try it out with a flag: --language fr
🀯 Or don't set the flag and let the system detect the language

πŸ’‘ What feature should we add next?
posted an update 5 months ago
A few weeks ago, we uploaded the MERIT Dataset πŸŽ’πŸ“ƒπŸ† into Hugging Face πŸ€—!

Now, we are excited to share the MERIT Dataset paper via arXiv! 📃💫
The MERIT Dataset: Modelling and Efficiently Rendering Interpretable Transcripts (2409.00447)

The MERIT Dataset is a fully synthetic, labeled dataset created for training and benchmarking LLMs on Visually Rich Document Understanding tasks. It is also designed to help detect biases and improve interpretability in LLMs, areas where we are actively working. 🔧🔨

MERIT contains synthetically rendered students' transcripts of records from different schools, in English and Spanish. We plan to expand the dataset into different contexts (synthetic medical/insurance documents, synthetic IDs, etc.). Want to collaborate? Do you have any feedback? 🧐

Resources:

- Dataset: de-Rodrigo/merit
- Code and generation pipeline: https://github.com/nachoDRT/MERIT-Dataset

PS: We are grateful to Hugging Face 🤗 for providing the fantastic tools and resources we find on the platform and, more specifically, to @nielsr for sharing the fine-tuning/inference scripts we have used in our benchmark.
reacted to danaaubakirova's post with πŸš€ 5 months ago