I <3 your model cards
Victor Mustar PRO
victor
AI & ML interests
Building the UX of this website
Recent Activity
liked a model
about 6 hours ago
openfree/flux-chatgpt-ghibli-lora
upvoted a paper
about 8 hours ago
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float
liked a dataset
about 12 hours ago
starvector/svg-stack
Organizations
victor's activity

replied to prithivMLmods's post about 19 hours ago

reacted to prithivMLmods's post with 🔥 about 19 hours ago
Post
1013
Dropping domain-specific downstream image classification and content moderation models, including anime image type classification, GeoSceneNet, indoor-outdoor scene classification, and black-and-white vs. colored image classification, along with their datasets. 🔥
╰┈➤Models :
+ GeoSceneNet : prithivMLmods/Multilabel-GeoSceneNet
+ IndoorOutdoorNet : prithivMLmods/IndoorOutdoorNet
+ B&W vs Colored : prithivMLmods/BnW-vs-Colored-Detection
+ Anime Image Type : prithivMLmods/Anime-Classification-v1.0
+ Multilabel Portrait : prithivMLmods/Multilabel-Portrait-SigLIP2
╰┈➤Datasets :
- GeoSceneNet : prithivMLmods/Multilabel-GeoSceneNet-16K
- IndoorOutdoorNet : prithivMLmods/IndoorOutdoorNet-20K
- BnW vs Colored : prithivMLmods/BnW-vs-Colored-10K
- Multilabel Portrait : prithivMLmods/Multilabel-Portrait-18K
╰┈➤Collections :
> Multilabel Image Classification Datasets : prithivMLmods/multilabel-image-classification-datasets-6809aa64637f45d4c47fa6ca
> Model Collection : prithivMLmods/siglip2-content-filters-models-v2-68053a958c42ef17a3a3f4d1
Note: The anime scene type dataset is not listed here because it is private and accessible only to members of the DeepGHS organization.
For raw ZIP files or more information about the datasets, visit: https://www.kaggle.com/prithivsakthiur/datasets
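Since the model collection above is SigLIP2-based, a minimal sketch of how one of these classifiers might be called with the transformers image-classification pipeline could look like this (the model id is taken from the list above; the image path and printed labels are illustrative assumptions, not the author's exact usage):

```python
from transformers import pipeline

# Minimal sketch: load one of the classifiers listed above via the
# standard image-classification pipeline. Assumes the repo ships a
# regular classification head/config; "indoor.jpg" is a placeholder.
classifier = pipeline(
    "image-classification",
    model="prithivMLmods/IndoorOutdoorNet",
)

for pred in classifier("indoor.jpg"):  # local path or URL to an image
    print(f"{pred['label']}: {pred['score']:.3f}")
```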

reacted to orasul's post with 👍 about 19 hours ago
Post
1380
hi, it is deki, and now I am open sourced.
𝗱𝗲𝗸𝗶, an Android AI agent powered by an open-source ML model, has been fully open-sourced.
It understands what’s on your screen and can perform tasks based on your voice or text commands.
Some examples:
* "Write my friend "some_name" in WhatsApp that I'll be 15 minutes late"
* "Open Twitter in the browser and write a post about something"
* "Read my latest notifications"
* "Write a linkedin post about something"
Currently, it works only on Android, but support for other operating systems is planned.
The ML and backend code has also been fully open-sourced.
Video prompt example:
"Open linkedin, tap post and write: hi, it is deki, and now I am open sourced. But don't send, just return"
License: GPLv3
You can find other AI agent demos and usage examples, such as code generation and object detection, on GitHub.
GitHub: https://github.com/RasulOs/deki

reacted to YerbaPage's post with 🔥 2 days ago
Post
1865
Curated list of **Repository-level Code Generation** papers & benchmarks! 🔥
Stay ahead with the latest in:
✅ Repo-level Issue Resolution (SWE-bench, Agents)
✅ Repo-level Code Completion (Repo understanding)
✅ Datasets & Benchmarks
👉 Check it out: https://github.com/YerbaPage/Awesome-Repo-Level-Code-Generation 🔥

reacted to ProCreations's post with 🔥 2 days ago
Post
1916
Come check out my new dataset, Mistake to Meaning, an attempt to help smaller models understand user typos better. Hope you enjoy it!
ProCreations/Mistake-To-Meaning
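A quick, hedged sketch of pulling the dataset with 🤗 datasets to inspect its schema (the split name is an assumption, and the column names are whatever the dataset card defines; check the card for the real layout):

```python
from datasets import load_dataset

# Assumption: a single "train" split; we just inspect the first record
# rather than guessing column names.
ds = load_dataset("ProCreations/Mistake-To-Meaning", split="train")
print(ds)      # features and row count
print(ds[0])   # first record
```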

reacted to davidberenstein1957's post with 🚀 3 days ago
Post
2050
🔥 Announcing FLUX-Juiced: The Fastest Image Generation Endpoint (2.6x faster)!
Optimisations are widely applied and can reduce inference time, but their impact on quality often remains unclear, so we decided to challenge the status quo and create our own optimised version of FLUX.1[dev] called FLUX-juiced.
Blog: https://huggingface.co/blog/PrunaAI/flux-fastest-image-generation-endpoint

reacted to AdinaY's post with 🔥 3 days ago
Post
2676
MAGI-1 🪄 the autoregressive diffusion video model, released by Sand AI
sand-ai/MAGI-1
✨ 24B with Apache 2.0
✨ Strong temporal consistency
✨ Benchmark-topping performance

reacted to shekkizh's post with 👀 3 days ago
Post
1679
Think AGI is just around the corner? Not so fast.
When OpenAI released its Computer-Using Agent (CUA) API, I happened to be playing Wordle 🧩 and thought, why not see how the model handles it?
Spoiler: Wordle turned out to be a surprisingly effective benchmark.
So Romain Cosentino Ph.D. and I dug in and analyzed the results of several hundred runs.
🔑 Takeaways
1️⃣ Even the best computer-using models struggle with simple, context-dependent tasks.
2️⃣ Visual perception and reasoning remain major hurdles for multimodal agents.
3️⃣ Real-world use cases reveal significant gaps between hype and reality. Perception accuracy drops to near zero by the last turn 📉
🔗 Read our arXiv paper for more details: https://www.arxiv.org/abs/2504.15434

reacted to ProCreations's post with 🔥 3 days ago
Post
1332
🤖 IntellIte‑Chat v1.0 (Coming Soon)
A compact chat model built for speed, efficiency, and simplicity.
IntellIte‑Chat v1.0 is the debut model in the IntellIte series—a lightweight conversational transformer crafted to be fast, memory-efficient, and easy to work with. It’s designed for devs and enthusiasts who want sharp results without huge resource demands.
No fluff. Just chats.
⸻
🎯 Target Specs
• Pretraining Tokens: 4 billion
• Context Length: 16,384 tokens
⸻
🧠 Parameters & Architecture
• Model Size: ~100M parameters
• Architecture: Modified GPT-NeoX
• Focus: Chat performance with low latency and efficient memory use
⸻
🧃 Support the Build
Every dollar you donate is an extra amount of VRAM I get to work with. 😅
This project is fully independent and entirely self-funded. If you want to help bring it to life:
👉 https://buymeacoffee.com/procreations
⸻
💛 Early Supporters
All early supporters will be credited here when the model launches.
Even the smallest support means the world and pushes this project forward.
Special thanks to:
Maybe you?
⸻
🛠️ Development Status
• Architecture Design: Completed ✅
• Dataset Planning: Completed ✅
• Training Code: Near Completion 🛠️
• Training Launch: Starting Soon ⏳
• Evaluation Setup: Coming soon 🔜
• Final Release: Coming soon 🔜
⸻
Built to chat. Built on a budget. Built to prove what small models can do.
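As a rough illustration of the target shape, here is one way a ~100M-parameter modified GPT-NeoX with a 16,384-token context could be configured in transformers; every hyperparameter below is an assumption chosen to land near the stated specs, not IntellIte-Chat's actual recipe:

```python
from transformers import GPTNeoXConfig, GPTNeoXForCausalLM

# Illustrative sketch only: hidden size, depth, head count, vocab and
# FFN width are guesses picked to land near ~100M parameters.
config = GPTNeoXConfig(
    vocab_size=32_000,
    hidden_size=640,
    num_hidden_layers=12,
    num_attention_heads=10,
    intermediate_size=2_560,
    max_position_embeddings=16_384,  # the 16,384-token target context
)
model = GPTNeoXForCausalLM(config)
print(f"~{model.num_parameters() / 1e6:.0f}M parameters")
```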

reacted to clem's post with 🔥 3 days ago
Post
3709
Energy is a massive constraint for AI, but do you even know how much energy your ChatGPT conversations use?
We're trying to change this by releasing ChatUI-energy, the first interface where you see in real time what energy your AI conversations consume. Great work from @jdelavande, powered by Spaces & TGI and available for a dozen open-source models like Llama, Mistral, Qwen, Gemma and more.
jdelavande/chat-ui-energy
Should all chat interfaces have this? Just like ingredients have to be shown on products you buy, we need more transparency in AI for users!

reacted to linoyts's post with 👍 3 days ago
Post
2386
We just shipped HiDream Image LoRA fine-tuning to diffusers🧨
HiDream's SOTA capabilities (and MIT license) bring a lot of potential to explore with fine-tunes 🔥
- more upgrades and features soon!
- code, weights and config example 👇
🧶Yarn art lora: linoyts/HiDream-yarn-art-LoRA
code: https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_hidream.md
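Loading the yarn-art LoRA on top of the base pipeline might look roughly like this, following the pattern in the diffusers docs at the time of writing; the base repo id, the separate (gated) Llama text encoder, and the prompt are assumptions, so see the linked README for the exact recipe:

```python
import torch
from transformers import PreTrainedTokenizerFast, LlamaForCausalLM
from diffusers import HiDreamImagePipeline

# HiDream pipelines take a Llama-3.1 model as a fourth text encoder;
# this repo id is gated and is given here as an assumption.
llama_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
tokenizer_4 = PreTrainedTokenizerFast.from_pretrained(llama_id)
text_encoder_4 = LlamaForCausalLM.from_pretrained(
    llama_id, output_hidden_states=True, torch_dtype=torch.bfloat16
)

pipe = HiDreamImagePipeline.from_pretrained(
    "HiDream-ai/HiDream-I1-Full",
    tokenizer_4=tokenizer_4,
    text_encoder_4=text_encoder_4,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Apply the fine-tuned LoRA from the post.
pipe.load_lora_weights("linoyts/HiDream-yarn-art-LoRA")

image = pipe("a cat, yarn art style", num_inference_steps=50).images[0]
image.save("yarn_cat.png")
```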

reacted to clem's post with 🤗 3 days ago
Post
2791
Just crossed half a million public apps on Hugging Face. A new public app is created every minute these days 🤯🤯🤯
What's your favorite? http://hf.co/spaces

reacted to bhalajin's post with 🔥 3 days ago
Post
1593
###### CVPR2025 Workshop Challenge Alert ######
🫠 Between deadlines, rebuttals, and existential crises??? "We got you!!!!"
📢 Our new CVPR25 multi-modal challenge is online !!!
🍽️ Dishcovery: VLM MetaFood Challenge!!!! 🍽️
😋🧫 Can your groundbreaking VLM understand the difference between sushi styles, pasta types, or cooking methods from just image + caption pairs?
🌐 Our Task: Match fine-grained images to food descriptions
Challenge Highlights:
📦 400K food image-caption pairs, a little taste to get you started !!!
🔬 Got a SoTA VLM? Come test it on our challenging test sets !!!
🎯 Challenge for everyone! Easy to use SigLIP baseline is provided !!!
🔍 Real, synthetic, noisy data – just like real life - Will your VLM redefine how people track their diets??? ( 🗣️ We believe so!!! )
🔗 Join the challenge: https://www.kaggle.com/competitions/dishcovery-vlm-mtf-cvpr-2025
🗓️ Deadline: Phase I: 4th of May, 2025 - Phase II: 10th of May, 2025
👉 Workshop website: https://sites.google.com/view/cvpr-metafood-2025
#CVPR25 #ComputerVision #CV #Deeplearning #DL #VisionLanguage #VLM #multimodal #FoundationModels

reacted to luigi12345's post with 🔥 3 days ago
Post
2347
SkyReels-V2 INFINITE VIDEO🔥♾️🎬 UNLIMITED duration video generation model by Skywork.
> “It's finally here: an open-source model that achieves what we've all been waiting for, infinite-length videos.”😮
Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought (2504.05599)
Model: Skywork/SkyReels-V2-T2V-14B-720P
✨ 1.3B & 14B
✨ Generates infinite length videos using Diffusion Forcing with diffusion models + autoregressive methods

reacted to nyuuzyou's post with 👍 10 days ago
Post
5555
🇷🇺 Russian Forum Messages Dataset - nyuuzyou/ruforum
Collection of approximately 58 million Russian forum messages featuring:
- Complete message content from Russian online forums spanning 2010-2025
- Comprehensive metadata including unique message IDs and timestamps
- Full text content preserving original user discussions and interactions
- Monolingual dataset focused exclusively on Russian language content
This dataset offers a unique textual archive of Russian online conversations suitable for text generation, sentiment analysis, and language modeling research. Released to the public domain under CC0 1.0 license.
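Given the ~58 million rows, streaming is probably the practical way to poke at it first; a small sketch, assuming a standard "train" split:

```python
from itertools import islice

from datasets import load_dataset

# Stream instead of downloading ~58M rows up front; split name is an
# assumption, check the dataset card.
ds = load_dataset("nyuuzyou/ruforum", split="train", streaming=True)
for row in islice(ds, 3):
    print(row)  # message id, timestamp, text (per the description above)
```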

reacted to AdinaY's post with ❤️ 10 days ago
Post
3182
🔥 New reasoning models from the Chinese community, by Skywork 天工-昆仑万维
Skywork/skywork-or1-67fa1bcb41b436ef2def76b9
✨Skywork OR1-Math-7B > Optimized for math reasoning
✨Skywork-OR1-7B-preview > Excels in math & coding
✨Skywork-OR1-32B-preview > Matches Deepseek-R1 on math (AIME24/25) and coding (LiveCodeBench)
Released under the Apache 2.0 license 🥳
Final version coming in 2 weeks!

reacted to thomwolf's post with 🚀 10 days ago
Post
4387
If you've followed the progress of robotics in the past 18 months, you've likely noticed how robotics is increasingly becoming the next frontier that AI will unlock.
At Hugging Face—in robotics and across all AI fields—we believe in a future where AI and robots are open-source, transparent, and affordable; community-built and safe; hackable and fun. We've had so much mutual understanding and passion working with the Pollen Robotics team over the past year that we decided to join forces!
You can already find our open-source humanoid robot platform Reachy 2 on the Pollen website, and the Pollen community and people here on the Hub at pollen-robotics
We're so excited to build and share more open-source robots with the world in the coming months!

reacted to bartowski's post with 👍 10 days ago
Post
11071
Access requests enabled for latest GLM models
While a fix is being implemented (https://github.com/ggml-org/llama.cpp/pull/12957), I want to leave the models up for visibility and continued discussion, but I want to prevent accidental downloads of known-broken models (even though there are settings that could fix them at runtime for now).
With this goal, I've enabled access requests. I don't really want your data, so I'm sorry that I don't think there's a way around that? But that's what I'm gonna do for now, and I'll remove the gate when a fix is up and verified and I have a chance to re-convert and quantize!
Hope you don't mind in the meantime :D

reacted to prithivMLmods's post with 👍 10 days ago
Post
2521
Try out the demo for Multimodal OCR, featuring implementations of RolmOCR and Qwen2VL OCR. The use case showcases image-text-to-text conversion, with video understanding support for the RolmOCR model! 🚀
🤗Multimodal OCR Space : prithivMLmods/Multimodal-OCR
📦The models implemented in this Space are:
+ Qwen2VL OCR : prithivMLmods/Qwen2-VL-OCR-2B-Instruct
+ Qwen2VL OCR2 : prithivMLmods/Qwen2-VL-OCR2-2B-Instruct
+ RolmOCR : reducto/RolmOCR
Note: Qwen2VL OCR supports only image-text-to-text in the Space.
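For reference, running the Qwen2VL OCR checkpoint outside the Space might look like the standard Qwen2-VL recipe in transformers; this is a sketch under assumptions (the prompt wording, image path, and generation settings are illustrative, not the Space's exact code):

```python
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "prithivMLmods/Qwen2-VL-OCR-2B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Build a chat-style prompt with one image and an OCR instruction.
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Extract all text from this image."},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

image = Image.open("receipt.png")  # placeholder path
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=512)
print(processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0])
```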