Victor Mustar (victor)

AI & ML interests

Building the UX of this website

Organizations

Hugging Face, Google, Competitions, Safetensors, 21 RNN, Spaces-explorers, Text Generation Inference, CVPR Demo Track, Spaces Examples, Hugging Chat, Webhooks Explorers (BETA), lora concepts library, Scanned Tokens, Huggingface Projects, hf admins, Hugging Face OSS Metrics, Stable Diffusion Dreambooth Concepts Library, Core ML Projects, temp-org, Blog-explorers, Mustarz, Open LLM Leaderboard, Enterprise Explorers, The Collectionists, ZeroGPU Explorers, Hugging Face Tools, TstOrg141, Stable Video benchmark, Social Post Explorers, Dev Mode Explorers, LLHF, SLLHF, Self-serve FTW, Inference Explorers

victor's activity

replied to prithivMLmods's post about 19 hours ago
reacted to prithivMLmods's post with 🔥 about 19 hours ago
Dropping the domain-specific downstream image classification and content moderation models, including anime image type classification, GeoSceneNet, indoor-outdoor scene classification, and black-and-white vs. colored image classification, along with the accompanying datasets. 🔥

╰┈➤Models :
+ GeoSceneNet : prithivMLmods/Multilabel-GeoSceneNet
+ IndoorOutdoorNet : prithivMLmods/IndoorOutdoorNet
+ B&W vs Colored : prithivMLmods/BnW-vs-Colored-Detection
+ Anime Image Type : prithivMLmods/Anime-Classification-v1.0
+ Multilabel Portrait : prithivMLmods/Multilabel-Portrait-SigLIP2

╰┈➤Datasets :
- GeoSceneNet : prithivMLmods/Multilabel-GeoSceneNet-16K
- IndoorOutdoorNet : prithivMLmods/IndoorOutdoorNet-20K
- BnW vs Colored : prithivMLmods/BnW-vs-Colored-10K
- Multilabel Portrait : prithivMLmods/Multilabel-Portrait-18K

╰┈➤Collections :
> Multilabel Image Classification Datasets : prithivMLmods/multilabel-image-classification-datasets-6809aa64637f45d4c47fa6ca
> Model Collection : prithivMLmods/siglip2-content-filters-models-v2-68053a958c42ef17a3a3f4d1

Note: The anime scene type dataset is not mentioned in the list because it is private and only accessible to members of the DeepGHS organization.

For raw ZIP files or more information about the datasets, visit: https://www.kaggle.com/prithivsakthiur/datasets
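
If you want to try one of these checkpoints locally, here is a minimal sketch using the transformers image-classification pipeline. It assumes the checkpoints expose a standard classification head; the model choice and image path are illustrative.

```python
from transformers import pipeline

# Minimal sketch: load one of the classifiers above via the standard
# image-classification pipeline (assumes a SigLIP2-style classification head).
classifier = pipeline("image-classification", model="prithivMLmods/IndoorOutdoorNet")

# "room.jpg" is an illustrative path; any RGB image works.
for pred in classifier("room.jpg"):
    print(f"{pred['label']}: {pred['score']:.3f}")
```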
reacted to orasul's post with 👍 about 19 hours ago
hi, it is deki, and now I am open sourced.

𝗱𝗲𝗸𝗶, an Android AI agent powered by an open-source ML model, has been fully open-sourced.

It understands what’s on your screen and can perform tasks based on your voice or text commands.

Some examples:
* "Write my friend "some_name" in WhatsApp that I'll be 15 minutes late"
* "Open Twitter in the browser and write a post about something"
* "Read my latest notifications"
* "Write a linkedin post about something"

Currently it works only on Android, but support for other operating systems is planned.

The ML and backend code has also been fully open-sourced.

Video prompt example:

"Open linkedin, tap post and write: hi, it is deki, and now I am open sourced. But don't send, just return"

License: GPLv3

You can find other AI agent demos and usage examples, such as code generation and object detection, on GitHub.

Github: https://github.com/RasulOs/deki
reacted to YerbaPage's post with 🔥 2 days ago
Curated list of **Repository-level Code Generation** papers & benchmarks! 🔥

Stay ahead with the latest in:
✅ Repo-level Issue Resolution (SWE-bench, Agents)
✅ Repo-level Code Completion (Repo understanding)
✅ Datasets & Benchmarks

👉 Check it out: https://github.com/YerbaPage/Awesome-Repo-Level-Code-Generation 🔥
reacted to ProCreations's post with 🔥 2 days ago
Come check out my new dataset, Mistake to Meaning, an attempt to help smaller models understand user typos better! Hope you guys enjoy it.

ProCreations/Mistake-To-Meaning
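
For a quick look at the data, a minimal sketch with the datasets library; it assumes the repo loads with the standard API and defines a "train" split.

```python
from datasets import load_dataset

# Load the typo-understanding dataset and peek at the first example.
ds = load_dataset("ProCreations/Mistake-To-Meaning")
print(ds)              # available splits and sizes
print(ds["train"][0])  # assumes a "train" split exists
```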
posted an update 3 days ago
DIA TTS is just amazing - please share your funniest gens (here is mine) 😂
nari-labs/Dia-1.6B
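
If you want to make your own gens, a rough sketch based on the usage shown in the model card; the dia package, its API, and the 44.1 kHz output rate are assumptions taken from that card, not verified here.

```python
import soundfile as sf
from dia.model import Dia  # assumed API from the nari-labs/Dia-1.6B model card

model = Dia.from_pretrained("nari-labs/Dia-1.6B")

# Dia generates dialogue from [S1]/[S2] speaker tags.
text = "[S1] Did you hear that? [S2] Hear what? [S1] Exactly. (clears throat)"
audio = model.generate(text)        # raw audio samples, per the card
sf.write("out.wav", audio, 44100)   # 44.1 kHz sample rate, per the card
```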
reacted to davidberenstein1957's post with 🚀 3 days ago
🔥 Announcing FLUX-Juiced: The Fastest Image Generation Endpoint (2.6x faster)!

Optimisations are widely applied and can reduce inference time, but their impact on quality often remains unclear, so we decided to challenge the status quo and create our own optimised version of FLUX.1[dev] called FLUX-juiced.

Blog: https://huggingface.co/blog/PrunaAI/flux-fastest-image-generation-endpoint
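
For context, here is how the unoptimised baseline can be called through the Hub's InferenceClient; a FLUX-juiced deployment would be invoked the same way through its dedicated endpoint URL (an assumption that it exposes the same text-to-image task interface; the prompt is illustrative).

```python
from huggingface_hub import InferenceClient

# Sketch: generate with the FLUX.1[dev] baseline via the Inference API.
client = InferenceClient("black-forest-labs/FLUX.1-dev")
image = client.text_to_image("a glass of fresh orange juice, studio lighting")
image.save("juice.png")
```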
reacted to AdinaY's post with 🔥 3 days ago
MAGI-1 🪄 an autoregressive diffusion video model, released by Sand AI

sand-ai/MAGI-1

✨ 24B with Apache 2.0
✨ Strong temporal consistency
✨ Benchmark-topping performance
reacted to shekkizh's post with 👀 3 days ago
Think AGI is just around the corner? Not so fast.

When OpenAI released its Computer-Using Agent (CUA) API, I happened to be playing Wordle 🧩 and thought, why not see how the model handles it?
Spoiler: Wordle turned out to be a surprisingly effective benchmark.
So Romain Cosentino Ph.D. and I dug in and analyzed the results of several hundred runs.

🔑 Takeaways
1️⃣ Even the best computer-using models struggle with simple, context-dependent tasks. 
2️⃣ Visual perception and reasoning remain major hurdles for multimodal agents.
3️⃣ Real-world use cases reveal significant gaps between hype and reality. Perception accuracy drops to near zero by the last turn 📉

🔗 Read our arxiv article for more details https://www.arxiv.org/abs/2504.15434
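
For readers unfamiliar with why Wordle stresses these agents: each turn's feedback must be combined with every previous turn's. A minimal sketch of the feedback rule an agent has to track (standard Wordle scoring, not the paper's code):

```python
def wordle_feedback(guess: str, answer: str) -> str:
    """Standard Wordle scoring: G = right spot, Y = wrong spot, - = absent."""
    feedback = ["-"] * len(guess)
    remaining = list(answer)
    # First pass: greens consume their letter.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            feedback[i] = "G"
            remaining.remove(g)
    # Second pass: yellows consume leftover letters.
    for i, g in enumerate(guess):
        if feedback[i] == "-" and g in remaining:
            feedback[i] = "Y"
            remaining.remove(g)
    return "".join(feedback)

print(wordle_feedback("crane", "crepe"))  # -> "GG--G"
```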
reacted to ProCreations's post with 🔥 3 days ago
🤖 IntellIte‑Chat v1.0 (Coming Soon)

A compact chat model built for speed, efficiency, and simplicity.

IntellIte‑Chat v1.0 is the debut model in the IntellIte series—a lightweight conversational transformer crafted to be fast, memory-efficient, and easy to work with. It’s designed for devs and enthusiasts who want sharp results without huge resource demands.

No fluff. Just chats.



🎯 Target Specs
• Pretraining Tokens: 4 billion
• Context Length: 16,384 tokens



🧠 Parameters & Architecture
• Model Size: ~100M parameters
• Architecture: Modified GPT-NeoX
• Focus: Chat performance with low latency and efficient memory use
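
To make that sizing concrete, here is a hypothetical GPT-NeoX configuration in roughly the 100M-parameter range; the numbers below are my own illustration, not IntellIte-Chat's actual config.

```python
from transformers import GPTNeoXConfig, GPTNeoXForCausalLM

# Hypothetical ~100M-parameter GPT-NeoX sizing; illustrative only.
config = GPTNeoXConfig(
    vocab_size=50_304,
    hidden_size=512,
    num_hidden_layers=12,
    num_attention_heads=8,
    intermediate_size=2_048,
    max_position_embeddings=16_384,  # matches the 16,384-token target
)
model = GPTNeoXForCausalLM(config)
print(f"{model.num_parameters() / 1e6:.0f}M parameters")
```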



🧃 Support the Build
Every dollar you donate is an extra amount of VRAM I get to work with. 😅
This project is fully independent and entirely self-funded. If you want to help bring it to life:
👉 https://buymeacoffee.com/procreations



💛 Early Supporters
All early supporters will be credited here when the model launches.
Even the smallest support means the world and pushes this project forward.

Special thanks to:
Maybe you?



🛠️ Development Status
• Architecture Design: Completed ✅
• Dataset Planning: Completed ✅
• Training Code: Near Completion 🛠️
• Training Launch: Starting Soon ⏳
• Evaluation Setup: Coming soon 🔜
• Final Release: Coming soon 🔜



Built to chat. Built on a budget. Built to prove what small models can do.
reacted to clem's post with 🔥 3 days ago
Energy is a massive constraint for AI, but do you even know how much energy your ChatGPT convos are using?

We're trying to change this by releasing ChatUI-energy, the first interface where you see in real time what energy your AI conversations consume. Great work from @jdelavande, powered by Spaces & TGI, and available for a dozen open-source models like Llama, Mistral, Qwen, Gemma, and more.

jdelavande/chat-ui-energy

Should all chat interfaces have this? Just like ingredients have to be shown on products you buy, we need more transparency in AI for users!
reacted to linoyts's post with 👍 3 days ago
reacted to clem's post with 🤗 3 days ago
Just crossed half a million public apps on Hugging Face. A new public app is created every minute these days 🤯🤯🤯

What's your favorite? http://hf.co/spaces
reacted to bhalajin's post with 🔥 3 days ago
###### CVPR2025 Workshop Challenge Alert ######

🫠 Between deadlines, rebuttals, and existential crises??? "We got you!!!!"

📢 Our new CVPR25 multi-modal challenge is online!!!

🍽️ Dishcovery: VLM MetaFood Challenge!!!! 🍽️


😋🧫 Can your groundbreaking VLM understand the difference between sushi styles, pasta types, or cooking methods from just image + caption pairs?

🌐 Our Task: Match fine-grained images to food descriptions


Challenge Highlights:

📦 400K food image-caption pairs, a little taste to get you started!!!

🔬 Got a SoTA VLM? Come test it on our challenging test sets!!!

🎯 A challenge for everyone! An easy-to-use SigLIP baseline is provided!!!

🔍 Real, synthetic, noisy data, just like real life. Will your VLM redefine how people track their diets??? (🗣️ We believe so!!!)


🔗 Join the challenge: https://www.kaggle.com/competitions/dishcovery-vlm-mtf-cvpr-2025

🗓️ Deadlines: Phase I: May 4, 2025; Phase II: May 10, 2025

👉 Workshop website: https://sites.google.com/view/cvpr-metafood-2025


#CVPR25 #ComputerVision #CV #Deeplearning #DL #VisionLanguage #VLM #multimodal #FoundationModels
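
To get a feel for the task before downloading anything, here is a rough image-caption matching sketch in the spirit of a SigLIP baseline; the public checkpoint, captions, and image path are illustrative, not the challenge's actual baseline code.

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

ckpt = "google/siglip-base-patch16-224"  # illustrative public checkpoint
model = AutoModel.from_pretrained(ckpt)
processor = AutoProcessor.from_pretrained(ckpt)

image = Image.open("dish.jpg")  # illustrative path
captions = ["nigiri with salmon", "penne arrabbiata", "steamed dumplings"]

# SigLIP expects max_length padding for text inputs.
inputs = processor(text=captions, images=image,
                   padding="max_length", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # shape: (1, num_captions)

# SigLIP scores are independent sigmoids, not a softmax distribution.
probs = torch.sigmoid(logits)[0]
print(captions[probs.argmax().item()], probs.tolist())
```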
reacted to luigi12345's post with 🔥 3 days ago
SkyReels-V2 INFINITE VIDEO🔥♾️🎬 UNLIMITED duration video generation model by Skywork.

> “Finally, it is here: an open-source model that achieves what we have all been waiting for, infinite-length videos.” 😮

Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought (2504.05599)

Model: Skywork/SkyReels-V2-T2V-14B-720P

✨ 1.3B & 14B
✨ Generates infinite-length videos using Diffusion Forcing, combining diffusion models with autoregressive methods
reacted to nyuuzyou's post with 👍 10 days ago
🇷🇺 Russian Forum Messages Dataset - nyuuzyou/ruforum

Collection of approximately 58 million Russian forum messages featuring:

- Complete message content from Russian online forums spanning 2010-2025
- Comprehensive metadata including unique message IDs and timestamps
- Full text content preserving original user discussions and interactions
- Monolingual dataset focused exclusively on Russian language content

This dataset offers a unique textual archive of Russian online conversations suitable for text generation, sentiment analysis, and language modeling research. Released to the public domain under the CC0 1.0 license.
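
Given the corpus size, streaming is the practical way in; a sketch with the datasets library, where the field names are assumptions based on the description above.

```python
from datasets import load_dataset

# Stream the ~58M-message corpus instead of downloading it all.
ds = load_dataset("nyuuzyou/ruforum", split="train", streaming=True)
for row in ds.take(3):
    print(row)  # expect message id, timestamp, and text fields
```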
reacted to AdinaY's post with ❤️ 10 days ago
🔥 New reasoning models from the Chinese community, by Skywork 天工-昆仑万维

Skywork/skywork-or1-67fa1bcb41b436ef2def76b9

✨Skywork OR1-Math-7B > Optimized for math reasoning
✨Skywork-OR1-7B-preview > Excels in math & coding
✨Skywork-OR1-32B-preview > Matches Deepseek-R1 on math (AIME24/25) and coding (LiveCodeBench)

Released under the Apache 2.0 license 🥳
Final version coming in 2 weeks!
reacted to thomwolf's post with 🚀 10 days ago
If you've followed the progress of robotics in the past 18 months, you've likely noticed how robotics is increasingly becoming the next frontier that AI will unlock.

At Hugging Face—in robotics and across all AI fields—we believe in a future where AI and robots are open-source, transparent, and affordable; community-built and safe; hackable and fun. We've had so much mutual understanding and passion working with the Pollen Robotics team over the past year that we decided to join forces!

You can already find our open-source humanoid robot platform Reachy 2 on the Pollen website, and the Pollen community and team here on the Hub at pollen-robotics.

We're so excited to build and share more open-source robots with the world in the coming months!
reacted to bartowski's post with 👍 10 days ago
Access requests enabled for latest GLM models

While a fix is being implemented (https://github.com/ggml-org/llama.cpp/pull/12957), I want to leave the models up for visibility and continued discussion, but I want to prevent accidental downloads of known-broken models (even though there are settings that could fix them at runtime for now).

With this goal, I've enabled access requests. I don't really want your data, so I'm sorry that I don't think there's a way around that? But that's what I'm gonna do for now, and I'll remove the gate when a fix is up and verified and I have a chance to re-convert and quantize!

Hope you don't mind in the meantime :D
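
In practical terms, once an access request is approved the repos download like any other gated model; a sketch with huggingface_hub, where the repo id is a hypothetical placeholder rather than one of the actual GLM repos.

```python
from huggingface_hub import snapshot_download

# Hypothetical repo id; substitute the actual gated GGUF repo.
# Requires a logged-in token (huggingface-cli login or HF_TOKEN)
# and an approved access request.
path = snapshot_download(
    repo_id="bartowski/SOME-GLM-GGUF",  # placeholder, not a real repo
    allow_patterns=["*Q4_K_M*"],        # fetch a single quant only
)
print(path)
```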
reacted to prithivMLmods's post with 👍 10 days ago
Try out the Multimodal OCR demo, featuring models including RolmOCR and Qwen2VL OCR. The use case showcases image-text-to-text conversion, plus video-understanding support for the RolmOCR model! 🚀

🤗Multimodal OCR Space : prithivMLmods/Multimodal-OCR

📦The models implemented in this Space are:
+ Qwen2VL OCR : prithivMLmods/Qwen2-VL-OCR-2B-Instruct [ or ]
+ Qwen2VL OCR2 : prithivMLmods/Qwen2-VL-OCR2-2B-Instruct
+ RolmOCR : reducto/RolmOCR

Qwen2VL OCR supports only image-text-to-text in the space.
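
For local use outside the Space, a rough sketch of the standard Qwen2-VL image-text-to-text flow applied to the OCR checkpoint; the prompt and image path are illustrative, and resource settings (dtype, device map) will vary with your hardware.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "prithivMLmods/Qwen2-VL-OCR-2B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("receipt.png")  # illustrative path
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Extract all readable text from this image."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(processor.batch_decode(
    output[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0])
```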