AI & ML interests

Building interactive demos to scikit-learn examples 🧡

Recent Activity

sklearn-docs's activity

merve 
posted an update about 5 hours ago
merve 
posted an update 1 day ago
prithivMLmods 
posted an update 3 days ago
view post
Post
3874
OpenAI, Google, Hugging Face, and Anthropic have released guides and courses on building agents, prompting techniques, scaling AI use cases, and more. Below are 10+ minimalistic guides and courses that may help you in your progress. 📖

⤷ Agents Companion : https://www.kaggle.com/whitepaper-agent-companion
⤷ Building Effective Agents : https://www.anthropic.com/engineering/building-effective-agents
⤷ Guide to building agents by OpenAI : https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf
⤷ Prompt engineering by Google : https://www.kaggle.com/whitepaper-prompt-engineering
⤷ Google: 601 real-world gen AI use cases : https://cloud.google.com/transform/101-real-world-generative-ai-use-cases-from-industry-leaders
⤷ Prompt engineering by IBM : https://www.ibm.com/think/topics/prompt-engineering-guide
⤷ Prompt Engineering by Anthropic : https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview
⤷ Scaling AI use cases : https://cdn.openai.com/business-guides-and-resources/identifying-and-scaling-ai-use-cases.pdf
⤷ Prompting Guide 101 : https://services.google.com/fh/files/misc/gemini-for-google-workspace-prompting-guide-101.pdf
⤷ AI in the Enterprise by OpenAI : https://cdn.openai.com/business-guides-and-resources/ai-in-the-enterprise.pdf

by HF🤗 :
⤷ AI Agents Course by Huggingface : https://huggingface.co/learn/agents-course/unit0/introduction
⤷ Smol-agents Docs : https://huggingface.co/docs/smolagents/en/tutorials/building_good_agents
⤷ MCP Course by Huggingface : https://huggingface.co/learn/mcp-course/unit0/introduction
⤷ Other Course (LLM, Computer Vision, Deep RL, Audio, Diffusion, Cookbooks, etc..) : https://huggingface.co/learn
  • 2 replies
·
merve 
posted an update 3 days ago
view post
Post
1860
HOT: MiMo-VL new 7B vision LMs by Xiaomi surpassing gpt-4o (Mar), competitive in GUI agentic + reasoning tasks ❤️‍🔥 XiaomiMiMo/mimo-vl-68382ccacc7c2875500cd212

not only that, but also MIT license & usable with transformers 🔥
prithivMLmods 
posted an update 4 days ago
view post
Post
2045
Just made a demo for Cosmos-Reason1, a physical AI model that understands physical common sense and generates appropriate embodied decisions in natural language through long chain-of-thought reasoning. Also added video understanding support to it. 🤗🚀

✦ Try the demo here : prithivMLmods/DocScope-R1

⤷ Cosmos-Reason1-7B : nvidia/Cosmos-Reason1-7B
⤷ docscopeOCR-7B-050425-exp : prithivMLmods/docscopeOCR-7B-050425-exp
⤷ Captioner-Relaxed : Ertugrul/Qwen2.5-VL-7B-Captioner-Relaxed

⤷ Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

⤷ GitHub :
https://github.com/PRITHIVSAKTHIUR/Cosmos-x-DocScope
https://github.com/PRITHIVSAKTHIUR/Nvidia-Cosmos-Reason1-Demo.

To know more about it, visit the model card of the respective model. !!
merve 
posted an update 4 days ago
view post
Post
2628
introducing: VLM vibe eval 🪭 visionLMsftw/VLMVibeEval

vision LMs are saturated over benchmarks, so we built vibe eval 💬

> compare different models with refreshed in-the-wild examples in different categories 🤠
> submit your favorite model for eval
no numbers -- just vibes!
merve 
posted an update 6 days ago
view post
Post
2500
emerging trend: models that can understand image + text and generate image + text

don't miss out ⤵️
> MMaDA: single 8B diffusion model aligned with CoT (reasoning!) + UniGRPO Gen-Verse/MMaDA
> BAGEL: 7B MoT model based on Qwen2.5, SigLIP-so-400M, Flux VAE ByteDance-Seed/BAGEL
both by ByteDance! 😱

I keep track of all any input → any output models here https://huggingface.co/collections/merve/any-to-any-models-6822042ee8eb7fb5e38f9b62
  • 1 reply
·
merve 
posted an update 7 days ago
view post
Post
3084
what happened in open AI past week? so many vision LM & omni releases 🔥 merve/releases-23-may-68343cb970bbc359f9b5fb05

multimodal 💬🖼️
> new moondream (VLM) is out: it's 4-bit quantized (with QAT) version of moondream-2b, runs on 2.5GB VRAM at 184 tps with only 0.6% drop in accuracy (OS) 🌚
> ByteDance released BAGEL-7B, an omni model that understands and generates both image + text. they also released Dolphin, a document parsing VLM 🐬 (OS)
> Google DeepMind dropped MedGemma in I/O, VLM that can interpret medical scans, and Gemma 3n, an omni model with competitive LLM performance

> MMaDa is a new 8B diffusion language model that can generate image and text



LLMs
> Mistral released Devstral, a 24B coding assistant (OS) 👩🏻‍💻
> Fairy R1-32B is a new reasoning model -- distilled version of DeepSeek-R1-Distill-Qwen-32B (OS)
> NVIDIA released ACEReason-Nemotron-14B, new 14B math and code reasoning model
> sarvam-m is a new Indic LM with hybrid thinking mode, based on Mistral Small (OS)
> samhitika-0.0.1 is a new Sanskrit corpus (BookCorpus translated with Gemma3-27B)

image generation 🎨
> MTVCrafter is a new human motion animation generator
  • 1 reply
·
merve 
posted an update 11 days ago
view post
Post
2564
Google released MedGemma on I/O'25 👏 google/medgemma-release-680aade845f90bec6a3f60c4

> 4B and 27B instruction fine-tuned vision LMs and a 4B pre-trained vision LM for medicine
> available with transformers from the get-go 🤗

they also released a cool demo for scan reading ➡️ google/rad_explain

use with transformers ⤵️
  • 1 reply
·
merve 
posted an update 11 days ago
view post
Post
3077
Bu post'u çevirebilirsiniz 🤗💗
·
merve 
posted an update 11 days ago
view post
Post
2361
tis the year of any-to-any/omni models 🤠
ByteDance-Seed/BAGEL-7B-MoT 7B native multimodal model that understands and generates both image + text

it outperforms leading VLMs like Qwen 2.5-VL 👏 and has Apache 2.0 license 😱
prithivMLmods 
posted an update 13 days ago
view post
Post
2233
Got access to Google's all-new Gemini Diffusion a state-of-the-art text diffusion model. It delivers the performance of Gemini 2.0 Flash-Lite at 5x the speed, generating over 1000 tokens in a fraction of a second and producing impressive results. Below are some initial outputs generated using the model. ♊🔥

Gemini Diffusion Playground ✦ : https://deepmind.google.com/frontiers/gemini-diffusion

Get Access Here : https://docs.google.com/forms/d/1aLm6J13tAkq4v4qwGR3z35W2qWy7mHiiA0wGEpecooo/viewform?edit_requested=true

🔗 To know more, visit: https://deepmind.google/models/gemini-diffusion/
  • 1 reply
·
merve 
posted an update 13 days ago
view post
Post
1710
NVIDIA released new vision reasoning model for robotics: Cosmos-Reason1-7B 🤖 nvidia/cosmos-reason1-67c9e926206426008f1da1b7

> first reasoning model for robotics
> based on Qwen 2.5-VL-7B, use with Hugging Face transformers or vLLM 🤗
> comes with SFT & alignment datasets and a new benchmark 👏
prithivMLmods 
posted an update 14 days ago
view post
Post
2250
The more optimized explicit content filters with lightweight 𝙜𝙪𝙖𝙧𝙙 models trained based on siglip2 patch16 512 and vit patch16 224 for illustration and explicit content classification for content moderation in social media, forums, and parental controls for safer browsing environments. this version fixes the issues in the previous release, which lacked sufficient resources. 🚀

⤷ Models :
→ siglip2 mini explicit content : prithivMLmods/siglip2-mini-explicit-content [recommended]
→ vit mini explicit content : prithivMLmods/vit-mini-explicit-content

⤷ Building image safety-guard models : strangerguardhf

⤷ Datasets :
→ nsfw multidomain classification : strangerguardhf/NSFW-MultiDomain-Classification
→ nsfw multidomain classification v2.0 : strangerguardhf/NSFW-MultiDomain-Classification-v2.0

⤷ Collection :
→ Updated Versions [05192025] : prithivMLmods/explicit-content-filters-682aaa4733e378561925ca2b
→ Previous Versions : prithivMLmods/siglip2-content-filters-042025-final-680fe4aa1a9d589bf2c915ff

Find a collections inside the collection.👆

To know more about it, visit the model card of the respective model.
  • 1 reply
·
merve 
posted an update 14 days ago
view post
Post
2570
It was the week of video generation at @huggingface , on top of many new LLMs, VLMs and more!
Let’s have a wrap 🌯 merve/may-16-releases-682aeed23b97eb0fe965345c

LLMs 💬
> Alibaba Qwen released WorldPM-72B, new World Preference Model trained with 15M preference samples (OS)
> II-Medical-8B, new LLM for medical reasoning that comes in 8B by Intelligent-Internet
> TRAIL is a new dataset by Patronus for trace error reasoning for agents (OS)

Multimodal 🖼️💬
> Salesforce Research released BLIP3o, a new any-to-any model with image-text input and image-text output 💬it’s based on an image encoder, a text decoder and a DiT, and comes in 8B
> They also released pre-training and fine-tuning datasets
> MMMG is a multimodal generation benchmark for image, audio, text (interleaved)

Image Generation ⏯️
> Alibaba Wan-AI released Wan2.1-VACE, video foundation model for image and text to video, video-to-audio and more tasks, comes in 1.3B and 14B (OS)
> ZuluVision released MoviiGen1.1, new cinematic video generation model based on Wan 2.1 14B (OS)
> multimodalart released isometric-skeumorphic-3d-bnb, an isometric 3D asset generator (like AirBnB assets) based on Flux
> LTX-Video-0.9.7-distilled is a new real-time video generation (text and image to video) model by Lightricks
> Hidream_t2i_human_preference is a new text-to-image preference dataset by Rapidata with 195k human responses from 38k annotators

Audio 🗣️
> stabilityai released stable-audio-open-small new text-to-audio model
> TEN-framework released ten-vad, voice activity detection model (OS)

merve 
posted an update 17 days ago
prithivMLmods 
posted an update 18 days ago
view post
Post
2686
Models for detecting images generated by diffusion models (Flux.1, SDXL, ..) are trained or fine-tuned using image classification models for content moderation. These models use datasets available on the Hub. For identifying AI-generated images or moderating visual content, the recommended model is OpenSDI-Flux.1-SigLIP2.😺🧨

Models : prithivMLmods/OpenSDI-Flux.1-SigLIP2 [Best approach for AI [Diffusion Generated] vs. real image classification] prithivMLmods/OpenSDI-SD2.1-SigLIP2 prithivMLmods/OpenSDI-SD3-SigLIP2 prithivMLmods/OpenSDI-SD1.5-SigLIP2 prithivMLmods/OpenSDI-SDXL-SigLIP2

Datasets : nebula/OpenSDI_test madebyollin/megalith-10m

Collection : prithivMLmods/opensdi-diffusion-generated-image-classification-682488a3a3e5be7083db3383

Find a collections inside the collection.👆

To know more about it, visit the model card of the respective model.
prithivMLmods 
posted an update 19 days ago
view post
Post
2019
Dropping some image classification models for content moderation and classifiers trained with datasets available on the Hub. All are fine-tuned on the siglip2 backbone, (competitions AIOrNot, Imagenette, and Driver-Drowsiness). Models and datasets are listed below:

🤗Models :
AI or Not : prithivMLmods/AIorNot-SigLIP2
Driver Drowsiness Detection : prithivMLmods/DOZE-GUARD-RLDD
Subset 10 ImageNet : prithivMLmods/IMAGENETTE

🥊Datasets :
+ competitions/aiornot
+ akahana/Driver-Drowsiness-Dataset
+ frgfm/imagenette

🔗Collection :
[The previous collection of models is also listed in the same collection, so you can find more models focused on image classification tasks.]

- prithivMLmods/multiclass-image-classification-05142025-68234c8010a9350a4d6739b5

Find a collections inside the collection.🤪👆

To know more about it, visit the model card of the respective model.
merve 
posted an update 21 days ago
view post
Post
5019
VLMS 2025 UPDATE 🔥

We just shipped a blog on everything latest on vision language models, including
🤖 GUI agents, agentic VLMs, omni models
📑 multimodal RAG
⏯️ video LMs
🤏🏻 smol models
..and more! https://huggingface.co/blog/vlms-2025
  • 1 reply
·