AI & ML interests

None defined yet.

Recent Activity

prithivMLmods posted an update 1 day ago
Comparing: DeepCaption-VLA-7B, built on Qwen2.5-VL-7B-Instruct, is tailored for image captioning and vision-language attribution, focusing on precise, descriptive captions of visual properties, object attributes, and scene details. In contrast, Qwen2.5-VL-7B-Abliterated-Caption-it is fine-tuned for abliterated captioning, generating highly detailed descriptions across diverse visual categories.

Models🤗
✦ DeepCaption-VLA-7B : prithivMLmods/DeepCaption-VLA-7B
✦ Qwen2.5-VL-7B-Abliterated-Caption-it : prithivMLmods/Qwen2.5-VL-7B-Abliterated-Caption-it
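
For quick reference, loading DeepCaption-VLA-7B for a single caption looks roughly like this (a minimal sketch assuming the standard transformers Qwen2.5-VL API; the image path and prompt are placeholders, not taken from the model card):

from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from PIL import Image
import torch

model_id = "prithivMLmods/DeepCaption-VLA-7B"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("example.jpg")  # placeholder image path
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe the image, focusing on visual properties and object attributes."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(output[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0])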

Spaces⛵
➜ VisionScope-R2 : prithivMLmods/VisionScope-R2
➜ Qwen2.5-VL-Outpost : prithivMLmods/Qwen2.5-VL-Outpost

Collection🗞️
DeepCaption attr. : prithivMLmods/deepcaption-attr-68b041172ebcb867e45c556a
VL Abliterated-Caption : prithivMLmods/vl-abliterated-caption-68a0443b63182e97a15c47a3
Multimodal VLMs - Until July'25 : prithivMLmods/multimodal-vlms-until-july25-688312e6b840e1e156f13027
Multimodal VLMs - Aug'25 : prithivMLmods/multimodal-vlms-until-july25-688312e6b840e1e156f13027

GitHub↗️
> DeepCaption-VLA-7B [4bit-notebook demo] : https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/DeepCaption-VLA-7B%5B4bit%20-%20notebook%20demo%5D/DeepCaption-VLA-7B.ipynb
> Qwen2.5-VL-3B-Abliterated-Caption-it(caption) : https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/Qwen2.5-VL-3B-Abliterated-Caption-it(caption)/Qwen2_5_VL_3B_Abliterated_Caption_it.ipynb

The community GPU grant was given by Hugging Face — special thanks to them. 🤗🚀

To know more, visit the app page or the respective model pages!
burtenshaw posted an update 1 day ago
The open source AI community is made up of people who are passionate about and care for their work. So we thought it would be cool to celebrate our favourite icons of the community with a fun award.

Winners get free Hugging Face Pro subscriptions, merchandise, or compute credits for the Hub.

🔗 Follow and nominate here: community-spotlight

This is a new initiative to recognise and celebrate the incredible work being done by community members. It's all about inspiring more collaboration and innovation in the world of machine learning and AI.

They're highlighting contributors in four key areas:
- model creators: building and sharing innovative and state-of-the-art models.
- educators: sharing knowledge through posts, articles, demos, and events.
- tool builders: creating the libraries, frameworks, and applications that we all use.
- community champions: supporting and mentoring others in forums.

Know someone who deserves recognition? Nominate them by opening a post in the Hugging Face community forum.
sergiopaniego posted an update 2 days ago
You can now supercharge your TRL training pipelines with kernels

👷 kernels is a new library for loading optimized compute kernels directly from the Hub

Combined with TRL, it makes your developer experience smoother & faster.

Check out the new guide to learn more! 🕺

Learn ➡️ https://huggingface.co/docs/trl/main/en/kernels_hub
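
For context, loading a kernel from the Hub with the kernels library looks roughly like this (a minimal sketch assuming the kernels-community/activation repo and its gelu_fast function; not taken from the guide above):

import torch
from kernels import get_kernel

# Fetch an optimized activation kernel from the Hub and call it directly.
activation = get_kernel("kernels-community/activation")

x = torch.randn(8, 16, dtype=torch.float16, device="cuda")
out = torch.empty_like(x)
activation.gelu_fast(out, x)  # writes the result into `out`
print(out.shape)
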
eliebak posted an update 4 days ago
Super excited to announce that our research team at Hugging Face will be doing an AMA on reddit r/LocalLLaMA.

Come ask any questions to the team behind SmolLM, FineWeb and more! And who knows, maybe there’ll be a shiny new release to talk about?

Thursday 4th September, 8AM-11AM PST 🤗

prithivMLmods posted an update 4 days ago
FastVLMs by Apple are the talk of the week for edge device VLMs and also for consumer-grade VLMs on the Hub. They have some impressive demos available on the Hub for live captioning and inference tasks. Meanwhile, I’m still exploring one of the coolest edge-device multimodal releases—Liquid AI’s LFM2-VL (450M and 1.6B). I’ve also made a live camera video inference demo, which is capable of running on Colab’s free-tier T4 GPU.

🤗Live Captioning Notebooks:
➠ LiquidAI LFM2 VL 1.6B Live Cam: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/LiquidAI-LFM2-VL-Live-Cam/LiquidAI_LFM2_VL_1_6B_Live_Cam.ipynb

➠ LiquidAI LFM2 VL 450M Live Cam: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/LiquidAI-LFM2-VL-Live-Cam/LiquidAI_LFM2_VL_450M_Live_Cam.ipynb
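
For reference, single-image inference with LFM2-VL looks roughly like this (a minimal sketch following the model card's AutoModelForImageTextToText pattern; the image path is a placeholder, not the live-camera loop from the notebooks):

from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image
import torch

model_id = "LiquidAI/LFM2-VL-450M"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

image = Image.open("frame.jpg")  # placeholder for a captured camera frame
messages = [{"role": "user", "content": [
    {"type": "image", "image": image},
    {"type": "text", "text": "Caption this frame in one sentence."},
]}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt"
).to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0])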

✨I also made a demo for the FastVLM Live Captioning Notebook.
➠ FastVLM 0.5B Live Cam: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/Apple-FastVLM-0.5B-Live-Cam/apple_FastVLM_0_5B_live_cam.ipynb

↗️For more notebooks, kindly visit the following repositories.
➠ Multimodal Outpost Notebooks: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks

Feel free to fork, modify, and explore!
prithivMLmods posted an update 9 days ago
Introducing prithivMLmods/DeepCaption-VLA-7B, a multimodal VLM designed for reasoning with long-shot captions (Captioning and Vision-Language Attribution). It focuses on defining visual properties, object attributes, and scene details across a wide spectrum of images and aspect ratios, generating attribute-rich image captions. The model supports creative, artistic, and technical applications that require detailed descriptions. 🤗🔥

✦︎ Models: prithivMLmods/DeepCaption-VLA-7B, also includes prithivMLmods/DeepAttriCap-VLA-3B, an experimental model for vision-language attribution.

✦︎ Try the demo here: prithivMLmods/VisionScope-R2

✦︎ Try it now on Google Colab, with support for T4 GPUs in 4-bit quant_type: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/DeepCaption-VLA-7B%5B4bit%20-%20notebook%20demo%5D/DeepCaption-VLA-7B.ipynb

✦︎ Collection: prithivMLmods/deepcaption-attr-68b041172ebcb867e45c556a
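
For the T4 setup mentioned above, the 4-bit load is typically done with bitsandbytes; a minimal sketch (an illustration of the general approach, not the notebook's exact code):

import torch
from transformers import AutoProcessor, BitsAndBytesConfig, Qwen2_5_VLForConditionalGeneration

# NF4 4-bit quantization keeps the 7B model within a T4's 16 GB of VRAM.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/DeepCaption-VLA-7B",
    quantization_config=bnb_config,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("prithivMLmods/DeepCaption-VLA-7B")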

.
.
.

To know more, visit the respective model cards!
mitkox posted an update 9 days ago
Hermes4 70B synthetic dataset generation on my desktop Z8 GPU rig:
307 tok/sec
1.1M tok/hour
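
A quick sanity check on those two figures (simple arithmetic, not from the post):

tokens_per_second = 307
tokens_per_hour = tokens_per_second * 3600  # 1,105,200, i.e. roughly 1.1M tokens/hour
print(f"{tokens_per_hour:,} tokens/hour")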

The bottleneck for generating massive, high-quality reinforcement learning datasets is never the GPU compute; it's always the model's willingness to actually answer the darn question.
prithivMLmods posted an update 10 days ago
OpenGVLab's InternVL3.5 is a new family of open-source multimodal models with advanced versatility, reasoning, and efficiency. I have created demo notebooks for models ranging from 1B to 4B parameters, available in multiple versions (MPO, Instruct, Pre-trained) and in both "thinking" and "non-thinking" settings, with experimental compatibility for Tesla T4 GPUs.

➠InternVL3_5_2B_MPO_Thinking: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/InternVL-3.5-Notebook/InternVL3.5-Thinking/1_InternVL3_5_2B_MPO_Thinking/1_InternVL3_5_2B_MPO_Thinking.ipynb
➠InternVL3_5_1B_Instruct_Thinking: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/InternVL-3.5-Notebook/InternVL3.5-Thinking/2_InternVL3_5_1B_Instruct_Thinking/2_InternVL3_5_1B_Instruct_Thinking.ipynb

➠InternVL3_5-1B-MPO: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/InternVL-3.5-Notebook/InternVL3_5-MPO/InternVL3_5-1B-MPO/InternVL3_5_1B_MPO.ipynb
➠InternVL3_5-2B-MPO: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/tree/main/InternVL-3.5-Notebook/InternVL3_5-MPO/InternVL3_5-2B-MPO

➠InternVL3_5-1B-Instruct: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/InternVL-3.5-Notebook/InternVL3_5-Instruct/InternVL3_5-1B-Instruct/InternVL3_5_1B_Instruct.ipynb
➠InternVL3_5-2B-Instruct: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/InternVL-3.5-Notebook/InternVL3_5-Instruct/InternVL3_5-2B-Instruct/InternVL3_5_2B_Instruct.ipynb

➠InternVL3_5-1B-Pretrained: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/InternVL-3.5-Notebook/InternVL3_5-Pretrained/InternVL3_5-1B-Pretrained/InternVL3_5_1B_Pretrained.ipynb
➠InternVL3_5-2B-Pretrained: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/InternVL-3.5-Notebook/InternVL3_5-Pretrained/InternVL3_5-2B-Pretrained/InternVL3_5_2B_Pretrained.ipynb

Note: the notebooks run without flash_attention (T4 GPUs don't support FlashAttention 2); see the loading sketch below.
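
A minimal loading sketch under that constraint (assuming InternVL's remote-code use_flash_attn flag; not the notebooks' exact code):

import torch
from transformers import AutoModel, AutoTokenizer

model_id = "OpenGVLab/InternVL3_5-2B-MPO"
# use_flash_attn=False keeps the model runnable on pre-Ampere GPUs such as the T4.
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    use_flash_attn=False,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
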
sergiopaniego posted an update 11 days ago
It's now possible to do end-to-end ML without leaving the @huggingface Hub by combining TRL + HF Jobs + Trackio!!

🐡We just released a full guide explaining the process.

Go check it out!

📖 Guide: https://huggingface.co/docs/trl/main/en/jobs_training
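
On the tracking side, Trackio follows a wandb-style API; a minimal logging sketch (based on that API, not taken from the guide):

import trackio

# Logged metrics show up in a Trackio dashboard (which can be hosted as a Hugging Face Space).
trackio.init(project="trl-demo-run")  # project name is just an example
for step in range(3):
    trackio.log({"step": step, "loss": 1.0 / (step + 1)})
trackio.finish()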

💡 Reminder: HF Jobs is only available for Pro, Team, or Enterprise plans. Yet another reason to upgrade!
prithivMLmods posted an update 11 days ago
OpenGVLab's InternVL3_5-2B-MPO [Mixed Preference Optimization (MPO)] is a compact vision-language model in the InternVL3.5 series. You can now experience it in the Tiny VLMs Lab, an app featuring 15+ multimodal VLMs ranging from 250M to 4B parameters. These models support tasks such as OCR, reasoning, single-shot answering with small models, and captioning (including ablated variants), across a broad range of visual categories. They are also capable of handling images with complex, sensitive, or nuanced content, while adapting to varying aspect ratios and resolutions.

✨ Space/App : prithivMLmods/Tiny-VLMs-Lab
🫙 Model : OpenGVLab/InternVL3_5-2B-MPO
↗️ Collection: OpenGVLab/internvl35-68ac87bd52ebe953485927fb
🗞️ Paper : https://arxiv.org/pdf/2508.18265
↗️ Multimodal Space Collection : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

To learn more, visit the relevant spaces, collections, and model cards.
prithivMLmods posted an update 13 days ago
Dropping new adapters for Qwen-Image, including Qwen-Image-Studio-Realism, Qwen-Image-Anime-LoRA, Qwen-Image-Sketch-Smudge, Qwen-Image-Synthetic-Face, and Qwen-Image-Fragmented-Portraiture, with various style intermix compatibilities. For more details, visit the model cards.

⤷ Studio Realism : prithivMLmods/Qwen-Image-Studio-Realism
⤷ Image Anime LoRA : prithivMLmods/Qwen-Image-Anime-LoRA
⤷ Sketch Smudge : prithivMLmods/Qwen-Image-Sketch-Smudge
⤷ Synthetic Face : prithivMLmods/Qwen-Image-Synthetic-Face
⤷ Fragmented Portraiture : prithivMLmods/Qwen-Image-Fragmented-Portraiture
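
Loading one of these adapters looks roughly like this (a minimal sketch assuming diffusers' Qwen-Image pipeline and standard LoRA loading; the prompt is just an illustration, and the model cards may specify trigger words):

import torch
from diffusers import DiffusionPipeline

# Load the Qwen-Image base model, then attach a style LoRA from the list above.
pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16)
pipe.to("cuda")
pipe.load_lora_weights("prithivMLmods/Qwen-Image-Studio-Realism")

image = pipe(
    prompt="studio portrait of a ceramic teapot, softbox lighting",  # illustrative prompt
    num_inference_steps=30,
).images[0]
image.save("studio_realism.png")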

Try it here at
✦︎ Qwen-Image-LoRA-DLC : prithivMLmods/Qwen-Image-LoRA-DLC
✦︎ Qwen-Image-Diffusion : prithivMLmods/Qwen-Image-Diffusion

Collection
✦︎ Qwen-Image-Exp-LoRA : prithivMLmods/qwen-image-exp-lora-68a978fe11400bc3165b0c4d
✦︎ Image Gen Apps (Diffusion) - LastUpdated 08/18 : prithivMLmods/image-gen-apps-diffusion-lastupdated-08-18-68a2f4c5ef3e5e394eacc20a

.
.
.

To know more, visit the spaces, collections, and model cards above.
eliebak posted an update 13 days ago
The Motif 2.6B tech report is pretty insane, first time I see a model with differential attention and PolyNorm trained at scale!

> It's trained on 2.5T tokens, with a "data mixture schedule" to continuously adjust the mixture over training.
> They use WSD with a "simple moving average", averaging the last 6 checkpoints every 8B tokens (see the sketch after this list).
> They trained on Finemath, Fineweb2, DCLM, TxT360.
> Lots of detail on the finetuning data they used; for instance, they used EvolKit and did some "dataset fusion" to pack more compressed knowledge into the data.
> They mention they also tried Normalized GPT, QK-Norm and Cross Layer Attention.
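
A minimal sketch of that kind of checkpoint averaging (a generic illustration, not Motif's actual code; the paths are hypothetical):

import torch

# Average the parameters of the last 6 saved checkpoints into a single state dict.
ckpt_paths = [f"checkpoints/step_{i}.pt" for i in range(1, 7)]  # hypothetical checkpoint files
state_dicts = [torch.load(p, map_location="cpu") for p in ckpt_paths]

avg_state = {}
for key in state_dicts[0]:
    avg_state[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)

torch.save(avg_state, "checkpoints/averaged.pt")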

Motif-Technologies/Motif-2.6B
prithivMLmods posted an update 20 days ago
prithivMLmods posted an update 21 days ago
Excited to introduce the Tiny VLMs Lab App for experiencing 15+ multimodal VLMs, ranging from 250M to 4B parameters. They cover tasks like OCR, reasoning, single-shot answering with small models, and (abliterated) captioning across a broad range of visual categories, including images with complex, sensitive, or nuanced content, while handling varying aspect ratios and resolutions. 🧪

🤗 Space/App: prithivMLmods/Tiny-VLMs-Lab

✦︎ Also introducing prithivMLmods/Qwen2.5-VL-3B-Abliterated-Caption-it, tailored for abliterated / uncensored image captioning. This release is a lighter alternative to the existing prithivMLmods/Qwen2.5-VL-7B-Abliterated-Caption-it model, making it usable on mid-range GPUs and even experimentally on T4 GPUs.

✦︎ Collection: prithivMLmods/vl-abliterated-caption-68a0443b63182e97a15c47a3
✦︎ GitHub: https://github.com/PRITHIVSAKTHIUR/Tiny-VLMs-Lab
.
.
.
To know more, visit the app page or the respective model pages!
prithivMLmods posted an update 25 days ago
Try Liquid AI's all-new multimodal models: LFM2-VL-1.6B & LFM2-VL-450M! The demos come with a Gradio UI and ReportLab support, and both models are runnable on a T4 GPU!

↗ LFM2-VL-1.6B-LiquidAI : https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/LFM2-VL-1.6B-LiquidAI/LFM2-VL-1.6B_ReportLab.ipynb

↗ LFM2-VL-450M-LiquidAI : https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/LFM2-VL-450M-LiquidAI/LFM2-VL-450M_ReportLab.ipynb
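
On the ReportLab side, the notebooks export model output to PDF; a minimal sketch of that step (hypothetical filename and caption text, not the notebooks' exact code):

from reportlab.lib.pagesizes import A4
from reportlab.pdfgen import canvas

caption = "A red bicycle leaning against a brick wall."  # e.g. text returned by LFM2-VL

# Write the caption into a simple one-page PDF report.
c = canvas.Canvas("caption_report.pdf", pagesize=A4)
c.setFont("Helvetica", 12)
c.drawString(72, 800, "LFM2-VL caption:")
c.drawString(72, 780, caption)
c.save()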

.
.
.
To know more, visit the Multimodal Outpost Notebooks repository!
sergiopaniego posted an update 25 days ago
sergiopaniego posted an update 26 days ago
New Zero-Shot Object Detectors in transformers! 🥽

We’ve added LLMDet and MM GroundingDINO, plus a demo Space to compare them with others 🖼️

Play with it: ariG23498/zero-shot-od
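
For reference, zero-shot object detection in transformers looks like this (a minimal sketch using the long-standing OWL-ViT checkpoint as a stand-in, since I haven't verified the ids of the newly added models; the image path is a placeholder):

from transformers import pipeline

detector = pipeline("zero-shot-object-detection", model="google/owlvit-base-patch32")
predictions = detector("street.jpg", candidate_labels=["car", "bicycle", "person"])
for pred in predictions:
    print(pred["label"], round(pred["score"], 3), pred["box"])
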
sergiopaniego posted an update 27 days ago
prithivMLmods posted an update 28 days ago
On the verge of releasing Poseidon-Reasoning-5M, a dataset built to excel in general thought processes, mathematics, and science across a diverse mixture of domains, I’m also dropping the Gargantua-R1-Compact dataset, a collection of over six million high-quality reasoning QA pair traces. 🤗🚀

✦ Gargantua-R1-Compact : prithivMLmods/Gargantua-R1-Compact

from datasets import load_dataset

dataset = load_dataset("prithivMLmods/Gargantua-R1-Compact", split="train")

Additionally, I’m adding the mini version of Gargantua — the Gargantua-R1-Wee : prithivMLmods/Gargantua-R1-Wee

from datasets import load_dataset

dataset = load_dataset("prithivMLmods/Gargantua-R1-Wee", split="train")

The composition of Gargantua-R1-Compact:
- 73.93% core mathematical reasoning: problems, proofs, and computational challenges
- 12.11% diverse scientific domains: physics, chemistry, biology, and interdisciplinary topics
- 11.35% competitive coding: algorithms and data structures
- 1.37% academic science: research-level methodology
- 0.95% creative and analytical reasoning: logic puzzles and problem-solving tasks
- 0.25% specialized technical areas: MLOps, LLMs, diffusion models, and CUDA
- 0.06% data from graphs and charts converted into structured JSON formats

Designed with both rich contextual depth and formal structural clarity, Gargantua-R1-Compact is an optimal resource for advancing research in symbolic reasoning, interpretability, and high-precision question answering in mathematical domains.

✦ Collection : prithivMLmods/gargantua-r1-mod-6896bfd7834e82b89ad2b38b


To know more, visit the respective dataset cards!