openfree
posted an update about 19 hours ago
3D Airforce Simulator - a browser-based 3D fighter jet simulator

Introduction
A full-scale 3D aerial combat game created with VIBE CODING!
Enjoy a realistic fighter jet simulation directly in your browser, with no installation required.

Play Now
cutechicken/3D-Airforce-Simulator

Key Features

Realistic Flight Physics
G-Force System: black out during extreme maneuvers!
Stall System: loss of control below 300 kt
Altitude Performance: G-force increases with altitude

Intense Combat
20mm Cannon: 940 rounds
AIM-9 Missiles: 8 missiles with 3-stage lock-on
Flares: essential for missile evasion! (3 uses)
Smart AI: enemies perform evasive maneuvers and missile attacks

Professional HUD
Real-time speed/altitude/heading display
RWR (Radar Warning Receiver) system
Pitch ladder & roll indicator
Automatic target marking system

Controls
Mouse: flight control (pitch/roll)
W/S: throttle control
A/D: rudder (yaw)
Left Click: fire!
R: switch weapons
F: deploy flares
G: escape a stall (hold for 2 seconds)

๐Ÿ† Mission
โฑ๏ธ Destroy all 4 enemy aircraft within 180 seconds!
๐Ÿ’ฏ Cannon kill: 100pts | Missile kill: 100pts | Collision kill: 200pts

๐Ÿ”ฅ Pro Tips
Missile Lock: Keep target in crosshair for 3 seconds ๐ŸŽฏ
Missile Warning: Press F immediately for flares! ๐Ÿšจ
Vision Darkening: Level out to recover ๐Ÿ‘๏ธ
Over-G Warning: Don't overdo extreme maneuvers! โšก
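For readers curious how the flight rules above fit together, here is a minimal Python sketch of one possible update loop. Only the 300 kt stall speed comes from the post; the G thresholds and the altitude scaling are invented placeholders, and the actual game is written in JavaScript on Three.js.

```python
from dataclasses import dataclass

STALL_SPEED_KT = 300      # stated in the post: loss of control below 300 kt
BLACKOUT_G = 9.0          # hypothetical blackout threshold
BLACKOUT_ONSET_S = 2.0    # hypothetical seconds of sustained over-G before vision fades

@dataclass
class FlightState:
    speed_kt: float
    altitude_ft: float
    g_load: float
    over_g_time: float = 0.0
    stalled: bool = False
    blackout: float = 0.0  # 0.0 = clear vision, 1.0 = full blackout

def update(state: FlightState, dt: float) -> FlightState:
    """One physics tick of the simplified rule set described above."""
    # Stall: below the stated 300 kt the aircraft loses control authority.
    state.stalled = state.speed_kt < STALL_SPEED_KT

    # Hypothetical altitude scaling: the post says G-force rises with altitude.
    effective_g = state.g_load * (1.0 + state.altitude_ft / 40_000.0)

    # Sustained over-G darkens the pilot's vision; leveling out recovers it.
    if effective_g > BLACKOUT_G:
        state.over_g_time += dt
    else:
        state.over_g_time = max(0.0, state.over_g_time - dt)
    state.blackout = min(1.0, state.over_g_time / BLACKOUT_ONSET_S)
    return state
```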

Tech Stack
Three.js, vanilla JS, Web Audio API, Pointer Lock API

The Power of VIBE CODING!
A complete game engine in a single JS file!

Real-time physics simulation
Complex AI behavior patterns
Professional HUD system
Immersive sound design


Play Now!
Browser only - a mouse is required!

Known issue: BGM autoplay restrictions in some browsers

#3DGame #WebGL #ThreeJS #FlightSimulator #IndieGame #WebGame
prithivMLmods
posted an update 2 days ago
Demo of OCR & Math QA using multi-capable VLMs like MonkeyOCR-pro-1.2B, R1-One-Vision, Visionary-R1, Vision-Matters-7B, and ViGaL-7B, all running together with support for both image and video inference.

Demo Spaces :
- Multimodal VLMs : prithivMLmods/Multimodal-VLMs

Models :
- Visionary R1 : maifoundations/Visionary-R1
- MonkeyOCR [1.2B] : echo840/MonkeyOCR-pro-1.2B
- ViGaL 7B : yunfeixie/ViGaL-7B
- Lh41-1042-Magellanic-7B-0711 : prithivMLmods/Lh41-1042-Magellanic-7B-0711
- Vision Matters 7B : Yuting6/Vision-Matters-7B
- WR30a-Deep-7B-0711 : prithivMLmods/WR30a-Deep-7B-0711

MonkeyOCR-pro-1.2B Colab T4 demo [notebook] :
- MonkeyOCR-pro-1.2B-ReportLab : https://github.com/PRITHIVSAKTHIUR/OCR-ReportLab/blob/main/MonkeyOCR-0709/MonkeyOCR-pro-1.2B-ReportLab.ipynb

GitHub : https://github.com/PRITHIVSAKTHIUR/OCR-ReportLab

The community GPU grant was given by Hugging Face; special thanks to them.

To learn more, visit the model card of the respective model.
prithivMLmods
posted an update 9 days ago
Multimodal OCR with ReportLab? On a Colab T4? (Nanonets OCR, Monkey OCR, OCRFlux 3B, Typhoon OCR 3B?) Yes, it's possible. I've made a dedicated Colab notebook to experiment with these models (all built on top of Qwen2.5 VL).

Download notebooks here :

- NanonetsOCR : https://colab.research.google.com/drive/1VvA-amvSVxGdWgIsh4_by6KWOtEs_Iqp
- MonkeyOCR : https://colab.research.google.com/drive/1vPCojbmlXjDFUt06FJ1tjgnj_zWK4mUo
- OCRFluxOCR : https://colab.research.google.com/drive/1TDoCXzWdF2hxVLbISqW6DjXAzOyI7pzf
- TyphoonOCR : https://colab.research.google.com/drive/1_59zvLNnn1kvbiSFxzA1WiqhpbW8RKbz

GitHub : https://github.com/PRITHIVSAKTHIUR/OCR-ReportLab

What does it do?

1. Performs OCR on the input image
2. Generates a DOCX or PDF file with the input image and the extracted text
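The notebooks themselves live in the linked repo; as a rough sketch of step 2 only (writing the input image plus the extracted text into a PDF), something like the following works with reportlab. The file names and the page layout are placeholders, not the notebook's exact code.

```python
from reportlab.lib.pagesizes import A4
from reportlab.lib.units import cm
from reportlab.lib.utils import ImageReader
from reportlab.pdfgen import canvas

def text_and_image_to_pdf(image_path: str, extracted_text: str, out_path: str) -> None:
    """Write the source image on page 1 and the OCR text on the following pages."""
    page_w, page_h = A4
    pdf = canvas.Canvas(out_path, pagesize=A4)

    # Page 1: the input image, scaled to fit inside the margins.
    img = ImageReader(image_path)
    img_w, img_h = img.getSize()
    scale = min((page_w - 2 * cm) / img_w, (page_h - 2 * cm) / img_h)
    pdf.drawImage(img, cm, page_h - cm - img_h * scale,
                  width=img_w * scale, height=img_h * scale)
    pdf.showPage()

    # Following pages: the extracted text, one line at a time (no word wrapping).
    text_obj = pdf.beginText(cm, page_h - cm)
    for line in extracted_text.splitlines():
        if text_obj.getY() < cm:          # start a new page when we run out of room
            pdf.drawText(text_obj)
            pdf.showPage()
            text_obj = pdf.beginText(cm, page_h - cm)
        text_obj.textLine(line)
    pdf.drawText(text_obj)
    pdf.save()

# Example with placeholder paths:
# text_and_image_to_pdf("page.png", ocr_markdown, "report.pdf")
```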

To learn more, visit the model card of the respective model.
openfree
posted an update 10 days ago
DNA CASINO: Hit the Genetic Jackpot!

When Biotech Meets Vegas
Hey there! Today I'm thrilled to introduce something truly extraordinary. We've transformed DNA-Diffusion into a full-blown casino slot machine - welcome to DNA CASINO!

What's This All About?

Generate 200 bp DNA regulatory sequences: AI-powered generation of cell-type-specific synthetic biology sequences
Slot Machine UI: watch each nucleotide (A, T, C, G) spin like real casino reels!
Real-time Protein Analysis: instantly translate the generated DNA to protein and get AI-powered structure/function analysis

Key Features
1. Choose Your Cell Type (Like Casino Chips!)
K562 - leukemia cell line
GM12878 - lymphoblastoid cell line
HepG2 - liver cancer cell line

2. Pull the Lever to Begin!
Just like a real slot machine - pull the lever or hit SPIN and watch 200 nucleotides whirl in spectacular fashion!

3๏ธโƒฃ AI-Powered Protein Analysis

DNA โ†’ Protein translation
Structure/function prediction via vidraft/gemma-3-r1984-27b (ranked #2 in medical reliability on FACTS Grounding Leaderboard)
Cell-type specific insights
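As a rough illustration of the translation step only (not the Space's actual code), here is a minimal sketch using Biopython's standard codon table; the example sequence is a short placeholder rather than a real DNA-Diffusion output.

```python
from Bio.Seq import Seq

def dna_to_protein(dna: str) -> str:
    """Translate a DNA sequence with the standard codon table, stopping at the first stop codon."""
    # Trim to a multiple of three so a partial codon at the end is ignored.
    trimmed = dna[: len(dna) - len(dna) % 3]
    return str(Seq(trimmed).translate(to_stop=True))

# Placeholder input (the real 200 bp sequences come from DNA-Diffusion):
print(dna_to_protein("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA"))  # -> "MAIVMGR"
```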

๐Ÿ› ๏ธ Tech Stack
python๐ŸŽจ Frontend: HTML/CSS/JS (Pure vanilla goodness!)
๐Ÿง  Backend: Gradio + DNA-Diffusion model
๐Ÿค– AI Analysis: vidraft/gemma-3-r1984-27b
โšก GPU Acceleration: Hugging Face Spaces GPU

How to Play

Pick your cell-type chip
Pull the lever or hit SPIN!
Watch your DNA sequence generate
Check out the protein analysis

What Makes It Special

Immersive casino theme: neon lights, glowing effects, and that iconic 777 lever!
Scientific accuracy: real codon tables and precise protein translation
Educational value: learn DNA-protein relationships the fun way

Try It Now!
VIDraft/DNA-CASINO

@openfree
aiqtech
posted an update 10 days ago
HuggingFace Heatmap Leaderboard
Visualizing AI ecosystem activity at a glance

aiqtech/Heatmap-Leaderboard

Introduction
A leaderboard that visualizes the vibrant HuggingFace community activity through heatmaps.

Key Features
Real-time Tracking - model/dataset/app releases from AI labs and developers
Auto Ranking - rankings based on activity over the past year
Responsive UI - unique colors per organization, mobile optimized
Auto Updates - hourly data refresh for the latest information
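The leaderboard's own pipeline isn't published in this post; as a rough idea of how per-account activity can be gathered, here is a sketch using huggingface_hub. The organization list is a small placeholder, and a real heatmap would also look at datasets, Spaces, and per-day timestamps.

```python
from huggingface_hub import HfApi

api = HfApi()
orgs = ["openai", "google", "meta-llama", "Qwen"]  # placeholder subset of the tracked accounts

# Count public models per account as a crude activity signal.
counts = {org: sum(1 for _ in api.list_models(author=org)) for org in orgs}

for org, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{org:12s} {n} models")
```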

๐ŸŒ Major Participants
Big Tech: OpenAI, Google, Meta, Microsoft, Apple, NVIDIA
AI Startups: Anthropic, Mistral, Stability AI, Cohere, DeepSeek
Chinese Companies: Tencent, Baidu, ByteDance, Qwen
HuggingFace Official: HuggingFaceH4, HuggingFaceM4, lerobot, etc.
Active Developers: prithivMLmods, lllyasviel, multimodalart and many more

Value
Trend Analysis - real-time open-source contribution insights
Inspiration - learn from other developers' activity patterns
Ecosystem Growth - visualize the AI community's development

@John6666 @Nymbo @MaziyarPanahi @prithivMLmods @fffiloni @gokaygokay @enzostvs @black-forest-labs @lllyasviel @briaai @multimodalart @unsloth @Xenova @mistralai @meta-llama @facebook @openai @Anthropic @google @allenai @apple @microsoft @nvidia @CohereLabs @ibm-granite @stabilityai @huggingface @OpenEvals @HuggingFaceTB @HuggingFaceH4 @HuggingFaceM4 @HuggingFaceFW @HuggingFaceFV @open-r1 @parler-tts @nanotron @lerobot @distilbert @kakaobrain @NCSOFT @upstage @moreh @LGAI-EXAONE @naver-hyperclovax @OnomaAIResearch @kakaocorp @Baidu @PaddlePaddle @tencent @BAAI @OpenGVLab @InternLM @Skywork @MiniMaxAI @stepfun-ai @ByteDance @Bytedance Seed @bytedance-research @openbmb @THUDM @rednote-hilab @deepseek-ai @Qwen @wan-ai @XiaomiMiMo @IndexTeam @agents-course
@Agents-MCP-Hackathon @akhaliq @alexnasa @Alibaba-NLP
@ArtificialAnalysis @bartowski @bibibi12345 @calcuis
@ChenDY @city96 @Comfy-Org @fancyfeast @fal @google
prithivMLmods
posted an update 11 days ago
A bunch of comparable demos for multimodal VLMs (excelling in OCR, cinematography understanding, spatial reasoning, etc.) is now up on the Hub, covering recent releases through June 2025.

Demo Spaces :

> [Nanonets-OCR-s, MonkeyOCR, Typhoon-OCR-7B, SmolDocling] : prithivMLmods/Multimodal-OCR2
> [GLM-4.1v, docscopeOCR-7B, MonkeyOCR, coreOCR-7B] : prithivMLmods/core-OCR
> [Camel-Doc-OCR, ViLaSR-7B, OCRFlux-3B, ShotVL-7B] : prithivMLmods/Doc-VLMs-v2-Localization
> [SkyCaptioner-V1, SpaceThinker-3B, coreOCR-7B, SpaceOm-3B] : prithivMLmods/VisionScope-R2
> [RolmOCR-7B, Qwen2-VL-OCR-2B, Aya-Vision-8B, Nanonets-OCR-s] : prithivMLmods/Multimodal-OCR
> [DREX-062225-7B, Typhoon-OCR-3B, olmOCR-7B-0225, VIREX-062225-7B] : prithivMLmods/Doc-VLMs-OCR
> [Cosmos-Reason1-7B, docscopeOCR-7B, Captioner-7B, visionOCR-3B] : prithivMLmods/DocScope-R1

Space Collection : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

To learn more, visit the model card of the respective model.
prithivMLmods
posted an update 12 days ago
The demo for Camel-Doc-OCR-062825 (exp) is optimized for document retrieval and direct Markdown (.md) generation from images and PDFs. Additional demos include OCRFlux-3B (document OCR), ViLaSR (spatial reasoning with visual drawing), and ShotVL (cinematic language understanding).

Space : prithivMLmods/Doc-VLMs-v2-Localization

Models :
- camel-doc-ocr-062825 : prithivMLmods/Camel-Doc-OCR-062825
- ocrflux-3b : ChatDOC/OCRFlux-3B
- vilasr : AntResearchNLP/ViLaSR
- shotvl : Vchitect/ShotVL-7B

- Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

The community GPU grant was given by Hugging Face; special thanks to them. The Space supports image and video inference with a results markdown canvas and object detection/localization.

To learn more, visit the model card of the respective model.
fantos
posted an update 12 days ago
How to Use Seedance, the #1 Video Generation Model, for Free

A Hidden Gem I Stumbled Upon
While browsing Hugging Face, I discovered an amazing project: ByteDance's Seedance video generation service - the model that knocked Google's VEO3 down to 2nd place on the video generation leaderboard - available for free on Hugging Face!

ginigen/Seedance-Free

Leaderboard standings:
1st: ByteDance Seedance
2nd: Google VEO3

It's called "Bytedance Seedance Video Free" and is provided by Ginigen.

My Experience Using It
Key Features

Natural Physics Engine
- Realistic object movements
- Sophisticated light and shadow rendering

Fast Generation Speed
- Average 30 seconds to 1 minute completion
- No waiting - instant access

๐Ÿ› ๏ธ Available Features
Text to Video
-Generate 5-second videos from text descriptions
-Multiple aspect ratio support (16:9, 9:16, 1:1, etc.)

Image to Video
-Convert static images to videos
-Supports URL input or direct upload

AI Prompt Enhancement
-AI-based prompt optimization
-Expands simple descriptions into detailed scenarios

Technical Stack
Model API: Bytedance Seedance
Interface: Gradio
Prompt AI: VIDraft/Gemma-3-R1984-27B
Free API keys provided
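Since the demo is a public Gradio Space, it can also be driven programmatically with gradio_client. The endpoint name and argument layout below are assumptions, not the Space's documented interface; run view_api() first to see the real signature.

```python
from gradio_client import Client

# Connect to the public Space.
client = Client("ginigen/Seedance-Free")

# Print the Space's actual endpoints and parameters before calling anything.
client.view_api()

# Hypothetical call: the api_name and argument below are placeholders.
result = client.predict(
    "a red fox running through fresh snow, cinematic lighting",
    api_name="/generate",
)
print(result)
```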

Perfect For
Social media content creators
Marketing/advertising professionals
Video production beginners
AI technology enthusiasts
ginipick
posted an update 12 days ago
Flux-Kontext FaceLORA - AI Portrait Style Transfer

Introduction
Transform your photos into masterpieces! Flux-Kontext FaceLORA is an AI-powered tool that converts portrait photos into a range of artistic styles.

ginigen/Flux-Kontext-FaceLORA

Key Features

Easy to Use: upload a photo → select a style → click Generate!
7 Art Styles: famous painter styles including Van Gogh, Monet, and Renoir
Face Preservation: the AI maintains your facial features while transforming the style
Fast Generation: get results in seconds with ZeroGPU support
Custom LoRA: use any LoRA model from Hugging Face

๐Ÿ–ผ๏ธ Available Styles

๐Ÿฏ Studio Ghibli - Whimsical anime art style
๐ŸŒŠ Winslow Homer - American realist watercolor
๐ŸŒป Van Gogh - Post-impressionist with swirling brushstrokes
๐ŸŽ Paul Cรฉzanne - Geometric post-impressionist structure
๐ŸŒธ Renoir - Impressionist with soft luminous light
๐Ÿชท Claude Monet - Impressionist light and color
โš”๏ธ Fantasy Art - Epic magical character portraits

How to Use
1. Upload your portrait photo
2. Select an art style from the gallery
3. Add an optional description
4. Click the Generate button!

Pro Tips

Front-facing photos work best
Adjust Style Strength to control the transformation intensity
Use "Randomize seed" for varied results
Add descriptions for more detailed outputs

๐Ÿ› ๏ธ Tech Stack

Model: FLUX.1-Kontext-dev by Black Forest Labs
LoRA: Community-created style adapters
Infrastructure: Hugging Face Spaces + ZeroGPU
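The Space's own code isn't shown in this post. As a rough sketch of the underlying pieces, recent diffusers releases expose a FluxKontextPipeline that can combine the base model with a style LoRA roughly as below; the LoRA repo id, prompt, and file names are placeholders, and the exact class and argument names depend on your diffusers version.

```python
import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

# Base editing model named in the post (requires accepting its license on the Hub).
pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Placeholder LoRA repo - swap in any community style adapter from the Hub.
pipe.load_lora_weights("your-username/van-gogh-style-lora")

portrait = load_image("portrait.jpg")
styled = pipe(
    image=portrait,
    prompt="repaint this portrait in the style of Van Gogh, swirling brushstrokes",
    guidance_scale=2.5,
).images[0]
styled.save("styled_portrait.png")
```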

Start Creating Now!
Create your unique AI portrait and share it on social media! #FluxKontextFaceLORA #AIArt #PortraitTransfer
openfree
posted an update 13 days ago
SOMA: The Core Architecture for AGI Level 1

VIDraft/SOMA-AGI

The First Step Toward AGI
SOMA (Self-Orchestrating Modular Architect) is an architecture that aims to fulfill the essential requirements for AGI (Artificial General Intelligence) Level 1. It implements, within a single LLM, the common AGI prerequisites emphasized by Yann LeCun (Meta), OpenAI, and Google DeepMind.

AGI Level 1 Core Requirements = SOMA's Implementation

Planning Capability
→ A Supervisor AI autonomously designs and executes comprehensive analysis roadmaps

Role Differentiation & Modularity
→ A single LLM instantly differentiates into 5 expert AIs for collaboration

Self-reflection & Feedback Loops
→ An Evaluator AI continuously validates and directs improvements

Tool-use & Autonomy
→ Full automation from web search to report generation

Long-term Agency Structure
→ Completes complex 11-stage collaborative processes end-to-end

SOMA's Three Core Structures

Self-Orchestrating
The ability to define problems and distribute roles without external instructions is fundamental. This is an implementation of OpenAI's "agentic AI" concept, with built-in real-time self-regulation mechanisms.

Modular
A single LLM internally creates multiple personas:
- Planner = the Supervisor AI establishes strategies
- Creator = presents innovative solutions
- Analyzer = collects and analyzes data
- Evaluator = performs critical assessments
- Executor = final synthesis and implementation
This realizes the "World Model + Planner + Memory + Actor" structure proposed by Meta AI. (A minimal sketch of such a persona loop follows below.)
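To make the modular idea concrete, here is a minimal, generic sketch of a single-model persona loop. The chat() callable stands in for any LLM call (API or local), and the persona prompts are invented placeholders, not SOMA's actual prompts or its 11-stage process.

```python
from typing import Callable

# Invented system prompts; SOMA's real persona definitions are not published in this post.
PERSONAS = {
    "Planner":   "You are the Supervisor AI. Break the task into a short, ordered plan.",
    "Creator":   "You propose innovative solutions for the current plan step.",
    "Analyzer":  "You collect and analyze the relevant facts for the proposal.",
    "Evaluator": "You critically assess the work so far and list concrete improvements.",
    "Executor":  "You synthesize everything into the final deliverable.",
}

def run_pipeline(task: str, chat: Callable[[str, str], str]) -> str:
    """Run one pass of the persona chain; chat(system, user) is any LLM call."""
    context = f"Task: {task}"
    for role, system_prompt in PERSONAS.items():
        reply = chat(system_prompt, context)
        context += f"\n\n[{role}]\n{reply}"   # each persona sees all prior output
    return context

# Example with a stub model so the sketch runs without any API:
if __name__ == "__main__":
    stub = lambda system, user: f"(stub reply for: {system[:30]}...)"
    print(run_pipeline("Write a market analysis of open-source LLMs", stub))
```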

Architect
Capable of high-level thinking and problem structuring beyond simple execution. It implements the plan-adapt-multitask capabilities required by DeepMind's Gemini series, systematically decomposing and reconstructing complex problems.

SOMA = The Embodiment of AGI Level 1
prithivMLmods
posted an update 18 days ago
The demo covers DREX-062225-exp (Document Retrieval and Extraction eXpert, experimental), typhoon-ocr-3b (a bilingual document parsing model built specifically for real-world documents), VIREX-062225-exp (Video Information Retrieval and Extraction eXpert, experimental), and olmOCR-7B-0225-preview (a document parsing model based on Qwen2VL).

Demo : prithivMLmods/Doc-VLMs-OCR (with .md canvas)

- DREX-062225-exp : prithivMLmods/DREX-062225-exp
- typhoon-ocr-3b : scb10x/typhoon-ocr-3b
- VIREX-062225-exp : prithivMLmods/VIREX-062225-exp
- olmOCR-7B-0225-preview : allenai/olmOCR-7B-0225-preview

- Collection : prithivMLmods/doc-vl-685839064a863e1cd23be3f1
- Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0
To learn more, visit the model card of the respective model.
prithivMLmods
posted an update 19 days ago
Updated docscopeOCR-7B-050425-exp to DREX-062225-exp, with improved precision in table structure and line spacing in the markdown generated for document pages. Though still experimental, it is expected to perform well in the defined DREX use cases [Document Retrieval and Extraction eXpert - experimental OCR].

- Model : prithivMLmods/DREX-062225-exp
- Demo : prithivMLmods/Doc-VLMs-OCR

- Collection : prithivMLmods/doc-vl-685839064a863e1cd23be3f1
- Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0
- Git : https://github.com/PRITHIVSAKTHIUR/DREX.git
To learn more, visit the model card of the respective model.
seawolf2357
posted an update 19 days ago
VEO3 Real-Time: Real-Time AI Video Generation with Self-Forcing

Core Innovation: Self-Forcing Technology
VEO3 Real-Time, an open-source project challenging Google's VEO3, achieves real-time video generation through Self-Forcing technology.

Heartsync/VEO3-RealTime

What is Self-Forcing?
While traditional methods require 50-100 denoising steps, Self-Forcing achieves the same quality in just 1-2 steps. Through self-correction and rapid convergence, this Distribution Matching Distillation (DMD) technique maintains quality while delivering a 50x speed improvement.

Technical Advantages of Self-Forcing
1. Extreme Speed
Generates 4-second videos in under 30 seconds, with the first frame streaming in just 3 seconds. This represents 50x faster performance than traditional diffusion methods.
2. Consistent Quality
Maintains cinematic quality despite fewer steps, ensures temporal consistency, and minimizes artifacts.
3. Efficient Resource Usage
Reduces GPU memory usage by 70% and heat generation by 30%, enabling smooth operation on mid-range GPUs like the RTX 3060.

Technology Stack Synergy
VEO3 Real-Time integrates several technologies around Self-Forcing DMD: Self-Forcing DMD handles ultra-fast video generation, Wan2.1-T2V-1.3B serves as the high-quality video backbone, PyAV streaming enables real-time transmission, and Qwen3 adds intelligent prompt enhancement for polished results.

Performance Comparison
Traditional methods require 50-100 steps, taking 2-5 minutes for the first frame and 5-10 minutes in total. In contrast, Self-Forcing needs only 1-2 steps, delivering the first frame in 3 seconds and complete videos in 30 seconds while maintaining equal quality.

Future of Self-Forcing
Our next goal is real-time 1080p generation, with ongoing research to achieve
prithivMLmods
posted an update 22 days ago
The demo for smoldocling / nanonets ocr / typhoon ocr / monkey ocr explores the document OCR capabilities of various newly released multimodal VLMs in a single Space. If you're experimenting with long-document image OCR, kindly use the SmolDocling 256M preview [SmolDocling is back in this demo].

Try the demo here : prithivMLmods/Multimodal-OCR2

- MonkeyOCR Recognition : echo840/MonkeyOCR
- Nanonets-OCR-s : nanonets/Nanonets-OCR-s
- SmolDocling-256M-preview : ds4sd/SmolDocling-256M-preview
- typhoon-ocr-7b : scb10x/typhoon-ocr-7b

- Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

- GitHub : https://github.com/PRITHIVSAKTHIUR/Multimodal-OCR2


The community GPU grant was given by Hugging Face; special thanks to them.



To learn more, visit the model card of the respective model.
openfree
posted an update 24 days ago
Open GAMMA - AI PPT Generator 'GamJa'

Project Introduction
An AI presentation generator presented by the OpenFree AI community! Create professional-level PPTs with just a few clicks.
Completely free! Create premium PPTs with Free GAMMA!

DEMO: openfree/Open-GAMMA

Key Features

Powered by an LLM ranked #2 on the FACTS Grounding Leaderboard
Base model: vidraft/gemma-3-R1984-27B
Support for English, Korean, and other languages
Automatic speaker-notes generation

Premium Visuals
3D-style AI image generation
5 design themes (Professional, Modern, Nature, Creative, Minimal)
FLUX-style diagram images
Automatic emoji bullet points

Smart Diagrams
Process flow, concept map, WBS, radial, and synoptic charts
Content-analysis-based automatic diagram generation
Full Korean font support

Main Features

Intelligent Content Generation
Auto-generate 3-20 slides just by entering a topic
Pulls in the latest information through web search
Can reference PDF, CSV, and TXT files

Visual Automation
3D images for the cover and conclusion slides
Auto-generates 2 content-based diagrams
Adds 2 FLUX-style images

Customizable Design
5 professional themes
3 layout styles
Automatic emoji mapping system
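The Space's generation pipeline isn't published in this post, but the kind of output it describes (slides plus speaker notes) maps naturally onto python-pptx. Here is a minimal sketch with invented placeholder content standing in for the LLM's output.

```python
from pptx import Presentation

# Placeholder slide content; in the real app this would come from the LLM.
slides = [
    ("Open-Source LLMs in 2025", "Market overview", "Greet the audience and set the agenda."),
    ("Key Trends", "Smaller models, better reasoning, cheaper inference", "Spend about 2 minutes here."),
]

prs = Presentation()
layout = prs.slide_layouts[1]  # built-in "Title and Content" layout

for title, body, notes in slides:
    slide = prs.slides.add_slide(layout)
    slide.shapes.title.text = title
    slide.placeholders[1].text = body
    # Speaker notes, as mentioned in the feature list above.
    slide.notes_slide.notes_text_frame.text = notes

prs.save("deck.pptx")
```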

Premium Features for Free!
Create professional-grade presentations with Free GAMMA (Open GAMMA), which rivals paid PPT-generation services!
prithivMLmods
posted an update 25 days ago
The demos for the MonkeyOCR Recognition model (which adopts a Structure-Recognition-Relation (SRR) triplet paradigm), Nanonets-OCR-s (a powerful, state-of-the-art image-to-markdown OCR model that goes far beyond traditional text extraction), and other experimental document OCR models are combined into a single Space.

Try the demo here : prithivMLmods/core-OCR
Try the Nanonets-OCR-s demo here : prithivMLmods/Multimodal-OCR

- MonkeyOCR Recognition : echo840/MonkeyOCR
- docscopeOCR-7B-050425-exp : prithivMLmods/docscopeOCR-7B-050425-exp
- coreOCR-7B-050325-preview : prithivMLmods/coreOCR-7B-050325-preview
- Nanonets-OCR-s : nanonets/Nanonets-OCR-s

- Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

Also included is a sample OCR test comparing the VisionOCR-3B-061125 and Qwen2-VL-OCR-2B-Instruct models.
- Blog : https://huggingface.co/blog/prithivMLmods/visionocr-3b-061125-vs-qwen2-vl-ocr-2b-instruct

To learn more, visit the model card of the respective model.
ginipick
posted an update 27 days ago
VEO3 Directors - All-in-One AI Video Creation Suite

What is VEO3 Directors?
VEO3 Directors is an end-to-end AI video creation platform that turns your ideas into cinematic reality: from story conception to a final video with synchronized audio, all in one seamless workflow.

Try It Now
ginigen/VEO3-Directors
ginigen/VEO3-Free
ginigen/VEO3-Free-mirror

Key Features
Story Seed Generator

Instantly generate creative story ideas across multiple genres
Bilingual support (English/Korean)
Rich categories: genre, setting, characters, and more

AI Script & Prompt Crafting

Powered by the Friendli API for Hollywood-quality prompts
An AI director writes detailed cinematography instructions
Professional elements: camera movements, lighting, VFX

Video + Audio Generation

Wan2.1-T2V-14B for stunning visual quality
NAG 4-step inference - 10x faster generation
MMAudio auto-generates matching soundscapes
Full control over resolution, duration, and style
LLM (API): VIDraft/Gemma-3-R1984-27B

How It Works

1. Generate Story → "The Time Traveler's Final Choice"
2. Create Script → the AI writes cinematic scene descriptions
3. Produce Video → a 4-8 second clip with synchronized audio

What Makes It Special

Unified workflow: from idea to video in one interface
Director-level prompts: professional cinematography language
Lightning fast: minutes, not hours
Smart audio: context-aware sound generation

๐Ÿ† Use Cases

๐Ÿ“ฑ Social Media Content
๐ŸŽ“ Educational Videos
๐Ÿ“บ Marketing & Ads
๐ŸŽฎ Game Cutscene Prototyping
๐ŸŽจ Digital Art Creation
seawolf2357
posted an update 28 days ago
FusionX Enhanced Wan 2.1 I2V (14B)

Image-to-Video Generation Model
Generate cinematic-quality videos in just 8 steps!

Heartsync/WAN2-1-fast-T2V-FusioniX

Key Features
Ultra-Fast Generation: premium quality in just 8-10 steps
Cinematic Quality: smooth motion with detailed textures
FusionX Technology: enhanced with CausVid + MPS Rewards LoRAs
Optimized Resolution: 576×1024 default settings
50% Speed Boost: faster rendering compared to base models
๐Ÿ› ๏ธ Technical Stack

Base Model: Wan2.1 I2V 14B
Enhancement Technologies:

๐Ÿ”— CausVid LoRA (1.0 strength) - Motion modeling
๐Ÿ”— MPS Rewards LoRA (0.7 strength) - Detail optimization

Scheduler: UniPC Multistep (flow_shift=8.0)
Auto Prompt Enhancement: Automatic cinematic keyword injection
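The Space's exact setup isn't published here, but the listed pieces map onto the diffusers Wan image-to-video pipeline roughly as in the sketch below. The LoRA repo ids and file names are placeholders, and class/argument names assume a recent diffusers release.

```python
import torch
from diffusers import UniPCMultistepScheduler, WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = WanImageToVideoPipeline.from_pretrained(
    "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers", torch_dtype=torch.bfloat16
)
# UniPC multistep with flow_shift=8.0, as listed in the technical stack above.
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=8.0)
pipe.to("cuda")

# Placeholder LoRA repos standing in for the CausVid / MPS Rewards adapters.
pipe.load_lora_weights("your-username/causvid-lora", adapter_name="causvid")
pipe.load_lora_weights("your-username/mps-rewards-lora", adapter_name="mps")
pipe.set_adapters(["causvid", "mps"], adapter_weights=[1.0, 0.7])  # strengths from the post

frames = pipe(
    image=load_image("start_frame.jpg"),
    prompt="cinematic motion, smooth animation, gentle camera push-in",
    height=1024, width=576,        # the 576x1024 portrait default
    num_inference_steps=8,         # the 8-step fast setting
    num_frames=49,
).frames[0]
export_to_video(frames, "fusionx_clip.mp4", fps=16)
```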

How to Use

1. Upload Image - select your starting image
2. Enter Prompt - describe the desired motion and style
3. Adjust Settings - 8 steps, 2-5 seconds recommended
4. Generate - complete in just minutes!

Optimization Tips
Recommended Settings: 8-10 steps, 576×1024 resolution
Prompting: use "cinematic motion, smooth animation" keywords
Duration: 2-5 seconds for optimal quality
Motion: emphasize natural movement and camera work
๐Ÿ† FusionX Enhanced vs Standard Models
Performance Comparison: While standard models typically require 15-20 inference steps to achieve decent quality, our FusionX Enhanced version delivers premium results in just 8-10 steps - that's more than 50% faster! The rendering speed has been dramatically improved through optimized LoRA fusion, allowing creators to iterate quickly without sacrificing quality. Motion quality has been significantly enhanced with advanced causal modeling, producing smoother, more realistic animations compared to base implementations. Detail preservation is substantially better thanks to MPS Rewards training, maintaining crisp textures and consistent temporal coherence throughout the generated sequences.
openfree
posted an update 28 days ago
๐ŸŒ Whisper-OCR Multilingual Translation Space ๐Ÿš€

Welcome! This Space takes English audio, video, images, and PDFs and instantly converts them into Chinese (ZH), Thai (TH), and Russian (RU)โ€”no other source language required.

VIDraft/voice-trans

Key Features
Microphone - record English speech → transcript + 3-language translation

Audio File - upload English audio → transcript + translation

Video File - auto-extract audio with FFmpeg → transcript + translation

Image - Nanonets-OCR pulls the text → translation

PDF - up to 50 pages of text and tables → translation

Realtime Mode - flushes every 10-15 s; the newest lines appear at the top

๐Ÿ› ๏ธ Quick Start
Click โ€œDuplicateโ€ to fork, or launch directly.

Pick a tab (๐ŸŽค/๐Ÿ”Š/๐ŸŽฌ/๐Ÿ–ผ๏ธ/๐Ÿ“„/๐Ÿ”„) and feed it English input.

After a few seconds, see the ๐Ÿ“œ original and ๐ŸŒ 3-language translation side by side.

Tech Stack
openai/whisper-large-v3-turbo - fast, high-accuracy ASR

Nanonets-OCR-s (+ Flash Attention 2) - document/image OCR

Gradio Blocks - clean tabbed UI

PyTorch + CUDA - automatic GPU allocation and ThreadPool load balancing
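As a rough sketch of the ASR half of this stack (not the Space's actual code), the video → transcript step can be reproduced with ffmpeg plus the transformers ASR pipeline; the file names are placeholders.

```python
import subprocess
from transformers import pipeline

# Extract a 16 kHz mono audio track from the uploaded video (placeholder file names).
subprocess.run(
    ["ffmpeg", "-y", "-i", "input_video.mp4", "-vn", "-ac", "1", "-ar", "16000", "audio.wav"],
    check=True,
)

# Whisper large-v3-turbo via the transformers ASR pipeline, as listed in the tech stack.
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3-turbo",
    chunk_length_s=30,   # long-form audio is processed in 30 s chunks
)
transcript = asr("audio.wav")["text"]
print(transcript)
# The Space then feeds this transcript to a translation model for the ZH / TH / RU output.
```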

Notes
Translation quality depends on audio quality, lighting, and resolution.

Huge videos hit the HF Space upload cap (~2 GB).

Realtime tab requires browser microphone permission.
openfree
posted an update about 1 month ago
I lead 'Openfree AI', Korea's most prominent AI open-source community. First and foremost, I'd like to express my deepest gratitude for Hugging Face's continuous support and efforts.
Openfree AI collaborates with various AI communities across Korea, contributing to knowledge sharing and ecosystem development. I've been actively promoting the critical importance of Hugging Face as the backbone of Korea's AI infrastructure, engaging with senior government officials, National Assembly members, university leaders, and media executives to emphasize, at a national policy level, how Hugging Face represents Korea's AI future. I consider myself a 'voluntary Korean ambassador for Hugging Face'.
Let me share our community's achievements on the Hugging Face platform over the past year:

- Published hundreds of models and Spaces
- Surpassed 10 million cumulative visitors
- Reached 1.7 million monthly active users (MAU)
- Generated over 1 million images/videos per month

These achievements were possible thanks to Hugging Face's generous support, including H200 resources. Thank you sincerely.
I'm thrilled to share exciting news! This July, we'll host the "Hugging Face Forever" seminar at the Korean National Assembly, sponsored by AI-policy lawmakers. Our community will organize this event, focusing on 'Hugging Face and community contributions and roles', a truly meaningful milestone for Korea's AI ecosystem.
We'll continue working hard for the development of Korea's AI ecosystem and... oh, if you ever need a Korean branch manager for Hugging Face, please let me know! (Just kidding... or am I?)
Thank you.
Openfree AI Representative