openfree
posted an update about 19 hours ago
3D Airforce Simulator - a browser-based 3D fighter jet simulator

Introduction
A full-scale 3D aerial combat game created with VIBE CODING!
Enjoy a realistic fighter jet simulation directly in your browser, with no installation required.

Play Now
cutechicken/3D-Airforce-Simulator

Key Features

Realistic Flight Physics
G-Force System: black out during extreme maneuvers!
Stall System: loss of control below 300 kt
Altitude Performance: G-force increases with altitude

Intense Combat
20mm Cannon: 940 rounds
AIM-9 Missiles: 8 missiles with 3-stage lock-on
Flares: essential for missile evasion! (3 uses)
Smart AI: enemies perform evasive maneuvers and missile attacks

Professional HUD
Real-time speed/altitude/heading display
RWR (Radar Warning Receiver) system
Pitch ladder & roll indicator
Automatic target marking system

Controls
Mouse: flight control (pitch/roll)
W/S: throttle control
A/D: rudder (yaw)
Left Click: fire!
R: switch weapons
F: deploy flares
G: escape a stall (hold for 2 seconds)

๐Ÿ† Mission
โฑ๏ธ Destroy all 4 enemy aircraft within 180 seconds!
๐Ÿ’ฏ Cannon kill: 100pts | Missile kill: 100pts | Collision kill: 200pts

๐Ÿ”ฅ Pro Tips
Missile Lock: Keep target in crosshair for 3 seconds ๐ŸŽฏ
Missile Warning: Press F immediately for flares! ๐Ÿšจ
Vision Darkening: Level out to recover ๐Ÿ‘๏ธ
Over-G Warning: Don't overdo extreme maneuvers! โšก
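For readers curious how the flight rules above fit together, here is a minimal Python sketch of one possible update loop. Only the 300 kt stall speed comes from the post; the G thresholds and the altitude scaling are invented placeholders, and the actual game is written in JavaScript on Three.js.

```python
from dataclasses import dataclass

STALL_SPEED_KT = 300      # stated in the post: loss of control below 300 kt
BLACKOUT_G = 9.0          # hypothetical blackout threshold
BLACKOUT_ONSET_S = 2.0    # hypothetical seconds of sustained over-G before vision fades

@dataclass
class FlightState:
    speed_kt: float
    altitude_ft: float
    g_load: float
    over_g_time: float = 0.0
    stalled: bool = False
    blackout: float = 0.0  # 0.0 = clear vision, 1.0 = full blackout

def update(state: FlightState, dt: float) -> FlightState:
    """One physics tick of the simplified rule set described above."""
    # Stall: below the stated 300 kt the aircraft loses control authority.
    state.stalled = state.speed_kt < STALL_SPEED_KT

    # Hypothetical altitude scaling: the post says G-force rises with altitude.
    effective_g = state.g_load * (1.0 + state.altitude_ft / 40_000.0)

    # Sustained over-G darkens the pilot's vision; leveling out recovers it.
    if effective_g > BLACKOUT_G:
        state.over_g_time += dt
    else:
        state.over_g_time = max(0.0, state.over_g_time - dt)
    state.blackout = min(1.0, state.over_g_time / BLACKOUT_ONSET_S)
    return state
```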

Tech Stack
Three.js, vanilla JS, Web Audio API, Pointer Lock API

The Power of VIBE CODING!
A complete game engine in a single JS file!

Real-time physics simulation
Complex AI behavior patterns
Professional HUD system
Immersive sound design


Play Now!
Browser only - a mouse is required!

Known issue: BGM autoplay restrictions in some browsers

#3DGame #WebGL #ThreeJS #FlightSimulator #IndieGame #WebGame
prithivMLmods
posted an update 2 days ago
Demo of OCR & Math QA using multi-capable VLMs like MonkeyOCR-pro-1.2B, R1-One-Vision, Visionary-R1, Vision-Matters-7B, and ViGaL-7B, all running together with support for both image and video inference.

Demo Spaces :
- Multimodal VLMs : prithivMLmods/Multimodal-VLMs

Models :
- Visionary R1 : maifoundations/Visionary-R1
- MonkeyOCR [1.2B] : echo840/MonkeyOCR-pro-1.2B
- ViGaL 7B : yunfeixie/ViGaL-7B
- Lh41-1042-Magellanic-7B-0711 : prithivMLmods/Lh41-1042-Magellanic-7B-0711
- Vision Matters 7B : Yuting6/Vision-Matters-7B
- WR30a-Deep-7B-0711 : prithivMLmods/WR30a-Deep-7B-0711

MonkeyOCR-pro-1.2B Colab T4 demo [notebook] :
- MonkeyOCR-pro-1.2B-ReportLab : https://github.com/PRITHIVSAKTHIUR/OCR-ReportLab/blob/main/MonkeyOCR-0709/MonkeyOCR-pro-1.2B-ReportLab.ipynb

GitHub : https://github.com/PRITHIVSAKTHIUR/OCR-ReportLab

The community GPU grant was given by Hugging Face; special thanks to them.

To learn more, visit the model card of the respective model.
prithivMLmods
posted an update 9 days ago
Multimodal OCR with ReportLab? On a Colab T4? (Nanonets OCR, Monkey OCR, OCRFlux 3B, Typhoon OCR 3B?) Yes, it's possible. I've made a dedicated Colab notebook to experiment with these models (all built on top of Qwen2.5 VL).

Download notebooks here :

- NanonetsOCR : https://colab.research.google.com/drive/1VvA-amvSVxGdWgIsh4_by6KWOtEs_Iqp
- MonkeyOCR : https://colab.research.google.com/drive/1vPCojbmlXjDFUt06FJ1tjgnj_zWK4mUo
- OCRFluxOCR : https://colab.research.google.com/drive/1TDoCXzWdF2hxVLbISqW6DjXAzOyI7pzf
- TyphoonOCR : https://colab.research.google.com/drive/1_59zvLNnn1kvbiSFxzA1WiqhpbW8RKbz

GitHub : https://github.com/PRITHIVSAKTHIUR/OCR-ReportLab

What does it do?

1. Performs OCR on the input image
2. Generates a DOCX or PDF file with the input image and the extracted text
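The notebooks themselves live in the linked repo; as a rough sketch of step 2 only (writing the input image plus the extracted text into a PDF), something like the following works with reportlab. The file names and the page layout are placeholders, not the notebook's exact code.

```python
from reportlab.lib.pagesizes import A4
from reportlab.lib.units import cm
from reportlab.lib.utils import ImageReader
from reportlab.pdfgen import canvas

def text_and_image_to_pdf(image_path: str, extracted_text: str, out_path: str) -> None:
    """Write the source image on page 1 and the OCR text on the following pages."""
    page_w, page_h = A4
    pdf = canvas.Canvas(out_path, pagesize=A4)

    # Page 1: the input image, scaled to fit inside the margins.
    img = ImageReader(image_path)
    img_w, img_h = img.getSize()
    scale = min((page_w - 2 * cm) / img_w, (page_h - 2 * cm) / img_h)
    pdf.drawImage(img, cm, page_h - cm - img_h * scale,
                  width=img_w * scale, height=img_h * scale)
    pdf.showPage()

    # Following pages: the extracted text, one line at a time (no word wrapping).
    text_obj = pdf.beginText(cm, page_h - cm)
    for line in extracted_text.splitlines():
        if text_obj.getY() < cm:          # start a new page when we run out of room
            pdf.drawText(text_obj)
            pdf.showPage()
            text_obj = pdf.beginText(cm, page_h - cm)
        text_obj.textLine(line)
    pdf.drawText(text_obj)
    pdf.save()

# Example with placeholder paths:
# text_and_image_to_pdf("page.png", ocr_markdown, "report.pdf")
```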

To learn more, visit the model card of the respective model.
openfree
posted an update 10 days ago
DNA CASINO: Hit the Genetic Jackpot!

When Biotech Meets Vegas
Hey there! Today I'm thrilled to introduce something truly extraordinary. We've transformed DNA-Diffusion into a full-blown casino slot machine - welcome to DNA CASINO!

What's This All About?

Generate 200 bp DNA regulatory sequences: AI-powered generation of cell-type-specific synthetic biology sequences
Slot Machine UI: watch each nucleotide (A, T, C, G) spin like real casino reels!
Real-time Protein Analysis: instantly translate the generated DNA to protein and get AI-powered structure/function analysis

Key Features
1. Choose Your Cell Type (Like Casino Chips!)
K562 - leukemia cell line
GM12878 - lymphoblastoid cell line
HepG2 - liver cancer cell line

2. Pull the Lever to Begin!
Just like a real slot machine - pull the lever or hit SPIN and watch 200 nucleotides whirl in spectacular fashion!

3๏ธโƒฃ AI-Powered Protein Analysis

DNA โ†’ Protein translation
Structure/function prediction via vidraft/gemma-3-r1984-27b (ranked #2 in medical reliability on FACTS Grounding Leaderboard)
Cell-type specific insights
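As a rough illustration of the translation step only (not the Space's actual code), here is a minimal sketch using Biopython's standard codon table; the example sequence is a short placeholder rather than a real DNA-Diffusion output.

```python
from Bio.Seq import Seq

def dna_to_protein(dna: str) -> str:
    """Translate a DNA sequence with the standard codon table, stopping at the first stop codon."""
    # Trim to a multiple of three so a partial codon at the end is ignored.
    trimmed = dna[: len(dna) - len(dna) % 3]
    return str(Seq(trimmed).translate(to_stop=True))

# Placeholder input (the real 200 bp sequences come from DNA-Diffusion):
print(dna_to_protein("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA"))  # -> "MAIVMGR"
```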

๐Ÿ› ๏ธ Tech Stack
python๐ŸŽจ Frontend: HTML/CSS/JS (Pure vanilla goodness!)
๐Ÿง  Backend: Gradio + DNA-Diffusion model
๐Ÿค– AI Analysis: vidraft/gemma-3-r1984-27b
โšก GPU Acceleration: Hugging Face Spaces GPU

How to Play

Pick your cell-type chip
Pull the lever or hit SPIN!
Watch your DNA sequence generate
Check out the protein analysis

What Makes It Special

Immersive casino theme: neon lights, glowing effects, and that iconic 777 lever!
Scientific accuracy: real codon tables and precise protein translation
Educational value: learn DNA-protein relationships the fun way

Try It Now!
VIDraft/DNA-CASINO

@openfree
aiqtech
posted an update 10 days ago
HuggingFace Heatmap Leaderboard
Visualizing AI ecosystem activity at a glance

aiqtech/Heatmap-Leaderboard

Introduction
A leaderboard that visualizes the vibrant HuggingFace community activity through heatmaps.

Key Features
Real-time Tracking - model/dataset/app releases from AI labs and developers
Auto Ranking - rankings based on activity over the past year
Responsive UI - unique colors per organization, mobile optimized
Auto Updates - hourly data refresh for the latest information
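The leaderboard's own pipeline isn't published in this post; as a rough idea of how per-account activity can be gathered, here is a sketch using huggingface_hub. The organization list is a small placeholder, and a real heatmap would also look at datasets, Spaces, and per-day timestamps.

```python
from huggingface_hub import HfApi

api = HfApi()
orgs = ["openai", "google", "meta-llama", "Qwen"]  # placeholder subset of the tracked accounts

# Count public models per account as a crude activity signal.
counts = {org: sum(1 for _ in api.list_models(author=org)) for org in orgs}

for org, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{org:12s} {n} models")
```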

๐ŸŒ Major Participants
Big Tech: OpenAI, Google, Meta, Microsoft, Apple, NVIDIA
AI Startups: Anthropic, Mistral, Stability AI, Cohere, DeepSeek
Chinese Companies: Tencent, Baidu, ByteDance, Qwen
HuggingFace Official: HuggingFaceH4, HuggingFaceM4, lerobot, etc.
Active Developers: prithivMLmods, lllyasviel, multimodalart and many more

Value
Trend Analysis - real-time open-source contribution insights
Inspiration - learn from other developers' activity patterns
Ecosystem Growth - visualize the AI community's development

@John6666 @Nymbo @MaziyarPanahi @prithivMLmods @fffiloni @gokaygokay @enzostvs @black-forest-labs @lllyasviel @briaai @multimodalart @unsloth @Xenova @mistralai @meta-llama @facebook @openai @Anthropic @google @allenai @apple @microsoft @nvidia @CohereLabs @ibm-granite @stabilityai @huggingface @OpenEvals @HuggingFaceTB @HuggingFaceH4 @HuggingFaceM4 @HuggingFaceFW @HuggingFaceFV @open-r1 @parler-tts @nanotron @lerobot @distilbert @kakaobrain @NCSOFT @upstage @moreh @LGAI-EXAONE @naver-hyperclovax @OnomaAIResearch @kakaocorp @Baidu @PaddlePaddle @tencent @BAAI @OpenGVLab @InternLM @Skywork @MiniMaxAI @stepfun-ai @ByteDance @Bytedance Seed @bytedance-research @openbmb @THUDM @rednote-hilab @deepseek-ai @Qwen @wan-ai @XiaomiMiMo @IndexTeam @agents-course
@Agents-MCP-Hackathon @akhaliq @alexnasa @Alibaba-NLP
@ArtificialAnalysis @bartowski @bibibi12345 @calcuis
@ChenDY @city96 @Comfy-Org @fancyfeast @fal @google
prithivMLmods
posted an update 11 days ago
A bunch of comparable demos for multimodal VLMs (excelling in OCR, cinematography understanding, spatial reasoning, etc.) is now up on the Hub, covering recent releases through June 2025.

Demo Spaces :

> [Nanonets-OCR-s, MonkeyOCR, Typhoon-OCR-7B, SmolDocling] : prithivMLmods/Multimodal-OCR2
> [GLM-4.1v, docscopeOCR-7B, MonkeyOCR, coreOCR-7B] : prithivMLmods/core-OCR
> [Camel-Doc-OCR, ViLaSR-7B, OCRFlux-3B, ShotVL-7B] : prithivMLmods/Doc-VLMs-v2-Localization
> [SkyCaptioner-V1, SpaceThinker-3B, coreOCR-7B, SpaceOm-3B] : prithivMLmods/VisionScope-R2
> [RolmOCR-7B, Qwen2-VL-OCR-2B, Aya-Vision-8B, Nanonets-OCR-s] : prithivMLmods/Multimodal-OCR
> [DREX-062225-7B, Typhoon-OCR-3B, olmOCR-7B-0225, VIREX-062225-7B] : prithivMLmods/Doc-VLMs-OCR
> [Cosmos-Reason1-7B, docscopeOCR-7B, Captioner-7B, visionOCR-3B] : prithivMLmods/DocScope-R1

Space Collection : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

To learn more, visit the model card of the respective model.
prithivMLmods
posted an update 12 days ago
The demo for Camel-Doc-OCR-062825 (exp) is optimized for document retrieval and direct Markdown (.md) generation from images and PDFs. Additional demos include OCRFlux-3B (document OCR), ViLaSR (spatial reasoning with visual drawing), and ShotVL (cinematic language understanding).

Space : prithivMLmods/Doc-VLMs-v2-Localization

Models :
- camel-doc-ocr-062825 : prithivMLmods/Camel-Doc-OCR-062825
- ocrflux-3b : ChatDOC/OCRFlux-3B
- vilasr : AntResearchNLP/ViLaSR
- shotvl : Vchitect/ShotVL-7B

- Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

The community GPU grant was given by Hugging Face; special thanks to them. The Space supports image and video inference with a results markdown canvas and object detection/localization.

To learn more, visit the model card of the respective model.
fantos
posted an update 12 days ago
How to Use Seedance, the #1 Video Generation Model, for Free

A Hidden Gem I Stumbled Upon
While browsing Hugging Face, I discovered an amazing project: ByteDance's Seedance video generation service - the model that knocked Google's VEO3 down to 2nd place on the video generation leaderboard - available for free on Hugging Face!

ginigen/Seedance-Free

Leaderboard standings:
1st: ByteDance Seedance
2nd: Google VEO3

It's called "Bytedance Seedance Video Free" and is provided by Ginigen.

My Experience Using It
Key Features

Natural Physics Engine
- Realistic object movements
- Sophisticated light and shadow rendering

Fast Generation Speed
- Average 30 seconds to 1 minute completion
- No waiting - instant access

๐Ÿ› ๏ธ Available Features
Text to Video
-Generate 5-second videos from text descriptions
-Multiple aspect ratio support (16:9, 9:16, 1:1, etc.)

Image to Video
-Convert static images to videos
-Supports URL input or direct upload

AI Prompt Enhancement
-AI-based prompt optimization
-Expands simple descriptions into detailed scenarios

Technical Stack
Model API: Bytedance Seedance
Interface: Gradio
Prompt AI: VIDraft/Gemma-3-R1984-27B
Free API keys provided
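Since the demo is a public Gradio Space, it can also be driven programmatically with gradio_client. The endpoint name and argument layout below are assumptions, not the Space's documented interface; run view_api() first to see the real signature.

```python
from gradio_client import Client

# Connect to the public Space.
client = Client("ginigen/Seedance-Free")

# Print the Space's actual endpoints and parameters before calling anything.
client.view_api()

# Hypothetical call: the api_name and argument below are placeholders.
result = client.predict(
    "a red fox running through fresh snow, cinematic lighting",
    api_name="/generate",
)
print(result)
```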

Perfect For
Social media content creators
Marketing/advertising professionals
Video production beginners
AI technology enthusiasts
ginipick
posted an update 12 days ago
Flux-Kontext FaceLORA - AI Portrait Style Transfer

Introduction
Transform your photos into masterpieces! Flux-Kontext FaceLORA is an AI-powered tool that converts portrait photos into a range of artistic styles.

ginigen/Flux-Kontext-FaceLORA

Key Features

Easy to Use: upload a photo → select a style → click Generate!
7 Art Styles: famous painter styles including Van Gogh, Monet, and Renoir
Face Preservation: the AI maintains your facial features while transforming the style
Fast Generation: get results in seconds with ZeroGPU support
Custom LoRA: use any LoRA model from Hugging Face

๐Ÿ–ผ๏ธ Available Styles

๐Ÿฏ Studio Ghibli - Whimsical anime art style
๐ŸŒŠ Winslow Homer - American realist watercolor
๐ŸŒป Van Gogh - Post-impressionist with swirling brushstrokes
๐ŸŽ Paul Cรฉzanne - Geometric post-impressionist structure
๐ŸŒธ Renoir - Impressionist with soft luminous light
๐Ÿชท Claude Monet - Impressionist light and color
โš”๏ธ Fantasy Art - Epic magical character portraits

How to Use
1. Upload your portrait photo
2. Select an art style from the gallery
3. Add an optional description
4. Click the Generate button!

Pro Tips

Front-facing photos work best
Adjust Style Strength to control the transformation intensity
Use "Randomize seed" for varied results
Add descriptions for more detailed outputs

๐Ÿ› ๏ธ Tech Stack

Model: FLUX.1-Kontext-dev by Black Forest Labs
LoRA: Community-created style adapters
Infrastructure: Hugging Face Spaces + ZeroGPU
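The Space's own code isn't shown in this post. As a rough sketch of the underlying pieces, recent diffusers releases expose a FluxKontextPipeline that can combine the base model with a style LoRA roughly as below; the LoRA repo id, prompt, and file names are placeholders, and the exact class and argument names depend on your diffusers version.

```python
import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

# Base editing model named in the post (requires accepting its license on the Hub).
pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Placeholder LoRA repo - swap in any community style adapter from the Hub.
pipe.load_lora_weights("your-username/van-gogh-style-lora")

portrait = load_image("portrait.jpg")
styled = pipe(
    image=portrait,
    prompt="repaint this portrait in the style of Van Gogh, swirling brushstrokes",
    guidance_scale=2.5,
).images[0]
styled.save("styled_portrait.png")
```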

Start Creating Now!
Create your unique AI portrait and share it on social media! #FluxKontextFaceLORA #AIArt #PortraitTransfer
openfree
posted an update 13 days ago
SOMA: The Core Architecture for AGI Level 1

VIDraft/SOMA-AGI

The First Step Toward AGI
SOMA (Self-Orchestrating Modular Architect) is an architecture that aims to fulfill the essential requirements for AGI (Artificial General Intelligence) Level 1. It implements, within a single LLM, the common AGI prerequisites emphasized by Yann LeCun (Meta), OpenAI, and Google DeepMind.

AGI Level 1 Core Requirements = SOMA's Implementation

Planning Capability
→ A Supervisor AI autonomously designs and executes comprehensive analysis roadmaps

Role Differentiation & Modularity
→ A single LLM instantly differentiates into 5 expert AIs for collaboration

Self-reflection & Feedback Loops
→ An Evaluator AI continuously validates and directs improvements

Tool-use & Autonomy
→ Full automation from web search to report generation

Long-term Agency Structure
→ Completes complex 11-stage collaborative processes end-to-end

SOMA's Three Core Structures

Self-Orchestrating
The ability to define problems and distribute roles without external instructions is fundamental. This is an implementation of OpenAI's "agentic AI" concept, with built-in real-time self-regulation mechanisms.

Modular
A single LLM internally creates multiple personas:
- Planner = the Supervisor AI establishes strategies
- Creator = presents innovative solutions
- Analyzer = collects and analyzes data
- Evaluator = performs critical assessments
- Executor = final synthesis and implementation
This realizes the "World Model + Planner + Memory + Actor" structure proposed by Meta AI. (A minimal sketch of such a persona loop follows below.)
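To make the modular idea concrete, here is a minimal, generic sketch of a single-model persona loop. The chat() callable stands in for any LLM call (API or local), and the persona prompts are invented placeholders, not SOMA's actual prompts or its 11-stage process.

```python
from typing import Callable

# Invented system prompts; SOMA's real persona definitions are not published in this post.
PERSONAS = {
    "Planner":   "You are the Supervisor AI. Break the task into a short, ordered plan.",
    "Creator":   "You propose innovative solutions for the current plan step.",
    "Analyzer":  "You collect and analyze the relevant facts for the proposal.",
    "Evaluator": "You critically assess the work so far and list concrete improvements.",
    "Executor":  "You synthesize everything into the final deliverable.",
}

def run_pipeline(task: str, chat: Callable[[str, str], str]) -> str:
    """Run one pass of the persona chain; chat(system, user) is any LLM call."""
    context = f"Task: {task}"
    for role, system_prompt in PERSONAS.items():
        reply = chat(system_prompt, context)
        context += f"\n\n[{role}]\n{reply}"   # each persona sees all prior output
    return context

# Example with a stub model so the sketch runs without any API:
if __name__ == "__main__":
    stub = lambda system, user: f"(stub reply for: {system[:30]}...)"
    print(run_pipeline("Write a market analysis of open-source LLMs", stub))
```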

Architect
Capable of high-level thinking and problem structuring beyond simple execution. It implements the plan-adapt-multitask capabilities required by DeepMind's Gemini series, systematically decomposing and reconstructing complex problems.

SOMA = The Embodiment of AGI Level 1
prithivMLmods
posted an update 18 days ago
The demo covers DREX-062225-exp (Document Retrieval and Extraction eXpert, experimental), typhoon-ocr-3b (a bilingual document parsing model built specifically for real-world documents), VIREX-062225-exp (Video Information Retrieval and Extraction eXpert, experimental), and olmOCR-7B-0225-preview (a document parsing model based on Qwen2VL).

Demo : prithivMLmods/Doc-VLMs-OCR (with .md canvas)

- DREX-062225-exp : prithivMLmods/DREX-062225-exp
- typhoon-ocr-3b : scb10x/typhoon-ocr-3b
- VIREX-062225-exp : prithivMLmods/VIREX-062225-exp
- olmOCR-7B-0225-preview : allenai/olmOCR-7B-0225-preview

- Collection : prithivMLmods/doc-vl-685839064a863e1cd23be3f1
- Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0
To learn more, visit the model card of the respective model.
prithivMLmods
posted an update 19 days ago
Updated docscopeOCR-7B-050425-exp to DREX-062225-exp, with improved precision in table structure and line spacing in the markdown generated for document pages. Though still experimental, it is expected to perform well in the defined DREX use cases [Document Retrieval and Extraction eXpert - experimental OCR].

- Model : prithivMLmods/DREX-062225-exp
- Demo : prithivMLmods/Doc-VLMs-OCR

- Collection : prithivMLmods/doc-vl-685839064a863e1cd23be3f1
- Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0
- Git : https://github.com/PRITHIVSAKTHIUR/DREX.git
To learn more, visit the model card of the respective model.
seawolf2357
posted an update 19 days ago
VEO3 Real-Time: Real-Time AI Video Generation with Self-Forcing

Core Innovation: Self-Forcing Technology
VEO3 Real-Time, an open-source project challenging Google's VEO3, achieves real-time video generation through Self-Forcing technology.

Heartsync/VEO3-RealTime

What is Self-Forcing?
While traditional methods require 50-100 denoising steps, Self-Forcing achieves the same quality in just 1-2 steps. Through self-correction and rapid convergence, this Distribution Matching Distillation (DMD) technique maintains quality while delivering a 50x speed improvement.

Technical Advantages of Self-Forcing
1. Extreme Speed
Generates 4-second videos in under 30 seconds, with the first frame streaming in just 3 seconds. This represents 50x faster performance than traditional diffusion methods.
2. Consistent Quality
Maintains cinematic quality despite fewer steps, ensures temporal consistency, and minimizes artifacts.
3. Efficient Resource Usage
Reduces GPU memory usage by 70% and heat generation by 30%, enabling smooth operation on mid-range GPUs like the RTX 3060.

Technology Stack Synergy
VEO3 Real-Time integrates several technologies around Self-Forcing DMD: Self-Forcing DMD handles ultra-fast video generation, Wan2.1-T2V-1.3B serves as the high-quality video backbone, PyAV streaming enables real-time transmission, and Qwen3 adds intelligent prompt enhancement for polished results.

Performance Comparison
Traditional methods require 50-100 steps, taking 2-5 minutes for the first frame and 5-10 minutes in total. In contrast, Self-Forcing needs only 1-2 steps, delivering the first frame in 3 seconds and complete videos in 30 seconds while maintaining equal quality.

Future of Self-Forcing
Our next goal is real-time 1080p generation, with ongoing research to achieve
prithivMLmods
posted an update 22 days ago
The demo for smoldocling / nanonets ocr / typhoon ocr / monkey ocr explores the document OCR capabilities of various newly released multimodal VLMs in a single Space. If you're experimenting with long-document image OCR, kindly use the SmolDocling 256M preview [SmolDocling is back in this demo].

Try the demo here : prithivMLmods/Multimodal-OCR2

- MonkeyOCR Recognition : echo840/MonkeyOCR
- Nanonets-OCR-s : nanonets/Nanonets-OCR-s
- SmolDocling-256M-preview : ds4sd/SmolDocling-256M-preview
- typhoon-ocr-7b : scb10x/typhoon-ocr-7b

- Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

- GitHub : https://github.com/PRITHIVSAKTHIUR/Multimodal-OCR2


The community GPU grant was given by Hugging Face; special thanks to them.



To learn more, visit the model card of the respective model.
openfree
posted an update 24 days ago
Open GAMMA - AI PPT Generator 'GamJa'

Project Introduction
An AI presentation generator presented by the OpenFree AI community! Create professional-level PPTs with just a few clicks.
Completely free! Create premium PPTs with Free GAMMA!

DEMO: openfree/Open-GAMMA

Key Features

Powered by an LLM ranked #2 on the FACTS Grounding Leaderboard
Base model: vidraft/gemma-3-R1984-27B
Support for English, Korean, and other languages
Automatic speaker-notes generation

Premium Visuals
3D-style AI image generation
5 design themes (Professional, Modern, Nature, Creative, Minimal)
FLUX-style diagram images
Automatic emoji bullet points

Smart Diagrams
Process flow, concept map, WBS, radial, and synoptic charts
Content-analysis-based automatic diagram generation
Full Korean font support

Main Features

Intelligent Content Generation
Auto-generate 3-20 slides just by entering a topic
Pulls in the latest information through web search
Can reference PDF, CSV, and TXT files

Visual Automation
3D images for the cover and conclusion slides
Auto-generates 2 content-based diagrams
Adds 2 FLUX-style images

Customizable Design
5 professional themes
3 layout styles
Automatic emoji mapping system
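The Space's generation pipeline isn't published in this post, but the kind of output it describes (slides plus speaker notes) maps naturally onto python-pptx. Here is a minimal sketch with invented placeholder content standing in for the LLM's output.

```python
from pptx import Presentation

# Placeholder slide content; in the real app this would come from the LLM.
slides = [
    ("Open-Source LLMs in 2025", "Market overview", "Greet the audience and set the agenda."),
    ("Key Trends", "Smaller models, better reasoning, cheaper inference", "Spend about 2 minutes here."),
]

prs = Presentation()
layout = prs.slide_layouts[1]  # built-in "Title and Content" layout

for title, body, notes in slides:
    slide = prs.slides.add_slide(layout)
    slide.shapes.title.text = title
    slide.placeholders[1].text = body
    # Speaker notes, as mentioned in the feature list above.
    slide.notes_slide.notes_text_frame.text = notes

prs.save("deck.pptx")
```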

Premium Features for Free!
Create professional-grade presentations with Free GAMMA (Open GAMMA), which rivals paid PPT-generation services!
prithivMLmods
posted an update 25 days ago
The demos for the MonkeyOCR Recognition model (which adopts a Structure-Recognition-Relation (SRR) triplet paradigm), Nanonets-OCR-s (a powerful, state-of-the-art image-to-markdown OCR model that goes far beyond traditional text extraction), and other experimental document OCR models are combined into a single Space.

Try the demo here : prithivMLmods/core-OCR
Try the Nanonets-OCR-s demo here : prithivMLmods/Multimodal-OCR

- MonkeyOCR Recognition : echo840/MonkeyOCR
- docscopeOCR-7B-050425-exp : prithivMLmods/docscopeOCR-7B-050425-exp
- coreOCR-7B-050325-preview : prithivMLmods/coreOCR-7B-050325-preview
- Nanonets-OCR-s : nanonets/Nanonets-OCR-s

- Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

Also included is a sample OCR test comparing the VisionOCR-3B-061125 and Qwen2-VL-OCR-2B-Instruct models.
- Blog : https://huggingface.co/blog/prithivMLmods/visionocr-3b-061125-vs-qwen2-vl-ocr-2b-instruct

To learn more, visit the model card of the respective model.
ginipick
posted an update 27 days ago
VEO3 Directors - All-in-One AI Video Creation Suite

What is VEO3 Directors?
VEO3 Directors is an end-to-end AI video creation platform that turns your ideas into cinematic reality: from story conception to a final video with synchronized audio, all in one seamless workflow.

Try It Now
ginigen/VEO3-Directors
ginigen/VEO3-Free
ginigen/VEO3-Free-mirror

Key Features
Story Seed Generator

Instantly generate creative story ideas across multiple genres
Bilingual support (English/Korean)
Rich categories: genre, setting, characters, and more

AI Script & Prompt Crafting

Powered by the Friendli API for Hollywood-quality prompts
An AI director writes detailed cinematography instructions
Professional elements: camera movements, lighting, VFX

Video + Audio Generation

Wan2.1-T2V-14B for stunning visual quality
NAG 4-step inference - 10x faster generation
MMAudio auto-generates matching soundscapes
Full control over resolution, duration, and style
LLM (API): VIDraft/Gemma-3-R1984-27B

How It Works

1. Generate Story → "The Time Traveler's Final Choice"
2. Create Script → the AI writes cinematic scene descriptions
3. Produce Video → a 4-8 second clip with synchronized audio

What Makes It Special

Unified workflow: from idea to video in one interface
Director-level prompts: professional cinematography language
Lightning fast: minutes, not hours
Smart audio: context-aware sound generation

๐Ÿ† Use Cases

๐Ÿ“ฑ Social Media Content
๐ŸŽ“ Educational Videos
๐Ÿ“บ Marketing & Ads
๐ŸŽฎ Game Cutscene Prototyping
๐ŸŽจ Digital Art Creation
seawolf2357
posted an update 28 days ago
FusionX Enhanced Wan 2.1 I2V (14B)

Image-to-Video Generation Model
Generate cinematic-quality videos in just 8 steps!

Heartsync/WAN2-1-fast-T2V-FusioniX

Key Features
Ultra-Fast Generation: premium quality in just 8-10 steps
Cinematic Quality: smooth motion with detailed textures
FusionX Technology: enhanced with CausVid + MPS Rewards LoRAs
Optimized Resolution: 576×1024 default settings
50% Speed Boost: faster rendering compared to base models
๐Ÿ› ๏ธ Technical Stack

Base Model: Wan2.1 I2V 14B
Enhancement Technologies:

๐Ÿ”— CausVid LoRA (1.0 strength) - Motion modeling
๐Ÿ”— MPS Rewards LoRA (0.7 strength) - Detail optimization

Scheduler: UniPC Multistep (flow_shift=8.0)
Auto Prompt Enhancement: Automatic cinematic keyword injection
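The Space's exact setup isn't published here, but the listed pieces map onto the diffusers Wan image-to-video pipeline roughly as in the sketch below. The LoRA repo ids and file names are placeholders, and class/argument names assume a recent diffusers release.

```python
import torch
from diffusers import UniPCMultistepScheduler, WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = WanImageToVideoPipeline.from_pretrained(
    "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers", torch_dtype=torch.bfloat16
)
# UniPC multistep with flow_shift=8.0, as listed in the technical stack above.
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=8.0)
pipe.to("cuda")

# Placeholder LoRA repos standing in for the CausVid / MPS Rewards adapters.
pipe.load_lora_weights("your-username/causvid-lora", adapter_name="causvid")
pipe.load_lora_weights("your-username/mps-rewards-lora", adapter_name="mps")
pipe.set_adapters(["causvid", "mps"], adapter_weights=[1.0, 0.7])  # strengths from the post

frames = pipe(
    image=load_image("start_frame.jpg"),
    prompt="cinematic motion, smooth animation, gentle camera push-in",
    height=1024, width=576,        # the 576x1024 portrait default
    num_inference_steps=8,         # the 8-step fast setting
    num_frames=49,
).frames[0]
export_to_video(frames, "fusionx_clip.mp4", fps=16)
```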

How to Use

1. Upload Image - select your starting image
2. Enter Prompt - describe the desired motion and style
3. Adjust Settings - 8 steps, 2-5 seconds recommended
4. Generate - complete in just minutes!

Optimization Tips
Recommended Settings: 8-10 steps, 576×1024 resolution
Prompting: use "cinematic motion, smooth animation" keywords
Duration: 2-5 seconds for optimal quality
Motion: emphasize natural movement and camera work
๐Ÿ† FusionX Enhanced vs Standard Models
Performance Comparison: While standard models typically require 15-20 inference steps to achieve decent quality, our FusionX Enhanced version delivers premium results in just 8-10 steps - that's more than 50% faster! The rendering speed has been dramatically improved through optimized LoRA fusion, allowing creators to iterate quickly without sacrificing quality. Motion quality has been significantly enhanced with advanced causal modeling, producing smoother, more realistic animations compared to base implementations. Detail preservation is substantially better thanks to MPS Rewards training, maintaining crisp textures and consistent temporal coherence throughout the generated sequences.
openfree
posted an update 28 days ago
๐ŸŒ Whisper-OCR Multilingual Translation Space ๐Ÿš€

Welcome! This Space takes English audio, video, images, and PDFs and instantly converts them into Chinese (ZH), Thai (TH), and Russian (RU)โ€”no other source language required.

VIDraft/voice-trans

Key Features
Microphone - record English speech → transcript + 3-language translation

Audio File - upload English audio → transcript + translation

Video File - auto-extract audio with FFmpeg → transcript + translation

Image - Nanonets-OCR pulls the text → translation

PDF - up to 50 pages of text and tables → translation

Realtime Mode - flushes every 10-15 s; the newest lines appear at the top

๐Ÿ› ๏ธ Quick Start
Click โ€œDuplicateโ€ to fork, or launch directly.

Pick a tab (๐ŸŽค/๐Ÿ”Š/๐ŸŽฌ/๐Ÿ–ผ๏ธ/๐Ÿ“„/๐Ÿ”„) and feed it English input.

After a few seconds, see the ๐Ÿ“œ original and ๐ŸŒ 3-language translation side by side.

Tech Stack
openai/whisper-large-v3-turbo - fast, high-accuracy ASR

Nanonets-OCR-s (+ Flash Attention 2) - document/image OCR

Gradio Blocks - clean tabbed UI

PyTorch + CUDA - automatic GPU allocation and ThreadPool load balancing
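As a rough sketch of the ASR half of this stack (not the Space's actual code), the video → transcript step can be reproduced with ffmpeg plus the transformers ASR pipeline; the file names are placeholders.

```python
import subprocess
from transformers import pipeline

# Extract a 16 kHz mono audio track from the uploaded video (placeholder file names).
subprocess.run(
    ["ffmpeg", "-y", "-i", "input_video.mp4", "-vn", "-ac", "1", "-ar", "16000", "audio.wav"],
    check=True,
)

# Whisper large-v3-turbo via the transformers ASR pipeline, as listed in the tech stack.
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3-turbo",
    chunk_length_s=30,   # long-form audio is processed in 30 s chunks
)
transcript = asr("audio.wav")["text"]
print(transcript)
# The Space then feeds this transcript to a translation model for the ZH / TH / RU output.
```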

Notes
Translation quality depends on audio quality, lighting, and resolution.

Huge videos hit the HF Space upload cap (~2 GB).

Realtime tab requires browser microphone permission.
openfree
posted an update about 1 month ago
I lead 'Openfree AI', Korea's most prominent AI open-source community. First and foremost, I'd like to express my deepest gratitude for Hugging Face's continuous support and efforts.
Openfree AI collaborates with various AI communities across Korea, contributing to knowledge sharing and ecosystem development. I've been actively promoting the critical importance of Hugging Face as the backbone of Korea's AI infrastructure, engaging with senior government officials, National Assembly members, university leaders, and media executives to emphasize, at a national policy level, how Hugging Face represents Korea's AI future. I consider myself a 'voluntary Korean ambassador for Hugging Face'.
Let me share our community's achievements on the Hugging Face platform over the past year:

- Published hundreds of models and Spaces
- Surpassed 10 million cumulative visitors
- Reached 1.7 million monthly active users (MAU)
- Generated over 1 million images/videos per month

These achievements were possible thanks to Hugging Face's generous support, including H200 resources. Thank you sincerely.
I'm thrilled to share exciting news! This July, we'll host the "Hugging Face Forever" seminar at the Korean National Assembly, sponsored by AI-policy lawmakers. Our community will organize this event, focusing on 'Hugging Face and community contributions and roles', a truly meaningful milestone for Korea's AI ecosystem.
We'll continue working hard for the development of Korea's AI ecosystem and... oh, if you ever need a Korean branch manager for Hugging Face, please let me know! (Just kidding... or am I?)
Thank you.
Openfree AI Representative