Joy Caption Pre Alpha
Generate captions for images
Segment objects in images and videos using text prompts
Generate descriptions by uploading images or videos
Generate insights from charts using text prompts
Generate descriptions for images using text prompts
Upload an image to detect objects
Extract text and metadata from PDF files
Try PaliGemma on document understanding tasks
Generate image descriptions
Chat with an AI that understands images and text
Ask questions about images and get detailed answers
GPT-4o-like bot
Extract and recognize text from documents
Generate detailed descriptions from images and videos
Generate retrieval queries from document images
Microsoft Phi-3 Vision 128k with Multimodal capabilities
A Fully Open Multilingual Multimodal LLM for 39 Languages
Demo for DocLayout-YOLO
Convert PDF or image to Markdown
Extract text from images
Hugging Face Space for JanusFlow-1.3B
Upload documents for Q&A
Generate clickable coordinates on a screenshot
PaliGemma2 LoRA finetuned on VQAv2
Gaze detection using Moondream
Detect and annotate poses in images and videos
Image and video understanding
Extract text from documents
OmniParser: turn your LLM into a GUI agent
See, read, and reason: better together.
Generate text or segment objects from an image
Interact with the Aya family of models.
Interact with videos!
Classify images in real-time using your webcam
OCR for PDFs and Images using Mistral OCR
Upload an image to detect objects
Object Detection & Scene Understanding for Images and Video
Generate descriptions from images using masks
Object Detection on Images and Video
Describe objects in a webcam feed
Seed1.5-VL API Demo