Joy Caption Pre Alpha
Generate captions for images
Segment and caption objects in images and videos
Generate descriptions by uploading images or videos
Generate insights from charts using text prompts
Chat about images and get instant answers
Detect objects in your images
Extract text and metadata from PDF files
Try PaliGemma on document understanding tasks
Chat with an image using Phi-3 Vision model
Interact with a chatbot that understands text and images
Chat with Llama about images and text
GPT-4o-like bot.
Extract text from documents using images or PDFs
Generate detailed descriptions from images and videos
Generate document retrieval queries from a page image
Microsoft Phi-3 Vision 128k with Multimodal capabilities
A Fully Open Multilingual Multimodal LLM for 39 Languages
Demo for DocLayout-YOLO
A data extraction tool to convert PDF to Markdown and JSON
Extract text from images
Hugging Face Space for JanusFlow-1.3B
Generate clickable coordinates on a screenshot
PaliGemma 2 LoRA fine-tuned on VQAv2
Gaze detection using Moondream
Detect and visualize human poses in images and videos
Demo of a collection of impressive OCR VL models on HF
Extract and recognize text from documents and images
OmniParser: turn your LLM into a GUI agent
See, read, and reason—better together.
Generate text and segment images using PaliGemma 2
Interact with the Aya family of models.
Interact with videos!
Classify images in real-time using your webcam
OCR for PDFs and Images using Mistral OCR
Upload an image to detect objects
Object Detection & Scene Understanding for Images and Video
Describe any selected part of an image
Object Detection on Images and Video
Generate text answers from live camera images
Seed1.5-VL API Demo
Demo for Nanonets-OCR
Chat with Kimi-VL: respond to text, images, video, PDFs
THUDM/GLM-4.1V-9B-Thinking Demo
Generate text responses from images and text input
Extract structured layout and text from PDFs or images