CCTV Visual RAG
Search surveillance video frames with natural language
Answer questions about uploaded images
Answer questions about uploaded images using natural language
Ask questions about any image
Industrial video intelligence on AMD MI300X
Answer questions about uploaded images or videos
Ask questions about images and get answers
Predict UI click coordinates from a screenshot and instruction
Visual Retrieval with ColPali and Vespa
Dense Grounded Understanding of Images and Videos
Try PaliGemma on document understanding tasks
Let's talk about the meaning of life
Magma-8B model for UI Agents
Answer questions using images and Chinese text
Answer questions about images
MoonDream 2 Vision Model in the Browser: Candle/Rust/WASM
Answer questions about images with text prompts
Predict click location on a UI screenshot
Celebrate the launch of Dicta-LM 3.0!
Molmo2 - Image, Video (QA, Pointing & Tracking)
Answer questions about images using OCR
Visualize web interaction recordings
Interactive analyzer for modular models in the Transformers library
Transcribe manga chapters with character names