vision-r1
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
Paper • arXiv:2503.06749
Executable Code Actions Elicit Better LLM Agents
Paper • arXiv:2402.01030
VGR: Visual Grounded Reasoning
Paper • arXiv:2506.11991
Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images
Paper • arXiv:2509.07966
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models
Paper • arXiv:2504.15279
Visual Abstract Thinking Empowers Multimodal Reasoning
Paper • arXiv:2505.20164
PixelCraft: A Multi-Agent System for High-Fidelity Visual Reasoning on Structured Images
Paper • arXiv:2509.25185
Seeing Culture: A Benchmark for Visual Reasoning and Grounding
Paper • arXiv:2509.16517
VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception
Paper • arXiv:2509.21100
Jigsaw-R1: A Study of Rule-based Visual Reinforcement Learning with Jigsaw Puzzles
Paper • arXiv:2505.23590