Papers
arxiv:2606.22476

CVSBench: A Comprehensive Benchmark for Cross-view Spatial Reasoning and Dreaming

Published on Jun 21
Authors:
,
,
,
,
,

Abstract

CVSBench evaluates cross-view spatial reasoning in vision-language models using satellite-street pairs, revealing limitations in maintaining object-level consistency under viewpoint changes and demonstrating improved performance through 3D scene imagination pipelines.

Humans can effortlessly reason about scenes across different viewpoints, yet it remains unclear whether Vision-Language Models (VLMs) possess similar cross-view spatial abilities. Satellite-street scene pairs, with their complex contexts and extreme viewpoint variations, provide an ideal testbed. Motivated by this, we introduce CVSBench, a large-scale benchmark for evaluating cross-view spatial reasoning through satellite-street pairs. This benchmark supports multiple tasks, including cross-view VQA, cross-view grounding, and viewpoint identification. CVSBench comprises 3,297 cross-view image groups with 9,468 object-level annotations and 40,679 question-answer (QA) pairs, enabling systematic and controlled evaluation of cross-view spatial reasoning. Extensive evaluations reveal that advanced VLMs struggle to maintain object-level and layout consistency under drastic viewpoint changes. To bridge this gap towards human-like spatial cognition, we investigate two categories of approaches: spatially grounded reasoning and the incorporation of cognitive map inputs. Our findings demonstrate that language-only reasoning yields marginal improvements, while incorporating visual spatial imagination via a 3D scene imagination pipeline substantially improves cross-view reasoning. These results highlight the necessity of explicit visual-spatial representations for robust spatial cognition in VLMs. Our data and code are released at https://huggingface.co/datasets/zlyzlyzly/CVSBench.

Community

Sign up or log in to comment

Get this paper in your agent:

hf papers read 2606.22476
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2606.22476 in a model README.md to link it from this page.

Datasets citing this paper 1

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2606.22476 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.