Relational Visual Similarity
Paper: [arXiv:2512.07833](https://arxiv.org/abs/2512.07833)
TL;DR: We introduce a new notion of visual similarity: relational visual similarity, which complements traditional attribute-based perceptual similarity metrics (e.g., LPIPS, CLIP, DINO).
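For contrast, attribute-based metrics such as LPIPS measure low-level perceptual *distance* between two images rather than relational similarity. A minimal sketch, assuming the third-party `lpips` package (`pip install lpips`), which is not part of this repo:

```python
import lpips
import torch

# Attribute-based perceptual metric (lower distance = more similar)
loss_fn = lpips.LPIPS(net="alex")

# LPIPS expects (N, 3, H, W) tensors scaled to [-1, 1]; random tensors here stand in for real images
x = torch.rand(1, 3, 224, 224) * 2 - 1
y = torch.rand(1, 3, 224, 224) * 2 - 1

distance = loss_fn(x, y)
print(f"LPIPS distance: {distance.item():.3f}")
```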
This code has been tested with Python 3.10 on: (i) NVIDIA A100 80GB (torch 2.5.1+cu124) and (ii) NVIDIA RTX A6000 48GB (torch 2.9.1+cu128).
Other hardware setups haven't been tested, but the code should still work. Please install PyTorch and torchvision according to your machine configuration.
```bash
# Create and activate a conda environment
conda create -n relsim python=3.10
conda activate relsim

# Install from PyPI
pip install relsim

# ...or install from source
git clone https://github.com/thaoshibe/relsim.git
cd relsim
pip install -r requirements.txt
```
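If you need a specific PyTorch build, match it to your CUDA toolkit. As one possible example (not the only supported configuration), reproducing tested setup (i) above (torch 2.5.1 with CUDA 12.4) could look like:

```bash
# Example only: torch 2.5.1 + CUDA 12.4 wheels, matching tested setup (i)
pip install torch==2.5.1 torchvision --index-url https://download.pytorch.org/whl/cu124
```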
Given two images, you can compute their relational visual similarity (relsim) like this:
```python
from relsim.relsim_score import relsim
from PIL import Image

# Load the pretrained model and its preprocessing transform
model, preprocess = relsim(pretrained=True, checkpoint_dir="thaoshibe/relsim-qwenvl25-lora")

img1 = preprocess(Image.open("image_path_1"))
img2 = preprocess(Image.open("image_path_2"))

similarity = model(img1, img2)  # returns a similarity score (higher = more similar)
print(f"relational similarity score: {similarity:.3f}")
```
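Since the model returns a scalar score per image pair, ranking a set of candidates against a query follows directly. A minimal sketch reusing only the API shown above (the file paths are placeholders):

```python
from relsim.relsim_score import relsim
from PIL import Image

model, preprocess = relsim(pretrained=True, checkpoint_dir="thaoshibe/relsim-qwenvl25-lora")

query = preprocess(Image.open("query.jpg"))                    # placeholder path
candidate_paths = ["cand_1.jpg", "cand_2.jpg", "cand_3.jpg"]   # placeholder paths

# Score each candidate against the query, then sort (higher = more similar)
scores = [(path, model(query, preprocess(Image.open(path)))) for path in candidate_paths]
for path, score in sorted(scores, key=lambda s: s[1], reverse=True):
    print(f"{path}: {score:.3f}")
```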
For more details (training code, data, etc.), please visit [thaoshibe/relsim](https://github.com/thaoshibe/relsim).
```bibtex
@misc{nguyen2025relationalvisualsimilarity,
  title={Relational Visual Similarity},
  author={Thao Nguyen and Sicheng Mo and Krishna Kumar Singh and Yilin Wang and Jing Shi and Nicholas Kolkin and Eli Shechtman and Yong Jae Lee and Yuheng Li},
  year={2025},
  eprint={2512.07833},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2512.07833},
}
```
Base model: Qwen/Qwen2.5-VL-7B-Instruct