C-GenReg: Training-Free 3D Point Cloud Registration by Multi-View-Consistent Geometry-to-Image Generation with Probabilistic Modalities Fusion
Abstract
C-GenReg is a training-free 3D point cloud registration framework that uses generative priors and Vision Foundation Models to transfer matching problems to an image domain for improved cross-domain generalization.
We introduce C-GenReg, a training-free framework for 3D point cloud registration that leverages the complementary strengths of world-scale generative priors and registration-oriented Vision Foundation Models (VFMs). Current learning-based 3D point cloud registration methods struggle to generalize across sensing modalities, sampling differences, and environments. Hence, C-GenReg augments the geometric point cloud registration branch by transferring the matching problem into an auxiliary image domain, where VFMs excel, using a World Foundation Model to synthesize multi-view-consistent RGB representations from the input geometry. This generative transfer preserves spatial coherence across source and target views without any fine-tuning. From these generated views, a VFM pretrained for dense correspondence estimation extracts matches. The resulting pixel correspondences are lifted back to 3D via the original depth maps. To further enhance robustness, we introduce a "Match-then-Fuse" probabilistic cold-fusion scheme that combines two independent correspondence posteriors: that of the generated-RGB branch and that of the raw geometric branch. This principled fusion preserves each modality's inductive bias and provides calibrated confidence without any additional learning. C-GenReg is zero-shot and plug-and-play: all modules are pretrained and operate without fine-tuning. Extensive experiments on indoor (3DMatch, ScanNet) and outdoor (Waymo) benchmarks demonstrate strong zero-shot performance and superior cross-domain generalization. For the first time, we demonstrate a generative registration framework that operates successfully on real outdoor LiDAR data, where no imagery is available.
Community
Most 3D point cloud registration methods struggle to generalize across sensors, sampling patterns, and environments—especially when relying on geometry alone.
We introduce C-GenReg, a training-free (zero-shot) framework that tackles this by transferring the problem into the image domain.
The key idea is simple but powerful:
we use a World Foundation Model to generate multi-view consistent RGB images from geometry, apply a strong Vision Foundation Model (VFM) to extract dense correspondences, and then lift these matches back to 3D.
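The "lift back to 3D" step amounts to back-projecting each matched pixel through its depth map. As a minimal sketch (the function and the assumption of a single shared pinhole intrinsic matrix `K` are illustrative, not the paper's exact implementation):

```python
import numpy as np

def lift_matches_to_3d(uv_src, uv_tgt, depth_src, depth_tgt, K):
    """Back-project matched pixel coordinates (N, 2) into 3D points (N, 3)
    using each view's depth map and a pinhole intrinsic matrix K."""
    def backproject(uv, depth):
        u, v = uv[:, 0], uv[:, 1]
        z = depth[v, u]                      # depth at each matched pixel
        x = (u - K[0, 2]) * z / K[0, 0]      # (u - cx) * z / fx
        y = (v - K[1, 2]) * z / K[1, 1]      # (v - cy) * z / fy
        return np.stack([x, y, z], axis=1)
    return backproject(uv_src, depth_src), backproject(uv_tgt, depth_tgt)
```

Once both sides of every 2D match are back-projected, the problem reduces to standard 3D correspondence-based registration.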
To improve robustness, we introduce a Match-then-Fuse probabilistic scheme that combines image-based and geometric correspondences while preserving each modality’s inductive bias.
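One simple way to combine two independent soft-correspondence posteriors is a product-of-experts rule with row renormalization. The sketch below is an illustrative stand-in for the Match-then-Fuse scheme, not its exact formulation:

```python
import numpy as np

def fuse_match_posteriors(p_img, p_geo, eps=1e-8):
    """Fuse two soft-correspondence matrices (rows: source points,
    cols: target candidates) by elementwise product and renormalize
    each row so it remains a valid posterior. An illustrative
    product-of-experts fusion, not the paper's exact scheme."""
    fused = (p_img + eps) * (p_geo + eps)
    return fused / fused.sum(axis=1, keepdims=True)
```

A multiplicative combination sharpens matches that both branches agree on while down-weighting candidates supported by only one modality.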
The result is a plug-and-play pipeline with strong zero-shot performance, improved cross-domain generalization, and the ability to operate even in challenging real-world LiDAR scenarios where RGB is unavailable.
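Given fused 3D correspondences, the final alignment can be recovered with the standard closed-form Kabsch/Umeyama solver (shown here as the generic last step of any correspondence-based pipeline; the paper's exact estimator is not specified in this post):

```python
import numpy as np

def estimate_rigid_transform(P, Q):
    """Least-squares rigid transform (R, t) mapping source points P (N, 3)
    onto target points Q (N, 3), via the Kabsch/Umeyama SVD solution."""
    mu_p, mu_q = P.mean(axis=0), Q.mean(axis=0)
    H = (P - mu_p).T @ (Q - mu_q)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_q - R @ mu_p
    return R, t
```

In practice this solver is usually wrapped in a robust estimator such as RANSAC to tolerate outlier matches.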
We evaluate on 3DMatch, ScanNet, and Waymo, demonstrating consistent gains over strong baselines without any additional training.
Curious to hear thoughts, especially on using generative models as a bridge for geometry-only problems.
📖 Citation:
@article{haitman2026cgenreg,
  title   = {C-GenReg: Training-Free 3D Point Cloud Registration by Multi-View-Consistent Geometry-to-Image Generation with Probabilistic Modalities Fusion},
  author  = {Haitman, Yuval and Efraim, Amit and Francos, Joseph M.},
  journal = {arXiv preprint arXiv:2604.16680},
  year    = {2026},
  doi     = {10.48550/arXiv.2604.16680},
  url     = {https://arxiv.org/abs/2604.16680}
}
Similar papers recommended by the Semantic Scholar API:
- CMHANet: A cross-modal hybrid attention network for point cloud registration (2026)
- GLASS: Geometry-aware Local Alignment and Structure Synchronization Network for 2D-3D Registration (2026)
- GGPT: Geometry Grounded Point Transformer (2026)
- Contrastive Language-Colored Pointmap Pretraining for Unified 3D Scene Understanding (2026)
- PlanaReLoc: Camera Relocalization in 3D Planar Primitives via Region-Based Structure Matching (2026)
- SegVGGT: Joint 3D Reconstruction and Instance Segmentation from Multi-View Images (2026)
- GeoGuide: Hierarchical Geometric Guidance for Open-Vocabulary 3D Semantic Segmentation (2026)
