Abstract
Manifold-Constrained Hyper-Connections (mHC) stabilize and scale residual connection architectures by restoring identity mapping properties through manifold projection and infrastructure optimization.
Recently, studies exemplified by Hyper-Connections (HC) have extended the ubiquitous residual connection paradigm established over the past decade by expanding the residual stream width and diversifying connectivity patterns. While yielding substantial performance gains, this diversification fundamentally compromises the identity mapping property intrinsic to the residual connection, which causes severe training instability and restricted scalability, and additionally incurs notable memory access overhead. To address these challenges, we propose Manifold-Constrained Hyper-Connections (mHC), a general framework that projects the residual connection space of HC onto a specific manifold to restore the identity mapping property, while incorporating rigorous infrastructure optimization to ensure efficiency. Empirical experiments demonstrate that mHC is effective for training at scale, offering tangible performance improvements and superior scalability. We anticipate that mHC, as a flexible and practical extension of HC, will contribute to a deeper understanding of topological architecture design and suggest promising directions for the evolution of foundational models.
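The abstract does not say which manifold the residual connections are projected onto. As an illustration only, the sketch below assumes one plausible choice: the set of doubly stochastic matrices (non-negative entries, every row and column summing to 1), reached by alternating row and column normalization (Sinkhorn-Knopp iteration). The function name `sinkhorn_project`, the stream count, and the depth are hypothetical and not taken from the paper. Under that assumption, each constrained matrix mixes the residual streams as a weighted average, and because products of such matrices stay doubly stochastic, the composed mapping across many layers remains bounded in the way an identity mapping would.

```python
import numpy as np

def sinkhorn_project(logits, n_iters=100):
    """Map an unconstrained n x n matrix to an approximately doubly
    stochastic one (assumed manifold; illustrative only)."""
    m = np.exp(logits - logits.max())  # strictly positive entries
    for _ in range(n_iters):
        m = m / m.sum(axis=1, keepdims=True)  # normalize rows
        m = m / m.sum(axis=0, keepdims=True)  # normalize columns
    return m

rng = np.random.default_rng(0)
n_streams, depth = 4, 64  # hypothetical sizes

# Compose one stream-mixing matrix per layer, with and without the constraint.
free = np.eye(n_streams)
constrained = np.eye(n_streams)
for _ in range(depth):
    w = rng.normal(size=(n_streams, n_streams))
    free = w @ free
    constrained = sinkhorn_project(w) @ constrained

# Row and column sums of the constrained product should stay close to 1.
row_err = np.abs(constrained.sum(axis=1) - 1).max()
col_err = np.abs(constrained.sum(axis=0) - 1).max()
print("largest entry, unconstrained product:", np.abs(free).max())
print("largest entry, constrained product:  ", np.abs(constrained).max())
print("row/column sum error:", row_err, col_err)
```

The unconstrained product is included only for contrast: nothing keeps its entries bounded as depth grows, whereas the constrained product keeps every entry in [0, 1].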
Community
DeepSeek released a new paper proposing a novel architecture called mHC (Manifold-Constrained Hyper-Connections).
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Virtual Width Networks (2025)
- Dynamic Subspace Composition: Efficient Adaptation via Contractive Basis Expansion (2025)
- GMoPE: A Prompt-Expert Mixture Framework for Graph Foundation Models (2025)
- Scaling Bidirectional Spans and Span Violations in Attention Mechanism (2025)
- ROOT: Robust Orthogonalized Optimizer for Neural Network Training (2025)
- Efficient Training of Diffusion Mixture-of-Experts Models: A Practical Recipe (2025)
- Mixture-of-Experts with Gradient Conflict-Driven Subspace Topology Pruning for Emergent Modularity (2025)
arXiv lens breakdown of this paper: https://arxivlens.com/PaperView/Details/mhc-manifold-constrained-hyper-connections-5913-27498555
- Executive Summary
- Detailed Breakdown
- Practical Applications