An interactive demo explores DeepSeek's mHC paper, which addresses an instability in Hyper-Connections: the learned matrices are multiplied together across layers, so the gain of the composite mapping can grow exponentially with depth, reaching values as high as 10^16. The fix is to project each matrix onto the manifold of doubly stochastic matrices (non-negative entries, with every row and every column summing to 1) using the Sinkhorn-Knopp algorithm. Because the product of doubly stochastic matrices is itself doubly stochastic, the composite mapping stays bounded regardless of depth. Surprisingly, a single Sinkhorn iteration is enough to bring the gain down from 10^16 to approximately 1. This matters because it offers a practical way to improve the stability and performance of deep learning models that use Hyper-Connections.
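The effect can be reproduced in a few lines. The sketch below is only an illustration under stated assumptions, not the demo's or the paper's code: it uses random positive 4x4 matrices in place of learned ones, a depth of 64, and the largest row sum of the product as a stand-in for "gain". All function names are hypothetical. One Sinkhorn-Knopp iteration here means one row normalization followed by one column normalization.

```python
import math
import random


def sinkhorn_knopp(matrix, iterations=1):
    """Alternately normalize the rows and columns of a positive square matrix."""
    n = len(matrix)
    m = [row[:] for row in matrix]
    for _ in range(iterations):
        # Row step: scale each row so it sums to 1.
        m = [[v / sum(row) for v in row] for row in m]
        # Column step: scale each column so it sums to 1.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


def matmul(a, b):
    """Plain square matrix product a @ b."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def composite_gain(matrices):
    """Largest absolute row sum of the product of all layer matrices."""
    n = len(matrices[0])
    product = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for m in matrices:
        product = matmul(m, product)
    return max(sum(abs(v) for v in row) for row in product)


def random_positive_matrix(n, rng):
    """Assumed stand-in for a learned matrix: exp of Gaussian entries."""
    return [[math.exp(rng.gauss(0.0, 1.0)) for _ in range(n)] for _ in range(n)]


rng = random.Random(0)
size, depth = 4, 64  # illustrative values, not taken from the paper
layers = [random_positive_matrix(size, rng) for _ in range(depth)]

raw_gain = composite_gain(layers)
projected_gain = composite_gain([sinkhorn_knopp(m, iterations=1) for m in layers])

print(f"unconstrained gain:           {raw_gain:.3e}")
print(f"after one Sinkhorn iteration: {projected_gain:.3e}")
```

After a single iteration the columns sum to exactly 1 and the rows only approximately, so each projected matrix is column-stochastic rather than fully doubly stochastic. That is already enough to bound the product: it stays column-stochastic, so its largest row sum lies between 1 and the matrix size, whatever the depth. The unconstrained product, by contrast, grows by many orders of magnitude.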
Read Full Article: Interactive Visualization of DeepSeek’s mHC Stability