FoX

  • Genesis-152M-Instruct: Exploring Hybrid Architectures


    Genesis-152M-Instruct: Hybrid GLA + FoX + Test-Time Training at small scale

    Genesis-152M-Instruct is an experimental small-scale language model designed to explore how recent architectural innovations interact under tight data constraints. It has 152 million parameters and was trained on approximately 2 billion tokens. It combines hybrid GLA and FoX attention mechanisms, test-time training (TTT) during inference, selective activation via sparse feedforward networks, and µP-scaled training. Despite its small scale, Genesis achieves notable performance on benchmarks such as ARC-Easy, BoolQ, and SciQ, suggesting that architectural strategies can help compensate for limited data. The model is fully open-source, and feedback is invited, particularly from those interested in linear attention, hybrid architectures, or test-time adaptation. The work matters because it offers insight into how architectural advances can improve model performance when training data is constrained.
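To make the FoX component more concrete, here is a minimal sketch of forget-gated causal attention. It assumes that "FoX" refers to Forgetting Transformer-style attention, where each softmax logit is reduced by the accumulated log of per-position forget gates between the key and the query. The summary does not describe how Genesis implements this, so the function name `forgetting_attention`, its arguments, and the use of precomputed scalar gates are all hypothetical choices for illustration, not the model's code.

```python
import math


def forgetting_attention(q, k, v, forget_gates):
    """Causal softmax attention with a forget-gate decay bias (illustrative only).

    q, k, v: lists of T vectors (lists of floats); q and k share dimension d.
    forget_gates: T scalars in (0, 1], one per position. In a real model these
    would be computed from the input; here they are passed in directly.
    """
    T, d = len(q), len(q[0])

    # c[i] is the running sum of log f_l for l <= i, so that
    # c[i] - c[j] equals the sum of log f_l for l in j+1..i.
    c, running = [], 0.0
    for f in forget_gates:
        running += math.log(f)
        c.append(running)

    out = []
    for i in range(T):
        # Scaled dot-product logit plus the (non-positive) decay bias.
        logits = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) + (c[i] - c[j])
            for j in range(i + 1)  # causal: only positions j <= i
        ]
        m = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(x - m) for x in logits]
        z = sum(weights)
        out.append([
            sum(w * v[j][n] for j, w in enumerate(weights)) / z
            for n in range(len(v[0]))
        ])
    return out
```

With every gate set to 1.0 the bias vanishes and the function reduces to ordinary causal softmax attention. Gates below 1.0 shift attention weight toward more recent positions.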
