
  • Grafted Titans: Enhancing LLMs with Neural Memory


    Grafted Titans: a Plug-and-Play Neural Memory for Open-Weight LLMs

    An experiment with Test-Time Training (TTT) set out to replicate Google's "Titans" architecture by grafting a trainable memory module onto a frozen open-weight model, Qwen-2.5-0.5B, using consumer-grade hardware. The resulting architecture, called "Grafted Titans," appends memory embeddings to the input layer through a trainable cross-attention gating mechanism, so the memory can update while the base model remains static. On the BABILong benchmark, the Grafted Titans model reached 44.7% accuracy, outperforming the vanilla Qwen model's 34.0% by acting as a denoising filter. The approach still faces limitations such as signal dilution and susceptibility to input poisoning, which further work will need to address. This matters because it explores how to enhance a model with neural memory without extensive computational resources, potentially democratizing access to advanced AI capabilities.
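
    The gating idea can be illustrated with a small sketch. The PyTorch module below is one plausible reading of the described design, not the author's code: a bank of trainable memory embeddings is read via cross-attention by the frozen model's input embeddings, and a learned gate controls how much of the memory read-out is mixed back in. All class, parameter, and dimension names here are illustrative assumptions.

    # Minimal sketch (assumed names and sizes), not the article's implementation.
    import torch
    import torch.nn as nn

    class GraftedMemoryGate(nn.Module):
        """Cross-attention gate that mixes a trainable memory bank into frozen input embeddings."""

        def __init__(self, d_model: int, n_memory_slots: int = 64, n_heads: int = 8):
            super().__init__()
            # Trainable memory bank: (slots, d_model), shared across the batch.
            self.memory = nn.Parameter(torch.randn(n_memory_slots, d_model) * 0.02)
            # Token embeddings (queries) attend over the memory slots (keys/values).
            self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            # Learned gate decides, per token and channel, how much memory to admit.
            self.gate = nn.Linear(2 * d_model, d_model)

        def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
            # token_embeds: (batch, seq_len, d_model) from the frozen embedding layer.
            batch = token_embeds.size(0)
            mem = self.memory.unsqueeze(0).expand(batch, -1, -1)
            mem_read, _ = self.cross_attn(token_embeds, mem, mem)
            g = torch.sigmoid(self.gate(torch.cat([token_embeds, mem_read], dim=-1)))
            # Gated residual: base embeddings pass through unchanged when the gate is closed.
            return token_embeds + g * mem_read

    In this sketch, only the memory bank and the gate/attention parameters would receive gradients at test time, leaving the frozen Qwen-2.5-0.5B weights untouched; the actual update rule and training objective are described in the full article.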

    Read Full Article: Grafted Titans: Enhancing LLMs with Neural Memory