15M Param Model Achieves 24% on ARC-AGI-2

A 15M-parameter model solves 24% of ARC-AGI-2 (Hard Eval) and runs on consumer hardware.

Bitterbot AI has introduced TOPAS-DSPL, a compact recursive model with approximately 15 million parameters that achieves 24% accuracy on the ARC-AGI-2 evaluation set, a large improvement over the previous state-of-the-art (SOTA) of 8% for models of similar size. The model employs a “Bicameral” architecture that splits each task between a Logic Stream, which plans the algorithm, and a Canvas Stream, which executes it, addressing the compositional drift issues found in standard transformers. Test-Time Training (TTT) is also used to fine-tune the model on a task's own examples before a solution is generated. The entire pipeline, including data generation, training, and evaluation, has been open-sourced, allowing the community to verify the claims and reproduce the results on consumer hardware such as an RTX 4090 GPU. This matters because it demonstrates a substantial gain in model efficiency and accuracy, making sophisticated AI more accessible and easier to verify.

The development of a 15-million-parameter model that solves 24% of the ARC-AGI-2 evaluation set is a significant milestone in AI research, especially for a model that can run on consumer hardware. This result from Bitterbot AI demonstrates a leap in efficiency and capability, as previous state-of-the-art models in this parameter class reached only around 8% accuracy. The model, named TOPAS-DSPL, is designed to address a limitation of traditional transformers known as compositional drift, which occurs when a model forgets previously learned rules while generating new outputs. Avoiding this drift is particularly important for models that must stay coherent and accurate across complex, multi-step tasks.

The architecture of TOPAS-DSPL is innovative in its bicameral design, which separates the inference process into two distinct streams: the Logic Stream and the Canvas Stream. The Logic Stream is responsible for planning the algorithm and generating rules, while the Canvas Stream executes these rules on the grid. This separation allows the model to maintain a clear distinction between planning and execution, reducing the risk of compositional drift and improving overall performance. Such a design could pave the way for more efficient and accurate AI models, especially in areas requiring complex reasoning and execution.
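To make the split concrete, here is a minimal PyTorch-style sketch of a bicameral solver in which one module plans and another executes. The class names (LogicStream, CanvasStream, BicameralSolver), the layer choices, and the dimensions are illustrative assumptions and are not taken from the TOPAS-DSPL release.

```python
# Illustrative sketch of a bicameral split: planning (Logic Stream) vs.
# execution (Canvas Stream). Names and sizes are hypothetical and do not
# come from the TOPAS-DSPL codebase.
import torch
import torch.nn as nn


class LogicStream(nn.Module):
    """Encodes a task's demonstration pairs into a compact 'rule' embedding."""

    def __init__(self, d_model: int = 128):
        super().__init__()
        self.embed = nn.Embedding(10, d_model)  # 10 ARC colours
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, demo_grids: torch.Tensor) -> torch.Tensor:
        # demo_grids: (batch, cells) of colour indices, flattened demo pairs
        x = self.encoder(self.embed(demo_grids))
        return x.mean(dim=1)  # (batch, d_model) rule vector


class CanvasStream(nn.Module):
    """Executes the rule on the test grid, predicting a colour per cell."""

    def __init__(self, d_model: int = 128):
        super().__init__()
        self.embed = nn.Embedding(10, d_model)
        self.mix = nn.GRU(d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, 10)

    def forward(self, test_grid: torch.Tensor, rule: torch.Tensor) -> torch.Tensor:
        x = self.embed(test_grid) + rule.unsqueeze(1)  # condition every cell on the rule
        x, _ = self.mix(x)
        return self.head(x)  # (batch, cells, 10) logits


class BicameralSolver(nn.Module):
    def __init__(self, d_model: int = 128):
        super().__init__()
        self.logic = LogicStream(d_model)
        self.canvas = CanvasStream(d_model)

    def forward(self, demo_grids: torch.Tensor, test_grid: torch.Tensor) -> torch.Tensor:
        rule = self.logic(demo_grids)        # plan once per task
        return self.canvas(test_grid, rule)  # execute against that plan
```

In this arrangement the rule vector is computed once per task before any output cells are produced, so the execution pass cannot quietly rewrite the plan mid-generation, which is the intuition behind keeping the two streams separate.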

Another key feature of this model is the implementation of Test-Time Training (TTT), which allows the model to fine-tune itself on specific examples before generating solutions. This adaptability is crucial for solving complex puzzles and tasks that require a tailored approach, rather than a one-size-fits-all solution. By allowing the model to adjust to the nuances of each problem, TTT enhances its problem-solving capabilities and contributes to its higher accuracy rate. This approach underscores the importance of flexibility and adaptability in AI models, which are essential for tackling real-world problems.
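A rough sketch of how Test-Time Training could be wired around such a model follows, assuming a leave-one-out loop over a task's demonstration pairs; the function name test_time_adapt, the optimizer choice, and the step count are assumptions rather than details from the article.

```python
# Hedged sketch of Test-Time Training (TTT): before answering one task, copy
# the model and take a few gradient steps on that task's own demonstration
# pairs. Names and hyperparameters are illustrative only.
import copy
import torch
import torch.nn.functional as F


def test_time_adapt(solver, demos, steps: int = 20, lr: float = 1e-3):
    """demos: list of (input_grid, output_grid) flattened LongTensors of colour
    indices for a single task; assumes at least two demonstration pairs and,
    for simplicity, that input and output grids have the same number of cells."""
    adapted = copy.deepcopy(solver)  # never modify the base weights
    adapted.train()
    opt = torch.optim.AdamW(adapted.parameters(), lr=lr)
    for _ in range(steps):
        for i, (x, y) in enumerate(demos):
            # Leave one demonstration out and predict it from the others,
            # so the model adapts to this task without seeing the test grid.
            context = torch.cat(
                [torch.cat([a, b]) for j, (a, b) in enumerate(demos) if j != i]
            )
            logits = adapted(context.unsqueeze(0), x.unsqueeze(0))
            loss = F.cross_entropy(logits.reshape(-1, 10), y.reshape(-1))
            opt.zero_grad()
            loss.backward()
            opt.step()
    adapted.eval()
    return adapted
```

At evaluation time one would call something like adapted = test_time_adapt(base_solver, demos) and then read off adapted(all_demos, test_grid).argmax(-1) as the predicted grid; the adapted copy is discarded after the task, so each puzzle gets its own briefly specialized model while the base weights stay untouched.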

The open-sourcing of the entire pipeline, including data generation, training, and evaluation, is a commendable move that encourages transparency and collaboration within the AI community. By making the code and methodology accessible, Bitterbot AI invites others to verify their results and potentially build upon their work. This openness not only fosters trust but also accelerates the pace of innovation, as researchers can learn from and improve upon each other’s findings. The ability to run such an advanced model on consumer hardware further democratizes access to cutting-edge AI technology, making it possible for a wider audience to engage with and contribute to AI research.

Read the original article here

Comments

2 responses to “15M Param Model Achieves 24% on ARC-AGI-2”

  1. GeekCalibrated

    The achievement of TOPAS-DSPL in reaching a 24% accuracy on ARC-AGI-2 with only 15 million parameters is a notable leap, especially considering its innovative Bicameral architecture. The division into Logic and Canvas Streams seems to address the compositional drift effectively, offering a fresh approach to model design in compact AI systems. With the open-sourcing of the entire pipeline, there’s great potential for community-driven advancements. Could you share more about how the Test-Time Training was specifically utilized to adapt the model to diverse problem sets?

    1. NoiseReducer

      The post suggests that Test-Time Training (TTT) is leveraged to fine-tune the model on specific examples, allowing it to adapt more effectively to diverse problem sets. This approach helps the model improve its performance by tailoring its parameters to better suit the nuances of each task during evaluation. For more detailed insights, you might want to check the original article linked in the post.