AI & Technology Updates

  • 15M Param Model Achieves 24% on ARC-AGI-2


    15M param model solving 24% of ARC-AGI-2 (Hard Eval). Runs on consumer hardware.
    Bitterbot AI has introduced TOPAS-DSPL, a compact recursive model with approximately 15 million parameters that achieves 24% accuracy on the ARC-AGI-2 evaluation set, three times the previous state of the art (SOTA) of 8% for models of similar size. The model uses a "Bicameral" architecture that splits each task between a Logic Stream, which plans the algorithm, and a Canvas Stream, which executes it, to address the compositional drift seen in standard transformers. Test-Time Training (TTT) then fine-tunes the model on each task's specific examples before it generates a solution. The entire pipeline, including data generation, training, and evaluation, has been open-sourced, so the community can verify the results and potentially reproduce them on consumer hardware such as an RTX 4090 GPU. This matters because it shows that small models can make large gains in efficiency and accuracy, making sophisticated AI more accessible and verifiable.


  • The State Of LLMs 2025: Progress, Problems, Predictions


    [P] The State Of LLMs 2025: Progress, Problems, and Predictions
    Choosing the right machine learning framework is crucial for development efficiency and model performance. PyTorch and TensorFlow are two of the most recommended frameworks, with TensorFlow favored in industrial settings for its robust tooling and its Keras integration, which simplifies development. However, some users find TensorFlow difficult to set up, particularly on Windows, where it lacks native GPU support. Other notable frameworks include JAX, Scikit-Learn, and XGBoost, and various subreddits offer further discussion and personalized advice from experienced practitioners. This matters because the choice of framework can significantly influence the success and efficiency of AI projects.


  • Condé Nast User Database Breach: Ars Unaffected


    Condé Nast user database reportedly breached, Ars unaffected
    A hacker named Lovely claimed responsibility for breaching a Condé Nast user database, releasing over 2.3 million user records from WIRED and planning to leak an additional 40 million records from other Condé Nast properties. The data includes demographic information but no passwords, and Ars Technica remains unaffected because it runs on its own tech stack. Although Lovely claims to have urged Condé Nast to fix its security vulnerabilities, the hacker's motives appear to have been financial rather than altruistic. Condé Nast has yet to comment on the breach. This matters because it underscores the ongoing threat of data breaches and the need for companies to prioritize user data security.


  • Federated Fraud Detection with PyTorch


    A Coding Implementation of an OpenAI-Assisted Privacy-Preserving Federated Fraud Detection System from Scratch Using Lightweight PyTorch Simulations
    A privacy-preserving fraud detection system is simulated using Federated Learning: ten independent banks each train a local fraud-detection model on their own imbalanced transaction data. A FedAvg aggregation loop combines the local models into an improved global model without any raw transaction data being shared between clients. OpenAI is integrated for post-training analysis and risk-oriented reporting, turning the federated learning outputs into actionable insights. The approach emphasizes privacy, simplicity, and real-world applicability, offering a practical blueprint for experimenting with federated fraud models. This matters because such systems can improve fraud detection while keeping customer data private.
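    The FedAvg loop described above can be sketched in a few lines. This is a minimal illustration, not the tutorial's code: it uses only the Python standard library instead of PyTorch, a two-weight logistic regression instead of a neural network, and synthetic data. The function names, the 5% fraud rate, and all hyperparameters are assumptions made for the example.

```python
import math
import random


def make_bank_data(rng, n=200, fraud_rate=0.05):
    """Synthetic, imbalanced transactions; features are [bias, amount]."""
    data = []
    for _ in range(n):
        is_fraud = 1 if rng.random() < fraud_rate else 0
        amount = rng.gauss(2.0 if is_fraud else 0.0, 0.5)
        data.append(([1.0, amount], is_fraud))
    return data


def local_update(weights, data, lr=0.1, epochs=5):
    """One bank trains locally with SGD on the logistic loss."""
    w = list(weights)  # copy so the global weights are not mutated
    for _ in range(epochs):
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x))
            z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
            p = 1.0 / (1.0 + math.exp(-z))
            for i, xi in enumerate(x):
                w[i] -= lr * (p - y) * xi
    return w


def fed_avg(client_weights, client_sizes):
    """Average client weights, weighted by each client's sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def run_simulation(num_banks=10, rounds=5, seed=0):
    """Run a few federated rounds; only weights leave each bank."""
    rng = random.Random(seed)
    banks = [make_bank_data(rng) for _ in range(num_banks)]
    global_w = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_w, data) for data in banks]
        global_w = fed_avg(updates, [len(d) for d in banks])
    return global_w


if __name__ == "__main__":
    print("global weights [bias, amount]:", run_simulation())
```

    Weighting the average by sample count follows the usual FedAvg formulation; because every simulated bank here holds the same number of transactions, it reduces to a plain mean.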


  • Enhance LLM Plots with LLMPlot.com


    I built LLMPlot.com (free + OSS) to make LLM plots not ugly anymore!
    LLMPlot.com is a new tool for making language model evaluation plots, which are often criticized as ugly, look better. It is free and open source: users enter model details, provider, and scores, and it generates a polished comparison plot. The plots are optimized for sharing on social media platforms like X, LinkedIn, and Reddit, making them accessible and engaging for a wider audience. This matters because clearer visuals make complex evaluation data easier to communicate and understand.