Tools
-
15M Param Model Achieves 24% on ARC-AGI-2
Read Full Article: 15M Param Model Achieves 24% on ARC-AGI-2
Bitterbot AI has introduced TOPAS-DSPL, a compact recursive model with approximately 15 million parameters that achieves 24% accuracy on the ARC-AGI-2 evaluation set, a significant improvement over the previous state-of-the-art (SOTA) of 8% for models of similar size. The model employs a "Bicameral" architecture that splits each task between a Logic Stream for planning the algorithm and a Canvas Stream for executing it, addressing the compositional drift seen in standard transformers. In addition, Test-Time Training (TTT) fine-tunes the model on a task's demonstration examples before a solution is generated. The entire pipeline, including data generation, training, and evaluation, has been open-sourced, allowing the community to verify and potentially reproduce the results on consumer hardware such as an RTX 4090 GPU. This matters because it demonstrates significant advances in model efficiency and accuracy, making sophisticated AI more accessible and verifiable.
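For illustration, here is a minimal sketch of the test-time training idea, assuming a PyTorch grid model and an ARC-style task dict with demonstration pairs; it is not TOPAS-DSPL's actual pipeline, only the general pattern of fine-tuning a per-task copy of the model on the demo examples before predicting.

```python
# Minimal sketch of test-time training (TTT) on an ARC-style task.
# Hypothetical model and data shapes; this is not Bitterbot's TOPAS-DSPL code.
import copy
import torch
import torch.nn.functional as F

def solve_with_ttt(base_model, task, steps=50, lr=1e-4):
    """Fine-tune a copy of the model on the task's demo pairs, then predict."""
    model = copy.deepcopy(base_model)          # keep the base weights untouched
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(steps):
        for inp, out in task["train"]:         # the few demonstration grids
            logits = model(inp)                # assumed shape: (H*W, num_colors)
            loss = F.cross_entropy(logits, out.flatten())
            opt.zero_grad()
            loss.backward()
            opt.step()
    model.eval()
    with torch.no_grad():
        return model(task["test_input"]).argmax(dim=-1)
```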
-
The State Of LLMs 2025: Progress, Problems, Predictions
Read Full Article: The State Of LLMs 2025: Progress, Problems, Predictions
Choosing the right machine learning framework is crucial for development efficiency and model performance. PyTorch and TensorFlow are two of the most recommended frameworks, with TensorFlow being favored in industrial settings due to its robust tools and Keras integration, which simplifies development. However, some users find TensorFlow setup challenging, particularly on Windows due to the lack of native GPU support. Other notable frameworks include JAX, Scikit-Learn, and XGBoost, with various subreddits offering platforms for further discussion and personalized advice from experienced practitioners. This matters because selecting an appropriate machine learning framework can significantly influence the success and efficiency of AI projects.
-
New SSM Architecture Exceeds Transformer Baseline
Read Full Article: New SSM Architecture Exceeds Transformer Baseline
Recent advancements in sequence modeling have introduced a new State Space Model (SSM) architecture that surpasses traditional Transformers by addressing their O(L^2) attention cost on long sequences. By integrating delta-rule updates with the representational capabilities of gated convolutions, the new architecture runs in O(L) time in sequence length, making it a strong baseline for sequence modeling tasks. It not only matches but exceeds the performance and speed of Transformers, even at relatively short sequence lengths, thanks to mildly optimized Triton kernels. This development is significant because it provides a more efficient and scalable way to process long sequences in natural language processing and other domains.
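As a rough illustration of why a fixed-size recurrent state gives linear cost, here is a toy delta-rule scan in plain PyTorch; the tensor shapes and the beta write-strength vector are assumptions for the sketch, and this bears no resemblance to the paper's optimized Triton kernels.

```python
# Toy delta-rule state update in O(L) time, sketching how a fixed-size state
# avoids the O(L^2) attention cost; illustrative only.
import torch

def delta_rule_scan(q, k, v, beta):
    """q, k, v: (L, d) tensors; beta: (L,) write strengths. Returns outputs (L, d)."""
    L, d = q.shape
    S = torch.zeros(d, d)                                # fixed-size recurrent state
    ys = []
    for t in range(L):                                   # single pass over the sequence
        pred = S @ k[t]                                  # what the state currently recalls for k_t
        S = S + beta[t] * torch.outer(v[t] - pred, k[t]) # delta-rule correction of the memory
        ys.append(S @ q[t])                              # read out with the query
    return torch.stack(ys)
```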
-
LLM Price Tracker & Cost Calculator
Read Full Article: LLM Price Tracker & Cost Calculator
A new tool has been developed to help users track pricing differences across more than 2,100 language models from various providers. The tracker not only aggregates model prices but also includes a simple cost calculator for estimating expenses. It updates every six hours, ensuring users have the latest information, and is published as a static site on GitHub Pages, making it accessible for automation and programmatic use. This matters because it simplifies comparing and managing costs for those using language models, potentially saving time and money.
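The arithmetic behind such a calculator is simple: token counts divided by one million, multiplied by the per-million-token price. A minimal sketch with placeholder model names and prices, not the tracker's real data:

```python
# Back-of-the-envelope LLM cost estimate from per-million-token prices.
# The model names and prices below are placeholders, not the tracker's data.
PRICES_PER_M = {                 # USD per 1M tokens: (input, output)
    "model-a": (0.50, 1.50),
    "model-b": (3.00, 15.00),
}

def estimate_cost(model, input_tokens, output_tokens):
    pin, pout = PRICES_PER_M[model]
    return input_tokens / 1e6 * pin + output_tokens / 1e6 * pout

print(f"${estimate_cost('model-b', 200_000, 50_000):.2f}")  # -> $1.35
```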
-
AI Agent Executes 100,000 Tasks with One Prompt
Read Full Article: AI Agent Executes 100,000 Tasks with One Prompt
An innovative AI feature called "Scale Mode" enables a single prompt to execute thousands of coordinated tasks autonomously, such as visiting numerous links to collect data or processing extensive documents. This capability allows for efficient handling of large-scale operations, including generating and enriching B2B leads and processing invoices. The feature is designed to be versatile, complementing a wide range of tasks by simply adding "Do it in Scale Mode" to the prompt. This matters because Scale Mode offers businesses a way to automate and manage large volumes of tasks efficiently, which can lead to time savings and increased operational efficiency.
-
Benchmarking Speech-to-Text Models for Medical Dialogue
Read Full Article: Benchmarking Speech-to-Text Models for Medical Dialogue
A comprehensive benchmarking of 26 speech-to-text (STT) models was conducted on long-form medical dialogue using the PriMock57 dataset, consisting of 55 files and over 81,000 words. The models were ranked based on their average Word Error Rate (WER), with Google Gemini 2.5 Pro leading at 10.79% and Parakeet TDT 0.6B v3 emerging as the top local model at 11.9% WER. The evaluation also considered processing time per file and noted issues such as repetition-loop failures in some models, which required chunking to mitigate. The full evaluation, including code and a complete leaderboard, is available on GitHub, providing valuable insights for developers working on medical transcription technology. This matters because accurate and efficient STT models are crucial for improving clinical documentation and reducing the administrative burden on healthcare professionals.
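For reference, Word Error Rate is word-level edit distance (substitutions, insertions, and deletions) divided by the number of reference words. A self-contained sketch of that metric, not the benchmark's actual evaluation harness:

```python
# Word Error Rate as used to rank STT models: word-level edit distance
# divided by reference length. Self-contained sketch for illustration.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the patient reports chest pain", "the patient report chest pain"))  # 0.2
```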
-
botchat: Privacy-Preserving Multi-Bot AI Chat Tool
Read Full Article: botchat: Privacy-Preserving Multi-Bot AI Chat Tool
botchat is a newly launched tool designed for users who engage with multiple AI language models simultaneously while prioritizing privacy. It allows users to assign different personas to bots, enabling diverse perspectives on a single query and capitalizing on the unique strengths of various models within the same conversation. Importantly, botchat emphasizes data protection by ensuring that conversations and attachments are not stored on any servers, and when using the default keys, user data is not retained by AI providers for model training. This matters because it offers a secure and versatile platform for interacting with AI, addressing privacy concerns while enhancing user experience with multiple AI models.
-
CNN in x86 Assembly: Cat vs Dog Classifier
Read Full Article: CNN in x86 Assembly: Cat vs Dog Classifier
An ambitious project implemented a Convolutional Neural Network (CNN) from scratch in x86-64 assembly to classify images of cats and dogs, using a dataset of 25,000 RGB images. The goal was to understand CNNs deeply by working at the level of memory layout, data movement, and SIMD arithmetic, without relying on any machine learning frameworks or libraries. Key components such as Conv2D, MaxPool, Dense layers, activations, forward and backward propagation, and the data loader were written in pure assembly, and the result runs approximately 10 times faster than an equivalent NumPy version. Despite the challenges of debugging at this scale, the implementation runs inside a lightweight Debian Slim Docker container, showcasing a unique blend of low-level programming and machine learning. This matters because it demonstrates the potential for significant performance improvements in neural networks through low-level optimization.
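To make concrete what the assembly version hand-rolls, here is a naive Conv2D forward pass in NumPy; the shapes and the valid-padding, stride-1 choices are illustrative assumptions, and the inner multiply-accumulate loop is the kind of work the project maps onto SIMD instructions.

```python
# Naive Conv2D forward pass, shown only to illustrate the arithmetic that the
# assembly project implements by hand; shapes and padding are illustrative.
import numpy as np

def conv2d_forward(x, w, b):
    """x: (C_in, H, W), w: (C_out, C_in, K, K), b: (C_out,). Valid padding, stride 1."""
    c_out, c_in, k, _ = w.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    y = np.zeros((c_out, h_out, w_out), dtype=x.dtype)
    for o in range(c_out):
        for i in range(h_out):
            for j in range(w_out):
                # multiply-accumulate over the receptive field: the hot loop
                # that SIMD registers and hand-scheduled assembly can speed up
                y[o, i, j] = np.sum(x[:, i:i + k, j:j + k] * w[o]) + b[o]
    return y
```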
