AI & Technology Updates

  • Hybrid Retrieval: BM25 + FAISS on t3.medium


    Production Hybrid Retrieval: 48% better accuracy with BM25 + FAISS on a single t3.medium
    A hybrid retrieval system serves over 127,000 queries on a single AWS Lightsail instance by combining the keyword precision of BM25 with the semantic matching of dense embeddings searched through a FAISS index. Embeddings are computed without a GPU; an optional GPU speeds up reranking by about 3x. The setup runs on a t3.medium instance for approximately $50 per month and reaches 91% accuracy, well ahead of dense-only retrieval. Complex queries pass through a four-stage cascade that pairs keyword precision with semantic understanding, and latency and accuracy are tuned through asynchronous parallel retrieval and batch reranking. This matters because it shows that retrieval balancing keyword precision with semantic understanding can be served accurately and cheaply on modest hardware, which is what many information-retrieval applications need.
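    The post's code is not reproduced in this summary, so the following is a minimal sketch of the general hybrid idea, not the author's pipeline. It assumes pure-Python Okapi BM25 for the sparse side, brute-force cosine similarity over toy hand-written vectors standing in for real embeddings and a FAISS index, and reciprocal rank fusion (RRF, k=60) to merge the two rankings; the summary does not say which fusion method the system actually uses. The two retrievers run concurrently with `asyncio.gather`, echoing the asynchronous parallel retrieval described above, while the reranking stages of the cascade are omitted. All function names are hypothetical.

```python
import asyncio
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every tokenized doc for a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            f = tf[term]
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def cosine(u, v):
    """Cosine similarity; brute-force stand-in for a FAISS inner-product search."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(scores):
    """Doc indices ordered from best to worst score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def rrf(rankings, k=60):
    """Reciprocal rank fusion: each ranking adds 1 / (k + rank) to a doc's score."""
    fused = Counter()
    for ranking in rankings:
        for r, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + r)
    return [doc_id for doc_id, _ in fused.most_common()]


async def hybrid_search(query_tokens, query_vec, docs, doc_vecs, top_k=3):
    """Run sparse and dense retrieval concurrently, then fuse the two rankings."""
    sparse, dense = await asyncio.gather(
        asyncio.to_thread(bm25_scores, query_tokens, docs),
        asyncio.to_thread(lambda: [cosine(query_vec, v) for v in doc_vecs]),
    )
    return rrf([rank(sparse), rank(dense)])[:top_k]


if __name__ == "__main__":
    docs = [["bm25", "keyword", "search"], ["dense", "vector", "search"], ["cooking", "recipes"]]
    doc_vecs = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]  # toy "embeddings"
    hits = asyncio.run(hybrid_search(["keyword", "search"], [0.9, 0.1], docs, doc_vecs, top_k=2))
    print(hits)  # [0, 1]
```

    RRF is used here because it needs only ranks, so BM25 scores and cosine similarities never have to be put on a common scale; a weighted blend of normalized scores is a common alternative.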


  • Stability Over Retraining: A New Approach to AI Forgetting


    I experimented with forcing "stability" instead of retraining to fix Catastrophic Forgetting. It worked. Here is the code.
    An intriguing experiment suggests that a neural network can recover lost functions without being retrained on its original data, challenging the usual approaches to catastrophic forgetting. After the network was deliberately destabilized, applying a stability operator to restore its recursive dynamics brought back much of its original accuracy. If the result holds up, maintaining a stable topology could lead to self-healing AI agents that are potentially more robust and energy-efficient than current models. This matters because such systems would not need to store large amounts of original data for retraining, making them more efficient and resilient.
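    The summary does not say what the stability operator actually is, and the post's code is not reproduced here. Purely as a hypothetical illustration of "enforcing stable recursive dynamics" (not the author's method), the sketch below estimates a recurrent weight matrix's spectral norm by power iteration and rescales the matrix so that norm drops below 1. Because the spectral norm bounds how much W can stretch any vector, the recursion h <- W h then contracts instead of drifting. It shows only the stability constraint, not the recovery of lost accuracy; the target of 0.9 and all function names are assumptions.

```python
import math
import random


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def norm(x):
    """Euclidean length of a vector."""
    return math.sqrt(sum(v * v for v in x))


def spectral_norm(w, iters=100, seed=0):
    """Estimate the largest singular value of w via power iteration on W^T W."""
    wt = [list(col) for col in zip(*w)]
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(len(w[0]))]
    for _ in range(iters):
        y = matvec(wt, matvec(w, x))
        n = norm(y)
        if n == 0.0:
            return 0.0  # almost surely only when w is the zero matrix
        x = [v / n for v in y]
    # x is now (close to) the top right singular vector, so ||W x|| = sigma_max.
    return norm(matvec(w, x))


def stabilize(w, target=0.9):
    """Hypothetical stability step: shrink w until its spectral norm is <= target < 1."""
    sigma = spectral_norm(w)
    if sigma <= target:
        return w
    scale = target / sigma
    return [[v * scale for v in row] for row in w]


def run_dynamics(w, h, steps):
    """Iterate the linear recursion h <- W h and return the final state."""
    for _ in range(steps):
        h = matvec(w, h)
    return h


if __name__ == "__main__":
    w = [[1.0, 2.0], [0.0, 1.0]]  # spectral norm 1 + sqrt(2), so not a contraction
    w_stable = stabilize(w)
    print(round(spectral_norm(w_stable), 6))  # 0.9
    print(norm(run_dynamics(w, [1.0, 1.0], 30)))  # grows with the step count
    print(norm(run_dynamics(w_stable, [1.0, 1.0], 30)))  # decays toward zero
```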


  • Optimize Your 8GB VRAM + 32GB RAM System with Granite 4.0 Small


    Don't sleep on granite 4 small if you got an 8+32+ system
    A ThinkPad P15 with 32GB of RAM and an 8GB Quadro GPU, hardware normally limited to 7-8 billion parameter models, can efficiently run a larger model, Granite 4.0 Small. Granite 4.0 Small is a hybrid Transformer/Mamba model that keeps its speed as the context grows, processing a 50-page document (~50.5k tokens) at approximately 7 tokens per second. That makes it a practical choice for users who need to work with long documents without sacrificing speed. Matching the model to the hardware can significantly improve productivity for anyone with a similar setup.
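    The summary credits Granite 4.0 Small's steady speed at long context to its hybrid Transformer/Mamba design. A back-of-the-envelope sketch of one reason: attention layers keep a key/value cache that grows linearly with context length (and is read for every generated token), while Mamba layers carry a fixed-size state regardless of context. The layer count, KV-head count, head dimension, and fp16 cache below are hypothetical round numbers, not Granite 4.0 Small's actual configuration; only the 50.5k-token context comes from the post.

```python
def kv_cache_bytes(attn_layers, kv_heads, head_dim, tokens, bytes_per_elem=2):
    """Cached keys + values: 2 (K and V) x layers x KV heads x head dim x tokens x bytes."""
    return 2 * attn_layers * kv_heads * head_dim * tokens * bytes_per_elem


GIB = 1024 ** 3
TOKENS = 50_500  # context length reported in the post

# Hypothetical shapes, NOT Granite 4.0 Small's real configuration.
full_attention = kv_cache_bytes(attn_layers=40, kv_heads=8, head_dim=128, tokens=TOKENS)
# Same model with only 4 attention layers; the Mamba layers' state does not grow with TOKENS.
hybrid = kv_cache_bytes(attn_layers=4, kv_heads=8, head_dim=128, tokens=TOKENS)

print(f"full attention: {full_attention / GIB:.2f} GiB")  # 7.71 GiB
print(f"hybrid (4 attention layers): {hybrid / GIB:.2f} GiB")  # 0.77 GiB
```

    Under these assumed shapes, cutting the attention layers tenfold cuts the cache tenfold, which is the difference between filling an 8GB GPU and leaving room to spare.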


  • Concerns Over AI Model Consistency


    Consistency concern over all model updates
    A long-time ChatGPT user is concerned about the consistency of OpenAI's model updates, particularly their effect on long-term projects and coding tasks. According to the user, updates have disrupted existing projects and introduced problems such as hallucinations and the model promising work it then fails to deliver, which undermines trust in the tool. The user suggests that OpenAI's push to acquire more users may be compromising the quality and reliability of its models for people with specific needs, nudging them toward more expensive plans. This matters because it highlights the tension between growing a user base and keeping an AI service reliable for existing users.