AI & Technology Updates

  • Scaling to 11M Embeddings: Product Quantization Success


    Scaling to 11 Million Embeddings: How Product Quantization Saved My Vector Infrastructure

    Handling 11 million embeddings in a large-scale knowledge graph project posed significant challenges in storage, cost, and performance. The gemini-embedding-001 model was chosen for its strong semantic representations, but its high dimensionality led to substantial storage requirements: storing the raw embeddings in Neo4j came with a prohibitive monthly cost of $32,500 because of the large memory footprint. To address this, Product Quantization (PQ), specifically PQ64, was implemented, reducing storage by roughly 192 times and bringing the total requirement down to just 0.704 GB. Despite concerns about retrieval accuracy under such heavy compression, PQ64 maintained a recall@10 of 0.92, and PQ128 is available when even higher accuracy is needed. This matters because it demonstrates a scalable, cost-effective way to manage large-scale vector data without significantly compromising performance.
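    The 192x figure is consistent with simple arithmetic if we assume gemini-embedding-001's default 3,072-dimensional float32 output and read PQ64 as 64 sub-quantizers with 8-bit (256-centroid) codes: 3,072 x 4 bytes = 12,288 bytes per raw vector versus 64 bytes per PQ code, and 11M x 64 bytes = 0.704 GB. The sketch below is a minimal NumPy product quantizer written for illustration (not the author's code, and not a library such as FAISS). It reproduces that arithmetic and shows how PQ trains one small k-means codebook per subspace, stores each vector as one byte per subspace, and decodes an approximation. The toy demo uses random data and only 32 centroids per subspace so it runs quickly; all function names are hypothetical.

```python
import numpy as np

# --- Storage arithmetic (assumes float32 inputs and 8-bit PQ codes) ---
def raw_bytes(n_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to store n_vectors uncompressed float vectors."""
    return n_vectors * dim * bytes_per_value

def pq_bytes(n_vectors: int, n_subspaces: int, bits_per_code: int = 8) -> int:
    """Bytes for PQ codes only (one code per subspace per vector; codebooks ignored)."""
    return n_vectors * n_subspaces * bits_per_code // 8

# --- Minimal product quantizer ---
def kmeans(x: np.ndarray, k: int, iters: int, seed: int) -> np.ndarray:
    """Plain Lloyd's k-means; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # leave empty clusters where they are
                centroids[j] = members.mean(axis=0)
    return centroids

def pq_train(x: np.ndarray, m: int, k: int = 256, iters: int = 10) -> np.ndarray:
    """Train one codebook per subspace; returns shape (m, k, dim // m)."""
    assert x.shape[1] % m == 0 and k <= 256  # codes must fit in uint8
    sub = x.shape[1] // m
    return np.stack([kmeans(x[:, i * sub:(i + 1) * sub], k, iters, seed=i) for i in range(m)])

def pq_encode(x: np.ndarray, codebooks: np.ndarray) -> np.ndarray:
    """Replace each subvector by the index of its nearest centroid (1 byte each)."""
    m, _, sub = codebooks.shape
    codes = np.empty((len(x), m), dtype=np.uint8)
    for i in range(m):
        chunk = x[:, i * sub:(i + 1) * sub]
        dists = ((chunk[:, None, :] - codebooks[i][None, :, :]) ** 2).sum(axis=-1)
        codes[:, i] = dists.argmin(axis=1)
    return codes

def pq_decode(codes: np.ndarray, codebooks: np.ndarray) -> np.ndarray:
    """Approximate reconstruction: concatenate the selected centroids."""
    return np.concatenate([codebooks[i][codes[:, i]] for i in range(codebooks.shape[0])], axis=1)

# Arithmetic from the write-up (3,072 dims is an assumption about the model's output size).
raw = raw_bytes(11_000_000, 3072)
pq = pq_bytes(11_000_000, 64)
print(raw / pq)  # 192.0
print(pq / 1e9)  # 0.704 (GB of PQ codes)

# Toy demo on random data: 1,000 vectors, 64 dims, 8 subspaces, 32 centroids each.
x = np.random.default_rng(42).standard_normal((1000, 64)).astype(np.float32)
books = pq_train(x, m=8, k=32, iters=5)
codes = pq_encode(x, books)
err = ((x - pq_decode(codes, books)) ** 2).sum(axis=1).mean()
baseline = ((x - x.mean(axis=0)) ** 2).sum(axis=1).mean()
print(codes.shape, codes.dtype, err / baseline)
```

    The codebooks themselves add a small fixed cost (64 x 256 centroids of 48 floats each for the real setup), which the arithmetic above ignores. Increasing the number of subspaces, as with PQ128, shrinks each subvector and lowers reconstruction error at the cost of twice the code storage.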


  • Devstral Small 2 on RTX 5060 Ti: Local AI Coding Setup


    Devstral Small 2 (Q4_K_M) on 5060 Ti 16GB and Zed Agent is amazing!

    A setup pairing an RTX 5060 Ti 16GB and 32GB of DDR5-6000 RAM with the Devstral Small 2 model delivers impressive local AI coding without offloading anything to system RAM. Because the whole model fits in the GPU's VRAM, token generation stays fast, and the Zed Editor with Zed Agent handles code exploration and execution efficiently. Despite initial skepticism about running a dense 24B model on this card, the setup proved capable of generating and refining code, especially when given detailed instructions, and it runs cool and quiet. This matters because it shows that capable local AI development is possible without expensive hardware upgrades.
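    A back-of-the-envelope estimate helps explain why the model fits without offloading. This sketch assumes Q4_K_M averages about 4.8 bits per weight (an approximation; real GGUF file sizes vary by model) and treats the card as exactly 16 GiB. It ignores the KV cache, whose size depends on context length and the model's architecture. The function and constant names are illustrative only.

```python
def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of model weights in GiB (ignores KV cache, activations, runtime buffers)."""
    return n_params * bits_per_weight / 8 / 2**30

VRAM_GIB = 16.0    # RTX 5060 Ti 16GB, treated as 16 GiB
N_PARAMS = 24e9    # dense 24B model
Q4_K_M_BPW = 4.8   # ASSUMPTION: approximate average bits per weight for Q4_K_M
FP16_BPW = 16.0    # unquantized half precision, for comparison

q4 = weight_gib(N_PARAMS, Q4_K_M_BPW)
fp16 = weight_gib(N_PARAMS, FP16_BPW)
print(f"Q4_K_M weights: ~{q4:.1f} GiB, leaving ~{VRAM_GIB - q4:.1f} GiB for KV cache and buffers")
print(f"FP16 weights:   ~{fp16:.1f} GiB (would not fit in VRAM)")
```

    Under these assumptions the quantized weights come to roughly 13.4 GiB, leaving about 2.6 GiB for the KV cache and runtime buffers, while FP16 weights (about 44.7 GiB) would need far more than the card offers. Because the KV cache grows with context length, that remaining headroom is what limits how much context fits before layers would have to spill to system RAM.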


  • Introducing the nanoRLHF Project


    Introducing nanoRLHF project!

    nanoRLHF is a project that implements the core components of Reinforcement Learning from Human Feedback (RLHF) using PyTorch and Triton. It offers educational reimplementations of large-scale systems, prioritizing clarity and core concepts over efficiency. The project includes minimal Python implementations and custom Triton kernels, such as Flash Attention, and provides training pipelines that use open-source math datasets to train a Qwen3 model. It is a useful learning resource for anyone who wants to understand the internals of RL training frameworks. This matters because RLHF is a key technique for letting AI systems learn from human feedback, improving their performance and adaptability.
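    The central idea behind Flash Attention, one of the kernels nanoRLHF reimplements in Triton, is computing softmax attention block by block with a running maximum and a running denominator ("online softmax"), so the full attention matrix is never materialized. The NumPy sketch below shows only that math and is not nanoRLHF's kernel: a real Flash Attention kernel also tiles the queries, keeps each tile in fast on-chip memory, and handles masking and the backward pass. The function names are illustrative.

```python
import numpy as np

def naive_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Reference: materialize the full (n_q, n_k) score matrix, then softmax."""
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    return (p / p.sum(axis=-1, keepdims=True)) @ v

def blockwise_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray, block: int = 16) -> np.ndarray:
    """Online-softmax attention: visit K/V one block at a time."""
    scale = 1.0 / np.sqrt(q.shape[-1])
    m = np.full((q.shape[0], 1), -np.inf)       # running row-wise max of scores
    l = np.zeros((q.shape[0], 1))               # running softmax denominator
    acc = np.zeros((q.shape[0], v.shape[-1]))   # running unnormalized output
    for start in range(0, k.shape[0], block):
        s = q @ k[start:start + block].T * scale
        m_new = np.maximum(m, s.max(axis=-1, keepdims=True))
        correction = np.exp(m - m_new)          # rescales earlier partial sums (0 on the first block)
        p = np.exp(s - m_new)
        l = l * correction + p.sum(axis=-1, keepdims=True)
        acc = acc * correction + p @ v[start:start + block]
        m = m_new
    return acc / l

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 32))
k = rng.standard_normal((50, 32))
v = rng.standard_normal((50, 16))
print(np.allclose(naive_attention(q, k, v), blockwise_attention(q, k, v)))  # True
```

    Both functions produce the same output; the blockwise version only ever holds one (n_q, block) slice of scores at a time, which is what lets a fused GPU kernel avoid writing the full score matrix to slow memory.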


  • Improving RAG Systems with Semantic Firewalls


    RAG is lazy. We need to stop treating the context window like a junk drawer.

    In the GenAI space, the common approach to building Retrieval-Augmented Generation (RAG) systems is to embed the data, run a semantic search, and stuff the context window with the top results. This often confuses the model by filling its context with data that is technically relevant but contextually useless. A proposed method called "Scale by Subtraction" instead uses a deterministic Multidimensional Knowledge Graph to filter out noise before the language model processes the data, which, according to the proposal, significantly reduces noise and hallucination risk. By keeping only critical, actionable items, the method aims to make the model more efficient and accurate. This matters because it targets a core inefficiency of current RAG systems and could improve the accuracy and reliability of AI-generated responses.
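    One way to picture "Scale by Subtraction" is a deterministic filter applied after the usual top-k semantic search: only chunks whose entity is connected to the query's entity in a knowledge graph survive. This is a hypothetical sketch, not the proposal's actual design. The `Chunk` type, the entity tags, the hop limit, and the toy two-dimensional embeddings are all assumptions for illustration, and the "Multidimensional Knowledge Graph" is reduced to a plain adjacency dictionary.

```python
import math
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    entity: str             # hypothetical: the graph node this chunk describes
    embedding: list[float]  # toy 2-D embedding

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query: list[float], chunks: list[Chunk], k: int) -> list[Chunk]:
    """Standard semantic search: the k chunks most similar to the query."""
    return sorted(chunks, key=lambda c: cosine(query, c.embedding), reverse=True)[:k]

def neighborhood(graph: dict[str, list[str]], start: str, max_hops: int) -> set[str]:
    """All entities within max_hops edges of start (breadth-first search)."""
    seen, frontier = {start}, [start]
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return seen

def subtract_noise(query: list[float], query_entity: str, chunks: list[Chunk],
                   graph: dict[str, list[str]], k: int = 3, max_hops: int = 1) -> list[Chunk]:
    """Retrieve top-k, then deterministically drop chunks not connected to the query entity."""
    allowed = neighborhood(graph, query_entity, max_hops)
    return [c for c in top_k(query, chunks, k) if c.entity in allowed]

chunks = [
    Chunk("Invoice 42 is overdue by 30 days.", "invoice-42", [0.9, 0.1]),
    Chunk("ACME pays on net-60 terms.", "customer-acme", [0.8, 0.3]),
    Chunk("Invoice 17 was paid in full.", "invoice-17", [0.95, 0.05]),
    Chunk("The office is closed on Friday.", "office", [0.1, 1.0]),
]
graph = {"invoice-42": ["customer-acme"], "customer-acme": ["invoice-42", "invoice-17"]}
kept = subtract_noise([1.0, 0.0], "invoice-42", chunks, graph)
print([c.text for c in kept])  # ['Invoice 42 is overdue by 30 days.', 'ACME pays on net-60 terms.']
```

    Plain top-3 retrieval would also return the note about invoice 17, which scores highest on cosine similarity but is irrelevant to a question about invoice 42; the graph filter removes it before any language model sees it, and the hop limit controls how far that notion of relevance reaches.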


  • NASA Orders Medical Evacuation from ISS


    NASA orders “controlled medical evacuation” from the International Space Station

    NASA has decided to conduct a "controlled medical evacuation" of four crew members from the International Space Station after one of them experienced a medical issue. The affected astronaut, part of the Crew-11 mission, is reportedly stable, but NASA is erring on the side of caution by returning the entire crew to Earth earlier than planned. The Crew-11 team, which includes commander Zena Cardman, pilot Mike Fincke, Japanese astronaut Kimiya Yui, and Russian cosmonaut Oleg Platonov, will return aboard a SpaceX Crew Dragon spacecraft. NASA emphasizes that the health and well-being of astronauts remain its highest priority and is keeping the specific medical condition private. This matters because it underscores NASA's commitment to astronaut safety and the complexity of managing health issues in space.