How-Tos

  • EdgeVec v0.7.0: Fast Browser-Native Vector Database


    [P] EdgeVec v0.7.0: Browser-Native Vector Database with 8.75x Faster Hamming Distance via SIMD

    EdgeVec is an open-source vector database that runs entirely in the browser on WebAssembly. Version 0.7.0 brings an 8.75x speedup in Hamming distance calculations through SIMD optimizations, a 32x memory reduction via binary quantization, and a 3.2x acceleration in Euclidean distance computations. With EdgeVec, browser-based applications can perform semantic search and retrieval-augmented generation without server dependencies, which keeps data private, reduces latency, and eliminates hosting costs. These gains make it feasible to hold large vector indices in the browser and to build offline-first AI tools. Why this matters: browser-native vector search makes sophisticated AI applications more accessible and efficient for developers and users alike.
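To make the binary quantization and Hamming distance figures concrete, here is a minimal Python sketch. It assumes sign-based thresholding (one bit per dimension) and float32 source vectors. EdgeVec's actual quantization scheme and its SIMD kernel are not shown in the summary, so the function names below are illustrative only.

```python
def binary_quantize(vec):
    """Pack the sign of each dimension into one int (assumed scheme: bit i is 1 if vec[i] > 0)."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming(a, b):
    """Hamming distance between two packed bit vectors: XOR, then count the set bits."""
    return bin(a ^ b).count("1")


def memory_bytes(dims):
    """Return (float32 bytes, binary bytes) for one vector of `dims` dimensions."""
    float32_bytes = dims * 4
    binary_bytes = (dims + 7) // 8  # one bit per dimension, rounded up to whole bytes
    return float32_bytes, binary_bytes


a = binary_quantize([0.5, -1.0, 0.2, -0.3])
b = binary_quantize([0.1, 0.4, -0.2, -0.9])
distance = hamming(a, b)
full, packed = memory_bytes(768)
```

Under these assumptions the 32x figure is simple arithmetic: a float32 dimension takes 32 bits and a quantized one takes 1 bit, so a 768-dimension vector shrinks from 3072 bytes to 96 bytes.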

    Read Full Article: EdgeVec v0.7.0: Fast Browser-Native Vector Database

  • Federated Fraud Detection with PyTorch


    A Coding Implementation of an OpenAI-Assisted Privacy-Preserving Federated Fraud Detection System from Scratch Using Lightweight PyTorch Simulations

    A privacy-preserving fraud detection system is simulated using Federated Learning, allowing ten independent banks to train local fraud-detection models on imbalanced transaction data. The system utilizes a FedAvg aggregation loop to improve a global model without sharing raw transaction data between clients. OpenAI is integrated to provide post-training analysis and risk-oriented reporting, transforming federated learning outputs into actionable insights. This approach emphasizes privacy, simplicity, and real-world applicability, offering a practical blueprint for experimenting with federated fraud models. Understanding and implementing such systems is crucial for enhancing fraud detection while maintaining data privacy.
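The FedAvg aggregation step can be sketched in a few lines. This is a simplified illustration, not the article's code: it assumes each client's model is a flat list of floats rather than a PyTorch state dict, and it weights each client by its number of training samples, which is the usual FedAvg formulation but may differ from what the article does. The name `fed_avg` is hypothetical.

```python
def fed_avg(client_weights, client_sizes):
    """Average client parameter vectors, weighting each client by its sample count.

    client_weights: list of equal-length lists of floats (one list per client)
    client_sizes:   number of local training samples per client
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    global_weights = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        for j, w in enumerate(weights):
            global_weights[j] += w * size / total
    return global_weights


# Two toy "banks": the second holds three times as much data as the first.
global_model = fed_avg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Only the parameter vectors and sample counts cross the client boundary here, which is the property that lets the banks collaborate without exchanging raw transactions.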

    Read Full Article: Federated Fraud Detection with PyTorch

  • Pagesource: CLI Tool for Web Dev with LLM Context


    Pagesource - CLI tool to dump website runtime sources for local LLM context

    Pagesource is a command-line tool designed to capture and dump the runtime sources of a website, providing a more accurate representation of the site's structure for local language model (LLM) context. Unlike the traditional "Save As" feature in browsers that flattens the webpage into a single HTML file, Pagesource preserves the actual file structure, including separate JavaScript modules, CSS files, and lazy-loaded resources. Built on Playwright, it allows developers to access all dynamically loaded JS modules and maintain the original directory structure, making it particularly useful for web developers who need to replicate or analyze website components effectively. This matters because it enhances the ability to work with LLMs by providing them with a more detailed and accurate context of web resources.

    Read Full Article: Pagesource: CLI Tool for Web Dev with LLM Context

  • Optimizing GLM-4.7 on 2015 CPU-Only Hardware


    Running GLM-4.7 (355B MoE) in Q8 at ~5 Tokens/s on 2015 CPU-Only Hardware – Full Optimization Guide

    Running the 355B-parameter GLM-4.7 Mixture of Experts model on a 2015 Lenovo System x3950 X6 with eight Xeon E7-8880 v3 CPUs showcases the potential of older hardware for local large language models. With Q8_0 quantization, the model maintains high-quality outputs with minimal degradation and reaches around 5-6 tokens per second without a GPU. Key optimizations include BIOS tweaks, NUMA node distribution, llama.cpp forks for MoE architecture, and Linux kernel adjustments. The setup is power-intensive, drawing about 1300W AC. This approach suits homelab enthusiasts or anyone without a modern GPU who wants to run large models locally. This matters because it demonstrates how older hardware can still be leveraged effectively for advanced AI tasks, expanding access to powerful models without the need for cutting-edge technology.
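A rough back-of-the-envelope estimate shows why this workload needs a large-memory server. The sketch below assumes every one of the 355B parameters is stored in llama.cpp's Q8_0 block format (32 int8 weights plus one fp16 scale, so 34 bytes per 32 weights). Real GGUF files often keep some tensors in other formats, and the KV cache and runtime buffers come on top, so treat the result as an approximation.

```python
PARAMS = 355e9        # parameter count taken from the model name
BLOCK_WEIGHTS = 32    # weights per Q8_0 block
BLOCK_BYTES = 32 + 2  # 32 int8 values + one 2-byte fp16 scale


def q8_0_bits_per_weight():
    """Effective storage cost per weight, including the per-block scale."""
    return BLOCK_BYTES * 8 / BLOCK_WEIGHTS


def q8_0_size_gb(n_params):
    """Approximate weight storage in GB (10^9 bytes) if all tensors were Q8_0."""
    return n_params * BLOCK_BYTES / BLOCK_WEIGHTS / 1e9


bits = q8_0_bits_per_weight()
size_gb = q8_0_size_gb(PARAMS)
```

Under these assumptions the weights come to 8.5 bits each and roughly 377 GB in total, which is why the guide targets a multi-socket machine with a very large pool of RAM.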

    Read Full Article: Optimizing GLM-4.7 on 2015 CPU-Only Hardware

  • Roadmap: Software Developer to AI Engineer


    From Software Developer to AI Engineer: The Exact Roadmap I Followed (Projects + Interviews)

    Transitioning from a software developer to an AI engineer involves a structured roadmap that leverages existing coding skills while diving into machine learning and AI technologies. The journey spans approximately 18 months, with phases covering foundational knowledge, core machine learning and deep learning, modern AI practices, MLOps, and deployment. Key resources include free online courses, practical projects, and structured programs for accountability. The focus is on building real-world applications and gaining practical experience, which is crucial for job readiness and successful interviews. This matters because it provides a practical, achievable pathway for developers looking to pivot into the rapidly growing field of AI engineering without needing advanced degrees.

    Read Full Article: Roadmap: Software Developer to AI Engineer

  • Automate Time-Series Data Cleaning with DataSetIQ


    [Resource] A library to practice Time-Series ML without spending hours cleaning data

    Practicing time-series forecasting or regression on economic data usually starts with tedious cleaning, such as aligning dates and handling missing values. The DataSetIQ Python client simplifies this with a new helper function, get_ml_ready, which automates the pre-processing. The function is particularly useful for quickly generating feature matrices to test models like LSTM and XGBoost on real-world economic data. By streamlining data preparation, it lets users spend more time on model testing and less on data cleaning.
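For a sense of the manual work that such a helper replaces, here is a small standard-library sketch of two typical steps: aligning a sparse series onto a daily calendar with forward fill, and turning the result into lagged feature rows. This is not the DataSetIQ API. The summary does not give the signature of get_ml_ready, so `align_daily` and `lag_features` are hypothetical names used only for illustration.

```python
from datetime import date, timedelta


def align_daily(series, start, end):
    """Reindex a {date: value} series onto every day in [start, end], forward-filling gaps."""
    aligned, last = [], None
    day = start
    while day <= end:
        if day in series:
            last = series[day]
        aligned.append(last)  # stays None until the first observation appears
        day += timedelta(days=1)
    return aligned


def lag_features(values, lags=(1, 2)):
    """Build (inputs, target) rows where the inputs are the values `lags` steps back."""
    rows = []
    for t in range(max(lags), len(values)):
        rows.append(([values[t - k] for k in lags], values[t]))
    return rows


raw = {date(2024, 1, 1): 1.0, date(2024, 1, 3): 3.0}  # Jan 2 and Jan 4 are missing
daily = align_daily(raw, date(2024, 1, 1), date(2024, 1, 4))
rows = lag_features(daily)
```

Forward fill is only one possible policy for missing values. Interpolation or dropping incomplete rows may suit other datasets better.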

    Read Full Article: Automate Time-Series Data Cleaning with DataSetIQ

  • EntropyGuard: Local CLI for Data Deduplication


    I built a free local CLI to clean/dedup data BEFORE sending it to the API (Saved me ~$500/mo).

    EntropyGuard is a new open-source CLI tool for cleaning and deduplicating data locally, built to reduce API costs and improve data processing efficiency. It addresses duplicate content in document chunks, which can inflate token usage and costs when using services like OpenAI. The tool deduplicates in two stages: exact deduplication using xxHash, then semantic deduplication with local embeddings and FAISS. According to the author, this reduced dataset sizes by approximately 40% and improved retrieval quality by eliminating redundant information. This matters because it offers a cost-effective solution for optimizing data handling without relying on expensive enterprise platforms or cloud services.
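The two-stage idea can be sketched with the standard library alone. This is not EntropyGuard's code: `hashlib.sha256` stands in for xxHash, a brute-force cosine comparison stands in for FAISS, and a toy bag-of-words vector stands in for a real local embedding model. The text normalization and the 0.9 threshold are also assumptions.

```python
import hashlib
import math
from collections import Counter


def exact_dedup(chunks):
    """Stage 1: drop chunks whose normalized text hashes to a digest already seen."""
    seen, kept = set(), []
    for text in chunks:
        normalized = text.strip().lower()  # assumed normalization
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(text)
    return kept


def toy_embed(text):
    """Placeholder embedding: word counts. A real pipeline would use a local model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_dedup(chunks, threshold=0.9, embed=toy_embed):
    """Stage 2: keep a chunk only if it is below the threshold against every kept chunk."""
    kept, kept_vecs = [], []
    for text in chunks:
        vec = embed(text)
        if all(cosine(vec, other) < threshold for other in kept_vecs):
            kept.append(text)
            kept_vecs.append(vec)
    return kept


chunks = [
    "Reset your password in settings.",
    "reset your password in settings.  ",
    "Reset your password in the settings.",
    "Billing runs monthly.",
]
stage1 = exact_dedup(chunks)
stage2 = semantic_dedup(stage1)
```

Running the cheap hash pass first means the more expensive similarity pass only sees chunks that are not byte-identical after normalization. The brute-force loop is quadratic, which is the cost an approximate nearest-neighbor index such as FAISS is meant to avoid on large datasets.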

    Read Full Article: EntropyGuard: Local CLI for Data Deduplication

  • Streamline ML Serving with Infrastructure Boilerplate


    Production ML Serving Boilerplate - Skip the Infrastructure Setup

    An MLOps engineer has developed a comprehensive infrastructure boilerplate for model serving, designed to streamline the transition from a trained model to a production API. The stack includes tools like MLflow for model registry, FastAPI for inference API, and a combination of PostgreSQL, Redis, and MinIO for data handling, all orchestrated through Kubernetes with Docker Desktop K8s. Key features include ensemble predictions, hot model reloading, and stage-based deployment, enabling efficient model versioning and production-grade health probes. The setup offers a quick deployment process with a 5-minute setup via Docker and a one-command Kubernetes deployment, aiming to address common pain points in ML deployment workflows. This matters because it simplifies and accelerates the deployment of machine learning models into production environments, which is often a complex and time-consuming process.

    Read Full Article: Streamline ML Serving with Infrastructure Boilerplate

  • aichat: Efficient Session Management Tool


    aichat: Claude-Code/Codex-CLI tool for fast full-text session search, and continue work without compaction

    The aichat tool lets users continue work in Claude-Code or Codex-CLI sessions without compaction, which often loses important details. The >resume trigger continues a session through one of three modes: blind trim, smart-trim, and rollover, each offering a different way to manage session context. The tool also features a fast Rust/Tantivy-based full-text search for retrieving context from past sessions, making it easier to find and continue previous work. This is particularly valuable for users who frequently hit context limits and need efficient ways to manage and retrieve session data. This matters because it offers a practical solution to maintain workflow continuity and efficiency in environments with limited context capacity.

    Read Full Article: aichat: Efficient Session Management Tool

  • AI-Doomsday-Toolbox: Distributed Inference & Workflows


    AI-Doomsday-Toolbox Distributed inference + workflows

    The AI Doomsday Toolbox v0.513 can distribute large AI models across multiple devices using a master-worker setup via llama.cpp. Users can manually add workers and allocate RAM and layer proportions per device, which gives finer control over how a model is split. New features include transcribing and summarizing audio and video content, generating and upscaling images in a single workflow, and sharing media directly to transcription workflows. Models and ZIM files can now be used in place without copying, though this requires All Files Access permission. Users should uninstall previous versions because of a database schema change. These advancements make AI processing more accessible and efficient, which is crucial for leveraging AI capabilities in everyday applications.
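To illustrate what allocating layer proportions per device can mean in practice, the sketch below turns relative proportions into whole layer counts that sum to the model's total. It is a generic largest-remainder split, not the app's or llama.cpp's actual logic. The function name and device names are hypothetical.

```python
def split_layers(total_layers, proportions):
    """Assign whole layers to devices in proportion to their weights.

    proportions: {device_name: relative share}, for example based on available RAM.
    Fractional layers are rounded down, then leftover layers go to the devices
    with the largest fractional remainders.
    """
    total = sum(proportions.values())
    exact = {name: total_layers * share / total for name, share in proportions.items()}
    counts = {name: int(value) for name, value in exact.items()}
    leftover = total_layers - sum(counts.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - counts[name], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


# A master with twice the share of each of two workers, for a 40-layer model.
plan = split_layers(40, {"master": 2, "worker-a": 1, "worker-b": 1})
```

Rounding this way guarantees that every layer is assigned exactly once, which a naive per-device rounding does not.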

    Read Full Article: AI-Doomsday-Toolbox: Distributed Inference & Workflows