How-Tos

  • AI Agent for Quick Data Analysis & Visualization


    AI Agent to analyze + visualize data in <1 minAn AI agent has been developed to efficiently analyze and visualize data in under one minute, significantly streamlining the data analysis process. By copying the NYC Taxi Trips dataset to its workspace, the agent reads relevant files, writes and executes analysis code, and plots relationships between multiple features. It also creates an interactive map of trips in NYC, showcasing its capability to handle complex data visualization tasks. This advancement highlights the potential for AI tools to enhance productivity and accessibility in data analysis, reducing reliance on traditional methods like Jupyter notebooks.

    Read Full Article: AI Agent for Quick Data Analysis & Visualization

  • Web Control Center for llama.cpp


    I built a web control centre for llama.cpp with automatic parameter recommendationsA new web control center has been developed for managing llama.cpp instances more efficiently, addressing common issues such as optimal parameter calculation, port management, and log access. It features automatic hardware detection to recommend optimal settings like n_ctx, n_gpu_layers, and n_threads, and allows for multi-server management with a user-friendly interface. The system includes a built-in chat interface, performance benchmarking, and real-time log streaming, all built on a FastAPI backend and Vanilla JS frontend. The project seeks feedback on parameter recommendations, testing on various hardware setups, and ideas for enterprise features, with potential for future monetization through GitHub Sponsors and Pro features. This matters because it streamlines the management of llama.cpp instances, enhancing efficiency and performance for users.

    Read Full Article: Web Control Center for llama.cpp

  • Guide: Running Llama.cpp on Android


    Llama.cpp running on Android with Snapdragon 888 and 8GB of ram. Compiled/Built on device. [Guide/Tutorial]Running Llama.cpp on an Android device with a Snapdragon 888 and 8GB of RAM involves a series of steps beginning with downloading Termux from F-droid. After setting up Termux, the process includes cloning the Llama.cpp repository, installing necessary packages like cmake, and building the project. Users need to select a quantized model from HuggingFace, preferably a 4-bit version, and configure the server command in Termux to launch the model. Once the server is running, it can be accessed via a web browser by navigating to 'localhost:8080'. This guide is significant as it enables users to leverage advanced AI models on mobile devices, enhancing accessibility and flexibility for developers and enthusiasts.

    Read Full Article: Guide: Running Llama.cpp on Android

  • Rewind-cli: Ensuring Determinism in Local LLM Runs


    CLI tool to enforce determinism in local LLM runsRewind-cli is a new tool designed to ensure determinism in local LLM automation scripts by acting as a black-box recorder for terminal executions. It captures the output, error messages, and exit codes into a local folder and performs a strict byte-for-byte comparison on subsequent runs to detect any variations. Written in Rust, it operates entirely locally without relying on cloud services, which enhances privacy and control. The tool also supports a YAML mode for running test suites, making it particularly useful for developers working with llama.cpp and similar projects. This matters because it helps maintain consistency and reliability in automated processes, crucial for development and testing environments.

    Read Full Article: Rewind-cli: Ensuring Determinism in Local LLM Runs

  • LLMeQueue: Efficient LLM Request Management


    LLMeQueue: let me queue LLM requests from my GPU - local or over the internetLLMeQueue is a proof-of-concept project designed to efficiently handle large volumes of requests for generating embeddings and chat completions using a locally available NVIDIA GPU. The setup involves a lightweight public server that receives requests, which are then processed by a local worker connected to the server. This worker, capable of concurrent processing, uses the GPU to execute tasks in the OpenAI API format, with llama3.2:3b as the default model, although other models can be specified if available in the worker’s Ollama environment. LLMeQueue aims to streamline the process of managing and processing AI requests by leveraging local resources effectively. This matters because it offers a scalable solution for developers needing to handle high volumes of AI tasks without relying solely on external cloud services.

    Read Full Article: LLMeQueue: Efficient LLM Request Management

  • Building LLMs: Evaluation & Deployment


    Part 4 (Finale): Building LLMs from Scratch – Evaluation & Deployment [Follow-up to Parts 1, thru 3]The final installment in the series on building language models from scratch focuses on the crucial phase of evaluation, testing, and deployment. It emphasizes the importance of validating trained models through a practical evaluation framework that includes both quick and comprehensive checks beyond just perplexity. Key tests include historical accuracy, linguistic checks, temporal consistency, and performance sanity checks. Deployment strategies involve using CI-like smoke checks on CPUs to ensure models are reliable and reproducible. This phase is essential because training a model is only half the battle; without thorough evaluation and a repeatable publishing workflow, models risk being unreliable and unusable.

    Read Full Article: Building LLMs: Evaluation & Deployment

  • Rendrflow Update: Enhanced AI Performance & Stability


    [Project Update] I improved the On-Device AI performance of Rendrflow based on your feedback (Fixed memory leaks & 10x faster startup)The recent update to Rendrflow, an on-device AI image upscaling tool for Android, addresses critical user feedback by enhancing memory management and significantly improving startup times. Memory usage for "High" and "Ultra" upscaling models has been optimized to prevent crashes on devices with lower RAM, while the initialization process has been refactored for a tenfold increase in speed. Stability issues, such as the "Gallery Sharing" bug and navigation loops, have been resolved, and the tool now supports 10 languages for broader accessibility. These improvements demonstrate the feasibility of performing high-quality AI upscaling privately and offline on mobile devices, eliminating the need for cloud-based solutions.

    Read Full Article: Rendrflow Update: Enhanced AI Performance & Stability

  • Cook High Quality Custom GGUF Dynamic Quants Online


    🍳 Cook High Quality Custom GGUF Dynamic Quants — right from your web browserA new web front-end has been developed to simplify the process of creating high-quality dynamic GGUF quants, eliminating the need for command-line interaction. This browser-based tool allows users to upload or select calibration/deg CSVs, adjust advanced settings through an intuitive user interface, and quickly export a custom .recipe tailored to their hardware. The process involves three easy steps: generating a GGUF recipe, downloading the GGUF files, and running them on any GGUF-compatible runtime. This approach makes GGUF quantization more accessible by removing the complexities associated with terminal use and dependency management. This matters because it democratizes access to advanced quantization tools, making them usable for a wider audience without technical barriers.

    Read Full Article: Cook High Quality Custom GGUF Dynamic Quants Online

  • Building a Self-Testing Agentic AI System


    A Coding Implementation to Build a Self-Testing Agentic AI System Using Strands to Red-Team Tool-Using Agents and Enforce Safety at RuntimeAn advanced red-team evaluation harness is developed using Strands Agents to test the resilience of tool-using AI systems against prompt-injection and tool-misuse attacks. The system orchestrates multiple agents to generate adversarial prompts, execute them against a guarded target agent, and evaluate responses using structured criteria. This approach ensures a comprehensive and repeatable safety evaluation by capturing tool usage, detecting secret leaks, and scoring refusal quality. By integrating these evaluations into a structured report, the framework highlights systemic weaknesses and guides design improvements, demonstrating the potential of agentic AI systems to maintain safety and robustness under adversarial conditions. This matters because it provides a systematic method for ensuring AI systems remain secure and reliable as they evolve.

    Read Full Article: Building a Self-Testing Agentic AI System