OpenAI-compatible

Quill: Open Source Writing Assistant with Prompt Control

Quill is a streamlined open-source background writing assistant designed for users who want more control over prompt engineering. Inspired by Writing Tools, Quill removes certain features like screen capture and a separate chat window to focus on selected text processing, making it compatible with local language models. It allows users to configure parameters and inference settings, and supports any OpenAI-compatible API, such as Ollama and llama.cpp. The user interface is kept simple and readable, though some features from Writing Tools are omitted, which might be missed by some users. Currently, Quill is available only for Windows, and feedback is encouraged to improve its functionality. This matters as it provides writers with a customizable tool that enhances their writing process by integrating local language models and offering greater control over how prompts are managed.

Read Full Article

Posted on

Jan 8, 2026

by

TheTweakedGeek

in

Commentary, Tools

Topics: open source, local LLMs, customization

llama-benchy: Benchmarking for Any LLM Backend

llama-benchy is a command-line benchmarking tool designed to evaluate the performance of language models across various backends, supporting any OpenAI-compatible endpoint. Unlike traditional benchmarking tools, it measures prompt processing and token generation speeds at different context lengths, allowing for a more nuanced understanding of model performance. It offers features like configurable prompt length, generation length, and context depth, and uses HuggingFace tokenizers for accurate token counts. This tool addresses limitations in existing benchmarking solutions by providing detailed metrics such as time to first response and end-to-end time to first token, making it highly useful for developers working with multiple inference engines. Why this matters: It enables developers to comprehensively assess and compare the performance of language models across different platforms, leading to more informed decisions in model deployment and optimization.

Read Full Article

Posted on

Jan 6, 2026

by

TweakedGeek

in

Benchmarking, Deep Dives

Topics: language models, benchmarking, model optimization

Local Image Edit API Server for OpenAI-Compatible Models

A new API server allows users to create and edit images entirely locally, supporting OpenAI-compatible formats for seamless integration with local interfaces like OpenWebUI. The server, now in version 3.0.0, enhances functionality by supporting multiple images in a single request, enabling advanced features like image blending and style transfer. Additionally, it offers video generation capabilities using optimized models that require less RAM, such as diffusers/FLUX.2-dev-bnb-4bit, and includes features like a statistics endpoint and intelligent batching. This development is significant for users seeking privacy and efficiency in image processing tasks without relying on external servers.