OpenAI API

  • LLM-Shield: Privacy Proxy for Cloud LLMs


    LLM-Shield: Privacy proxy - masks PII or routes to local LLM

    LLM-Shield is a privacy proxy for anyone who wants to use cloud-based language models without exposing client data. It offers two modes: Mask Mode, which anonymizes personally identifiable information (PII) such as emails and names before sending data to OpenAI, and Route Mode, which keeps PII local by routing those requests to a local language model. The tool automatically detects a wide range of PII types across 24 languages, using Microsoft Presidio. It integrates easily with applications that already use the OpenAI API, is open source, and includes a monitoring dashboard. Planned enhancements include a Chrome extension for ChatGPT and PDF/attachment masking. This matters because it offers a practical way to preserve data privacy while still leveraging powerful cloud-based AI tools.
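
    A minimal sketch of how an existing OpenAI-API application could be pointed at such a proxy, assuming it exposes an OpenAI-compatible endpoint on localhost; the address, port, and model name below are placeholders, not values documented by the project:

        # Hypothetical sketch: send OpenAI traffic through the LLM-Shield proxy
        # instead of api.openai.com. base_url, port, and model are assumptions.
        from openai import OpenAI

        client = OpenAI(
            base_url="http://localhost:8000/v1",  # assumed proxy address
            api_key="sk-...",                     # forwarded (or ignored) by the proxy
        )

        # PII in the prompt would be masked or routed to a local model by the proxy,
        # depending on the configured mode, before anything reaches the cloud.
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "Draft a reply to Jane Doe <jane@example.com> about her invoice."}],
        )
        print(resp.choices[0].message.content)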

    Read Full Article: LLM-Shield: Privacy Proxy for Cloud LLMs

  • 30x Real-Time Transcription on CPU with Parakeet


    Achieving 30x Real-Time Transcription on CPU. Multilingual STT, OpenAI API endpoint compatible. Plug and play in Open-WebUI - Parakeet

    A new setup using NVIDIA Parakeet TDT 0.6B V3 in ONNX format achieves remarkable real-time transcription speeds on CPU, processing one minute of audio in just two seconds on an i7-12700KF and outperforming previous benchmarks. The multilingual model supports 25 languages, including English, Spanish, and French, with strong accuracy and punctuation, surpassing Whisper Large V3 in some cases. A purpose-built frontend and API endpoint make it easy to integrate into projects that expect the OpenAI API. This advancement highlights significant progress in CPU-based transcription, offering faster and more efficient solutions for multilingual speech-to-text applications.
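
    Because the endpoint is OpenAI-API compatible, a transcription request can presumably be sent with the standard client; the base URL and model identifier below are assumptions for illustration only:

        # Hypothetical sketch: transcribe audio through the local OpenAI-compatible
        # endpoint. The URL and model name are assumed, not taken from the project.
        from openai import OpenAI

        client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

        with open("meeting.wav", "rb") as audio_file:
            transcript = client.audio.transcriptions.create(
                model="parakeet-tdt-0.6b-v3",  # assumed model identifier
                file=audio_file,
            )
        print(transcript.text)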

    Read Full Article: 30x Real-Time Transcription on CPU with Parakeet

  • Web Control Center for llama.cpp


    I built a web control centre for llama.cpp with automatic parameter recommendations

    A new web control center has been developed for managing llama.cpp instances more efficiently, addressing common issues such as optimal parameter calculation, port management, and log access. It features automatic hardware detection to recommend optimal settings like n_ctx, n_gpu_layers, and n_threads, and allows for multi-server management with a user-friendly interface. The system includes a built-in chat interface, performance benchmarking, and real-time log streaming, all built on a FastAPI backend and Vanilla JS frontend. The project seeks feedback on parameter recommendations, testing on various hardware setups, and ideas for enterprise features, with potential for future monetization through GitHub Sponsors and Pro features. This matters because it streamlines the management of llama.cpp instances, enhancing efficiency and performance for users.
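
    As a rough illustration of what automatic parameter recommendation can look like (a hypothetical heuristic, not the project's actual logic), a recommender might derive n_threads from available cores and n_gpu_layers from how much of the model fits in VRAM:

        # Hypothetical heuristic for llama.cpp launch parameters; the thresholds and
        # per-layer size estimate are illustrative assumptions, not the project's code.
        import os

        def recommend_params(model_size_gb: float, n_layers: int, vram_gb: float, desired_ctx: int = 8192):
            per_layer_gb = model_size_gb / n_layers        # crude per-layer memory estimate
            usable_vram = max(vram_gb - 1.0, 0.0)          # reserve ~1 GB for KV cache / overhead
            gpu_layers = min(n_layers, int(usable_vram / per_layer_gb)) if per_layer_gb > 0 else 0
            return {
                "n_ctx": desired_ctx,
                "n_gpu_layers": gpu_layers,
                "n_threads": max(1, (os.cpu_count() or 2) // 2),  # leave headroom for other processes
            }

        print(recommend_params(model_size_gb=4.7, n_layers=32, vram_gb=8))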

    Read Full Article: Web Control Center for llama.cpp

  • LLMeQueue: Efficient LLM Request Management


    LLMeQueue: let me queue LLM requests from my GPU - local or over the internet

    LLMeQueue is a proof-of-concept project designed to efficiently handle large volumes of requests for generating embeddings and chat completions using a locally available NVIDIA GPU. The setup involves a lightweight public server that receives requests, which are then processed by a local worker connected to the server. This worker, capable of concurrent processing, uses the GPU to execute tasks in the OpenAI API format, with llama3.2:3b as the default model, although other models can be specified if available in the worker’s Ollama environment. LLMeQueue aims to streamline the process of managing and processing AI requests by leveraging local resources effectively. This matters because it offers a scalable solution for developers needing to handle high volumes of AI tasks without relying solely on external cloud services.
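
    A minimal sketch of the worker side, assuming the public server exposes simple fetch/submit endpoints (the queue URLs and job schema here are placeholders) and that Ollama's standard OpenAI-compatible API is running locally:

        # Hypothetical worker loop: pull a job from the public queue server and run it
        # against the local Ollama OpenAI-compatible endpoint. Queue URLs and the job
        # schema are assumptions; the Ollama endpoint shown is its standard one.
        import time
        import requests

        QUEUE_URL = "https://queue.example.com"    # placeholder public server
        OLLAMA_URL = "http://localhost:11434/v1"   # Ollama's OpenAI-compatible API

        while True:
            job = requests.get(f"{QUEUE_URL}/next-job", timeout=30).json()
            if not job:
                time.sleep(1)
                continue
            result = requests.post(
                f"{OLLAMA_URL}/chat/completions",
                json={"model": job.get("model", "llama3.2:3b"), "messages": job["messages"]},
                timeout=300,
            ).json()
            requests.post(f"{QUEUE_URL}/results/{job['id']}", json=result, timeout=30)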

    Read Full Article: LLMeQueue: Efficient LLM Request Management

  • Infer: A CLI Tool for Piping into LLMs


    Made a simple CLI tool to pipe anything into an LLM that follows the Unix philosophy

    Infer is a newly developed command-line tool that lets users pipe command output directly into a large language model (LLM) for analysis, much as grep is used for text searching. By integrating with OpenAI-compatible APIs, users can ask questions about their command output, such as identifying processes consuming RAM or checking for hardware errors, without manually copying and pasting logs. The tool is lightweight, consisting of fewer than 200 lines of C code, and outputs plain text, making it a practical aid for debugging and command recall. This simplifies interaction with LLMs, improving productivity and efficiency in command-line work.
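
    The actual tool is a small C program; the Python sketch below only illustrates the same pattern of reading stdin, attaching a question, and querying an OpenAI-compatible chat endpoint (the endpoint and model name are assumptions, and this is not Infer's source):

        # Hypothetical re-creation of the pattern, not the Infer source: read piped
        # command output from stdin, combine it with a question from argv, and query
        # an OpenAI-compatible endpoint. Usage idea: dmesg | python ask.py "any hardware errors?"
        import sys
        from openai import OpenAI

        client = OpenAI(base_url="http://localhost:8080/v1", api_key="local")  # assumed endpoint

        piped_output = sys.stdin.read()
        question = " ".join(sys.argv[1:]) or "Summarize this output."

        resp = client.chat.completions.create(
            model="local-model",  # placeholder model name
            messages=[{"role": "user", "content": f"{question}\n\n{piped_output}"}],
        )
        print(resp.choices[0].message.content)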

    Read Full Article: Infer: A CLI Tool for Piping into LLMs

  • Physician’s 48-Hour NLP Journey in Healthcare AI


    [P] Physician → NLP in 48 hours: Building a clinical signal extraction pipeline during my December break

    A psychiatrist with an engineering background set out to learn natural language processing (NLP) and build a clinical signal extraction tool for C-SSRS/PHQ-9 assessments within 48 hours. Despite initial struggles with machine learning concepts and tooling, the physician produced a working prototype combining rule-based methods with OpenAI API integration. The project highlighted the challenges of applying AI in healthcare, particularly the subjective and context-dependent nature of clinical instruments like the PHQ-9 and C-SSRS, and underscores the need for a bridge between clinical expertise and technical development. Understanding and addressing these challenges is crucial for advancing AI's role in healthcare.
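
    As a purely hypothetical illustration of the hybrid approach described (rule-based flags plus an OpenAI API call), not the physician's actual pipeline and in no sense a clinical instrument:

        # Hypothetical sketch of a hybrid rule-based + LLM extraction step; the keyword
        # patterns, prompt, and model are illustrative only and not clinically validated.
        import re
        from openai import OpenAI

        RULE_FLAGS = {
            "sleep_disturbance": re.compile(r"can't sleep|insomnia|sleeping poorly", re.I),
            "low_mood": re.compile(r"hopeless|feeling down|depressed", re.I),
        }

        def extract_signals(note: str) -> dict:
            flags = {name: bool(pat.search(note)) for name, pat in RULE_FLAGS.items()}
            client = OpenAI()  # uses OPENAI_API_KEY from the environment
            resp = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user",
                           "content": f"Quote any PHQ-9-relevant statements in this note verbatim:\n{note}"}],
            )
            return {"rule_flags": flags, "llm_extract": resp.choices[0].message.content}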

    Read Full Article: Physician’s 48-Hour NLP Journey in Healthcare AI