Speakr v0.8.0: New Diarization & REST API

Speakr v0.8.0 adds new diarization options and a REST API to the self-hosted transcription app. Users can now get speaker-diarized transcripts without a GPU by setting TRANSCRIPTION_MODEL to gpt-4o-transcribe-diarize and supplying their OpenAI key. REST API v1 supports automation with tools like n8n and Zapier, ships with interactive Swagger documentation, and authenticates with personal access tokens. The update also makes the UI more responsive on lengthy transcripts, improves audio playback, keeps compatibility with local LLMs for text generation, and simplifies configuration through a connector architecture that auto-detects providers from user settings. This matters because it lowers hardware requirements and simplifies setup, which makes diarized transcription and automation accessible to more users.

Speakr v0.8.0 makes the app more flexible for transcription and speaker diarization. The most notable feature is speaker diarization without a GPU, using the new transcription model ‘gpt-4o-transcribe-diarize’. This helps users who want to identify the different speakers in an audio file but lack the hardware to run WhisperX containers. Diarization is handled through OpenAI instead, which is useful for meeting transcriptions, interviews, and other multi-speaker audio.
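As a rough sketch of the setting described above, the snippet below renders the relevant lines of an environment file. TRANSCRIPTION_MODEL and its value come from this post; the variable name `OPENAI_API_KEY`, the placeholder key, and the `render_env` helper are assumptions made for illustration, so check Speakr's own configuration documentation for the real variable names.

```python
def render_env(settings: dict[str, str]) -> str:
    """Render KEY=value lines in the style of a .env file."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


diarization_settings = {
    # Named in the post: selects the diarizing model that needs no GPU.
    "TRANSCRIPTION_MODEL": "gpt-4o-transcribe-diarize",
    # ASSUMPTION: the post only says "their OpenAI key"; this variable
    # name and the placeholder value are hypothetical.
    "OPENAI_API_KEY": "sk-your-key-here",
}

env_text = render_env(diarization_settings)
print(env_text)
```

The rendered text would be pasted into whatever environment file the deployment uses.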

The other major addition is REST API v1, which provides full automation capabilities. The API can be called from automation tools like n8n, Zapier, and Make, or from custom scripts, so users can automate repetitive tasks and integrate Speakr into larger systems. Interactive Swagger documentation lets users browse and try the API’s endpoints, and requests are authenticated with personal access tokens.
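A custom script calling the API might start from something like the sketch below, which only builds an authenticated request and does not send it. The base URL, the `/api/v1/recordings` path, and the `Authorization: Bearer` scheme are all assumptions, since the post does not list endpoints or say how the token is sent; the instance's Swagger documentation is the place to confirm both.

```python
import json
import urllib.request

# ASSUMPTION: hypothetical address of a self-hosted Speakr instance.
BASE_URL = "http://localhost:8000"


def build_request(path, token, payload=None):
    """Build (but do not send) a JSON request carrying a personal access token."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    method = "POST" if data is not None else "GET"
    req = urllib.request.Request(f"{BASE_URL}{path}", data=data, method=method)
    # ASSUMPTION: bearer-style header; the real scheme may differ.
    req.add_header("Authorization", f"Bearer {token}")
    req.add_header("Accept", "application/json")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


# ASSUMPTION: the endpoint path is hypothetical.
req = build_request("/api/v1/recordings", "my-personal-access-token")
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; tools like n8n or Zapier would carry the same token in their HTTP request settings.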

The updated connector architecture simplifies configuration by auto-detecting the provider from the user’s settings, so users can get started without extensive manual configuration. For those who prefer self-hosting, WhisperX still offers the best quality, with voice profiles, so users who prioritize quality over convenience can keep that setup. Token budgets per user are also new, which helps anyone sharing their Speakr instance manage resource usage.
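The post does not describe how Speakr enforces per-user token budgets, so the following is only a generic sketch of the idea: each user has a limit, and a request is refused once it would push usage past that limit. The `TokenBudget` class, its fields, and the example user names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Hypothetical per-user budget: a token limit and a running usage total."""
    limit: int
    used: int = 0

    @property
    def remaining(self) -> int:
        return max(self.limit - self.used, 0)

    def try_spend(self, tokens: int) -> bool:
        """Record the spend and return True, or return False if it exceeds the budget."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if tokens > self.remaining:
            return False
        self.used += tokens
        return True


# One budget per user of a shared instance (example names and limits).
budgets = {"alice": TokenBudget(limit=1000), "bob": TokenBudget(limit=200)}

first = budgets["bob"].try_spend(150)   # fits within bob's budget
second = budgets["bob"].try_spend(100)  # would exceed it, so it is refused
print(first, second, budgets["bob"].remaining)
```

Keeping the check and the update in one method means a refused request leaves the usage total unchanged.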

The user interface handles very long transcripts better, and the audio player has been improved. For users who rely on local language models, Speakr continues to support text generation through platforms like Ollama and LM Studio, so existing workflows keep working. Together, these updates make transcription and diarization accessible to more users, regardless of technical expertise or hardware limitations.

Read the original article here

Posted 2026-01-08 in How-Tos, Tools by TweakTheGeek

Tags: audio processing, automation, collaboration, local LLMs, OpenAI, Productivity, REST API, speaker diarization, transcription, WhisperX

Comments

2 responses to “Speakr v0.8.0: New Diarization & REST API”

SignalNotNoise

2026-01-08

The introduction of speaker diarization without the need for a GPU is a significant advancement for users with limited hardware resources, making transcription more accessible. The REST API’s compatibility with automation tools like n8n and Zapier opens up exciting possibilities for integrating Speakr into various workflows. How does the new connector architecture handle conflicts when auto-detecting multiple providers in user settings?
1. TweakTheGeek
  
  2026-01-08
  
  The new connector architecture handles conflicts by prioritizing the most recently configured provider in the user settings. This ensures that the user’s preferred provider is always used for transcription tasks. For more detailed information, please refer to the original article linked in the post.
