A comprehensive benchmarking of 26 speech-to-text (STT) models was conducted on long-form medical dialogue using the PriMock57 dataset, consisting of 55 files and over 81,000 words. The models were ranked based on their average Word Error Rate (WER), with Google Gemini 2.5 Pro leading at 10.79% and Parakeet TDT 0.6B v3 emerging as the top local model at 11.9% WER. The evaluation also considered processing time per file and noted issues such as repetition-loop failures in some models, which required chunking to mitigate. The full evaluation, including code and a complete leaderboard, is available on GitHub, providing valuable insights for developers working on medical transcription technology. This matters because accurate and efficient STT models are crucial for improving clinical documentation and reducing the administrative burden on healthcare professionals.
Read Full Article: Benchmarking Speech-to-Text Models for Medical Dialogue