
NVIDIA’s Nemotron Speech ASR: Low-Latency Transcription

NVIDIA has introduced Nemotron Speech ASR, an open-source streaming transcription model designed for low-latency applications such as voice agents and live captioning. It pairs a cache-aware FastConformer encoder with an RNN-T decoder and processes 16 kHz mono audio in configurable chunks of 80 ms to 1.12 s, so developers can trade latency against accuracy without retraining the model. Because the encoder caches its state between chunks instead of recomputing overlapping audio windows, audio that has already been encoded is not encoded again, which improves concurrency and efficiency on modern NVIDIA GPUs. With a reported word error rate (WER) between 7.16% and 7.84% across benchmarks, Nemotron Speech ASR is a scalable option for real-time speech applications such as voice assistants and live transcription services.
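The article includes no code, so the following is only a back-of-the-envelope sketch in plain Python. It is not NVIDIA's or NeMo's API. It converts the quoted chunk sizes into sample counts at 16 kHz and compares how much audio an encoder would process for a 10-second stream in two cases: a cache-aware encoder that sees each sample once, and a hypothetical buffered baseline that re-encodes 1 s of left context with every chunk. The helper names (`chunk_samples`, `split_into_chunks`, `samples_encoded`) and the "samples encoded" cost model are illustrative assumptions. They are not measurements of the real model.

```python
"""Illustrative chunking arithmetic for a 16 kHz mono streaming ASR front end.

Assumptions (not from the article): "work" is counted as the number of audio
samples fed to the encoder, and the buffered baseline is hypothetical.
"""

SAMPLE_RATE_HZ = 16_000  # the model expects 16 kHz mono audio


def chunk_samples(chunk_ms: int, sample_rate: int = SAMPLE_RATE_HZ) -> int:
    """Number of audio samples in one chunk of `chunk_ms` milliseconds."""
    return sample_rate * chunk_ms // 1000


def split_into_chunks(audio: list[float], chunk_ms: int) -> list[list[float]]:
    """Split a mono sample buffer into consecutive, non-overlapping chunks.

    The final chunk may be shorter than the others.
    """
    size = chunk_samples(chunk_ms)
    return [audio[i:i + size] for i in range(0, len(audio), size)]


def samples_encoded(total_samples: int, chunk_ms: int, left_context_ms: int = 0) -> int:
    """Total samples pushed through the encoder for a whole stream.

    left_context_ms == 0 models a cache-aware encoder (each sample encoded once).
    left_context_ms > 0 models a hypothetical buffered baseline that re-encodes
    that much past audio alongside every new chunk.
    """
    size = chunk_samples(chunk_ms)
    ctx = chunk_samples(left_context_ms)
    total = 0
    for start in range(0, total_samples, size):
        new = min(size, total_samples - start)  # fresh audio in this chunk
        reused = min(ctx, start)  # cannot re-encode audio before the stream began
        total += new + reused
    return total


if __name__ == "__main__":
    # Chunk sizes within the 80 ms to 1.12 s range quoted in the article.
    for ms in (80, 160, 560, 1120):
        print(f"{ms:>5} ms chunk -> {chunk_samples(ms):>6} samples")

    ten_seconds = 10 * SAMPLE_RATE_HZ
    cached = samples_encoded(ten_seconds, chunk_ms=160)
    buffered = samples_encoded(ten_seconds, chunk_ms=160, left_context_ms=1000)
    print(f"cache-aware: {cached} samples, buffered: {buffered} samples "
          f"({buffered / cached:.1f}x)")
```

Under these assumptions, the cache-aware path encodes exactly the 160,000 samples in the stream. The buffered baseline encodes several times more because it repeats the overlap, and that gap widens as chunks get smaller. This is the intuition behind the article's efficiency and concurrency claims, though real GPU cost depends on the model's actual architecture.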

Posted on Jan 6, 2026 by TweakedGeek in News, Tools

Topics: open source, Nvidia, AI