streaming ASR

  • NVIDIA’s Nemotron Speech ASR: Low-Latency Transcription


    NVIDIA AI Released Nemotron Speech ASR: A New Open Source Transcription Model Designed from the Ground Up for Low-Latency Use Cases like Voice AgentsNVIDIA has introduced Nemotron Speech ASR, an open-source streaming transcription model designed for low-latency applications like voice agents and live captioning. Utilizing a cache-aware FastConformer encoder and RNNT decoder, the model processes 16 kHz mono audio with configurable chunk sizes ranging from 80 ms to 1.12 s, allowing developers to balance latency and accuracy without retraining. This innovative approach avoids overlapping window recomputation, enhancing concurrency and efficiency on modern NVIDIA GPUs. With a word error rate (WER) between 7.16% and 7.84% across various benchmarks, Nemotron Speech ASR offers a scalable solution for real-time speech applications. This matters because it enables more efficient and accurate real-time speech processing, crucial for applications like voice assistants and live transcription services.

    Read Full Article: NVIDIA’s Nemotron Speech ASR: Low-Latency Transcription