Sopro: Real-Time TTS with Zero-Shot Voice Cloning

Sopro is a compact text-to-speech model with 169 million parameters, designed for real-time applications and capable of zero-shot voice cloning. It supports streaming and can generate 30 seconds of audio in just 7.5 seconds on a CPU, and it needs only 3-12 seconds of reference audio to clone a voice. It is not state-of-the-art and occasionally struggles with voice likeness, but it is a notable achievement given that it was developed with limited resources on a single L40S GPU. The model is available under the Apache 2.0 license, although it currently supports only English due to data constraints.
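The speed figure is easiest to read as a real-time factor (RTF): processing time divided by the duration of the audio produced. RTF is a standard TTS benchmark metric, not a number the article reports directly, and the helper `real_time_factor` below is a hypothetical illustration that does not use Sopro's own API. It simply works through the arithmetic for the quoted CPU figures.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (hypothetical helper).

    Values below 1.0 mean synthesis runs faster than playback.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# Figures quoted for Sopro on CPU: 30 s of audio generated in 7.5 s.
rtf = real_time_factor(synthesis_seconds=7.5, audio_seconds=30.0)
speedup = 1.0 / rtf

print(f"RTF = {rtf:.2f}")            # RTF = 0.25
print(f"Speed-up = {speedup:.1f}x")  # Speed-up = 4.0x
```

An RTF of 0.25 means roughly four times faster than real time. Assuming that rate holds chunk by chunk, a value below 1.0 is what makes streaming practical: each piece of audio is ready before the previous one finishes playing.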
Posted on Jan 7, 2026 by TweakedGeekAI in Commentary, Deep Dives

Topics: Apache 2.0 license, voice cloning