Sopro: Real-Time TTS with Zero-Shot Voice Cloning
Sopro is a compact text-to-speech model with 169 million parameters, designed for real-time applications and capable of zero-shot voice cloning. It supports streaming and can generate 30 seconds of audio in just 7.5 seconds on a CPU, requiring only 3-12 seconds of reference audio for effective voice cloning. While it is not state-of-the-art and occasionally struggles with voice likeness, Sopro is a notable achievement given its development on a single L40S GPU and limited resources. The model is available under the Apache 2.0 license, although it currently supports only English due to data constraints.
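To put the speed claim in more familiar terms, the quoted figures can be converted into a real-time factor (RTF), the ratio of synthesis time to audio duration. The sketch below uses only the two numbers stated above (7.5 seconds of compute for 30 seconds of audio on a CPU). The function and variable names are illustrative and are not part of Sopro's code. Note that this measures overall throughput, not the latency to the first streamed chunk, which the summary does not report.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return synthesis time divided by audio duration.

    Values below 1.0 mean audio is generated faster than it plays back.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# Figures quoted in the article: 30 s of audio generated in 7.5 s on a CPU.
rtf = real_time_factor(synthesis_seconds=7.5, audio_seconds=30.0)

# The inverse of the RTF is the speedup relative to playback speed.
speedup = 1.0 / rtf

print(f"RTF = {rtf:.2f}, about {speedup:.0f}x faster than real time")
```

An RTF below 1.0 is the minimum needed for streaming playback without stalls, assuming chunks are produced at a roughly steady rate.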
