FunAudioLLM/CosyVoice
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.
One minute of voice data is enough to train a good TTS model (few-shot voice cloning).
Open-Source Frontier Voice AI
A PyTorch-based Speech Toolkit
A generative speech model for daily dialogue.
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"