modelscope/FunASR
Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
A generative speech model for daily dialogue.
Open-Source Frontier Voice AI
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
One minute of voice data is enough to train a good TTS model (few-shot voice cloning).