| Main | | | | | | | | | | | | | |
| Name | Repo | Type | Link | Stars | Notes | Implementation | Authors | | | | | |
| Whisper | https://github.com/openai/whisper | ASR | | 56200 | | N/A | OpenAI | | | | | |
| SV2TTS (Real-Time Voice Cloning) | https://github.com/CorentinJ/Real-Time-Voice-Cloning | Voice cloning tool by a Resemble AI employee; outdated. | | 50000 | | N/A | Corentin Jemine | | | | | |
| FFmpeg | https://github.com/FFmpeg/Ffmpeg | Common utility for audio. | | 41000 | | N/A | Fabrice Bellard, Bobby Bingham | | | | | |
| Mockingbird (Chinese) | https://github.com/babysor/MockingBird | Chinese fork of Corentin's repo | | 33200 | | N/A | Chinese Anon | | | | | |
| Bark | https://github.com/suno-ai/bark | TTS / text-to-audio (TTA) library. | | 30600 | | doesn't work well | Suno | | | | | |
| TTS (Coqui) | https://github.com/coqui-ai/TTS | TTS | | 27000 | | up next? | Coqui (RIP) | | | | | |
| MPV | https://github.com/mpv-player/mpv | Terminal audio output. | | 25200 | | N/A | Large open-source project | | | | | |
| DeepSpeech | https://github.com/mozilla/DeepSpeech | | | 23900 | | N/A | Mozilla | | | | | |
| So-Vits-SVC | https://github.com/svc-develop-team/so-vits-svc | | | 22800 | | | Chinese open-source anons | | | | | |
| Audiocraft | https://github.com/facebookresearch/audiocraft | | | 18800 | | | Meta | | | | | |
| RVC | https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI | | | 16500 | | | | | | | | |
| GPT-SoVITS | https://github.com/RVC-Boss/GPT-SoVITS | | | 14400 | | | | | | | | |
| OpenVoice | https://github.com/myshell-ai/OpenVoice | | | 14300 | | | | | | | | |
| Vocal Remover | https://github.com/Anjok07/ultimatevocalremovergui | | | 13700 | | N/A | | | | | | |
| Kaldi | https://github.com/kaldi-asr/kaldi | | | 13500 | | | | | | | | |
| PaddleHub | https://github.com/PaddlePaddle/PaddleHub | | | 12400 | | | | | | | | |
| Tortoise | https://github.com/neonbjb/tortoise-tts | | www.nonint.com | 11100 | Slow, need to speed up diffuser. Used in ElevenLabs, Play.ht. | Limited, retrain | James Betker | | | | | |
| PaddleSpeech | https://github.com/PaddlePaddle/PaddleSpeech | | | 9700 | | | | | | | | |
| Seamless | https://github.com/facebookresearch/seamless_communication | | | 9700 | | doesn't work well | Meta | | | | | |
| AudioGPT | https://github.com/AIGC-Audio/AudioGPT | | | 9700 | | | | | | | | |
| Nemo | https://github.com/NVIDIA/NeMo | Not specifically TTS | https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/tts/intro.html | 9300 | | | Nvidia | | | | | |
| Mozilla TTS | https://github.com/mozilla/TTS | | | 8600 | | N/A | Mozilla | | | | | |
| PyDub | https://github.com/jiaaro/pydub | | | 8200 | | | | | | | | |
| so-vits-svc-fork | https://github.com/voicepaw/so-vits-svc-fork | | | 8000 | | | | | | | | |
| whisperX | https://github.com/m-bain/whisperX | ASR | | 7900 | | N/A | | | | | | |
| Uberi SR | https://github.com/Uberi/speech_recognition | ASR | | 7900 | | N/A | | | | | | |
| Espnet | https://github.com/espnet/espnet | | | 7600 | | | | | | | | |
| Jukebox | https://github.com/openai/jukebox | | | 7400 | | | OpenAI | | | | | |
| ASRT | https://github.com/nl8590687/ASRT_SpeechRecognition | | | 7300 | | N/A | | | | | | |
| SpeechBrain | https://github.com/speechbrain/speechbrain | | | 7300 | | | | | | | | |
| VALL-E X | https://github.com/Plachtaa/VALL-E-X | | | 6900 | | | | | | | | |
| PaddlePaddle | https://github.com/PaddlePaddle/models | | | 6900 | | | | | | | | |
| Vosk API | https://github.com/alphacep/vosk-api | | | 6700 | | | | | | | | |
| Annyang | https://github.com/TalAter/annyang | | | 6500 | | | | | | | | |
| Librosa | https://github.com/librosa/librosa | | | 6500 | | N/A | | | | | | |
| C++ Whisper | https://github.com/Const-me/Whisper | ASR | | 6500 | | N/A | | | | | | |
| wav2letter | https://github.com/flashlight/wav2letter | | | 6300 | | | | | | | | |
| VITS | https://github.com/jaywalnut310/vits | TTS | | 6000 | | Production | | | | | | |
| Bert-VITS2 | https://github.com/fishaudio/Bert-VITS2 | | | 5900 | | | | | | | | |
| EmotiVoice (Netease) | https://github.com/netease-youdao/EmotiVoice | | | 5900 | | | | | | | | |
| Py Audio Analysis | https://github.com/tyiannak/pyAudioAnalysis | | | 5600 | | | | | | | | |
| CloneVoice (Coqui fork) | https://github.com/jianchang512/clone-voice | | | 5500 | | | | | | | | |
| Wukong-Robot | https://github.com/wzpan/wukong-robot | | | 5500 | | | | | | | | |
| Pedalboard | https://github.com/spotify/pedalboard | | | 4700 | | | Spotify | | | | | |
| Pyannote | https://github.com/pyannote/pyannote-audio | | | 4600 | | | | | | | | |
| Silero | https://github.com/snakers4/silero-models | | | 4400 | | | | | | | | |
| VITS-fast-fine-tuning | https://github.com/Plachtaa/VITS-fast-fine-tuning | | | 4300 | | | | | | | | |
| DiffSinger | | | | 4000 | | | | | | | | |
| STT WaveNet | https://github.com/buriburisuri/speech-to-text-wavenet | | | 3900 | | | | | | | | |
| Lyra | https://github.com/google/lyra | | | 3700 | | | Google | | | | | |
| TensorFlowTTS | | | | 3600 | | | | | | | | |
| StyleTTS2 | https://github.com/yl4579/StyleTTS2 | | | 3600 | Fast but not expressive | Production | Aaron | | | | | |
| wenet | https://github.com/wenet-e2e/wenet | | | 3600 | | | | | | | | |
| Amphion | https://github.com/open-mmlab/Amphion | | | 3500 | | | | | | | | |
| Piper | https://github.com/rhasspy/piper | | | 3100 | | | | | | | | |
| WhisperSpeech | | | | 3000 | | | | | | | | |
| Tacotron | https://github.com/keithito/tacotron | Unofficial Implementation | | 2900 | | | | | | | | |
| TF ASR Mandarin/Eng | https://github.com/zzw922cn/Automatic_Speech_Recognition | | | 2800 | | | | | | | | |
| VALL-E | | | | 2800 | | | | | | | | |
| EdgeTTS | | | | 2800 | | | | | | | | |
| eSpeak | https://github.com/espeak-ng/espeak-ng | | | 2700 | | | | | | | | |
| diff-svc | https://github.com/prophesier/diff-svc | | | 2600 | | | | | | | | |
| JuliusJS | https://github.com/zzmp/juliusjs | | | 2600 | | | | | | | | |
| FunASR | https://github.com/alibaba-damo-academy/FunASR | | | 2500 | | | | | | | | |
| aeneas | https://github.com/readbeyond/aeneas | | | 2400 | | | | | | | | |
| Metavoice | https://github.com/metavoiceio/metavoice-src | | | 2300 | | | | | | | | |
| pytorch-kaldi | https://github.com/mravanelli/pytorch-kaldi | | | 2300 | | | | | | | | |
| pytorch/audio | https://github.com/pytorch/audio | | | 2300 | | | Meta | | | | | |
| Python Speech Features | https://github.com/jameslyons/python_speech_features | | | 2300 | | | | | | | | |
| WaveNet | https://github.com/r9y9/wavenet_vocoder | | | 2300 | | | | | | | | |
| so-vits-svc-5.0 | https://github.com/PlayVoice/so-vits-svc-5.0 | | | 2300 | | | | | | | | |
| MaryTTS | https://github.com/marytts/marytts | | | 2200 | | | | | | | | |
| Waveglow | https://github.com/NVIDIA/waveglow | | | 2200 | | | | | | | | |
| TF SR | https://github.com/pannous/tensorflow-speech-recognition | | | 2200 | | | | | | | | |
| AudioLDM | https://github.com/haoheliu/AudioLDM | | | 2100 | | | | | | | | |
| gTTS | https://github.com/pndurette/gTTS | | | 2100 | | | | | | | | |
| DeepSpeech 2 | https://github.com/SeanNaren/deepspeech.pytorch | | | 2100 | | | | | | | | |
| Coqui STT | https://github.com/coqui-ai/STT | | | 2100 | | | | | | | | |
| S3PRL | https://github.com/s3prl/s3prl | | | 2000 | | | | | | | | |
| DeepVoice3 | https://github.com/r9y9/deepvoice3_pytorch | | | 1900 | | | | | | | | |
| Bark VC | https://github.com/KevinWang676/Bark-Voice-Cloning | | | 1900 | | | | | | | | |
| pyTTSx3 | https://github.com/nateshmbhat/pyttsx3 | | | 1800 | | | | | | | | |
| TF Tacotron | https://github.com/Kyubyong/tacotron | | | 1800 | | | | | | | | |
| mASR | https://github.com/nobody132/masr | | | 1800 | | | | | | | | |
| Julius | https://github.com/julius-speech/julius | | | 1800 | | | | | | | | |
| Deep Filter Net | https://github.com/Rikorose/DeepFilterNet | | | 1700 | | | | | | | | |
| VALL-E | https://github.com/lifeiteng/vall-e | | | 1700 | | | | | | | | |
| whisper-diarization | https://github.com/MahmoudAshraf97/whisper-diarization | | | 1700 | | | | | | | | |
| elevenlabs python | https://github.com/elevenlabs/elevenlabs-python | | | 1600 | | | | | | | | |
| soloud | https://github.com/jarikomppa/soloud | | | 1600 | | | | | | | | |
| delta | https://github.com/Delta-ML/delta | | | 1600 | | | | | | | | |
| FastSpeech2 | https://github.com/ming024/FastSpeech2 | | | 1600 | | | | | | | | |
| OpenSeq2Seq | https://github.com/NVIDIA/OpenSeq2Seq | | | 1500 | | | | | | | | |
| Denoiser (FB) | https://github.com/facebookresearch/denoiser | | | 1500 | | | | | | | | |
| DDSP-SVC | https://github.com/yxlllc/DDSP-SVC | | | 1500 | | | | | | | | |
| PocketSphinx.js | https://github.com/syl22-00/pocketsphinx.js | | | 1500 | | | | | | | | |
| Say.JS | https://github.com/Marak/say.js | | | 1500 | | | | | | | | |
| Web Speech API | https://github.com/mdn/web-speech-api | | | 1400 | | | | | | | | |
| Live Transcribe | https://github.com/google/live-transcribe-speech-engine | | | 1400 | | | | | | | | |
| Java ASR | https://github.com/cmusphinx/sphinx4 | | | 1400 | | | | | | | | |
| RHVoice | https://github.com/RHVoice/RHVoice | | | 1400 | | | | | | | | |
| Voice Elements | https://github.com/zenorocha/voice-elements | | | 1300 | | | | | | | | |
| Praat | https://github.com/praat/praat | | | 1300 | | | | | | | | |
| Whisper Timestamp | https://github.com/linto-ai/whisper-timestamped | | | 1300 | | | | | | | | |
| eSpeak JS | https://github.com/kripken/speak.js | | | 1300 | | | | | | | | |
| Noise Reduce Py | https://github.com/timsainb/noisereduce | | | 1200 | | | | | | | | |
| NeuralSpeech (MSFT) | https://github.com/microsoft/NeuralSpeech | | | 1200 | | | | | | | | |
| Artyom | https://github.com/sdkcarlos/artyom.js | | | 1200 | | | | | | | | |
| Whisper-plus | https://github.com/kadirnar/whisper-plus | | | 1200 | | | | | | | | |
| Speech Emotion Analyzer | https://github.com/MiteshPuthran/Speech-Emotion-Analyzer | | | 1200 | | | | | | | | |
| Whisper Apple Silicon | https://github.com/argmaxinc/WhisperKit | | | 1200 | | | | | | | | |
| Speech Corpora | https://github.com/coqui-ai/open-speech-corpora | | | 1200 | | | | | | | | |
| XZVoice | https://github.com/bawangxx/XZVoice | | | 1200 | | | | | | | | |
| Subsync | https://github.com/sc0ty/subsync | | | 1200 | | | | | | | | |
| DC TTS | https://github.com/Kyubyong/dc_tts | | | 1200 | | | | | | | | |
| SAM | https://github.com/s-macke/SAM | | | 1100 | | | | | | | | |
| NaturalSpeech 2 | https://github.com/lucidrains/naturalspeech2-pytorch | | https://arxiv.org/pdf/2304.09116.pdf | 1100 | | | | | | | | |
| Nerd Dictation (VOSK STT) | https://github.com/ideasman42/nerd-dictation | | | 1100 | | | | | | | | |
| World | https://github.com/mmorise/World | | | 1100 | | | | | | | | |
| LPCNet | https://github.com/xiph/LPCNet | | | 1100 | | | | | | | | |
| TransformerTTS | https://github.com/as-ideas/TransformerTTS | | | 1100 | | | | | | | | |
| Voice2JSON | https://github.com/synesthesiam/voice2json | | | 1100 | | | | | | | | |
| Ekho | https://github.com/hgneng/ekho | | | 1100 | | | | | | | | |
| SoundStorm | https://github.com/lucidrains/soundstorm-pytorch | Unofficial | https://google-research.github.io/seanet/soundstorm/examples/ | 1100 | | | | | | | | |
| VITS Chinese | https://github.com/PlayVoice/vits_chinese | | | 1000 | | | | | | | | |
| HierSpeechpp | https://github.com/sh-lee-prml/HierSpeechpp | | | 980 | | | | | | | | |
| pyKaldi | https://github.com/pykaldi/pykaldi | | | 975 | | | | | | | | |
| MoeTTS | https://github.com/luoyily/MoeTTS | | | 966 | | | | | | | | |
| SpeechT5 (Microsoft) | https://github.com/microsoft/SpeechT5 | | | 942 | | | | | | | | |
| Botium | https://github.com/codeforequity-at/botium-speech-processing | | | 941 | | | | | | | | |
| NATSpeech | https://github.com/NATSpeech/NATSpeech | | | 941 | | | | | | | | |
| Asia TTSKit | https://github.com/kuangdd/ttskit | | | 940 | | | | | | | | |
| Espresso | https://github.com/freewym/espresso | | | 939 | | | | | | | | |
| mimic3 | https://github.com/MycroftAI/mimic3 | | | 932 | | | | | | | | |
| Athena | https://github.com/athena-team/athena | | | 928 | | | | | | | | |
| Quillman | https://github.com/modal-labs/quillman | | | 910 | | | | | | | | |
| Vonage | https://github.com/Vonage/vonage-php-sdk-core | | | 891 | | | | | | | | |
| TensorFlow ASR | https://github.com/TensorSpeech/TensorFlowASR | | | 891 | | | | | | | | |
| SpeechPy | https://github.com/astorfi/speechpy | | | 881 | | | | | | | | |
| Flowtron | https://github.com/NVIDIA/flowtron | | | 873 | | | | | | | | |
| Loop (FB) | https://github.com/facebookarchive/loop | | | 871 | | | | | | | | |
| Voicefixer | https://github.com/haoheliu/voicefixer | | | 845 | | | | | | | | |
| FastSpeech | https://github.com/xcmyz/FastSpeech | | | 834 | | | | | | | | |
| Conformer ASR | https://github.com/sooftware/conformer | | | 834 | | | | | | | | |
| SpeechGPT | https://github.com/0nutation/SpeechGPT | | | 822 | | | | | | | | |
| Larynx | https://github.com/rhasspy/larynx | | | 821 | | | | | | | | |
| Vosk Server | https://github.com/alphacep/vosk-server | | | 811 | | | | | | | | |
| VAD Toolkit | https://github.com/jtkim-kaist/VAD | | | 809 | | | | | | | | |
| Emotion Recognition | https://github.com/Renovamen/Speech-Emotion-Recognition | | | 805 | | | | | | | | |
| Lhotse | https://github.com/lhotse-speech/lhotse | | | 800 | | | | | | | | |
| Speechmetrics | https://github.com/aliutkus/speechmetrics | | | 795 | | | | | | | | |
| Tencent Chinese Speech | https://github.com/TencentGameMate/chinese_speech_pretrain | | | 794 | | | | | | | | |
| RealtimeTTS | https://github.com/KoljaB/RealtimeTTS | | | 794 | | | | | | | | |
| Tacotron 2 multilingual | https://github.com/Tomiinek/Multilingual_Text_to_Speech | | | 793 | | | | | | | | |
| Segan | https://github.com/santi-pdp/segan | | | 785 | | | | | | | | |
| Flite | https://github.com/festvox/flite | | | 766 | | | | | | | | |
| OpenTTS | https://github.com/synesthesiam/opentts | | | 756 | | | | | | | | |
| Speech Transformer Chinese | https://github.com/kaituoxu/Speech-Transformer | | | 752 | | | | | | | | |
| SALMONN (ByteDance) | https://github.com/bytedance/SALMONN | | | 727 | | | | | | | | |
| PPASR | https://github.com/yeyupiaoling/PPASR | | | 724 | | | | | | | | |
| AV Hubert | https://github.com/facebookresearch/av_hubert | | | 710 | | | | | | | | |
| Awni PySpeech | https://github.com/awni/speech | | | 704 | | | | | | | | |
| Sherpa NCNN | https://github.com/k2-fsa/sherpa-ncnn | | | 704 | | | | | | | | |
| Tortoise Fast | https://github.com/152334H/tortoise-tts-fast | KV Cache, New Diffusion Model | | 700 | | | | | | | | |
| FishSpeech | https://github.com/fishaudio/fish-speech | | | 695 | | | | | | | | |
| Resemble Enhance | https://github.com/resemble-ai/resemble-enhance | | | 693 | | | | | | | | |
| Whisper Streaming | https://github.com/ufal/whisper_streaming | | | 687 | | | | | | | | |
| LibreASR | https://github.com/iceychris/LibreASR | | | 681 | | | | | | | | |
| Speech Segmenter | https://github.com/ina-foss/inaSpeechSegmenter | | | 674 | | | | | | | | |
| DSD | https://github.com/szechyjs/dsd | | | 658 | | | | | | | | |
| Denoising | https://github.com/drethage/speech-denoising-wavenet | | | 652 | | | | | | | | |
| OpenSpeech | https://github.com/openspeech-team/openspeech | | | 635 | | | | | | | | |
| SpecAugment | https://github.com/DemisEom/SpecAugment | | | 628 | | | | | | | | |
| Cboard | https://github.com/cboard-org/cboard | | | 627 | | | | | | | | |
| WavAugment | https://github.com/facebookresearch/WavAugment | | | 624 | | | | | | | | |
| TransformerTTS | https://github.com/soobinseo/Transformer-TTS | | | 622 | | | | | | | | |
| Conv-TasNet | https://github.com/kaituoxu/Conv-TasNet | | | 621 | | | | | | | | |
| Glow-TTS | https://github.com/jaywalnut310/glow-tts | | | 620 | | | | | | | | |
| Voice-Builder | https://github.com/google/voice-builder | | | 613 | | | | | | | | |
| Parrot | https://github.com/sotelo/parrot | | | 612 | | | | | | | | |
| Sonus | https://github.com/evancohen/sonus | | | 610 | | | | | | | | |
| React Use Whisper | https://github.com/chengsokdara/use-whisper | | | 607 | | | | | | | | |
| Paddle-DeepSpeech | https://github.com/yeyupiaoling/PaddlePaddle-DeepSpeech | | | 603 | | | | | | | | |
| SpeechSplit | https://github.com/auspicious3000/SpeechSplit | | | 601 | | | | | | | | |
| Parakeet | https://github.com/PaddlePaddle/Parakeet | | | 596 | | | | | | | | |
| Codec2-dev | https://github.com/drowe67/codec2-dev | | | 591 | | | | | | | | |
| Rhino | https://github.com/Picovoice/rhino | | | 584 | | | | | | | | |
| ASR/LM | https://github.com/hirofumi0810/neural_sp | | | 583 | | | | | | | | |
| VisQOL | https://github.com/google/visqol | | | 580 | | | | | | | | |
| GigaSpeech | https://github.com/SpeechColab/GigaSpeech | | | 574 | | | | | | | | |
| Chinese Text Norm | https://github.com/speechio/chinese_text_normalization | | | 574 | | | | | | | | |
| React Speech Recognition | https://github.com/JamesBrill/react-speech-recognition | | | 574 | | | | | | | | |
| Audio Denoising | https://github.com/vbelz/Speech-enhancement | | | 563 | | | | | | | | |
| xVA-Synth | https://github.com/DanRuta/xVA-Synth | | | 562 | | | | | | | | |
| Sherpa-Onnx | https://github.com/k2-fsa/sherpa-onnx | | | 560 | | | | | | | | |
| KoSpeech | https://github.com/sooftware/kospeech | | | 559 | | | | | | | | |
| Wunjo AI | https://github.com/wladradchenko/wunjo.wladradchenko.ru | | | 559 | | | | | | | | |
| Cheetah | https://github.com/Picovoice/cheetah | | | 540 | | | | | | | | |
| NISQA | https://github.com/gabrielmittag/NISQA | | | 533 | | | | | | | | |
| DTLN | https://github.com/breizhn/DTLN | | | 530 | | | | | | | | |
| Tacotron | https://github.com/google/tacotron | Samples only | | 525 | | | | | | | | |
| Realtime STT | https://github.com/KoljaB/RealtimeSTT | | | 520 | | | | | | | | |
| GANTTS | https://github.com/r9y9/gantts | | | 515 | | | | | | | | |
| So-Vits-SVC-4.0-v2 | https://github.com/justinjohn0306/so-vits-svc-4.0-v2 | | | 515 | | | | | | | | |
| Voice Converter | https://github.com/leimao/Voice-Converter-CycleGAN | | | 514 | | | | | | | | |
| Meta-voicebox (Speechify) | https://github.com/SpeechifyInc/Meta-voicebox | | | 514 | | | | | | | | |
| Termit | https://github.com/pawurb/termit | | | 507 | | | | | | | | |
| Huawei Speech Backbones | https://github.com/huawei-noah/Speech-Backbones | | | 500 | | | | | | | | |
| FreeVC | https://github.com/OlaWod/FreeVC | | | 495 | | | | | | | | |
| FullSubNet | https://github.com/Audio-WestlakeU/FullSubNet | | | 493 | | | | | | | | |
| Emotion Recognition | https://github.com/x4nth055/emotion-recognition-using-speech | | | 490 | | | | | | | | |
| Allosaurus | https://github.com/xinjli/allosaurus | | | 488 | | | | | | | | |
| Vox Populi | https://github.com/facebookresearch/voxpopuli | | | 484 | | | | | | | | |
| SpecAugment | https://github.com/zcaceres/spec_augment | | | 484 | | | | | | | | |
| VRCWizard | https://github.com/VRCWizard/TTS-Voice-Wizard | | | 479 | | | | | | | | |
| mASR | https://github.com/yeyupiaoling/MASR | | | 479 | | | | | | | | |
| DeepXi | https://github.com/anicolson/DeepXi | | | 477 | | | | | | | | |
| uSpeech | https://github.com/arjo129/uSpeech | | | 474 | | | | | | | | |
| AwesomeTTS for Anki | https://github.com/AwesomeTTS/awesometts-anki-addon | | | 470 | | | | | | | | |
| Whisper-Web | https://github.com/xenova/whisper-web | | | 461 | | | | | | | | |
| VoiceBox | https://github.com/lucidrains/voicebox-pytorch | | https://research.facebook.com/publications/voicebox-text-guided-multilingual-universal-speech-generation-at-scale/ | 460 | | Unofficial | | | | | | |
| PESQ | https://github.com/ludlows/PESQ | | | 455 | | | | | | | | |
| StarGAN | https://github.com/yl4579/StarGANv2-VC | | | 448 | | | | | | | | |
| WPE | https://github.com/fgnt/nara_wpe | | | 444 | | | | | | | | |
| ASR | https://github.com/gooofy/zamia-speech | | | 442 | | | | | | | | |
| ASR | https://github.com/espressif/esp-sr | | | 439 | | | | | | | | |
| CSS10 | https://github.com/Kyubyong/css10 | | | 438 | | | | | | | | |
| WenetSpeech | https://github.com/wenet-e2e/WenetSpeech | | | 436 | | | | | | | | |
| PASE | https://github.com/santi-pdp/pase | | | 433 | | | | | | | | |
| Stuttgart TTS | https://github.com/DigitalPhonetics/IMS-Toucan | | | 431 | | | | | | | | |
| FastSpeech2 Mand | https://github.com/ranchlai/mandarin-tts | | | 430 | | | | | | | | |
| SPTK | https://github.com/r9y9/pysptk | | | 424 | | | | | | | | |
| ProDiff | https://github.com/Rongjiehuang/ProDiff | | | 419 | | | | | | | | |
| Linux SR | https://github.com/JamezQ/Palaver | | | 418 | | | | | | | | |
| SwiftSpeech | https://github.com/Cay-Zhang/SwiftSpeech | | | 411 | | | | | | | | |
| German ASR | https://github.com/DeutscheKI/tevr-asr-tool | | | 409 | | | | | | | | |
| Kan-TTS | https://github.com/alibaba-damo-academy/KAN-TTS | | | 401 | | | | | | | | |
| PyCTCDecode | https://github.com/kensho-technologies/pyctcdecode | | | 399 | | | | | | | | |
| HuggingSound | https://github.com/jonatasgrosman/huggingsound | | | 397 | | | | | | | | |
| Leopard | https://github.com/Picovoice/leopard | | | 395 | | | | | | | | |
| Sherpa | https://github.com/k2-fsa/sherpa | | | 390 | | | | | | | | |
| SS Systems | https://github.com/r9y9/nnmnkwii | | | 390 | | | | | | | | |
| ContentVec | https://github.com/auspicious3000/contentvec | Disentangling speakers with SSL | https://arxiv.org/pdf/2204.09224.pdf | 387 | | | | | | | | |
| Speech Enhancement with Kaldi | https://github.com/funcwj/setk | | | 386 | | | | | | | | |
| spchcat | https://github.com/petewarden/spchcat | | | 386 | | | | | | | | |
| TF SR | https://github.com/llSourcell/tensorflow_speech_recognition_demo | | | 383 | | | | | | | | |
| Speech Aligner | https://github.com/open-speech/speech-aligner | | | 382 | | | | | | | | |
| Emotion Recognition | https://github.com/marcogdepinto/emotion-classification-from-audio-files | | | 380 | | | | | | | | |
| kNN-VC | https://github.com/bshall/knn-vc | | | 379 | | | | | | | | |
| Dual Path RNN Pytorch | https://github.com/JusperLee/Dual-Path-RNN-Pytorch | | | 376 | | | | | | | | |
| Emotion Recognition | https://github.com/xuanjihe/speech-emotion-recognition | | | 375 | | | | | | | | |
| UniSpeech (MSFT) | https://github.com/microsoft/UniSpeech | | | 374 | | | | | | | | |
| MB-iSTFT-VITS | https://github.com/MasayaKawamura/MB-iSTFT-VITS | | | 373 | | | | | | | | |
| Self Supervised Speech | https://github.com/mailong25/self-supervised-speech-recognition | | | 371 | | | | | | | | |
| Segan | https://github.com/santi-pdp/segan_pytorch | | | 370 | | | | | | | | |
| Vonage | https://github.com/Vonage/vonage-node-sdk | | | 370 | | | | | | | | |
| Speech-Hacker | https://github.com/ParhamP/Speech-Hacker | | | 369 | | | | | | | | |
| pyTTSx | https://github.com/RapidWareTech/pyttsx | | | 368 | | | | | | | | |
| Conv-TasNet | https://github.com/JusperLee/Conv-TasNet | | | 366 | | | | | | | | |
| EmoTTS Attempt | https://github.com/Emotional-Text-to-Speech/dl-for-emo-tts | | | 365 | | | | | | | | |
| OpenTransformer | https://github.com/ZhengkunTian/OpenTransformer | | | 364 | | | | | | | | |
| Parrots | https://github.com/shibing624/parrots | | | 360 | | | | | | | | |
| Awesome Singing Voice | https://github.com/guan-yuan/Awesome-Singing-Voice-Synthesis-and-Singing-Voice-Conversion | | | 360 | | | | | | | | |
| Dragonfly | https://github.com/dictation-toolbox/dragonfly | | | 358 | | | | | | | | |
| Dereverb | https://github.com/sp-uhh/sgmse | | | 356 | | | | | | | | |
| SpeechBrain | https://github.com/speechbrain/speechbrain.github.io | | | 351 | | | | | | | | |
| VOSK | https://github.com/alphacep/vosk | | | 350 | | | | | | | | |
| DeepVoicer | https://github.com/israelg99/deepvoice | | | 350 | | | | | | | | |
| TikTok-TTS | https://github.com/Weilbyte/tiktok-tts | | | 350 | | | | | | | | |
| Emotion Recognition | https://github.com/Demfier/multimodal-speech-emotion-recognition | | | 349 | | | | | | | | |
| Soft VC | https://github.com/bshall/soft-vc | | | 348 | | | | | | | | |
| GST-Tacotron | https://github.com/KinglittleQ/GST-Tacotron | | | 345 | | | | | | | | |
| PySEPM | https://github.com/schmiph2/pysepm | | | 343 | | | | | | | | |
| Speech Resynth | https://github.com/facebookresearch/speech-resynthesis | | | 342 | | | | | | | | |
| Festival | https://github.com/festvox/festival | | | 342 | | | | | | | | |
| Pika | https://github.com/tencent-ailab/pika | | | 338 | | | | | | | | |
| VAD | https://github.com/filippogiruzzi/voice_activity_detection | | | 333 | | | | | | | | |
| WETTS | https://github.com/wenet-e2e/wetts | | | 332 | | | | | | | | |
| StyleTTS | https://github.com/yl4579/StyleTTS | | | 331 | | | | | | | | |
| ASR Speech Dataset | https://github.com/double22a/speech_dataset | | | 329 | | | | | | | | |
| PortaSpeech | https://github.com/keonlee9420/PortaSpeech | | | 324 | | | | | | | | |
| Speechbox | https://github.com/huggingface/speechbox | | | 321 | | | | | | | | |
| Diffusion-SVC | https://github.com/CNChTu/Diffusion-SVC/tree/v1_Stable | | | 320 | | | | | | | | |
| CoVoST | https://github.com/facebookresearch/covost | | | 317 | | | | | | | | |
| VQMIVC | https://github.com/Wendison/VQMIVC | | | 315 | | | | | | | | |
| DisVoice | https://github.com/jcvasquezc/DisVoice | | | 309 | | | | | | | | |
| Vosk | https://github.com/ccoreilly/vosk-browser | | | 309 | | | | | | | | |
| STT Webcam | https://github.com/1heisuzuki/speech-to-text-webcam-overlay | | | 307 | | | | | | | | |
| Source Separation | https://github.com/AppleHolic/source_separation | | | 306 | | | | | | | | |
| HanTTS | https://github.com/junzew/HanTTS | | | 306 | | | | | | | | |
| German ASR | https://github.com/AASHISHAG/deepspeech-german | | | 306 | | | | | | | | |
| Room Impulse Corpus | https://github.com/RoyJames/room-impulse-responses | | | 306 | | | | | | | | |
| Transformer TTS | https://github.com/keonlee9420/Comprehensive-Transformer-TTS | | | 306 | | | | | | | | |
| STT Russian | https://github.com/SergeyShk/Speech-to-Text-Russian | | | 304 | | | | | | | | |
| ZeroSpeech | https://github.com/bshall/ZeroSpeech | | | 303 | | | | | | | | |
| VITS-2 | https://github.com/daniilrobnikov/vits2 | Unofficial repo | | 288 | | | | | | | | |
| VQVAE | https://github.com/swasun/VQ-VAE-Speech | | | 254 | | | | | | | | |
| SONAR | https://github.com/facebookresearch/SONAR | | | 233 | | | | | | | | |
| Ocotillo | https://github.com/neonbjb/ocotillo | | | 228 | | | | | | | | |
| SpearTTS | https://github.com/lucidrains/spear-tts-pytorch | | https://google-research.github.io/seanet/speartts/examples/ | 216 | | | | | | | | |
| BDDM | https://github.com/tencent-ailab/bddm | | | 209 | | | | | | | | |
| MBROLA | https://github.com/numediart/MBROLA | | | 207 | | | | | | | | |
| VoiceFlow TTS | https://github.com/X-LANCE/VoiceFlow-TTS | | | 197 | | | | | | | | |
| SC-VITS | https://github.com/hcy71o/SC-VITS | | | 26 | | | | | | | | |
| PnGBERT | | | | | | | | | | | | |
| wav2vec | | | | | | | | | | | | |
| DeepSpeech | https://github.com/mozilla/DeepSpeech | | | | | | | PSQM | | | | |
| BASE TTS | | | | | | | | | | | | |
| YourTTS | | | | | | | | | | | | |
| NaturalSpeech | https://speechresearch.github.io/naturalspeech/ | | | | | | | | | | | |
| Tacotron 2 | | Sequence-to-sequence, acoustic feature prediction + WaveNet vocoder. | | | | | | | | | | |
| Audiobox | | | | | | | Meta | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| Vocoders | Authors | Repo / Link | Notes | Stars | | | | | | | | |
| WaveNet | Google | https://www.deepmind.com/blog/wavenet-a-generative-model-for-raw-audio | Deep autoregressive NN from 2016, beat SOTA parametric and concatenative systems. | | | | | | | | | |
| UnivNet | | | Used by Tortoise | | | | | | | | | |
| BigVGAN | Nvidia | https://github.com/NVIDIA/BigVGAN | Used by Tortoise | | | | | | | | | |
| HiFi-GAN | | https://github.com/jik876/hifi-gan | | 1700 | | | | | | | | |
| HiFTNet | Columbia | | | | | | | | | | | |
| iSTFTNet | | https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/istftnet2/ | | | | | | | | | | |
| LightVoc | | https://www.isca-speech.org/archive/pdfs/interspeech_2023/dang23b_interspeech.pdf | | | | | | | | | | |
| EVA-GAN | Nvidia | | 44.1 kHz | | | | | | | | | |
| MelGAN | | https://github.com/descriptinc/melgan-neurips | | 914 | | | | | | | | |
| DiffWave | | https://github.com/lmnt-com/diffwave | | 702 | | | | | | | | |
| WaveGrad | | https://github.com/ivanvovk/WaveGrad | | 394 | | | | | | | | |
| | | | | | | | | | | | | |
| Autoregressive (AR) vocoders were too slow at inference, which led to early GAN vocoders such as WaveGAN and MelGAN. HiFi-GAN was the breakthrough. | | | | | | | | | | | | |
| | | | | | | | | | | | | |
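| Fourier transform | converts a signal from the time domain to the frequency domain | | | | | | | | | | | |
| mel scale | perceptual pitch scale from 1937, semi-log in frequency, named after 'melody' | | | | | | | | | | | |
| sample rate | number of samples per second, such as 44,100 Hz | | | | | | | | | | | |
| spectrogram | plot of the spectrum over time: time on one axis, frequency on the other, amplitude as color intensity | | | | | | | | | | | |
| spectrum | amplitude of audio at different frequencies; one time slice of a spectrogram | | | | | | | | | | | |
| STFT | short-time Fourier transform: a Fourier transform applied to short, overlapping windows of the signal | | | | | | | | | | | |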
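
The glossary terms above fit together roughly as follows. This is a minimal sketch using only the Python standard library, not how any listed project computes its features. The sample rate, frame length, hop size, and test tone are illustrative assumptions, and the mel formula is one common variant (2595 * log10(1 + f / 700)); other variants exist.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (illustrative value)
FRAME_LEN = 64      # samples per analysis window (assumption)
HOP = 32            # samples between window starts (assumption)


def hz_to_mel(hz):
    """Convert Hz to mels with the common 2595 * log10(1 + f / 700) variant."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def dft_magnitudes(frame):
    """Naive O(n^2) discrete Fourier transform; magnitudes for bins 0 .. n // 2."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def stft_magnitudes(signal, frame_len=FRAME_LEN, hop=HOP):
    """STFT: slide a Hann window over the signal and take the DFT of each frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        frames.append(dft_magnitudes(frame))
    return frames  # frames[t][k]: magnitude at time frame t, frequency bin k


# A 1 kHz sine tone; its spectrogram should peak in the bin covering 1 kHz.
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(512)]
spectrogram = stft_magnitudes(tone)

# Each row of the spectrogram is a spectrum; find its strongest frequency bin.
peak_bin = max(range(len(spectrogram[0])), key=lambda k: spectrogram[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_LEN
print(peak_hz, hz_to_mel(peak_hz))
```

In practice a library FFT would replace the naive DFT loop here, which is written out only to keep the definition visible.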