Martin Shkreli

1	Main
2		Name	Repo	Type / Description	Link	Stars	Notes	Implementation	Authors
3		Whisper	https://github.com/openai/whisper	ASR		56200		N/A	OpenAI
4		SV TL TTS	https://github.com/CorentinJ/Real-Time-Voice-Cloning	Real-time voice-cloning tool by a Resemble AI employee; outdated.		50000		N/A	Corentin Jemine
5		FFmpeg	https://github.com/FFmpeg/FFmpeg	Common audio/video processing utility.		41000		N/A	Fabrice Bellard, Bobby Bingham
6		Mockingbird (Chinese)	https://github.com/babysor/MockingBird	Chinese fork of Corentin's repo		33200		N/A	Chinese Anon
7		Bark	https://github.com/suno-ai/bark	Text-to-speech / text-to-audio library; doesn't work well.		30600		doesn't work well	Suno
8		TTS (Coqui)	https://github.com/coqui-ai/TTS	TTS		27000		up next?	Coqui (RIP)
9		MPV	https://github.com/mpv-player/mpv	Command-line media player; used for terminal audio output.		25200		N/A	Large open-source project
10		DeepSpeech	https://github.com/mozilla/DeepSpeech			23900		N/A	Mozilla
11		So-Vits-SVC	https://github.com/svc-develop-team/so-vits-svc			22800			Chinese OS Anons
12		Audiocraft	https://github.com/facebookresearch/audiocraft			18800			Meta
13		RVC	https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI			16500
14		GPT-SoVITS	https://github.com/RVC-Boss/GPT-SoVITS			14400
15		OpenVoice	https://github.com/myshell-ai/OpenVoice			14300
16		Vocal Remover	https://github.com/Anjok07/ultimatevocalremovergui			13700		N/A
17		Kaldi	https://github.com/kaldi-asr/kaldi			13500
18		PaddleHub	https://github.com/PaddlePaddle/PaddleHub			12400
19		Tortoise	https://github.com/neonbjb/tortoise-tts		www.nonint.com	11100	Slow; the diffusion decoder needs to be sped up. Used in ElevenLabs, Play.ht.	Limited, retrain	James Betker
20		PaddleSpeech	https://github.com/PaddlePaddle/PaddleSpeech			9700
21		Seamless	https://github.com/facebookresearch/seamless_communication			9700		doesn't work well	Meta
22		AudioGPT	https://github.com/AIGC-Audio/AudioGPT			9700
23		NeMo	https://github.com/NVIDIA/NeMo	Not specifically TTS	https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/tts/intro.html	9300			Nvidia
24		Mozilla TTS	https://github.com/mozilla/TTS			8600		N/A	Mozilla
25		PyDub	https://github.com/jiaaro/pydub			8200
26		so-vits-svc-fork	https://github.com/voicepaw/so-vits-svc-fork			8000
27		whisperX	https://github.com/m-bain/whisperX	ASR		7900		N/A
28		Uberi SR	https://github.com/Uberi/speech_recognition	ASR		7900		N/A
29		Espnet	https://github.com/espnet/espnet			7600
30		Jukebox	https://github.com/openai/jukebox			7400			OpenAI
31		ASRT	https://github.com/nl8590687/ASRT_SpeechRecognition			7300		N/A
32		SpeechBrain	https://github.com/speechbrain/speechbrain			7300
33		VALL-E X	https://github.com/Plachtaa/VALL-E-X			6900
34		PaddlePaddle	https://github.com/PaddlePaddle/models			6900
35		Vosk API	https://github.com/alphacep/vosk-api			6700
36		Annyang	https://github.com/TalAter/annyang			6500
37		Librosa	https://github.com/librosa/librosa			6500		N/A
38		C++ Whisper	https://github.com/Const-me/Whisper	ASR		6500		N/A
39		wav2letter	https://github.com/flashlight/wav2letter			6300
40		VITS	https://github.com/jaywalnut310/vits	TTS		6000		Production
41		Bert-VITS2	https://github.com/fishaudio/Bert-VITS2			5900
42		EmotiVoice (Netease)	https://github.com/netease-youdao/EmotiVoice			5900
43		Py Audio Analysis	https://github.com/tyiannak/pyAudioAnalysis			5600
44		CloneVoice (Coqui fork)	https://github.com/jianchang512/clone-voice			5500
45		Wukong-Robot	https://github.com/wzpan/wukong-robot			5500
46		Pedalboard	https://github.com/spotify/pedalboard			4700			Spotify
47		Pyannote	https://github.com/pyannote/pyannote-audio			4600
48		Silero	https://github.com/snakers4/silero-models			4400
49		VITS-fast-fine-tuning	https://github.com/Plachtaa/VITS-fast-fine-tuning			4300
50		DiffSinger				4000
51		STT WaveNet	https://github.com/buriburisuri/speech-to-text-wavenet			3900
52		Lyra	https://github.com/google/lyra			3700			Google
53		TensorFlowTTS				3600
54		StyleTTS2	https://github.com/yl4579/StyleTTS2			3600	Fast but not expressive	Production	Aaron
55		wenet	https://github.com/wenet-e2e/wenet			3600
56		Amphion	https://github.com/open-mmlab/Amphion			3500
57		Piper	https://github.com/rhasspy/piper			3100
58		WhisperSpeech				3000
59		Tacotron	https://github.com/keithito/tacotron	Unofficial Implementation		2900
60		TF ASR Mandarin/Eng	https://github.com/zzw922cn/Automatic_Speech_Recognition			2800
61		VALL-E				2800
62		EdgeTTS				2800
63		eSpeak	https://github.com/espeak-ng/espeak-ng			2700
64		diff-svc	https://github.com/prophesier/diff-svc			2600
65		JuliusJS	https://github.com/zzmp/juliusjs			2600
66		FunASR	https://github.com/alibaba-damo-academy/FunASR			2500
67		aeneas	https://github.com/readbeyond/aeneas			2400
68		Metavoice	https://github.com/metavoiceio/metavoice-src			2300
69		pytorch-kaldi	https://github.com/mravanelli/pytorch-kaldi			2300
70		pytorch/audio	https://github.com/pytorch/audio			2300			Meta
71		Python Speech Features	https://github.com/jameslyons/python_speech_features			2300
72		WaveNet	https://github.com/r9y9/wavenet_vocoder			2300
73		so-vits-svc-5.0	https://github.com/PlayVoice/so-vits-svc-5.0			2300
74		MaryTTS	https://github.com/marytts/marytts			2200
75		Waveglow	https://github.com/NVIDIA/waveglow			2200
76		TF SR	https://github.com/pannous/tensorflow-speech-recognition			2200
77		AudioLDM	https://github.com/haoheliu/AudioLDM			2100
78		gTTS	https://github.com/pndurette/gTTS			2100
79		DeepSpeech 2	https://github.com/SeanNaren/deepspeech.pytorch			2100
80		Coqui STT	https://github.com/coqui-ai/STT			2100
81		S3PRL	https://github.com/s3prl/s3prl			2000
82		DeepVoice3	https://github.com/r9y9/deepvoice3_pytorch			1900
83		Bark VC	https://github.com/KevinWang676/Bark-Voice-Cloning			1900
84		pyTTSx3	https://github.com/nateshmbhat/pyttsx3			1800
85		TF Tacotron	https://github.com/Kyubyong/tacotron			1800
86		mASR	https://github.com/nobody132/masr			1800
87		Julius	https://github.com/julius-speech/julius			1800
88		Deep Filter Net	https://github.com/Rikorose/DeepFilterNet			1700
89		VALL-E	https://github.com/lifeiteng/vall-e			1700
90		whisper-diarization	https://github.com/MahmoudAshraf97/whisper-diarization			1700
91		elevenlabs python	https://github.com/elevenlabs/elevenlabs-python			1600
92		soloud	https://github.com/jarikomppa/soloud			1600
93		delta	https://github.com/Delta-ML/delta			1600
94		FastSpeech2	https://github.com/ming024/FastSpeech2			1600
95		OpenSeq2Seq	https://github.com/NVIDIA/OpenSeq2Seq			1500
96		Denoiser (FB)	https://github.com/facebookresearch/denoiser			1500
97		DDSP-SVC	https://github.com/yxlllc/DDSP-SVC			1500
98		PocketSphinx.js	https://github.com/syl22-00/pocketsphinx.js			1500
99		Say.JS	https://github.com/Marak/say.js			1500
100		Web Speech API	https://github.com/mdn/web-speech-api			1400
101		Live Transcribe	https://github.com/google/live-transcribe-speech-engine			1400
102		Java ASR	https://github.com/cmusphinx/sphinx4			1400
103		RHVoice	https://github.com/RHVoice/RHVoice			1400
104		Voice Elements	https://github.com/zenorocha/voice-elements			1300
105		Praat	https://github.com/praat/praat			1300
106		Whisper Timestamp	https://github.com/linto-ai/whisper-timestamped			1300
107		eSpeak JS	https://github.com/kripken/speak.js			1300
108		Noise Reduce Py	https://github.com/timsainb/noisereduce			1200
109		NeuralSpeech (MSFT)	https://github.com/microsoft/NeuralSpeech			1200
110		Artyom	https://github.com/sdkcarlos/artyom.js			1200
111		Whisper-plus	https://github.com/kadirnar/whisper-plus			1200
112		Speech Emotion Analyzer	https://github.com/MiteshPuthran/Speech-Emotion-Analyzer			1200
113		Whisper Apple Silicon	https://github.com/argmaxinc/WhisperKit			1200
114		Speech Corpora	https://github.com/coqui-ai/open-speech-corpora			1200
115		XZVoice	https://github.com/bawangxx/XZVoice			1200
116		Subsync	https://github.com/sc0ty/subsync			1200
117		DC TTS	https://github.com/Kyubyong/dc_tts			1200
118		SAM	https://github.com/s-macke/SAM			1100
119		NaturalSpeech 2	https://github.com/lucidrains/naturalspeech2-pytorch		https://arxiv.org/pdf/2304.09116.pdf	1100
120		Nerd Dictation (VOSK)	https://github.com/ideasman42/nerd-dictation			1100
121		World	https://github.com/mmorise/World			1100
122		LPCNet	https://github.com/xiph/LPCNet			1100
123		TransformerTTS	https://github.com/as-ideas/TransformerTTS			1100
124		Voice2JSON	https://github.com/synesthesiam/voice2json			1100
125		Ekho	https://github.com/hgneng/ekho			1100
126		SoundStorm	https://github.com/lucidrains/soundstorm-pytorch	Unofficial	https://google-research.github.io/seanet/soundstorm/examples/	1100
127		VITS Chinese	https://github.com/PlayVoice/vits_chinese			1000
128		HierSpeechpp	https://github.com/sh-lee-prml/HierSpeechpp			980
129		pyKaldi	https://github.com/pykaldi/pykaldi			975
130		MoeTTS	https://github.com/luoyily/MoeTTS			966
131		SpeechT5 (Microsoft)	https://github.com/microsoft/SpeechT5			942
132		Botium	https://github.com/codeforequity-at/botium-speech-processing			941
133		NATSpeech	https://github.com/NATSpeech/NATSpeech			941
134		Asia TTSKit	https://github.com/kuangdd/ttskit			940
135		Espresso	https://github.com/freewym/espresso			939
136		mimic3	https://github.com/MycroftAI/mimic3			932
137		Athena	https://github.com/athena-team/athena			928
138		Quillman	https://github.com/modal-labs/quillman			910
139		Vonage	https://github.com/Vonage/vonage-php-sdk-core			891
140		TensorFlow ASR	https://github.com/TensorSpeech/TensorFlowASR			891
141		SpeechPy	https://github.com/astorfi/speechpy			881
142		Flowtron	https://github.com/NVIDIA/flowtron			873
143		Loop (FB)	https://github.com/facebookarchive/loop			871
144		Voicefixer	https://github.com/haoheliu/voicefixer			845
145		FastSpeech	https://github.com/xcmyz/FastSpeech			834
146		Conformer ASR	https://github.com/sooftware/conformer			834
147		SpeechGPT	https://github.com/0nutation/SpeechGPT			822
148		Larynx	https://github.com/rhasspy/larynx			821
149		Vosk Server	https://github.com/alphacep/vosk-server			811
150		VAD Toolkit	https://github.com/jtkim-kaist/VAD			809
151		Emotion Recognition	https://github.com/Renovamen/Speech-Emotion-Recognition			805
152		Lhotse	https://github.com/lhotse-speech/lhotse			800
153		Speechmetrics	https://github.com/aliutkus/speechmetrics			795
154		Tencent Chinese Speech	https://github.com/TencentGameMate/chinese_speech_pretrain			794
155		RealtimeTTS	https://github.com/KoljaB/RealtimeTTS			794
156		Tacotron 2 multilingual	https://github.com/Tomiinek/Multilingual_Text_to_Speech			793
157		Segan	https://github.com/santi-pdp/segan			785
158		Flite	https://github.com/festvox/flite			766
159		OpenTTS	https://github.com/synesthesiam/opentts			756
160		Speech Transformer Chinese	https://github.com/kaituoxu/Speech-Transformer			752
161		SALMONN (ByteDance)	https://github.com/bytedance/SALMONN			727
162		PPASR	https://github.com/yeyupiaoling/PPASR			724
163		AV Hubert	https://github.com/facebookresearch/av_hubert			710
164		Awni PySpeech	https://github.com/awni/speech			704
165		Sherpa NCNN	https://github.com/k2-fsa/sherpa-ncnn			704
166		Tortoise Fast	https://github.com/152334H/tortoise-tts-fast	KV Cache, New Diffusion Model		700
167		FishSpeech	https://github.com/fishaudio/fish-speech			695
168		Resemble Enhance	https://github.com/resemble-ai/resemble-enhance			693
169		Whisper Streaming	https://github.com/ufal/whisper_streaming			687
170		LibreASR	https://github.com/iceychris/LibreASR			681
171		Speech Segmenter	https://github.com/ina-foss/inaSpeechSegmenter			674
172		DSD	https://github.com/szechyjs/dsd			658
173		Denoising	https://github.com/drethage/speech-denoising-wavenet			652
174		OpenSpeech	https://github.com/openspeech-team/openspeech			635
175		SpecAugment	https://github.com/DemisEom/SpecAugment			628
176		Cboard	https://github.com/cboard-org/cboard			627
177		WavAugment	https://github.com/facebookresearch/WavAugment			624
178		TransformerTTS	https://github.com/soobinseo/Transformer-TTS			622
179		Conv-TasNet	https://github.com/kaituoxu/Conv-TasNet			621
180		Glow-TTS	https://github.com/jaywalnut310/glow-tts			620
181		Voice-Builder	https://github.com/google/voice-builder			613
182		Parrot	https://github.com/sotelo/parrot			612
183		Sonus	https://github.com/evancohen/sonus			610
184		React Use Whisper	https://github.com/chengsokdara/use-whisper			607
185		Paddle-DeepSpeech	https://github.com/yeyupiaoling/PaddlePaddle-DeepSpeech			603
186		SpeechSplit	https://github.com/auspicious3000/SpeechSplit			601
187		Parakeet	https://github.com/PaddlePaddle/Parakeet			596
188		Codec2-dev	https://github.com/drowe67/codec2-dev			591
189		Rhino	https://github.com/Picovoice/rhino			584
190		ASR/LM	https://github.com/hirofumi0810/neural_sp			583
191		VisQOL	https://github.com/google/visqol			580
192		GigaSpeech	https://github.com/SpeechColab/GigaSpeech			574
193		Chinese Text Norm	https://github.com/speechio/chinese_text_normalization			574
194		React Speech Recognition	https://github.com/JamesBrill/react-speech-recognition			574
195		Audio Denoising	https://github.com/vbelz/Speech-enhancement			563
196		xVA-Synth	https://github.com/DanRuta/xVA-Synth			562
197		Sherpa-Onnx	https://github.com/k2-fsa/sherpa-onnx			560
198		KoSpeech	https://github.com/sooftware/kospeech			559
199		Wunjo AI	https://github.com/wladradchenko/wunjo.wladradchenko.ru			559
200		Cheetah	https://github.com/Picovoice/cheetah			540
201		NISQA	https://github.com/gabrielmittag/NISQA			533
202		DTLN	https://github.com/breizhn/DTLN			530
203		Tacotron	https://github.com/google/tacotron	Samples only		525
204		Realtime STT	https://github.com/KoljaB/RealtimeSTT			520
205		GANTTS	https://github.com/r9y9/gantts			515
206		So-Vits-SVC-4.0-v2	https://github.com/justinjohn0306/so-vits-svc-4.0-v2			515
207		Voice Converter	https://github.com/leimao/Voice-Converter-CycleGAN			514
208		Meta-voicebox (Speechify)	https://github.com/SpeechifyInc/Meta-voicebox			514
209		Termit	https://github.com/pawurb/termit			507
210		Huawei Speech Backbones	https://github.com/huawei-noah/Speech-Backbones			500
211		FreeVC	https://github.com/OlaWod/FreeVC			495
212		FullSubNet	https://github.com/Audio-WestlakeU/FullSubNet			493
213		Emotion Recognition	https://github.com/x4nth055/emotion-recognition-using-speech			490
214		Allosaurus	https://github.com/xinjli/allosaurus			488
215		Vox Populi	https://github.com/facebookresearch/voxpopuli			484
216		SpecAugment	https://github.com/zcaceres/spec_augment			484
217		VRCWizard	https://github.com/VRCWizard/TTS-Voice-Wizard			479
218		mASR	https://github.com/yeyupiaoling/MASR			479
219		DeepXi	https://github.com/anicolson/DeepXi			477
220		uSpeech	https://github.com/arjo129/uSpeech			474
221		AwesomeTTS for Anki	https://github.com/AwesomeTTS/awesometts-anki-addon			470
222		Whisper-Web	https://github.com/xenova/whisper-web			461
223		VoiceBox	https://github.com/lucidrains/voicebox-pytorch		https://research.facebook.com/publications/voicebox-text-guided-multilingual-universal-speech-generation-at-scale/	460		Unofficial
224		PESQ	https://github.com/ludlows/PESQ			455
225		StarGAN	https://github.com/yl4579/StarGANv2-VC			448
226		WPE	https://github.com/fgnt/nara_wpe			444
227		Zamia Speech (ASR)	https://github.com/gooofy/zamia-speech			442
228		ESP-SR (Espressif ASR)	https://github.com/espressif/esp-sr			439
229		CSS10	https://github.com/Kyubyong/css10			438
230		WenetSpeech	https://github.com/wenet-e2e/WenetSpeech			436
231		PASE	https://github.com/santi-pdp/pase			433
232		Stuttgart TTS	https://github.com/DigitalPhonetics/IMS-Toucan			431
233		FastSpeech2 Mand	https://github.com/ranchlai/mandarin-tts			430
234		SPTK	https://github.com/r9y9/pysptk			424
235		ProDiff	https://github.com/Rongjiehuang/ProDiff			419
236		Linux SR	https://github.com/JamezQ/Palaver			418
237		SwiftSpeech	https://github.com/Cay-Zhang/SwiftSpeech			411
238		German ASR	https://github.com/DeutscheKI/tevr-asr-tool			409
239		Kan-TTS	https://github.com/alibaba-damo-academy/KAN-TTS			401
240		PyCTCDecode	https://github.com/kensho-technologies/pyctcdecode			399
241		HuggingSound	https://github.com/jonatasgrosman/huggingsound			397
242		Leopard	https://github.com/Picovoice/leopard			395
243		Sherpa	https://github.com/k2-fsa/sherpa			390
244		SS Systems	https://github.com/r9y9/nnmnkwii			390
245		ContentVec	https://github.com/auspicious3000/contentvec	Disentangling speakers with SSL	https://arxiv.org/pdf/2204.09224.pdf	387
246		Speech Enhancement with Kaldi	https://github.com/funcwj/setk			386
247		spchcat	https://github.com/petewarden/spchcat			386
248		TF SR	https://github.com/llSourcell/tensorflow_speech_recognition_demo			383
249		Speech Aligner	https://github.com/open-speech/speech-aligner			382
250		Emotion Recognition	https://github.com/marcogdepinto/emotion-classification-from-audio-files			380
251		kNN-VC	https://github.com/bshall/knn-vc			379
252		Dual Path RNN Pytorch	https://github.com/JusperLee/Dual-Path-RNN-Pytorch			376
253		Emotion Recognition	https://github.com/xuanjihe/speech-emotion-recognition			375
254		UniSpeech (MSFT)	https://github.com/microsoft/UniSpeech			374
255		MB-iSTFT-VITS	https://github.com/MasayaKawamura/MB-iSTFT-VITS			373
256		Self Supervised Speech	https://github.com/mailong25/self-supervised-speech-recognition			371
257		Segan	https://github.com/santi-pdp/segan_pytorch			370
258		Vonage	https://github.com/Vonage/vonage-node-sdk			370
259		Speech-Hacker	https://github.com/ParhamP/Speech-Hacker			369
260		pyTTSx	https://github.com/RapidWareTech/pyttsx			368
262		Conv-TasNet	https://github.com/JusperLee/Conv-TasNet			366
263		EmoTTS Attempt	https://github.com/Emotional-Text-to-Speech/dl-for-emo-tts			365
264		OpenTransformer	https://github.com/ZhengkunTian/OpenTransformer			364
265		Parrots	https://github.com/shibing624/parrots			360
266		Awesome Singing Voice	https://github.com/guan-yuan/Awesome-Singing-Voice-Synthesis-and-Singing-Voice-Conversion			360
267		Dragonfly	https://github.com/dictation-toolbox/dragonfly			358
268		Dereverb	https://github.com/sp-uhh/sgmse			356
269		SpeechBrain website	https://github.com/speechbrain/speechbrain.github.io			351
270		VOSK	https://github.com/alphacep/vosk			350
271		DeepVoice	https://github.com/israelg99/deepvoice			350
272		TikTok-TTS	https://github.com/Weilbyte/tiktok-tts			350
273		Emotion Recognition	https://github.com/Demfier/multimodal-speech-emotion-recognition			349
274		Soft VC	https://github.com/bshall/soft-vc			348
275		GST-Tacotron	https://github.com/KinglittleQ/GST-Tacotron			345
276		PySEPM	https://github.com/schmiph2/pysepm			343
277		Speech Resynth	https://github.com/facebookresearch/speech-resynthesis			342
278		Festival	https://github.com/festvox/festival			342
279		Pika	https://github.com/tencent-ailab/pika			338
280		VAD	https://github.com/filippogiruzzi/voice_activity_detection			333
281		WETTS	https://github.com/wenet-e2e/wetts			332
282		StyleTTS	https://github.com/yl4579/StyleTTS			331
283		ASR Speech Dataset	https://github.com/double22a/speech_dataset			329
284		PortaSpeech	https://github.com/keonlee9420/PortaSpeech			324
285		Speechbox	https://github.com/huggingface/speechbox			321
286		Diffusion-SVC	https://github.com/CNChTu/Diffusion-SVC/tree/v1_Stable			320
287		CoVoST	https://github.com/facebookresearch/covost			317
288		VQMIVC	https://github.com/Wendison/VQMIVC			315
289		DisVoice	https://github.com/jcvasquezc/DisVoice			309
290		Vosk Browser	https://github.com/ccoreilly/vosk-browser			309
291		STT Webcam	https://github.com/1heisuzuki/speech-to-text-webcam-overlay			307
292		Source Separation	https://github.com/AppleHolic/source_separation			306
293		HanTTS	https://github.com/junzew/HanTTS			306
294		German ASR	https://github.com/AASHISHAG/deepspeech-german			306
295		Room Impulse Corpus	https://github.com/RoyJames/room-impulse-responses			306
296		Transformer TTS	https://github.com/keonlee9420/Comprehensive-Transformer-TTS			306
297		STT Russian	https://github.com/SergeyShk/Speech-to-Text-Russian			304
298		ZeroSpeech	https://github.com/bshall/ZeroSpeech			303
299		VITS-2	https://github.com/daniilrobnikov/vits2		Unofficial repo	288
300		VQVAE	https://github.com/swasun/VQ-VAE-Speech			254
301		SONAR	https://github.com/facebookresearch/SONAR			233
302		Ocotillo	https://github.com/neonbjb/ocotillo			228
303		SpearTTS	https://github.com/lucidrains/spear-tts-pytorch	Unofficial implementation	https://google-research.github.io/seanet/speartts/examples/	216
304		BDDM	https://github.com/tencent-ailab/bddm			209
305		MBROLA	https://github.com/numediart/MBROLA			207
306		VoiceFlow TTS	https://github.com/X-LANCE/VoiceFlow-TTS			197
307		SC-VITS	https://github.com/hcy71o/SC-VITS			26
308		PnGBERT
309		wav2vec
310		DeepSpeech	https://github.com/mozilla/DeepSpeech							PSQM
311		BASE TTS
312		YourTTS
313		NaturalSpeech	https://speechresearch.github.io/naturalspeech/
314		Tacotron 2		Sequence-to-sequence model that predicts mel spectrograms from text, paired with a WaveNet vocoder.
315		Audiobox		Meta
316
317
318		Vocoders	Organization	Repo / Description	Notes	Stars
319		WaveNet	Google	Deep autoregressive neural network (2016) that outperformed state-of-the-art parametric and concatenative TTS systems.	https://www.deepmind.com/blog/wavenet-a-generative-model-for-raw-audio
320		UnivNet			Used by Tortoise
321		BigVGAN	Nvidia	https://github.com/NVIDIA/BigVGAN	Used by Tortoise
322		HiFi-GAN		https://github.com/jik876/hifi-gan		1700
323		HiFTNet	Columbia
324		iSTFTNet		https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/istftnet2/
325		LightVoc		https://www.isca-speech.org/archive/pdfs/interspeech_2023/dang23b_interspeech.pdf
326		EVA-GAN	Nvidia	44.1kHz
327		MelGAN		https://github.com/descriptinc/melgan-neurips		914
328		DiffWave		https://github.com/lmnt-com/diffwave		702
329		WaveGrad		https://github.com/ivanvovk/WaveGrad		394
330
331		Autoregressive (AR) vocoders were too slow, which led to early GAN-based models such as WaveGAN and MelGAN. HiFi-GAN was the breakthrough.
332											Replacing a division with multiplication by a precomputed reciprocal (e.g. x * (1 / c) instead of x / c) can help performance, since multiplication is usually cheaper than division. In Python and PyTorch, however, the speedup may be negligible because of optimizations already in place. If performance is critical, profile both versions and check whether the difference is meaningful; micro-optimizations like this should come only after larger bottlenecks have been addressed.
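The note above gives no measurements, and the claim is easy to check on your own machine. Below is a minimal timing sketch. It uses NumPy rather than PyTorch so it runs without a GPU stack (the same two expressions can be written with torch tensors), and the divisor 32768.0 and the random input are arbitrary values chosen for illustration, not taken from any project in this list.

```python
import timeit
import numpy as np

# One million float32 values standing in for audio samples (illustrative data only).
x = np.random.default_rng(0).standard_normal(1_000_000).astype(np.float32)

scale = 32768.0          # example divisor (arbitrary choice for this sketch)
inv_scale = 1.0 / scale  # reciprocal computed once, outside the timed code

def divide():
    return x / scale

def multiply():
    return x * inv_scale

# Both forms should agree to within floating-point rounding.
assert np.allclose(divide(), multiply())

runs = 50
for fn in (divide, multiply):
    seconds = timeit.timeit(fn, number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e3:.3f} ms per call")
```

Because 32768 is a power of two, its reciprocal is exact and both forms return identical arrays here; for other divisors, multiplying by a rounded reciprocal can differ from true division in the last bit, which is why the check uses np.allclose.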
333
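370		Fourier transform	converts a signal from the time domain into the frequency domain
371		mel scale	semi-logarithmic perceptual pitch scale from 1937 (roughly linear at low frequencies, logarithmic at high ones), named after 'melody'
372		sample / sample rate	a sample is one amplitude measurement of the signal; the sample rate is the number of samples per second, such as 44,100 (44.1 kHz)
373		spectrogram	the spectrum of a signal plotted over time (time on one axis, frequency on the other, amplitude as color or intensity)
374		spectrum	amplitude of audio at different frequencies; a spectrogram plots successive spectra
375		stft	short-time Fourier transform: Fourier transforms of short, overlapping, windowed frames of the signal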
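The glossary terms fit together in a few lines of NumPy. The sketch below is illustrative only: the function names are hypothetical, the mel conversion uses the HTK-style formula m = 2595 * log10(1 + f / 700) (librosa's hz_to_mel defaults to the Slaney variant instead and switches to this one with htk=True), and the 1024-sample Hann window with a 256-sample hop are common but arbitrary choices.

```python
import numpy as np

def peak_frequency(signal, sample_rate):
    """Fourier transform: return the frequency (Hz) of the strongest component."""
    spectrum = np.abs(np.fft.rfft(signal))                     # magnitude spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)  # frequency of each bin in Hz
    return float(freqs[np.argmax(spectrum)])

def hz_to_mel(f_hz):
    """Mel scale, HTK variant (an assumption; other formulas exist)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def stft_magnitude(signal, n_fft=1024, hop=256):
    """Magnitude STFT: slice the signal into overlapping frames, apply a Hann
    window, and take the FFT of each frame. Each column is the spectrum of one
    frame; plotted over time, the columns form a spectrogram."""
    if len(signal) < n_fft:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (n_fft // 2 + 1, n_frames)

sample_rate = 44_100                       # samples per second
t = np.arange(sample_rate) / sample_rate   # sample times for one second of audio
tone = np.sin(2 * np.pi * 440.0 * t)       # 440 Hz sine wave

print(len(tone))                                     # 44100 samples for 1 s of audio
print(round(peak_frequency(tone, sample_rate), 3))   # 440.0
print(float(hz_to_mel(1000.0)))                      # close to 1000 mel
print(stft_magnitude(tone).shape)                    # (513, 169)
```

Library STFTs such as librosa.stft and torch.stft pad and center frames by default (center=True), so their frame counts will differ from this unpadded version.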