2/20/2024  Towards audio language modeling - an overview
Authors: Wu et al.
Citations:
Link: https://arxiv.org/pdf/2402.13236.pdf

1/31/2024  EVA-GAN: Enhanced Various Audio Generation via Scalable GANs
Authors: Nvidia
Citations:
Link: https://arxiv.org/pdf/2402.00892.pdf
Link 2: https://double-blind-eva-gan.cc/

1/30/2024  MusicGen: Simple and Controllable Music Generation
Authors: Meta: Copet et al.
Citations:
Link: https://arxiv.org/pdf/2306.05284.pdf

1/5/2024  M2UGen: Multi-modal Music Understanding and Generation with the Power of LLMs
Authors: Hussain et al.
Citations:
Link: https://arxiv.org/pdf/2311.11255.pdf

1/5/2024  Pheme: Efficient and Conversational Speech Generation
Authors: Poly AI: Paweł Budzianowski, Taras Sereda, Tomasz Cichy, Ivan Vulic
Citations:
Link: https://arxiv.org/pdf/2401.02839.pdf

1/5/2024  Towards ASR Robust Spoken Language Understanding Through In-Context Learning with Word Confusion Networks
Authors: Amazon: Kevin Everson et al.
Citations:
Link: https://arxiv.org/abs/2401.02921

11/26/2023  WavJourney: Compositional Audio Creation with LLMs
Authors: Liu et al.
Citations:
Link:

10/23/2023  Mousai: Efficient Text-to-Music Diffusion Models
Authors: Schneider et al.
Citations:
Link: https://arxiv.org/pdf/2301.11757.pdf

10/12/2023  PromptTTS 2: Describing and Generating Voices with Text Prompt
Authors: Microsoft: Leng et al.
Citations:
Link: https://arxiv.org/pdf/2309.02285.pdf

9/21/2023  ERNIE-Music: Text-to-Waveform Music Generation with Diffusion Models
Authors: Baidu: Zhu et al.
Citations:
Link: https://arxiv.org/pdf/2302.04456.pdf

9/9/2023  AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
Authors: Liu et al.
Citations:
Link: https://arxiv.org/pdf/2301.12503.pdf

8/14/2023  SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
Authors: Microsoft: Wang et al.
Citations:
Link: https://arxiv.org/pdf/2308.06873.pdf

7/31/2023  VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
Authors: SK Telecom: Jungil Kong, Jihoon Park, Beomjeong Kim, Jeongmin Kim, Dohee Kong, Sangjin Kim
Citations:
Link: https://arxiv.org/abs/2307.16430

7/8/2023  Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos
Authors: Su et al.
Citations:
Link: https://arxiv.org/pdf/2303.16897.pdf

6/25/2023  InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt
Authors: Yang et al.
Citations:
Link: https://arxiv.org/pdf/2301.13662.pdf

5/30/2023  NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
Authors: Microsoft: Shen et al.
Citations:
Link: https://arxiv.org/pdf/2304.09116.pdf

5/29/2023  Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation
Authors: Huang et al.
Citations:
Link: https://arxiv.org/pdf/2305.18474.pdf

5/25/2023  Efficient Neural Music Generation
Authors: ByteDance: Lam et al.
Citations:
Link: https://arxiv.org/pdf/2305.15719.pdf

5/23/2023  Better speech synthesis through scaling
Authors: James Betker
Citations:
Link: https://arxiv.org/abs/2305.07243

5/3/2023  Diverse and Vivid Sound Generation from Text Descriptions
Authors: Li et al.
Citations:
Link: https://arxiv.org/pdf/2305.01980.pdf

4/24/2023  TANGO: Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model
Authors: Ghosal et al.
Citations:
Link: https://arxiv.org/abs/2304.13731

4/5/2023  AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models
Authors: Microsoft: Wang et al.
Citations:
Link: https://arxiv.org/pdf/2304.00830.pdf

3/8/2023  FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model
Authors: Xue et al.
Citations:
Link: https://arxiv.org/pdf/2303.02939v3.pdf

3/7/2023  Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling (VALL-E X)
Authors: Microsoft: Zhang et al.
Citations:
Link: https://arxiv.org/pdf/2303.03926.pdf

3/6/2023  Noise2Music: Text-conditioned Music Generation with Diffusion Models
Authors: Google: Huang et al.
Citations:
Link: https://arxiv.org/pdf/2302.03917.pdf

2/7/2023  Spear-TTS: Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision
Authors: Google: Kharitonov et al.
Citations:
Link: https://arxiv.org/abs/2302.03540

1/30/2023  SingSong: Generating musical accompaniments from singing
Authors: Google: Donahue et al.
Citations:
Link: https://arxiv.org/pdf/2301.12662.pdf

1/30/2023  Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
Authors: Huang et al.
Citations:
Link: https://arxiv.org/abs/2301.12661

1/30/2023  ArchiSound: Audio Generation with Diffusion
Authors: Flavio Schneider
Citations:
Link: https://arxiv.org/abs/2301.13267

1/26/2023  MusicLM: Generating Music From Text
Authors: Google: Agostinelli et al.
Citations:
Link: https://arxiv.org/pdf/2301.11325.pdf

1/5/2023  Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers (Vall-E)
Authors: Microsoft: Chengyi Wang, Sanyuan Chen, Yu Wu, Ziqiang Zhang, Long Zhou, Shujie Liu, Zhuo Chen, Yanqing Liu, Huaming Wang, Jinyu Li, Lei He, Sheng Zhao, Furu Wei
Citations:
Link: https://arxiv.org/abs/2301.02111

11/22/2022  PromptTTS: Controllable TTS with Text Descriptions
Authors: Guo et al.
Citations:
Link: https://arxiv.org/pdf/2211.12171.pdf

7/20/2022  Diffsound: Discrete Diffusion Model for Text-to-sound Generation
Authors: Yang et al.
Citations:
Link: https://arxiv.org/pdf/2207.09983v1.pdf

4/20/2022  ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers
Authors: Qian et al.
Citations:
Link: https://arxiv.org/abs/2204.09224

3/30/2022  Generative Spoken Dialogue Language Modeling
Authors: Meta: Tu Anh Nguyen, Eugene Kharitonov, Jade Copet, Yossi Adi, Wei-Ning Hsu, Ali Elkahky, Paden Tomasello, Robin Algayres, Benoit Sagot, Abdelrahman Mohamed, Emmanuel Dupoux
Citations:
Link: https://arxiv.org/abs/2203.16502

11/3/2021  A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion
Authors: Ubisoft: van Niekerk et al.
Citations:
Link: https://arxiv.org/abs/2111.02392

5/13/2021  Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
Authors: Huawei: Vadim Popov, Ivan Vovk, Vladimir Gogoryan, Tasnima Sadekova, Mikhail Kudinov
Citations:
Link: https://arxiv.org/abs/2105.06337

3/4/2021  Perceiver: General Perception with Iterative Attention
Authors: Google: Andrew Jaegle, Felix Gimeno, Andrew Brock, Andrew Zisserman, Oriol Vinyals, Joao Carreira
Citations:
Link: https://arxiv.org/abs/2103.03206

10/12/2020  HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Authors: Kakao: Kong et al.
Citations: 1234
Link: https://arxiv.org/abs/2010.05646

6/8/2020  FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
Authors: Microsoft: Yi Ren, Chenxu Hu, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu
Citations:
Link: https://arxiv.org/abs/2006.04558

5/22/2020  Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Authors: Kakao: Jaehyeon Kim, Sungwon Kim, Jungil Kong, Sungroh Yoon
Citations:
Link: https://arxiv.org/abs/2005.11129

5/12/2020  Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis
Authors: Nvidia: Rafael Valle, Kevin Shih, Ryan Prenger, Bryan Catanzaro
Citations:
Link: https://arxiv.org/abs/2005.05957

5/12/2020  AdaDurIAN: Few-shot Adaptation for Neural Text-to-Speech with DurIAN
Authors: Tencent: Zewang Zhang et al.
Citations:
Link: https://arxiv.org/abs/2005.05642

2/4/2020  Boffin TTS: Few-Shot Speaker Adaptation by Bayesian Optimization
Authors: Amazon: Henry Moss et al.
Citations:
Link: https://arxiv.org/abs/2002.01953

2020  Parallel WaveGAN: a fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram
Authors: Yamamoto et al.
Citations:
Link:

10/23/2019  Zero-Shot Multi-Speaker Text-to-Speech with State-of-the-art Neural Speaker Embeddings
Authors: Erica Cooper et al.
Citations:
Link: https://arxiv.org/abs/1910.10838

10/8/2019  MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis
Authors: Mila, Lyrebird: Kumar et al.
Citations: 881
Link: https://arxiv.org/abs/1910.06711

5/22/2019  FastSpeech: Fast, Robust and Controllable Text to Speech
Authors: Microsoft: Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu
Citations:
Link: https://arxiv.org/abs/1905.09263

5/2/2019  High quality, lightweight and adaptable TTS using LPCNet
Authors: IBM: Zvi Kons et al.
Citations:
Link: https://arxiv.org/abs/1905.00590

11/24/2018  Representation Mixing for TTS Synthesis
Authors: Mila: Kyle Kastner et al.
Citations:
Link: https://arxiv.org/abs/1811.07240

10/31/2018  WaveGlow: A Flow-based Generative Network for Speech Synthesis
Authors: Nvidia: Ryan Prenger, Rafael Valle, Bryan Catanzaro
Citations:
Link: https://arxiv.org/abs/1811.00002

10/12/2018  Neural Voice Cloning with a Few Samples
Authors: Baidu: Sercan Arik et al.
Citations:
Link: https://arxiv.org/abs/1802.06006

9/27/2018  Sample Efficient Adaptive Text-to-Speech
Authors: Google: Chen et al.
Citations:
Link: https://arxiv.org/abs/1809.10460

9/19/2018  Neural Speech Synthesis with Transformer Network
Authors: Microsoft: Naihan Li, Shujie Liu, Yanqing Liu, Sheng Zhao, Ming Liu, Ming Zhou
Citations:
Link: https://arxiv.org/abs/1809.08895

6/12/2018  Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Authors: Google: Ye Jia et al.
Citations:
Link: https://arxiv.org/abs/1806.04558

3/24/2018  Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron
Authors: Google: RJ Skerry-Ryan et al.
Citations:
Link: https://arxiv.org/abs/1803.09047

3/23/2018  Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
Authors: Google: Yuxuan Wang et al.
Citations:
Link: https://arxiv.org/abs/1803.09017

2/23/2018  Efficient Neural Audio Synthesis
Authors: Google: Nal Kalchbrenner, Erich Elsen, Karen Simonyan, Seb Noury, Norman Casagrande, Edward Lockhart, Florian Stimberg, Aaron van den Oord, Sander Dieleman, Koray Kavukcuoglu
Citations: 908
Link: https://arxiv.org/abs/1802.08435

12/16/2017  Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions (Tacotron 2)
Authors: Google: Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu
Citations: 2831
Link: https://arxiv.org/abs/1712.05884

10/28/2017  Generalized End-to-End Loss for Speaker Verification
Authors: Google: Li Wan et al.
Citations:
Link: https://arxiv.org/abs/1710.10467

10/28/2017  Speaker diarization with LSTM
Authors: Google: Quan Wang et al.
Citations:
Link: https://arxiv.org/abs/1710.10468

5/5/2017  Deep Speaker: an End-to-End Neural Speaker Embedding System
Authors: Baidu: Chao Li et al.
Citations:
Link: https://arxiv.org/abs/1705.02304

2017  Char2Wav: End-To-End Speech Synthesis
Authors: Mila: Jose Sotelo et al.
Citations:
Link: https://mila.quebec/wp-content/uploads/2017/02/end-end-speech.pdf

2017  Deep Neural Network Embeddings for Text-Independent Speaker Verification
Authors: Snyder et al.
Citations:
Link: https://www.danielpovey.com/files/2017_interspeech_embeddings.pdf

2017  Non-parallel voice conversion using i-vector PLDA: towards unifying speaker verification and transformation
Authors: Kinnunen et al.
Citations: 28
Link: https://ieeexplore.ieee.org/document/7953215

9/12/2016  WaveNet: A Generative Model for Raw Audio
Authors: Google: Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, Koray Kavukcuoglu
Citations:
Link: https://arxiv.org/abs/1609.03499

12/17/2014  Deep Speech: Scaling up end-to-end speech recognition
Authors: Baidu: Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates, Andrew Y. Ng
Citations:
Link: https://arxiv.org/abs/1412.5567

6/3/2014  Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
Authors: Umontreal: Cho et al.
Citations:
Link: https://arxiv.org/abs/1406.1078

2010  Front-End Factor Analysis for Speaker Verification (IEEE Transactions on Audio, Speech, and Language Processing)
Authors: Dehak et al.
Citations: 2152
Link: https://ieeexplore.ieee.org/document/5545402

1996  Unit selection in a concatenative speech synthesis system using a large speech database
Authors: AJ Hunt, AW Black

ICASSP - IEEE International Conference on Acoustics, Speech and Signal Processing