Deep learning-based expressive speech synthesis: a systematic review of approaches, challenges, and resources

Name: Deep learning-based expressive speech synthesis: a systematic review of approaches, challenges, and resources
Author: Barakat, Huda Mohammed Mohammed, Turk, O., Demiroğlu, Cenk

Publication date	2024-02-12
Publisher	Springer
Subject	Speech synthesis, Expressive speech, Emotional speech, Deep learning
Type	Periodical
Language	English
Digital	Yes
Manuscript	No
Library	Özyeğin Üniversitesi
Inventory number	1687-4722
Record number	963084cb-26c2-4056-9ada-8909ff95a686
Location	Electrical & Electronics Engineering
Date	2024-02-12
Sample text	Speech synthesis has made significant strides thanks to the transition from machine learning to deep learning models. Contemporary text-to-speech (TTS) models possess the capability to generate speech of exceptionally high quality, closely mimicking human speech. Nevertheless, given the wide array of applications now employing TTS models, mere high-quality speech generation is no longer sufficient. Present-day TTS models must also excel at producing expressive speech that can convey various speaking styles and emotions, akin to human speech. Consequently, researchers have concentrated their efforts on developing more efficient models for expressive speech synthesis in recent years. This paper presents a systematic review of the literature on expressive speech synthesis models published within the last 5 years, with a particular emphasis on approaches based on deep learning. We offer a comprehensive classification scheme for these models and provide concise descriptions of models falling into each category. Additionally, we summarize the principal challenges encountered in this research domain and outline the strategies employed to tackle these challenges as documented in the literature. In Section 8, we pinpoint some research gaps in this field that necessitate further exploration. Our objective with this work is to give an all-encompassing overview of this hot research area to offer guidance to interested researchers and future endeavors in this field.
DOI	10.1186/s13636-024-00329-7
Volume	2024
