Finding relevant features for statistical speech synthesis adaptation

Name: Finding relevant features for statistical speech synthesis adaptation
Author: Bruneau, P., Parisot, O., Mohammadi, Amir, Demiroğlu, Cenk, Ghoniem, M., Tamisier, T.

Publication Date: 2014-05
Place of Publication: European Language Resources Association
Subject: Speech synthesis, Speaker adaptation, Feature selection, Visual analytics
Type: Document
Language: English
Digital: Yes
Manuscript: No
Library: Özyeğin University
Inventory Number: 978-2-9517408-8-4
Record Number: 9573e97d-8e6b-4092-b9f5-bddc3d47470d
Location: Electrical & Electronics Engineering
Date: 2014-05
Sample Text: Statistical speech synthesis (SSS) models typically lie in a very high-dimensional space. They can be used to allow speech synthesis on digital devices, using only a few sentences of input from the user. However, the adaptation algorithms of such weakly trained models suffer from the high dimensionality of the feature space. Because creating new voices is easy with the SSS approach, thousands of voices can be trained, and a nearest-neighbor algorithm can be used to obtain better speaker similarity in those limited-data cases. Nearest-neighbor methods require good distance measures that correlate well with human perception. This paper investigates the problem of finding good low-cost metrics, i.e. simple functions of feature values that map to objective signal quality metrics. To this end, we use high-dimensional data visualization and dimensionality reduction techniques. Data mining principles are also applied to formulate a tractable view of the problem and to propose tentative solutions. With a performance index improved by 36% with respect to a naive solution, while using only 0.77% of the corresponding number of features, our results are promising. Finally, perspectives on new adaptation algorithms and on tighter integration of data mining and visualization principles are given.
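
The nearest-neighbor idea mentioned in the sample text can be illustrated with a minimal sketch. It assumes each voice is a flat feature vector and that a small subset of feature indices has already been chosen. The Euclidean distance, the function names, the selected indices, and the toy data are all illustrative assumptions; the record does not state which distance or features the paper actually uses.

```python
import math
import random


def subset_distance(a, b, feature_idx):
    """Euclidean distance computed only over the selected feature indices."""
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in feature_idx))


def nearest_voice(target, voices, feature_idx):
    """Index of the stored voice closest to `target` on the selected features."""
    return min(
        range(len(voices)),
        key=lambda k: subset_distance(target, voices[k], feature_idx),
    )


# Toy data (hypothetical): 50 voices, each a 1000-dimensional vector.
random.seed(0)
dim = 1000
voices = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(50)]

# Hypothetical selection: 8 of the 1000 features.
selected = [3, 17, 256, 512, 640, 700, 801, 999]

# A target that is a slightly perturbed copy of voice 7 on the selected features.
target = list(voices[7])
for i in selected:
    target[i] += 0.01

best = nearest_voice(target, voices, selected)
print("closest stored voice:", best)
```

Restricting the distance to a few indices keeps each comparison cheap, which is the practical appeal of a low-cost metric when thousands of candidate voices must be scanned.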
