Eigenvoice speaker adaptation with minimal data for statistical speech synthesis systems using a MAP approach and nearest-neighbors | Kütüphane.osmanlica.com

Eigenvoice speaker adaptation with minimal data for statistical speech synthesis systems using a MAP approach and nearest-neighbors

İsim Eigenvoice speaker adaptation with minimal data for statistical speech synthesis systems using a MAP approach and nearest-neighbors
Yazar Mohammadi, Amir, Sarfjoo, Seyyed Saeed, Demiroğlu, Cenk
Basım Tarihi: 2014-12
Basım Yeri - IEEE
Konu Cluster adaptive training, Eigenvoice adaptation, Nearest neighbor, Speaker adaptation, Statistical speech synthesis
Tür Süreli Yayın
Dil İngilizce
Dijital Evet
Yazma Hayır
Kütüphane: Özyeğin Üniversitesi
Demirbaş Numarası 2329-9290
Kayıt Numarası 381836f7-5637-4ed5-aa56-4a55290fcaf5
Lokasyon Electrical & Electronics Engineering
Tarih 2014-12
Notlar Due to copyright restrictions, the access to the full text of this article is only available via subscription.
Örnek Metin Statistical speech synthesis (SSS) systems have the ability to adapt to a target speaker with a couple of minutes of adaptation data. Developing adaptation algorithms to further reduce the number of adaptation utterances to a few seconds of data can have substantial effect on the deployment of the technology in real-life applications such as consumer electronics devices. The traditional way to achieve such rapid adaptation is the eigenvoice technique which works well in speech recognition but known to generate perceptual artifacts in statistical speech synthesis. Here, we propose three methods to alleviate the quality problems of the baseline eigenvoice adaptation algorithm while allowing speaker adaptation with minimal data. Our first method is based on using a Bayesian eigenvoice approach for constraining the adaptation algorithm to move in realistic directions in the speaker space to reduce artifacts. Our second method is based on finding pre-trained reference speakers that are close to the target speaker and utilizing only those reference speaker models in a second eigenvoice adaptation iteration. Both techniques performed significantly better than the baseline eigenvoice method in the objective tests. Similarly, they both improved the speech quality in subjective tests compared to the baseline eigenvoice method. In the third method, tandem use of the proposed eigenvoice method with a state-of-the-art linear regression based adaptation technique is found to improve adaptation of excitation features.
DOI 10.1109/TASLP.2014.2362009
Cilt 22
Kaynağa git Özyeğin Üniversitesi Özyeğin Üniversitesi
Özyeğin Üniversitesi Özyeğin Üniversitesi
Kaynağa git

Eigenvoice speaker adaptation with minimal data for statistical speech synthesis systems using a MAP approach and nearest-neighbors

Yazar Mohammadi, Amir, Sarfjoo, Seyyed Saeed, Demiroğlu, Cenk
Basım Tarihi 2014-12
Basım Yeri - IEEE
Konu Cluster adaptive training, Eigenvoice adaptation, Nearest neighbor, Speaker adaptation, Statistical speech synthesis
Tür Süreli Yayın
Dil İngilizce
Dijital Evet
Yazma Hayır
Kütüphane Özyeğin Üniversitesi
Demirbaş Numarası 2329-9290
Kayıt Numarası 381836f7-5637-4ed5-aa56-4a55290fcaf5
Lokasyon Electrical & Electronics Engineering
Tarih 2014-12
Notlar Due to copyright restrictions, the access to the full text of this article is only available via subscription.
Örnek Metin Statistical speech synthesis (SSS) systems have the ability to adapt to a target speaker with a couple of minutes of adaptation data. Developing adaptation algorithms to further reduce the number of adaptation utterances to a few seconds of data can have substantial effect on the deployment of the technology in real-life applications such as consumer electronics devices. The traditional way to achieve such rapid adaptation is the eigenvoice technique which works well in speech recognition but known to generate perceptual artifacts in statistical speech synthesis. Here, we propose three methods to alleviate the quality problems of the baseline eigenvoice adaptation algorithm while allowing speaker adaptation with minimal data. Our first method is based on using a Bayesian eigenvoice approach for constraining the adaptation algorithm to move in realistic directions in the speaker space to reduce artifacts. Our second method is based on finding pre-trained reference speakers that are close to the target speaker and utilizing only those reference speaker models in a second eigenvoice adaptation iteration. Both techniques performed significantly better than the baseline eigenvoice method in the objective tests. Similarly, they both improved the speech quality in subjective tests compared to the baseline eigenvoice method. In the third method, tandem use of the proposed eigenvoice method with a state-of-the-art linear regression based adaptation technique is found to improve adaptation of excitation features.
DOI 10.1109/TASLP.2014.2362009
Cilt 22
Özyeğin Üniversitesi
Özyeğin Üniversitesi yönlendiriliyorsunuz...

Lütfen bekleyiniz.