A small footprint hybrid statistical/unit selection text-to-speech synthesis system for agglutinative languages

Title	A small footprint hybrid statistical/unit selection text-to-speech synthesis system for agglutinative languages
Author	Güner, Ekrem; Demiroğlu, Cenk
Publication Date	2012
Publication Place	- IEEE
Subject	Hidden Markov models, Natural language processing, Speech intelligibility, Speech synthesis, Statistical analysis
Type	Document
Language	English
Digital	Yes
Manuscript	No
Library	Özyeğin University
Library Asset ID	978-1-4673-0044-5
Record ID	519c7f1b-fd96-47cb-89c3-be7fdf3f926c
Library Location	Electrical & Electronics Engineering
Date	2012
Notes	Due to copyright restrictions, the access to the full text of this article is only available via subscription.
Sample Text	Despite its success, unit selection based text-to-speech synthesis (TTS) has some disadvantages, such as sudden discontinuities in speech that distract listeners. The HMM-based TTS (HTS) approach has been attracting increasing attention from the TTS research community. One of its advantages is the lack of the spurious errors that are observed in the unit selection scheme. Another advantage of the HTS system is its small memory footprint, which makes it attractive for embedded devices. Here, we propose a novel hybrid statistical/unit selection TTS system for agglutinative languages that aims at improving the quality of the baseline HTS system while keeping the memory footprint small. The intelligibility and quality scores of the baseline system are comparable to the MOS scores of English reported in the Blizzard Challenge tests. Listeners preferred the hybrid system over the baseline system in the A/B preference tests.
DOI	10.1109/ICASSP.2012.6288927
