Spoofing voice verification systems with statistical speech synthesis using limited adaptation data

Name: Spoofing voice verification systems with statistical speech synthesis using limited adaptation data
Author: Khodabakhsh, Ali; Mohammadi, Amir; Demiroğlu, Cenk

Publication Date	2017-03
Publisher	Elsevier
Subject	Statistical speech synthesis, Hybrid speech synthesis, Spoofing verification systems, Speaker adaptation, Synthetic speech detection
Type	Periodical
Language	English
Digital	Yes
Manuscript	No
Library	Özyeğin Üniversitesi
Inventory Number	0885-2308
Record Number	973d3380-8171-488c-9eed-66c47f3d29be
Location	Electrical & Electronics Engineering
Date	2017-03
Sample Text	State-of-the-art speaker verification systems are vulnerable to spoofing attacks using speech synthesis. To address this, high-performance synthetic speech detectors (SSDs) for known attack methods have been proposed recently. Here, as opposed to developing new detectors, we investigate new attack strategies. Investigating attack techniques that are tailored to spoof the voice verification system while remaining difficult to detect is expected to increase the security of voice verification systems by enabling the development of better detectors. First, we investigated the vulnerability of an i-vector based verification system to attacks using statistical speech synthesis (SSS), with a particular focus on the case where the attacker has only a very limited amount of data from the target speaker. Even with a single adaptation utterance, the false alarm rate was found to be 23%. Still, SSS-generated speech is easy to detect (Wu et al., 2015a, 2015b), which dramatically reduces its effectiveness. For more effective attacks with limited data, we propose a hybrid statistical/concatenative synthesis approach and show that hybrid synthesis significantly increases the false alarm rate in the verification system compared to the baseline SSS method. Moreover, the proposed hybrid synthesis makes synthetic speech more difficult to detect than SSS, even when only a very limited amount of original speech is available to the attacker. To further increase the effectiveness of the attacks, we propose a linear regression method that transforms synthetic features into more natural features. Even though the regression approach is more effective at spoofing the detectors, it is not as effective as the hybrid synthesis approach at spoofing the verification system. An interpolation approach is proposed to combine the linear regression and hybrid synthesis methods, and it is shown to provide the best spoofing performance in most cases.
DOI	10.1016/j.csl.2016.08.004
Volume	42
