For example, for little girls who have lost their voices, improved artificial voice devices can produce an age-appropriate voice instead of the usual adult male voice. These advances in artificial voice production were made possible by a research project led by Professor Samuli Siltanen, and its results are good news for the approximately 30,000 Finns with vocal cord problems. Siltanen’s project is part of the Academy of Finland’s Computational Science Research Programme (LASTU).
One of the fundamental problems of speech signal analysis is to estimate, from a digitally recorded speech sound, both the vocal cord excitation signal and the shape of the vocal tract, i.e. the mouth and the throat. This process, known as glottal inverse filtering of the speech signal, requires highly specialised computation. With traditional techniques, inverse filtering works well only for low-pitched male voices. Women’s and children’s voices are trickier because their higher pitch lies too close in frequency to the lowest resonance of the vocal tract. The novel inverse calculation method developed by Siltanen and his team significantly improves glottal inverse filtering in these cases.
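Traditional glottal inverse filtering is commonly built on linear prediction (LPC): the vocal tract is modelled as an all-pole filter 1/A(z), A(z) is estimated from a frame of speech, and the speech is then passed through A(z) to cancel the resonances, leaving an estimate of the excitation. The Python/NumPy sketch below shows only that basic single-pass idea under simplifying assumptions (no pre-emphasis, no closed-phase analysis, no iterative refinement). It is not the MCMC-based method developed by Siltanen’s team, and the function names `lpc`, `inverse_filter` and `all_pole` are hypothetical. It also hints at the high-pitch problem: with a high fundamental frequency the harmonics sample the spectrum sparsely, so an LPC fit tends to be pulled towards a harmonic near the lowest resonance rather than the resonance itself.

```python
import numpy as np


def lpc(frame: np.ndarray, order: int) -> np.ndarray:
    """Estimate A(z) = 1 + a1 z^-1 + ... + ap z^-p of an all-pole vocal tract
    model 1/A(z), using the autocorrelation method with a Hann window and the
    Levinson-Durbin recursion."""
    x = frame * np.hanning(len(frame))            # taper the analysis frame
    n = len(x)
    r = np.correlate(x, x, mode="full")[n - 1 : n + order]  # lags 0..order
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err                            # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k                        # remaining prediction error
    return a


def inverse_filter(speech: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Filter speech through the FIR filter A(z) to remove the modelled
    resonances; the output approximates the excitation signal."""
    return np.convolve(speech, a)[: len(speech)]


def all_pole(excitation: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Synthesis counterpart: pass an excitation through 1/A(z)."""
    y = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for j in range(1, min(len(a), n + 1)):
            acc -= a[j] * y[n - j]
        y[n] = acc
    return y
```

A round trip such as `inverse_filter(all_pole(e, a), a)` returns the excitation `e` exactly. For real speech, a common rule of thumb is an LPC order of roughly the sampling rate in kHz plus two, applied to short frames.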
Besides speech synthesis, inverse filtering is also needed in automatic speech recognition. In speech synthesis, a computer transforms text into synthetic speech. The old-fashioned approach is to record individual words and play them back one after another, but this seldom produces natural-sounding speech.
“Most speech sounds are a result of a specific process. The air flowing between the vocal folds makes them vibrate. This vibration, if we could hear it, would produce a weird buzzing sound. However, as it moves through the vocal tract, that buzz is transformed into some familiar vowel,” explains Siltanen.
Singing, says Siltanen, is a perfect example of this interplay between the vocal cord excitation and the vocal tract: “When we sing the vowel ‘a’ at different pitches, our vocal tracts remain unchanged but the frequency of the vocal cord excitation changes. On the other hand, we can also sing different vowels at the same pitch, in which case the shape of the tract changes and the excitation stays the same.”
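The source-filter idea in the two quotes above can be sketched in a few lines of Python/NumPy: a periodic “buzz” stands in for the vocal fold vibration, and a cascade of resonators stands in for the vocal tract. Changing `f0` alters only the excitation (the same vowel at a different pitch), while changing the vowel alters only the filter (a different vowel at the same pitch). This is a toy model under stated assumptions: an impulse train is a crude substitute for real glottal pulses, and the sample rate, the formant frequencies and the 80 Hz bandwidth are rough illustrative values, not figures from the project. The names `buzz`, `vocal_tract` and `sing` are hypothetical.

```python
import numpy as np

FS = 16000  # sample rate in Hz (assumption for this sketch)

# Rough, illustrative formant frequencies F1-F3 in Hz for an adult male voice;
# real formants vary from speaker to speaker.
VOWELS = {
    "a": [730.0, 1090.0, 2440.0],
    "i": [270.0, 2290.0, 3010.0],
}


def buzz(f0: float, seconds: float) -> np.ndarray:
    """Crude stand-in for vocal fold vibration: one unit impulse per period."""
    x = np.zeros(int(FS * seconds))
    x[np.arange(0, len(x), FS / f0).astype(int)] = 1.0
    return x


def vocal_tract(formants: list[float], bandwidth: float = 80.0) -> np.ndarray:
    """Denominator A(z) of an all-pole filter with one resonance per formant."""
    a = np.array([1.0])
    r = np.exp(-np.pi * bandwidth / FS)   # pole radius set by the bandwidth
    for f in formants:
        theta = 2.0 * np.pi * f / FS      # pole angle set by the formant frequency
        a = np.convolve(a, [1.0, -2.0 * r * np.cos(theta), r * r])
    return a


def sing(vowel: str, f0: float, seconds: float = 0.5) -> np.ndarray:
    """Source-filter synthesis: pass the buzz through the vocal tract 1/A(z)."""
    a = vocal_tract(VOWELS[vowel])
    x = buzz(f0, seconds)
    y = np.zeros_like(x)
    for n in range(len(x)):
        acc = x[n]
        for j in range(1, min(len(a), n + 1)):
            acc -= a[j] * y[n - j]
        y[n] = acc
    return y
```

Here `sing("a", 110.0)` and `sing("a", 220.0)` share the same vocal tract filter but differ in pitch, while `sing("a", 110.0)` and `sing("i", 110.0)` share the same excitation but differ in vowel. Glottal inverse filtering tries to run this process backwards, recovering both parts from the recorded sound alone.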
Speech recognition is widely used, for example, in mobile phones and automatic telephone services. High-quality glottal inverse filtering improves the success rate of speech recognition in noisy environments.
Professor Samuli Siltanen, Department of Mathematics and Statistics, University of Helsinki, tel. +358 9 191 51420 or +358 40 594 3560, firstname.lastname@example.org
Harri Auvinen, Tuomo Raitio, Samuli Siltanen, Paavo Alku: Utilizing Markov Chain Monte Carlo (MCMC) Method for Improved Glottal Inverse Filtering. To appear in Proceedings of Interspeech 2012, Portland, Oregon, USA, 9–13 September 2012.
Tuomo Raitio, Antti Suni, Hannu Pulakka, Martti Vainio, Paavo Alku: Utilizing glottal source pulse library for generating improved excitation signal for HMM-based speech synthesis. In CD Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP’11), Prague, Czech Republic, 22–27 May 2011.
Academy of Finland Communications
Communications Specialist Leena Vähäkylä
Tel. +358 29 533 5139