Next: Friedrich Neubarth: Tuning speech Up: Thursday, March 26 - Previous: Ziga Golob: Lexical Stress   Contents   Index

Lado Leskovec: Lexical Stress Assignment and Pronunciation Formalization in Expressive TTS


Author: Mr. Lado Leskovec
Email: lado.leskovec@alpineon.com
Affiliation: Alpineon

Abstract:

authors: Ziga Golob, Lado Leskovec, Jerneja Zganec Gros

abstract: Work on lexical stress determination for languages with free lexical stress (e.g., Slovenian) is presented. The current approach to lexical stress assignment in Slovenian TTS combines rule-based methods with large pronunciation lexica. The lexicon file contains the orthography, corresponding pronunciations, lemmas, and morphosyntactic descriptors of lexical entries in a format based on PLS (Pronunciation Lexicon Specification), which is defined by the W3C Voice Browser Activity.

Providing multiple pronunciations for items that share the same orthography and meaning is important for speech recognition lexica because the variants carry information on pronunciation variation within a language. TTS applications typically require only one of the possible pronunciations. Sometimes, however, several pronunciation variants are (almost) equally preferred, and which one the TTS engine should use depends on the application, so developers would like a mechanism for choosing the preferred one systematically. Typically, one of two almost equally preferred pronunciations renders the input text better when the application requires either over-articulated or fluent pronunciation, or when the variant is associated with the intended affective pronunciation style. Therefore, we propose a new optional attribute to the <phoneme> element in PLS: the pron-style attribute, indicating the preferred pronunciation variation of a lexeme with respect to the desired pronunciation style.
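
As a rough illustration of how an application might use the proposed attribute, the following Python sketch parses a small PLS fragment and selects a pronunciation by style. The abstract names only the pron-style attribute itself. The attribute values ("over-articulated", "fluent"), the English example word, its X-SAMPA transcriptions, and the helper pick_pronunciation are all assumptions made for this example and are not taken from the authors' lexicon or from the PLS specification.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"

# Illustrative lexicon fragment. The pron-style values and the
# transcriptions are made up for this sketch (assumption).
PLS_SAMPLE = """
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="x-sampa" xml:lang="en">
  <lexeme>
    <grapheme>family</grapheme>
    <phoneme pron-style="over-articulated">"f{m@li</phoneme>
    <phoneme pron-style="fluent">"f{mli</phoneme>
  </lexeme>
</lexicon>
"""


def pick_pronunciation(lexicon_xml, word, style):
    """Return the pronunciation of `word` tagged with `style`.

    Falls back to the first listed pronunciation when no variant carries
    the requested style, and returns None for unknown words.
    """
    ns = {"pls": PLS_NS}
    root = ET.fromstring(lexicon_xml)
    for lexeme in root.findall("pls:lexeme", ns):
        graphemes = [g.text for g in lexeme.findall("pls:grapheme", ns)]
        if word not in graphemes:
            continue
        phonemes = lexeme.findall("pls:phoneme", ns)
        for phoneme in phonemes:
            if phoneme.get("pron-style") == style:
                return phoneme.text
        return phonemes[0].text if phonemes else None
    return None


if __name__ == "__main__":
    print(pick_pronunciation(PLS_SAMPLE, "family", "fluent"))
    print(pick_pronunciation(PLS_SAMPLE, "family", "over-articulated"))
```

Falling back to the first listed pronunciation keeps lexemes without the optional attribute usable, which matches the attribute being optional in the proposal. The fallback order itself is a design choice of this sketch.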

In order to reduce the pronunciation lexicon size for implementation in embedded devices, such as the VideoTRAN translating telephone, which is being developed in our laboratory, machine-learning techniques were used to automatically derive lexical stress positions. Several classifiers were trained to determine the lexical stress position based on the phonological values of the graphemes. Methods yielding compact models suitable for embedded implementation were pursued further. The Quinlan C4.5 decision tree turned out to be the most efficient method in terms of a combination of stress-prediction accuracy, computational speed, and memory consumption. In a pronunciation lexicon containing close to 1.3 million inflected word forms, we were able to correctly determine lexical stress position for approximately 91% of the words and thus reduce the lexicon to only 9% of its original size. In a test with unknown words, the stress position was assigned correctly in 80% of the words.
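
The lexicon reduction described above can be sketched as follows: if a trained model predicts the stress position correctly for a word, the word need not be stored, and only the mispredicted words remain as an exception lexicon. The sketch below is a minimal Python illustration under stated assumptions. The toy lexicon, the syllable-index encoding of stress, and the trivial predict_first_syllable function (a stand-in for the trained C4.5 decision tree, which is not reproduced here) are all hypothetical.

```python
def predict_first_syllable(word):
    """Placeholder predictor (assumption): always stress syllable 1.

    In the described system, a trained decision tree would take this role.
    """
    return 1


def build_exception_lexicon(lexicon, predict_stress):
    """Keep only the entries whose stress the predictor gets wrong."""
    return {
        word: stress
        for word, stress in lexicon.items()
        if predict_stress(word) != stress
    }


def stress_position(word, exceptions, predict_stress):
    """Look up the exception lexicon first, then fall back to the model."""
    if word in exceptions:
        return exceptions[word]
    return predict_stress(word)


if __name__ == "__main__":
    # Toy data: word -> index of the stressed syllable (1-based).
    full_lexicon = {"banana": 2, "table": 1, "window": 1, "hotel": 2}

    exceptions = build_exception_lexicon(full_lexicon, predict_first_syllable)
    ratio = len(exceptions) / len(full_lexicon)
    print(f"exceptions kept: {len(exceptions)} of {len(full_lexicon)} ({ratio:.0%})")

    # Every in-lexicon word is still resolved correctly after the reduction.
    for word, stress in full_lexicon.items():
        assert stress_position(word, exceptions, predict_first_syllable) == stress
```

With this scheme, the fraction of entries that must be stored equals the model's error rate on the lexicon, which is why correctly predicting about 91% of the words leaves about 9% of the original entries.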


COST2102