PENTATrainer2: A hypothesis-driven prosody modeling tool
DOI: https://doi.org/10.36505/ExLing-2012/05/0024/000230
Keywords: prosody modeling, analysis-by-synthesis, parallel encoding, target approximation, stochastic optimization
Abstract
Prosody is an essential aspect of speech, as it carries both lexical and non-lexical information. A conventional approach to studying speech prosody is to collect and analyze $F_0$ (fundamental frequency) data according to certain hypotheses and then develop a theory from the observations, which constitutes the final conclusion of the study. This process is, however, incomplete, because the resulting theory has not actually been tested for its ability to predict observed acoustic data.
This paper presents PENTATrainer2, a prosody modeling tool based on the Parallel Encoding and Target Approximation (PENTA) framework. PENTATrainer2 helps prosody researchers test hypotheses and theories through automatic analysis-by-synthesis combined with a stochastic learning algorithm. Users can design an annotation scheme that reflects their own hypotheses and then check whether the hypothesized categories yield accurate synthetic $F_0$ contours.
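The abstract does not spell out the target approximation model or the learning algorithm, so the following is only a minimal sketch of what analysis-by-synthesis for a single syllable could look like. It assumes the quantitative Target Approximation (qTA) model used in earlier PENTA modeling work, in which a linear pitch target $x(t) = mt + b$ is approached by a third-order critically damped system with rate $\lambda$, and it uses plain simulated annealing as a stand-in for PENTATrainer2's stochastic learner. The function names (`qta_contour`, `fit_target`), the units (semitones and seconds), the starting guess, the step sizes and the cooling schedule are all hypothetical choices for illustration, not PENTATrainer2's actual code.

```python
import math
import random


def qta_contour(m, b, lam, f0_init, d1_init, d2_init, times):
    """Synthesize an F0 contour (semitones) with the assumed qTA formulation.

    f0(t) = (m*t + b) + (c1 + c2*t + c3*t**2) * exp(-lam*t), where m, b define
    the linear pitch target, lam is the approximation rate (1/s), and c1..c3
    carry over the initial F0, velocity and acceleration (continuity).
    """
    c1 = f0_init - b
    c2 = d1_init + c1 * lam - m
    c3 = (d2_init + 2 * c2 * lam - c1 * lam ** 2) / 2
    return [(m * t + b) + (c1 + c2 * t + c3 * t * t) * math.exp(-lam * t)
            for t in times]


def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def fit_target(observed, times, f0_init, d1_init=0.0, d2_init=0.0,
               iters=5000, seed=0):
    """Learn (m, b, lam) for one syllable by analysis-by-synthesis.

    Each iteration perturbs the current parameters, re-synthesizes the
    contour and compares it with the observed one. Improvements are always
    kept; worse proposals are sometimes accepted while the 'temperature' is
    high (simulated annealing), which helps escape poor local minima.
    """
    rng = random.Random(seed)
    params = (0.0, f0_init, 20.0)  # hypothetical start: flat target at onset F0
    err = rmse(qta_contour(*params, f0_init, d1_init, d2_init, times), observed)
    best, best_err = params, err
    for k in range(iters):
        temp = 1.0 - k / iters  # linear cooling schedule (an assumption)
        m, b, lam = params
        cand = (m + rng.gauss(0, 10 * temp + 0.1),
                b + rng.gauss(0, 2 * temp + 0.05),
                min(max(lam + rng.gauss(0, 5 * temp + 0.1), 1.0), 80.0))
        cand_err = rmse(qta_contour(*cand, f0_init, d1_init, d2_init, times),
                        observed)
        # Metropolis rule: accept worse candidates with probability exp(-delta/T).
        if cand_err < err or rng.random() < math.exp((err - cand_err) / temp):
            params, err = cand, cand_err
            if err < best_err:
                best, best_err = params, err
    return best, best_err


# Usage: recover a rising target from a contour synthesized with known values.
times = [i * 0.01 for i in range(21)]  # 200 ms syllable, 10 ms frames
observed = qta_contour(10.0, 92.0, 40.0, 88.0, 0.0, 0.0, times)
(m, b, lam), err = fit_target(observed, times, f0_init=88.0)
print(f"m={m:.1f} st/s, b={b:.1f} st, lambda={lam:.1f}/s, RMSE={err:.3f} st")
```

Keeping the best-so-far parameters separate from the current annealing state means the returned fit can never be worse than the starting guess, even though the search itself is allowed to wander uphill early on.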
PENTATrainer2 consists of three main components: multi-layer annotation, target approximation, and stochastic optimization. First, the acoustic data are annotated in parallel layers, each corresponding to a functional category that may affect $F_0$ contours. These layers are then compiled into unique functional combinations, which serve as the invariant underlying representations of the communicative functions and their interactions. Finally, the target approximation parameters of each combination are learned through analysis-by-synthesis and stochastic optimization.
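The compilation step can be sketched as zipping the parallel layers syllable by syllable and indexing each distinct label tuple, so that every syllable carrying the same combination shares one set of target parameters. This is a minimal illustration under stated assumptions: the layer names, the Mandarin tone labels (H, R, L, F), the focus and sentence-type labels, and the function `compile_combinations` are all hypothetical, and PENTATrainer2's own annotation file format is not modeled here.

```python
def compile_combinations(layers):
    """Combine parallel annotation layers into per-syllable functional combinations.

    layers: dict mapping a layer name to a list of labels, one per syllable.
    Returns (combos, index): combos[i] is the label tuple for syllable i, and
    index maps each unique tuple to a parameter-set id in first-seen order, so
    syllables with the same combination share one set of target parameters.
    """
    if not layers:
        raise ValueError("at least one annotation layer is required")
    names = list(layers)
    lengths = {len(labels) for labels in layers.values()}
    if len(lengths) != 1:
        raise ValueError("all layers must annotate the same number of syllables")
    n = lengths.pop()
    combos = [tuple(layers[name][i] for name in names) for i in range(n)]
    index = {}
    for combo in combos:
        index.setdefault(combo, len(index))  # new combination -> next free id
    return combos, index


# Usage: a hypothetical five-syllable Mandarin statement with focus on the second syllable.
layers = {
    "tone": ["H", "R", "H", "F", "H"],
    "focus": ["pre", "on", "post", "post", "post"],
    "sentence_type": ["statement"] * 5,
}
combos, index = compile_combinations(layers)
for i, combo in enumerate(combos):
    print(i, combo, "-> parameter set", index[combo])
```

Here the third and fifth syllables both carry (H, post, statement), so they map to the same parameter set; adding or removing a layer changes which syllables are pooled, which is how an annotation scheme encodes a hypothesis about what drives $F_0$.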
Pilot tests of PENTATrainer2 on Thai, Mandarin, and English show both high accuracy of the synthesized $F_0$ contours and distinctive contrasts in the distributions of the learned pitch target parameters, indicating that PENTATrainer2 is effective for modeling speech prosody.
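The abstract reports high accuracy without naming a metric. Root-mean-square error (RMSE) and Pearson's correlation between original and synthesized $F_0$ are assumed below as two common ways to quantify it; this is a minimal standard-library sketch with a hypothetical function name and made-up example values, not the evaluation procedure used in the pilot tests.

```python
import math


def contour_accuracy(original, synthetic):
    """Return (RMSE, Pearson r) between two equal-length F0 contours."""
    if len(original) != len(synthetic) or len(original) < 2:
        raise ValueError("need two equal-length contours with at least 2 points")
    n = len(original)
    rmse = math.sqrt(sum((o - s) ** 2 for o, s in zip(original, synthetic)) / n)
    mean_o = sum(original) / n
    mean_s = sum(synthetic) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(original, synthetic))
    var_o = sum((o - mean_o) ** 2 for o in original)
    var_s = sum((s - mean_s) ** 2 for s in synthetic)
    if var_o == 0 or var_s == 0:
        raise ValueError("correlation is undefined for a flat contour")
    r = cov / math.sqrt(var_o * var_s)
    return rmse, r


# Usage with made-up values in semitones (one value per analysis frame).
original = [88.0, 89.5, 91.0, 92.0, 92.5]
synthetic = [88.0, 89.0, 91.0, 92.5, 92.5]
err, r = contour_accuracy(original, synthetic)
print(f"RMSE = {err:.3f} st, r = {r:.3f}")
```

The two measures are complementary: RMSE captures absolute deviation in pitch height, while the correlation captures how well the synthetic contour follows the shape of the original regardless of a constant offset.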
License
Articles are published under the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.