A new trainable trajectory formation system for facial animation
DOI:
https://doi.org/10.36505/ExLing-2006/01/0004/000004
Abstract
We propose a new trainable trajectory formation system for facial animation that dissociates the parametric spaces and methods used for movement planning from those used for movement execution. Movement planning is achieved by HMM-based trajectory formation; movement execution is performed by concatenating multi-represented diphones. Planning ensures that the essential visual characteristics of visemes are reached (lip closing for bilabials, rounding and opening for palatal fricatives, etc.) and that appropriate coarticulation is planned. Execution grafts phonetic details and idiosyncratic articulatory strategies (asymmetries, importance of jaw movements, etc.) onto the planned gestural score.
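The split between planning and execution can be illustrated with a minimal sketch. It is not the authors' implementation: the planner below is a deliberately simple stand-in (piecewise-constant phone targets smoothed by a moving average) rather than HMM-based trajectory generation, and the selection criterion (squared distance to the planned trajectory) is an assumption, since the abstract does not state one. The names `TARGETS`, `plan`, `execute`, and `inventory`, and all numeric values, are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical targets for a single articulatory parameter (lip aperture,
# arbitrary units): open for the vowel "a", closed for the bilabial "p".
TARGETS: Dict[str, float] = {"a": 1.0, "p": 0.0}


def plan(phones: List[Tuple[str, int]], window: int = 3) -> List[float]:
    """Planning stand-in: hold each phone's target for its duration in
    frames, then smooth with a centred moving average so that neighbouring
    phones influence each other (a crude proxy for planned coarticulation)."""
    steps = [TARGETS[phone] for phone, dur in phones for _ in range(dur)]
    half = window // 2
    smoothed = []
    for i in range(len(steps)):
        lo, hi = max(0, i - half), min(len(steps), i + half + 1)
        smoothed.append(sum(steps[lo:hi]) / (hi - lo))
    return smoothed


def distance(a: List[float], b: List[float]) -> float:
    """Sum of squared differences between two equal-length trajectories."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def execute(
    planned: List[float],
    segments: List[Tuple[str, int, int]],
    inventory: Dict[str, List[List[float]]],
) -> List[float]:
    """Execution stand-in: for every diphone segment (name, start, end),
    pick the stored instance closest to the planned slice and concatenate.
    Stored instances are assumed to be already time-normalised to the
    segment length."""
    output: List[float] = []
    for name, start, end in segments:
        target = planned[start:end]
        best = min(inventory[name], key=lambda cand: distance(cand, target))
        output.extend(best)
    return output


# Three phones of four frames each: a (0-3), p (4-7), a (8-11).
planned = plan([("a", 4), ("p", 4), ("a", 4)])

# Each diphone is stored several times ("multi-represented").
inventory = {
    "a-p": [[1.0, 0.9, 0.1, 0.0], [1.0, 0.5, 0.5, 0.4]],
    "p-a": [[0.0, 0.1, 0.9, 1.0], [0.4, 0.5, 0.6, 1.0]],
}

# Diphones run from the middle of one phone to the middle of the next.
animated = execute(planned, [("a-p", 2, 6), ("p-a", 6, 10)], inventory)
```

In this toy example the planned trajectory reaches full lip closure during the bilabial, and the execution step picks the stored diphone instances that respect that closure while contributing their own frame-level detail.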
License
Copyright (c) 2006 Oxana Govokhina (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.
Articles are published under the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.