Deep neural networks identify sensitive regions of an acoustic tube

Runhui Song; Johan Sjons; Axel Ekström

doi:10.36505/TheLinguisticProceedings/2025/16/01/023/000683

Authors

Runhui Song, KTH Royal Institute of Technology, Sweden
Johan Sjons, KTH Royal Institute of Technology, Sweden
Axel Ekström, KTH Royal Institute of Technology, Sweden

Keywords:

tube, machine learning, vocal tract, speech production

Abstract

Tube models of the vocal tract have long been central to research in phonetics and speech acoustics. This study applies deep neural networks to derive relationships between perturbations of acoustic tube configurations and the resulting formant frequencies, across tens of thousands of possible vocal tract configurations. The results support the validity of this broader methodological framework and show that the proposed deep neural network pipeline yields highly accurate predictions of the formants generated by a computer simulation of the acoustic properties of a close-to-open tube.
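To make the relationship between tube perturbations and formants concrete, the following is a minimal sketch of a lossless concatenated-tube calculation. It is not the simulation used in the study. It assumes that "close-to-open" means a tube closed at one end (glottis) and ideally open at the other (lips), and it uses an assumed speed of sound of 35000 cm/s, an assumed length of 17.5 cm split into 8 equal sections, and arbitrary illustrative areas. The names `chain_d` and `formants` are hypothetical. Resonances are taken as the zeros of the D element of the chain (ABCD) matrix, which for a uniform tube reduces to the quarter-wavelength formula F_n = (2n - 1)c / (4L).

```python
import math

C = 35000.0  # assumed speed of sound in cm/s


def chain_d(freq, areas, lengths, c=C):
    """D element of the chain matrix of lossless sections, glottis to lips.

    Each section matrix is [[cos, j*sin/A], [j*A*sin, cos]] with the constant
    rho*c factored out, so the product stays of the form [[a, j*b], [j*g, d]]
    with real a, b, g, d.
    """
    k = 2.0 * math.pi * freq / c
    a, b, g, d = 1.0, 0.0, 0.0, 1.0
    for area, length in zip(areas, lengths):
        cs, sn = math.cos(k * length), math.sin(k * length)
        a, b, g, d = (
            a * cs - b * area * sn,
            a * sn / area + b * cs,
            g * cs + d * area * sn,
            d * cs - g * sn / area,
        )
    return d


def formants(areas, lengths, n=3, f_max=5000.0, step=5.0):
    """First n resonances (Hz): zeros of D, found by scanning and bisection."""
    found = []
    f_lo = step
    d_lo = chain_d(f_lo, areas, lengths)
    while f_lo < f_max and len(found) < n:
        f_hi = f_lo + step
        d_hi = chain_d(f_hi, areas, lengths)
        if d_lo != 0.0 and d_lo * d_hi <= 0.0:
            lo, hi, d_left = f_lo, f_hi, d_lo
            for _ in range(40):  # bisection refinement
                mid = 0.5 * (lo + hi)
                d_mid = chain_d(mid, areas, lengths)
                if d_left * d_mid <= 0.0:
                    hi = mid
                else:
                    lo, d_left = mid, d_mid
            found.append(0.5 * (lo + hi))
        f_lo, d_lo = f_hi, d_hi
    return found


# Illustrative configurations (all values are assumptions, not from the study).
N_SECTIONS = 8
LENGTHS = [17.5 / N_SECTIONS] * N_SECTIONS
uniform = [4.0] * N_SECTIONS              # uniform tube, area in cm^2
lip_constricted = [4.0] * 7 + [1.0]       # narrowed at the open end
glottis_constricted = [1.0] + [4.0] * 7   # narrowed at the closed end

for name, areas in [("uniform", uniform),
                    ("lip constriction", lip_constricted),
                    ("glottal constriction", glottis_constricted)]:
    print(name, [round(f, 1) for f in formants(areas, LENGTHS)])
```

For the uniform case the sketch recovers approximately 500, 1500 and 2500 Hz. Narrowing the section at the open end lowers the first formant, while narrowing the section at the closed end raises it, which is the kind of region-dependent sensitivity that a learned model of perturbation-to-formant mappings would need to capture.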

