Deep neural networks identify sensitive regions of an acoustic tube

Authors

  • Runhui Song, KTH Royal Institute of Technology, Sweden
  • Johan Sjons, KTH Royal Institute of Technology, Sweden
  • Axel Ekström, KTH Royal Institute of Technology, Sweden

DOI:

https://doi.org/10.36505/TheLinguisticProceedings/2025/16/01/023/000683

Keywords:

tube, machine learning, vocal tract, speech production

Abstract

Tube models of the vocal tract have long been a central component of research in phonetics and speech acoustics. This study applies deep neural networks to derive relationships between perturbations in acoustic tube configurations and the resulting formant frequencies, across tens of thousands of possible vocal tract configurations. The results support the validity of this broader methodological framework and show that the proposed deep neural network pipeline achieves highly accurate predictions of the formant frequencies generated by a computer simulation of the acoustic properties of a close-to-open tube.
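
To make the relationship between tube perturbations and formant frequencies concrete, the following is a minimal sketch of how formants can be computed for a lossless tube built from equal-length sections, closed at the glottis end and open at the lips. It is not the authors' simulation or their neural network pipeline. The function names (`chain_d`, `formants`), the speed of sound (35000 cm/s), the tube length (17.5 cm), the section areas, and the search grid are all assumptions chosen for illustration.

```python
import math

SPEED_OF_SOUND_CM_S = 35000.0  # assumed speed of sound in cm/s


def chain_d(areas_cm2, section_len_cm, freq_hz):
    """Return the D element of the lossless chain matrix (glottis to lips).

    Each section contributes [[cos kl, j Z sin kl], [j sin kl / Z, cos kl]].
    The factors of j are tracked implicitly, so only real numbers are stored.
    """
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND_CM_S
    cos_kl = math.cos(k * section_len_cm)
    sin_kl = math.sin(k * section_len_cm)
    a, b, c, d = 1.0, 0.0, 0.0, 1.0  # identity matrix
    for area in areas_cm2:  # areas_cm2[0] is the section at the glottis
        z = 1.0 / area  # characteristic impedance; the constant rho*c cancels in D
        a2, b2, c2, d2 = cos_kl, z * sin_kl, sin_kl / z, cos_kl
        a, b, c, d = (a * a2 - b * c2, a * b2 + b * d2,
                      c * a2 + d * c2, d * d2 - c * b2)
    return d


def formants(areas_cm2, tube_len_cm=17.5, n_formants=3,
             step_hz=5.0, f_max_hz=6000.0):
    """Find resonances as zeros of D(f): coarse grid scan, then bisection."""
    section_len = tube_len_cm / len(areas_cm2)
    found = []
    f_lo = 1.0
    d_lo = chain_d(areas_cm2, section_len, f_lo)
    while len(found) < n_formants and f_lo < f_max_hz:
        f_hi = f_lo + step_hz
        d_hi = chain_d(areas_cm2, section_len, f_hi)
        if d_lo * d_hi < 0.0:  # sign change brackets a resonance
            lo, hi, d_left = f_lo, f_hi, d_lo
            for _ in range(50):
                mid = 0.5 * (lo + hi)
                d_mid = chain_d(areas_cm2, section_len, mid)
                if d_left * d_mid <= 0.0:
                    hi = mid
                else:
                    lo, d_left = mid, d_mid
            found.append(0.5 * (lo + hi))
        f_lo, d_lo = f_hi, d_hi
    return found


uniform = [4.0] * 8                 # uniform tube, 4 cm^2 throughout
lips_narrow = [4.0] * 7 + [1.0]     # constriction at the open (lip) end
glottis_narrow = [1.0] + [4.0] * 7  # constriction at the closed (glottis) end
```

Under these assumptions, `formants(uniform)` returns values close to the quarter-wavelength resonances of a uniform tube (about 500, 1500 and 2500 Hz), while narrowing the section at the open end lowers the first formant and narrowing the section at the closed end raises it. Input-output pairs of this kind (section areas in, formant frequencies out) illustrate the sort of data on which a network could be trained and then probed for sensitive regions.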

Published

01-09-2025

How to Cite

Deep neural networks identify sensitive regions of an acoustic tube. (2025). Linguistic Proceedings Series, 16(1), 89-92. https://doi.org/10.36505/TheLinguisticProceedings/2025/16/01/023/000683
