AI vs. human (automatic) speech recognition: silence-replacement paradigm as a diagnostic
DOI:
https://doi.org/10.36505/TheLinguisticProceedings/2025/17/02/002/000688
Keywords:
vowel importance, consonant importance, ASR, English, silence-replacement paradigm
Abstract
This study tests how vowels and consonants contribute to sentence-level word recognition in automatic speech recognition (ASR), using a silence-replacement paradigm modeled on classic human-perception research. I recorded 48 English sentences divided into two sets: 24 with a symmetrical ratio and 24 with an asymmetrical ratio. For each sentence I created two processed versions: CO (consonant-only; vowels replaced by silence) and VO (vowel-only; consonants replaced by silence). I then submitted all stimuli to two state-of-the-art ASR systems, TurboScribe and Whisper, and quantified word recognition as the percentage of original words correctly transcribed. When the material was symmetrical, VO speech outperformed CO speech, mirroring human patterns. However, with asymmetrical material, this advantage reversed dramatically, showing a strong interaction between segment type and stimulus structure.
References
Aldholmi, Y. 2018. Segmental contributions to speech intelligibility in nonconcatenative vs. concatenative languages. Doctoral dissertation, University of Wisconsin–Milwaukee.
Aldholmi, Y., Pycha, A. 2023. Segmental contributions to word recognition in Arabic sentences. Poznan Studies in Contemporary Linguistics, 59(2), 257-287.
Chen, F., Wong, L.L., Wong, E.Y. 2013. Assessing the perceptual contributions of vowels and consonants to Mandarin sentence intelligibility. The Journal of the Acoustical Society of America, 134(2), EL178-EL184.
Chen, F., Wong, M.L., Zhu, S., Wong, L.L. 2015. Relative contributions of vowels and consonants in recognizing isolated Mandarin words. Journal of Phonetics, 52, 26-34.
Cole, R.A., Yan, Y., Mak, B., Fanty, M., Bailey, T. 1996. The contribution of consonants versus vowels to word recognition in fluent speech. In 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings, Vol. 2, 853-856. IEEE.
Cutler, A., Sebastián-Gallés, N., Soler-Vilageliu, O., Van Ooijen, B. 2000. Constraints of vowels and consonants on lexical selection: Cross-linguistic comparisons. Memory & Cognition, 28(5), 746-755.
Fogerty, D., Kewley-Port, D., Humes, L.E. 2012. The relative importance of consonant and vowel segments to the recognition of words and sentences: Effects of age and hearing loss. The Journal of the Acoustical Society of America, 132(3), 1667-1678.
Van Ooijen, B. 1996. Vowel mutability and lexical selection in English: Evidence from a word reconstruction task. Memory & Cognition, 24(5), 573-583.
Yan, Y., Chen, F., Li, J. 2025. An overview of the impacts of vowels and consonants in speech understanding and their applications. npj Acoustics, 1, 1-8.
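Illustrative sketch (not part of the published article): the two steps named in the abstract, replacing a class of segments with silence and scoring recognition as the percentage of original words correctly transcribed, could look roughly like the Python below. Everything specific here is an assumption made for illustration: the function names, the idea that segment boundaries are available as (start, end) times in seconds, and the scoring policy (case-insensitive, punctuation ignored, word order ignored). The study's actual segmentation and scoring rules are not stated in the abstract.

```python
import re
from collections import Counter


def replace_with_silence(samples, sample_rate, intervals):
    """Return a copy of `samples` with the given time intervals zeroed out.

    `intervals` is a list of (start_s, end_s) pairs in seconds. Passing the
    vowel intervals yields a consonant-only (CO) stimulus; passing the
    consonant intervals yields a vowel-only (VO) stimulus.
    Assumption: boundaries come from some prior segmentation step.
    """
    out = list(samples)
    for start_s, end_s in intervals:
        lo = max(0, int(round(start_s * sample_rate)))
        hi = min(len(out), int(round(end_s * sample_rate)))
        for i in range(lo, hi):
            out[i] = 0  # digital silence
    return out


def tokenize(text):
    """Lowercase the text and extract word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def percent_words_correct(reference, hypothesis):
    """Percentage of reference words that also appear in the transcript.

    Assumption: each reference word can be matched at most once, and word
    order is ignored (a multiset intersection of the two token lists).
    """
    ref = Counter(tokenize(reference))
    hyp = Counter(tokenize(hypothesis))
    total = sum(ref.values())
    if total == 0:
        raise ValueError("reference sentence has no words")
    matched = sum((ref & hyp).values())
    return 100.0 * matched / total


if __name__ == "__main__":
    # Hypothetical sentence and transcript, not taken from the study.
    score = percent_words_correct("The boy ran to the store.",
                                  "the boy went to store")
    print(f"{score:.1f}% of original words recognized")
```

An order-insensitive match is the more lenient choice; a stricter scorer would align the transcript to the reference word by word, which generally lowers scores when the ASR output inserts or reorders words.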
License
Articles are published under the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.