Improving intelligibility of time-scale compressed speech for visually impaired and sighted listeners
DOI: https://doi.org/10.36505/TheLinguisticProceedings/2025/17/02/016/000702
Keywords: speech, transformations, intelligibility, visual-impairment, perception
Abstract
Time-scale compression enables faster speech playback but often reduces intelligibility, especially at high compression rates, where non-stationary speech components are distorted. This work investigates how to improve intelligibility for visually impaired and sighted listeners by protecting non-stationary regions during time compression. Using Waveform Similarity Overlap-Add (WSOLA), we propose a protection method that adapts scale factors based on three non-stationarity criteria derived from Root-Mean-Square (RMS) energy and Line Spectrum Frequencies. Experiments with visually impaired and control participants evaluate intelligibility and listener preference across uniform and protected WSOLA variants. Results show that RMS-based protected WSOLA improves intelligibility, while comparisons at equal words-per-minute rates reveal smaller perceptual differences. The findings highlight the importance of preserving transient information for accessible high-speed speech.
References
Choi, D., Kwak, D., Cho, M., Lee, S. 2020. “Nobody speaks that fast!” An empirical study of speech rate in conversational agents for people with vision impairments. CHI Conference on Human Factors in Computing Systems, 1–13.
Kapilow, D., Stylianou, Y., Schroeter, J. 1999. Detection of non-stationarity in speech signals and its application to time-scaling. Proc. 6th European Conference on Speech Communication and Technology, 2307–2310.
Pantalos, P. 2023. Exploration of non-stationary speech protection for highly intelligible time-scale compression (Master’s thesis). University of Crete, Greece.
Sfakianaki, A. 2021. Designing a Modern Greek sentence corpus for audiological and speech technology research. Proc. 14th International Conference on Greek Linguistics (ICGL14), 1119–1129. University of Patras, Greece.
Verhelst, W., Roelands, M. 1993. An overlap-add technique based on waveform similarity (WSOLA) for high quality time-scale modification of speech. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2, 554–557.
License
Articles are published under the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.