Improving intelligibility of synthesized speech in noise with emphasized prosody

Sunil R. Shukla

doi:10.36505/ExLing-2010/03/0041/000161

Authors

Sunil R. Shukla, Department of Electrical and Computer Engineering, Georgia Institute of Technology, USA

Keywords:

TTS, speech synthesis, linear prediction, prosody, noisy speech

Abstract

The performance of current high-quality concatenative text-to-speech (TTS) systems degrades in noisy environments. This paper investigates whether the intelligibility of synthesized speech in noise can be improved by emphasizing the prosody. It also presents a method for emphasizing the prosody of units in existing TTS databases: the circular linear prediction (CLP) model is combined with the constant-pitch transform (CPT) to modify the pitch and duration of concatenative TTS units with little impact on subjective quality. Test utterances generated with this method are compared with reference utterances synthesized by a high-quality TTS engine. The subjective test results show a preference for emphasized prosody in the majority of the test cases.
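
As a rough illustration of what "emphasizing the prosody" of a pitch contour can mean, the sketch below expands a contour's excursions around its mean in the log-F0 domain. This is not the CLP/CPT method described in the abstract, which operates on the speech units themselves. The function name `emphasize_f0`, the `factor` parameter, and the convention that 0 marks an unvoiced frame are all assumptions made for this example.

```python
import math


def emphasize_f0(f0_hz, factor=1.5):
    """Expand a pitch contour's excursions around its log-domain mean.

    f0_hz  : per-frame F0 values in Hz; 0 marks an unvoiced frame
             (an assumed convention, not taken from the paper).
    factor : hypothetical emphasis factor; 1.0 leaves the contour unchanged,
             values above 1.0 widen the pitch range.
    """
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        # Nothing to emphasize in a fully unvoiced contour.
        return list(f0_hz)

    # Mean of log-F0 over voiced frames (log of the geometric mean).
    mean_log = sum(math.log(f) for f in voiced) / len(voiced)

    emphasized = []
    for f in f0_hz:
        if f > 0:
            # Scale the deviation from the mean, then return to Hz.
            emphasized.append(math.exp(mean_log + factor * (math.log(f) - mean_log)))
        else:
            # Unvoiced frames are passed through untouched.
            emphasized.append(0.0)
    return emphasized


contour = [100.0, 0.0, 200.0]
emphasized = emphasize_f0(contour, factor=2.0)
```

Working in the log domain keeps the geometric mean of the voiced frames fixed, so the overall pitch level stays the same while the range widens. A real system would still need a signal-level pitch modification step, such as the one the abstract describes, to impose the new contour on the waveform.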
