Saudi accented Arabic voice bank

Mansour Alghamdi; Fayez Alhargan; Mohamed Alkanhal; Ashraf Alkhairy; Munir Eldesouki; Ammar Alenazi

doi:10.36505/ExLing-2008/02/0003/000062

Authors

Mansour Alghamdi, King Abdulaziz City for Science and Technology, Saudi Arabia
Fayez Alhargan, King Abdulaziz City for Science and Technology, Saudi Arabia
Mohamed Alkanhal, King Abdulaziz City for Science and Technology, Saudi Arabia
Ashraf Alkhairy, King Abdulaziz City for Science and Technology, Saudi Arabia
Munir Eldesouki, King Abdulaziz City for Science and Technology, Saudi Arabia
Ammar Alenazi, King Abdulaziz City for Science and Technology, Saudi Arabia

Keywords:

Arabic speech database Saudi

Abstract

This paper presents an Arabic speech database that represents native Arabic speakers from all the cities of Saudi Arabia. The database is called the Saudi Accented Arabic Voice Bank (SAAVB). Preparing the prompt sheets, selecting the right speakers and transcribing their speech were among the challenges the project team faced, and the procedures used to meet these challenges are highlighted. In the project, 1033 speakers speak Modern Standard Arabic with a Saudi accent. The SAAVB content was analyzed and the results are illustrated. The content was verified internally by the project team and externally by IBM Cairo, and it can be used to train speech engines such as automatic speech recognition and speaker verification systems.

