Arabic character diacritization using DNN
DOI:
https://doi.org/10.36505/ExLing-2018/09/0011/000344
Keywords:
Arabic characters, diacritic signs, feedforward DNN, input features
Abstract
This paper improves the accuracy of automatic Arabic character diacritization using deep neural networks (DNNs). Although diacritic signs represent short vowels and/or indicate gemination of consonants, they are omitted in Modern Standard Arabic (MSA) text. However, most speech processing applications, such as speech synthesis and machine translation, need these marks to convey the right meaning. This work therefore uses a feedforward DNN to improve diacritization accuracy. The results show that more informative, Arabic-specific input features increase the prediction accuracy of diacritic signs.
References
Habash, N., Rambow, O., Roth, R. 2009. MADA+TOKAN: A toolkit for Arabic tokenization, diacritization, morphological disambiguation, POS tagging, stemming and lemmatization. In Proceedings of the 2nd International Conference on Arabic Language Resources and Tools (MEDAR), Cairo, Egypt (Vol. 41, p. 62).
Halabi, N., Wald, M. 2016. Phonetic inventory for an Arabic speech corpus. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Slovenia, 734-738.
Rashwan, M.A., Al-Badrashiny, M.A., Attia, M., Abdou, S.M., Rafea, A. 2011. A stochastic Arabic diacritizer based on a hybrid of factorized and unfactorized textual features. IEEE Transactions on Audio, Speech, and Language Processing, 19(1), 166-175.
Rebai, I., BenAyed, Y. 2015. Text-to-speech synthesis system with Arabic diacritic recognition system. Computer Speech & Language, 34(1), 43-60.
License
Articles are published under the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.