Automatic assignment of labels in Topic Modelling for Russian Corpora
DOI:
https://doi.org/10.36505/ExLing-2016/07/0025/000284
Keywords:
topic modelling, topic labelling, Russian corpora
Abstract
The main goal of this paper was to improve topic modelling algorithms by introducing automatic topic labelling, a procedure that chooses a label for the cluster of words in a topic. Topic modelling is a widely used statistical technique that reveals the internal conceptual organization of text corpora. We chose an unsupervised graph-based method and adapted it for Russian. The proposed algorithm consists of two stages: candidate generation by means of PageRank and morphological filters, followed by candidate ranking. Our topic labelling experiments on a corpus of encyclopedic texts on linguistics have shown the advantages of labelled topic models for NLP applications.
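The two stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the co-occurrence graph, the function names (`build_graph`, `pagerank`, `label_topic`), the hand-written part-of-speech tags and the toy lemmatized documents are all assumptions made for illustration. A real system would presumably take its tags from a morphological analyser for Russian and use a richer ranking stage.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(documents, topic_words):
    """Link two topic words whenever they co-occur in the same document."""
    vocab = set(topic_words)
    graph = defaultdict(set)
    for doc in documents:
        present = sorted(vocab.intersection(doc))
        for a, b in combinations(present, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def pagerank(graph, nodes, damping=0.85, iterations=50):
    """Plain power-iteration PageRank on an undirected, unweighted graph."""
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Mass held by isolated nodes is spread evenly over all nodes.
        dangling = sum(scores[v] for v in nodes if not graph[v])
        new_scores = {}
        for node in nodes:
            incoming = sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            new_scores[node] = (1 - damping) / n + damping * (incoming + dangling / n)
        scores = new_scores
    return scores


def label_topic(documents, topic_words, pos_tags, allowed_pos=("NOUN",)):
    """Return label candidates, best first, after a part-of-speech filter."""
    graph = build_graph(documents, topic_words)
    scores = pagerank(graph, list(topic_words))
    # Morphological filter (assumed here to be a simple POS whitelist).
    candidates = [w for w in topic_words if pos_tags.get(w) in allowed_pos]
    # Candidate ranking (assumed here to be the PageRank score alone).
    return sorted(candidates, key=lambda w: scores[w], reverse=True)


# Toy, already lemmatized documents (hypothetical data).
documents = [
    ["фонема", "звук"],
    ["фонема", "гласный"],
    ["фонема", "согласный"],
    ["гласный", "согласный"],
]
topic_words = ["звук", "фонема", "гласный", "согласный"]
# Hand-assigned tags standing in for a morphological analyser.
pos_tags = {"фонема": "NOUN", "звук": "NOUN", "гласный": "ADJ", "согласный": "ADJ"}

labels = label_topic(documents, topic_words, pos_tags)
print(labels)
```

In this toy graph the word linked to the most other topic words receives the highest score, and the adjectives are discarded by the filter before ranking, so only nouns remain as label candidates.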
References
Aletras N., Stevenson M., Court R. 2014. Labelling Topics using Unsupervised Graph-based Methods. In *Proc. of the 52nd Annual Meeting of the Association for Computational Linguistics*, vol. 2, 631–636, Baltimore, USA.
Daud A., Li J., Zhou L., Muhammad F. 2010. Knowledge discovery through directed probabilistic topic models: a survey. *Frontiers of Computer Science in China* 4, 280–301.
Lau J., Grieser K., Newman D., Baldwin T. 2011. Automatic Labelling of Topic Models. In *Proc. of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies*, vol. 1, 1536–1545, Stroudsburg, USA.
Mei Q., Shen X., Zhai C. 2007. Automatic labeling of multinomial topic models. In *Proc. of the 13th Intern. Conference on Knowledge discovery and data mining*, 490, New York, USA.
Mihalcea R. 2004. TextRank: Bringing Order into Texts. In *Proc. of EMNLP 2004*, 404–411, Barcelona, Spain.
Mitrofanova O.A. 2015. Verojatnostnoje modelirovanije tematiki russkojazychnyh korpusov tekstov s ispol’zovanijem kompjuternogo instrumenta GenSim [Probabilistic topic modeling of the Russian text corpora by means of GenSim toolkit]. In *Trudy mezhdunarodnoj konferencii «Korpusnaja lingvistika – 2015»*, St.-Petersburg, Russia.
License
Articles are published under the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.