Automatic assignment of labels in Topic Modelling for Russian Corpora

Authors

  • Aliya Mirzagitova, Saint Petersburg State University, Russia
  • Olga Mitrofanova, Saint Petersburg State University, Russia

DOI:

https://doi.org/10.36505/ExLing-2016/07/0025/000284

Keywords:

topic modelling, topic labelling, Russian corpora

Abstract

The main goal of this paper was to improve topic modelling algorithms by introducing automatic topic labelling, a procedure that chooses a label for the cluster of words in a topic. Topic modelling is a widely used statistical technique that reveals the internal conceptual organization of text corpora. We have chosen an unsupervised graph-based method and adapted it for Russian. The proposed algorithm consists of two stages: candidate generation by means of PageRank and morphological filters, and candidate ranking. Our topic labelling experiments on a corpus of encyclopedic texts on linguistics have shown the advantages of labelled topic models for NLP applications.
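
The two stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the co-occurrence window, the toy part-of-speech lookup `POS_TAGS` (standing in for a real Russian morphological analyser), the noun-only filter, and the overlap-based ranking score are all assumptions made for illustration, and the example tokens are in English for readability.

```python
from collections import defaultdict


def build_graph(documents, window=2):
    """Undirected co-occurrence graph: link words within `window` tokens."""
    graph = defaultdict(set)
    for tokens in documents:
        for i, word in enumerate(tokens):
            for other in tokens[i + 1 : i + 1 + window]:
                if other != word:
                    graph[word].add(other)
                    graph[other].add(word)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank on an undirected graph."""
    nodes = list(graph)
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_scores = {}
        for node in nodes:
            # Each neighbour passes on an equal share of its current score.
            incoming = sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            new_scores[node] = (1 - damping) / n + damping * incoming
        scores = new_scores
    return scores


def generate_candidates(scores, pos_tags, allowed_pos=("NOUN",), top_k=3):
    """Stage 1: keep the top-scoring words that pass the morphological filter."""
    filtered = [w for w in scores if pos_tags.get(w) in allowed_pos]
    return sorted(filtered, key=lambda w: scores[w], reverse=True)[:top_k]


def rank_candidates(candidates, graph, topic_words):
    """Stage 2 (assumed scoring): prefer candidates linked to more topic words."""
    topic = set(topic_words)
    return sorted(candidates, key=lambda w: len(graph[w] & topic), reverse=True)


# Toy lemmatised corpus and a hypothetical part-of-speech lookup.
DOCUMENTS = [
    ["phonetics", "studies", "sound", "vowel", "consonant"],
    ["phoneme", "distinctive", "sound", "phonetics"],
    ["vowel", "consonant", "phoneme", "phonetics"],
]
POS_TAGS = {
    "phonetics": "NOUN", "sound": "NOUN", "vowel": "NOUN",
    "consonant": "NOUN", "phoneme": "NOUN",
    "studies": "VERB", "distinctive": "ADJ",
}
TOPIC_WORDS = ["phoneme", "sound", "vowel", "consonant"]

graph = build_graph(DOCUMENTS)
scores = pagerank(graph)
candidates = generate_candidates(scores, POS_TAGS)
labels = rank_candidates(candidates, graph, TOPIC_WORDS)
print(labels)
```

For Russian text, the lookup table would be replaced by a lemmatiser and morphological tagger, and the ranking score by whatever measure the full paper specifies; the abstract does not give those details.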

References

Aletras N., Stevenson M., Court R. 2014. Labelling Topics using Unsupervised Graph-based Methods. In *Proc. of the 52nd Annual Meeting of the Association for Computational Linguistics*, vol. 2, 631–636, Baltimore, USA.

Daud A., Li J., Zhou L., Muhammad F. 2010. Knowledge discovery through directed probabilistic topic models: a survey. *Frontiers of Computer Science in China* 4, 280–301.

Lau J., Grieser K., Newman D., Baldwin T. 2011. Automatic Labelling of Topic Models. In *Proc. of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies*, vol. 1, 1536–1545, Stroudsburg, USA.

Mei Q., Shen X., Zhai C. 2007. Automatic labeling of multinomial topic models. In *Proc. of the 13th Intern. Conference on Knowledge discovery and data mining*, 490, New York, USA.

Mihalcea R. 2004. TextRank: Bringing Order into Texts. In *Proc. of EMNLP 2004*, 404–411, Barcelona, Spain.

Mitrofanova O.A. 2015. Verojatnostnoje modelirovanije tematiki russkojazychnyh korpusov tekstov s ispol’zovanijem kompjuternogo instrumenta GenSim [Probabilistic topic modelling of Russian text corpora by means of the GenSim toolkit]. In *Trudy mezhdunarodnoj konferencii «Korpusnaja lingvistika – 2015»*, St.-Petersburg, Russia.

Published

01-01-2016

How to Cite

Automatic assignment of labels in Topic Modelling for Russian Corpora. (2016). Linguistic Proceedings Series, 7(1), 115-118. https://doi.org/10.36505/ExLing-2016/07/0025/000284
