A search of significant phrases for building topic models in text documents

Ożdżyński, P.; Zakrzewska, D.

Abstract

The huge number of documents in digital libraries requires efficient methods for exploring the information they contain. "Topic modeling" is considered one of the most effective of these methods. In contrast to the commonly used approaches based on occurrences of single words, this paper considers building topic models based on phrases. We propose a methodology that creates a set of significant word sequences and thereby limits the search area to phrases that contain them. The methodology is evaluated in experiments on real text datasets, and the results are compared with those obtained using the LDA algorithm.
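The idea of collecting significant word sequences can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not specify; it assumes that "significant" means "occurring in at least a given number of documents" and uses Apriori-style pruning over contiguous word sequences. The names `frequent_sequences`, `min_support`, and `max_len`, as well as the sample documents, are hypothetical.

```python
from collections import Counter


def frequent_sequences(docs, min_support=2, max_len=3):
    """Return contiguous word sequences found in at least `min_support` documents.

    Assumption: whitespace tokenization and document-level support.
    Apriori-style pruning: a sequence of length n is counted only if both of
    its length n-1 sub-sequences were already found to be frequent.
    """
    tokenized = [doc.lower().split() for doc in docs]
    frequent = {}
    previous = None  # frequent sequences of the previous length
    for n in range(1, max_len + 1):
        counts = Counter()
        for tokens in tokenized:
            seen = set()  # count each sequence at most once per document
            for i in range(len(tokens) - n + 1):
                seq = tuple(tokens[i:i + n])
                if previous is not None and (
                    seq[:-1] not in previous or seq[1:] not in previous
                ):
                    continue  # cannot be frequent if a sub-sequence is not
                seen.add(seq)
            counts.update(seen)
        current = {s: c for s, c in counts.items() if c >= min_support}
        if not current:
            break
        frequent.update(current)
        previous = current
    return frequent


# Hypothetical sample documents, for illustration only.
docs = [
    "topic models describe text documents",
    "we build topic models from phrases",
    "frequent phrases describe text documents",
]
result = frequent_sequences(docs, min_support=2, max_len=3)
phrases = sorted(" ".join(seq) for seq in result if len(seq) > 1)
print(phrases)
```

A later step could then restrict topic modeling to phrases that contain one of the returned sequences, which is the search-area reduction the abstract describes.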

Keywords

topic model, frequent sequences, LDA

Publisher

Wydawnictwo Szkoły Głównej Gospodarstwa Wiejskiego w Warszawie

Journal

Information Systems in Management

Year

2016

Volume

Vol. 5, No. 2

Pages

205–214

Physical description

Bibliography: 11 items; figures; tables.

Contributors

author

Ożdżyński P.

Institute of Information Technology, Lodz University of Technology

author

Zakrzewska D.

Institute of Information Technology, Lodz University of Technology

References

[1] Papadimitriou C., Raghavan P., Tamaki H., Vempala S. (2000) Latent Semantic Indexing: A probabilistic analysis, Journal of Computer and System Sciences, Vol. 61 (2), 217–235
[2] Blei D., Ng A., Jordan M. (2003) Latent Dirichlet allocation, Journal of Machine Learning Research, 3, 993–1022
[3] Blei D. (2012) Probabilistic topic models, Communications of the ACM, 55 (4), 77–84
[4] Danilevsky M., Wang C., Desai N., Ren X., Guo J., Han J. (2014) Automatic Construction and Ranking of Topical Keyphrases on Collections of Short Documents, SDM'14
[5] Han J., Pei J., Yin Y., Mao R. (2004) Mining frequent patterns without candidate generation: A frequent-pattern tree approach, Data Min. Knowl. Discov., 8 (1), 53–87
[6] El-Kishky A., Song Y., Wang C., Voss C., Han J. (2014) Scalable Topical Phrase Mining from Text Corpora, Proceedings of the VLDB Endowment, Vol. 8 (3), 305–316
[7] Agrawal R., Srikant R. (1994) Fast algorithms for mining association rules in large databases, Proceedings of the 20th International Conference on Very Large Data Bases (VLDB '94), 487–499, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA
[8] Machine Learning for Language Toolkit http://mallet.cs.umass.edu/
[9] Hamming R.W. (1950) Error detecting and error correcting codes, The Bell System Technical Journal, Vol. 29 (2)
[10] ftp://medir.ohsu.edu/pub/ohsumed
[11] http://www.ai.mit.edu/people/jrennie/20Newsgroups/

Notes

Prepared with funds from the Polish Ministry of Science and Higher Education (MNiSW) under agreement 812/P-DUN/2016 for science dissemination activities.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-a9107ed8-27eb-40bd-bbbd-172dc009835d