Using frequent pattern mining algorithms in text analysis

Ożdżyński, P.; Zakrzewska, D.

Full text

Piotr Ożdżyński, Danuta Zakrzewska USING FREQUENT PATTERN MINING ALGORITHMS IN TEXT ANALYSIS.pdf

Abstract

In text mining, the effectiveness of a method depends on how documents are represented. Representations based on frequent word sequences are used in tasks such as categorization, clustering, and topic modelling. This paper compares several algorithms for finding frequent word sequences: GSP and PrefixSpan, techniques designed for market basket analysis, and a method based on a suffix array. These techniques are compared with a new approach that searches for maximum frequent word sequences in document sets. The algorithms are evaluated by their execution times on the considered test collections.
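To make the notion of a frequent word sequence concrete, the following is a minimal Python sketch, not the method from the paper. It assumes that sequences are contiguous word runs, that support is the number of documents containing a sequence, and that candidates are pruned level by level in the spirit of Apriori and GSP. The function names `frequent_word_sequences` and `maximal_only` and the parameters `min_support` and `max_len` are hypothetical, introduced only for illustration.

```python
from collections import defaultdict


def frequent_word_sequences(docs, min_support, max_len=5):
    """Map each contiguous word sequence (as a tuple) to its document
    frequency, keeping only those found in >= min_support documents."""
    tokenized = [doc.lower().split() for doc in docs]
    frequent = {}
    prev_level = None  # frequent sequences of length n - 1
    for n in range(1, max_len + 1):
        doc_ids = defaultdict(set)
        for doc_id, words in enumerate(tokenized):
            for i in range(len(words) - n + 1):
                gram = tuple(words[i:i + n])
                # Level-wise pruning: a sequence can only be frequent if
                # its prefix and suffix of length n - 1 are both frequent.
                if prev_level is not None and (
                    gram[:-1] not in prev_level or gram[1:] not in prev_level
                ):
                    continue
                doc_ids[gram].add(doc_id)
        level = {g: len(ids) for g, ids in doc_ids.items()
                 if len(ids) >= min_support}
        if not level:
            break  # no frequent sequence of length n, so none longer
        frequent.update(level)
        prev_level = level
    return frequent


def maximal_only(frequent):
    """Keep sequences that are not contained in a longer frequent one."""
    def contains(longer, shorter):
        k = len(shorter)
        return any(longer[i:i + k] == shorter
                   for i in range(len(longer) - k + 1))

    return {s: c for s, c in frequent.items()
            if not any(len(t) > len(s) and contains(t, s) for t in frequent)}


if __name__ == "__main__":
    docs = [
        "frequent pattern mining in text",
        "frequent pattern mining algorithms",
        "text mining",
    ]
    found = frequent_word_sequences(docs, min_support=2)
    for seq, support in sorted(maximal_only(found).items()):
        print(" ".join(seq), support)
```

The sketch recounts every candidate at each length, which is the cost that pattern-growth methods such as PrefixSpan and index structures such as suffix arrays aim to reduce.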

Keywords

GSP, SuffixArray, PrefixSpan, N-Gram, frequent sequences

Publisher

Wydawnictwo Szkoły Głównej Gospodarstwa Wiejskiego w Warszawie

Journal

Information Systems in Management

Year

2017

Volume

Vol. 6, No. 3

Pages

213–222

Physical description

Bibliography: 17 items, figures, charts.

Authors

author

Ożdżyński P.

Institute of Information Technology, Lodz University of Technology

author

Zakrzewska D.

Institute of Information Technology, Lodz University of Technology

References

[1] Manning Ch. D., Raghavan P., Schütze H. (2008) An Introduction to Information Retrieval, Cambridge University Press.
[2] Robertson S., Zaragoza H. (2009) The Probabilistic Relevance Framework: BM25 and Beyond, Found. Trends Inf. Retr., 3(4), 333–389.
[3] Burges Ch. J. C. (1998) A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 2, 121–167.
[4] Zhong N., Li Y., Wu Sh.-T. (2012) Effective Pattern Discovery for Text Mining, IEEE Transactions on Knowledge and Data Engineering, 24(1), 30–44.
[5] Aggarwal Ch. C., Han J. (eds.) (2014) Frequent Pattern Mining, Springer International Publishing Switzerland.
[6] Garcia-Hernández R. A., Martínez-Trinidad J. F., Carrasco-Ochoa J. A. (2010) Finding maximal sequential patterns in text document collections and single documents, Informatica, 34, 93–101.
[7] Ahonen-Myka H. (2002) Discovery of frequent word sequences in text, Proc. the ESF Exploratory Workshop on Pattern Detection and Discovery, London, UK, 180–189.
[8] Ożdżyński P., Zakrzewska D. (2017) Topic Modeling Based on Frequent Sequences Graphs, Świątek J., Tomczak J. M. (eds.), Advances in Systems Science, Advances in Intelligent Systems and Computing 539, Springer International Publishing, 86–97.
[9] Agrawal R., Srikant R. (1994) Fast algorithms for mining association rules in large databases, Proc. the 20th International Conference on Very Large Data Bases, VLDB, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 487–499.
[10] Agrawal R., Srikant R. (1995) Mining sequential patterns, Proc. 1995 Int. Conf. Data Engineering (ICDE ’95), 3–14.
[11] Slimani T., Lazzez A. (2013) Sequential Mining: Patterns and Algorithms Analysis, International Journal of Computer and Electronics Research, 2(5), 639–647.
[12] Pei J., Han J., Mortazavi-Asl B., Pinto H., Chen Q., Dayal U., Hsu M. (2001) PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth, Proc. 2001 Int. Conf. Data Engineering (ICDE ’01), 215–224.
[13] Manber U., Myers G. (1989) Suffix arrays: A new method for on-line string searches, SODA ’90 Proc. the first ACM-SIAM symposium on Discrete algorithms, 319–327.
[14] Ożdżyński P. (2014) Text Document Categorization Based on Word Frequent Sequence Mining, Information Systems Architecture and Technology, Contemporary Approaches to Design and Evaluation of Information Systems, 129–138.
[15] ftp://medir.ohsu.edu/pub/ohsumed
[16] http://www.ai.mit.edu/people/jrennie/20Newsgroups/
[17] Fournier-Viger P., Lin C. W., Gomariz A., Gueniche T., Soltani A., Deng Z., Lam H. T. (2016) The SPMF Open-Source Data Mining Library Version 2, Proc. PKDD 2016 Part III, Springer LNCS 9853, 36–40.

Notes

Record prepared under agreement 509/P-DUN/2018, financed from funds of the Ministry of Science and Higher Education (MNiSW) allocated to science dissemination activities (2018).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-cffeb1d0-9c31-4a8f-914a-520df6de12d5