A study of parallel techniques for dimensionality reduction and its impact on the quality of text processing algorithms

Pietroń, M.; Wielgosz, M.; Karwatowski, M.; Wiatr, K.

Abstract

The presented algorithms employ the Vector Space Model (VSM) and its enhancements, such as TFIDF (Term Frequency-Inverse Document Frequency), with Singular Value Decomposition (SVD). TFIDF was applied to emphasize the important features of documents, and SVD was used to reduce the analysis space. A series of experiments was then conducted, which revealed important properties of the algorithms and their accuracy. Accuracy was estimated in terms of the algorithms' ability to match the human classification of the subject. For unsupervised algorithms, entropy was used as the quality evaluation measure. The combination of VSM, TFIDF, and SVD proved to be the best-performing unsupervised algorithm, with an entropy of 0.16.
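
The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`tfidf`, `reduce_svd`, `clustering_entropy`), the particular TFIDF weighting, and the normalization of entropy by the logarithm of the number of classes are all assumptions chosen for illustration, and the record does not specify which variants the paper uses.

```python
import numpy as np


def tfidf(counts):
    """Turn raw term counts (n_docs x n_terms) into TFIDF weights.

    Assumed variant: length-normalised term frequency times log(N / df).
    """
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[0]
    # Term frequency: each row divided by the document length.
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
    # Document frequency: number of documents containing each term.
    df = (counts > 0).sum(axis=0)
    idf = np.log(n_docs / np.maximum(df, 1))
    return tf * idf


def reduce_svd(matrix, k):
    """Project documents onto the k leading singular directions."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


def clustering_entropy(cluster_ids, labels):
    """Size-weighted entropy of human labels inside each cluster.

    Assumed normalisation: divide by log(number of classes), so that
    0 means pure clusters and 1 means a uniform mix of classes.
    """
    cluster_ids = np.asarray(cluster_ids)
    labels = np.asarray(labels)
    n_classes = len(np.unique(labels))
    total = 0.0
    for cluster in np.unique(cluster_ids):
        members = labels[cluster_ids == cluster]
        _, freq = np.unique(members, return_counts=True)
        p = freq / freq.sum()
        h = -(p * np.log(p)).sum()
        if n_classes > 1:
            h /= np.log(n_classes)
        total += len(members) / len(labels) * h
    return float(total)
```

A typical use would compute `reduce_svd(tfidf(counts), k)` for a small `k`, cluster the reduced document vectors with any unsupervised method, and pass the resulting cluster assignments together with the human-assigned categories to `clustering_entropy`; lower values indicate closer agreement with the human classification.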

Keywords

singular value decomposition, vector space model, TFIDF

Publisher

Wydawnictwo PAK

Journal

Measurement Automation Monitoring

Year

2015

Volume

Vol. 61, No. 7

Pages

352-353

Physical description

Bibliography: 11 items; figures, tables, charts, formulas

Contributors

author

Pietroń M.

pietron@agh.edu.pl

AGH University of Science and Technology, 30 Mickiewicza Ave., 30-059 Krakow, Poland
ACC CYFRONET AGH, 11 Nawojki St., 30-950 Krakow, Poland

author

Wielgosz M.

wielgosz@agh.edu.pl

AGH University of Science and Technology, 30 Mickiewicza Ave., 30-059 Krakow, Poland
ACC CYFRONET AGH, 11 Nawojki St., 30-950 Krakow, Poland

author

Karwatowski M.

mkarwat@agh.edu.pl

AGH University of Science and Technology, 30 Mickiewicza Ave., 30-059 Krakow, Poland
ACC CYFRONET AGH, 11 Nawojki St., 30-950 Krakow, Poland

author

Wiatr K.

AGH University of Science and Technology, 30 Mickiewicza Ave., 30-059 Krakow, Poland
ACC CYFRONET AGH, 11 Nawojki St., 30-950 Krakow, Poland

Bibliography

[1] Russek P., Pietroń M., Żurek D., Janiszewski M., Wiatr K., Jamro E., Wielgosz M.: Implementation of algorithms for fast text search and files comparison. Proceedings of the High Performance Computer Users Conference KU KDM 2013, pp. 83-84. Academic Computer Centre Cyfronet AGH, 2013.
[2] Janiszewski M., Pietroń M., Russek P., Jamro E., Dabrowska-Boruch A., Wiatr K., Wielgosz M., Koryciak S.: Parallel MPI implementation of n-gram algorithm for document comparison. ACACES 2013: the 9th International Summer School on Advanced Computer Architecture and Compilation for High-performance and Embedded Systems, pp. 217-220, 2013.
[3] Interia.pl. http://interia.pl
[4] Niaz Arifin S.M., Dasgupta S.: Examining the role of linguistic knowledge sources in the automatic identification and classification of reviews. Proceedings of the 21st International Computational Linguistics Conference, pp. 611-618, 2006.
[5] Seo J., Ko Y.: Automatic text categorization by unsupervised learning. Proceedings of the 18th International Conference on Computational Linguistics, pp. 453-459, 2000.
[6] Seo J., Ko Y., Park J.: Improving text categorization using the importance of sentences. Information Processing and Management, pp. 65-79, 2004.
[7] Boughanem M., Saad Missen M.M.: Using WordNet's semantic relations for opinion detection in blogs. In Advances in Information Retrieval, vol. 5478, Lecture Notes in Computer Science, pp. 729-733. Springer Berlin, Heidelberg, 2009.
[8] Ghose A. K., Polpinij J.: An ontology-based sentiment classification methodology for online consumer reviews. In Proceedings of the IEEE International Conference on Web Intelligence and Intelligent Agent, pp. 518-524, 2008.
[9] Smith M. D., Durant K. T.: Predicting the political sentiment of web log posts using supervised machine learning techniques coupled with feature selection. In Advances in Web Mining and Web Usage Analysis, vol. 4811, Lecture Notes in Computer Science, pp. 187-206. Springer Berlin, Heidelberg, 2007.
[10] Chunping L., Zhao L.: Ontology based opinion mining for movie reviews. In Knowledge Science, Engineering and Management, vol. 5914, Lecture Notes in Computer Science, pp. 204-214. Springer Berlin, Heidelberg, 2009.
[11] Montoyo A., Balahur A.: A feature dependent method for opinion mining and classification. In Proceedings of the IEEE International Conference on Natural Language Processing and Knowledge Engineering, pp. 1-7.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-0cf09b69-db04-43be-8985-e3f476478fc1