Article title

A study of parallel techniques for dimensionality reduction and its impact on the quality of text processing algorithms

Publication languages
EN
Abstracts
EN
The presented algorithms employ the Vector Space Model (VSM) and its enhancements, such as TFIDF (Term Frequency Inverse Document Frequency), combined with Singular Value Decomposition (SVD). TFIDF was applied to emphasize the important features of documents, and SVD was used to reduce the analysis space. A series of experiments was then conducted, which revealed important properties of the algorithms and their accuracy. The accuracy of the algorithms was estimated in terms of their ability to match the human classification of the subject. For unsupervised algorithms, entropy was used as the quality evaluation measure. The combination of VSM, TFIDF, and SVD proved to be the best-performing unsupervised algorithm, with an entropy of 0.16.
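The abstract names the techniques but gives no formulas, so the following is only a minimal sketch of two of them under stated assumptions: TFIDF is taken to be the common tf × log(N/df) weighting, and the entropy measure is taken to be the size-weighted mean entropy of human labels within each cluster, normalized by the logarithm of the number of classes. The function names `tfidf` and `clustering_entropy` are hypothetical, and the SVD step is omitted because it would normally rely on a linear algebra library.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting (not taken from the paper): tf * log(N / df).
    """
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def clustering_entropy(clusters, labels):
    """Size-weighted mean entropy of human labels per cluster, scaled to [0, 1].

    0 means every cluster contains a single human-assigned class.
    This definition is an assumption; the paper may use another variant.
    """
    n = len(labels)
    classes = set(labels)
    if len(classes) < 2:
        return 0.0
    total = 0.0
    for c in set(clusters):
        members = [lab for cl, lab in zip(clusters, labels) if cl == c]
        counts = Counter(members)
        h = -sum(
            (k / len(members)) * math.log(k / len(members))
            for k in counts.values()
        )
        total += (len(members) / n) * h
    return total / math.log(len(classes))


# Toy data, invented for illustration only.
docs = [
    ["sport", "match", "goal"],
    ["sport", "team", "goal"],
    ["bank", "loan", "rate"],
]
vectors = tfidf(docs)
pure = clustering_entropy([0, 0, 1], ["sport", "sport", "finance"])
mixed = clustering_entropy([0, 0, 1, 1], ["a", "b", "a", "b"])
```

With this definition, lower values indicate clusters that agree more closely with the human classification, which is the sense in which the reported 0.16 is read as a good result.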
Publisher
Year
Pages
352-353
Physical description
Bibliography: 11 items, figures, tables, charts, formulas
Authors
author
  • AGH University of Science and Technology, 30 Mickiewicza Ave., 30-059 Krakow, Poland
  • ACC CYFRONET AGH, 11 Nawojki St., 30-950 Krakow, Poland
author
  • AGH University of Science and Technology, 30 Mickiewicza Ave., 30-059 Krakow, Poland
  • ACC CYFRONET AGH, 11 Nawojki St., 30-950 Krakow, Poland
  • AGH University of Science and Technology, 30 Mickiewicza Ave., 30-059 Krakow, Poland
  • ACC CYFRONET AGH, 11 Nawojki St., 30-950 Krakow, Poland
author
  • AGH University of Science and Technology, 30 Mickiewicza Ave., 30-059 Krakow, Poland
  • ACC CYFRONET AGH, 11 Nawojki St., 30-950 Krakow, Poland
Bibliography
  • [1] Russek P., Pietroń M., Żurek D., Janiszewski M., Wiatr K., Jamro E., Wielgosz M.: Implementation of algorithms for fast text search and files comparison. Proceedings of the High Performance Computer Users Conference KU KDM 2013, pp. 83-84. Academic Computer Centre Cyfronet AGH, 2013.
  • [2] Janiszewski M., Pietroń M., Russek P., Jamro E., Dabrowska-Boruch A., Wiatr K., Wielgosz M., Koryciak S.: Parallel MPI implementation of n-gram algorithm for document comparison. ACACES 2013: the 9th international summer school on Advanced Computer Architecture and Compilation for High-performance and Embedded Systems, pp. 217-220, 2013.
  • [3] Interia.pl. http://interia.pl
  • [4] Niaz Arifin S.M., Dasgupta S.: Examining the role of linguistic knowledge sources in the automatic identification and classification of reviews. Proceedings of the 21st international Computational Linguistics Conference, pp. 611-618, 2006.
  • [5] Seo J., Ko Y.: Automatic text categorization by unsupervised learning. Proceedings of the 18th international conference on computational linguistics, pp. 453-459, 2000.
  • [6] Seo J., Ko Y., Park J.: Improving text categorization using the importance of sentences. Information Processing and Management, pp. 65-79, 2004.
  • [7] Boughanem M., Saad Missen M.M.: Using WordNet’s semantic relations for opinion detection in blogs. In Advances in Information Retrieval, vol. 5478, Lecture Notes in Computer Science, pp. 729-733. Springer Berlin, Heidelberg, 2009.
  • [8] Ghose A. K., Polpinij J.: An ontology-based sentiment classification methodology for online consumer reviews. In Proceedings of the IEEE international conference on Web Intelligence and Intelligent Agent, pp. 518-524, 2008.
  • [9] Smith M. D., Durant K. T.: Predicting the political sentiment of web log posts using supervised machine learning techniques coupled with feature selection. In Advances in Web Mining and Web Usage Analysis, vol. 4811, Lecture Notes in Computer Science, pp. 187-206. Springer Berlin, Heidelberg, 2007.
  • [10] Chunping L., Zhao L.: Ontology based opinion mining for movie reviews. In Knowledge Science, Engineering and Management, vol. 5914, Lecture Notes in Computer Science, pp. 204-214. Springer Berlin, Heidelberg, 2009.
  • [11] Montoyo A., Balahur A.: A feature dependent method for opinion mining and classification. In Proceedings of the IEEE international conference on Natural Language Processing and Knowledge Engineering, pp. 1-7.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-0cf09b69-db04-43be-8985-e3f476478fc1