Analysis of data pre-processing methods for sentiment analysis of reviews

Parlar, Tuba; Ozel, Selma; Song, Fei

doi:10.7494/csci.2019.20.1.3097

Abstract

This study analyzes the effects of data pre-processing methods on sentiment analysis and determines which of these methods, alone and in combination, are effective for English as well as for an agglutinative language such as Turkish. We also address the research question of whether agglutinative and non-agglutinative languages differ in terms of pre-processing methods for sentiment analysis. We find that the performance results for the English reviews are generally higher than those for the Turkish reviews, owing to differences between the two languages in vocabulary, writing style, and the agglutinative nature of Turkish.

Keywords

data pre-processing; feature selection; sentiment analysis; text classification

Publisher

Wydawnictwa AGH

Journal

Computer Science

Year

2019

Volume

Vol. 20 (1)

Pages

123–141

Physical description

Bibliography: 28 items; tables.

Authors

author

Parlar Tuba

tparlar@mku.edu.tr

Mustafa Kemal University, Hatay, Turkiye

author

Ozel Selma

saozel@cu.edu.tr

Çukurova University, Adana, Turkiye

author

Song Fei

fsong@uoguelph.ca

University of Guelph, Ontario, Canada

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-c8123943-cf0e-46d6-acd2-8b42784e4235