Article title

Exploring the use of syntactic dependency features for document-level sentiment classification

Content
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
An automatic analysis of product reviews requires a deep understanding of natural language text by the machine. The limitation of the bag-of-words (BoW) model is that a large amount of word-relation information from the original sentence is lost and word order is ignored. Higher-order n-grams also fail to capture long-range dependency relations and word-order information. To address these issues, syntactic features extracted from dependency relations can be used for machine learning-based document-level sentiment classification. Generalization of the syntactic dependency features and negation handling are used to achieve more accurate classification. Further, to reduce the huge dimensionality of the feature space, feature selection methods based on information gain (IG) and weighted frequency and odds (WFO) are used. A supervised feature weighting scheme called delta term frequency-inverse document frequency (delta TF-IDF) is also employed to boost the importance of discriminative features, exploiting the observed uneven distribution of features between the two classes. Experimental results show the effectiveness of generalized syntactic dependency features over standard features for sentiment classification using the Boolean multinomial naive Bayes (BMNB) classifier.
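As an illustration of the pipeline summarized above, the snippet below is a minimal sketch, not the authors' code: it assumes spaCy's dependency parser as a stand-in for whatever parser the paper relies on, an illustrative generalization scheme (backing a word off to its POS tag) with a simple NOT_ prefix for negation handling, and a common delta TF-IDF formulation (term frequency times the log ratio of smoothed class document frequencies), which may differ in detail from the supervised weighting variant used in the paper.

# Minimal sketch (not the authors' code): generalized dependency-relation
# features with simple negation marking, plus a delta TF-IDF weighter.
# spaCy is used only as a stand-in parser; the feature-naming and
# generalization scheme below is illustrative, not the paper's exact one.
import math
import spacy

nlp = spacy.load("en_core_web_sm")  # requires: python -m spacy download en_core_web_sm

def dependency_features(text):
    """Return dependency-relation features for one review, with POS back-off."""
    doc = nlp(text)
    feats = []
    negated_heads = {tok.head.i for tok in doc if tok.dep_ == "neg"}
    for tok in doc:
        if tok.dep_ in ("punct", "neg") or tok.head.i == tok.i:
            continue  # skip punctuation, the negation marker itself, and the root
        prefix = "NOT_" if tok.head.i in negated_heads else ""
        # fully lexicalized feature: relation(head, dependent)
        feats.append(f"{prefix}{tok.dep_}({tok.head.lemma_},{tok.lemma_})")
        # generalized variants: back one of the two words off to its POS tag
        feats.append(f"{prefix}{tok.dep_}({tok.head.pos_},{tok.lemma_})")
        feats.append(f"{prefix}{tok.dep_}({tok.head.lemma_},{tok.pos_})")
    return feats

def delta_tfidf(tf, df_pos, df_neg, n_pos, n_neg):
    """Delta TF-IDF weights for one document's raw term frequencies `tf`,
    using document frequencies from the positive and negative training sets
    (+1 smoothing avoids division by zero)."""
    return {
        t: c * math.log2(((df_neg.get(t, 0) + 1) * n_pos)
                         / ((df_pos.get(t, 0) + 1) * n_neg))
        for t, c in tf.items()
    }

print(dependency_features("The camera does not produce sharp pictures."))
# e.g. ['NOT_nsubj(produce,camera)', ..., 'NOT_dobj(produce,picture)', ...]

In the approach described in the abstract, such features would additionally be pruned with IG or WFO feature selection and then passed to the BMNB classifier; those steps are not shown here.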
Year
Pages
339–347
Physical description
Bibliography: 30 items, charts, tables
Authors
  • Kongu Engineering College, Perundurai – 638060, Erode, India
  • Kongu Engineering College, Perundurai – 638060, Erode, India
References
  • [1] C.C. Aggarwal, “Opinion mining and sentiment analysis”, Machine Learning for Text, pp. 413‒434, Springer, Cham (2018).
  • [2] Y. Dang, Y. Zhang, and H. Chen, “A lexicon enhanced method for sentiment classification: an experiment on online product reviews”, IEEE Intelligent Systems, 25 (4), 46‒53 (2010).
  • [3] M. Dragoni, S. Poria, and E. Cambria, “OntoSenticNet: A commonsense ontology for sentiment analysis”, IEEE Intelligent Systems, 33 (3), pp. 77‒85 (2018).
  • [4] P. Turney, “Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews”, Proceedings of the 40th annual meeting of the Association for Computational Linguistics (ACL), Philadelphia, 417‒424 (2002).
  • [5] A. Pak and P. Paroubek, “Text representation using dependency tree sub-graphs for sentiment analysis”, Proceedings of the 16th international conference DASFAA workshop, vol. 6637, pp. 323–332, Hong Kong, 2011.
  • [6] B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up? Sentiment classification using machine learning techniques”, Proceedings of the conference on empirical methods in natural language processing (EMNLP), Philadelphia, pp. 79‒86 (2002).
  • [7] R. Xia and C. Zong, “Exploring the use of word relation features for sentiment classification”, Proceedings of the 23rd International Conference on Computational Linguistics, pp. 1336‒1344 (2010).
  • [8] G. Sidorov, F. Velasquez, E. Stamatatos, A. Gelbukh, and L. Chanona-Hernández, “Syntactic dependency-based n-grams as classification features”, Mexican International Conference on Artificial Intelligence, Springer, Berlin, Heidelberg, pp. 1‒11 (2012).
  • [9] G. Sidorov, “Syntactic dependency based n-grams in rule based automatic English as second language grammar correction”, International Journal of Computational Linguistics and Applications, 4 (2), 169‒188 (2013).
  • [10] G. Sidorov, F. Velasquez, E. Stamatatos, A. Gelbukh, and L. Chanona-Hernández, “Syntactic n-grams as machine learning features for natural language processing”, Expert Systems with Applications, 41 (3), 853‒860 (2014).
  • [11] A. Segura-Olivares, A. García, and H. Calvo, “Feature Analysis for paraphrase recognition and textual entailment”, Research in Computing Science, 119‒44 (2013).
  • [12] A. Esuli and F. Sebastiani, “SentiWordNet: a publicly available lexical resource for opinion mining”, Proceedings of Language Resources and Evaluation (2006).
  • [13] L.P. Hung and R. Alfred, “A performance comparison of feature extraction methods for sentiment analysis”, Advanced Topics in Intelligent Information and Database Systems, Springer International Publishing, 2017.
  • [14] S.D. Sarkar and S. Goswami, “Empirical study on filter based feature selection methods for text classification”, Int. J. Comput. Appl., 81 (6), 0975–8887 (2013).
  • [15] A. Sharma and S. Dey, “Performance investigation of feature selection methods and sentiment lexicons for sentiment analysis”, IJCA Special Issue on Advanced Computing and Comm Technologies for HPC Applications, vol. 3, pp. 15–20 (2012).
  • [16] A. Novikov, M. Trofimov, and I. Oseledets, “Exponential machines”, Bull. Pol. Ac.: Tech. 66 (6) (2018).
  • [17] Y. Mejova and P. Srinivasan, “Exploring feature definition and selection for sentiment classifiers”, ICWSM (2011).
  • [18] M. Gamon, “Sentiment classification on customer feedback data: noisy data, large feature vectors, and the role of linguistic analysis”, Proceedings of the 20th international conference on Computational Linguistics, (2004).
  • [19] S. Matsumoto, H. Takamura, and M. Okumura, “Sentiment classification using word sub-sequences and dependency sub-trees”, Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, Berlin, Heidelberg, pp. 301‒311 (2005).
  • [20] T. Wilson, J. Wiebe, and P. Hoffmann, “Recognizing contextual polarity in phrase-level sentiment analysis”, Proceedings of the conference on human language technology and empirical methods in natural language processing (2005).
  • [21] K. Dave, S. Lawrence, and D.M. Pennock, “Mining the peanut gallery: opinion extraction and semantic classification of product reviews”, Proceedings of the 12th international conference on World Wide Web (WWW), Budapest, 519‒528 (2003).
  • [22] V. Ng, S. Dasgupta, and S.M. Arifin, “Examining the role of linguistic knowledge sources in the automatic identification and classification of reviews”, Proceedings of the COLING/ACL on Main conference poster sessions, pp. 611‒618 (2006).
  • [23] J. Wiebe and E. Riloff, “Creating subjective and objective sentence classifiers from unannotated texts”, International Conference on Intelligent Text Processing and Computational Linguistics, Springer, Berlin, Heidelberg, pp. 486‒497 (2005).
  • [24] M. Joshi and C. Penstein-Rosé, “Generalizing dependency features for opinion mining”, Proceedings of the ACL-IJCNLP 2009 conference short papers, pp. 313‒316 (2009).
  • [25] C.D. Manning, P. Raghavan, and H. Schütze, “Introduction to information retrieval”, Cambridge University Press, Cambridge (2008).
  • [26] Y. Yang and J. Pedersen, “A comparative study on feature selection in text categorization”, Proceedings of International Conference of Machine Learning, pp. 412‒420 (1997).
  • [27] T. Parlar, S.A. Özel, and F. Song, “QER: a new feature selection method for sentiment analysis”, Human-centric Computing and Information Sciences, 8 (1), p. 10 (2018).
  • [28] Z.H. Deng, K.H. Luo, and H.L. Yu, “A study of supervised term weighting scheme for sentiment analysis”, Expert Systems with Applications, pp. 3506‒3513 (2014).
  • [29] B. Agarwal and N. Mittal, “Optimal feature selection for sentiment analysis”, CICLing. 7817 (1), pp. 13–24 (2013).
  • [30] J. Blitzer, M. Dredze, and F. Pereira, “Biographies, Bollywood, boom-boxes and blenders: domain adaptation for sentiment classification”, ACL (2007).
Notes
Record created under agreement no. 509/P-DUN/2018 with funds of the Ministry of Science and Higher Education (MNiSW) allocated to science dissemination activities (2019).
Document type
YADDA identifier
bwmeta1.element.baztech-7e5eb7da-1fc0-446c-965b-44168f18479f