Cloud-based sentiment analysis for measuring customer satisfaction in the Moroccan banking sector using Naïve Bayes and Stanford NLP

Riadsolh, Anouar; Lasri, Imane; ElBelkacemi, Mourad

doi:10.14313/JAMRIS/4-2020/47

Article - details

Abstract

In a world that produces 2.5 quintillion bytes of data every day, sentiment analysis has become key to making sense of that data. However, processing huge volumes of text in real time requires a data processing pipeline that minimizes the latency of handling data streams. In this paper, we explain and evaluate our proposed real-time customer sentiment analysis pipeline for the Moroccan banking sector, which draws on data from the web and social networks and uses open-source big data tools: Apache Kafka for data ingestion, Apache Spark for in-memory data processing, Apache HBase for storing tweets and the satisfaction indicator, ElasticSearch and Kibana for visualization, and NodeJS for building a web application. The performance evaluation of the Naïve Bayes model shows that accuracy reached 76.19% for French tweets, while the result for English tweets was unsatisfactory, with an accuracy of 56%. To remedy this problem, we used Stanford CoreNLP, which reaches a precision of 80.7% for English tweets.
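As a rough illustration of the classification step mentioned in the abstract, the sketch below trains a multinomial Naïve Bayes classifier with Laplace smoothing on a handful of labeled sentences and reports accuracy on a held-out pair. It is not the authors' implementation: the class name `NaiveBayes`, the helper functions, and the toy sentences are all assumptions made for this example, and the record does not describe the paper's preprocessing, features, or Spark integration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase whitespace tokenization; real tweets would need more cleaning.
    return text.lower().split()


class NaiveBayes:
    """Hypothetical minimal multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # skip tokens never seen in training
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    # Fraction of examples whose predicted label matches the true label.
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Toy data invented for this sketch (not taken from the paper).
train_texts = [
    "great service fast transfer",
    "helpful staff great app",
    "terrible fees slow support",
    "slow app terrible service",
]
train_labels = ["positive", "positive", "negative", "negative"]
test_texts = ["great helpful staff", "terrible slow fees"]
test_labels = ["positive", "negative"]

model = NaiveBayes().fit(train_texts, train_labels)
test_accuracy = accuracy(model, test_texts, test_labels)
```

In a streaming setting such as the one the abstract outlines, a model like this would typically be trained offline and then applied to each incoming message; how the authors wired the classifier into their Kafka and Spark stages is not stated in this record.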

Keywords

Big Data processing; Apache Spark; Apache Kafka; real-time text processing; sentiment analysis; Stanford CoreNLP; Naïve Bayes classifier

Publisher

Łukasiewicz Industrial Research Institute for Automation and Measurements PIAP

Journal

Journal of Automation, Mobile Robotics and Intelligent Systems

Year

2020

Volume

Vol. 14, No. 4

Pages

64–71

Physical description

Bibliography: 22 items, figures.

Authors

author

Riadsolh Anouar

anouarriadsolh@yahoo.fr

Laboratory of Conception and Systems, Faculty of Sciences, Mohammed V University in Rabat, Morocco

author

Lasri Imane

imanelasri95@gmail.com

Laboratory of Conception and Systems, Faculty of Sciences, Mohammed V University in Rabat, Morocco

author

ElBelkacemi Mourad

mourad_prof@yahoo.fr

Laboratory of Conception and Systems, Faculty of Sciences, Mohammed V University in Rabat, Morocco

Bibliography

[1] F. A. Pozzi, E. Fersini, E. Messina and B. Liu (eds.), Sentiment analysis in social networks, Morgan Kaufmann, 2017, DOI: 10.1016/C2015-0-01864-0.
[2] C. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. Bethard and D. McClosky, “The Stanford CoreNLP Natural Language Processing Toolkit”. In: Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2014, 55–60, DOI: 10.3115/v1/P14-5010.
[3] A. Riadsolh and M. E. Belkacemi, “Toward a Good Decision to Improve the Weight of Control of Expenditure for Local Communities”, American Journal of Applied Sciences, vol. 13, no. 3, 2016, 299–306, DOI: 10.3844/ajassp.2016.299.306.
[4] K. Ming Leung, “Naive Bayesian Classifier”, Polytechnic University, 2007, https://cse.engineering.nyu.edu/~mleung/FRE7851/f07/naiveBayesianClassifier.pdf. Accessed on: 2021-02-05.
[5] “Apache Kafka: a distributed streaming platform”, kafka.apache.org. Accessed on: 2021-02-05.
[6] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker and I. Stoica, “Spark: Cluster Computing with Working Sets”. In: Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing (HotCloud’10), 2010.
[7] “Apache HBase ™ Home”, https://hbase.apache.org. Accessed on: 2021-02-05.
[8] A. G. Shoro and T. R. Soomro, “Big Data Analysis: Apache Spark Perspective”, Global Journal of Computer Science and Technology, 2015.
[9] J. Bollen, H. Mao and A. Pepe, “Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena”. In: ICWSM, vol. 11, 2011, 450–453.
[10] L. Dey, S. Chakraborty, A. Biswas, B. Bose and S. Tiwari, “Sentiment Analysis of Review Datasets Using Naïve Bayes’ and K-NN Classifier”, International Journal of Information Engineering and Electronic Business (IJIEEB), vol. 8, no. 4, 2016, DOI: 10.5815/ijieeb.2016.04.07.
[11] “Welcome to Apache Flume — Apache Flume”, http://flume.apache.org/. Accessed on: 2021-02-05.
[12] A. Videla and J. J. W. Williams, RabbitMQ in Action, Manning, 2012.
[13] “Apache ZooKeeper”, https://zookeeper.apache.org. Accessed on: 2021-02-05.
[14] O. O’Malley, “Terabyte sort on apache hadoop”, 2008, sortbenchmark.org/YahooHadoop.pdf. Accessed on: 2021-02-05.
[15] A. Toshniwal, S. Taneja, A. Shukla, K. Ramasamy, J. M. Patel, S. Kulkarni, J. Jackson, K. Gade, M. Fu, J. Donham, N. Bhagat, S. Mittal and D. Ryaboy, “Storm@twitter”. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, 2014, 147–156, DOI: 10.1145/2588555.2595641.
[16] M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker and I. Stoica, “Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing”. In: Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation, 2012.
[17] H. Jing, E. Haihong, L. Guan, and D. Jian, “Survey on NoSQL database”. In: 2011 6th International Conference on Pervasive Computing and Applications, 2011, 363–366, DOI: 10.1109/ICPCA.2011.6106531.
[18] K. Chodorow, “Introduction to MongoDB”, https://archive.fosdem.org/2010/schedule/events/nosql_mongodb_intro.html. Accessed on: 2021-02-05.
[19] “Apache Cassandra”, https://cassandra.apache.org/. Accessed on: 2021-02-05.
[20] K. Shvachko, H. Kuang, S. Radia and R. Chansler, “The Hadoop Distributed File System”. In: Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), 2010, 1–10, DOI: 10.1109/MSST.2010.5496972.
[21] J.-M. Spaggiari and K. O’Dell, Architecting HBase Applications: a Guidebook for Successful Development and Design, O’Reilly Media, 2016.
[22] B. Pang and L. Lee, “Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales”. In: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, 2005, 115–124, DOI: 10.3115/1219840.1219855.

Notes

Record prepared with funds from MNiSW (Polish Ministry of Science and Higher Education), agreement No. 461252, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science) - module: Popularisation of science and promotion of sport (2021).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-9ae46ec4-0ffb-4c73-82fb-e522d258c778