Article title

Cloud-based sentiment analysis for measuring customer satisfaction in the Moroccan banking sector using Naïve Bayes and Stanford NLP

Publication languages
EN
Abstracts
EN
In a world where we produce 2.5 quintillion bytes of data every day, sentiment analysis has become key to making sense of that data. However, processing large volumes of text in real time requires a data-processing pipeline that minimizes the latency of handling data streams. In this paper, we present and evaluate our proposed real-time customer sentiment analysis pipeline for the Moroccan banking sector, fed with data from the web and social networks and built on open-source big data tools: Apache Kafka for data ingestion, Apache Spark for in-memory data processing, Apache HBase for storing tweets and the satisfaction indicator, Elasticsearch and Kibana for visualization, and Node.js for building the web application. The performance evaluation of the Naïve Bayes model shows an accuracy of 76.19% for French tweets, while for English tweets the result was unsatisfactory, with an accuracy of 56%. To remedy this, we used Stanford CoreNLP, which reaches a precision of 80.7% on English tweets.
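The abstract does not describe how the Naïve Bayes model was implemented. As an illustration only, here is a minimal multinomial Naïve Bayes sentiment classifier with add-one (Laplace) smoothing in plain Python, without the Kafka/Spark layers of the pipeline. The names `tokenize`, `NaiveBayesSentiment`, `accuracy` and `precision`, as well as the toy training sentences, are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and keep runs of word characters.
    # A production pipeline would also handle hashtags, mentions and emojis.
    return re.findall(r"\w+", text.lower())


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_c in self.class_counts.items():
            # Work in log space: log P(c) + sum of log P(w | c) over tokens.
            score = math.log(n_c / n_docs)
            for w in tokenize(text):
                if w not in self.vocab:
                    continue  # skip tokens never seen during training
                count = self.word_counts[label][w]
                score += math.log((count + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(y_true, y_pred):
    # Share of all examples whose predicted label matches the true label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, label):
    # Among examples predicted as `label`, the share that truly are `label`.
    hits = [t for t, p in zip(y_true, y_pred) if p == label]
    return sum(t == label for t in hits) / len(hits) if hits else 0.0


# Toy, made-up training data (not from the paper's dataset).
train_texts = [
    "great service fast and friendly",
    "very happy with my bank card",
    "terrible service long queue",
    "app crashed again very bad",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
test_texts = ["friendly and fast service", "bad app long queue"]
test_labels = ["positive", "negative"]
preds = [model.predict(t) for t in test_texts]
print(preds, accuracy(test_labels, preds))  # ['positive', 'negative'] 1.0
```

Accuracy counts the share of all test tweets that are labelled correctly, whereas precision counts, among the tweets predicted as a given class, the share that truly belong to it; the accuracy and precision figures reported above therefore measure different things.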
Authors
  • Laboratory of Conception and Systems, Faculty of Sciences, Mohammed V University in Rabat, Morocco
author
  • Laboratory of Conception and Systems, Faculty of Sciences, Mohammed V University in Rabat, Morocco
  • Laboratory of Conception and Systems, Faculty of Sciences, Mohammed V University in Rabat, Morocco
Bibliography
  • [1] F. A. Pozzi, E. Fersini, E. Messina and B. Liu (eds.), Sentiment Analysis in Social Networks, Morgan Kaufmann, 2017, DOI: 10.1016/C2015-0-01864-0.
  • [2] C. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. Bethard and D. McClosky, “The Stanford CoreNLP Natural Language Processing Toolkit”. In: Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2014, 55–60, DOI: 10.3115/v1/P14-5010.
  • [3] A. Riadsolh and M. E. Belkacemi, “Toward a Good Decision to Improve the Weight of Control of Expenditure for Local Communities”, American Journal of Applied Sciences, vol. 13, no. 3, 2016, 299–306, DOI: 10.3844/ajassp.2016.299.306.
  • [4] K. Ming Leung, “Naive Bayesian Classifier”, Polytechnic University, 2007, https://cse.engineering.nyu.edu/~mleung/FRE7851/f07/naiveBayesianClassifier.pdf. Accessed on: 2021-02-05.
  • [5] “Apache Kafka: a distributed streaming platform”, https://kafka.apache.org. Accessed on: 2021-02-05.
  • [6] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker and I. Stoica, “Spark: Cluster Computing with Working Sets”. In: Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing (HotCloud’10), 2010.
  • [7] “Apache HBase™ Home”, https://hbase.apache.org. Accessed on: 2021-02-05.
  • [8] A. G. Shoro and T. R. Soomro, “Big Data Analysis: Apache Spark Perspective”, Global Journal of Computer Science and Technology, 2015.
  • [9] J. Bollen, H. Mao and A. Pepe, “Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena”. In: ICWSM, vol. 11, 2011, 450–453.
  • [10] L. Dey, S. Chakraborty, A. Biswas, B. Bose and S. Tiwari, “Sentiment Analysis of Review Datasets Using Naïve Bayes’ and K-NN Classifier”, International Journal of Information Engineering and Electronic Business (IJIEEB), vol. 8, no. 4, 2016, DOI: 10.5815/ijieeb.2016.04.07.
  • [11] “Welcome to Apache Flume — Apache Flume”, http://flume.apache.org/. Accessed on: 2021-02-05.
  • [12] A. Videla and J. J. W. Williams, RabbitMQ in Action, Manning, 2012.
  • [13] “Apache ZooKeeper”, https://zookeeper.apache.org. Accessed on: 2021-02-05.
  • [14] O. O’Malley, “Terabyte Sort on Apache Hadoop”, 2008, sortbenchmark.org/YahooHadoop.pdf. Accessed on: 2021-02-05.
  • [15] A. Toshniwal, S. Taneja, A. Shukla, K. Ramasamy, J. M. Patel, S. Kulkarni, J. Jackson, K. Gade, M. Fu, J. Donham, N. Bhagat, S. Mittal and D. Ryaboy, “Storm@twitter”. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, 2014, 147–156, DOI: 10.1145/2588555.2595641.
  • [16] M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker and I. Stoica, “Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing”. In: Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation, 2012.
  • [17] H. Jing, E. Haihong, L. Guan, and D. Jian, “Survey on NoSQL database”. In: 2011 6th International Conference on Pervasive Computing and Applications, 2011, 363–366, DOI: 10.1109/ICPCA.2011.6106531.
  • [18] K. Chodorow, “Introduction to MongoDB”, https://archive.fosdem.org/2010/schedule/events/nosql_mongodb_intro.html. Accessed on: 2021-02-05.
  • [19] “Apache Cassandra”, https://cassandra.apache.org/. Accessed on: 2021-02-05.
  • [20] K. Shvachko, H. Kuang, S. Radia and R. Chansler, “The Hadoop Distributed File System”. In: Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), 2010, 1–10, DOI: 10.1109/MSST.2010.5496972.
  • [21] J.-M. Spaggiari and K. O’Dell, Architecting HBase Applications: a Guidebook for Successful Development and Design, O’Reilly Media, 2016.
  • [22] B. Pang and L. Lee, “Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales”. In: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, 2005, 115–124, DOI: 10.3115/1219840.1219855.
Notes
Record prepared with funds from the Polish Ministry of Science and Higher Education (MNiSW), agreement No. 461252, under the programme "Social Responsibility of Science", module: Popularisation of science and promotion of sport (2021).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-9ae46ec4-0ffb-4c73-82fb-e522d258c778