Article title

A contemporary multi-objective feature selection model for depression detection using a hybrid pBGSK optimization algorithm

Publication languages
EN
Abstracts
EN
Depression is one of the primary causes of mental illness worldwide and an underlying reason for suicide. The user-generated text content available on social media forums offers an opportunity to build automatic and reliable depression detection models. The core objective of this work is to select an optimal set of features that may help in classifying depressive content posted on social media. To this end, a novel multi-objective feature selection technique (EFS-pBGSK) and machine learning algorithms are employed to train the proposed model. The feature selection technique incorporates a binary gaining-sharing knowledge-based optimization algorithm with population reduction (pBGSK) to obtain the optimized features from the original feature space. The extensive feature selector (EFS) is used to filter out excess features based on their ranking. Two text depression datasets collected from Twitter and Reddit forums are used to evaluate the proposed feature selection model. The experiments are carried out using naive Bayes (NB) and support vector machine (SVM) classifiers for five different feature subset sizes (10, 50, 100, 300 and 500). The experimental results indicate that the proposed model can achieve superior performance scores. The top results are obtained using the SVM classifier on the SDD dataset, with 0.962 accuracy, 0.929 F1 score, 0.0809 log-loss and 0.0717 mean absolute error (MAE). Overall, the optimal combination of features selected by the proposed hybrid model significantly improves the performance of the depression detection system.
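The four scores reported in the abstract (accuracy, F1 score, log-loss and MAE) can be illustrated with a minimal sketch for a binary depressed / not-depressed task. This is not the authors' evaluation code: the function names, the toy labels and the 0.5 decision threshold are hypothetical, and computing MAE between the true labels and the predicted probabilities is an assumption, since the abstract does not say how MAE was obtained.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of posts whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true labels (binary case)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


def mean_absolute_error(y_true, y_prob):
    """Assumption: MAE between true labels and predicted probabilities."""
    return sum(abs(t - p) for t, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy data: 1 = depressive post, 0 = non-depressive post.
y_true = [1, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.8, 0.1]             # classifier's P(label = 1)
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 threshold

scores = {
    "accuracy": accuracy(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    "log_loss": log_loss(y_true, y_prob),
    "mae": mean_absolute_error(y_true, y_prob),
}
print(scores)
```

Accuracy and F1 are computed from hard labels, while log-loss and (under the assumption above) MAE use the predicted probabilities, so the latter two also penalize predictions that are correct but poorly calibrated.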
Year
Pages
117-131
Physical description
Bibliography: 44 items, figures, tables, charts.
Authors
  • Department of Computer Science and Engineering, Mepco Schlenk Engineering College (Autonomous), Sivakasi, 626005, Tamil Nadu, India
  • Department of Computer Science and Engineering, Mepco Schlenk Engineering College (Autonomous), Sivakasi, 626005, Tamil Nadu, India
Bibliography
  • [1] Agrawal, P., Ganesh, T. and Mohamed, A. (2021). A novel binary gaining-sharing knowledge-based optimization algorithm for feature selection, Neural Computing and Applications 33: 5989-6008.
  • [2] Asim, M., Wasim, M., Sajid Ali, M. and Rehman, A. (2017). Comparison of feature selection methods in text classification on highly skewed datasets, 2017 1st International Conference on Latest trends in Electrical Engineering and Computing Technologies (INTELLECT), Karachi, Pakistan, pp. 1-8.
  • [3] Babu, N. and Kanaga, E. (2022). Sentiment analysis in social media data for depression detection using artificial intelligence: A review, SN Computer Science 3: 74.
  • [4] Burdisso, S., Errecalde, M. and Montes, M. (2019). A text classification framework for simple and effective early depression detection over social media streams, Expert Systems with Applications 133: 182-197.
  • [5] Chen, J., Huang, H., Tian, S. and Qu, Y. (2009). Feature selection for text classification with naïve Bayes, Expert Systems with Applications 36(3): 5432-5435.
  • [6] Chiong, R., Satia Budhi, G., Dhakal, S. and Chiong, F. (2021). A textual-based featuring approach for depression detection using machine learning classifiers and social media texts, Computers in Biology and Medicine 135: 104499.
  • [7] Deng, X., Li, Y., Weng, J. and Zhang, J. (2019). Feature selection for text classification: A review, Multimedia Tools and Applications 78: 3797-3816.
  • [8] Derek, A. and David, M. (2020). Support vector machine, in A. Mechelli and S. Vieira (Eds), Machine Learning, Academic Press, Chicago, pp. 101-121.
  • [9] Ding, Y., Chen, X., Fu, Q. and Zhong, S. (2020). A depression recognition method for college students using deep integrated support vector algorithm, IEEE Access 8: 75616-75629.
  • [10] Durgalakshmi, B. and Vijayakumar, V. (2020). Feature selection and classification using support vector machine and decision tree, Computational Intelligence 36: 1480-1492.
  • [11] Emary, E., Zawbaa, H. and Aboul Ella, H. (2016a). Binary ant lion approaches for feature selection, Neurocomputing 213: 54-65.
  • [12] Emary, E., Zawbaa, H.M. and Hassanien, A.E. (2016b). Binary grey wolf optimization approaches for feature selection, Neurocomputing 172: 371-381.
  • [13] Friedrich, M. (2017). Depression is the leading cause of disability around the world, Journal of the American Medical Association (JAMA) 15: 1517.
  • [14] Gao, Z., Xu, Y., Meng, F., Qi, F. and Lin, Z. (2014). Improved information gain-based feature selection for text categorization, 4th International Conference on Wireless Communication, VITAE, Aalborg, Denmark, pp. 1-5.
  • [15] Hayyolalam, V. and Kazem, A. (2020). Black widow optimization algorithm: A novel meta-heuristic approach for solving engineering optimization problems, Engineering Applications of Artificial Intelligence 87: 103249.
  • [16] Hussain, J., Satti, F., Afzal, M., Khan, W., Bilal, H., Ansaar, Z., Ahmad, H., Hur, T., Bang, J., Kim, J., Park, G., Seung, H. and Lee, S. (2019). Exploring the dominant features of social media for depression detection, Journal of Information Science 46(6): 739-759.
  • [17] Husseini Orabi, A., Buddhitha, P., Husseini Orabi, M.M. and Inkpen, D. (2018). Deep learning for depression detection of twitter users, Proceedings of the 5th Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic, New Orleans, USA, pp. 88-97.
  • [18] Hussien, A.G., Oliva, D., Houssein, E.H., Juan, A.A. and Yu, X. (2020). Binary whale optimization algorithm for dimensionality reduction, Mathematics 8(10): 1821.
  • [19] Islam, M., Kabir, M., Ahmed, A., Kamal, A., Wang, H. and Ulhaq, A. (2018). Depression detection from social network data using machine learning techniques, Health Information Science and Systems 6(1): 8.
  • [20] Kowal, M., Skobel, M. and Nowicki, N. (2018). The feature selection problem in computer-assisted cytology, International Journal of Applied Mathematics and Computer Science 28(4): 759-770, DOI: 10.2478/amcs-2018-0058.
  • [21] Li, B., Yan, Q., Xu, Z. and Wang, G. (2015). Weighted document frequency for feature selection in text classification, International Conference on Asian Language Processing (IALP), Suzhou, China, pp. 132-135.
  • [22] Mohamed, A., Hadi, A. and Mohamed, A. (2020). Gaining-sharing knowledge based algorithm for solving optimization problems: A novel nature-inspired algorithm, International Journal of Machine Learning and Cybernetics 11: 1501-1529.
  • [23] Moorthy, U. and Gandhi, U. (2019). Forest optimization algorithm-based feature selection using classifier ensemble, Computational Intelligence 36(4): 1445-1462.
  • [24] Moradi, P. and Gholampour, M. (2016). A hybrid particle swarm optimization for feature subset selection by integrating a novel local search strategy, Applied Soft Computing 43: 117-130.
  • [25] Parlak, B. and Uysal, A. (2021). A novel filter feature selection method for text classification: Extensive feature selector, Journal of Information Science 49(1): 59-78.
  • [26] Peng, H., Long, F. and Ding, C. (2005). Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy, IEEE Transactions on Pattern Analysis and Machine Intelligence 27(8): 1226-1238.
  • [27] Połap, D. and Woźniak, M. (2021). Red fox optimization algorithm, Expert Systems with Applications 166: 114107.
  • [28] Prachi, A., Abutarboush, H., Ganesh, T. and Mohamed, A. (2021). Metaheuristic algorithms on feature selection: A survey of one decade of research (2009-2019), IEEE Access 9: 26766-26791.
  • [29] Rajalakshmi, R. and Aravindan, C. (2018). A naive Bayes approach for URL classification with supervised feature selection and rejection framework: NB for URL classification with FS and RF, Computational Intelligence 34(1): 363-396.
  • [30] Rao, R. (2016). Jaya: A simple and new optimization algorithm for solving constrained and unconstrained optimization problems, International Journal of Industrial Engineering Computations 7: 19-34.
  • [31] Rehman, A., Javed, K. and Babri, H. (2017). Feature selection based on a normalized difference measure for text classification, Information Processing and Management 53(2): 473-489.
  • [32] Sanasam, R., Murthy, H. and Gonsalves, T. (2010). Feature selection for text classification based on Gini coefficient of inequality, Proceedings of Machine Learning Research 10: 76-85.
  • [33] Shen, J. and Rudzicz, F. (2017). Detecting anxiety through Reddit, Proceedings of the 4th Workshop on Computational Linguistics and Clinical Psychology-From Linguistic Signal to Clinical Reality, Vancouver, Canada, pp. 58-65.
  • [34] Suthaharan, S. (2016). Machine Learning Models and Algorithms for Big Data Classification, Springer, Boston, chapter “Support vector machine”, pp. 207-235.
  • [35] Tadesse, M., Lin, H., Xu, B. and Yang, L. (2019). Detection of depression-related posts in Reddit social media forum, IEEE Access 7: 44883-44893.
  • [36] Thirumoorthy, K. and Muneeswaran, K. (2020). Optimal feature subset selection using hybrid binary Jaya optimization algorithm for text classification, Sādhanā 45(201).
  • [37] Thorstad, R. and Wolff, P. (2019). Predicting future mental illness from social media: A big-data approach, Behavior Research Methods 51: 1586-1600.
  • [38] Trotzek, M., Koitka, S. and Friedrich, C. (2018). Utilizing neural networks and linguistic metadata for early detection of depression indications in text sequences, IEEE Transactions on Knowledge and Data Engineering 32(3): 588-601.
  • [39] Unler, A., Murat, A. and Chinnam, R. (2011). MR2PSO: A maximum relevance minimum redundancy feature selection method based on swarm intelligence for support vector machine classification, Information Sciences 181(20): 4625-4641.
  • [40] Uysal, A. (2018). On two-stage feature selection methods for text classification, IEEE Access 6: 43233-43251.
  • [41] Wang, W., Chen, X., Musial, J. and Blazewicz, J. (2020). Two meta-heuristic algorithms for scheduling on unrelated machines with the late work criterion, International Journal of Applied Mathematics and Computer Science 30(3): 573-584, DOI: 10.34768/amcs-2020-0042.
  • [42] William, D. and Suhartono, D. (2021). Text-based depression detection on social media posts: A systematic literature review, Procedia Computer Science 179: 582-589.
  • [43] Xue, B., Zhang, M., Browne, W. and Yao, X. (2016). A survey on evolutionary computation approaches to feature selection, IEEE Transactions on Evolutionary Computation 20(4): 606-626.
  • [44] Zhu, X., Wang, Y., Li, Y., Tan, Y., Wang, G. and Song, Q. (2019). A new unsupervised feature selection algorithm using similarity-based feature clustering, Computational Intelligence 35(1): 2-22.
Notes
Record prepared with funds from MEiN (Ministry of Education and Science), agreement no. SONP/SP/546092/2022, under the programme "Social Responsibility of Science" - module: Popularisation of science and promotion of sport (2022-2023)
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-a46d35cb-1f65-433c-b6f9-39f603659549