Explainable Spark-based PSO clustering for intrusion detection

Ben Ncir, Chiheb-Eddine; Ben HajKacem, Mohamed Aymen; Alattas, Mohammed

doi:10.7494/csci.2024.25.2.5891

Abstract

Given the exponential growth of data available in large networks, rapid, transparent, and explainable intrusion detection systems have become necessary to detect attacks effectively in such networks. To address this challenge, we propose a novel explainable intrusion detection system based on Spark, Particle Swarm Optimization (PSO) clustering, and eXplainable Artificial Intelligence (XAI) techniques. Spark serves as the parallel processing model for handling large-scale data efficiently; PSO is integrated to improve the quality of the intrusion detection system by avoiding the clustering algorithm's sensitivity to initialization and its premature convergence; and XAI techniques enhance the interpretability and explainability of intrusion recommendations by providing both micro and macro explanations of detected intrusions. Experiments conducted on large collections of real datasets show the effectiveness of the proposed intrusion detection system in terms of explainability, scalability, and accuracy. The proposed system shows high transparency in helping security experts and decision-makers understand and interpret attack behavior.
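The PSO clustering idea mentioned in the abstract can be illustrated with a minimal single-machine sketch in plain Python. It is not the authors' implementation: it omits the Spark parallelization and the XAI explanation layer entirely, and the function names (`pso_cluster`, `fitness`, `assign`), the inertia and acceleration values (`w`, `c1`, `c2`), and the sum-of-squared-distances fitness are assumptions chosen for illustration. Each particle encodes a candidate set of k centroids, and the swarm searches for the set that minimizes the distance of every point to its nearest centroid.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fitness(centroids, points):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def pso_cluster(points, k, n_particles=10, n_iter=50,
                w=0.72, c1=1.49, c2=1.49, seed=0):
    """Generic PSO clustering sketch (hypothetical parameters, no Spark)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Each particle is a candidate solution: k centroids drawn from the data.
    positions = [[list(rng.choice(points)) for _ in range(k)]
                 for _ in range(n_particles)]
    velocities = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    # Personal bests start at the initial positions.
    pbest = [[c[:] for c in pos] for pos in positions]
    pbest_fit = [fitness(pos, points) for pos in positions]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Velocity update: inertia + pull toward personal best
                    # + pull toward the global best.
                    velocities[i][j][d] = (
                        w * velocities[i][j][d]
                        + c1 * r1 * (pbest[i][j][d] - positions[i][j][d])
                        + c2 * r2 * (gbest[j][d] - positions[i][j][d])
                    )
                    positions[i][j][d] += velocities[i][j][d]
            f = fitness(positions[i], points)
            if f < pbest_fit[i]:
                pbest[i] = [c[:] for c in positions[i]]
                pbest_fit[i] = f
                if f < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in positions[i]], f
    return gbest, gbest_fit


if __name__ == "__main__":
    # Toy data: two well-separated groups standing in for traffic records.
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    centroids, cost = pso_cluster(data, k=2, n_particles=20)
    print(centroids, cost, assign(data, centroids))
```

Because many particles start from different random centroid sets and share the best solution found so far, the search depends less on any single initialization than a single k-means run does, which is the motivation the abstract gives for using PSO. In a Spark setting the fitness evaluation over the data would presumably be the part distributed across workers, but that is an assumption and not something this record specifies.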

Keywords

Intrusion Detection System; IDS; Artificial Intelligence; AI; Explainable AI; XAI; particle swarm optimization; PSO; Spark framework

Publisher

Wydawnictwa AGH

Journal

Computer Science

Year

2024

Volume

Vol. 25 (2)

Pages

211–237

Physical description

Bibliography: 46 items; figures, tables, charts.

Authors

author

Ben Ncir Chiheb-Eddine

cbenncir@uj.edu.sa

University of Jeddah, College of Business, Saudi Arabia

author

Ben HajKacem Mohamed Aymen

edaymenhajkacem@gmail.com

University of Tunis, LARODEC Laboratory, Tunisia

author

Alattas Mohammed

mialatas@uj.edu.sa

University of Jeddah, College of Business, Saudi Arabia

Notes

Record prepared with funding from MNiSW, agreement no. SONP/SP/546092/2022, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science), module: Popularization of science and promotion of sport (2024).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-9aff406a-9f3f-44e0-9e2b-9c9d07d8f9b9