Comparative features reduction investigation for Android malware detection on Boolean data

Irawan, Ary; Bilski, Piotr; Gwardys, Grzegorz

doi:10.24425/ijet.2025.155487

Abstract

This research aims to improve the effectiveness of Android malware detection systems by applying dimensionality reduction techniques to Boolean data. Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), and Multiple Correspondence Analysis (MCA) are applied as preprocessing steps before the classification stage. Several classifiers, including Random Forest, Logistic Regression, and Support Vector Machines, are evaluated to measure how effectively they detect cyber threats. The results show that the Decision Tree classifier, applied without dimensionality reduction, achieved the best results, with 100% accuracy. Efficient feature selection and rapid computation are necessary for malware detection in real-time mobile environments.
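The comparison described in the abstract (optional dimensionality reduction followed by a classifier, evaluated on Boolean features) might be sketched as follows. This is a minimal illustration, not the authors' implementation: the data are synthetic, and the sample counts, the number of components, and all hyperparameters are assumptions chosen only for the example. MCA is omitted because scikit-learn does not provide it.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic Boolean data standing in for per-app feature flags.
# This is not the dataset used in the paper.
rng = np.random.default_rng(0)
n_samples, n_features, n_informative = 600, 40, 8
y = rng.integers(0, 2, size=n_samples)  # 0 = benign, 1 = malware (illustrative)

# Probability that each flag is set, per class; only the first
# n_informative flags differ between the two classes.
p_set = np.full((2, n_features), 0.3)
p_set[1, :n_informative] = 0.7
X = (rng.random((n_samples, n_features)) < p_set[y]).astype(np.uint8)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Factories return fresh estimators so no fitted state is shared between runs.
reducers = {
    "none": None,
    "PCA": lambda: PCA(n_components=10, random_state=0),
    # With two classes, LDA can project onto at most one component.
    "LDA": lambda: LinearDiscriminantAnalysis(n_components=1),
}
classifiers = {
    "DecisionTree": lambda: DecisionTreeClassifier(random_state=0),
    "RandomForest": lambda: RandomForestClassifier(n_estimators=100, random_state=0),
    "LogisticRegression": lambda: LogisticRegression(max_iter=1000),
    "SVM": lambda: SVC(),
}

# Fit every reducer/classifier combination and record test accuracy.
results = {}
for r_name, make_reducer in reducers.items():
    for c_name, make_clf in classifiers.items():
        steps = [make_clf()] if make_reducer is None else [make_reducer(), make_clf()]
        model = make_pipeline(*steps)
        model.fit(X_train, y_train)
        results[(r_name, c_name)] = model.score(X_test, y_test)

for (r_name, c_name), acc in sorted(results.items()):
    print(f"{r_name:>4} + {c_name:<18} accuracy = {acc:.3f}")
```

Accuracies on this synthetic data say nothing about the results reported in the paper; the sketch only shows how the reduction step can be switched on or off ahead of each classifier within one evaluation loop.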

Keywords

Android malware; machine learning algorithms; dimensionality reduction; classification

Publisher

Polish Academy of Sciences, Committee of Electronics and Telecommunication

Journal

International Journal of Electronics and Telecommunications

Year

2025

Volume

Vol. 71, No. 4

Physical description

Bibliography: 45 items, tables, figures.

Contributors

author

Irawan Ary

01186007@pw.edu.pl

Warsaw University of Technology, Poland

author

Bilski Piotr

piotr.bilski@pw.edu.pl

Warsaw University of Technology, Poland

https://orcid.org/0000-0002-5463-9411

author

Gwardys Grzegorz

grzegorz.gwardys@pw.edu.pl

Warsaw University of Technology, Poland

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-7bef5805-0c69-4485-8b8c-3b6e7b752857