Email Phishing Detection with BLSTM and Word Embeddings

Wolert, Rafał; Rawski, Mariusz

doi:10.24425/ijet.2023.146496

Article - details

Abstract

Phishing has been one of the most successful types of attack in recent years. Criminals are motivated by growing financial gains and constantly improve their email phishing methods. A key goal, therefore, is to develop effective detection methods that can cope with huge volumes of email data. This paper proposes a solution that uses a BLSTM neural network and FastText word embeddings. The solution applies preprocessing techniques such as stop-word removal, tokenization, and padding. Two datasets, one balanced and one imbalanced, were used in three experiments; on the imbalanced dataset, the effect of the maximum token size was also investigated. The best results were obtained on the imbalanced dataset: 99.12% accuracy, 98.43% precision, 99.49% recall, and 98.96% F1-score. The model was compared with an existing solution that uses a DL model and word embeddings. Finally, the model and solution architecture were implemented as a browser plug-in.
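
The preprocessing steps named in the abstract (tokenization, stop-word removal, padding) and the relationship between the reported precision, recall, and F1-score can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the stop-word list, the reserved ids, the function names, and the choice to pad and truncate at the end of the sequence are all assumptions made for illustration.

```python
import re

# Hypothetical, tiny stop-word list for illustration only; the list used in
# the paper is not given in this record.
STOP_WORDS = {"a", "an", "the", "to", "is", "of", "and", "your"}
PAD_ID = 0  # assumed id reserved for padding
OOV_ID = 1  # assumed id for tokens missing from the vocabulary


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def build_vocab(token_lists):
    """Assign each new token the next free integer id, starting at 2."""
    vocab = {}
    for tokens in token_lists:
        for t in tokens:
            if t not in vocab:
                vocab[t] = len(vocab) + 2  # ids 0 and 1 are reserved
    return vocab


def encode_and_pad(tokens, vocab, max_len):
    """Map tokens to ids, keep the first max_len ids, pad the rest with PAD_ID."""
    ids = [vocab.get(t, OOV_ID) for t in tokens][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


emails = ["Verify your account to avoid suspension", "Meeting notes attached"]
cleaned = [remove_stop_words(tokenize(e)) for e in emails]
vocab = build_vocab(cleaned)
padded = [encode_and_pad(toks, vocab, max_len=6) for toks in cleaned]
print(padded)

# Consistency check using the precision and recall quoted in the abstract.
print(round(100 * f1_score(0.9843, 0.9949), 2))
```

Here `max_len` plays the role of the maximum token size mentioned in the abstract: every email becomes a fixed-length id sequence that an embedding layer and a BLSTM could consume. Evaluating `f1_score` on the reported 98.43% precision and 99.49% recall gives 98.96% after rounding, which agrees with the F1-score stated in the abstract.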

Keywords

phishing; BLSTM; word embeddings

Publisher

Polish Academy of Sciences, Committee of Electronics and Telecommunication

Journal

International Journal of Electronics and Telecommunications

Year

2023

Volume

Vol. 69, No. 3

Pages

485-491

Physical description

Bibliography: 26 items, figures, charts.

Authors

author

Wolert Rafał

rafal.wolert.stud@pw.edu.pl

Institute of Telecommunications, Faculty of Electronics and Information Technology, Warsaw University of Technology, Poland

author

Rawski Mariusz

mariusz.rawski@pw.edu.pl

Institute of Telecommunications, Faculty of Electronics and Information Technology, Warsaw University of Technology, Poland

References

[1] A. Almomani, B. B. Gupta, S. Atawneh, A. Meulenberg, and E. Almomani, “A survey of phishing email filtering techniques,” IEEE Communications Surveys & Tutorials, vol. 15, no. 4, pp. 2070-2090, 2013. [Online]. Available: https://doi.org/10.1109/SURV.2013.030713.00020
[2] F5 Labs, “2020 phishing and fraud report,” 2020, [Accessed: 14 June 2023]. [Online]. Available: https://www.f5.com/labs/articles/threat-intelligence/2020-phishing-and-fraud-report
[3] A. Das, S. Baki, A. El Aassal, R. Verma, and A. Dunbar, “SoK: A comprehensive reexamination of phishing research from the security perspective,” IEEE Communications Surveys & Tutorials, vol. 22, no. 1, pp. 671-708, 2020.
[4] S. Salloum, T. Gaber, S. Vadera, and K. Shaalan, “A systematic literature review on phishing email detection using natural language processing techniques,” IEEE Access, vol. 10, pp. 65703-65727, 2022. [Online]. Available: https://doi.org/10.1109/ACCESS.2022.3183083
[5] A. Safi and S. Singh, “A systematic literature review on phishing website detection techniques,” Journal of King Saud University - Computer and Information Sciences, vol. 35, no. 2, pp. 590-611, 2023. [Online]. Available: https://doi.org/10.1016/j.jksuci.2023.01.004
[6] M. Dewis and T. Viana, “Phish responder: A hybrid machine learning approach to detect phishing and spam emails,” Applied System Innovation, vol. 5, no. 4, 2022. [Online]. Available: https://doi.org/10.3390/asi5040073
[7] P. Boyle and L. Shepherd, “Mailtrout: a machine learning browser extension for detecting phishing emails,” in 34th British Human Computer Interaction Conference 2021 proceedings, ser. Electronic Workshops in Computing, J. Nocera, H. Petrie, G. Sim, T. Clemmensen, and F. Spyridonis, Eds. BCS Learning & Development Ltd., Jul. 2021, pp. 104-115, 34th British Human Computer Interaction Conference: Post-Pandemic HCI - Living digitally; Conference date: 19-07-2021 through 21-07-2021. [Online]. Available: https://doi.org/10.14236/ewic/HCI2021.10
[8] S. M. and A. R. Pais, “Classification of phishing email using word embedding and machine learning techniques,” Journal of Cyber Security and Mobility, vol. 11, no. 03, pp. 279-320, May 2022. [Online]. Available: https://doi.org/10.13052/jcsm2245-1439.1131
[9] Enron email dataset. [Accessed: 14 June 2023]. [Online]. Available: https://www.cs.cmu.edu/~enron/
[10] Jose Nazario phishing email corpus. [Accessed: 14 June 2023]. [Online]. Available: https://monkey.org/~jose/phishing/
[11] B. Klimt and Y. Yang, “Introducing the Enron corpus,” in First Conference on Email and Anti-Spam (CEAS), Mountain View, CA, 2004, [Accessed: 14 June 2023]. [Online]. Available: https://www.ceas.cc/papers-2004/168.pdf
[12] spacy python model. [Accessed: 14 June 2023]. [Online]. Available: https://github.com/explosion/spacy-models/releases/tag/en_core_web_sm-3.5.0
[13] spacy token class attributes. [Accessed: 14 June 2023]. [Online]. Available: https://spacy.io/api/token#attributes
[14] S. Salloum, T. Gaber, S. Vadera, and K. Shaalan, “A systematic literature review on phishing email detection using natural language processing techniques,” IEEE Access, vol. 10, pp. 65703-65727, June 2022. [Online]. Available: https://doi.org/10.1109/ACCESS.2022.3183083
[15] P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, “Enriching word vectors with subword information,” Transactions of the Association for Computational Linguistics, vol. 5, pp. 135-147, June 2017. [Online]. Available: https://doi.org/10.1162/tacl_a_00051
[16] M. Butavicius, R. Taib, and S. J. Han, “Why people keep falling for phishing scams: The effects of time pressure and deception cues on the detection of phishing emails,” Computers & Security, vol. 123, December 2022. [Online]. Available: https://doi.org/10.1016/j.cose.2022.102937
[17] Common crawl. [Accessed: 14 June 2023]. [Online]. Available: https://commoncrawl.org/
[18] E. Grave, P. Bojanowski, P. Gupta, A. Joulin, and T. Mikolov, “Learning word vectors for 157 languages,” in Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018), 2018.
[19] Tensorflow pad sequences method. [Accessed: 14 June 2023]. [Online]. Available: https://www.tensorflow.org/api_docs/python/tf/keras/utils/pad_sequences
[20] Tensorflow data performance. [Accessed: 14 June 2023]. [Online]. Available: https://www.tensorflow.org/guide/data_performance
[21] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, “TensorFlow: Large-scale machine learning on heterogeneous systems,” 2015, software available from tensorflow.org. [Online]. Available: https://www.tensorflow.org/
[22] F. Chollet et al., “Keras,” https://keras.io, 2015, [Accessed: 14 June 2023].
[23] P. Wang, Y. Qian, F. K. Soong, L. He, and H. Zhao, “Part-of-speech tagging with bidirectional long short-term memory recurrent neural network,” 2015. [Online]. Available: https://doi.org/10.48550/arXiv.1510.06168
[24] Gmail api. [Accessed: 14 June 2023]. [Online]. Available: https://developers.google.com/gmail/api/guides
[25] Fastapi. [Accessed: 14 June 2023]. [Online]. Available: https://fastapi.tiangolo.com/
[26] P. P. M. A. K. S. K. Vinayakumar Ravi, Barathi Ganesh Hb, “Deepanti-phishnet: Applying deep neural networks for phishing email detection cen-aisecurity@iwspa-2018,” Tempe, AZ, USA, March 2018, 1st AntiPhishing Shared Pilot at 4th ACM International Workshop on Security and Privacy Analytics (IWSPA 2018) [Accessed: 14 June 2023]. [Online]. Available: https://ceur-ws.org/Vol-2124/paper_9.pdf

Notes

Record prepared with funding from MEiN, agreement no. SONP/SP/546092/2022, under the programme "Social Responsibility of Science" - module: Popularisation of science and promotion of sport (2022-2023).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-63f9d9b7-9c82-4866-9ed5-b47be7f38896