Article title

Email Phishing Detection with BLSTM and Word Embeddings

Publication languages
EN
Abstracts
EN
Phishing has been one of the most successful attack types in recent years. Motivated by growing financial gain, criminals constantly improve their email phishing methods. A key goal, therefore, is to develop effective detection methods that can cope with huge volumes of email data. This paper proposes a solution that uses a BLSTM neural network and FastText word embeddings. The solution applies preprocessing techniques such as stop-word removal, tokenization, and padding. Two datasets, one balanced and one imbalanced, were used in three experiments; on the imbalanced dataset, the effect of maximum token size was also investigated. The best results were obtained on the imbalanced dataset: 99.12% accuracy, 98.43% precision, 99.49% recall, and 98.96% F1-score. The model was compared with an existing solution that uses a DL model and word embeddings. Finally, the model and solution architecture were implemented as a browser plug-in.
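The preprocessing steps named in the abstract (tokenization, stop-word removal, and padding to a maximum token size) can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the stop-word list, the regular-expression tokenizer, the reserved ids, and all function names are hypothetical, and padding at the end of the sequence is an arbitrary choice made here. The last helper only checks that the reported F1-score is consistent with the reported precision and recall.

```python
import re

# Hypothetical, tiny stop-word list used for illustration only.
STOP_WORDS = {"a", "an", "the", "to", "of", "and", "is", "your", "in"}
PAD_ID = 0  # assumed reserved id for padding positions
UNK_ID = 1  # assumed reserved id for out-of-vocabulary tokens


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def build_vocab(token_lists):
    """Assign ids starting at 2, in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 2
    return vocab


def encode_and_pad(tokens, vocab, max_tokens):
    """Map tokens to ids, truncate to max_tokens, then pad at the end."""
    ids = [vocab.get(t, UNK_ID) for t in tokens][:max_tokens]
    return ids + [PAD_ID] * (max_tokens - len(ids))


def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Made-up example emails, not taken from either dataset.
emails = [
    "Verify your account to avoid suspension",
    "Meeting notes are in the shared folder",
]
cleaned = [remove_stop_words(tokenize(e)) for e in emails]
vocab = build_vocab(cleaned)
batch = [encode_and_pad(toks, vocab, max_tokens=6) for toks in cleaned]

print(batch)
print(round(f1_score(0.9843, 0.9949) * 100, 2))
```

Fixed-length id sequences of this kind are what an embedding layer followed by a bidirectional LSTM would typically consume; the maximum token size studied in the experiments corresponds to `max_tokens` in this sketch.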
Authors
  • Institute of Telecommunications, Faculty of Electronics and Information Technology, Warsaw University of Technology, Poland
  • Institute of Telecommunications, Faculty of Electronics and Information Technology, Warsaw University of Technology, Poland
Notes
Record prepared with funds from MEiN, agreement no. SONP/SP/546092/2022, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science) - module: Popularisation of science and promotion of sport (2022-2023).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-63f9d9b7-9c82-4866-9ed5-b47be7f38896