Article title

Predictive Modelling of a Honeypot System Based on a Markov Decision Process and a Partially Observable Markov Decision Process

Publication languages
EN
Abstracts
EN
A honeypot is used to attract and monitor attacker activity and to capture information that can support good cybersecurity practice. In this paper, predictive modelling of a honeypot system is performed based on a Markov decision process (MDP) and a partially observable Markov decision process (POMDP). Analyses are conducted over both a finite planning horizon and an infinite planning horizon for a discounted MDP. Four methods, namely value iteration (VI), policy iteration (PI), linear programming (LP), and Q-learning, are used in the infinite-horizon analysis of the discounted MDP, and their results are compared to evaluate the validity of the created MDP model and of its parameters. Based on the MDP model, the optimal policy that maximises the total expected reward over the states of the honeypot system is obtained. In the infinite-horizon modelling of the discounted POMDP of the honeypot system, the effects of the observation probability of receiving commands, the probability of attacking the honeypot, the probability of the honeypot being disclosed, and the transition rewards on the total expected reward of the honeypot system are studied.
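The abstract names value iteration among the solution methods for the discounted infinite-horizon MDP. Since this record does not reproduce the paper's actual transition probabilities or rewards, the Python sketch below illustrates the technique only, on a hypothetical three-state honeypot MDP; every state, action, probability, and reward value in it is an assumed placeholder, not the paper's model.

import numpy as np

# Hypothetical 3-state honeypot MDP (all numbers are illustrative
# placeholders, not the paper's model):
#   state 0 = idle, 1 = engaged by an attacker, 2 = disclosed as a trap.
# Actions: 0 = respond to attacker commands, 1 = stay silent.
gamma = 0.9      # discount factor (assumed)
theta = 1e-8     # convergence threshold

# P[a, s, s2] = transition probability, R[a, s] = expected immediate reward.
P = np.array([
    [[0.7, 0.3, 0.0],   # respond: attacker more likely to engage...
     [0.1, 0.7, 0.2],   # ...but interaction risks disclosure
     [1.0, 0.0, 0.0]],  # a disclosed honeypot is reset and redeployed
    [[0.9, 0.1, 0.0],   # stay silent: less engagement,
     [0.4, 0.5, 0.1],   # lower disclosure risk
     [1.0, 0.0, 0.0]],
])
R = np.array([
    [0.0, 5.0, -10.0],  # engagement yields intelligence; disclosure is penalised
    [0.0, 1.0, -10.0],
])

# Value iteration: apply the Bellman optimality backup until convergence,
#   V(s) <- max_a [ R(a, s) + gamma * sum_s2 P(a, s, s2) * V(s2) ]
V = np.zeros(3)
while True:
    Q = R + gamma * np.einsum("ast,t->as", P, V)  # Q[a, s]
    V_new = Q.max(axis=0)
    if np.abs(V_new - V).max() < theta:
        break
    V = V_new

policy = Q.argmax(axis=0)  # greedy action per state
print("optimal state values:", np.round(V, 3))
print("optimal policy      :", policy)

Policy iteration, linear programming, and Q-learning would be run against the same (P, R, gamma) specification; agreement of their value estimates with the value-iteration output is the kind of cross-method check the abstract describes for validating the model and its parameters.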
Pages
1–14
Physical description
Bibliography: 31 items, figures, tables, charts.
Authors
author
  • Institute for Systems Engineering Research, Mississippi State University, Mississippi, USA
author
  • Institute for Systems Engineering Research, Mississippi State University, Mississippi, USA
author
  • Institute for Systems Engineering Research, Mississippi State University, Mississippi, USA
author
  • Institute for Systems Engineering Research, Mississippi State University, Mississippi, USA
Bibliography
  • 1. T.M. McKenzie, Is Cyber Deterrence Possible? Alabama: Air University Press, 2017.
  • 2. S. Srujana, P. Sreeja, G. Swetha, H. Shanmugasundaram, “Cutting edge technologies for improved cybersecurity model: A survey,” International Conference on Applied Artificial Intelligence and Computing (ICAAIC), 2022, pp. 1392–1396.
  • 3. L. Vokorokos, A. Pekár, N. Ádám, P. Darányi, “Yet another attempt in user authentication,” Acta Polytechnica Hungarica, vol. 10, no. 3, pp. 37–50, 2013.
  • 4. J. Palša, J. Hurtuk, E. Chovancová, M. Havira, “Configuration honeypots with an emphasis on logging of the attacks and redundancy,” IEEE 20th Jubilee World Symposium on Applied Machine Intelligence and Informatics (SAMI), 2022, pp. 000073–000076, doi: 10.1109/SAMI54271.2022.9780801.
  • 5. F. Franzen, L. Steger, J. Zirngibl, P. Sattler, “Looking for honey once again: Detecting RDP and SMB honeypots on the Internet,” IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), 2022.
  • 6. M. Boffa, G. Milan, L. Vassio, I. Drago, M. Mellia, Z.B. Houidi, “Towards NLP-based processing of honeypot logs,” IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), 2022, pp. 314–321.
  • 7. Z. Shamsi, D. Zhang, D. Kyoung, A. Liu, “Measuring and clustering network attackers using medium-interaction honeypots,” IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), 2022, pp. 294–306.
  • 8. X. Liu, H. Zhang, S. Dong, Y. Zhang, “Network defense decision-making based on a stochastic game system and a deep recurrent Q-network,” Computers & Security, vol. 111, p. 102480, 2021, doi: 10.1016/j.cose.2021.102480.
  • 9. H. Itoh, H. Nakano, R. Tokushima, H. Fukumoto, H. Wakuya, “A partially observable Markov decision process-based blackboard architecture for cognitive agents in partially observable environments,” IEEE Transactions on Cognitive and Developmental Systems, 2020, doi: 10.1109/TCDS.2020.3034428.
  • 10. M. Haklidir, H. Temeltaş, “Guided soft actor critic: A guided deep reinforcement learning approach for partially observable Markov decision processes,” IEEE Access, vol. 9, pp. 159672–159683, 2021, doi: 10.1109/ACCESS.2021.3131772.
  • 11. A.R. Cassandra, “A survey of POMDP applications,” 2003. [Online]. Available: http://www.cassandra.org/arc/papers/applications.pdf. [Accessed: Nov. 27, 2022].
  • 12. O. Hayatle, H. Otrok, A. Youssef, “A Markov decision process model for high interaction honeypots,” Information Security Journal: A Global Perspective, vol. 22, no. 4, pp. 159–170, 2013.
  • 13. M. Mohri, A. Rostamizadeh, A. Talwalkar, Foundations of Machine Learning (Adaptive Computation and Machine Learning series). Cambridge, Massachusetts: MIT Press, 2012.
  • 14. M.A. Alsheikh, D.T. Hoang, D. Niyato, H.P. Tan, S. Lin, “Markov decision processes with applications in wireless sensor networks: A survey,” IEEE Communications Surveys & Tutorials, vol. 17, no. 3, pp. 1239–1267, 2015, doi: 10.1109/COMST.2015.2420686.
  • 15. Y. Chen, J. Hong, C.C. Liu, “Modeling of intrusion and defense for assessment of cyber security at power substations,” IEEE Transactions on Smart Grid, vol. 9, no. 4, pp. 2541–2552, 2018.
  • 16. R.S. Sutton, A.G. Barto, Reinforcement Learning: An Introduction. Cambridge, Massachusetts: MIT Press, 2018.
  • 17. M. van Otterlo, “Markov decision processes: Concepts and algorithms,” in Reinforcement Learning. Adaptation, Learning, and Optimization, vol. 12, M. Wiering, M. van Otterlo, Eds. Berlin, Heidelberg: Springer, doi: 10.1007/978-3-642-27645-3_1.
  • 18. M. van Otterlo, M. Wiering, “Reinforcement learning and Markov decision processes,” in Reinforcement Learning. Adaptation, Learning, and Optimization, vol. 12, M. Wiering, M. van Otterlo, Eds. Berlin, Heidelberg: Springer, pp. 3–42, doi: 10.1007/978-3-642-27645-3_1.
  • 19. O. Sigaud, O. Buffet, Markov Decision Processes in Artificial Intelligence. Hoboken, New Jersey: John Wiley & Sons, 2013.
  • 20. S.J. Majeed, M. Hutter, “On Q-learning convergence for non-Markov decision processes,” IJCAI, 2018, pp. 2546–2552, doi: 10.24963/ijcai.2018/353.
  • 21. E. Zanini, Markov Decision Processes. Berlin, Heidelberg: Springer, 2014.
  • 22. Y. Liu, H. Liu, B. Wang, “Autonomous exploration for mobile robot using Q-learning,” 2nd International Conference on Advanced Robotics and Mechatronics (ICARM), 2017, pp. 614–619, doi: 10.1109/ICARM.2017.8273233.
  • 23. G.E. Monahan, “State of the art—a survey of partially observable Markov decision processes: Theory, models, and algorithms,” Management Science, vol. 28, no. 1, pp. 1–16, 1982, doi: 10.1287/mnsc.28.1.1.
  • 24. M.L. Littman, A.R. Cassandra, L.P. Kaelbling, Efficient Dynamic-Programming Updates in Partially Observable Markov Decision Processes, Technical Report CS-95-19, Department of Computer Science, Brown University, Providence, Rhode Island, 1995.
  • 25. S. Russell, P. Norvig, Artificial Intelligence: A Modern Approach, 3rd ed. London: Pearson, 2010.
  • 26. H. Kurniawati, D. Hsu, W.S. Lee, “SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces,” Robotics: Science and Systems, 2008.
  • 27. J. Pineau, G. Gordon, S. Thrun, “Point-based value iteration: An anytime algorithm for POMDPs,” IJCAI, vol. 3, 2003, pp. 1025–1032.
  • 28. E.J. Sondik, The Optimal Control of Partially Observable Markov Processes, Ph.D. dissertation, Stanford University, Stanford, California, 1971.
  • 29. N.L. Zhang, W. Liu, Planning in Stochastic Domains: Problem Characteristics and Approximation, Technical Report HKUST-CS96-31, Department of Computer Science, Hong Kong University of Science and Technology, 1996.
  • 30. A.R. Cassandra, M.L. Littman, N.L. Zhang, “Incremental pruning: A simple, fast, exact method for partially observable Markov decision processes,” arXiv preprint arXiv:1302.1525, 2013.
  • 31. R.I. Brafman, “A heuristic variable grid solution method for POMDPs,” AAAI/IAAI, 1997, pp. 727–733.
Notes
Record developed with funds from the Ministry of Education and Science (MEiN), agreement no. SONP/SP/546092/2022, under the "Social Responsibility of Science" programme, module: Popularisation of Science and Promotion of Sport (2022-2023).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-636ad7fe-a0b8-4e3c-b161-377302f12902