Comparative Study of Supervised Learning Methods for Malware Analysis

Kruczkowski, M.; Niewiadomska-Szynkiewicz, E.

Abstract

Malware is software designed to disrupt or even damage computer systems, or to perform other unwanted actions. Nowadays, malware is a common threat on the World Wide Web. Anti-malware protection and intrusion detection can be significantly supported by a comprehensive and extensive analysis of data on the Web. The aim of such analysis is to classify the collected data into two sets, i.e., normal and malicious data. In this paper the authors investigate the use of three supervised learning methods for data mining to support malware detection. The results of applying Support Vector Machine, Naive Bayes and k-Nearest Neighbors techniques to the classification of data taken from devices located in many units, organizations and monitoring systems serviced by CERT Poland are described. The performance of all methods is compared and discussed. The results of the experiments show that supervised learning algorithms can be successfully used for computer data analysis and can support computer emergency response teams in threat detection.
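The abstract compares supervised classifiers that label records as normal or malicious. As a minimal sketch only, the Python below implements two of the three methods, k-Nearest Neighbors and Gaussian Naive Bayes, from scratch on a hypothetical toy dataset. The features, values and function names (`knn_predict`, `fit_gaussian_nb`, `nb_predict`) are illustrative assumptions, not the authors' implementation or the CERT Poland data, and SVM is omitted for brevity.

```python
import math
from collections import Counter

# Hypothetical toy data: each record is a feature vector (e.g. two numeric
# features extracted from an event) labelled "normal" or "malicious".
# These values are invented for illustration only.
TRAIN = [
    ((0.1, 0.2), "normal"),
    ((0.2, 0.1), "normal"),
    ((0.15, 0.25), "normal"),
    ((0.9, 0.8), "malicious"),
    ((0.8, 0.95), "malicious"),
    ((0.85, 0.9), "malicious"),
]


def knn_predict(x, train, k=3):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(x, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def fit_gaussian_nb(train):
    """Estimate class priors and per-feature means/variances (Gaussian Naive Bayes)."""
    by_class = {}
    for features, label in train:
        by_class.setdefault(label, []).append(features)
    model = {}
    n_total = len(train)
    for label, rows in by_class.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (len(rows) / n_total, means, variances)
    return model


def nb_predict(x, model):
    """Pick the class with the highest log prior plus summed log-likelihoods."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


TEST = [((0.12, 0.18), "normal"), ((0.88, 0.85), "malicious")]
nb_model = fit_gaussian_nb(TRAIN)
for name, predict in [("k-NN", lambda x: knn_predict(x, TRAIN)),
                      ("Naive Bayes", lambda x: nb_predict(x, nb_model))]:
    correct = sum(predict(x) == y for x, y in TEST)
    # prints "k-NN: 2/2 correct", then "Naive Bayes: 2/2 correct"
    print(f"{name}: {correct}/{len(TEST)} correct")
```

The Gaussian likelihood is an assumption made here for continuous features; the record does not state which Naive Bayes variant, distance metric or value of k the authors used.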

Keywords

data classification, k-Nearest Neighbors, malware analysis, Naive Bayes, support vector machine (SVM)

Publisher

Instytut Łączności - Państwowy Instytut Badawczy (National Institute of Telecommunications)

Journal

Journal of Telecommunications and Information Technology

Year

2014

Issue

no. 4

Pages

24–33

Physical description

Bibliography: 34 items, figures, tables

Authors

Author

Kruczkowski M.

Michal.Kruczkowski@nask.pl

Research and Academic Computer Network NASK, Warsaw, Poland
Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland

Author

Niewiadomska-Szynkiewicz E.

ewan@nask.pl

Research and Academic Computer Network NASK, Warsaw, Poland
Institute of Control and Computation Engineering, Warsaw University of Technology, Warsaw, Poland

References

[1] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, 2nd ed. Series in Statistics, Springer, 2009.
[2] W. S. Noble, “Support vector machine applications in computational biology”, in Kernel Methods in Computational Biology, B. Schölkopf, K. Tsuda, and J.-P. Vert, Eds. Cambridge, USA: MIT Press, 2004, pp. 71–92.
[3] C. Wagner, G. Wagener, R. State, and T. Engel, “Malware analysis with graph kernels and support vector machines”, in Proc. 4th Int. Conf. Malicious and Unwanted Software MALWARE 2009, Montreal, Canada, 2009, pp. 63–68.
[4] M. Krzyśko, W. Wołyński, T. Gorecki, and M. Skorzybut, Systemy uczące się (Learning systems). Warszawa: Wydawnictwo Naukowo-Techniczne, 2009, pp. 107–187 (in Polish).
[5] K. Rieck, T. Holz, C. Willems, P. Dussel, and P. Laskov, “Learning and classification of malware behavior”, in Proc. 5th Int. Conf. DIMVA 2008, Paris, France, 2008, vol. 5137, pp. 108–125.
[6] M. Amanowicz and P. Gajewski, “Military communications and information systems interoperability”, in Proc. Milit. Commun. Conf. MILCOM 96, McLean, VA, USA, 1996, vol. 1–3, pp. 280–283.
[7] R. Kasprzyk and Z. Tarapata, “Graph-based optimization method for information diffusion and attack durability in networks”, in Rough Sets and Current Trends in Computing, LNCS, vol. 6086, pp. 698–709, Springer, 2010.
[8] M. Mincer and E. Niewiadomska-Szynkiewicz, “Application of social network analysis to the investigation of interpersonal connections”, J. Telecommun. Inform. Technol., no. 2, pp. 81–89, 2012.
[9] M. Shankarapani, K. Kancherla, S. Ramammoorthy, R. Movva, and S. Mukkamala, “Kernel machines for malware classification and similarity analysis”, in Proc. Int. Joint Conf. Neural Netw. IJCNN 2010, Barcelona, Spain, 2010, pp. 1–6.
[10] S. Forrest et al., “Self-nonself discrimination in a computer”, in Proc. Comp. Soc. Symp. Res. Secur. and Priv., Oakland, CA, USA, 1994, vol. 10, pp. 311–324.
[11] I. Liane de Oliveira, A. Ricardo, A. Gregio, and A. M. Cansian, “A malware detection system inspired on the human immune system”, in Computational Science and its Applications – ICCSA 2012, LNCS, vol. 7336, pp. 286–301, Springer, 2012.
[12] E. Stalmans and B. Irwin, “A framework for DNS based detection and mitigation of malware infections on a network”, in Proc. 10th Ann. Inform. Secur. South Africa Conf. ISSA 2011, Johannesburg, South Africa, 2011.
[13] M. Zubair Shafiq, S. Ali Khayam, and M. Farooq, “Embedded malware detection using Markov n-grams”, in Detection of Intrusions and Malware, and Vulnerability Assessment, T. Holz and H. Bos, Eds., LNCS, vol. 6739, pp. 88–107. Springer, 2008.
[14] M. Franklin, A. Halevy, and D. Maier, “From databases to dataspaces: A new abstraction for information management”, SIGMOD Record, vol. 34, no. 4, pp. 27–33, 2005.
[15] K. Lasota and A. Kozakiewicz, “Analysis of the Similarities in Malicious DNS Domain Names”, in Secure and Trust Computing, Data Management and Applications, C. Lee, J.-M. Seigneur, J. J. Park, and R.R. Wagner, Eds. Communications in Computer and Information Science, vol. 187, pp. 1–6. Springer, 2011.
[16] Y. Ye, D. Wang, T. Li, and D. Ye, “IMDS: Intelligent malware detection system”, in Proc. 13th ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining KDD’07, San Jose, CA, USA, 2007, pp. 1043–1047.
[17] M. R. Faghani and H. Saidi, “Malware propagation in online social networks”, in Proc. Int. Conf. Malicious and Unwanted Software MALWARE 2009, Montreal, Canada, 2009, pp. 8–14.
[18] P. Cunningham and S. J. Delany, “k-Nearest Neighbour Classifiers”, Tech. Rep. UCD-CSI-2007-4, UCD School of Computer Science and Informatics, Dublin, 2007, pp. 1–17.
[19] J. M. Keller, “A fuzzy k-Nearest Neighbor algorithm”, IEEE Trans. Syst., Man, and Cybernet., vol. 15, no. 4, pp. 580–585, 1985.
[20] L. Jiang, H. Zhang, and Z. Cai, “A Novel Bayes Model: Hidden Naive Bayes”, IEEE Trans. Knowl. Data Engin., vol. 21, no. 10, pp. 1361–1371, 2009.
[21] V. N. Vapnik, The Nature of Statistical Learning Theory. New York: Springer, 1995.
[22] A. Bordes and L. Bottou, “The Huller: a simple and efficient online SVM”, in Machine Learning: ECML-2005, J. Gama, R. Camacho, P. Brazdil, A. Jorge, and L. Torgo, Eds., LNCS, vol. 3720, pp. 505–512. Springer, 2005.
[23] J. Koronacki and J. Ćwik, Statystyczne systemy uczące się (Statistical learning systems). Warsaw: Exit, 2008 (in Polish).
[24] T. Joachims, “Support Vector and Kernel Methods”, SIGIR-Tutorial, Cornell University Computer Science Department, 2003.
[25] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines, 1st ed. Cambridge University Press, 2000, pp. 25–29.
[26] E. Ikonomovska, D. Gorgevik, and S. Loskovska, “A survey of stream data mining”, in Proc. 8th Nat. Conf. Int. Particip. ETAI 2007, Ohrid, Republic of Macedonia, 2007, pp. 16–20.
[27] T. Joachims, “Training linear SVMs in linear time”, in Proc. 12th ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining KDD 2006, Philadelphia, PA, USA, 2006, pp. 217–226.
[28] Kernel Machines homepage [Online]. Available: http://www.kernel-machines.org/
[29] C. Burges, “A Tutorial on Support Vector Machines for Pattern Recognition”, Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.
[30] T. Jebara, “Multi-task feature and kernel selection for SVMs”, in Proc. Int. Conf. on Machine Learning, Banff, Canada, 2004, pp. 55–63.
[31] F. R. Bach, G. Lanckriet, and M. Jordan, “Multiple kernel learning, conic duality and the SMO algorithm”, in Proc. 21st Int. Conf. Machine Learning ICML’04, Banff, Canada, 2004, pp. 6–13.
[32] F. R. Bach, R. Thibaux, and M. I. Jordan, “Computing regularization paths for learning multiple kernels”, in Advances in Neural Information Processing Systems, L. K. Saul, Y. Weiss, and L. Bottou, Eds. MIT Press, 2005, pp. 73–80.
[33] N6 Platform homepage [Online]. Available: http://www.cert.pl/news/tag/n6
[34] G. M. Draper, “Interactive radial visualizations for information retrieval and management”, Ph.D. Thesis, University of Utah, 2009, Chapter 3.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-13105d82-68da-4a6e-bc1d-3024abc6ffdc