Abstract
The feature selection problem often occurs in pattern recognition and, more specifically, in classification. Although patterns may be described by a large number of features, some of those features can prove irrelevant, redundant or even detrimental to classification accuracy. Removing such features reduces the dimensionality of the problem and may eventually improve classification accuracy. This paper presents an approach to dimensionality reduction based on differential evolution, which acts as a wrapper and explores the solution space. The solutions, subsets of the whole feature set, are evaluated using the k-nearest neighbour algorithm. High-quality solutions found during the execution of differential evolution fill an archive. A final solution is obtained by conducting k-fold cross-validation on the archive solutions and selecting the best one. Experimental analysis is conducted on several standard test sets. The classification accuracy of the k-nearest neighbour algorithm using the full feature set is compared with the accuracy of the same algorithm using only the subsets provided by the proposed approach and by some other optimization algorithms used as wrappers. The analysis shows that the proposed approach successfully determines good feature subsets which may increase the classification accuracy.
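The pipeline outlined in the abstract (differential evolution as a wrapper, k-nearest neighbour evaluation, an archive of good subsets, and a final k-fold cross-validation over the archive) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the encoding, the DE variant, the fitness estimate or the archive policy, so the following are all assumptions made for illustration: real-valued genes decoded by a 0.5 threshold, the DE/rand/1/bin scheme, a single 70/30 hold-out split as search fitness, and a small archive that drops its weakest entry. Function names and parameter defaults are hypothetical, and class labels are assumed to be non-negative integers.

```python
import numpy as np

def knn_accuracy(X_train, y_train, X_test, y_test, k=3):
    """Accuracy of a plain kNN classifier (Euclidean distance, majority vote)."""
    dist = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    pred = np.array([np.bincount(votes).argmax() for votes in y_train[nearest]])
    return float((pred == y_test).mean())

def cv_accuracy(X, y, mask, folds=5, k=3, seed=0):
    """Mean k-fold cross-validation accuracy of kNN restricted to `mask`."""
    if not mask.any():
        return 0.0
    idx = np.random.default_rng(seed).permutation(len(y))
    scores = []
    for test in np.array_split(idx, folds):
        train = np.setdiff1d(idx, test)
        scores.append(knn_accuracy(X[train][:, mask], y[train],
                                   X[test][:, mask], y[test], k))
    return float(np.mean(scores))

def de_feature_selection(X, y, pop_size=12, generations=15, F=0.5, CR=0.9,
                         archive_size=5, folds=5, seed=0):
    """Return a boolean feature mask chosen by a DE wrapper around kNN."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    # Assumption: one 70/30 hold-out split scores subsets during the search.
    idx = rng.permutation(len(y))
    cut = int(0.7 * len(y))
    tr, va = idx[:cut], idx[cut:]

    def fitness(vec):
        mask = vec > 0.5  # assumption: threshold decoding of real-valued genes
        if not mask.any():
            return 0.0
        return knn_accuracy(X[tr][:, mask], y[tr], X[va][:, mask], y[va])

    archive = {}  # feature mask (as tuple of bools) -> search fitness

    def remember(vec, fit_value):
        mask = vec > 0.5
        if mask.any():
            archive[tuple(mask.tolist())] = fit_value
            if len(archive) > archive_size:  # drop the weakest entry
                del archive[min(archive, key=archive.get)]

    pop = rng.random((pop_size, n_features))
    fit = np.array([fitness(v) for v in pop])
    for v, f in zip(pop, fit):
        remember(v, f)

    for _ in range(generations):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = rng.choice(others, 3, replace=False)
            mutant = np.clip(pop[a] + F * (pop[b] - pop[c]), 0.0, 1.0)
            cross = rng.random(n_features) < CR
            cross[rng.integers(n_features)] = True  # keep at least one mutant gene
            trial = np.where(cross, mutant, pop[i])
            f = fitness(trial)
            if f >= fit[i]:  # greedy one-to-one replacement
                pop[i], fit[i] = trial, f
                remember(trial, f)

    if not archive:  # fall back to the full feature set
        return np.ones(n_features, dtype=bool)
    # Final step: k-fold cross-validation over the archived subsets.
    best = max(archive,
               key=lambda m: cv_accuracy(X, y, np.array(m, dtype=bool), folds))
    return np.array(best, dtype=bool)
```

Calling `de_feature_selection(X, y)` on a numeric feature matrix `X` and an integer label vector `y` returns a boolean mask; `cv_accuracy(X, y, mask)` can then be compared against `cv_accuracy(X, y, np.ones(X.shape[1], dtype=bool))` to see whether the reduced subset helps on that data set. Separating the search fitness from the final cross-validation is one way to limit the overfitting that wrapper methods are prone to.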
Pages
111–122
Physical description
Bibliography: 45 items, figures, tables, charts.
Authors
author
- Faculty of Electrical Engineering, J.J. Strossmayer University of Osijek, Kneza Trpimira 2b, 31000 Osijek, Croatia
author
- Faculty of Electrical Engineering, J.J. Strossmayer University of Osijek, Kneza Trpimira 2b, 31000 Osijek, Croatia
author
- Faculty of Electrical Engineering, J.J. Strossmayer University of Osijek, Kneza Trpimira 2b, 31000 Osijek, Croatia
References
- [1] Alcalá-Fdez, J., Fernández, A., Luengo, J., Derrac, J. and García, S. (2011). KEEL data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework, Multiple-Valued Logic and Soft Computing 17(2–3): 255–287.
- [2] Balakrishnan, S., Narayanaswamy, R., Savarimuthu, N. and Samikannu, R. (2008). SVM ranking with backward search for feature selection in type II diabetes databases, Proceedings of the IEEE International Conference On System, Man and Cybernetics, Singapore, pp. 2628–2633.
- [3] Bhatia, N. and Vandana, A. (2010). Survey of nearest neighbor techniques, International Journal of Computer Science and Information Security 8(2): 302–305.
- [4] Chuang, L.-Y., Tsai, S.-W. and Yang, C.-H. (2011). Improved binary particle swarm optimization using catfish effect for feature selection, Expert Systems with Applications 38(10): 12699–12707.
- [5] Das, S., Konar, A. and Chakraborty, U.K. (2005). Two improved differential evolution schemes for faster global search, Proceedings of the 2005 Conference on Genetic and Evolutionary Computation, Washington DC, USA, pp. 991–998.
- [6] Das, S. and Suganthan, P.N. (2011). Differential evolution: A survey of the state-of-the-art, IEEE Transactions on Evolutionary Computation 15(1): 4–31.
- [7] Dash, M. and Liu, H. (1997). Feature selection for classification, Intelligent Data Analysis 1(1–4): 131–156.
- [8] Debska, B. and Guzowska-Swider, B. (2011). Application of artificial neural network in food classification, Analytica Chimica Acta 705(1–2): 283–291.
- [9] Duda, R., Hart, P. and Stork, D. (2001). Pattern Classification, 2nd Edition, Wiley and Sons Inc., New York, NY.
- [10] Eiben, A.E. and Smith, J.E. (2003). Introduction to Evolutionary Computing, Springer-Verlag, Berlin/Heidelberg.
- [11] Engelbrecht, A.P. and Pampara, G. (2007). Binary differential evolution strategies, Proceedings of the IEEE Congress on Evolutionary Computation 2007, Singapore, pp. 1942–1947.
- [12] Ferreira, A.J. and Figueiredo, M.A.T. (2012). Efficient feature selection filters for high-dimensional data, Pattern Recognition Letters 33(13): 1794–1804.
- [13] Frank, A. and Asuncion, A. (2010). UCI machine learning repository, http://archive.ics.uci.edu/ml.
- [14] Garcia, E.K., Feldman, S., Gupta, M.R. and Srivastava, S. (2010). Completely lazy learning, IEEE Transactions on Knowledge and Data Engineering 22(9): 1274–1285.
- [15] Gocławski, J., Sekulska-Nalewajko, J. and Kuźniak, E. (2012). Neural network segmentation of images from stained cucurbits leaves with colour symptoms of biotic and abiotic stresses, International Journal of Applied Mathematics and Computer Science 22(3): 669–684, DOI: 10.2478/v10006-012-0050-5.
- [16] Hsu, C.-W. and Lin, C.-J. (2002). A comparison of methods for multiclass support vector machines, IEEE Transactions on Neural Networks 13(2): 415–425.
- [17] Hsu, H.-H., Hsieh, C.-W. and Lu, M.-D. (2011). Hybrid feature selection by combining filters and wrappers, Expert Systems with Applications 38(7): 8144–8150.
- [18] Jain, A.K., Duin, R.P.W. and Mao, J. (2000). Statistical pattern recognition: A review, IEEE Transactions on Pattern Analysis and Machine Intelligence 22(1): 4–37.
- [19] Javed, K., Babri, H. and Saeed, M. (2012). Feature selection based on class-dependent densities for high-dimensional binary data, IEEE Transactions on Knowledge and Data Engineering 24(3): 465–477.
- [20] Jeleń, L., Fevens, T. and Krzyżak, A. (2008). Classification of breast cancer malignancy using cytological images of fine needle aspiration biopsies, International Journal of Applied Mathematics and Computer Science 18(1): 75–83, DOI: 10.2478/v10006-008-0007-x.
- [21] Jiang, L., Cai, Z., Wang, D. and Jiang, S. (2007). Survey of improving k-nearest-neighbor for classification, Proceedings of the 4th International Conference on Fuzzy Systems and Knowledge Discovery, Haikou, Hainan, China, Vol.1, pp. 679–683.
- [22] Khushaba, R.N., Al-Ani, A. and Al-Jumaily, A. (2008). Differential evolution based feature subset selection, Proceedings of the 19th International Conference on Pattern Recognition, Tampa, FL, USA, pp. 1–4.
- [23] Kabir, M.M., Shahjahan, M. and Murase, K. (2011). A new local search based hybrid genetic algorithm for feature selection, Neurocomputing 74(17): 2914–2928.
- [24] Kabir, M.M., Shahjahan, M. and Murase, K. (2012). A new hybrid ant colony optimization algorithm for feature selection, Expert Systems with Applications 39(3): 3747–3763.
- [25] Li, C. and Li, H. (2010). A survey of distance metrics for nominal attributes, Journal of Software 5(11): 1262–1269.
- [26] Lichtblau, D. (2012). Differential evolution in discrete optimization, International Journal of Swarm Intelligence and Evolutionary Computation 1(2012): 1–10.
- [27] Loughrey, J. and Cunningham, P. (2004). Overfitting in wrapper-based feature subset selection: The harder you try the worse it gets, in M. Bramer, F. Coenen and T. Allen (Eds.), The Twenty-fourth SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence, Springer, Berlin/Heidelberg, pp. 33–43.
- [28] Martinović, G. and Bajer, D. (2011). Impact of double operators on the performance of a genetic algorithm for solving the traveling salesman problem, in B.K. Panigrahi, P.N. Suganthan, S. Das and S.C. Satapathy (Eds.), Proceedings of the Second International Conference on Swarm, Evolutionary, and Memetic Computing Part I, Springer-Verlag, Berlin/Heidelberg, pp. 290–298.
- [29] Michalak, K. and Kwaśnicka, H. (2006). Correlation-based feature selection strategy in classification problems, International Journal of Applied Mathematics and Computer Science 16(4): 503–511.
- [30] Pampara, G., Engelbrecht, A.P. and Franken, N. (2006). Binary differential evolution, Proceedings of the IEEE Congress on Evolutionary Computation 2006, Vancouver, BC, Canada, pp. 1873–1879.
- [31] Price, K.V., Storn, R.M. and Lampinen, J.A. (2005). Differential Evolution. A Practical Approach to Global Optimization, Springer-Verlag, Berlin/Heidelberg.
- [32] R Core Team (2013). R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, http://www.R-project.org.
- [33] Raymer, M.L., Punch, W.F., Goodman, E.D., Kuhn, L.A. and Jain, A.K. (2000). Dimensionality reduction using genetic algorithms, IEEE Transactions on Evolutionary Computation 4(2): 164–171.
- [34] Storn, R. and Price, K. (1997). Differential evolution—a simple and efficient heuristic for global optimization over continuous spaces, Journal of Global Optimization 11(4): 341–359.
- [35] Trawiński, B., Smętek, M., Telec, Z. and Lasota, T. (2012). Nonparametric statistical analysis for multiple comparison of machine learning regression algorithms, International Journal of Applied Mathematics and Computer Science 22(4): 867–881, DOI: 10.2478/v10006-012-0064-z.
- [36] Vegh, V., Pierens, G.K. and Tieng, Q.M. (2011). A variant of differential evolution for discrete optimization problems requiring mutually distinct parameters, International Journal of Innovative Computing, Information and Control 7(2): 897–914.
- [37] Wang, G., Jian, M. and Yang, S. (2011). IGF-bagging: Information gain based feature selection for bagging, International Journal of Innovative Computing, Information and Control 7(11): 6247–6259.
- [38] Woźniak, M. and Krawczyk, B. (2012). Combined classifier based on feature space partitioning, International Journal of Applied Mathematics and Computer Science 22(4): 855–866, DOI: 10.2478/v10006-012-0063-0.
- [39] Wu, O., Zuo, H., Zhu, M., Hu, W., Gao, J. and Wang, H. (2009). Rank aggregation based text feature selection, Proceedings of the IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Tech, Milano, Italy, Vol. 1, pp. 165–172.
- [40] Xinjie, Y. and Mitsuo, G. (2010). Introduction to Evolutionary Algorithms, Springer-Verlag, London.
- [41] Yan, F., Dridi, M. and Moudni, A.E. (2013). An autonomous vehicle sequencing problem at intersections: A genetic algorithm approach, International Journal of Applied Mathematics and Computer Science 23(1): 183–200, DOI: 10.2478/amcs-2013-0015.
- [42] Yang, W., Li, D. and Zhu, L. (2011). An improved genetic algorithm for optimal feature subset selection from multi-character feature set, Expert Systems with Applications 38(3): 2733–2740.
- [43] Yusof, R., Khairuddin, U. and Khalid, M. (2012). A new mutation operation for faster convergence in genetic algorithm feature selection, International Journal of Innovative Computing, Information and Control 8(10(B)): 7363–7379.
- [44] Zhang, J., Avasarala, V., Sanderson, A.C. and Mullen, T. (2008). Differential evolution for discrete optimization: An experimental study on combinatorial auction problems, Proceedings of the IEEE Congress on Evolutionary Computation 2008, Hong Kong, China, pp. 2794–2800.
- [45] Zhu, M., Chen, W., Hirdes, J.P. and Stolee, P. (2007). The k-nearest neighbor algorithm predicted rehabilitation potential better than current clinical assessment protocol, Journal of Clinical Epidemiology 60(10): 1015–1021.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-8b8749af-87c5-4aae-bf03-41099d335101