Article title

Simultaneous Feature Selection and Extraction Using Feature Significance

Authors
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
Dimensionality reduction of a data set by selecting or extracting relevant, nonredundant features is an essential preprocessing step in pattern recognition, data mining, machine learning, and multimedia indexing. Of the large number of features present in real-life data sets, only a small fraction is needed to represent the data accurately. Reducing the data to a smaller set of representative features that retains the salient characteristics of the data not only decreases processing time but also yields more compact learned models and better generalization. In this regard, a novel dimensionality reduction method is presented here that simultaneously selects and extracts features using the concept of feature significance. The method maximizes both the relevance and the significance of the reduced feature set, thereby removing redundancy from it. The method is generic in the sense that both supervised and unsupervised feature evaluation indices can be used for simultaneous feature selection and extraction. The effectiveness of the proposed method, along with a comparison with existing feature selection and extraction methods, is demonstrated on a set of real-life data sets.
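The abstract describes a criterion that trades off the relevance of each feature against its significance with respect to the features already chosen. A minimal sketch of the selection side of such a criterion follows, assuming mutual information as the supervised relevance index. The function select_features, the trade-off weight beta, and the use of average mutual information with the selected set as a redundancy penalty standing in for the paper's significance measure are illustrative assumptions, not the authors' algorithm; the feature-extraction half of the method is omitted.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score


def select_features(X, y, n_select, beta=1.0, n_bins=10):
    """Greedy forward selection: pick features relevant to the class
    label while rewarding those that add information beyond the
    already-selected set. The 'significance' term here is only a proxy
    (negative average mutual information with the selected features);
    the paper's actual feature-significance measure differs."""
    n_features = X.shape[1]

    # Discretize each feature so pairwise mutual information can be
    # estimated with a simple contingency-based estimator.
    Xd = np.empty(X.shape, dtype=int)
    for j in range(n_features):
        edges = np.histogram_bin_edges(X[:, j], bins=n_bins)
        Xd[:, j] = np.digitize(X[:, j], edges[1:-1])

    # Relevance of each feature to the class label.
    relevance = mutual_info_classif(X, y, random_state=0)

    selected = [int(np.argmax(relevance))]
    remaining = set(range(n_features)) - set(selected)

    while len(selected) < n_select and remaining:
        def score(f):
            # Penalize redundancy with the already-selected features.
            redundancy = np.mean(
                [mutual_info_score(Xd[:, f], Xd[:, s]) for s in selected]
            )
            return relevance[f] - beta * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)

    return selected


if __name__ == "__main__":
    X, y = load_wine(return_X_y=True)
    print(select_features(X, y, n_select=5))
```

Swapping mutual_info_classif for an unsupervised evaluation index would mirror the generic supervised/unsupervised design the abstract claims for the method.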
Year
Pages
405–431
Physical description
Bibliography: 35 items, tables, charts
Contributors
author
  • Machine Intelligence Unit, Indian Statistical Institute, 203 B. T. Road, Kolkata 700 108, India
author
  • Department of Computer Science, National Institute of Science and Technology, Berhampur, Odisha 761 008, India
Bibliography
  • [1] Aspin, A.: Tables for Use in Comparisons Whose Accuracy Involves Two Variances, Biometrika, 36, 1949, 245–271.
  • [2] Battiti, R.: Using Mutual Information for Selecting Features in Supervised Neural Net Learning, IEEE Transactions on Neural Networks, 5(4), 1994, 537–550.
  • [3] Bressan, M., Vitria, J.: On the Selection and Classification of Independent Features, IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(10), 2003, 1312–1317.
  • [4] Chen, C. H.: On a Class of Computationally Efficient Feature Selection Criteria, Pattern Recognition, 7(12), 1975, 87–94.
  • [5] Das, S. K.: Feature Selection with a Linear Dependence Measure, IEEE Transactions on Computers, 20(9), 1971, 1106–1109.
  • [6] Dash, M., Liu, H.: Unsupervised Feature Selection, Proceedings of the Pacific Asia Conference on Knowledge Discovery and Data Mining, 2000.
  • [7] Dash, M., Liu, H.: Consistency Based Search in Feature Selection, Artificial Intelligence, 151(1-2), 2003, 155–176.
  • [8] Davies, D. L., Bouldin, D. W.: A Cluster Separation Measure, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1, 1979, 224–227.
  • [9] Devijver, P. A., Kittler, J.: Pattern Recognition: A Statistical Approach, Prentice Hall, Englewood Cliffs, 1982.
  • [10] Ding, C., Peng, H.: Minimum Redundancy Feature Selection from Microarray Gene Expression Data, Journal of Bioinformatics and Computational Biology, 3(2), 2005, 185–205.
  • [11] Duda, R. O., Hart, P. E., Stork, D. G.: Pattern Classification and Scene Analysis, John Wiley & Sons, Inc., New York, 1999.
  • [12] Dunn, J. C.: A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact, Well-Separated Clusters, Journal of Cybernetics, 3, 1974, 32–57.
  • [13] Guyon, I., Elisseeff, A.: An Introduction to Variable and Feature Selection, Journal of Machine Learning Research, 3, 2003, 1157–1182.
  • [14] Hall, M. A.: Correlation Based Feature Selection for Discrete and Numerical Class Machine Learning, Proceedings of the 17th International Conference on Machine Learning, 2000.
  • [15] Heydorn, R. P.: Redundancy in Feature Extraction, IEEE Transactions on Computers, 1971, 1051–1054.
  • [16] Huang, D., Chow, T. W. S.: Effective Feature Selection Scheme Using Mutual Information, Neurocomputing, 63, 2004, 325–343.
  • [17] Koller, D., Sahami, M.: Toward Optimal Feature Selection, Proceedings of the International Conference on Machine Learning, 1996.
  • [18] Kwak, N., Choi, C.-H.: Input Feature Selection by Mutual Information Based on Parzen Window, IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(12), 2002, 1667–1671.
  • [19] Li, Y., Lu, B.-L.: Feature Selection Based on Loss-Margin of Nearest Neighbor Classification, Pattern Recognition, 42(9), 2009, 1914–1921.
  • [20] Liu, H., Sun, J., Liu, L., Zhang, H.: Feature Selection With Dynamic Mutual Information, Pattern Recognition, 42(7), 2009, 1330–1339.
  • [21] Maji, P., Pal, S. K.: Feature Selection Using f-Information Measures in Fuzzy Approximation Spaces, IEEE Transactions on Knowledge and Data Engineering, 22(6), 2010, 854–867.
  • [22] Maji, P., Paul, S.: Rough Set Based Maximum Relevance-Maximum Significance Criterion and Gene Selection from Microarray Data, International Journal of Approximate Reasoning, 52(3), 2011, 408–426.
  • [23] Mitra, P., Murthy, C. A., Pal, S. K.: Unsupervised Feature Selection Using Feature Similarity, IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(3), 2002, 301–312.
  • [24] Moustakidis, S. P., Theocharis, J. B.: SVM-FuzCoC: A Novel SVM-Based Feature Selection Method Using a Fuzzy Complementary Criterion, Pattern Recognition, 43(11), 2010, 3712–3729.
  • [25] Pal, S. K., De, R. K., Basak, J.: Unsupervised Feature Evaluation: A Neuro-Fuzzy Approach, IEEE Transactions on Neural Networks, 11(2), 2000, 366–376.
  • [26] Pal, S. K., Mitra, P.: Pattern Recognition Algorithms for Data Mining, CRC Press, Boca Raton, Florida, 2004.
  • [27] Pal, S. K., Pal, A., Eds.: Pattern Recognition: from Classical to Modern Approaches, World Scientific, Singapore, 2001.
  • [28] Paul, S., Maji, P.: Rough Sets for In Silico Identification of Differentially Expressed miRNAs, International Journal of Nanomedicine, 8, 2013, 63–74.
  • [29] Peng, H., Long, F., Ding, C.: Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy, IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), 2005, 1226–1238.
  • [30] Quinlan, J. R.: C4.5: Programs for Machine Learning, Morgan Kaufmann, CA, 1993.
  • [31] Sotoca, J. M., Pla, F.: Supervised Feature Selection by Clustering Using Conditional Mutual Information-Based Distances, Pattern Recognition, 43(6), 2010, 2068–2081.
  • [32] Tou, J. T., Gonzalez, R. C.: Pattern Recognition Principles, Addison-Wesley, Reading, MA, 1974.
  • [33] Vapnik, V.: The Nature of Statistical Learning Theory, Springer-Verlag, New York, 1995.
  • [34] Vinzi, V. E., Chin, W., Henseler, J., Wang, H.: Handbook of Partial Least Squares, Springer, 2010.
  • [35] Wei, H.-L., Billings, S.: Feature Subset Selection and Ranking for Data Dimensionality Reduction, IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(1), 2007, 162–166.
Document type
YADDA identifier
bwmeta1.element.baztech-9d32da16-2fd4-4ec1-9d60-c70f012dba3c