Selection of prototypes with the EkP system
A new system for selection of reference instances, which is called the EkP system (Exactly k Prototypes), has been introduced by us recently. In this paper we study suitability of the EkP method for training data reduction on seventeen datasets. As the underlaying classifier the well known IB1 system (1-Nearest Neighbor classifier) has been chosen. We compare generalization ability of our method to performance of IB1 trained on the entire training data and performance of LVQ, Learning Vector Quantization, for which the same number of codebooks has been chosen as the number of prototypes selected by the EkP system. The comparison indicates that even with only a few prototypes which have been chosen by the EkP method on nearly all seventeen datasets statistically indistinguishable results from those given by the IB1 system have been obtained. On many datasets generalization ability of the EkP method has been higher than the one attained with LVQ.
Bibliogr. 31 poz.
- AHA, D., KIBLER, D. and ALBERT, M. (1991) Instance-based learning algorithms. Machine Learning 6, 37-66.
- BHATTACHARYA, B., POULSEN, R. and TOUSSAINT, G. (1981) Application of proximity graphs to editing nearest neighbor decision rule. In: International Symposium on Information Theory, Santa Monica.
- BRIGHTON, H. and MELLISH, C. (2002) Advances in instance selection for instance-based learning algorithms. Data Mining and Knowledge Discovery 6, 153-172.
- BROWNLEE, J. (2004) A Java implementation of the SOM-LVQ PAK. http://www.it.swin.edu.au/personal/jbrownlee/, http://wekaclassalgos.sourceforge.net
- CAMERON-JONES, R. (1995) Instance selection by encoding length heuristic with random mutation hill climbing. In: Proceedings of the Eighth Australian Joint Conference on Artificial Intelligence, 99-106.
- DOBOSZ, K. (2006) Statistical Significance Tests in Estimation of the Results Obtained with Various Systems that Learn. M.Sc. thesis, Nicolaus Copernicus University, Torun, Poland, (in Polish).
- DUCH, W. and BLACHNIK, M. (2004) Fuzzy rule-based systems derived from similarity to prototypes. LNCS 3316, 912-917.
- DUCH, W. and GRUDZINSKI, K. (2001) Prototype based rules - new way to understand the data. IEEE International Joint Conference on Neural Networks, Washington D.C, 1858-1863.
- GATES, G. (1972) The reduced nearest neighbor rule. IEEE Transactions on Information Theory, 18, 665-669.
- GROCHOWSKI, M. (2003) Selection of Reference Vectors in Selected Methods for Classification. M.Sc. thesis, Nicolaus Copernicus University, Department of Applied Informatics, Torun, Poland, (in Polish).
- GROCHOWSKI, M. and JANKOWSKI, N. (2004) Comparison of Instance Selection Algorithms II: Results and Comments. Artificial Intelligence and Soft Computing ICAISC 2004, LNAI 3070, Springer, 580-585.
- GRUDZINSKI, K. and DUCH, W. (2000) SBL-PM: A Simple Algorithm for Selection of Reference Instances for Similarity-Based Methods. Intelligent Information Systems, Bystra, Poland, 2000. In: Advances in Soft, Computing, Physica-Verlag, 99-108.
- GRUDZINSKI, K. (2004) SBL-PM-M: A System for Partial Memory Learning. Artificial Intelligence and Soft Computing ICAISC 2004- LNAI 3070. Springer, 586-591.
- GRUDZINSKI, K. (2005) SBLWeka: A modified Weka System. http://scientific-activity-karol-grudzinski.blogspot.com.
- GRUDZINSKI, K. (2008) EfeP: A fast minimization based prototype selection algorithm. Proceedings of the International IIS’08 Conference, Zakopane, Poland, 2008. Academic Publishing House EXIT, Warsaw, 45-53.
- HART, P. (1968) The condensed nearest neighbor rule. IEEE Transactions on Information Theory 14, 515-516.
- HYNINEN, J., KANGAS, J., KOHONEN, T., LAAKSONNEN, J. and TORKOLLA, K. (1996) LVQ.PAK: The Learning Vector Quantization Program Package.
- JANKOWSKI, N. (2000) Data regularization. In: L. Rutkowski, R. Tadeusiewicz, eds., Neural Networks and Soft Computing, Zakopane, Poland, 209-214.
- JANKOWSKI, N. and GROCHOWSKI, M. (2004) Comparison of Instances Selection Algorithms I: Algorithms Survey. Artificial Intelligence and Soft Computing ICAISC 2004. LNAI 3070, Springer, 598-603.
- KOHONEN, T. (2001) Self-Organizing Maps. 3rd ed. Springer-Verlag, Berlin Heidelberg.
- LAMPTON, M. (no date) neldermead.java . The version of the original nelder-mead.java code modified by the author of this paper that has been used for the calculations used in this paper can be found at http://scientific-activity-karol-grudzinski.blogspot.com.
- MALOOF, M. and MICHALSKI, R. (2000) Selecting Examples for Partial Memory Learning. Machine Learning 41, 27-52.
- MERTZ, C. and MURPHY, P. (1996) UCI repository of machine learning databases. http://archive.ics.uci.edu/ml/.
- NELDER, J. and MEAD, R. (1965) A simplex method for function minimization. Computer Journal 7, 308-313.
- OJA, M., KASKI, S. and KOHONEN, T. (2003) Bibliography of Self-Organizing Map (SOM) Papers: 1998-2001 Addendum. Neural Computing Surveys, 3, 1-156.
- SKALAK, D. (1994) Prototype and feature selection by sampling and random mutation hill climbing algorithms. In: International Conference on Machine Learning, 293-301.
- TOMEK, I. (1976) An experiment with the edited nearest neighbor rule. IEEE Transactions on Systems, Man, and Cybernetics 6, 448-452.
- WILSON, D. (1972) Asymptotic properties of nearest neighbor rules using edited data. IEEE Transactions on Systems, Man, and Cybernetics 2, 408-421.
- WILSON, D. and MARTINEZ, T. (1997) Instance Pruning Techniques. In: D. Fisher: Machine Learning: Proceedings of the Fourteenth International Conference. Morgan Kaufmann Publishers, San Francisco, CA., 404-417.
- WILSON, D. and MARTINEZ, T. (2000) Reduction Techniques for Instance-Based Learning Algorithms. Machine Learning, 38, 257-286.
- WITTEN, I. and FRANK, E. (2000) Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann Publishers.