Semi-GAPS: A Semi-supervised Clustering Method Using Point Symmetry

Saha, S.; Bandyopadhyay, S.

Article - details

Article title

Semi-GAPS: A Semi-supervised Clustering Method Using Point Symmetry

Authors

Saha S., Bandyopadhyay S.

Selected full texts from this journal

https://fi.episciences.org/

Abstracts

In this paper, an evolutionary technique for semi-supervised clustering is proposed. The technique uses a point symmetry based distance measure. Semi-supervised classification combines aspects of unsupervised and supervised learning to improve on the performance of traditional classification methods. The existing point symmetry based genetic clustering technique, GAPS-clustering, is extended in two different ways to handle the semi-supervised classification problem. The proposed semi-GAPS clustering algorithm is able to detect clusters of any shape, size, and convexity, as long as they possess the point symmetry property. Kd-tree based nearest neighbor search is used to reduce the complexity of finding the closest symmetric point, and adaptive mutation and crossover probabilities are used. Experimental results demonstrate the practical performance benefits of the methodology in detecting classes with symmetrical shapes under semi-supervised clustering.
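
The abstract does not define the point symmetry based distance, so the following Python sketch is only an illustration under stated assumptions. It assumes the form commonly attributed to the GAPS paper [5]: the point x is reflected through the cluster center c, the distances from the reflected point 2c - x to its `knear` nearest data points are averaged, and that average is multiplied by the Euclidean distance between x and c. The function names (`build_kdtree`, `knn_distances`, `ps_distance`) and the default `knear=2` are hypothetical, and a small hand-written Kd-tree stands in for whatever nearest neighbor library the authors actually used.

```python
import heapq
import math


def build_kdtree(points, depth=0):
    """Build a Kd-tree as nested tuples: (point, axis, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (pts[mid], axis,
            build_kdtree(pts[:mid], depth + 1),
            build_kdtree(pts[mid + 1:], depth + 1))


def knn_distances(tree, query, k):
    """Return the k smallest Euclidean distances from query to tree points."""
    heap = []  # max-heap via negated distances; holds the best k found so far

    def visit(node):
        if node is None:
            return
        point, axis, left, right = node
        d = math.dist(point, query)
        if len(heap) < k:
            heapq.heappush(heap, -d)
        elif d < -heap[0]:
            heapq.heapreplace(heap, -d)
        diff = query[axis] - point[axis]
        near, far = (left, right) if diff < 0 else (right, left)
        visit(near)
        # Search the far side only if the splitting plane is close enough
        # to contain a point better than the current k-th best.
        if len(heap) < k or abs(diff) < -heap[0]:
            visit(far)

    visit(tree)
    return sorted(-d for d in heap)


def ps_distance(x, center, tree, knear=2):
    """Assumed point symmetry distance of x with respect to center."""
    # Reflect x through the center: x* = 2c - x.
    reflected = tuple(2 * c - xi for xi, c in zip(x, center))
    # Average distance from x* to its knear nearest data points.
    d_sym = sum(knn_distances(tree, reflected, knear)) / knear
    # Weight by the Euclidean distance between x and the center.
    return d_sym * math.dist(x, center)


# Usage: four points placed symmetrically around the origin.
data = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
tree = build_kdtree(data)
print(ps_distance((1.0, 0.0), (0.0, 0.0), tree))
```

A small value indicates that the reflected point lies close to actual data, that is, the point is roughly symmetric with respect to the candidate center. The tree is built once per data set, so each symmetric-point lookup avoids a full scan of the data.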

Keywords

semi-supervised classification; genetic algorithm; symmetry; point symmetry based distance; Kd-tree

Publisher

IOS Press

Journal

Fundamenta Informaticae

Year

2009

Volume

Vol. 96, no. 1/2

Pages

195-209

Physical description

Bibliography: 25 items, tables, charts.

Contributors

author

Saha S.

author

Bandyopadhyay S.

Machine Intelligence Unit, Indian Statistical Institute, Kolkata-700108, India, sriparna_r@isical.ac.in

Bibliography

[1] http://www.ics.uci.edu/~mlearn/MLRepository.html.
[2] Anderberg, M. R.: Computational Geometry: Algorithms and Applications, Springer, 2000.
[3] Bair, E., Tibshirani, R.: Semi-supervised methods to predict patient survival from gene-expression data, PLoS Biol, 2(4), 2004, 0511-0521.
[4] Bandyopadhyay, S., Maulik, U.: Genetic Clustering for Automatic Evolution of Clusters and Application to Image Classification, Pattern Recognition, (2), 2002, 1197-1208.
[5] Bandyopadhyay, S., Saha, S.: GAPS: A Clustering Method Using A New Point Symmetry Based Distance Measure, Pattern Recognition, 40, 2007, 3430-3451.
[6] Basu, S., Banerjee, A., Mooney, R.: Semi-supervised Clustering by Seeding, Proceedings of 19th International Conference on Machine learning (ICML'02), Sydney, Australia, 2002.
[7] Blum, A., Mitchell, T.: Combining labeled and unlabeled data with co-training, in: Proceedings of the Conference on Computational Learning Theory, ACM Press, New York, NY, 1998.
[8] Breiman, L., Friedman, J., Olshen, R., Stone, C.: Classification and Regression Trees, Wadsworth International, California, 1984.
[9] Chou, C. H., Su, M. C., Lai, E.: Symmetry as A new Measure for Cluster Validity, in: 2nd WSEAS Int. Conf. on Scientific Computation and Soft Computing, Crete, Greece, 2002, 209-213.
[10] Davies, D. L., Bouldin, D. W.: A Cluster Separation Measure, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1, 1979, 224-227.
[11] Demiriz, A., Bennett, K., Embrechts, M.: A genetic algorithm approach for semi-supervised clustering, Smart Engineering System Design, 2002, 21-30.
[12] Fisher, R. A.: The Use of Multiple Measurements in Taxonomic Problems, Annals of Eugenics, 3, 1936, 179-188.
[13] Handl, J., Knowles, J.: On semi-supervised clustering via multiobjective optimization, GECCO '06: Proceedings of the 8th annual conference on Genetic and evolutionary computation, ACM, New York, NY, USA, 2006, ISBN 1-59593-186-4.
[14] Hanisch, D., Zien, A., Zimmer, R., Lengauer, T.: Co-clustering of biological networks and gene expression data, Bioinformatics, 18(90001), 2002, 145-154.
[15] Holland, J. H.: Adaptation in Natural and Artificial Systems, The University of Michigan Press, Ann Arbor, 1975.
[16] Joachims, T.: Transductive inference for text classification using support vector machines, in: Proceedings of ICML-99, Morgan Kaufmann, Paris, France, 1999, 200-209.
[17] Klein, D., Kamvar, S. D., Manning, C.: From Instance-level Constraints to Space-level Constraints: Making the most of prior knowledge in data clustering, Proceedings of 19th International Conference on Machine learning (ICML'02), Sydney, Australia, 2002.
[18] Li, T., Zhu, S., Li, Q., Ogihara, M.: Gene functional classification by semi-supervised learning from heterogenous data, Proceedings of the Symposium on Applied Computing, ACM Press, New York, NY.
[19] Mount, D. M., Arya, S.: ANN: A Library for Approximate Nearest Neighbor Searching, 2005, http://www.cs.umd.edu/~mount/ANN.
[20] Speer, N., Spieth, C., Zell, A.: A memetic co-clustering algorithm for gene expression profiles and biological annotation, Proceedings of the Congress on Evolutionary Computation, IEEE Press, Sydney, Australia, 2004.
[21] Srinivas, M., Patnaik, L.: Adaptive Probabilities of Crossover and Mutation in Genetic Algorithms, IEEE Transactions on Systems, Man and Cybernetics, 24(4), April, 1994, 656-667.
[22] Su, M.-C., Chou, C.-H.: A Modified Version of the K-means Algorithm with a Distance Based on Cluster Symmetry, IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6), 2001, 674-680.
[23] Wagstaff, K., Cardie, C., Rogers, S., Schroedl, S.: Constrained K-Means Clustering with Background Knowledge, Proceedings of 18th International Conference on Machine learning (ICML'01), 2001.
[24] Xing, E., Ng, A., Jordan, M., Russell, S.: Distance Metric Learning with Application to Clustering with Side-information, Advances in Neural Information Processing Systems.
[25] Yeung, K. Y., Ruzzo, W. L.: An empirical study on principal component analysis for clustering gene expression data, Bioinformatics, 17(9), 2001, 763-774.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BUS8-0008-0048