Towards Obtaining Upper Bound on Sensitivity Computation Process for Cluster Validity Measures

Mishra, S.; Mondal, S.; Saha, S.

doi:10.3233/FI-2018-1749

Selected full texts from this journal

https://fi.episciences.org/

Abstract

Cluster validity indices have been proposed in the literature to measure the goodness of a clustering result. A validity measure yields a value indicating how good or bad an obtained clustering result is compared with the actual clustering result. Validity measures are not constructed arbitrarily: a validity measure should satisfy certain important properties. There are cases, however, in which a validity measure satisfies these properties and still cannot correctly differentiate between two clustering results. For this reason, sensitivity was introduced as a property of a validity measure to capture the differences between two clustering results. Computing sensitivity is expensive, because it requires exploring all possible combinations of clustering results, and the number of such combinations grows exponentially. To compute sensitivity efficiently, it is therefore necessary first to obtain an upper bound on the number of combinations that suffices to compute the sensitivity value. In this paper, we obtain such an upper bound on the number of possible combinations of clustering results. For this purpose, we propose a generic approach suitable for various validity measures and a specific approach applicable to two validity measures. We also show that this upper bound is sufficient to compute the sensitivity of various validity measures, and that it is far smaller than the total number of possible combinations of clustering results.
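To give a feel for how quickly the space of clustering results grows, the following minimal Python sketch counts the ways of partitioning n objects into non-empty clusters, which is the Bell number B(n). It is an illustration only: the function name `bell_numbers` is hypothetical, and the assumption that the count of candidate clusterings can be modeled by set partitions is ours, not a statement of the paper's exact counting scheme or of its upper bound.

```python
def bell_numbers(n_max):
    """Return [B(0), ..., B(n_max)], computed with the Bell triangle.

    B(n) is the number of ways to partition n distinct objects into
    non-empty, unlabeled clusters (assumed model of "all clusterings").
    """
    bells = [1]   # B(0) = 1: the empty set has exactly one partition
    row = [1]     # first row of the Bell triangle
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row;
        # every further entry adds the entry above-left to the one before it.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells


if __name__ == "__main__":
    for n, b in enumerate(bell_numbers(10)):
        print(f"n = {n:2d}: {b} possible clusterings")
```

Even for ten objects the count already exceeds one hundred thousand, which is why exhaustively comparing clustering results quickly becomes impractical without a bound on how many need to be examined.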

Keywords

cluster validity measure; clustering algorithm; sensitivity

Publisher

IOS Press

Journal

Fundamenta Informaticae

Year

2018

Volume

Vol. 163, no. 4

Pages

351-374

Physical description

24 references, tables, charts

Contributors

author

Mishra S.

sumitmishra@iitp.ac.in

Department of Computer Science & Engineering, Indian Institute of Technology Patna, Patna, Bihar – 801103, India

author

Mondal S.

samrat@iitp.ac.in

Department of Computer Science & Engineering, Indian Institute of Technology Patna, Patna, Bihar – 801103, India

author

Saha S.

sriparna@iitp.ac.in

Department of Computer Science & Engineering, Indian Institute of Technology Patna, Patna, Bihar – 801103, India

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-c2b8529e-b597-4c98-b4f7-23a94d14db52