On Seeking Consensus Between Document Similarity Measures

Kłopotek, M.

doi:10.3233/FI-2017-1597

Abstract

This paper investigates the application of consensus clustering and meta-clustering to the set of all possible partitions of a data set. We show that when the "complement" of the Rand Index is used as the measure of cluster similarity, the partition chosen is the total-separation partition, which puts each element in a separate set.
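The claim in the abstract can be checked by brute force on a tiny data set. The sketch below is only an illustration under stated assumptions, not the paper's procedure: it assumes that "consensus" means the median partition, i.e. the one minimizing the summed Rand distance (1 minus the Rand Index) to every partition of the set. All function names are hypothetical and chosen for this example.

```python
from itertools import combinations


def set_partitions(items):
    """Yield every partition of `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or open a new block containing only `first`.
        yield [[first]] + partition


def co_clustered_pairs(partition):
    """Return the set of element pairs that share a block."""
    pairs = set()
    for block in partition:
        pairs.update(combinations(sorted(block), 2))
    return pairs


def rand_distance(p, q, n):
    """Complement of the Rand Index: fraction of pairs on which p and q disagree."""
    total_pairs = n * (n - 1) // 2
    disagreements = len(co_clustered_pairs(p) ^ co_clustered_pairs(q))
    return disagreements / total_pairs


def consensus_over_all_partitions(n):
    """Assumed consensus: the partition with the smallest summed Rand
    distance to all partitions of {0, ..., n-1}."""
    partitions = list(set_partitions(list(range(n))))

    def total_distance(p):
        return sum(rand_distance(p, q, n) for q in partitions)

    return min(partitions, key=total_distance), len(partitions)


if __name__ == "__main__":
    best, count = consensus_over_all_partitions(4)
    print(count, "partitions; consensus:", sorted(best))
```

For four elements there are 15 partitions, and the minimizer is the partition into singletons. The reason is a counting argument: any given pair is separated in more partitions than it is joined in, so joining a pair always costs more disagreements than keeping it apart.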

Keywords

cluster analysis, partitioning, clustering, consensus functions, ensemble, knowledge reuse, unsupervised learning, meta-clustering

Publisher

IOS Press

Journal

Fundamenta Informaticae

Year

2017

Volume

Vol. 156, no. 1

Pages

43–68

Physical description

Bibliography: 28 items, figures, tables.

Creators

author

Kłopotek M.

klopotek@ipipan.waw.pl

Institute of Computer Science, Polish Academy of Sciences ul. Jana Kazimierza 5, 01-248 Warszawa, Poland

Bibliography

[1] Wang H, Shan H, Banerjee A. Bayesian cluster ensembles. Statistical Analysis and Data Mining, 2011; 4:54–70. doi:10.1002/sam.10098.
[2] Gionis A, Mannila H, Tsaparas P. Clustering Aggregation. ACM Trans. Knowl. Discov. Data, 2007;1(1). doi:10.1145/1217299.1217303.
[3] Caruana R, Elhawary M, Nguyen N, Smith C. Meta Clustering. In: Proceedings of the Sixth International Conference on Data Mining, ICDM ’06. IEEE Computer Society, Washington, DC, USA. 2006 pp. 107–118. ISBN 0-7695-2701-9.
[4] Niu D, Dy JG, Jordan MI. Multiple Non-Redundant Spectral Clustering Views. In: ICML’10. 2010 pp. 831–838. URL http://dblp.uni-trier.de/db/conf/icml/icml2010.html#NiuDJ10.
[5] Bifulco I, Iorio F, Napolitano F, Raiconi G, Tagliaferri R. Interactive Visualization Tools for Meta-Clustering. In: Proceedings of the 2009 conference on New Directions in Neural Networks: 18th Italian Workshop on Neural Networks: WIRN 2008. IOS Press, Amsterdam, The Netherlands. ISBN 978-1-58603-984-4, 2009 pp. 223–231.
[6] Bifulco I, Fedullo C, Napolitano F, Raiconi G, Tagliaferri R. Multiple data structure discovery through global optimisation, meta clustering and consensus methods. In: International Journal of Knowledge Engineering and Soft Data Paradigms, v.1 n.4, October 2009, pp. 300–317. URL https://doi.org/10.1504/IJKESDP.2009.028984.
[7] Dasgupta S, Ng V. Which clustering do you want? inducing your ideal clustering with minimal feedback. J. Artif. Int. Res., 2010;39:581–632. URL http://dl.acm.org/citation.cfm?id=1946417.1946430.
[8] Cui Y, Fern XZ, Dy JG. Learning multiple nonredundant clusterings. ACM Transactions on Knowledge Discovery from Data (TKDD), 2010;4(3):15:1–15:32. doi:10.1145/1839490.1839496.
[9] Anderberg M. Cluster Analysis for Applications. Academic Press, London, 1973. ISBN:9781483191393.
[10] Strehl A, Ghosh J. Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res., 2003;3:583–617. doi:10.1162/153244303321897735.
[11] Goder A, Filkov V. Consensus Clustering Algorithms: Comparison and Refinement. In: Munro JI, Wagner D (eds.), Proceedings of the Workshop on Algorithm Engineering and Experiments, ALENEX 2008, San Francisco, California, USA, January 19, 2008 pp. 109–117. doi:10.1137/1.9781611972887.11.
[12] Hore P, Hall LO, Goldgof DB. A scalable framework for cluster ensembles. Pattern Recogn., 2009; 42(5):676–688. doi:10.1016/j.patcog.2008.09.027.
[13] Ghosh J, Acharya A. Cluster ensembles. Wiley Interdisc. Rev.: Data Mining and Knowledge Discovery, 2011;1(4):305–315. doi:10.1002/widm.32.
[14] Li T, Ding C. Weighted consensus clustering. In: Proceedings of 2008 SIAM International Conference on Data Mining (SDM 2008), Atlanta, April 24-26, 2008. Society for Industrial and Applied Mathematics, 2008 pp. 798–809. URL https://doi.org/10.1137/1.9781611972788.72.
[15] Punera K, Ghosh J. Consensus Based Ensembles of Soft Clusterings. Applied Artificial Intelligence: An International Journal, 2008;22(7-8):780–810. doi:10.1080/08839510802170546.
[16] Monti S, Tamayo P, Mesirov J, Golub T. Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data. Mach. Learn., 2003;52(1-2):91–118. doi:10.1023/A:1023949509487.
[17] Topchy A, Jain AK, Punch W. Clustering ensembles: Models of consensus and weak partitions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005;27:1866–1881. doi:10.1109/TPAMI.2005.237.
[18] Nguyen N, Caruana R. Consensus Clusterings. In: Proceedings of the 7th IEEE International Conference on Data Mining (ICDM 2007), October 28-31, 2007, Omaha, Nebraska, USA. 2007 pp. 607–612. doi:10.1109/ICDM.2007.73.
[19] Wang Y, Pan Y. Semi-Supervised Consensus Clustering for Gene Expression Data Analysis. BioData Mining, 2014;7(7):13. doi:10.1186/1756-0381-7-7.
[20] Vogel T, Naumann F. Semi-Supervised Consensus Clustering: Reducing Human Effort. In: Proceedings of the International Workshop on Data Integration and Applications. 2014. doi:10.1109/ICDMW.2014.97.
[21] Barthelemy JP, Leclerc B. The median procedure for partition. In: Cox I, et al. (eds.), Partitioning Data Sets, AMS DIMACS Series in Discrete Mathematics. 1995 pp. 3–34.
[22] Gordon A, Vichi M. Partitions of partitions. Journal of Classification, 1998;15(2):265–285. doi:10.1007/s003579900034.
[23] Goder A, Filkov V. Consensus Clustering Algorithms: Comparison and Refinement. In: Alenex, volume 8. SIAM, 2008 pp. 109–117. doi:10.1137/1.9781611972887.11.
[24] Morlini I, Zani S. Comparing Approaches for Clustering Mixed Mode Data: An Application in Marketing Research. In: Palumbo F, Lauro CN, Greenacre M (eds.), Data Analysis and Classification: Proceedings of the 6th Conference. Springer, 2010 pp. 49–57. doi:10.1007/978-3-642-03739-9_6.
[25] Lei Y, Bezdek JC, Romano S, Vinh NX, Chan J, Bailey J. Ground Truth Bias in External Cluster Validity Indices. CoRR, 2016. abs/1606.05596.
[26] Milligan GW, Cooper MC. A study of the comparability of external criteria for hierarchical cluster analysis. Multivar. Behav. Res. 1986;21(4):441–458. doi:10.1207/s15327906mbr2104_5.
[27] Fowlkes EB, Mallows CL. A method for comparing two hierarchical clusterings. J. Am. Stat. Assoc., 1983;78(383):553–569. doi:10.2307/2288117.
[28] Romano S, Bailey J, Nguyen V, Verspoor K. Standardized Mutual Information for Clustering Comparisons: One Step Further in Adjustment for Chance. In: Proceedings of The 31st International Conference on Machine Learning. 2014 pp. 1143–1151. URL http://jmlr.org/proceedings/papers/v32/romano14.pdf.

Notes

Record prepared under agreement 509/P-DUN/2018, with funds from the Ministry of Science and Higher Education (MNiSW) allocated to science dissemination activities (2018).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-1d4770e0-1f8e-4eaf-bf4f-e7ae259f8180