The Number of Groups in an Aggregated Approach in Taxonomy with the Use of Stability Measures and Classical Indices - a Comparative Analysis

Rozmus, Dorota

doi:https://doi.org/10.18778/0208-6018.357.04

Article - details

Journal

Acta Universitatis Lodziensis. Folia Oeconomica

2021 | vol. 6, t. 357 | 55-67

Article title

The Number of Groups in an Aggregated Approach in Taxonomy with the Use of Stability Measures and Classical Indices - a Comparative Analysis

Authors

Dorota Rozmus

Title variants

Wybór liczby grup w podejściu zagregowanym w taksonomii z wykorzystaniem miar stabilności oraz klasycznych indeksów - porównanie wyników

Abstracts

Two concepts are frequently raised in the contemporary taxonomy literature: the aggregated (ensemble) approach and the stability of clustering methods. Until now they have been considered separately. An interesting proposal for combining the two was put forward by Y. Șenbabaoğlu, G. Michailidis, and J.Z. Li, who suggested an aggregated approach in taxonomy combined with a stability measure of their own design as the criterion for choosing the optimal number of groups (k). The aim of the article is to compare the values of the parameter k selected with this stability measure and with classical indices (e.g. Caliński-Harabasz, Dunn). (original abstract)

Two concepts have recently been discussed often in the literature on taxonomy: the cluster ensemble and stability. An interesting proposal for combining them was presented by Șenbabaoğlu, Michailidis, and Li, who proposed the proportion of ambiguously clustered pairs (PAC) as a stability measure for selecting the optimal number of groups in a cluster ensemble. The proposal originated in genetic research but, as the authors themselves write, the method can also be used successfully in other research areas. The aim of this paper is to compare the number of clusters (the k parameter) indicated by the aggregated approach in taxonomy combined with the above-mentioned stability measure with the number indicated by classical indices (e.g. Caliński-Harabasz, Dunn, Davies-Bouldin). (original abstract)
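
As a rough illustration of the stability measure named in the abstract, the sketch below computes a PAC-style score from repeated clustering runs. It assumes PAC is the share of item pairs whose consensus index (the fraction of runs in which both items land in the same cluster) falls in an intermediate interval. The interval bounds 0.1 and 0.9, the function names, and the toy label vectors are assumptions of this sketch, not details taken from the record.

```python
from itertools import combinations


def consensus_matrix(runs):
    """Map each item pair (i, j) to the fraction of runs that co-cluster it.

    `runs` is a list of label lists, one per clustering run. A label of
    None marks an item left out of that run (e.g. by subsampling); a pair
    is only counted in runs where both items are present.
    """
    n = len(runs[0])
    together = {pair: 0 for pair in combinations(range(n), 2)}
    sampled = {pair: 0 for pair in combinations(range(n), 2)}
    for labels in runs:
        for i, j in combinations(range(n), 2):
            if labels[i] is None or labels[j] is None:
                continue
            sampled[(i, j)] += 1
            if labels[i] == labels[j]:
                together[(i, j)] += 1
    return {
        pair: together[pair] / sampled[pair]
        for pair in sampled
        if sampled[pair] > 0
    }


def pac(consensus, lower=0.1, upper=0.9):
    """Proportion of pairs with consensus index in (lower, upper].

    The default bounds are an assumption of this sketch.
    """
    values = list(consensus.values())
    if not values:
        raise ValueError("no item pair was ever sampled together")
    ambiguous = sum(1 for v in values if lower < v <= upper)
    return ambiguous / len(values)


# Toy data (hypothetical): three runs that agree up to label names ...
stable_runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
# ... and three runs that split the four items differently each time.
unstable_runs = [[0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0]]

print(pac(consensus_matrix(stable_runs)))    # 0.0
print(pac(consensus_matrix(unstable_runs)))  # 1.0
```

Under this reading, the runs would be repeated for each candidate k and the k with the lowest score preferred, since a low score means few pairs are clustered together only some of the time.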

Keywords

Taxonomy; Taxonomic methods

Contributors

author

Dorota Rozmus

University of Economics in Katowice, Poland

References

Aldenderfer M.S., Blashfield R.K. (1984), Cluster analysis, Sage, Beverly Hills.
Anderberg M.R. (1973), Cluster analysis for applications, Academic Press, New York-San Francisco-London.
Ben-Hur A., Guyon I. (2003), Detecting stable clusters using principal component analysis, "Methods in Molecular Biology", no. 224, pp. 159-182.
Brock G., Pihur V., Datta S., Datta S. (2008), clValid: an R package for cluster validation, "Journal of Statistical Software", vol. 25(4), pp. 1-22, https://doi.org/10.18637/jss.v025.i04
Caliński R.B., Harabasz J. (1974), A dendrite method for cluster analysis, "Communications in Statistics", vol. 3, pp. 1-27.
Chiu D.S., Talhouk A. (2018), diceR: an R package for class discovery using an ensemble driven approach, "BMC Bioinformatics", no. 19, 11, https://doi.org/10.1186/s12859-017-1996-y
Davies D.L., Bouldin D.W. (1979), A Cluster Separation Measure, "IEEE Transactions on Pattern Analysis and Machine Intelligence", vol. 1(2), pp. 224-227.
Dudoit S., Fridlyand J. (2003), Bagging to improve the accuracy of a clustering procedure, "Bioinformatics", vol. 19(9), pp. 1090-1099.
Dunn J.C. (1974), Well-Separated Clusters and Optimal Fuzzy Partitions, "Journal of Cybernetics", vol. 4(1), pp. 95-104.
Eurostat (2019), Database, https://ec.europa.eu/eurostat/web/main/data/database (accessed: 20.11.2021).
Everitt B.S., Landau S., Leese M. (2001), Cluster analysis, Edward Arnold, London.
Fang Y., Wang J. (2012), Selection of the number of clusters via the bootstrap method, "Computational Statistics and Data Analysis", no. 56, pp. 468-477.
Fred A., Jain A.K. (2002), Data clustering using evidence accumulation, "Proceedings of the Sixteenth International Conference on Pattern Recognition", pp. 276-280.
Gordon A.D. (1987), A review of hierarchical classification, "Journal of the Royal Statistical Society", ser. A, pp. 119-137.
Gordon A.D. (1996), Hierarchical classification, [in:] P. Arabie, L.J. Hubert, G. de Soete (eds.), Clustering and classification, World Scientific, Singapore, pp. 65-121.
Hennig C. (2007), Cluster-wise assessment of cluster stability, "Computational Statistics and Data Analysis", no. 52, pp. 258-271.
Hornik K. (2005), A CLUE for CLUster ensembles, "Journal of Statistical Software", no. 14, pp. 65-72.
Kaufman L., Rousseeuw P.J. (1990), Finding groups in data: an introduction to cluster analysis, Wiley, New York.
Kuncheva L.I., Vetrov D.P. (2006), Evaluation of stability of k-means cluster ensembles with respect to random initialization, "IEEE Transactions on Pattern Analysis & Machine Intelligence", vol. 28(11), pp. 1798-1808.
Leisch F. (1999), Bagged clustering, "Adaptive Information Systems and Modeling in Economics and Management Science", Working Papers, SFB, no. 51.
Lord E., Willems M., Lapointe F.J., Makarenkov V. (2017), Using the stability of objects to determine the number of clusters in datasets, "Information Sciences", no. 393, pp. 29-46.
Marino V., Presti L.L. (2019), Stay in touch! New insights into end-user attitudes towards engagement platforms, "Journal of Consumer Marketing", no. 36, pp. 772-783.
Monti S., Tamayo P., Mesirov J., Golub T. (2003), Consensus clustering: A resampling-based method for class discovery and visualization of gene expression microarray data, "Machine Learning", no. 52, pp. 91-118.
Șenbabaoğlu Y., Michailidis G., Li J.Z. (2014), Critical limitations of consensus clustering in class discovery, "Scientific Reports", no. 4, 6207, https://doi.org/10.1038/srep06207
Shamir O., Tishby N. (2008), Cluster stability for finite samples, "Advances in Neural Information Processing Systems", no. 20, pp. 1297-1304.
Sokołowski A. (1995), Percentage points of the similarity measure for partitions, "Statistics in Transition", vol. 2(2), pp. 195-199.
Suzuki R., Shimodaira H. (2006), Pvclust: an R package for assessing the uncertainty in hierarchical clustering, "Bioinformatics", vol. 22(12), pp. 1540-1542.
Volkovich Z., Barzily Z., Toledano-Kitai D., Avros R. (2010), The Hotteling's metric as a cluster stability index, "Computer Modelling and New Technologies", vol. 14(4), pp. 65-72.

Document type

Bibliography

Identifiers

DOI

https://doi.org/10.18778/0208-6018.357.04

YADDA identifier

bwmeta1.element.ekon-element-000171654250