Using Bagging Aggregation Method in Taxonomy
Ensemble approaches based on aggregated models have been successfully applied in supervised learning to increase the accuracy and stability of classification. Recently, analogous techniques have been suggested for cluster analysis. Research has shown that combining a set of different clusterings can yield an improved solution. In the literature, a resampling method inspired by bagging in classification was proposed to improve the accuracy and stability of clustering procedures (Dudoit and Fridlyand, 2003). In this ensemble method, a partitioning clustering method is applied to bootstrap learning sets, and the resulting partitions are combined by majority voting. As in prediction, the motivation behind bagging is to reduce variability in the partitioning results through averaging. The performance of the new and existing methods was compared using real and artificial data sets. In general, the bagged clustering procedure was at least as accurate as, and often considerably more accurate than, a single application of the partitioning clustering method. (original abstract)
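The bootstrap-and-vote procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. K-means (Lloyd's algorithm) stands in for the partitioning method. Every original observation is assigned to its nearest centroid from each bootstrap run. The labels from each run are then matched to a reference partition of the full data by brute-force search over label permutations, which is practical only for small k, before the majority vote. The function names (`nearest`, `kmeans`, `align`, `bagged_clustering`) and the default of 25 bootstrap replicates are hypothetical choices made for illustration.

```python
import random
from collections import Counter
from itertools import permutations


def nearest(p, centroids):
    """Index of the centroid closest to point p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))


def kmeans(points, k, rng, iters=100):
    """Plain Lloyd's k-means on a list of tuples; returns the k centroids.

    Assumption: k-means is used here as the partitioning method.
    """
    # Initialise from k distinct points (a bootstrap sample contains duplicates).
    centroids = [list(c) for c in rng.sample(sorted(set(points)), k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        new = [[sum(col) / len(m) for col in zip(*m)] if m else centroids[j]
               for j, m in enumerate(clusters)]
        if new == centroids:  # converged
            break
        centroids = new
    return centroids


def align(labels, reference, k):
    """Relabel `labels` with the permutation that agrees most with `reference`.

    Brute force over all k! permutations (hypothetical helper, small k only).
    """
    best = max(permutations(range(k)),
               key=lambda perm: sum(perm[l] == r for l, r in zip(labels, reference)))
    return [best[l] for l in labels]


def bagged_clustering(points, k, n_boot=25, seed=0):
    """Bootstrap a partitioning method and combine the partitions by majority vote."""
    rng = random.Random(seed)
    # Reference partition: cluster the full learning set once.
    ref_centroids = kmeans(points, k, rng)
    reference = [nearest(p, ref_centroids) for p in points]
    votes = [Counter() for _ in points]
    for _ in range(n_boot):
        # Bootstrap learning set: n draws with replacement from the original data.
        boot = [rng.choice(points) for _ in points]
        centroids = kmeans(boot, k, rng)
        # Assign every original observation, then match labels to the reference.
        labels = align([nearest(p, centroids) for p in points], reference, k)
        for i, lab in enumerate(labels):
            votes[i][lab] += 1
    # Each observation receives the cluster label it was given most often.
    return [v.most_common(1)[0][0] for v in votes]
```

The alignment step is needed because cluster labels are arbitrary. The same group may be labelled 0 in one bootstrap run and 1 in another, so votes can only be pooled once the labels have been matched. For larger k, an assignment algorithm such as the Hungarian method could replace the brute-force permutation search.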
- Ayad H., Kamel M. (2003): Finding Natural Clusters Using Multi-Clusterer Combiner Based on Shared Nearest Neighbors. "Proceedings of the Fourth International Workshop on Multiple Classifier Systems", MCS'03, Vol. 2709 of Lecture Notes in Computer Science, Springer Verlag, Guildford, UK, pp. 166-175.
- Bezdek J.C. (1981): Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum, New York.
- Blake C., Keogh E., Merz C.J. (1988): UCI Repository of Machine Learning Databases. Department of Information and Computer Science, University of California, Irvine.
- Breiman L. (1996): Bagging Predictors. "Machine Learning", 26(2), pp. 123-140.
- Breiman L. (1998): Arcing Classifiers. "Annals of Statistics", 26, pp. 801-824.
- Dudoit S., Fridlyand J. (2003): Bagging to Improve the Accuracy of a Clustering Procedure. "Bioinformatics", Vol. 19, No. 9, pp. 1090-1099.
- Fern X.Z., Brodley C.E. (2003): Random Projection for High Dimensional Data Clustering: A Cluster Ensemble Approach. "Proceedings of the Twentieth International Conference on Machine Learning", ICML, pp. 186-193.
- Fischer B., Buhmann J.M. (2003): Bagging for Path-Based Clustering. "IEEE Transactions on Pattern Analysis and Machine Intelligence", 25(11), pp. 1411-1415.
- Fred A., Jain A.K. (2002): Data Clustering Using Evidence Accumulation. "Proceedings of the Sixteenth International Conference on Pattern Recognition", ICPR, Canada, pp. 276-280.
- Freund Y. (1990): Boosting a Weak Learning Algorithm by Majority. "Proceedings of the Third Annual Workshop on Computational Learning Theory", pp. 202-216.
- Freund Y., Schapire R.E. (1995): A Decision-Theoretic Generalization of On-line Learning and an Application to Boosting. "Proceedings of the Second European Conference on Computational Learning Theory", Springer Verlag, pp. 23-27.
- Kaufman L., Rousseeuw P.J. (1990): Finding Groups in Data: An Introduction to Cluster Analysis. Wiley & Sons, Inc., New York.
- Leisch F. (1996): Bagged Clustering. Technical Report, SFB Adaptive Information Systems and Modelling in Economics and Management Science, University of Economics and Business, Vienna, http://www.ci.tuwien.ac.at/~leisch/papers/fl-techrep.htm.
- Monti S., Tamayo P., Mesirov J., Golub T. (2003): Consensus Clustering: A Resampling Based Method for Class Discovery and Visualization of Gene Expression Microarray Data. "Machine Learning", 52, pp. 91-118.
- Strehl A., Ghosh J. (2002): Cluster Ensembles - a Knowledge Reuse Framework for Combining Multiple Partitions. "Journal of Machine Learning Research", 3, pp. 583-618.