Article title

Measuring similarity of complex and heterogeneous data in clustering of large data sets

Publication languages
EN
Abstracts
EN
Cluster analysis, or classification, usually refers to a set of exploratory multivariate data analysis methods and techniques for finding a clustering structure in a dataset. That structure may concern either groups of statistical data units or groups of variables. In this work we deal with a generalization of this paradigm: the clustering of complex data described by three different types of variables, frequently present in a three-way context. We obtain compatible versions of the same affinity coefficient for measuring similarity between statistical data units described by those three types of variables. A global generalized similarity coefficient is analyzed for this kind of mixed data, which often arises in data mining or knowledge mining.
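The abstract does not state the coefficient's formula, so the following is only a minimal sketch under stated assumptions. It assumes the basic Matusita-style affinity between two non-negative count profiles (the sum of square roots of products of relative frequencies) and, as a further assumption, combines the per-variable affinities into a global value by a convex weighted sum. The function names `affinity` and `weighted_affinity` and the weighting scheme are hypothetical and are not taken from the paper.

```python
import math


def affinity(x, y):
    """Matusita-style affinity between two non-negative count profiles.

    Each profile is normalised by its own total, so the result lies in [0, 1]:
    1 for proportional profiles, 0 for profiles with disjoint support.
    """
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    total_x, total_y = sum(x), sum(y)
    if total_x <= 0 or total_y <= 0:
        raise ValueError("each profile needs a positive total")
    return sum(math.sqrt((a / total_x) * (b / total_y)) for a, b in zip(x, y))


def weighted_affinity(unit_a, unit_b, weights):
    """Hypothetical global coefficient: convex combination of per-variable affinities.

    unit_a and unit_b are sequences of profiles, one profile per variable.
    The weights are an assumption of this sketch and must sum to 1.
    """
    if not (len(unit_a) == len(unit_b) == len(weights)):
        raise ValueError("units and weights must cover the same variables")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(w * affinity(pa, pb) for w, pa, pb in zip(weights, unit_a, unit_b))
```

For example, `affinity([1, 2, 3], [2, 4, 6])` compares two proportional profiles, while `affinity([1, 0], [0, 1])` compares two profiles with no category in common.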
Authors
author
  • Universidade de Lisboa, FPCE, Lisboa, Portugal
Bibliography
  • 1. Bock H.H., Diday E. [Eds.]: Analysis of Symbolic Data: Exploratory Methods for Extracting Statistical Information from Complex Data, Springer, 2000.
  • 2. Bacelar-Nicolau H.: On the generalized affinity coefficient for complex data. Biocybernetics and Biomedical Engineering 2002, 22, 1, 31-42.
  • 3. Nicolau F. C., Bacelar-Nicolau H. et al.: Probabilistic models in three way cluster analysis. In: Proceedings of the 56th Session of the International Statistical Institute, Lisbon 2007 (in press), published on the CD Proceedings of ISI 2007.
  • 4. Matusita K.: On the theory of statistical decision functions. Ann. Instit. Stat. Math. 1951, III, 1-30.
  • 5. Bacelar-Nicolau H.: Two probabilistic models for classification of variables in frequency tables. In: Bock H.H. [Ed.]: Classification and Related Methods of Data Analysis, Elsevier Science Publishers B.V., North Holland, 1988, 181-186.
  • 6. Nicolau F. C., Bacelar-Nicolau H.: Some trends in the classification of variables. In: Hayashi, et al. [eds.]. Data Science, Classification and Related Methods, Springer, 1998, 89-98.
  • 7. Sousa A.: Contribuições à Metodologia VL e índices de validação para Dados de Natureza Complexa. PhD Thesis, Univ. Azores 2005.
  • 8. Ichino M.: General metrics for mixed features - The Cartesian Space Theory for Pattern Recognition. IEEE Transactions on Systems, Man and Cybernetics 1988.
  • 9. Ichino M., Yaguchi H.: Generalized Minkowski Metrics for Mixed Feature Type Data Analysis. IEEE Transactions on Systems, Man and Cybernetics, 1994, 24, 4, 698-708.
  • 10. Bacelar-Nicolau H.: On the distribution equivalence in cluster analysis. Proceedings of the NATO ASI on Pattern Recognition Theory and Applications, Springer-Verlag, New York 1987, 73-79.
  • 11. Lerman I. C.: Étude distributionnelle de statistiques de proximité entre structures algébriques finies du même type - Application à la Classification Automatique. Cahiers du B.U.R.O., Paris 1972, 19.
  • 12. Lerman I. C.: Classification et Analyse Ordinale des Données. Dunod, Paris 1981.
  • 13. Bacelar-Nicolau L.: Caracterização dos Sistemas de Informação das Organizações com base no modelo de Nolan. Aplicação de modelos de classificação hierárquica aos organismos da Administração Pública. Master Thesis, Univ. Nova de Lisboa 2002.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-article-BPZ3-0030-0018