Article title

Metrics and similarities in modeling dependencies between continuous and nominal data

Content
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
The analytical paradigm of classification theory investigates continuous data only. Difficulties emerge when data records mix continuous and nominal attributes. Usually, the analytical paradigm treats nominal attributes as continuous ones by coding nominal values numerically, often in a somewhat ad hoc fashion. We propose a way of keeping nominal values within the analytical paradigm without pretending that they are continuous. The core idea is that the information hidden in nominal values influences the metric (or the similarity function) between records of continuous and nominal data. An adaptation step finds the relevant parameters that shape this metric. Our approach works well for classifier induction algorithms in which the metric or similarity is generic, for instance the k nearest neighbor algorithm, or the support of decision tree induction by a similarity function between data proposed here. The k-nn algorithm working with continuous and nominal data behaves considerably better when nominal values are processed by our approach. Algorithms of the analytical paradigm that use linear and probabilistic machinery, such as discriminant adaptive nearest neighbor or Fisher's linear discriminant analysis, pose some difficulties. We propose possible ways of overcoming these obstacles for the adaptive nearest neighbor algorithm.
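The abstract does not spell out the adaptation procedure, but the underlying idea (a metric over mixed records in which nominal attributes enter through tunable parameters rather than ad hoc numeric codes) can be illustrated with a minimal Python sketch, given below. This is not the authors' method: the per-attribute mismatch weights (`nominal_weights`) are hypothetical stand-ins for whatever parameters the adaptation step would fit, and the classifier is a standard majority-vote k-nn.

```python
import numpy as np
from collections import Counter

def mixed_distance(x, z, numeric_idx, nominal_idx, nominal_weights):
    """Distance between records that mix continuous and nominal attributes.

    Continuous attributes contribute squared differences; a nominal
    attribute contributes a squared per-attribute penalty whenever the
    two values differ. The penalties stand in for the parameters that an
    adaptation step would tune.
    """
    d = 0.0
    for i in numeric_idx:
        d += (x[i] - z[i]) ** 2
    for i in nominal_idx:
        if x[i] != z[i]:
            d += nominal_weights[i] ** 2
    return np.sqrt(d)

def knn_predict(query, data, labels, k, numeric_idx, nominal_idx, nominal_weights):
    """Classify `query` by majority vote among its k nearest neighbours."""
    dists = [mixed_distance(query, rec, numeric_idx, nominal_idx, nominal_weights)
             for rec in data]
    nearest = np.argsort(dists)[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Toy usage: attributes 0 and 1 are continuous, attribute 2 is nominal.
data = [(1.0, 2.0, "red"), (1.2, 1.9, "red"), (5.0, 6.0, "blue")]
labels = ["A", "A", "B"]
weights = {2: 3.0}   # hypothetical mismatch penalty for the nominal attribute
print(knn_predict((1.1, 2.1, "blue"), data, labels, k=1,
                  numeric_idx=[0, 1], nominal_idx=[2], nominal_weights=weights))
```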
Year
Volume
Pages
25–37
Physical description
Bibliography: 10 items, figures, charts
Authors
author
  • Warsaw School of Computer Science, Warsaw
author
  • University of Warmia and Mazury, Olsztyn
Bibliography
  • [1] Nguyen H.S., Approximate Boolean Reasoning: Foundations and Applications in Data Mining, in: Transactions on Rough Sets V, (eds.) Peters J.F., Skowron A., LNCS 4100, 2006
  • [2] Koronacki J., Ćwik J., Statystyczne systemy uczące się (Statistical Learning Systems, in Polish), Akademicka Oficyna Wydawnicza EXIT, Warszawa 2008
  • [3] Linial N., Finite metric spaces – Combinatorics, Geometry and Algorithms, Symposium on Computational Geometry, 2002
  • [4] Indyk P., Matousek J., Low-Distortion Embedding of Finite Metric Spaces, Handbook of Discrete and Computational Geometry (2nd edition), (eds.) Goodman J.E., O’Rourke J., CRC Press, LLC 2004
  • [5] Zahorski J., private communication
  • [6] Doherty P., Łukaszewicz W., Skowron A., Szałas A., Knowledge Representation Techniques. A rough set approach, “Studies in Fuzziness and Soft Computing” 202, Springer-Verlag 2006
  • [7] Lopez de Mantaras R., A Distance–Based Attribute Selection Measure for Decision Tree Induction, “Machine Learning” 1991, Vol. 6
  • [8] Hastie T., Tibshirani R., Friedman J., The Elements of Statistical Learning, Springer Series in Statistics, 2001
  • [9] Frank A., Asuncion A., UCI Machine Learning Repository, University of California, Irvine, School of Information and Computer Science, 2010, http://archive.ics.uci.edu/ml
  • [10] Hastie T., Tibshirani R., Discriminant Adaptive Nearest Neighbor Classification, “IEEE Transactions on Pattern Analysis and Machine Intelligence” 1996, Vol. 18, No. 6
Document type
YADDA identifier
bwmeta1.element.baztech-5851a07c-fa11-46fc-9bd5-2cf39dd3bbd8