Article title
Authors
Identifiers
Title variants
Publication languages
Abstracts
In various branches of science, e.g. medicine, economics and sociology, it is necessary to identify or detect outlying subsets of data. Suppose that a data set is partitioned into many relatively small subsets and that we have some reason to suspect that one or several of these subsets may be atypical or aberrant. We propose a new measure of separability based on ideas borrowed from discriminant analysis. In this paper we define two versions of this measure, both using a jackknife (leave-one-out) estimator of classification error. If a suspected subset is significantly well separated from the main bulk of the data, we regard it as outlying. The usefulness of our algorithm is illustrated on a set of medical data collected in the large survey "Epidemiology of Allergic Diseases in Poland" (ECAP). We also tested our method on artificial data sets and on the classical IRIS data set. For comparison, we report the results of the homogeneity test of Bartoszyński, Pearl and Lawrence applied to the same data sets.
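The abstract names only the ingredients of the measure: a discriminant rule and a jackknife (leave-one-out) estimate of its classification error. The following is a minimal sketch of that combination under stated assumptions; it is not either of the paper's two versions. It swaps in a simple nearest-mean (Euclidean) classifier for a full discriminant rule, and the helper names `centroid`, `loo_error_rates` and `separability`, as well as scoring by the balanced error (the mean of the two per-group error rates, so that the large main group does not dominate), are hypothetical choices for illustration. It does not assess significance; deciding whether a subset is "significantly" well separated would require a reference distribution, which is not reproduced here.

```python
from math import dist
from statistics import fmean


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    return tuple(fmean(coords) for coords in zip(*points))


def loo_error_rates(subset, rest):
    """Leave-one-out (jackknife) error of a nearest-mean classifier.

    Each point is held out in turn, its own group's mean is recomputed
    without it, and the point counts as misclassified if it lies strictly
    closer to the other group's mean (ties go to its own group).
    Both groups need at least two points.
    Returns (error rate on `subset`, error rate on `rest`).
    """
    def rate(own, other):
        other_mean = centroid(other)  # unaffected by holding out a point of `own`
        errors = 0
        for i, x in enumerate(own):
            own_mean = centroid(own[:i] + own[i + 1:])  # mean without x
            if dist(x, other_mean) < dist(x, own_mean):
                errors += 1
        return errors / len(own)

    return rate(subset, rest), rate(rest, subset)


def separability(subset, rest):
    """Hypothetical score: 1 minus the balanced leave-one-out error."""
    e_subset, e_rest = loo_error_rates(subset, rest)
    return 1.0 - (e_subset + e_rest) / 2


# Usage: a small "main bulk" and a suspected subset lying far away from it.
main = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
far = [(10.0, 10.0), (11.0, 10.0)]
print(separability(far, main))  # 1.0
```

A score near 1 means that held-out points are almost always assigned to their own group, i.e. the suspected subset is well separated; a subset drawn from the same region as the main bulk scores much lower.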
Journal
Year
Volume
Pages
693-709
Physical description
Bibliography: 14 items; graphs
Contributors
author
author
author
author
- Medical University of Warsaw, Department of Prevention of Environmental Hazards and Allergology, ul. Żwirki i Wigury 61, 02-091 Warszawa, Poland
Bibliography
- BARNETT, V. and LEWIS, T. (1994) Outliers in Statistical Data, 3rd ed. Wiley.
- BARTOSZYŃSKI, R., PEARL, D.K. and LAWRENCE, J. (1997) A Multidimensional Goodness-of-Fit Test Based on Interpoint Distances. Journal of the American Statistical Association 92, 577-586.
- BECKER, R.A., CHAMBERS, J.M. and WILKS, A.R. (1988) The New S Language. Wadsworth & Brooks/Cole.
- HAMPEL, F.R., RONCHETTI, E.M., ROUSSEEUW, P.J. and STAHEL, W.A. (1986) Robust Statistics: The Approach Based on Influence Functions. John Wiley, New York.
- MARDIA, K.V., KENT, J.T. and BIBBY, J.M. (1979) Multivariate Analysis. Academic Press, London.
- MORRISON, D.F. (1967) Multivariate Statistical Methods. McGraw-Hill, New York.
- KORONACKI, J. (2005) Statystyczne systemy uczące się (in Polish). WNT, Warszawa.
- LACHENBRUCH, P.A. (1967) An Almost Unbiased Method of Obtaining Confidence Intervals for the Probability of Misclassification in Discriminant Analysis. Biometrics 23, 639-645.
- LACHENBRUCH, P.A. (1975) Discriminant Analysis. Hafner, New York.
- LACHENBRUCH, P.A. and MICKEY, M.R. (1968) Estimation of Error Rates in Discriminant Analysis. Technometrics 10, 1-11.
- RENZE, J. (n.d.) "Outlier." From MathWorld, A Wolfram Web Resource, created by Eric W. Weisstein. http://mathworld.wolfram.com/Outlier.html
- RIPLEY, B.D. (1996) Pattern Recognition and Neural Networks. Cambridge University Press.
- VENABLES, W.N. and RIPLEY, B.D. (2002) Modern Applied Statistics with S, 4th ed. Springer.
- WATALA, C. (2002) Biostatystyka - wykorzystanie metod statystycznych w pracy badawczej w naukach biomedycznych (in Polish). @-medica press, Bielsko-Biała.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-article-BAT5-0033-0027