A quality index for detection of atypical elements (outliers)

Kulczycki, Piotr; Franus, Krystian; Charytanowicz, Małgorzata

doi:10.61822/amcs-2024-0031

Abstract

Besides clustering and classification, the detection of atypical elements (outliers, rare elements) is one of the most fundamental problems in contemporary data analysis. However, unlike clustering and classification, the task of atypical element detection has no natural quality (performance) index. The research presented here aims to create one. Such an index enables not only evaluation of the results of an atypical element detection procedure, but also optimization of its parameters or other quantities. The investigated quality index works particularly well with frequency-type detection procedures, especially in the presence of substantial noise. The nonparametric approach used in the design of this index makes the proposed method practically independent of the distribution of the dataset under examination. It may also be successfully applied to multimodal and multidimensional cases.

Keywords

data analysis, atypical element, rare elements, quality index

Publisher

Oficyna Wydawnicza Uniwersytetu Zielonogórskiego

Journal

International Journal of Applied Mathematics and Computer Science

Year

2024

Volume

Vol. 34, no. 3

Pages

439-451

Physical description

Bibliography: 34 items, figures, tables, charts.

Contributors

author

Kulczycki Piotr

kulczycki@agh.edu.pl

Faculty of Physics and Applied Computer Science, AGH University of Krakow, Mickiewicza 30, 30-059 Kraków, Poland
Systems Research Institute, Polish Academy of Sciences, Newelska 6, 01-447 Warsaw, Poland

author

Franus Krystian

krystian.franus@ibspan.waw.pl

Systems Research Institute, Polish Academy of Sciences, Newelska 6, 01-447 Warsaw, Poland

author

Charytanowicz Małgorzata

m.charytanowicz@pollub.pl

Systems Research Institute, Polish Academy of Sciences, Newelska 6, 01-447 Warsaw, Poland
Faculty of Electrical Engineering and Computer Science, Lublin University of Technology, Nadbystrzycka 36B, 20-618 Lublin, Poland

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-408a4268-98f6-4ded-b7fe-cbd5b0bb64ed