Abstract
Alongside clustering and classification, the detection of atypical elements (outliers, rare elements) is one of the most fundamental problems in contemporary data analysis. Unlike clustering and classification, however, the atypical element detection task has no natural quality (performance) index. The research presented here proposes such an index. It enables not only the evaluation of the results of an atypical element detection procedure, but also the optimization of its parameters or other quantities. The investigated index works particularly well with frequency-type procedures, especially in the presence of substantial noise. Because the index is designed using a nonparametric approach, the proposed method is practically independent of the distribution of the dataset under examination. It can also be applied successfully to multimodal and multidimensional cases.
Pages
439-451
Physical description
Bibliography: 34 items, figures, tables, charts.
Authors
author
- Faculty of Physics and Applied Computer Science, AGH University of Krakow, Mickiewicza 30, 30-059 Kraków, Poland
- Systems Research Institute, Polish Academy of Sciences, Newelska 6, 01-447 Warsaw, Poland
author
- Systems Research Institute, Polish Academy of Sciences, Newelska 6, 01-447 Warsaw, Poland
author
- Systems Research Institute, Polish Academy of Sciences, Newelska 6, 01-447 Warsaw, Poland
- Faculty of Electrical Engineering and Computer Science, Lublin University of Technology, Nadbystrzycka 36B, 20-618 Lublin, Poland
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-408a4268-98f6-4ded-b7fe-cbd5b0bb64ed