Article title

A novel method for drift detection in streaming data based on measurement of changes in feature ranks

Publication language
EN
Abstract
EN
Hidden changes in a data stream are unknown to learning algorithms and are referred to in the literature as drifts of various types. The accuracy of a classifier may degrade when drift occurs in a non-stationary data stream. In such situations, the classifier must detect significant changes in the data and adjust its predictions. This article presents a new drift detection method based on analyzing changes in feature ranks across adjacent chunks of data. The proposed strategy determines a ranking of the most important features and tracks its fluctuations across the chunks into which the input data stream is divided. Changes in feature rankings between adjacent chunks serve as symptoms of data drift. The Least Absolute Shrinkage and Selection Operator (LASSO) procedure is proposed as an efficient indicator of feature rank. We compared our approach with well-known drift detection algorithms, such as the Drift Detection Method (DDM), Early Drift Detection Method (EDDM), ADaptive WINdowing (ADWIN), and Principal Component Analysis Feature Drift Detection (PCA-FDD). The tests were conducted on artificial data streams with different drift types (sudden, gradual, recurring, and incremental) as well as on real data, covering both two-class and multi-class datasets. The experiments confirm that the proposed feature drift detection strategy produces valuable results.
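The chunk-wise procedure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it is a minimal, dependency-free illustration under several assumptions: LASSO is fitted by plain cyclic coordinate descent on each chunk, with numeric class labels used as the regression target; features are ranked by the absolute value of their LASSO coefficients; and drift is signalled when the normalized Spearman footrule distance between the rankings of adjacent chunks reaches a threshold. The footrule criterion, the function names (`lasso_coefficients`, `detect_drift`, `make_chunk`) and the parameters `alpha` and `threshold` are hypothetical choices made only for this example.

```python
import random
from typing import List, Optional, Sequence, Tuple

Chunk = Tuple[List[List[float]], List[float]]  # (feature rows, numeric class labels)


def standardize(column: Sequence[float]) -> List[float]:
    """Center a column to mean 0 and scale it to unit population variance."""
    n = len(column)
    mean = sum(column) / n
    std = (sum((v - mean) ** 2 for v in column) / n) ** 0.5 or 1.0  # constant column -> zeros
    return [(v - mean) / std for v in column]


def soft_threshold(value: float, alpha: float) -> float:
    """Proximal operator of the L1 penalty used by LASSO."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0


def lasso_coefficients(X: List[List[float]], y: List[float],
                       alpha: float = 0.01, n_iter: int = 100) -> List[float]:
    """Cyclic coordinate descent for (1/(2n))*||y - Xw||^2 + alpha*||w||_1.

    Features are standardized and the target is centered, so no intercept is fitted.
    """
    n, d = len(X), len(X[0])
    cols = [standardize([row[j] for row in X]) for j in range(d)]
    y_mean = sum(y) / n
    residual = [v - y_mean for v in y]  # residual of the all-zero model
    w = [0.0] * d
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            # Standardized columns satisfy x_j . x_j / n == 1, so the update is S(rho_j, alpha).
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual)) / n
            new_wj = soft_threshold(rho, alpha)
            delta = new_wj - w[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                w[j] = new_wj
    return w


def rank_features(coefs: Sequence[float]) -> List[int]:
    """Feature indices from most to least important by |coefficient| (ties keep index order)."""
    return sorted(range(len(coefs)), key=lambda j: -abs(coefs[j]))


def footrule_distance(rank_a: Sequence[int], rank_b: Sequence[int]) -> float:
    """Spearman footrule between two rankings of the same features, normalized to [0, 1]."""
    pos_b = {f: i for i, f in enumerate(rank_b)}
    total = sum(abs(i - pos_b[f]) for i, f in enumerate(rank_a))
    max_total = len(rank_a) ** 2 // 2  # attained when one ranking reverses the other
    return total / max_total if max_total else 0.0


def detect_drift(chunks: Sequence[Chunk], alpha: float = 0.01,
                 threshold: float = 0.5) -> List[int]:
    """Indices of chunks whose LASSO feature ranking moved too far from the previous chunk's."""
    drifts: List[int] = []
    previous: Optional[List[int]] = None
    for t, (X, y) in enumerate(chunks):
        ranking = rank_features(lasso_coefficients(X, y, alpha))
        if previous is not None and footrule_distance(previous, ranking) >= threshold:
            drifts.append(t)
        previous = ranking
    return drifts


def make_chunk(rng: random.Random, size: int, strong: int, medium: int, d: int = 4) -> Chunk:
    """Toy two-class chunk: the label depends on one strong and one medium feature."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(size)]
    y = [1.0 if 2.0 * row[strong] + row[medium] > 0 else 0.0 for row in X]
    return X, y


# Chunks 0-2 depend on features 0 and 1; chunks 3-5 depend on features 3 and 2.
rng = random.Random(0)
stream = [make_chunk(rng, 300, 0, 1) for _ in range(3)] + [make_chunk(rng, 300, 3, 2) for _ in range(3)]
print(detect_drift(stream))
```

In this toy stream the ranking of the informative features reverses between chunk 2 and chunk 3, which is the only adjacent pair whose footrule distance should reach the threshold; swaps among the uninformative features stay below it. Comparing whole rankings is only one possible design; a simpler variant would track just the position of the previously most important feature.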
Pages
147–166
Physical description
Bibliography: 32 items, figures
Authors
  • Institute of Computer Science, Faculty of Science and Technology, University of Silesia, ul. Bedzinska 39, 41-200 Sosnowiec, Poland
  • Institute of Computer Science, Faculty of Science and Technology, University of Silesia, ul. Bedzinska 39, 41-200 Sosnowiec, Poland
  • Institute of Computer Science, Faculty of Science and Technology, University of Silesia, ul. Bedzinska 39, 41-200 Sosnowiec, Poland
  • Institute of Computer Science, Faculty of Science and Technology, University of Silesia, ul. Bedzinska 39, 41-200 Sosnowiec, Poland
References
  • [1] Husheng Guo, Hai Li, Qiaoyan Ren, and Wenjian Wang. Concept drift type identification based on multi-sliding windows. Information Sciences, 585:1–23, 2022.
  • [2] Piotr Porwik and Rafal Doroz. Adaptation of the idea of concept drift to some behavioral biometrics: Preliminary studies. Engineering Applications of Artificial Intelligence, 99:104135, 2021.
  • [3] Thomas Bartz-Beielstein and Lukas Hans. Drift detection and handling. In Eva Bartz and Thomas Bartz-Beielstein, editors, Online Machine Learning: A Practical Guide with Examples in Python, pages 23–39, Singapore, 2024. Springer Nature Singapore.
  • [4] João Gama, Indrė Žliobaitė, Albert Bifet, Mykola Pechenizkiy, and A. Bouchachia. A survey on concept drift adaptation. ACM Computing Surveys, 46:1–37, 2014.
  • [5] Supriya Agrahari and Anil Kumar Singh. Adaptive PCA-based feature drift detection using statistical measure. Cluster Computing, 25(6):4481–4494, 2022.
  • [6] Paulo M. Gonçalves, Silas G.T. de Carvalho Santos, Roberto S.M. Barros, and Davi C.L. Vieira. A comparative study on concept drift detectors. Expert Systems with Applications, 41(18):8144–8156, 2014.
  • [7] Ruba Abu Khurma, Ibrahim Aljarah, Ahmad Sharieh, Mohamed Abd Elaziz, Robertas Damaševičius, and Tomas Krilavičius. A review of the modification strategies of the nature inspired algorithms for feature selection problem. Mathematics, 10(3):464, 2022.
  • [8] Isabelle Guyon and André Elisseeff. An introduction to variable and feature selection. J. Mach. Learn. Res., 3:1157–1182, 2003.
  • [9] Geoffrey I. Webb, Roy Hyde, Hong Cao, Hai Long Nguyen, and Francois Petitjean. Characterizing concept drift. Data Mining and Knowledge Discovery, 30(4):964–994, 2016.
  • [10] Hang Yu, Qingyong Zhang, Tianyu Liu, Jie Lu, Yimin Wen, and Guangquan Zhang. Meta-ADD: A meta-learning based pre-trained model for concept drift active detection. Information Sciences, 608:996–1009, 2022.
  • [11] Lei Yu and Huan Liu. Efficient feature selection via analysis of relevance and redundancy. Journal of Machine Learning Research, 5:1205–1224, 2004.
  • [12] Jan Niklas Adams, Sebastiaan J. van Zelst, Thomas Rose, and Wil M.P. van der Aalst. Explainable concept drift in process mining. Information Systems, 114:102177, 2023.
  • [13] Hang Yu, Weixu Liu, Jie Lu, Yimin Wen, Xiangfeng Luo, and Guangquan Zhang. Detecting group concept drift from multiple data streams. Pattern Recognition, 134:109113, 2023.
  • [14] Supriya Agrahari and Anil Kumar Singh. Concept drift detection in data stream mining: A literature review. Journal of King Saud University - Computer and Information Sciences, 34(10, Part B):9523–9540, 2022.
  • [15] Mahmood Karimian and Hamid Beigy. Concept drift handling: A domain adaptation perspective. Expert Systems with Applications, 224:119946, 2023.
  • [16] Andrés L. Suárez-Cetrulo, David Quintana, and Alejandro Cervantes. A survey on machine learning for recurring concept drifting data streams. Expert Systems with Applications, 213:118934, 2023.
  • [17] Firas Bayram, Bestoun S. Ahmed, and Andreas Kassler. From concept drift to model degradation: An overview on performance-aware drift detectors. Knowledge-Based Systems, 245:108632, 2022.
  • [18] Lin Sun, Tianxiang Wang, Weiping Ding, Jiucheng Xu, and Yaojin Lin. Feature selection using Fisher score and multilabel neighborhood rough sets for multilabel classification. Information Sciences, 578:887–912, 2021.
  • [19] Frank S. Corotto. Chapter nine - the two-sample t test and the importance of pooled variance. In Frank S. Corotto, editor, Wise Use of Null Hypothesis Tests, pages 95–98. Academic Press, 2023.
  • [20] Piotr Porwik and Benjamin Mensah Dadzie. Detection of data drift in a two-dimensional stream using the Kolmogorov-Smirnov test. Procedia Computer Science, 207:168–175, 2022. Knowledge-Based and Intelligent Information & Engineering Systems: Proceedings of the 26th International Conference KES2022.
  • [21] Toshiyuki Sueyoshi and Shingo Aoki. A use of a nonparametric statistic for DEA frontier shift: the Kruskal and Wallis rank test. Omega, 29(1):1–18, 2001.
  • [22] Baoshuang Zhang, Yanying Li, and Zheng Chai. A novel random multi-subspace based relieff for feature selection. Knowledge-Based Systems, 252:109400, 2022.
  • [23] Jacob Goldberger, Sam Roweis, Geoff Hinton, and Ruslan Salakhutdinov. Neighbourhood components analysis. In Proceedings of the 17th International Conference on Neural Information Processing Systems, NIPS’04, pages 513–520, Cambridge, MA, USA, 2004. MIT Press.
  • [24] Xue-wen Chen and Jong Cheol Jeong. Enhanced recursive feature elimination. In Sixth International Conference on Machine Learning and Applications (ICMLA 2007), pages 429–435, 2007.
  • [25] Nickolay Trendafilov and Michele Gallo. PCA and other dimensionality-reduction techniques. In Robert J Tierney, Fazal Rizvi, and Kadriye Ercikan, editors, International Encyclopedia of Education (Fourth Edition), pages 590–599. Elsevier, Oxford, fourth edition, 2023.
  • [26] Leo Breiman. Random forests. Mach. Learn., 45(1):5–32, 2001.
  • [27] Bradley Efron, Trevor Hastie, Iain Johnstone, and Robert Tibshirani. Least angle regression. The Annals of Statistics, 32(2):407–451, 2004.
  • [28] Yvan Saeys, Iñaki Inza, and Pedro Larrañaga. A review of feature selection techniques in bioinformatics. Bioinformatics, 23(19):2507–2517, 2007.
  • [29] Qiuming Zhu. On the performance of Matthews correlation coefficient (MCC) for imbalanced dataset. Pattern Recognition Letters, 136:71–80, 2020.
  • [30] Davide Chicco, Niklas Tötsch, and Giuseppe Jurman. The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation. BioData Mining, 14, 2021.
  • [31] Vinícius M. A. de Souza, Denis Moreira dos Reis, André Gustavo Maletzke, and Gustavo E. A. P. A. Batista. Challenges in benchmarking stream learning algorithms with real-world data. Data Mining and Knowledge Discovery, 34:1805–1858, 2020.
  • [32] Yaohui Zeng and Patrick Breheny. The biglasso package: A memory- and computation-efficient solver for lasso model fitting with big data in R. The R Journal, 12, 2017.
Notes
Record prepared with funds from the Polish Ministry of Science and Higher Education (MNiSW), agreement no. POPUL/SP/0154/2024/02, under the "Social Responsibility of Science II" programme, module: Popularization of Science (2025).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-c11a264a-5446-4678-baf0-2b883fe03581