Article title
Authors
Identifiers
Title variants
Publication languages
Abstracts
Outliers are instances that deviate from the norm. In many fields their detection is crucial, since they often indicate interesting events such as system faults or deliberate human actions. Anomaly detection is therefore an essential data mining task with many real-life applications. The continuous development of anomaly detection algorithms is primarily motivated by the explosive growth of data sets in both size and number of attributes. Such growth calls for algorithms that handle large data sets effectively and efficiently. Isolation Forest (IF) was introduced with that idea in mind. IF uses an isolation mechanism to detect outliers without relying on any distance or density measure. This approach handles large data sets well thanks to its low time complexity. However, IF struggles to detect local outliers. In this work, a new algorithm called the Cluster-Based Outlier Ensemble Approach (CBOEA) is proposed. It combines the outputs of IF and Local Outlier Factor (LOF) through OPTICS, a clustering algorithm that identifies the clustering structure of the data. This clustering step compensates for the weaknesses of IF while preserving its strengths. The proposed algorithm is then compared with LOF and IF using two evaluation metrics. Results on benchmark data sets show that the proposed method is competitive with its components.
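The isolation mechanism mentioned in the abstract can be illustrated with a minimal sketch that uses only the Python standard library. It follows the general idea of Isolation Forest [23]: points that random axis-parallel splits separate after few cuts receive scores closer to 1. This is a simplified illustration, not the authors' implementation and not CBOEA; the abstract does not specify how the IF and LOF outputs are combined, so that step is not shown. The function names (`c_factor`, `build_tree`, `path_length`, `isolation_scores`) and the default parameters are assumptions made for this example.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup
    over n points; used to normalise path lengths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Split recursively on a random feature at a random cut point."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # simplification: stop instead of trying another feature
        return {"size": len(points)}
    cut = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < cut]
    right = [p for p in points if p[dim] >= cut]
    return {
        "dim": dim,
        "cut": cut,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, node, depth=0):
    """Number of splits needed to reach a leaf, plus a correction for
    the points left unseparated in that leaf."""
    if "dim" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if point[node["dim"]] < node["cut"] else node["right"]
    return path_length(point, child, depth + 1)


def isolation_scores(data, n_trees=100, sample_size=64, seed=0):
    """Return one anomaly score in (0, 1) per point; higher means the
    point is isolated after fewer random splits."""
    if len(data) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    norm = c_factor(sample_size)
    return [
        2.0 ** (-sum(path_length(p, t) for t in trees) / (n_trees * norm))
        for p in data
    ]


# Hypothetical demo data: one dense Gaussian cluster plus one far-away point.
demo_rng = random.Random(1)
data = [(demo_rng.gauss(0, 1), demo_rng.gauss(0, 1)) for _ in range(200)]
data.append((8.0, 8.0))
scores = isolation_scores(data)
print("score of the far-away point:", round(scores[-1], 3))
print("mean score of the cluster:", round(sum(scores[:-1]) / 200, 3))
```

No distance or density computation appears anywhere in the sketch, which is what keeps the cost low on large data sets. Because every split is made on the global range of a feature, a point that is anomalous only relative to its own neighbourhood is not necessarily isolated early, which is consistent with the weakness on local outliers noted in the abstract.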
Keywords
Year
Volume
Pages
27-55
Physical description
Bibliography: 31 items, figures, tables.
Contributors
author
- Institut Supérieur de Gestion, Université de Tunis, 41 Avenue de la Liberté, Bouchoucha, Tunisie
author
- Institut des Hautes Études Commerciales, Université de Sfax, Route Sidi Mansour Km 10 B.P 43-3061, Sfax, Tunisie, Laboratoire OLID (LR19ES21), Institut Supérieur de Gestion Industrielle, Université de Sfax, Tunisie
Bibliography
- [1] Aggarwal C. C. and Sathe S. Outlier Ensembles - An Introduction. Springer, 2017.
- [2] Ankerst M., Breunig M. M., Kriegel H.-P., and Sander J. OPTICS: Ordering Points to Identify the Clustering Structure. ACM SIGMOD Record, 28(2): 49-60, 1999.
- [3] Aryal S., Ting K. M., Wells J. R., and Washio T. Improving iForest with relative mass. In Advances in Knowledge Discovery and Data Mining, pages 510-521. Springer International Publishing, Cham, 2014.
- [4] Atkinson A. C. Identification of outliers. Biometrics, 37(4): 860-861, 1981.
- [5] Bandaragoda T. R. Isolation based anomaly detection: A re-examination. PhD thesis, https://doi.org/10.4225/03/58b3b5353fab5, 2017.
- [6] Barbariol T. and Susto G. A. TiWS-iForest: Isolation forest in weakly supervised and tiny ML scenarios. Information Sciences, 610: 126-143, 2022.
- [7] Birant D. and Kut A. ST-DBSCAN: An algorithm for clustering spatial–temporal data. Data & Knowledge Engineering, 60(1): 208-221, 2007.
- [8] Borchers B. The art of computer programming, by D.E. Knuth. Scientific Programming, 14: 267-268, 2006.
- [9] Breunig M. M., Kriegel H.-P., Ng R. T., and Sander J. LOF: Identifying density-based local outliers. ACM SIGMOD Record, 29(2): 93-104, 2000.
- [10] Buschjäger S., Honysz P.-J., and Morik K. Randomized outlier detection with trees. International Journal of Data Science and Analytics, 13: 1-14, 2022.
- [11] Chandola V., Banerjee A., and Kumar V. Anomaly detection: A survey. ACM Computing Surveys, 41(3): 1-58, 2009.
- [12] Dong N., Ren B., Li H., Zhong X., Gong X., Han J., Lv J., and Cheng J. A novel anomaly score based on kernel density fluctuation factor for improving the local and clustered anomalies detection of isolation forests. Information Sciences, 637: 118979, 2023.
- [13] Ester M., Kriegel H.-P., Sander J., and Xu X. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD’96: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pages 226-231. AAAI Press, 1996.
- [14] Gao R., Zhang T., Sun S., and Liu Z. Research and improvement of isolation forest in detection of local anomaly points. Journal of Physics: Conference Series, 1237(5): 052023, 2019.
- [15] Goldstein M. and Dengel A. Histogram-based outlier score (HBOS): A fast unsupervised anomaly detection algorithm. In Proceedings of the 35th German Conference on Artificial Intelligence, pages 59-63. 2012.
- [16] Goldstein M. and Uchida S. A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data. PLoS ONE, 11(4): e0152173, 2016.
- [17] Hariri S., Kind M. C., and Brunner R. J. Extended isolation forest. IEEE Transactions on Knowledge and Data Engineering, 33(4): 1479-1489, 2021.
- [18] Karczmarek P., Kiersztyn A., and Pedrycz W. Fuzzy set-based isolation forest. In 2020 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pages 1-6. 2020.
- [19] Kriegel H.-P., Kröger P., Schubert E., and Zimek A. LoOP: Local outlier probabilities. In International Conference on Information and Knowledge Management, Proceedings, pages 1649-1652. 2009.
- [20] Kuna H., Garcia-Martinez R., and Villatoro F. R. Outlier detection in audit logs for application systems. Information Systems, 44: 22-33, 2014.
- [21] Lesouple J., Baudoin C., Spigai M., and Tourneret J.-Y. Generalized isolation forest for anomaly detection. Pattern Recognition Letters, 149: 109-119, 2021.
- [22] ODDS Library. http://odds.cs.stonybrook.edu.
- [23] Liu F. T., Ting K. M., and Zhou Z.-H. Isolation forest. In ICDM’08: Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, pages 413-422. IEEE Computer Society, United States, 2008.
- [24] Lyu Y., Li W., Wang Y., Sun S., and Wang C. RMHSForest: Relative Mass and Half-Space Tree Based Forest for Anomaly Detection. Chinese Journal of Electronics, 29(6): 1093-1101, 2020.
- [25] Mehrotra K. G., Mohan C. K., and Huang H. Anomaly Detection Principles and Algorithms. Springer Publishing Company, Incorporated, 1st edition, 2017.
- [26] Mensi A. and Bicego M. Enhanced anomaly scores for isolation forests. Pattern Recognition, 120: 108115, 2021.
- [27] Mensi A., Tax D. M., and Bicego M. Detecting outliers from pairwise proximities: Proximity isolation forests. Pattern Recognition, 138: 109334, 2023.
- [28] Preiss B. R. Design patterns for the data structures and algorithms course. SIGCSE Bulletin (Association for Computing Machinery, Special Interest Group on Computer Science Education), 31: 95-99, 2002.
- [29] Tan X., Yang J., and Rahardja S. Sparse random projection isolation forest for outlier detection. Pattern Recognition Letters, 163: 65-73, 2022.
- [30] Tokovarov M. and Karczmarek P. A probabilistic generalization of isolation forest. Information Sciences, 584: 433-449, 2022.
- [31] Yepmo V., Smits G., and Pivert O. Anomaly explanation: A review. Data & Knowledge Engineering, 137: 101946, 2022.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-767ade82-2100-4b8a-8d6a-fb98c485c96c