Class Specific Fuzzy Decision Trees for Mining High Speed Data Streams

Hashemi, S.; Kangavari, M.; Yang, Y.

Article - details

Article title

Class Specific Fuzzy Decision Trees for Mining High Speed Data Streams

Authors

Hashemi S., Kangavari M., Yang Y.

Selected full texts from this journal

https://fi.episciences.org/

Abstracts

In recent years, classification learning for data streams has become an important and active research topic. A major challenge posed by data streams is that their underlying concepts can change over time, which requires classifiers to be revised accordingly and in a timely manner. A common method for detecting concept change is to observe the online classification accuracy: if accuracy drops below some threshold value, a concept change is deemed to have taken place. An implicit assumption behind this methodology is that any drop in accuracy can be interpreted as a symptom of concept change. Unfortunately, this assumption is often violated in the real world, where data streams carry noise and missing values that can also cause a significant reduction in classification accuracy. To compound the problem, traditional noise cleansing methods are not applicable to data streams: they normally need to scan the data multiple times, whereas learning from data streams can afford only a single pass because of the data's high speed and huge volume. To solve these problems, this paper proposes a novel classification algorithm, Class Specific Fuzzy Decision Trees (CSFDT), which uses fuzzy logic to classify data streams. The base classifier of CSFDT is a binary fuzzy decision tree. Whenever the problem of concern contains q classes (q > 2), CSFDT learns one binary classifier for each class to distinguish instances of that class from instances of the remaining (q - 1) classes. CSFDT's advantages are threefold. First, it offers an adaptive structure that handles concept change effectively and efficiently. Second, it is robust to noise. Third, it deals with missing values in an elegant way. As a result, a drop in accuracy can be safely attributed to concept change. Extensive evaluations compare CSFDT with representative existing data stream classification algorithms on a large variety of data. Experimental results suggest that CSFDT provides a significant benefit to data stream classification in real-world scenarios where concept changes, noise and missing values coexist.
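
The two generic ideas the abstract refers to can be sketched briefly. The first is the common accuracy-threshold method of detecting concept change; the second is the one-classifier-per-class decomposition. This is only an illustration under stated assumptions, not the CSFDT algorithm itself: the names `AccuracyDriftMonitor` and `one_vs_rest_targets`, the sliding window, and the default values of `window_size` and `threshold` are hypothetical and do not come from the paper.

```python
from collections import deque


class AccuracyDriftMonitor:
    """Signals a possible concept change when windowed accuracy drops below a threshold.

    This is the baseline detection scheme described in the abstract, not CSFDT.
    """

    def __init__(self, window_size=100, threshold=0.7):
        # Both defaults are illustrative assumptions, not values from the paper.
        self.window = deque(maxlen=window_size)
        self.threshold = threshold

    def update(self, predicted, actual):
        """Record one prediction outcome; return True if a change is signalled."""
        self.window.append(predicted == actual)
        if len(self.window) < self.window.maxlen:
            return False  # window not yet full, so no decision is made
        accuracy = sum(self.window) / len(self.window)
        return accuracy < self.threshold


def one_vs_rest_targets(label, classes):
    """Turn one multi-class label into q binary targets, one per class."""
    return {c: label == c for c in classes}
```

A monitor of this kind cannot tell a genuine concept change from a run of noisy or incomplete instances, since both lower the windowed accuracy. That is the weakness the abstract points to.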

Publisher

IOS Press

Journal

Fundamenta Informaticae

Year

2008

Volume

Vol. 88, no. 1-2

Pages

135-160

Physical description

bibliography: 33 items, tables, charts

Contributors

author

Hashemi S.

author

Kangavari M.

author

Yang Y.

Department of Computer Engineering, Iran University of Science and Technology, Iran, s.hashemi@iust.ac.ir

Bibliography

[1] Basak, J.: Online Adaptive Decision Trees: Pattern Classification and Function Approximation, Neural Comput., 18(9), 2006, 2062-2101, ISSN 0899-7667.
[2] Bhatt, R. B., Gopal, M.: Neuro-fuzzy decision trees, International Journal of Neural Systems, 16(1), 2006, 63-78.
[3] Cohen, W.: Fast effective rule induction, Proceedings of the 12th International Conference on Machine Learning (ICML), 1995.
[4] Domingos, P., Hulten, G.: Mining High Speed Data Streams, Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), 2000.
[5] Fayyad, U., Irani, K.: Multi-interval discretization of continuous-valued attributes for classification learning, 13th International Joint Conference on Artificial Intelligence.
[6] Fürnkranz, J.: Round Robin Classification, Journal of Machine Learning Research, 2, 2002, 721-747.
[7] Hashemi, S., Yang, Y., Pourkashani, M., Kangavari, M.: To Better Handle Concept Change and Noise: A Cellular Automata Approach to Data Stream Classification, Australian Joint Conference on Artificial Intelligence 2007.
[8] Haykin, S.: Neural Networks: A Comprehensive Foundation, Prentice Hall PTR, Upper Saddle River, NJ, USA, 1994, ISBN 0023527617.
[9] Ho, S. S.: A Martingale Framework for Concept Change Detection in Time-Varying Data Streams, Proceedings of the 22nd International Conference on Machine Learning (ICML), 2005.
[10] Hsu, C., Lin, C.: A comparison of methods for multi-class support vector machines, IEEE Transactions on Neural Networks, 5, 2002, 415-425.
[11] Hulten, G., Spencer, L., Domingos, P.: Mining time-changing data streams, Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), 2001.
[12] Janikow, C. Z.: Fuzzy decision trees: issues and methods, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 28(1), February 1998, 1-14.
[13] Janikow, C. Z., Kawa, K.: Fuzzy Decision Tree FID, Annual Meeting of the North American Fuzzy Information Processing Society, IEEE, 2005.
[14] Kolter, J. Z., Maloof, M. A.: Dynamic Weighted Majority: A New Ensemble Method for Tracking Concept Drift, Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM), 2003.
[15] Maher, P. E., Clair, D. C. S.: Uncertain reasoning in an ID3 machine learning framework, 2nd IEEE International Conference on Fuzzy Systems.
[16] Mitchell, T. M.: Machine Learning, McGraw Hill, 1997.
[17] Mitra, S., Konwar, K. M., Pal, S. K.: Fuzzy Decision Tree, Linguistic Rules and Fuzzy Knowledge-Based Network: Generation and Evaluation, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 32(4), November 2002, 328-339.
[18] Newman, D. J., Hettich, S., Blake, C., Merz, C.: UCI Repository of machine learning databases, 1998.
[19] Olaru, C., Wehenkel, L.: A complete fuzzy decision tree technique, Fuzzy Sets and Systems, 138, February 2003, 221-254.
[20] Quinlan, J. R.: Induction of decision trees, 1993, 349-361.
[21] Rifkin, R., Klautau, A.: In defense of one-vs-all classification, Journal of Machine Learning Research, 5, 2004, 101-141.
[22] Street, W. N., Kim, Y.: A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification, Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), 2001.
[23] Tsymbal, A.: The problem of concept drift: definitions and related work, 2004, Technical Report TCD-CS-2004-15, Computer Science Department, Trinity College Dublin, Ireland.
[24] Umano, M., Okamoto, H., Hatono, I., Tamura, H., Kawachi, F., Umedzu, S., Kinoshita, J.: Fuzzy decision trees by fuzzy ID3 algorithm and its application to diagnosis systems, IEEE World Congress on Computational Intelligence.
[25] Wang, H., Fan, W., Yu, P. S., Han, J.: Mining Concept Drifting Data Streams using Ensemble Classifiers, Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), 2003.
[26] Wang, P., Wang, H., Wu, X., Wang, W., Shi, B.: On Reducing Classifier Granularity in Mining Concept-Drifting Data Streams, Proceedings of the 5th IEEE International Conference on Data Mining (ICDM), 2005.
[27] Widmer, G., Kubat, M.: Learning in the presence of concept drift and hidden contexts, Machine Learning, 23, 1996, 69-101.
[28] Yang, Y., Wu, X., Zhu, X.: Combining Proactive and Reactive Predictions for Data Streams, Proceedings of the 11th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), ACM Press, 2005.
[29] Yang, Y., Wu, X., Zhu, X.: Mining in Anticipation for Concept Change: Proactive-Reactive Prediction in Data Streams, Data Mining and Knowledge Discovery, 13(3), 2006, 261-289.
[30] Zhao, J., Chang, Z.: Neuro-Fuzzy Decision Tree by Fuzzy ID3 Algorithm and Its Application to Anti-Dumping Early-Warning System, International Conference on Information Acquisition, IEEE, 2006.
[31] Zhu, X., Wu, X.: Class Noise vs. Attribute Noise: A Quantitative Study, Artificial Intelligence Review, 22(3), November 2004, 177-210.
[32] Zhu, X., Wu, X., Chen, Q.: Eliminating class noise in large datasets, Proceedings of the 20th International Conference on Machine Learning (ICML), 2003.
[33] Zhu, X., Wu, X., Yang, Y.: Effective classification of noisy data streams with attribute-oriented dynamic classifier selection, Knowledge and Information Systems, 9(3), March 2006, 339-363.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BUS8-0003-0032