Multidimensional data often has a heterogeneous feature space: individual features have unequal importance in different subareas of the feature space. This motivates the search for a technique that strategically splits the instance space and can identify the best subset of features for each instance to be classified. Our technique applies the wrapper approach, in which a classification algorithm serves as the evaluation function for comparing feature subsets. To make the feature selection local, we apply a recent technique for dynamic integration of classifiers, which makes it possible to determine which classifier and which feature subset should be used for each new instance. Decision trees are used to restrict the number of feature combinations analyzed: for each new instance, we consider only those feature combinations that include the features present on the path taken by the new instance in the decision tree built on the whole feature set. We evaluate our technique on data sets from the UCI machine learning repository. In our experiments, we use the C4.5 algorithm as the learning algorithm for the base classifiers and for the decision trees that guide the local feature selection. The experiments show some advantages of local feature selection with dynamic integration of classifiers over the selection of one feature subset for the whole space.
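The path-based restriction described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy tree, the feature names `f1` to `f4`, and the helper functions `path_features` and `candidate_subsets` are all hypothetical, and the tree is written by hand rather than learned with C4.5. The sketch only shows how the features tested on an instance's root-to-leaf path could shrink the set of feature combinations that a wrapper would have to evaluate.

```python
from itertools import combinations

# Hypothetical hand-written tree (not learned by C4.5): each internal node
# tests one feature against a threshold; leaves carry a class label.
TREE = {
    "feature": "f1", "threshold": 0.5,
    "left": {"label": "A"},
    "right": {
        "feature": "f3", "threshold": 2.0,
        "left": {"label": "B"},
        "right": {"label": "A"},
    },
}


def path_features(tree, instance):
    """Return the features tested on the root-to-leaf path taken by instance."""
    used = []
    node = tree
    while "label" not in node:
        feat = node["feature"]
        if feat not in used:
            used.append(feat)
        # Go left when the test "value <= threshold" holds, otherwise right.
        node = node["left"] if instance[feat] <= node["threshold"] else node["right"]
    return used


def candidate_subsets(all_features, required):
    """Yield every feature subset that contains all required features."""
    optional = [f for f in all_features if f not in required]
    for k in range(len(optional) + 1):
        for extra in combinations(optional, k):
            yield tuple(required) + extra


ALL_FEATURES = ["f1", "f2", "f3", "f4"]
instance = {"f1": 0.9, "f2": 1.0, "f3": 1.5, "f4": 0.0}

required = path_features(TREE, instance)
subsets = list(candidate_subsets(ALL_FEATURES, required))
```

In this toy setting the instance passes through tests on `f1` and `f3`, so only the subsets containing both are kept as candidates, instead of all non-empty subsets of the four features. Each remaining subset would then be scored by the wrapper's classifier, a step this sketch leaves out.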
The "curse of dimensionality" is pertinent to many learning algorithms; it denotes the drastic increase in computational complexity and classification error in high dimensions. In this paper, principal component analysis (PCA), parametric feature extraction (FE) based on Fisher's linear discriminant analysis (LDA), and their combination are analysed as means of dimensionality reduction with respect to the performance of different classifiers. Three commonly used classifiers are taken for analysis: kNN, Naive Bayes and the C4.5 decision tree. Recently, it has been argued that it is extremely important to use class information in FE for supervised learning (SL). However, LDA-based FE, although it uses class information, has a serious shortcoming due to its parametric nature: the number of extracted components cannot be more than the number of classes minus one. Besides, as its name suggests, LDA works mostly for linearly separable classes only. In this paper we study whether it is possible to overcome these shortcomings by adding the most significant principal components to the set of features extracted with LDA. In experiments on 21 benchmark datasets from the UCI repository, these two approaches (PCA and LDA) are compared with each other, and with their combination, for each classifier. Our results demonstrate that such a combination approach has certain potential, especially when applied to C4.5 decision tree learning. However, from a practical point of view, the combination approach cannot be recommended for Naive Bayes, since its behavior is very unstable across datasets.
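The limit on the number of LDA components follows from the rank of the between-class scatter matrix, whose class-mean deviations sum to zero when weighted by class size. The sketch below checks this on made-up toy data using exact arithmetic. The data values and the helper names `mean`, `between_class_scatter` and `rank` are assumptions for illustration and do not come from the paper.

```python
from fractions import Fraction


def mean(rows):
    """Column-wise mean of a list of equal-length vectors, as exact Fractions."""
    n = len(rows)
    return [Fraction(sum(col), n) for col in zip(*rows)]


def between_class_scatter(data):
    """data maps a class label to its list of feature vectors."""
    all_rows = [r for rows in data.values() for r in rows]
    mu = mean(all_rows)
    d = len(mu)
    sb = [[Fraction(0)] * d for _ in range(d)]
    for rows in data.values():
        # Deviation of this class mean from the global mean.
        diff = [m - g for m, g in zip(mean(rows), mu)]
        for i in range(d):
            for j in range(d):
                sb[i][j] += len(rows) * diff[i] * diff[j]
    return sb


def rank(matrix):
    """Exact matrix rank by Gaussian elimination over Fractions."""
    m = [row[:] for row in matrix]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


# Made-up toy data with three features per instance.
two_classes = {
    "a": [[1, 2, 0], [2, 3, 1]],
    "b": [[5, 1, 4], [6, 2, 3]],
}
three_classes = dict(two_classes, c=[[0, 7, 7], [1, 8, 6]])

rank_two = rank(between_class_scatter(two_classes))
rank_three = rank(between_class_scatter(three_classes))
```

With two classes the scatter matrix has rank 1 and with three classes rank 2, even though the data has three features. LDA can therefore yield at most one and two discriminant components respectively, which is the gap the added principal components are meant to fill.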