Traditional clustering algorithms, which measure similarity by the distance between pairs of data points, are not suitable for clustering Boolean and categorical attributes. In this paper, a modified clustering algorithm for categorical attributes is used to segment customers. Each segment is then mined with a frequent pattern mining algorithm to infer rules that help predict a customer's next purchase. Purchases of items are generally related to each other: for example, grocery items are frequently purchased together, as are electronic items. Therefore, if knowledge of purchase dependencies is available, those items can be grouped together and attractive offers can be made to customers, which in turn increases the overall profit of the organization. This work focuses on grouping such items. Several experiments on a real-time database are carried out to evaluate the performance of the proposed approach.
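The per-segment mining step can be illustrated with a minimal sketch. The abstract does not name the frequent pattern mining algorithm, so the example below simply counts co-purchased item pairs and keeps those above a support threshold; the function name `frequent_pairs`, the `min_support` parameter, and the sample baskets are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (share of baskets) is >= min_support.

    Illustrative pair counting only; not the algorithm used in the paper.
    """
    counts = Counter()
    for basket in transactions:
        # Deduplicate and sort so each pair is counted once per basket
        # and always appears in the same order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Hypothetical baskets belonging to one customer segment.
segment = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["phone", "charger"],
    ["bread", "eggs"],
]
pairs = frequent_pairs(segment, min_support=0.5)
print(pairs)
```

Pairs that clear the threshold are candidates for the grouped offers the abstract describes; a full frequent pattern miner would also consider larger itemsets.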
Logit and probit models belong to the class of generalised linear models. A few applications of both models have been documented in the field of forestry. The objective of this paper was to test the parallel use of these models to discover the differences in damage to a spruce stand after thinning with the full tree system, the long wood system and the short wood system. In particular, the aim was to ascertain the general damage probability caused by the harvesting systems (HS) and the probability of each damage class within each HS. The logit model was used to calculate the general damage probability. When nine damage classes were taken into account, however, the probit model was found to fit the data better. In this case, the results gave accurate information on the probability of a particular damage class appearing for each HS. It was concluded that the probit and logit models should be considered in parallel in order to obtain the best possible goodness of fit and accurate information on the distribution of damage classes.
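The difference between the two models lies in the function that maps a linear predictor to a probability. The sketch below shows only the binary case with the standard logistic and standard normal distribution functions; the function names and the example predictor values are assumptions, and it does not reproduce the nine-class model fitted in the paper.

```python
import math


def logit_prob(eta):
    """Logit model: logistic CDF applied to the linear predictor eta."""
    return 1.0 / (1.0 + math.exp(-eta))


def probit_prob(eta):
    """Probit model: standard normal CDF applied to the linear predictor eta."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


# Hypothetical linear predictor values, for comparison only.
for eta in (-2.0, 0.0, 2.0):
    print(eta, round(logit_prob(eta), 4), round(probit_prob(eta), 4))
```

Both functions return 0.5 at a predictor of zero, but the logistic curve has heavier tails, which is one reason the two models can fit the same data differently.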
Most earlier work on clustering has focused mainly on numerical data, whose inherent geometric properties can be exploited to naturally define distance functions between data points. Recently, the problem of clustering categorical data has started drawing interest. However, the computational cost makes most previous algorithms unacceptable for clustering very large databases. The k-means algorithm is well known for its efficiency in this respect, but because it works only on numerical data, it cannot be used for clustering categorical data. The main contribution of this paper is to show how to apply the notion of "cluster centers" to a dataset of categorical objects and how to use this notion to formulate the clustering of categorical objects as a partitioning problem. Finally, a k-means-like algorithm for clustering categorical data is introduced. The clustering performance of the algorithm is demonstrated on two well-known data sets, namely the soybean disease and nursery databases.
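The general shape of a k-means-like loop over categorical objects can be sketched as follows. The abstract does not give the paper's definition of a cluster center, so this example substitutes the per-attribute mode as the center and a simple mismatch count as the dissimilarity (a k-modes-style choice); the function names, the seeding rule, and the toy data are all assumptions.

```python
from collections import Counter


def mismatch(a, b):
    """Number of attributes on which two categorical objects differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent category of each attribute (ties: first one seen)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def cluster_categorical(data, k, max_iter=20):
    """Alternate assignment and center update until labels stop changing."""
    # Assumption: seed the centers with the first k distinct objects.
    centers = list(dict.fromkeys(data))[:k]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object goes to its closest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: mismatch(row, centers[j]))
            for row in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: recompute each non-empty cluster's center.
        for j in range(len(centers)):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:
                centers[j] = column_modes(members)
    return labels, centers


# Hypothetical objects with three categorical attributes (as tuples).
data = [
    ("red", "small", "round"),
    ("blue", "large", "square"),
    ("red", "small", "oval"),
    ("blue", "large", "round"),
    ("red", "medium", "round"),
    ("blue", "large", "oval"),
]
labels, centers = cluster_categorical(data, k=2)
print(labels, centers)
```

As with k-means, the result depends on the initial centers; the deterministic seeding here is only for reproducibility of the example.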