
Results found: 4

Search results
Search query:
keywords: subspace clustering

EN
Finding clusters in high dimensional data is a challenging research problem. Subspace clustering algorithms aim to find clusters in all possible subspaces of the dataset, where a subspace is a subset of the data's dimensions. However, the exponential growth in the number of subspaces with the dimensionality of the data renders most algorithms inefficient as well as ineffective. Moreover, these algorithms have data dependencies ingrained in the clustering process, which makes parallelization difficult and inefficient. SUBSCALE is a recent subspace clustering algorithm that scales with the number of dimensions and contains independent processing steps that can be exploited through parallelism. In this paper, we aim to leverage the computational power of widely available multi-core processors to improve the runtime performance of the SUBSCALE algorithm. The experimental evaluation shows linear speedup. Moreover, we develop an approach using graphics processing units (GPUs) for fine-grained data parallelism to accelerate the computation further. First tests of the GPU implementation show very promising results.
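The abstract's central observation is that the per-dimension processing steps are independent and can therefore be distributed across CPU cores. Below is a minimal, hypothetical sketch of that parallelisation pattern in Python; dense_units_1d, epsilon, and min_pts are illustrative stand-ins, not SUBSCALE's actual routines.

```python
# Hedged sketch: distributing independent per-dimension steps of a
# SUBSCALE-style subspace clustering pass across CPU cores.
# Only the parallelisation pattern is the point; the per-dimension
# helper is a simplified stand-in.
from multiprocessing import Pool
import numpy as np

def dense_units_1d(args):
    """Find groups of points lying close together in one dimension.

    Stand-in for a per-dimension step: sort the projection and cut it
    wherever the gap between neighbours exceeds `epsilon`.
    """
    column, epsilon, min_pts = args
    order = np.argsort(column)
    groups, current = [], [order[0]]
    for prev, nxt in zip(order[:-1], order[1:]):
        if column[nxt] - column[prev] <= epsilon:
            current.append(nxt)
        else:
            if len(current) >= min_pts:
                groups.append(sorted(current))
            current = [nxt]
    if len(current) >= min_pts:
        groups.append(sorted(current))
    return groups

def per_dimension_parallel(data, epsilon=0.05, min_pts=4, workers=4):
    """Run the independent per-dimension step on all columns in parallel."""
    tasks = [(data[:, d], epsilon, min_pts) for d in range(data.shape[1])]
    with Pool(workers) as pool:
        return pool.map(dense_units_1d, tasks)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((1000, 20))
    units = per_dimension_parallel(X)
    print(sum(len(u) for u in units), "dense 1-D groups found")
```

Because each column is processed without touching the others, the same pattern maps directly onto GPU thread blocks for the finer-grained parallelism the abstract mentions.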
2
Subspace Memory Clustering
EN
We present a new subspace clustering method called SuMC (Subspace Memory Clustering), which allows us to efficiently divide a dataset D ⊂ R^N into k ∈ ℕ pairwise disjoint clusters of possibly different dimensions. Since our approach is based on memory compression, we do not need to specify the dimensions of the groups explicitly: in fact, we only need to specify the mean number of scalars used to describe a data point. In the case of one cluster, our method reduces to the classical Karhunen-Loève (PCA) transform. We test our method on typical data sets from the UCI repository and on data coming from real-life experiments.
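Read at face value, the compression criterion means a cluster is scored by how well a fixed per-point budget of scalars (coordinates in the leading principal directions) reproduces its points. The following is a rough sketch under that assumed reading, not the paper's actual cost function.

```python
# Hedged sketch: a compression-style cluster cost in the spirit of SuMC.
# Each point in a cluster is described by `dims` scalars (its coordinates
# in the leading principal directions); the squared reconstruction error
# measures how much that budget loses. Illustrative only.
import numpy as np

def pca_compression_error(points, dims):
    """Total squared error of describing `points` with `dims` scalars each."""
    centered = points - points.mean(axis=0)
    # Eigen-decomposition of the covariance matrix (Karhunen-Loève / PCA).
    cov = np.cov(centered, rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]
    # Discarded variance = sum of the trailing eigenvalues.
    return float(eigvals[dims:].sum()) * len(points)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Points lying close to a 2-D plane inside R^5.
    basis = rng.normal(size=(2, 5))
    flat = rng.normal(size=(500, 2)) @ basis + 0.01 * rng.normal(size=(500, 5))
    for d in range(1, 6):
        print(d, "scalars per point ->", round(pca_compression_error(flat, d), 3))
```

With one cluster this is exactly the PCA truncation error, matching the abstract's remark that the method reduces to the Karhunen-Loève transform in that case.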
3
Clustering in fuzzy subspaces
EN
Some data sets contain clusters not in all dimensions but only in subspaces. Known algorithms select attributes and identify clusters in subspaces. The paper presents a novel algorithm for subspace fuzzy clustering. Each data example has a fuzzy membership to a cluster. Each cluster is defined in a certain subspace, but the membership of the cluster's descriptors to that subspace (called the descriptor weight) is fuzzy, taking values from the interval [0, 1]; the descriptors of a cluster can thus have partial membership to the subspace the cluster is defined in, so the clusters are fuzzily defined in their subspaces. A cluster is described by its centre, its fuzziness, and the weights of its descriptors. The clustering algorithm is based on minimizing a criterion function (a minimal sketch of such a criterion is given below, after the abstracts). The paper is accompanied by experimental clustering results. This approach can be used to partition the input domain when extracting rule bases for neuro-fuzzy systems.
PL
Some data sets contain groups of data not in all dimensions but in certain subspaces of the domain. The paper presents an algorithm for clustering data in fuzzy subspaces. Each data example has a fuzzy membership to a group (cluster). Each cluster, in turn, is spanned in a certain subspace of the input domain, and clusters may be spanned in different subspaces. The clustering algorithm is based on minimizing a criterion function; its output is the locations of the clusters, their fuzziness, and the weights of their descriptors. Experimental results of clustering synthetic and real-life data are also presented.
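As referenced in the English abstract above, one plausible reading of such a criterion combines fuzzy memberships u[i,k] of examples to clusters with per-cluster descriptor weights w[k,d] in [0, 1]. The sketch below evaluates a criterion of that general form; the exponents and the exact functional form are assumptions, not the paper's definition.

```python
# Hedged sketch: a fuzzy-subspace clustering criterion of the kind the
# abstract describes. Memberships u[i,k] weight how strongly example i
# belongs to cluster k; descriptor weights w[k,d] weight how strongly
# dimension d belongs to cluster k's subspace. Illustrative only.
import numpy as np

def fuzzy_subspace_criterion(X, centres, u, w, m=2.0):
    """J = sum_i sum_k u[i,k]^m * sum_d w[k,d] * (X[i,d] - centres[k,d])^2."""
    # Squared distances per (example, cluster, descriptor).
    diff2 = (X[:, None, :] - centres[None, :, :]) ** 2        # shape (n, k, d)
    weighted = (w[None, :, :] * diff2).sum(axis=2)            # shape (n, k)
    return float(((u ** m) * weighted).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X = rng.random((100, 4))
    centres = rng.random((3, 4))
    u = rng.dirichlet(np.ones(3), size=100)   # fuzzy memberships, rows sum to 1
    w = rng.random((3, 4))                    # descriptor weights in [0, 1]
    print("criterion value:", round(fuzzy_subspace_criterion(X, centres, u, w), 3))
```

Minimizing a criterion of this shape drives each cluster to down-weight the dimensions along which its members are spread out, which is what makes the cluster's subspace fuzzy rather than crisp.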
4
Mining Outliers in Correlated Subspaces for High Dimensional Data Sets
EN
Outlier detection in high dimensional data sets is a challenging data mining task. Mining outliers in subspaces is a promising approach, because outliers may be embedded in some interesting subspaces. Searching all possible subspaces, however, leads to the problem known as "the curse of dimensionality". Because high dimensional data sets contain many irrelevant dimensions, it is of paramount importance to eliminate the irrelevant or unimportant dimensions and to identify interesting subspaces with strong correlation. Normally, the correlation among dimensions can be determined by traditional feature selection techniques or subspace-based clustering methods. Dimension-growth subspace clustering techniques can find interesting subspaces in relatively lower dimensional spaces, while dimension-reduction approaches try to group interesting subspaces with larger dimensions. This paper investigates the possibility of detecting outliers in correlated subspaces. We present a novel approach that identifies outliers in the correlated subspaces, where the degree of correlation among dimensions is measured in terms of the mean squared residue. To this end, we employ a dimension-reduction method to find the correlated subspaces. Based on the correlated subspaces obtained, we introduce another criterion, called the "shape factor", to rank the most important of the projected subspaces. Finally, outliers are identified within the most important subspaces by using classical outlier detection techniques. Empirical studies show that the proposed approach can identify outliers effectively in high dimensional data sets.
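The mean squared residue mentioned in the abstract is, in its standard biclustering form, a measure of how well a block of data follows an additive row-plus-column pattern; a low value indicates strongly correlated dimensions over a set of points. The sketch below uses that standard definition, which is assumed (not stated) to match the paper's usage.

```python
# Hedged sketch: mean squared residue (MSR) of a submatrix, used here to
# quantify how strongly a set of dimensions is correlated over a set of
# points. Low MSR -> correlated subspace; high MSR -> unrelated columns.
import numpy as np

def mean_squared_residue(A):
    """MSR of A: mean of (a_ij - row_mean_i - col_mean_j + overall_mean)^2."""
    row_means = A.mean(axis=1, keepdims=True)
    col_means = A.mean(axis=0, keepdims=True)
    residue = A - row_means - col_means + A.mean()
    return float((residue ** 2).mean())

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    base = rng.random((50, 1))
    correlated = base + rng.random((1, 6))   # shifted copies of one signal: MSR ~ 0
    noisy = rng.random((50, 6))              # unrelated columns: larger MSR
    print("correlated subspace:", round(mean_squared_residue(correlated), 4))
    print("uncorrelated subspace:", round(mean_squared_residue(noisy), 4))
```

Candidate subspaces with low MSR would then be ranked (e.g. by the paper's "shape factor") before running a classical outlier detector inside the top-ranked ones.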