Results found: 19

Search results
Searched for:
in keywords: data clustering
1
A quaternion clustering framework
EN
Data clustering is one of the most popular methods of data mining and cluster analysis. The goal of clustering algorithms is to partition a data set into a specific number of clusters for compressing or summarizing original values. There are a variety of clustering algorithms available in the related literature. However, the research on the clustering of data parametrized by unit quaternions, which are commonly used to represent 3D rotations, is limited. In this paper we present a quaternion clustering methodology including an algorithm proposal for quaternion based k-means along with quaternion clustering quality measures provided by an enhancement of known indices and an automated procedure of optimal cluster number selection. The validity of the proposed framework has been tested in experiments performed on generated and real data, including human gait sequences recorded using a motion capture technique.
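As a rough illustration of what a quaternion-based k-means can look like, the sketch below clusters unit quaternions using the angular distance 2·arccos(|⟨q, c⟩|), which treats q and −q as the same rotation, and updates each center as the dominant eigenvector of the sum of outer products of its members. This is a generic construction under stated assumptions, not the algorithm proposed in the paper; the function names and the `init` parameter are hypothetical.

```python
import numpy as np

def quat_distance(qs, centers):
    """Angular distance between unit quaternions; q and -q count as identical."""
    dots = np.abs(qs @ centers.T)                  # shape (n, k)
    return 2.0 * np.arccos(np.clip(dots, 0.0, 1.0))

def quat_mean(qs):
    """Sign-invariant average: dominant eigenvector of sum of q q^T."""
    _, vecs = np.linalg.eigh(qs.T @ qs)            # eigenvalues ascending
    return vecs[:, -1]

def quat_kmeans(qs, k, iters=50, seed=0, init=None):
    """Plain k-means loop using the quaternion distance and mean above."""
    rng = np.random.default_rng(seed)
    if init is None:
        centers = qs[rng.choice(len(qs), size=k, replace=False)]
    else:
        centers = np.array(init, dtype=float)
    for _ in range(iters):
        labels = np.argmin(quat_distance(qs, centers), axis=1)
        new_centers = np.array([
            quat_mean(qs[labels == j]) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Centers are compared up to sign, since c and -c are the same rotation.
        converged = np.allclose(np.abs(np.sum(new_centers * centers, axis=1)), 1.0)
        centers = new_centers
        if converged:
            break
    labels = np.argmin(quat_distance(qs, centers), axis=1)
    return labels, centers
```

Random initialization can land in a poor local optimum, as with ordinary k-means, so in practice one would likely restart from several seeds.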
EN
A method for assessing the separability of EEG signals associated with three classes of brain activity is proposed. The EEG signals are acquired from 23 subjects using a headset with 14 electrodes. Data are processed by applying the Discrete Wavelet Transform (DWT) for signal analysis and an autoencoder neural network for brain activity separation. Processing involves 74 wavelets from 3 DWT families: Coiflets, Daubechies and Symlets. The Euclidean distance between clusters, normalized with respect to the standard deviation of the whole data set, is used to separate the tasks performed by participants. The results of this stage allow for an assessment of separability between the subsets of data associated with each activity performed by the experiment participants. The speed of convergence of the training process employing deep learning-based clustering is also measured.
3
Linguistically defined clustering of data
EN
This paper introduces a method of data clustering that is based on linguistically specified rules, similar to those applied by a human visually fulfilling a task. The method endeavors to follow these remarkable capabilities of intelligent beings. Even for the most complicated data patterns, a human is capable of accomplishing the clustering process using relatively simple rules. His/her way of clustering is a sequential search for new structures in the data and new prototypes with the use of the following linguistic rule: search for prototypes in regions of extremely high data densities and immensely far from the previously found ones. Then, after this search has been completed, the respective data have to be assigned to one of the clusters whose nuclei (prototypes) have been found. A human again uses a simple linguistic rule: data from regions with similar densities, which are located exceedingly close to each other, should belong to the same cluster. The goal of this work is to show experimentally that such simple linguistic rules can result in a clustering method that is competitive with the most effective methods known from the literature on the subject. A linguistic formulation of a validity index for determining the number of clusters is also presented. Finally, an extensive experimental analysis of benchmark datasets is performed to demonstrate the validity of the clustering approach introduced. Its competitiveness with the state-of-the-art solutions is also shown.
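The prototype-search rule quoted in this abstract ("high density and far from the previously found ones") can be approximated with a crisp, non-linguistic sketch. The version below assumes a Gaussian kernel density estimate, scores each point by density times distance to the nearest existing prototype, and then assigns points to the nearest prototype. This is a simplification for illustration only: the paper's linguistic rules and its density-aware assignment step are not reproduced, and `find_prototypes`, `assign` and `bandwidth` are hypothetical names.

```python
import numpy as np

def find_prototypes(x, k, bandwidth=1.0):
    """Greedily pick k prototype indices: dense points far from earlier picks."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    density = np.exp(-(d / bandwidth) ** 2).sum(axis=1)   # kernel density estimate
    protos = [int(np.argmax(density))]                    # densest point first
    while len(protos) < k:
        far = d[:, protos].min(axis=1)                    # distance to nearest prototype
        protos.append(int(np.argmax(density * far)))      # "dense AND far" as a product
    return protos

def assign(x, protos):
    """Assign every point to its nearest prototype (a crisp simplification)."""
    d = np.linalg.norm(x[:, None, :] - x[protos][None, :, :], axis=2)
    return np.argmin(d, axis=1)
```

Using a product for "dense and far" is one of several plausible choices; a fuzzy t-norm such as the minimum would be closer in spirit to a linguistic formulation.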
EN
Most geolocation applications for mobile devices assume a constant network connection and nodes with high computational power. However, as devices continue to develop, it is becoming possible to establish peer-to-peer networks when the network is unreachable due to special circumstances (such as conflicts or natural disasters). In this paper, a method for clustering spatial data in a mobile environment is discussed. A simple solution based on the OPTICS algorithm with lexical distance is proposed for grouping the observations.
5
Mathematical aspects of ranking theory
EN
The paper covers the theoretical grounds for defining rankings, based on terms taken from relation space theory. An array of new definitions is presented that allow rankings to be established without the need for typical ranking functions. Moreover, the notion of a precedence ranking relation (not necessarily an order relation) is introduced, and general algorithms for establishing rankings on the basis of definitions of extreme elements are demonstrated.
PL
The paper presents the theoretical foundations for defining rankings, based on concepts from set and relation theory. A number of new definitions are presented that allow rankings to be built without typical ranking functions. The notion of a ranking precedence relation (not necessarily an order) is introduced, and general algorithms are presented for building rankings from definitions of extreme elements.
EN
The paper presents the possibility of applying the Recurrent Pareto Filter (RPF) to procedures for categorizing objects (data). It presents a new implementation of the RPF algorithm that uses lexicographic sorting of objects and binary search for Pareto-optimal elements. The operation of the algorithm is illustrated with an example procedure for categorizing the scientific journals contained in the Scimago Scientific Journals Base.
PL
The paper presents the possibility of using the Recurrent Pareto Filter (RPF) in procedures for categorizing objects (data). A new implementation of the RPF algorithm is presented that uses lexicographic sorting of objects and binary search for Pareto-optimal elements (LBS). The operation of the algorithm is illustrated with an example of categorizing the scientific journals contained in the Scimago Scientific Journals database.
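The general idea of categorizing objects by repeatedly filtering out Pareto-optimal elements can be sketched as follows. The sketch assumes that larger values are better on every criterion and that objects are tuples of scores. It sorts lexicographically so that an object can only be dominated by one that precedes it, but it uses a plain linear scan rather than the binary search described in the abstract. It is therefore not the authors' RPF implementation, and all function names are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(items):
    """Non-dominated items, scanned in descending lexicographic order."""
    front = []
    for cand in sorted(items, reverse=True):
        # After sorting, only items already in the front can dominate cand.
        if not any(dominates(kept, cand) for kept in front):
            front.append(cand)
    return front

def pareto_categories(items):
    """Peel off successive Pareto fronts; the first category is the best."""
    remaining = list(items)
    categories = []
    while remaining:
        front = pareto_front(remaining)
        categories.append(front)
        for item in front:
            remaining.remove(item)
    return categories
```

For example, with two criteria, `pareto_categories([(3, 1), (1, 3), (2, 2), (1, 1), (2, 1), (0, 0)])` places `(3, 1)`, `(2, 2)` and `(1, 3)` in the first category, followed by `(2, 1)`, `(1, 1)` and `(0, 0)` in separate categories.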
7
Ocena segmentacji rynku za pomocą miar jakości grupowania danych (Evaluation of market segmentation using data clustering quality measures)
PL
The aim of this article is to present measures for assessing the quality of data clustering and to apply them to the evaluation of market segmentation. The study analyzed data on the sales markets of a company producing household goods. Market segmentation was carried out using Kohonen neural networks. The paper presents the clustering results and their evaluation. The conclusions on the quality of the resulting clusters are an attempt at an overall assessment of the segmentation performed.
EN
The purpose of this paper is to present measures used to evaluate the quality of data clustering and to apply them to assess market segmentation. The analysis used data on the sales markets of a company manufacturing household products. The market segmentation was carried out using a Kohonen neural network. The paper describes the results of the clustering and the evaluation of the clusters. The conclusions on the quality of the clusters are an attempt at an overall assessment of the market segmentation.
EN
The paper presents a method of choosing an information technology system whose task is to support the management of military aircraft operation. The proposed method is based on surveys conducted among direct users of IT systems used in the aviation of the Polish Armed Forces. The survey results were analyzed using statistical methods. The paper closes with practical conclusions on the further usefulness of the individual information technology systems. In the future, these conclusions may be useful in selecting the best solutions and integrating the information technology systems.
9
Zastosowanie metod eksploracji danych do segmentacji rynków (Application of data mining methods to market segmentation)
PL
The aim of this article is to present and evaluate the possibility of using data mining methods for sales market segmentation. Descriptive and predictive segmentation are presented, and the results of solving classification and data clustering tasks with Kohonen neural networks and CART and CHAID classification trees are analyzed. The study uses data on the sales markets of a company producing household goods.
EN
The purpose of this paper is to present and evaluate the possibility of using data mining methods in the market segmentation process. Descriptive and predictive segmentation are presented, and the results of data classification and clustering are analyzed. The following methods were used to carry out the analysis: Kohonen neural networks, CART and CHAID. The analysis concerns a manufacturing company producing household products.
10
Analysis of medical data using dimensionality reduction techniques
EN
The paper presents the application of dimensionality reduction methods for representation of the multidimensional medical data representing the images of the blood cells in leukemia. Different techniques of reduction belonging to linear and nonlinear methods will be applied and their efficiency compared. Their application to the visualization of different classes as well as clusterization and classification of data will be studied and discussed in the paper.
PL
The paper presents the application of various dimensionality reduction methods to the numerical representation of descriptors characterizing classes of hematopoietic cells in leukemia. Different approaches to reduction based on linear and nonlinear transformation methods are compared. In particular, the applicability of these methods to data visualization as well as clustering and classification is analyzed. The paper shows the results of experiments covering 11 cell classes.
EN
Clustering is a very important technique in knowledge discovery. It has been widely used in data mining, image processing, machine learning, bioinformatics, marketing and other fields. Clustering divides objects into groups called clusters, based on certain criteria. The similarity of objects is high within clusters, but low between clusters. In this work, we investigate a hybridization of the gravitational search algorithm (GSA) and the big bang-big crunch algorithm (BB-BC) for data clustering. In the proposed approach, named GSA-BB, GSA is used to explore the search space to find the optimal locations of the cluster centroids. Whenever GSA loses its exploration ability, the BB-BC algorithm is used to diversify the population. The performance of the proposed method is compared with the GSA, BB-BC and K-means algorithms using six standard and real datasets taken from the UCI machine learning repository. Experimental results indicate a significant improvement in the quality of the clusters obtained by the proposed hybrid method over the non-hybrid methods.
EN
Granular computing is one of the important methods for extracting knowledge from data and has achieved considerable success. However, imitating the human cognitive process of automatically choosing reasonable granularities for difficult problems remains an open puzzle for granular computing researchers. In this paper, a Gaussian cloud transformation method based on the Gaussian Mixture Model and the Gaussian Cloud Model is proposed to solve this problem. The Gaussian Mixture Model (GMM) is used to transform an original data set into a sum of Gaussian distributions, and the Gaussian Cloud Model (GCM) is used to represent the extension of a concept and measure its confusion degree. Extensive experiments on data clustering and image segmentation have been carried out to evaluate this method, and the results show its performance and validity.
PL
The detection of fire hazards on belt conveyors in coal mines depends significantly on parameters such as the concentrations of carbon monoxide (CO) and hydrogen cyanide (HCN) and the signals from smoke sensors. These quantities are taken into account when determining the value of the fire hazard index. A fuzzy model of the fire hazard index was built from laboratory measurements of these quantities. The fuzzy model was generated from numerical data using four fuzzy clustering algorithms implemented in MATLAB code. The results are shown in tables and graphs. Functions and interfaces of the Fuzzy Logic Toolbox were used to build and visualize the designed fuzzy model.
EN
The values of parameters such as the concentrations of carbon monoxide (CO) and hydrogen cyanide (HCN) and the signals from smoke detectors have a significant influence on detecting fire hazards on belt conveyors in coal mines. These values are used to set the fire risk index. A fuzzy model of the fire risk index was built from laboratory measurement data. The fuzzy model was generated from these numerical data using four fuzzy clustering algorithms implemented in MATLAB code. The results are shown in tables and graphs. MATLAB and the Fuzzy Logic Toolbox library (functions and interfaces) were used to design and visualize the proposed fuzzy model.
PL
The article proposes an approach to determining limit values using fuzzy data clustering algorithms. The FCM and PCM algorithms and the Gustafson-Kessel algorithm were used. The experiment was carried out on simulated data. For this purpose, a numerical model of a rotor machine was built that simulates specified states and magnitudes of unbalance. The limit values obtained were compared with values obtained using a statistical method. All computations were performed in the Matlab-Simulink environment.
EN
The paper describes a methodology for estimating the limit values of characteristics of diagnostic signals using fuzzy data clustering methods (the FCM, PCM and Gustafson-Kessel algorithms). The experiment was conducted on simulated data, using a numerical model of a rotor machine that simulates given unbalance states. The limits were compared with values estimated using a statistical method.
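For readers unfamiliar with FCM, the standard fuzzy c-means iteration alternates between computing cluster centers as membership-weighted means and recomputing memberships from distances to those centers. The sketch below is a textbook NumPy version with fuzzifier m = 2 by default, offered as an assumption-laden illustration. It is not the Matlab-Simulink code used in the paper, it does not cover PCM or Gustafson-Kessel, and the function name and parameters are hypothetical.

```python
import numpy as np

def fuzzy_c_means(x, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard FCM: returns (centers, membership matrix of shape (n, c))."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(x), c))
    u /= u.sum(axis=1, keepdims=True)          # each row of memberships sums to 1
    for _ in range(iters):
        w = u ** m
        centers = (w.T @ x) / w.sum(axis=0)[:, None]   # weighted means
        d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)               # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)   # membership update
        done = np.max(np.abs(new_u - u)) < tol
        u = new_u
        if done:
            break
    return centers, u
```

A hard partition, if one is needed, can be read off with `u.argmax(axis=1)`; a limit value between two states could then be placed where the memberships of neighboring clusters cross, although the paper's own procedure for this is not described in the abstract.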
PL
The paper shows how a set of numerical data can be analyzed to discover hidden relationships among the data. Principal component analysis and selected data clustering methods are used. The first example analyzes the similarity of selected EU countries in terms of the energy they obtain from renewable sources, using publicly available statistical data from the databases of the Central Statistical Office (Główny Urząd Statystyczny). The second example shows how periods of stock market volatility can be grouped, using historical (1998) quotations of selected US stock exchange indexes.
EN
In this paper we analyze numerical data sets in order to uncover unknown or hidden relationships between them. We use principal component analysis as well as hierarchical clustering. In the first example we analyze similarities among EU countries in the production of energy from renewable sources, using publicly available data from the Polish Central Statistical Office. In the second example we try to find groups of similar time periods based on the US stock exchange, using historical (1998) quotations of selected indexes.
EN
The paper presents the application of a machine learning method to the creation of an equipment diagnostic model. A dewatering pump operating in a deep-mine pumping station has been chosen as the illustrative example. The second section presents the dewatering pump monitoring system and justifies the need for a pump diagnostic model. The following sections present the application of a data clustering algorithm and a decision tree induction algorithm. Methods for reducing the resulting diagnostic model are also developed. The reduction leads to more legible data models. Results of the analysis for two different types of pumps are presented in the last part of the paper.
EN
This paper presents a novel approach to data clustering and multiple-class classification problems. The proposed method is based on a metaphor derived from immune systems, the clonal selection paradigm. A novel clonal selection algorithm, Immune K-Means, is proposed. The proposed system is able to cluster real-valued data efficiently and correctly, dynamically estimating the number of clusters. In classification problems, discrimination among classes is based on the k-nearest neighbor method. Two different types of suppression are proposed. They enable the evolution of different populations of lymphocytes well suited to a given problem: clustering or classification. The first type of suppression enables the lymphocytes to discover the data distribution, while the second type focuses the lymphocytes on the class boundaries. Preliminary results on artificial data and a real-world benchmark dataset (Fisher's Iris Database) as well as a discussion of the parameters of the algorithm are given.
PL
Using survey data as an example, the paper shows the usefulness of Kohonen networks for the analysis of multidimensional data. Through reduction to three dimensions on the one hand and expert analysis of the data on the other, the usefulness of unsupervised networks for dividing a data set into separate groups was verified. The final verification of the claim that there is no dependence between students' opinions and their choices of preferred lecturer traits was carried out using an LVQ network.
EN
On the basis of opinion survey data the paper shows the usefulness of Kohonen's networks for multidimensional data analysis. Due to a reduction to three dimensions on the one hand and the analysis of the data by an expert on the other, the usefulness of unsupervised learning networks for dividing a set of data into separate groups was verified. The final verification of the thesis that there is no correlation between students' opinions and their choices of preferred lecturers' features was carried out using an LVQ network.
EN
The fuzzy c-means method is one of the most popular clustering methods based on minimization of a criterion function. However, one of the greatest disadvantages of this method is its sensitivity to the presence of noise and outliers in data. The ε-insensitive Fuzzy C-Means (εFCM) clustering algorithm is free of this disadvantage, but has a very high computational burden and requires a choice of the insensitivity parameter(s) ε. In this paper, a new computationally effective ε-insensitive fuzzy c-means clustering algorithm with automatic adjustment of the insensitivity parameter(s) is introduced. Performance of the new clustering algorithm is experimentally verified using synthetic data with outliers and overlapped groups of heavy-tailed data.