Results found: 515

Search results
Searched for:
in keywords: cluster analysis
1
A Novel Multimodal Probability Model for Cluster Analysis
100%
EN
Cluster analysis is a tool for data analysis. It is a method for partitioning a data set into clusters so that similarity is greatest within a group and dissimilarity is greatest between groups. In general, there are two ways to use probability models for cluster analysis: mixture distributions and the classification maximum likelihood method. However, the probability distributions corresponding to most clustering algorithms, such as fuzzy c-means, possibilistic c-means, and mode-seeking methods, have not yet been found. In this paper, we construct a multimodal probability distribution model and then present the relationships between many clustering algorithms and the proposed model via maximum likelihood estimation. We also give the theoretical properties of the proposed multimodal probability distribution.
EN
The problem of clustering European countries with respect to food consumption is considered. Data on average yearly per capita consumption of 14 main categories of food products in 39 countries are collected and analysed for two years, 2000 and 1993. The year 2000 was chosen because no more recent data sets were available. The year 1993 was chosen as a good reference point: data for that year are the oldest complete set. Cluster analysis is used to obtain a reasonable grouping of countries. As the proper number of clusters is not known in advance, hierarchical methods offered by the Statgraphics statistical package are used. The desirable number of clusters is estimated by analysing distance matrices, dendrograms, and graphical representations of the distance between clusters at different clustering stages. Squared Euclidean distance is used as the measure of similarity. It is remarkable that all hierarchical methods applied in this paper, apart from the nearest-neighbour approach, lead to very similar classification results. We therefore believe that the results provide a valuable and objective insight into the diversification of food consumption in Europe. It has been verified that, in spite of visible changes in food consumption in the investigated countries, the sets of countries belonging to particular clusters obtained for 2000 and for 1993 are almost indistinguishable.
PL
The article addresses the grouping of European countries by food consumption. Data were collected on annual per capita consumption of 14 main groups of food products in 39 countries, covering the years 2000 and 1993. Cluster analysis was used to group the countries. Because there were no grounds for fixing the number of clusters in advance, hierarchical agglomerative methods implemented in the Statgraphics statistical package were applied. The number of clusters was determined from the distance matrix, dendrograms, and plots of inter-cluster distance against the grouping stages. The squared Euclidean distance was adopted as the measure of similarity. Apart from the nearest-neighbour method, all hierarchical agglomerative methods produced clusters with similar sets of countries. The analysis showed that, despite changes in food consumption in individual countries, the sets of countries in the clusters obtained for 2000 and 1993 were almost identical.
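The hierarchical grouping described in this record can be illustrated with a minimal sketch. It assumes complete linkage as the default (the abstract does not say which linkage rules were compared beyond the nearest-neighbour one), uses squared Euclidean distance as stated, and runs on made-up two-dimensional vectors rather than the actual consumption data. The names `sq_euclid` and `agglomerate` are hypothetical.

```python
from itertools import combinations


def sq_euclid(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def agglomerate(points, n_clusters, linkage=max):
    """Naive agglomerative clustering.

    linkage=max gives complete linkage; linkage=min gives single linkage
    (the nearest-neighbour rule). Returns clusters as lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(
                sq_euclid(points[a], points[b])
                for a in clusters[ij[0]]
                for b in clusters[ij[1]]
            ),
        )
        clusters[i] = clusters[i] + clusters[j]  # merge j into i (i < j)
        del clusters[j]
    return [sorted(c) for c in clusters]


# Illustrative data only (not taken from the paper).
data = [(1.0, 2.0), (1.2, 1.9), (5.0, 5.1), (5.2, 4.9), (9.0, 0.5)]
groups = agglomerate(data, 3)
print(groups)
```

Passing `linkage=min` switches to the nearest-neighbour rule, which is the one method the abstract reports as giving different groupings.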
EN
Let us assume that the observed random vector from a population has a p-dimensional normal distribution with a mean vector and a positive definite covariance matrix. A multivariate observation is known and it belongs to one of two multivariate normal populations, but it is not known to which. Let E be the p x p matrix with each element equal to unity and let I be the p x p identity matrix. In the paper we consider a Bayesian discrimination between s.
EN
Missing answers are a common problem in all types of research, especially in the social sciences. A number of solutions have therefore been developed, including the analysis of complete cases and imputation, which replaces a missing value with a value calculated by one of several algorithms. This paper evaluates how the method chosen for filling in missing answers influences the result of a segmentation carried out with cluster analysis. To this end we used a data set from an actual consumer study, in which cases with missing values were either deleted or filled in using various methods. Cluster analyses were then performed on the resulting data sets, assuming both ordinal and ratio levels of measurement, and the grouping quality, as expressed by different indicators, was evaluated. The research showed the advantage of imputation over the analysis of complete cases. It also showed the validity of using more complex approaches than simple replacement with a mean or median value.
EN
Models based on panel data are increasingly used in economic studies. Standard panel models combine the cross-sectional and time dimensions of the data but do not account for interactions arising from the location of objects in geographic space. Spatial panel models use cross-sectional information over time while taking space into account. The paper proposes a different approach to changes over time, based on the spatial weights matrix. The aim of the study was to show how a spatial weights matrix can be applied with particular consideration of time. Data for the analysis came from the databases of the CSO and the ARMA for the period 2004-2012. In addition to methods of spatial statistics, classical taxonomic methods were used to obtain a distance matrix.
EN
The beta parameter is a popular tool for the evaluation of portfolio performance. The Sharpe single-index model is a simple regression model in which the stock’s returns are regressed against the returns of a broader index. The beta parameter is a measure of the strength of this relation. Extensive recent research has proved that the beta is not constant in time and should be modelled as a time-variant coefficient. One of the most popular methods of the estimation of a time-varying beta is the Kalman filter. As the output of the Kalman filter, one obtains a sequence of the estimates of a time-varying beta. This sequence shows the historical dynamics of sensitivity of a company’s returns to the variations of market returns. The article proposes a method of clustering companies listed on the Warsaw Stock Exchange according to time-varying betas.
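As a rough illustration of how a Kalman filter yields a sequence of time-varying beta estimates, the sketch below assumes a random-walk state equation, beta_t = beta_{t-1} + w_t, and the observation equation r_t = beta_t * m_t + v_t, with no intercept. The abstract does not specify the state model, so both equations, the noise variances `q` and `r`, and the function name `kalman_beta` are assumptions made for illustration.

```python
def kalman_beta(stock, market, q=1e-4, r=1e-6, beta0=1.0, p0=1.0):
    """Scalar Kalman filter for a random-walk beta.

    stock, market: sequences of returns of equal length.
    q: variance of the state noise, r: variance of the observation noise.
    Returns the filtered beta estimate after each observation.
    """
    beta, p = beta0, p0
    path = []
    for y, x in zip(stock, market):
        p = p + q                          # predict: variance grows by q
        k = p * x / (x * x * p + r)        # Kalman gain
        beta = beta + k * (y - beta * x)   # update with the innovation
        p = (1.0 - k * x) * p              # posterior variance
        path.append(beta)
    return path


# Synthetic, noiseless example: the true beta is 1.5 throughout.
market = [0.01, -0.02, 0.015, 0.005, -0.01, 0.02, -0.015, 0.01]
stock = [1.5 * m for m in market]
path = kalman_beta(stock, market)
print(path[-1])
```

The resulting `path` for each company is the kind of sequence that could then be passed to a clustering routine, for example by computing distances between the beta paths of different companies.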
7
An alternative extension of the k-means algorithm for clustering categorical data
80%
EN
Most earlier work on clustering has focused on numerical data, whose inherent geometric properties can be exploited to define distance functions between data points naturally. Recently, the problem of clustering categorical data has started to draw interest. However, the computational cost makes most of the previous algorithms unacceptable for clustering very large databases. The k-means algorithm is well known for its efficiency in this respect. At the same time, it works only on numerical data, which prevents its use for clustering categorical data. The main contribution of this paper is to show how to apply the notion of 'cluster centers' to a dataset of categorical objects and how to use this notion to formulate the clustering of categorical objects as a partitioning problem. Finally, a k-means-like algorithm for clustering categorical data is introduced. The clustering performance of the algorithm is demonstrated on two well-known data sets, namely the soybean disease and nursery databases.
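To make the idea of a k-means-like partitioning of categorical objects concrete, here is a minimal sketch in the style of the well-known k-modes approach: the dissimilarity is the number of mismatching attributes and each center is the per-attribute mode of its members. This is a stand-in and not the algorithm of the paper, whose definition of cluster centers is not given in the abstract. The toy rows and the names `mismatch` and `k_modes` are hypothetical.

```python
from collections import Counter


def mismatch(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(rows, init_centers, max_iter=20):
    """Alternate assignment and center update until labels stop changing."""
    centers = [tuple(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda k: mismatch(row, centers[k]))
            for row in rows
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [row for row, lab in zip(rows, labels) if lab == k]
            if members:
                # New center: most frequent category in each attribute.
                centers[k] = tuple(
                    Counter(col).most_common(1)[0][0] for col in zip(*members)
                )
    return labels, centers


rows = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "medium", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
    ("blue", "medium", "square"),
]
labels, centers = k_modes(rows, init_centers=[rows[0], rows[3]])
print(labels, centers)
```

Like k-means, each iteration is linear in the number of objects, which is the efficiency property the abstract highlights for very large databases.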
8
Ant colony metaphor in a new clustering algorithm
80%
EN
Among the many bio-inspired techniques, ant clustering algorithms have received special attention, especially because they still require much investigation to improve performance, stability and other key features that would make such algorithms mature tools for data mining. Clustering with swarm-based algorithms is emerging as an alternative to more conventional clustering methods, such as k-means algorithm. This proposed approach mimics the clustering behavior observed in real ant colonies. As a case study, this paper focuses on the behavior of clustering procedures in this new approach. The proposed algorithm is evaluated on a number of well-known benchmark data sets. Empirical results clearly show that the ant clustering algorithm (ACA) performs well when compared to other techniques.
9
80%
EN
Cluster analysis is applied to data describing the status of protein structure with respect to hydrophobic core characteristics. The analysis revealed two clusters, distinguishing proteins accordant with the “fuzzy oil drop” model from those that appear discordant with it. The analysis was performed separately for chains treated as structural units and for units defined according to IV-order (taking the functional protein complex). The characteristics of these two classification systems differed in the number of proteins belonging to each of the two clusters as well as in the relation between them.
10
Sustainability Attitude of Automotive Suppliers
80%
EN
The issue of sustainability, or corporate social responsibility (CSR), has become a widely discussed topic in all industrial production sectors. The article focuses on the automobile industrial sector because it is not only the most dynamically developing industrial area but also because it is one of the driving forces of local economies in many European countries. This paper aims to reveal possible differences and diversity of understanding of priorities in the CSR activities provided by automotive suppliers in European countries. Based on the meta-analysis, 73 actions were listed, and a questionnaire survey was performed. Cluster analysis and Fisher’s exact test were applied to find out whether the attitude towards sustainability differs dependent on the position in the supply chain or on the company size.
EN
The values of individual characteristics are often the results of measurements in different units, which can make some characteristics appear dominant while others have little influence on the course of the cluster analysis. Because cluster analysis methods are based on a quantitative expression of similarity relations, they do not work well with data that depend on the unit of measurement. It is therefore appropriate to standardize or normalize the characteristics.
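The effect of measurement units can be removed with z-score standardization, sketched below. The choice of the z-score (rather than, say, min-max normalization) and of the population standard deviation is an assumption; the abstract only says that standardization or normalization is appropriate. The function name `standardize` and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Rescale every column to zero mean and unit (population) std deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [
        tuple((x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, mus, sds))
        for row in rows
    ]


# Column 1 is in the thousands, column 2 is close to 1: without rescaling,
# any Euclidean-type distance would be driven almost entirely by column 1.
raw = [(1500.0, 1.2), (2500.0, 0.8), (2000.0, 1.0)]
z = standardize(raw)
print(z)
```

After rescaling, both columns contribute on the same scale to the distances used by the clustering method.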
PL
Companies' needs for advanced data processing methods vary with the sector in which they operate, their financing options, the behaviour of competitors, and the size and volatility of the information they collect. In some cases business intelligence technologies, visualization or statistical methods become essential to the functioning of the company; in others they are a way of increasing efficiency and gaining a competitive advantage. The aim of the publication is to analyse differences in enterprises' approach to using these technologies. It examines whether there are features that make a given group receptive to offerings related to big data and data science. This aim is pursued with cluster analysis, which makes it possible to identify groups of clients with similar characteristics. The results indicate that the differences stem from demographic features, differing expectations and past experience.
EN
Entrepreneurs’ needs for advanced data analysis methods vary depending on the business sector, funding flexibility, competitors’ behavior, and the volume and volatility of stored information. In some cases business intelligence, visualisation or statistical methods become essential for performing daily operations, while in others they become a means of increasing efficiency or gaining competitive advantage. This publication analyses the differences in enterprises' attitudes towards the application of these technologies. An attempt is made to distinguish features that potentially make a particular group prone to use the offered solutions. This objective is accomplished with a cluster analysis carried out to determine client segments sharing similar characteristics. The results indicate that the main differences arise from demographic features, varied expectations and past experiences.
EN
This article attempts to analyse regional diversity in Poland in 2001 with respect to the level of the higher education system. The first part deals with ranking provinces by the level of the higher education system, measured by a synthetic variable. This variable is a weighted combination of 10 characteristics, weighted according to their influence on higher education. The selection of those characteristics was dictated by their use by experts as well as by their availability in regional statistical data. In the second part of the article, the author presents clusters formed by provinces in two-dimensional spaces: the first dimension indicates the level of the higher education system, whereas the other describes the socio-economic situation in the regions. This situation is represented by factors singled out (by principal component analysis) as key among 21 characteristics that potentially influence the higher education system. Discovering the regularities according to which those clusters are formed is the main purpose of the article.
PL
The article attempts to analyse the regional differentiation of Poland in 2001 in terms of higher education. The first part describes the ranking of provinces by the level of higher education. This level is measured by a synthetic variable, the weighted average of 10 characteristics that may reflect the level of higher education. The characteristics were chosen both for their use by experts and for the availability of data in regional statistics. In the second part, the author presents the grouping of provinces in two-dimensional spaces, where the first dimension is the level of higher education and the second describes the socio-economic situation in the regions. This situation is represented by factors extracted by principal component analysis from 21 characteristics with a potential influence on the development of higher education. The aim of the article is to detect the regularities according to which these clusters form.
14
Proposal of New Cluster Analysis Algorithm
80%
EN
One of the well-known groups of cluster analysis methods is the group based on density estimation. In the paper we propose a new method of defining clusters, which consists of two steps. In the first step we find local maxima of the joint distribution, thus establishing the cluster centres. In the second step we assign each observation to one of the existing cluster centres. The number of clusters is assumed to be known. In both steps we use a similar technique based on the kernel density estimator with the Epanechnikov kernel. The performance of the method is analysed in an application to the Gordon (1999) data. In the analysis the Rousseeuw indices are used to assess cluster cohesion, and some comparisons with other methods of defining clusters are presented. The results look promising.
PL
One of the well-known groups of cluster analysis methods comprises methods based on density estimation. The article proposes a new method of finding clusters, which consists of two steps. In the first step we find the local maxima of the joint distribution and take them as the cluster centres. In the second step each observation is attached to one of the centres. The number of clusters is assumed in advance. In both steps we use the same technique, based on the kernel density estimator with the Epanechnikov kernel. The operation of the method is analysed on the Gordon (1999) data. The analysis uses the Rousseeuw indices of cluster cohesion and presents a comparison with other methods of cluster analysis. The results look promising.
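The two-step idea in this record can be sketched in one dimension. The sketch evaluates an Epanechnikov kernel density estimate on a grid, takes its local maxima as cluster centres, and then assigns each observation to the nearest centre. The grid search and the nearest-centre assignment are simplifications assumed here; the abstract says only that both steps use a kernel-based technique. The bandwidth `h`, the sample, and all function names are hypothetical.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kde(x, sample, h):
    """Kernel density estimate at x with bandwidth h."""
    return sum(epanechnikov((x - s) / h) for s in sample) / (len(sample) * h)


def density_modes(sample, h, n_grid=200):
    """Step 1: grid points where the estimated density has a local maximum."""
    lo, hi = min(sample), max(sample)
    grid = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    dens = [kde(g, sample, h) for g in grid]
    return [
        grid[i]
        for i in range(1, n_grid - 1)
        if dens[i] > dens[i - 1] and dens[i] >= dens[i + 1]
    ]


def assign(sample, centres):
    """Step 2 (simplified): attach each observation to the nearest centre."""
    return [
        min(range(len(centres)), key=lambda k: abs(x - centres[k]))
        for x in sample
    ]


sample = [1.0, 1.2, 1.4, 5.0, 5.2, 5.4]   # illustrative values only
modes = density_modes(sample, h=1.0)
labels = assign(sample, modes)
print(modes, labels)
```

In this sketch the number of centres follows from the density estimate and the bandwidth, whereas the method in the abstract assumes the number of clusters is known in advance.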
EN
The article presents the auxiliary functions of the clusterSim package (see Walesiak & Dudek (2006)) and selected functions of the stats, cluster, and ade4 packages that are applied to clustering problems. In addition, examples of procedures for solving different clustering problems are presented. These procedures, which are not available in statistical packages (SPSS, Statistica, SAS), can help solve a broad range of classification problems.
PL
The article characterizes the auxiliary functions of the clusterSim package and selected functions of the stats, cluster and ade4 packages that serve cluster analysis. It also presents example procedures using these functions, which help the prospective user carry out many classification tasks that are not available in basic statistical packages (e.g. SPSS, Statistica, SAS).
EN
The aim of this article is to present the method of latent ideological types and its possible uses. Two dominant approaches to measuring the ideological orientations of the public are self-identification and expert evaluation of attitudes and value orientations. Both approaches are also a common ingredient in international comparative research. In this article the authors first focus on the strengths and methodological weaknesses of those approaches. They then introduce their method of latent ideological types as a different approach. They present the principles of the method, the necessary sequence of steps, and conclude with its concrete application using CSES Slovakia 2010 data. The method of latent ideological types is based on subjective placements on a given set of ideological orientations or scales. The non-manifest ideological inclinations of the respondents are then determined from their attitudes and opinions. Using attitude variables, definitional patterns and cluster analysis the authors proceed from manifest ideological types to latent ideological types. The resulting latent ideological positions of the respondents may differ from their manifest self-identifications. The advantages of knowing the latent positions are discussed.
EN
Objective: To obtain a case definition and to describe variables associated with a cluster of unspecific symptoms in healthcare workers (HCW) in a hospital building. Materials and Methods: A cross-sectional study was performed. All people working at the Residencia Cantabria building (a 200-bed building belonging to University Hospital Marqués de Valdecilla) in June 2009 were invited to complete a self-administered questionnaire, including questions on demographic data, workplace and shift, working conditions and current symptoms. A cluster analysis was carried out to obtain the case definition. The strength of the association between the studied variables and meeting the case definition was measured using odds ratios (OR) with 95% confidence intervals (CI). Multiple logistic regression was used to obtain a predictive model; its general validity was estimated with Receiver Operating Characteristic (ROC) curves and their Area Under the Curve (AUC). Results: 357 completed questionnaires were obtained. A case was defined as having at least 5 of the eleven symptoms included. Not being assigned to a specific shift was the strongest protective variable related to "being a case" (OR = 0.30; 95% CI: 0.17-0.54), whereas a personal history of distal pain or inflammation in the arms or legs was the main risk factor (OR = 4.33; 95% CI: 2.75-6.82). A six-variable predictive model has an AUC of 0.7378. Conclusions: A disease associated with indoor environment quality in a hospital was characterized. A multivariate score was drafted for identifying HCW at higher risk of developing the disease, in order to apply administrative prevention measures.
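An odds ratio with a 95% confidence interval, as reported in this record, can be computed from a 2x2 table as in the sketch below. It assumes the log-based (Woolf) interval; the abstract does not say which interval method was used. The counts are invented for illustration and do not reproduce the study's figures. The function name `odds_ratio_ci` is hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-based) confidence interval.

    a: exposed cases,   b: exposed non-cases,
    c: unexposed cases, d: unexposed non-cases.
    z=1.96 corresponds to a 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_, lo, hi = odds_ratio_ci(40, 60, 20, 80)
print(round(or_, 2), round(lo, 2), round(hi, 2))
```

An interval that lies entirely above 1 indicates a risk factor, and one entirely below 1 a protective factor, which is how the two associations quoted in the abstract are read.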
EN
This paper compares the performance of an algorithm for determining the number of clusters in a data set, proposed by the author, with other methods of determining the number of clusters. The idea of the new algorithm is based on comparing pseudo cumulative distribution functions of a certain random variable. For a fixed window size we draw K different points, and for every point we find the corresponding limiting point in the mean shift procedure. Then we check whether the distance (e.g. Euclidean) between every pair of limiting points is greater than the window size. Analogously, we determine the pseudo cumulative distribution functions for different numbers K of clusters. Out of all pseudo cumulative distribution functions we pick the proper one, i.e. the last one (with respect to K) that has a horizontal phase. Other methods of determining the number of clusters in a data set are compared with the proposed algorithm on a number of two-dimensional data sets for different clustering methods (k-means clustering and minimum distance agglomeration).
PL
This article attempts a comparative assessment of an algorithm for determining the number of clusters in a data set, proposed by the author, against other methods of determining the number of clusters. The author's algorithm is based on comparing pseudo cumulative distribution functions of a certain random variable for different numbers of clusters. This random variable is defined as follows. For a fixed window size, K different points are drawn from the data set, and for each of them the corresponding limiting point in the mean shift procedure is found. We then check whether the distance (e.g. Euclidean) between every pair of limiting points is greater than the window size. The pseudo cumulative distribution functions for different numbers K of clusters are determined analogously. Of all these functions, the one taken to indicate the correct number of clusters is the one corresponding to the last curve (with respect to K) that has a horizontal phase. Other methods of determining the number of clusters in a data set are compared with the proposed algorithm on several two-dimensional data sets for two clustering methods that differ fundamentally in nature.
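The building block of the algorithm in this record, finding the limiting point of the mean shift procedure and comparing pairwise distances with the window size, can be sketched as follows. The sketch assumes a flat (uniform) window and made-up two-dimensional data; the kernel used by the author is not stated in the abstract, and the construction of the pseudo cumulative distribution functions is not reproduced here. The function name `mean_shift_limit` is hypothetical.

```python
import math
from itertools import combinations


def mean_shift_limit(start, data, window, tol=1e-9, max_iter=500):
    """Iterate the mean shift step from `start` until the point stops moving.

    Each step replaces the point with the mean of all data points that lie
    within `window` of it (flat kernel).
    """
    point = tuple(start)
    for _ in range(max_iter):
        neighbours = [p for p in data if math.dist(p, point) <= window]
        if not neighbours:
            return point
        new_point = tuple(sum(c) / len(neighbours) for c in zip(*neighbours))
        if math.dist(new_point, point) < tol:
            return new_point
        point = new_point
    return point


data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
        (5.0, 5.0), (5.1, 4.8), (4.9, 5.2)]   # illustrative values only
window = 1.0
limits = [mean_shift_limit(p, data, window) for p in data]

# The check described in the abstract, here applied to two drawn points:
# are all pairwise distances between limiting points above the window size?
drawn = [limits[0], limits[3]]
all_separated = all(math.dist(a, b) > window for a, b in combinations(drawn, 2))
print(limits[0], limits[3], all_separated)
```

Points drawn from the same group converge to the same limiting point, so the check fails for them; repeating the draw many times for each K is what would produce the empirical functions the abstract compares.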
EN
Mobility of students plays a major role in developing creativity, active citizenship and chances of employment, especially in the face of labour market globalization. The article addresses a selected area of research on students: the determinants of their educational mobility. The purpose of the research was to explore these determinants, to reveal common features and differences in students' motivations and attitudes, and to draw up preliminary recommendations for universities taking part in international educational programmes. The authors' observations suggest that the motives guiding students' decisions to participate in educational mobility programmes often deviate from those envisaged in official documents. University teachers from other countries report similar observations. The authors therefore began by examining the motives that lead Polish and Russian young people to choose a foreign educational path within academic mobility programmes. The material for analysis comes from surveys that started at the end of April 2015. Cluster analysis was used.
20
80%
EN
The article presents a synthetic review of the literature from P. Jaccard (1908) to B. Mirkin (2011). It proposes a concept and classification of cluster validity indices, including a classification of the validity indices used to find the optimal number of clusters. The results of this study should be useful to all concerned with problems of classification.
PL
The article gives a synthetic review of the literature on the subject, from the work of P. Jaccard in 1908 to that of B. Mirkin in 2011. It attempts to classify the known clustering quality indices, taking into account criteria drawn from various scientific disciplines. In particular, indices of the optimal number of clusters are classified as a subclass of clustering quality indices. The results of the presented research should be useful to everyone dealing with problems of clustering and classification.