Results found: 266

Search results
Searched for:
in keywords: cluster analysis
EN
The problem of clustering European countries with respect to food consumption is considered. Data on average yearly per capita consumption of 14 main categories of food products in 39 countries are collected and analysed for two years: 2000 and 1993. The year 2000 was chosen because no more recent data sets are available; the year 1993 was chosen as a reference point because data for that year are the oldest complete set. To group the countries, cluster analysis is performed. As the proper number of clusters is not known in advance, hierarchical methods offered by the Statgraphics statistical package are used. The desirable number of clusters is estimated from distance matrices, dendrograms, and plots of the distance between clusters at successive clustering stages. Squared Euclidean distance is used as the measure of similarity. Remarkably, all hierarchical methods applied in this paper, apart from the nearest neighbour approach, lead to very similar classification results. We therefore believe that the results provide a valuable and objective insight into the diversification of food consumption in Europe. It has been verified that, in spite of visible changes in food consumption in the investigated countries, the sets of countries belonging to particular clusters obtained for 2000 and for 1993 are almost indistinguishable.
PL
The article considers the problem of grouping European countries with respect to food consumption. Data were collected on annual per capita consumption of 14 main groups of food products in 39 countries. The data concern food consumption in 2000 and 1993. Cluster analysis was used to group the countries. Because there were no grounds for fixing the number of clusters in advance, hierarchical agglomerative methods implemented in the Statgraphics statistical package were applied. The number of clusters was determined from an analysis of the distance matrix, dendrograms, and plots of cluster distance against the clustering stage. The squared Euclidean distance was taken as the measure of similarity. All hierarchical agglomerative methods except the nearest neighbour method were found to produce clusters with similar sets of countries. The cluster analysis showed that, despite changes in food consumption in individual countries, the sets of countries in the clusters obtained for 2000 and 1993 were almost identical.
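The agglomerative procedure named in this abstract can be illustrated with a minimal sketch. It is not the Statgraphics implementation: the function names, the toy points, and the choice of complete and single linkage are assumptions made only for illustration, with squared Euclidean distance as the dissimilarity.

```python
from itertools import combinations


def sq_euclidean(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def agglomerate(points, n_clusters, linkage=max):
    """Merge clusters until n_clusters remain.

    linkage=max gives complete linkage (farthest neighbour),
    linkage=min gives single linkage (nearest neighbour).
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(
                sq_euclidean(points[a], points[b])
                for a in clusters[ij[0]]
                for b in clusters[ij[1]]
            ),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return sorted(sorted(c) for c in clusters)


# Hypothetical two-feature observations (not the food consumption data).
toy_points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
complete = agglomerate(toy_points, 3, linkage=max)
single = agglomerate(toy_points, 3, linkage=min)
```

On well-separated toy data both linkages return the same partition; on real data the nearest neighbour variant tends to chain observations together, which may be one reason it can diverge from the other methods.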
EN
Missing answers are a common problem in all types of research, especially in the social sciences. A number of solutions have therefore been developed, including complete-case analysis and imputation, which replaces a missing value with a value calculated by one of several algorithms. This paper evaluates how the method adopted for handling missing answers influences the result of segmentation conducted with cluster analysis. To this end we used a data set from an actual consumer survey in which cases with missing values were either deleted or filled in using various methods. Cluster analyses were then performed on the resulting data sets, assuming both ordinal and ratio levels of measurement, and the grouping quality, as expressed by different indicators, was evaluated. The research showed the advantage of imputation over complete-case analysis. It also confirmed the validity of using approaches more complex than simple replacement with the mean or median.
3
Clustering macroeconomic time series
EN
The data mining technique of time series clustering is well established. However, even when recognized as an unsupervised learning method, it does require making several design decisions that are nontrivially influenced by the nature of the data involved. By extensively testing various possibilities, we arrive at a choice of a dissimilarity measure (compression-based dissimilarity measure, or CDM) which is particularly suitable for clustering macroeconomic variables. We check that the results are stable in time and reflect large-scale phenomena, such as crises. We also successfully apply our findings to the analysis of national economies, specifically to identifying their structural relations.
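As an illustration of a compression-based dissimilarity, the sketch below assumes the usual definition CDM(x, y) = C(xy) / (C(x) + C(y)), where C is the compressed size in bytes. The `zlib` compressor and the up/down symbol encoding in `symbolize` are assumptions for this example; the abstract does not state which compressor or discretisation the authors used.

```python
import random
import zlib


def symbolize(series):
    """Hypothetical discretisation: b'u' where the series rises, b'd' otherwise."""
    return bytes(ord("u") if b > a else ord("d") for a, b in zip(series, series[1:]))


def compressed_size(data):
    """Length in bytes of the zlib-compressed input."""
    return len(zlib.compress(data, 9))


def cdm(x, y):
    """Compression-based dissimilarity: lower values suggest shared structure."""
    return compressed_size(x + y) / (compressed_size(x) + compressed_size(y))


# Two synthetic periodic series with the same shape, and one irregular series.
rng = random.Random(0)
sawtooth_a = symbolize([i % 7 for i in range(300)])
sawtooth_b = symbolize([(i + 3) % 7 for i in range(300)])
noise = symbolize([rng.random() for _ in range(300)])

similar = cdm(sawtooth_a, sawtooth_b)
dissimilar = cdm(sawtooth_a, noise)
```

A matrix of pairwise `cdm` values could then be passed to any clustering method that accepts a dissimilarity matrix.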
EN
Divergence entropy was applied to identify the category of protein structures with respect to the hydrophobic core of the protein molecule: O/T measures the distance between the observed and theoretical distributions, and O/R the distance between the observed and random distributions. Under a naive interpretation, proteins with O/T < O/R are treated as molecules whose hydrophobic core accords with the theoretically assumed one, while proteins with O/T > O/R are treated as having a hydrophobic core that does not. Large-scale computing was performed on the PDB data set to reveal whether a relation other than the simple inequality should be used for this identification. Cluster analysis was applied to assess the relation of O/T versus O/R as the discriminating factor for classifying proteins by the structural form of their hydrophobic core.
EN
This paper shows an example of the grouping of piezocone penetration test (CPTU) characteristics using functional data analysis, together with the results of clustering, in the form of a subsoil rigidity model. The subsoil rigidity model was constructed based on layer separation using the proposed method, as well as the k-means method. In the construction of the subsoil rigidity model, the constrained modulus M was applied. These moduli were determined from empirical relationships for overconsolidated and normally consolidated soils from Poland based on cone tip resistance.
EN
The article presents a study applying a proposed cluster analysis method to support purchasing decisions in the welding industry. The authors analyse the usefulness of a non-hierarchical method, Expectation Maximization (EM), in selecting material (212 combinations of flux and wire melt) for the submerged arc welding (SAW) process. The proposed approach to cluster analysis proved useful in supporting purchasing decisions.
EN
The paper presents a method that supports the choice of a clustering procedure and makes it possible to select parameters for the most important steps in this process. The method is demonstrated on thyroid ultrasound images of healthy individuals and of patients suffering from Hashimoto's thyroiditis. In total, 11 360 variants of the clustering procedure were analysed and optimal parameters were chosen for 4 different forms of the data set.
PL
The paper presents a method that supports the choice of a procedure for clustering objects and makes it possible to determine the parameters for the most important stages of this process. The method is demonstrated on thyroid ultrasound images of healthy individuals and of patients with Hashimoto's disease. It made it possible to analyse 11 360 variants of the clustering procedure and to select optimal parameters for four different forms of the data set.
PL
In this article we examine the performance of an entropy-based variable selection algorithm for cluster analysis (cf. Dash, Liu, 2000). The evaluation is based on an experiment in which data sets are generated as mixtures of normal distributions. The results indicate that the method does not perform as well as its authors suggested.
EN
This article discusses an attempt to analyse regional diversity in Poland in 2001 with respect to the level of the higher education system. The first part deals with ranking provinces by the level of the higher education system, measured by a synthetic variable. This variable combines 10 characteristics, weighted according to their influence on higher education. The characteristics were selected both for their use by experts and for their availability in regional statistics. In the second part of the article, the author presents clusters formed by provinces in two-dimensional spaces: the first dimension indicates the level of the higher education system, whereas the other describes the socio-economic situation in the regions. This situation is represented by factors that have been singled out (by principal component analysis) as key among 21 characteristics that potentially influence the higher education system. The main purpose of the article is to discover the regularities according to which those clusters are formed.
PL
The article is an attempt to analyse the regional differentiation of Poland in 2001 with respect to higher education. The first part describes the ranking of provinces by the level of higher education. This level is measured by a synthetic variable, a weighted average of 10 characteristics that may reflect the level of higher education. The characteristics were chosen both for their use by experts and for the availability of data in regional statistics. In the second part, the author presents the grouping of provinces in two-dimensional spaces, where the first dimension is the level of higher education and the second describes the socio-economic situation in the regions. This situation is represented by factors extracted by principal component analysis from 21 characteristics that potentially influence the development of higher education. The aim of the article is to detect the regularities according to which these clusters form.
10
Proposal of New Cluster Analysis Algorithm
EN
One of the well-known groups of cluster analysis methods is the group of methods based on density estimation. In the paper we propose a new method of defining clusters which consists of two steps. In the first step we find the local maxima of the joint distribution, thus establishing the cluster centres. In the second step we assign each observation to one of the existing cluster centres. The number of clusters is assumed to be known. In both steps we use a similar technique based on the kernel density estimator with the Epanechnikov kernel. The performance of the method is analysed in an application to the Gordon (1999) data. In the analysis the Rousseeuw indices are used to assess cluster cohesion, and some comparisons with other methods of defining clusters are presented. The results look promising.
PL
One of the well-known groups of cluster analysis methods consists of methods based on density estimation. The article proposes a new method of finding clusters, which consists of two steps. In the first step we find the local maxima of the joint distribution, which we take as the cluster centres. In the second step each observation is assigned to one of the centres. The number of clusters is assumed in advance. In both steps we use the same technique, based on the kernel density estimator with the Epanechnikov kernel. The performance of the method is analysed on the Gordon (1999) data. The analysis uses the Rousseeuw indices of cluster cohesion and presents a comparison with other clustering methods. The results look promising.
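The two steps described in this abstract can be sketched in one dimension. This is a simplified illustration and not the authors' algorithm: local maxima are found on a fixed grid, and observations are assigned to the nearest centre, whereas the paper uses a kernel-based technique in both steps. The data, bandwidth `h` and grid step are hypothetical.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kde(x, data, h):
    """Kernel density estimate at x with bandwidth h."""
    return sum(epanechnikov((x - xi) / h) for xi in data) / (len(data) * h)


def cluster_centres(data, h, k, step=0.1):
    """Step 1: take the k highest strict local maxima of the density on a grid."""
    lo, hi = min(data) - h, max(data) + h
    n = int(round((hi - lo) / step))
    grid = [round(lo + i * step, 10) for i in range(n + 1)]
    dens = [kde(x, data, h) for x in grid]
    peaks = [
        (dens[i], grid[i])
        for i in range(1, n)
        if dens[i] > dens[i - 1] and dens[i] > dens[i + 1]
    ]
    top = sorted(peaks, reverse=True)[:k]
    return sorted(x for _, x in top)


def assign(data, centres):
    """Step 2 (simplified): assign each observation to the nearest centre."""
    return [min(range(len(centres)), key=lambda c: abs(x - centres[c])) for x in data]


# Hypothetical one-dimensional sample with two groups.
sample = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
centres = cluster_centres(sample, h=1.0, k=2)
labels = assign(sample, centres)
```

The bandwidth controls how many local maxima appear, so in practice it would need to be chosen with the assumed number of clusters in mind.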
EN
The growing popularity of dating apps has been noted in recent years, but the user experience of dating apps is heterogeneous. The present study aimed to investigate the experiences of dating app users and distinguish their types based on cluster analysis. An exploratory study was conducted online among 406 adults who have used online dating. Survey questions investigated: motives for online dating, perceived pressure to find a partner, length of use of the dating app, usability rating, perceived benefits and well-being after using dating apps, frequency of experiencing negative events while dating, and the reasons for and number of deletions of the dating profile. Three credible clusters were identified: "persistent" (people who are mainly looking for a relationship, use the app for long periods and delete it less often); "for a while" (people with various motives for online dating who use the app briefly, rarely return to it, and experience few difficult online situations); and "hurt" (people with motives other than looking for a relationship who have many negative experiences and delete dating apps most often). The clusters differed in age, gender, relationship status, motives (relationship vs. other motives) and pressure (pressure vs. no pressure).
12
Sustainability Attitude of Automotive Suppliers
EN
The issue of sustainability, or corporate social responsibility (CSR), has become a widely discussed topic in all industrial production sectors. The article focuses on the automotive sector, not only because it is the most dynamically developing industrial area but also because it is one of the driving forces of local economies in many European countries. This paper aims to reveal possible differences in how automotive suppliers in European countries understand priorities in their CSR activities. Based on a meta-analysis, 73 actions were listed, and a questionnaire survey was performed. Cluster analysis and Fisher's exact test were applied to find out whether the attitude towards sustainability differs depending on the position in the supply chain or on company size.
EN
The values of individual characteristics are often the results of measurements in different units, which can cause some characteristics to dominate others and distort the course of cluster analysis. Because cluster analysis methods are based on a quantitative expression of similarity relations, they should not be applied to data that depend on the unit of measurement. It is therefore appropriate to standardize or normalize the characteristics first.
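A minimal sketch of the two rescalings mentioned here, assuming z-score standardization (zero mean, unit standard deviation) and min-max normalization (range 0 to 1). The variable names and values are hypothetical and only show how features in very different units become comparable.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score: subtract the mean and divide by the population standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def normalize(values):
    """Min-max scaling to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical features measured in different units.
height_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
income = [20000.0, 25000.0, 30000.0, 45000.0, 80000.0]

z_height = standardize(height_cm)
z_income = standardize(income)
unit_height = normalize(height_cm)
```

Without such rescaling, a Euclidean distance computed on the raw values would be driven almost entirely by `income`, simply because its numbers are larger.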
Vol. 34, pp. 69-74
EN
The introduction of food delivery apps, facilitated by the global pandemic, has created a significant disruption in the hospitality industry. However, how consumers use mobile applications in the context of daily choices and food consumption has not been fully explored. Using data collected through an online questionnaire from 165 food delivery app subscribers, k-means cluster analysis was performed to classify users based on their internal motivations. The results reveal three distinct groups: Health-conscious Eaters, Food Enthusiasts, and Lifetime Diners. In practical terms, the present exploratory study helps food delivery app (FDA) providers to better identify customers, potentially optimizing marketing initiatives and maximizing profitability.
No. 2, pp. 191-221
EN
The aim of this article is to present the method of latent ideological types and its possible uses. Two dominant approaches to measuring the ideological orientations of the public are self-identification and expert evaluation of attitudes and value orientations. Both approaches are also a common ingredient in international comparative research. In this article the authors first focus on the strengths and methodological weaknesses of those approaches. They then introduce their method of latent ideological types as a different approach. They present the principles of the method, the necessary sequence of steps, and conclude with its concrete application using CSES Slovakia 2010 data. The method of latent ideological types is based on subjective placements on a given set of ideological orientations or scales. The non-manifest ideological inclinations of the respondents are then determined from their attitudes and opinions. Using attitude variables, definitional patterns and cluster analysis the authors proceed from manifest ideological types to latent ideological types. The resulting latent ideological positions of the respondents may differ from their manifest self-identifications. The advantages of knowing the latent positions are discussed.
EN
Objective: To obtain a case definition and to describe variables associated with a cluster of unspecific symptoms in healthcare workers (HCW) in a hospital building. Materials and Methods: A cross-sectional study was performed. All people working at the Residencia Cantabria building (a 200-bed building belonging to University Hospital Marqués de Valdecilla) in June 2009 were invited to complete a self-administered questionnaire, including questions on demographic data, working place and shift, working conditions and current symptoms. A cluster analysis was carried out to obtain the case definition. The strength of the association between the studied variables and meeting the case definition was measured using odds ratios (OR) with 95% confidence intervals (CI). Multiple logistic regression was used to obtain a predictive model; its general validity was estimated with receiver operating characteristic (ROC) curves and their area under the curve (AUC). Results: 357 completed questionnaires were obtained. A case was defined as having at least 5 of the eleven symptoms included. Not being assigned to a specific shift was the strongest protective variable related to "being a case" (OR = 0.30; 95% CI: 0.17-0.54), whereas a personal history of distal pain or inflammation in the arms or legs was the main risk factor (OR = 4.33; 95% CI: 2.75-6.82). A six-variable predictive model has an AUC of 0.7378. Conclusions: A disease associated with indoor environment quality in a hospital was characterized. A multivariate score was drafted for identifying HCW at higher risk of developing the disease in order to apply administrative prevention measures.
EN
This paper is an attempt to compare the performance of an algorithm for determining the number of clusters in a data set, proposed by the author, with other methods of determining the number of clusters. The idea of the new algorithm is based on the comparison of pseudo cumulative distribution functions of a certain random variable. For a fixed window size we draw K different points, and for every point we find the corresponding limiting point in the mean shift procedure. Then we check whether the distance (e.g. Euclidean) between every pair of limiting points is greater than the window size. Analogously, we determine the pseudo cumulative distribution functions for different numbers K of clusters. Out of all pseudo cumulative distribution functions we pick the proper one, i.e. the last one (with respect to K) which has a horizontal phase. Other methods of determining the number of clusters in a data set are compared with the proposed algorithm on a number of two-dimensional data sets for different clustering methods (k-means clustering and minimum distance agglomeration).
PL
This article is an attempt at a comparative evaluation of an algorithm, proposed by the author, for determining the number of clusters in a data set against other methods of determining the number of clusters. The author's algorithm is based on comparing the pseudo cumulative distribution functions of a certain random variable for different numbers of clusters. This random variable is defined as follows. For a fixed window size we draw K different points from the data set and, for each of them, find the corresponding limiting point in the mean shift procedure. We then check whether the distance (e.g. Euclidean) between every pair of limiting points is greater than the window size. Analogously, we determine the pseudo cumulative distribution functions for different numbers K of clusters. Of all the distribution functions, the one taken to indicate the correct number of clusters is the one corresponding to the last curve (with respect to K) that has a horizontal phase. Other methods of determining the number of clusters in a data set are compared with the proposed algorithm on several two-dimensional data sets for two clustering methods that are diametrically different in nature.
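The building blocks named in this abstract (limiting points of the mean shift procedure and the pairwise separation check) can be sketched as follows. The flat window, the toy data and the `separation_rate` summary are assumptions for illustration; the abstract does not give the exact definition of the pseudo cumulative distribution function, so this sketch does not attempt to reproduce it.

```python
import math
import random


def limiting_point(start, data, h, tol=1e-9, max_iter=100):
    """Iterate a flat-window mean shift from a data point until it stops moving."""
    current = start
    for _ in range(max_iter):
        window = [p for p in data if math.dist(p, current) <= h]
        shifted = tuple(sum(c) / len(window) for c in zip(*window))
        if math.dist(shifted, current) < tol:
            return shifted
        current = shifted
    return current


def all_separated(limits, h):
    """True if every pair of limiting points is farther apart than the window size."""
    return all(
        math.dist(a, b) > h for i, a in enumerate(limits) for b in limits[i + 1:]
    )


def separation_rate(data, k, h, trials=200, seed=0):
    """Illustrative summary: share of random draws of k points that are all separated."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        starts = rng.sample(data, k)
        hits += all_separated([limiting_point(s, data, h) for s in starts], h)
    return hits / trials


# Hypothetical two-dimensional data with two tight groups.
toy = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2),
       (5.0, 5.0), (5.2, 5.0), (5.0, 5.2), (5.2, 5.2)]
rate_k2 = separation_rate(toy, k=2, h=1.0)
rate_k3 = separation_rate(toy, k=3, h=1.0)
```

With two groups in the toy data, drawing three points always puts at least two of them in the same group, so their limiting points coincide and the separation check fails for every draw.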
EN
The article presents auxiliary functions of the clusterSim package (see Walesiak & Dudek (2006)) and selected functions of the packages stats, cluster, and ade4, which are applied to solving clustering problems. In addition, examples of procedures for solving different clustering problems are presented. These procedures, which are not available in statistical packages (SPSS, Statistica, SAS), can help solve a broad range of classification problems.
PL
The article characterises the auxiliary functions of the clusterSim package and selected functions of the stats, cluster and ade4 packages that serve cluster analysis. It also presents example procedures that use these functions and help the user carry out many classification tasks that are not available in the basic statistical packages (e.g. SPSS, Statistica, SAS).
EN
The beta parameter is a popular tool for the evaluation of portfolio performance. The Sharpe single-index model is a simple regression model in which the stock’s returns are regressed against the returns of a broader index. The beta parameter is a measure of the strength of this relation. Extensive recent research has proved that the beta is not constant in time and should be modelled as a time-variant coefficient. One of the most popular methods of the estimation of a time-varying beta is the Kalman filter. As the output of the Kalman filter, one obtains a sequence of the estimates of a time-varying beta. This sequence shows the historical dynamics of sensitivity of a company’s returns to the variations of market returns. The article proposes a method of clustering companies listed on the Warsaw Stock Exchange according to time-varying betas.
20
An alternative extension of the k-means algorithm for clustering categorical data
No. 2, pp. 241-247
EN
Most of the earlier work on clustering has focused on numerical data, whose inherent geometric properties can be exploited to naturally define distance functions between data points. Recently, the problem of clustering categorical data has started drawing interest. However, the computational cost makes most of the previous algorithms unacceptable for clustering very large databases. The k-means algorithm is well known for its efficiency in this respect. At the same time, it works only on numerical data, which prevents it from being used for clustering categorical data. The main contribution of this paper is to show how to apply the notion of 'cluster centers' to a dataset of categorical objects and how to use this notion to formulate the clustering problem of categorical objects as a partitioning problem. Finally, a k-means-like algorithm for clustering categorical data is introduced. The clustering performance of the algorithm is demonstrated with two well-known data sets, namely the soybean disease and nursery databases.
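To make the idea of a k-means-like partitioning of categorical objects concrete, here is a minimal sketch in the style of k-modes: the dissimilarity is the number of mismatching attributes and each centre is the per-attribute mode of its members. This is a stand-in technique chosen for illustration and is not necessarily the definition of 'cluster centers' used in the paper; the records and initial centres are hypothetical.

```python
from collections import Counter


def mismatch(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def mode_center(rows):
    """Centre of a cluster: the most frequent category in each attribute."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(rows, centers, max_iter=20):
    """Alternate assignment and centre update until the centres stop changing."""
    labels = []
    for _ in range(max_iter):
        labels = [
            min(range(len(centers)), key=lambda c: mismatch(r, centers[c]))
            for r in rows
        ]
        new_centers = []
        for c in range(len(centers)):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            # Keep the old centre if a cluster ends up empty.
            new_centers.append(mode_center(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Hypothetical categorical records: (colour, shape, size).
records = [
    ("red", "round", "small"),
    ("red", "round", "large"),
    ("red", "oval", "small"),
    ("blue", "square", "large"),
    ("blue", "square", "small"),
    ("green", "square", "large"),
]
labels, centers = k_modes(records, [records[0], records[3]])
```

Each iteration costs time proportional to the number of records times the number of clusters, which is the efficiency property that makes k-means-style algorithms attractive for large databases.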