This study aims to analyze energy consumption patterns across selected nations from Africa, America, Asia/Middle East, and Europe, with a focus on the types of energy sources used. Covering 46 countries, the research spans the years 2000 to 2018 and examines the distribution and changes in energy consumption by source and type. The regions studied include diverse countries such as Austria, Sweden, Czechia, and Croatia in Europe; Algeria, Egypt, and South Africa in Africa; China, India, and Saudi Arabia in Asia/Middle East; and Brazil, Canada, and the United States in the Americas, with Australia and New Zealand representing Oceania. Utilizing data from the BP Statistical Review of World Energy and the SHIFT Data Portal, along with key indicators maintained by Our World in Data, the study employs methods such as descriptive statistics, cluster analysis using k-means, and time-series clustering with dynamic time warping (DTW). The analysis highlights regional similarities and variances in energy use, providing new insights into the complex relationship between energy consumption patterns and factors such as economic growth, national policies, and geopolitical contexts. This research addresses a significant gap in the existing literature by offering a detailed comparative analysis of how different nations manage and consume energy. It contributes to the broader discourse on sustainable energy policies and economic development in the face of global energy challenges.
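As a rough illustration of the dynamic time warping (DTW) distance named above, the following minimal Python sketch computes the classic DTW cost between two numeric series. The function name `dtw_distance`, the absolute-difference local cost, and the sample series are assumptions made for illustration; the study's actual implementation and preprocessing are not described in the abstract.

```python
import math


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW cost, with |x - y| as the local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical yearly consumption series (arbitrary units) for two countries.
country_a = [1.0, 2.0, 3.0]
country_b = [1.0, 2.0, 2.0, 3.0]
print(dtw_distance(country_a, country_b))
```

Because DTW allows one series to "wait" while the other advances, two series with the same shape but different timing can have a small or zero distance, which is why it is often preferred over a point-by-point Euclidean distance for clustering trajectories.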
One of the greatest threats to many lakes is accelerated eutrophication resulting from anthropogenic pressure, agricultural intensification, and climate change. A key element of surface water protection in environmentally conserved areas is proper monitoring of water quality and detection of potential threats, both by examining the physicochemical properties of the water and by performing statistical analyses that can expose unfavourable trends. The article presents an analysis of measurements made in three lakes located in the Sierakowski Landscape Park. The water quality indicators, namely phosphorus, nitrogen, BOD5 and COD, were determined monthly for a year at the inflows and outflows of the studied lakes. The results for the selected indicators were analysed using the machine learning algorithms PCA and k-means. The analysis enabled statistical estimation of changes in the water quality indicators in the reservoirs and evaluation of their correlation.
Kleinberg introduced the concept of k-richness as a requirement for an algorithm to be a clustering algorithm. The most popular algorithm, k-means, does not fit this definition because of its probabilistic nature. Hence Ackerman et al. proposed the notion of probabilistic k-richness, claiming without proof that k-means has this property. This paper proves, by example, that the version of k-means with random initialization does not have the probabilistic k-richness property, thereby refuting Ackerman's claim.
In this paper, experimental data, given in the form of pairwise comparisons, such as distances or similarities, are considered. Clustering algorithms for processing such data are developed based on the well-known k-means procedure. Relations to factor analysis are shown. The problems of improving clustering quality and of finding the proper number of clusters in the case of pairwise comparisons are considered. Illustrative examples are provided.
Assessing the seismic vulnerability of urban infrastructure is a pressing problem, since the damage caused by earthquakes is significant. Despite the complexity of such tasks, today’s machine learning methods allow “fast” assessment of seismic vulnerability. The article proposes a methodology, based on classification and clustering methods, for assessing the characteristics of typical urban objects that affect their seismic resistance. For the analysis, we use the kmeans and hkmeans clustering methods, with the Euclidean distance as the measure of proximity. The optimal number of clusters is determined using the Elbow method. A decision-making model for the seismic resistance of an urban object is presented, and the variables with the greatest impact on that resistance are identified. The study shows that the clustering results coincide with expert estimates and that the characteristics of typical urban objects can be determined by modeling the data with clustering algorithms.
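The Elbow method mentioned above can be sketched as follows: run k-means for several values of k, record the within-cluster sum of squares (WCSS) under the Euclidean distance, and look for the k after which the curve flattens. This is a minimal pure-Python sketch, not the authors' procedure. The deterministic farthest-first initialization, the second-difference heuristic in `elbow_k`, and the toy data are all assumptions made for illustration.

```python
def sqdist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Deterministic initialization (an assumption): start at the first point,
    then repeatedly add the point farthest from its nearest chosen center."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    return centers


def kmeans_wcss(points, k, max_iter=100):
    """Run Lloyd's algorithm and return the within-cluster sum of squares."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sqdist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return sum(sqdist(p, centers[lab]) for p, lab in zip(points, labels))


def elbow_k(wcss):
    """Pick k with the largest second difference of the WCSS curve.
    wcss[i] is assumed to hold the value for k = i + 1."""
    best = max(range(1, len(wcss) - 1),
               key=lambda i: wcss[i - 1] - 2 * wcss[i] + wcss[i + 1])
    return best + 1


# Hypothetical 2-D feature vectors forming three well-separated groups.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
wcss = [kmeans_wcss(points, k) for k in range(1, 6)]
print(wcss)
print("elbow at k =", elbow_k(wcss))
```

In practice the elbow is often read off a plot of WCSS against k; the second-difference rule is only one simple way to automate that reading.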
The paper presents a concept of grouping objects with the k-means method to control the performance of a production process that runs under variable conditions. The distributions of production process performance in production cycles grouped by similarity form the basis for controlling the performance of subsequent production cycles. The practical part of the paper contains an example of calculations carried out according to this concept using the VBA and R languages; it relates to the bolting process in underground mines.
This paper poses the question of whether or not the usage of the kernel trick is justified. We investigate it for the special case of the kernel k-means algorithm. Kernel k-means is a clustering algorithm that clusters data in a similar way to k-means when an embedding of the data points into Euclidean space is not provided and a matrix of “distances” (dissimilarities) or similarities is available instead. The kernel trick allows us to bypass the need to find an embedding into Euclidean space. We show that the algorithm returns wrong results if the embedding does not actually exist. This means that the embedding must be found before the algorithm is used. If it is found, then the kernel trick is pointless. If it is not found, the distance matrix needs to be repaired. But the repair methods require the construction of an embedding, which first makes the kernel trick pointless, because it is not needed, and second, kernel k-means may return different clusterings before and after the repair, so that the value of the clustering is called into question. In the paper, we identify a distance repair method that produces the same clustering before and after its application and does not need to be performed explicitly, so that the embedding does not need to be constructed explicitly. This renders the kernel trick applicable to kernel k-means.
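To make the role of the kernel trick concrete, the sketch below shows the standard way kernel k-means evaluates the squared distance from a point to a cluster centroid using only entries of a kernel (Gram) matrix, with no explicit coordinates. It is an illustration under stated assumptions, not the paper's code: the function names and the tiny linear-kernel example are hypothetical, and the formula yields a true squared distance only when the matrix really comes from an inner product, which is exactly the condition the abstract discusses.

```python
def sq_dist_to_centroid(K, i, cluster):
    """Squared feature-space distance from point i to the centroid of `cluster`.

    Uses only the kernel matrix K:
        K[i][i] - (2/m) * sum_j K[i][j] + (1/m^2) * sum_{j,l} K[j][l],
    where j and l range over the m members of the cluster.
    """
    m = len(cluster)
    cross = sum(K[i][j] for j in cluster) / m
    within = sum(K[j][l] for j in cluster for l in cluster) / (m * m)
    return K[i][i] - 2.0 * cross + within


def assign(K, clusters):
    """One kernel k-means assignment step: nearest centroid index per point."""
    return [
        min(range(len(clusters)), key=lambda c: sq_dist_to_centroid(K, i, clusters[c]))
        for i in range(len(K))
    ]


# Hypothetical example: a linear kernel on the 1-D points 0, 1 and 10,
# so the result can be checked against ordinary Euclidean geometry.
xs = [0.0, 1.0, 10.0]
K = [[a * b for b in xs] for a in xs]

# Centroid of {0, 1} is 0.5, so the squared distance from 10 is 9.5 ** 2.
print(sq_dist_to_centroid(K, 2, [0, 1]))
print(assign(K, [[0, 1], [2]]))
```

With a matrix that is not positive semidefinite, the same expression can still be evaluated, but it may no longer behave like a squared distance, which is one way to see why the existence of an embedding matters.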
This paper, an extension of the conference paper [1], corrects the proof of Theorem 2 from Gower’s paper [2, page 5]. The correction is needed to establish, on the basis of a distance matrix, the existence of the kernel function commonly used in the kernel trick, e.g. for the k-means clustering algorithm. The correction supplies the missing if-part of the proof and drops unnecessary conditions.
The mini-model method (MM-method) is an instance-based learning algorithm, like the k-nearest neighbor method, the GRNN network, or the RBF network, but its principle of operation is different. MM operates only on data from the local neighborhood of a query point. The paper presents a new version of the MM-method based on the k-means clustering algorithm. The domain of the mini-model is calculated using the k-means algorithm. Using a clustering algorithm makes the learning procedure simpler.
This paper concerns the analysis of experimental data to verify the applicability of signal analysis techniques to the condition monitoring of a packaging machine. In particular, the work focuses on the cutting process that divides a continuous flow of packaging paper into single packages. The cut is made by a steel knife driven by a hydraulic system. At present, the knives are replaced frequently, causing frequent machine stops and consequent lost-production costs. The aim of this paper is to develop a diagnostic procedure that assesses the wear condition of the blades and thus reduces the stops for maintenance. The packaging machine was fitted with a pressure sensor that monitors the hydraulic system driving the blade. Processing the pressure data comprises three main steps. First, scalar quantities that could be indicative of the condition of the knife are selected. Second, a clustering analysis is used to set a threshold between unfaulted and faulted knives. Finally, a Support Vector Machine (SVM) model is applied to classify the technical condition of the knife during its lifetime.
Conventional speaker recognition systems use the Universal Background Model (UBM) as an imposter for all speakers. In this paper, speaker models are clustered to obtain better imposter model representations for speaker verification. First, a UBM is trained, and speaker models are adapted from the UBM. Then, the k-means algorithm with the Euclidean distance measure is applied to the speaker models. The speakers are divided into two, three, four, and five clusters. The resulting cluster centers are used as background models of their respective speakers. Experiments showed that the proposed method consistently produced lower Equal Error Rates (EER) than the conventional UBM approach for test utterances 3, 10, and 30 seconds long, and also under channel mismatch conditions. The proposed method is also compared with the i-vector approach. The three-cluster model achieved the best performance, with a 12.4% relative EER reduction on average compared to the i-vector method. The statistical significance of the results is also reported.
Data mining provides valuable knowledge hidden in large data sets and allows the discovery of relationships that are invisible to the naked eye. It can also be applied in education when preparing the teaching offer. The article presents the application of data mining algorithms to the preparation of the educational process. In this context, data mining is used to transform raw data into knowledge about students' preferences. The focus is on discovering groups of students and building models that describe their learning styles. The groups were built using unsupervised classification, including the k-means and EM methods, taking into account the students' learning preferences. This yielded groups of students with similar learning styles. Validation indexes were used to verify the correctness of the classification and to select the most effective partition of the students. The study was conducted on data collected from students of Rzeszow University of Technology through a survey containing the ILS questionnaire. The results made it possible to determine how many different teaching materials need to be prepared to match the preferences of the different student groups. Knowing students' learning styles helps teachers better understand students' preferences and helps the students themselves match materials to their own learning style, so that they acquire knowledge more easily and quickly.
In this paper we propose a method for object description based on two well-known clustering algorithms (k-means and mean shift) and the SURF method for keypoint detection. We also compare these clustering methods in the area of object description. Each algorithm requires one input parameter: k (the number of objects) for k-means and h (the window) for mean shift. Our approach is suitable for images with a non-homogeneous background; thus, the algorithm is not limited to trivial images. In the future we will try to remove unimportant keypoints detected by the SURF algorithm. Our method is part of a larger CBIR system, where it is used as a preprocessing stage.
Among data clustering algorithms, k-means (KM) is one of the most popular because of its simplicity and efficiency. However, k-means is sensitive to the initial centers and suffers from the local optima problem. The K-harmonic-means (KHM) clustering algorithm solves the initialization problem of k-means, but it also suffers from local optima. In this paper, we develop a new algorithm for this problem based on an improved version of particle swarm optimization (IPSO) and KHM clustering. In the proposed algorithm, IPSO is equipped with the Cuckoo Search algorithm and two new concepts used in PSO in order to improve efficiency, speed up convergence, and escape from local optima. In each iteration, IPSO updates the positions of the particles based on a dynamic combination of the global worst and global best with the personal worst and personal best. Experimental results on five real-world datasets and two artificial datasets confirm that this improved version is superior to k-harmonic means and the regular PSO algorithm. The simulation results show that the new algorithm produces promising solutions with fast convergence, high accuracy and correctness, while markedly improving the processing time.
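For readers unfamiliar with K-harmonic means, the following minimal sketch evaluates the commonly cited KHM performance function, in which each point contributes the harmonic mean of its (powered) distances to all centers. This is only an illustration of the objective that KHM-type methods minimize; it is not the IPSO hybrid proposed in the paper. The function name, the default exponent `p=2.0`, the `eps` guard against zero distances, and the sample data are assumptions.

```python
import math


def khm_objective(points, centers, p=2.0, eps=1e-12):
    """KHM performance: sum over points of k / sum_j (1 / d(x, c_j) ** p)."""
    k = len(centers)
    total = 0.0
    for x in points:
        inv_sum = 0.0
        for c in centers:
            d = math.dist(x, c)  # Euclidean distance (Python 3.8+)
            # eps avoids division by zero when a point coincides with a center.
            inv_sum += 1.0 / max(d, eps) ** p
        total += k / inv_sum
    return total


# Hypothetical 1-D data: two points and two candidate centers.
points = [(0.0,), (2.0,)]
centers = [(1.0,), (3.0,)]
print(khm_objective(points, centers))
```

Because every center influences every point through the harmonic mean, the objective is less sensitive to where the centers start than the hard nearest-center assignment of k-means, although local optima remain possible, as the abstract notes.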
The Electromagnetism-like Mechanism (EM) method is a metaheuristic. Its basic idea is to regard a set of parameters as charged particles whose strength corresponds to the value of the objective function of the optimization problem. Starting from any initial assignment of parameters, the EM method drives the parameters to converge to an optimal or semi-optimal value. One of its drawbacks, shared with other metaheuristics, is that the parameters take a long time to converge. In this paper, we introduce hybrid methods combining EM with descent methods such as BP, k-means and FIS, and compare the performance of several hybrid methods. The results show that the hybrid EM method is superior in learning speed and accuracy to the conventional methods.
This article presents a modification of the KKZ initialization of the k-means segmentation algorithm which, in addition to the mutual distances of the segment centers, takes into account the pixel density distribution. The density function of a pixel is the sum of the inverses of its distances to the other pixels, and it is estimated from the pixel’s distance to the mean and from the variance of the pixel values. In the experiments, four different sequences of thermal images obtained using active thermography were segmented. Despite the additional calculations during initialization, the method showed faster convergence of the algorithm, with processing times very similar to those of the KKZ initialization but with a lower final segmentation error.
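For context, the baseline KKZ initialization that the article modifies can be sketched as follows: the first center is the data point with the largest norm, and each further center is the point farthest from its nearest already chosen center. This minimal sketch covers only that baseline; the density-based modification described in the abstract is not reproduced here. The function name `kkz_init` and the sample points are hypothetical.

```python
import math


def kkz_init(points, k):
    """Baseline KKZ seeding for k-means (no density term)."""
    # First center: the point with the largest Euclidean norm.
    centers = [max(points, key=lambda p: math.sqrt(sum(v * v for v in p)))]
    while len(centers) < k:
        # Next center: the point whose distance to its nearest chosen
        # center is largest (ties resolved by first occurrence).
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(nxt)
    return centers


# Hypothetical feature vectors, e.g. pixel values in a 2-D feature space.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
print(kkz_init(points, 3))
```

Because this seeding always favors extreme points, it is deterministic but can be drawn to outliers, which is one motivation for weighting candidates by local density.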
The paper addresses decision support for purchasing a suitable welding power source. A solution based on cluster analysis methods is proposed. The results of classifying 69 direct-current TIG welding devices with the Ward and k-means methods, for an appropriately selected and prepared set of diagnostic features, are presented. The results indicate that cluster analysis can be an effective method of supporting the purchase decision for the analyzed welding equipment, but only as a preliminary step of the decision-making process, which should be backed by a detailed substantive analysis.
This paper presents a new method for segmentation of thermal image sequences. Its aim is to divide the sequence into segments with different thermal properties. The described algorithm is based on measuring the correlation of the position and shape of the segments in successive frames of the sequence. It is composed of several stages. The first stage consists of segmenting consecutive frames of the sequence (Fig. 2). The second stage analyzes the similarity of each segment in each frame to all other segments of all frames and synthesizes the intermediate segments (Fig. 4). The intermediate segments then form the segmented output image, with the depth buffer technique used to resolve multiple pixel-to-segment assignments (Fig. 6). This method is a basis for the thermal analysis of solids, which results in discovering depth profiles of thermal properties for each area. The segmentation reduces the number of analyzed areas by up to a few thousand times, which creates real opportunities for the practical application of thermal tomography. The new algorithm has been compared with the K-Means algorithm [2] and with FCM [6], which minimizes the sum of pixel value deviations from the centers of the segments they are assigned to, for all frames of the sequence (Tab. 1). The advantages of the correlation method are the automatic determination of the number of output segments in the image and a constant segmentation error as the number of processed frames increases.
Document clustering, also referred to as text clustering, is a technique of unsupervised document organisation. Text clustering groups documents into subsets of texts that are similar to each other. These subsets are called clusters. Document clustering algorithms are widely used in web search engines to produce results relevant to a query. An example of the practical use of these techniques is the Yahoo! hierarchy of documents [1]. Another application of document clustering is browsing, defined as a search session without a well-specified goal. Browsing techniques rely heavily on document clustering. In this article we examine the most important concepts related to document clustering. Besides the algorithms, we present a comprehensive discussion of document representation, the calculation of similarity between documents, and the evaluation of cluster quality.
In this paper, the dataset clustering module for the "Ontology Data Models for Data and Metadata Exchange Repository" is considered. This module makes it possible to cluster texts without cluster end-points specified in advance.