Predicting coastal bed evolution at an annual scale with process-based models is usually a complex task that requires significant computational resources. To compensate, acceleration techniques that reduce the number of input parameters are often employed. This research comprehensively evaluates the capacity of the widely used K-Means clustering algorithm to obtain representative wave conditions. Various enhancements to the algorithm were examined in order to improve model results. The tests were implemented for the sandy coastline adjacent to the port of Rethymno, Greece, using an annual dataset of wave characteristics and the MIKE21 Coupled Model FM. Model performance was evaluated for each test simulation by comparing its results with those of a “brute force” simulation, which contains the bed level changes induced by the annual time series of hourly varying offshore sea-state wave characteristics; the results were deemed very satisfactory. The best-performing configurations involved a filtering methodology that eliminates low-energy sea states from the dataset. Clustering algorithms with “smart” configurations that improve their performance could become a valuable tool for engineers who need an accurate representation of annual bed level evolution with reduced computational effort.
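The filtering-plus-clustering idea in this abstract can be illustrated with a minimal sketch. It is not the authors' configuration: the wave-height threshold `hs_min`, the two features (significant wave height and peak period), the absence of feature scaling, and the synthetic records are all assumptions made for illustration.

```python
import math
import random


def assign(points, centroids):
    """Group points by their nearest centroid (Euclidean distance)."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means; initial centroids are k randomly sampled points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = assign(points, centroids)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def representative_sea_states(records, k, hs_min):
    """Drop low-energy sea states, then cluster the rest into k representatives.

    records: list of (Hs [m], Tp [s]) tuples; hs_min is a hypothetical threshold.
    Returns (centroid, weight) pairs, weight = share of retained records in the cluster.
    """
    kept = [r for r in records if r[0] >= hs_min]
    centroids, clusters = kmeans(kept, k)
    return [(c, len(members) / len(kept)) for c, members in zip(centroids, clusters)]


# Synthetic hourly sea states, purely illustrative.
rng = random.Random(42)
records = [(rng.uniform(0.1, 4.0), rng.uniform(3.0, 10.0)) for _ in range(500)]
reps = representative_sea_states(records, k=6, hs_min=0.5)
for (hs, tp), weight in reps:
    print(f"Hs={hs:.2f} m  Tp={tp:.2f} s  weight={weight:.3f}")
```

Each representative sea state would then drive one morphological simulation, with its weight setting how long that condition is applied. In practice the features would probably be scaled, or weighted by wave energy, before clustering.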
This study aimed to analyse the effect of anthropogenic activities on the spatial distribution of total nitrogen (TN) and total phosphate (TP) in Lake Maninjau, Indonesia, during the dry season. Sampling was carried out at ten observation locations representative of the various activities around the lake. Cluster analysis and ANOVA were used to classify pollutant sources and to test for differences in TN and TP between sites. Concentrations of TN and TP fell in the oligotrophic to eutrophic range. The ANOVA showed that some sampling locations, such as the Tanjung Sani River, the floating net cages, and the hydropower areas, have different TN concentrations, while TP levels differed significantly across all sampling sites. ANOVA and cluster analysis confirmed that the floating net cages formed the first cluster and were the primary contributor to TN and TP. The second and third clusters come from anthropogenic activities around the lake, such as agriculture, settlement, and livestock. The fourth cluster, with the lowest TN and TP, is the river that receives the anthropogenic load but has a high flow velocity. The cluster analysis should be repeated when the composition of floating net cages, agriculture, and settlements changes in the future.
Cermet coatings are among the best forms of surface protection of machine elements against wear. The most universal and economically justified method of applying such coatings is high velocity oxy-fuel (HVOF) spraying. This method produces coatings characterized by a compact structure, low porosity and very good adhesion to the substrate. All these fundamental properties contribute to the high wear resistance of these coatings. However, carrying out full wear tests (e.g. ball-on-disc) is time-consuming, especially when the proper feedstock material and process parameters have to be selected. The aim of this research was to investigate statistically whether long-term wear resistance tests can be replaced by estimating wear performance from the fundamental mechanical properties of the coatings. Three coating materials were selected: WC-12Co, WC-10Co-4Cr and WC-20Cr3C2-7Ni, which were deposited on AZ31 magnesium alloy substrates from three different spray distances: 320, 360 and 400 mm. On the basis of the tests carried out, and using cluster analysis techniques (the Ward and k-means methods), the relative similarity between the obtained coatings was determined. The applied methodology made it possible to select, from the analyzed cermet coatings, the samples characterized by improved resistance to abrasive wear. The results of the analyses were also compared with the results of abrasive wear resistance tests.
The paper presents the results of research on intra-European Union (EU) food trade, covering both exports and imports, conducted by the Eastern EU countries in 1999-2019. The study applied cluster analysis. Eastern EU countries’ share of intra-EU food trade increased from 5% in 1999 to 15% in 2019. These countries traded mostly in beverages, cereals, fruit and vegetables. Eastern EU countries traded in food mainly among themselves, including with their closest neighbours, regionally and with Germany. To increase their share of exports to other EU countries, these countries could use lower food prices and the benefits of traditional approaches to food production.
In petroleum geology, statistical methods are widely used in petrography, petrophysics, geochemistry, geomechanics, well log analysis and seismics, and cluster analysis is important for rock classification, that is, for determining zones with certain properties, e.g., source or reservoir zones. This paper presents the use of statistical methods, including cluster analysis, implemented in the R language, for processing and analysing large sets of diverse geochemical data. Literature data from analyses of the chemical and isotopic composition of natural gases were used. Unsupervised machine learning algorithms were used to perform the cluster analysis. Clustering was performed using two methods: k-means and hierarchical. A two-dimensional graph (the R function fviz_cluster) can be used to illustrate the results of k-means clustering. The dimensions in the graph are the result of principal component analysis (PCA) and are linear combinations of the features (columns in the table). The result of hierarchical clustering is a graph called a dendrogram. The paper additionally presents box plots and histograms as well as a correlation matrix containing Pearson correlation coefficients. All work was carried out in the R programming language. R, used with the RStudio software, is a very convenient and fast tool for statistical data analysis, and obtaining the above-mentioned graphs, tables and data with it is quick and relatively easy. The results of the gas composition analyses appear to show little variation. Nevertheless, the k-means and hierarchical algorithms made it possible to group the geochemical data into clearly separable groups. Both the isotopic composition values and the chemical composition make it possible to delineate groups that would not otherwise be noticeable.
Czekanowski's diagram visualizes data in order to present how objects are related to each other. Often, obvious clusters of observations are directly visible. However, delimiting them exactly is not a straightforward task. We present here a development of the RMaCzek package that includes cluster identification in Czekanowski's diagrams.
For many years, the amount of waste generated globally has been increasing, and its management and logistics are becoming a growing problem for most countries in the world. Waste management is an important issue, as it concerns the three basic pillars of sustainable development: social, economic, and environmental. It therefore seems necessary to take initiatives to reduce the amount of waste generated and to improve the waste management system. The article aims to analyse changes in waste management and logistics in the European Union countries and to classify these countries on the basis of the effects achieved in waste management. It analyses three selected factors that reflect the achievement of environmental objectives in waste management, using the cluster analysis method. The analysis found that EU countries differ in the quality of the results achieved in waste management, depending on the achievement of environmental management and sustainability objectives. In addition, the results showed that the time factor has a significant impact on the classification of countries. High dynamics in the quality of waste management effects were observed in the period under review.
The article presents the results of research on, and mathematical modelling of, rainfall erosivity factors. Erosion, whether caused by water, wind or soil cultivation, comprises three processes: soil detachment, movement and sedimentation. The spatial characteristics of precipitation in the two study periods are similar, with certain quantitative differences. A common feature is maximum precipitation in the southwest and, to a lesser extent, in the eastern part of the region. Minimum precipitation is typical of the western part of the region. A distinctive feature of the second study period is the increased contrast of the precipitation regime: minimum precipitation values decrease and maximum values increase. Greater contrast of precipitation in space or time may intensify erosion processes to the extent that precipitation intensity increases as a result of this contrast. The effect of greater spatial or temporal contrast of the precipitation regime on the activation of water erosion is therefore of undoubted research interest. The spatial distribution of precipitation within the study area and its temporal patterns correlate, but each has its own specific features. The total amount of precipitation and its distribution over time, which reflects the interplay between factors intensifying erosion and the protective effect of vegetative cover, are dominant for total soil losses due to erosion. When intense summer precipitation coincides with the presence of vegetative cover, erosion is reduced. However, intense precipitation that continues once harvesting has started may intensify water erosion of soil. Spatial variables and regression equations for spatial data calibration were used to estimate the spatial variation of precipitation in the study area. 
A comparison of the two study periods showed that in 2010–2016 the rainfall erosivity factor decreased significantly relative to the previous period, by 9.6–65.4 MJ mm ha⁻¹ h⁻¹ per year. In the Turiyskyi and Kovelskyi districts, the changes in the rainfall erosivity factor were minimal (9.6 and 16.7 MJ mm ha⁻¹ h⁻¹ per year, respectively). Conversely, the changes were most significant in the Ivanytskyi and Gorokhivskyi districts: 58.1 and 65.4 MJ mm ha⁻¹ h⁻¹ per year, respectively.
Non-alloy steels constitute a large group of steels characterised by diversified chemical composition, structural morphology and a wide range of mechanical properties, which determines their weldability. The paper presents the results of multidimensional analyses (based on cluster analysis) of 110 selected non-alloy steel grades. The properties adopted as diagnostic features included the chemical composition, mechanical properties (yield point) and the values of selected indicators of susceptibility to technological crack formation. The analyses, performed using Ward’s and k-means methods, divided the 110 steels into five groups (clusters). A comparison of the results obtained with the two clustering methods and for various classification criteria showed that multidimensional analyses are a promising method for assessing the weldability of steels. However, the results of such analyses should be subjected to a thorough substantive review.
The aim of the statistical analyses was to identify similarities and differences between the various tributaries of the Narew River, to identify the factors and processes responsible for the transformations occurring in the aquatic environment and, finally, to identify the main sources of pollution in the river catchment. The statistical analysis used the results of studies conducted as part of diagnostic monitoring by the General Inspectorate for Environmental Protection in 2017–2018. The studies covered 8 measurement points located directly on the Narew River and 17 points located on selected left and right tributaries. Analysis of the collected results indicates that the chemical condition of the water in the Narew catchment is poor. This may be because the Narew catchment is mainly used for agricultural purposes and, in addition, contains a relatively large number of potential anthropogenic sources of pollution. The analysis identified two potential sources of pollution affecting water quality in the Narew catchment: surface run-off and treated wastewater inflow.
The publication contains the results of cluster analysis research carried out on data quoted on the Day-Ahead Market of TGE S.A. Two methods were used in the analysis: a hierarchical one, Ward’s method, and a non-hierarchical one, the k-means method. The results are illustrated, among others, in the form of dendrograms, silhouette graphs and cluster plots. Data on the volume and the volume-weighted average price of electricity were examined for various types of quotations: fixing 1, fixing 2 and continuous quotations. The research was carried out in the MATLAB and Simulink environments using the Statistics and Machine Learning Toolbox. Selected test results were interpreted.
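The silhouette graphs mentioned in this abstract plot a per-observation silhouette coefficient. As a hedged illustration of what that quantity measures (this is a plain Python sketch, not the MATLAB code used in the publication, and the volume and price pairs are invented), the coefficient can be computed as follows:

```python
import math


def silhouette_scores(points, labels):
    """Silhouette coefficient s = (b - a) / max(a, b) for every point.

    a: mean distance to the other members of the point's own cluster.
    b: smallest mean distance to the members of any other cluster.
    Points in singleton clusters get a score of 0 by convention.
    """
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], [])
        if not own or not dists:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in dists.values())
        largest = max(a, b)
        scores.append((b - a) / largest if largest > 0 else 0.0)
    return scores


# Hypothetical (volume, price) observations in arbitrary units, with cluster labels.
observations = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]
cluster_labels = [0, 0, 0, 1, 1, 1]
scores = silhouette_scores(observations, cluster_labels)
print([round(s, 3) for s in scores])
```

Values close to 1 indicate observations that sit well inside their cluster, while values near 0 or below suggest that the chosen number of clusters or the assignment is questionable.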
The issue of sustainability, or corporate social responsibility (CSR), has become a widely discussed topic in all industrial production sectors. The article focuses on the automotive sector, not only because it is the most dynamically developing industrial area but also because it is one of the driving forces of local economies in many European countries. This paper aims to reveal possible differences in how automotive suppliers in European countries understand and prioritise CSR activities. Based on a meta-analysis, 73 actions were listed, and a questionnaire survey was performed. Cluster analysis and Fisher’s exact test were applied to find out whether the attitude towards sustainability differs depending on the position in the supply chain or on the company size.
Against the recent background of ‘Green Shipping’ and rising fuel prices, it is very important to reduce the fuel consumption rate of ships, which is directly affected by the performance of the main engine. A reasonable maintenance schedule can optimise the performance of the main engine. However, a traditional maintenance schedule is based on navigation distance and time and ignores many other factors, such as harsh working environments and frequently changing operating conditions, which lead to faster performance degradation. In this study, a real-time evaluation method combining big data on ship energy efficiency with physics-based analysis is proposed to judge the degradation of main engine performance and to assist in determining the maintenance schedule. Firstly, based on the developed ship energy efficiency big data platform, distribution statistics are computed and different operating states are compared. A Gaussian mixture model (GMM) and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) are used to cluster the data, and the high-density data areas are taken as the analysis points. Then, polynomials are fitted to the data of the analysis points by the least squares method to obtain the propulsion characteristic curves, load characteristic curves, and speed characteristic curves, which can be used to observe the performance degradation of the main engine. The results show that this method can effectively monitor the degree of degradation of main engine performance and is of great significance for fuel efficiency improvement and greenhouse gas (GHG) emissions reduction.
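The curve-fitting step in this abstract, a least squares polynomial fit through the selected analysis points, can be sketched as follows. This is only an illustration under assumptions: the polynomial degree, the variable names and the sample values are hypothetical, and the normal equations are solved directly, which is adequate for low degrees but not the most numerically robust choice.

```python
def polyfit_least_squares(xs, ys, degree):
    """Return [c0, c1, ...] minimising sum((c0 + c1*x + ... - y)^2)."""
    n = degree + 1
    # Normal equations (A^T A) c = A^T y for the Vandermonde matrix A.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    m = [row + [rhs] for row, rhs in zip(ata, aty)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (m[i][n] - tail) / m[i][i]
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


# Hypothetical analysis points: normalised engine speed vs normalised fuel rate.
speed = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
fuel = [1.0 + 2.0 * s + 3.0 * s ** 2 for s in speed]
curve = polyfit_least_squares(speed, fuel, degree=2)
print([round(c, 6) for c in curve])
```

Fitting the same curve to analysis points from different periods and comparing the coefficients, or the curve values at a reference speed, is one way such curves could be used to track degradation.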
The aim of the study was to compare two grouping methods for the regionalisation of watersheds that are similar with respect to low flow and selected catchment parameters (physiographic and meteorological). A residual pattern approach (RPA) and cluster analysis (Ward’s method) were used. The analysis was conducted for the specific low flow discharge q95 (dm³·s⁻¹·km⁻²) in 50 catchments located in the upper and central Vistula River basin. The daily flows used in the study were monitored from 1976 to 2016. The RPA classified the analysed catchments into two groups, whereas Ward’s method classified them into five. The predictive performance of the complete regional regression model, checked by cross-validation, was R²cv = 47% with RMSEcv = 0.69 dm³·s⁻¹·km⁻². The cross-validation procedure for the cluster analysis gave a predictive performance of R²cv = 33% with RMSEcv = 0.81 dm³·s⁻¹·km⁻². Comparing both methods on the basis of the cross-validated coefficient of determination (R²cv), the residual pattern approach gave a better fit between predicted and observed values. The analysis also showed that both methods overestimated the specific low flow discharge q95. Under cross-validation, the PBIAS was -10% for the RPA models and slightly larger in magnitude, -13.8%, for the models obtained using cluster analysis.
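The three scores quoted in this abstract can be illustrated with a short sketch. The formulas below are common textbook definitions and may differ in detail from those used in the study; in particular, the PBIAS sign convention (negative values mean overestimation) is an assumption that matches the abstract's wording. The single-predictor regression, the catchment descriptor and all data values are hypothetical.

```python
import math


def rmse(obs, sim):
    """Root mean squared error between observed and simulated values."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def r_squared(obs, sim):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - s) ** 2 for o, s in zip(obs, sim))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def pbias(obs, sim):
    """Percent bias; with this convention negative values indicate overestimation."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope


def leave_one_out(xs, ys):
    """Predict each catchment from a model fitted to all the other catchments."""
    predictions = []
    for i in range(len(xs)):
        intercept, slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        predictions.append(intercept + slope * xs[i])
    return predictions


# Hypothetical catchment descriptor (e.g. mean annual precipitation, mm) and q95.
descriptor = [600.0, 650.0, 700.0, 780.0, 820.0, 900.0, 950.0, 1010.0]
q95 = [1.1, 1.6, 1.5, 2.4, 2.2, 3.1, 2.9, 3.6]
q95_cv = leave_one_out(descriptor, q95)
print(f"R2cv={r_squared(q95, q95_cv):.2f}  RMSEcv={rmse(q95, q95_cv):.2f}  "
      f"PBIAS={pbias(q95, q95_cv):.1f}%")
```

Because each prediction comes from a model that never saw the catchment being predicted, the cross-validated scores estimate performance at ungauged sites rather than goodness of fit.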
In this study, groundwater quality and its spatial distribution in the Basra province in the south of Iraq were assessed and mapped for drinking and irrigation purposes. Groundwater samples (n = 41) were collected from deep wells in the study area to estimate and model the Water Quality Index (WQI). The analysis of the water samples was integrated with the GIS-based inverse distance weighting (IDW) technique to express the spatial variation of the WQI in the study area. The physicochemical parameters used for groundwater quality assessment were pH, sodium (Na⁺), electrical conductivity (EC), chloride (Cl⁻), total dissolved solids (TDS), calcium (Ca²⁺), nitrate (NO₃⁻), sulfate (SO₄²⁻), magnesium (Mg²⁺), and bicarbonate (HCO₃⁻). The calculated WQI classifies the groundwater into three classes: 2.5%, 2.5% and 95% of the groundwater samples were classified as poor, very poor and unsuitable for drinking, respectively. GIS tools integrated with statistical techniques were used for the spatial distribution and description of water quality. Correlation analysis of the groundwater data revealed that some parameters are strongly related to other parameters and share a common source of origin. Multivariate statistical techniques, especially cluster analysis (CA) and factor analysis (FA), were applied to evaluate the spatial variation of the forty-one groundwater samples. Cluster analysis confirmed that wells at some different locations have comparable sources of water pollution, whereas factor analysis yielded three factors responsible for groundwater quality variations, which explain more than 72% of the total variance of the data and made it possible to group the samples by water quality. MultiLayer Perceptron (MLP) models were applied to model the water quality index. 
A comparison of the results of the different MLP networks showed that the selected model has an MSE of 0.1940 and an r of 0.9998. It can therefore be concluded that the MLP network predicted the output, i.e. the WQI values, precisely.
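The IDW technique named in this abstract estimates a value at an unsampled location as a distance-weighted average of the sampled values. A minimal sketch is given below; the power parameter of 2, the coordinates and the WQI values are assumptions for illustration and are not taken from the study.

```python
import math


def idw(samples, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    samples: iterable of (sample_x, sample_y, value) tuples.
    Each sample is weighted by 1 / distance**power; a query point that
    coincides with a sample returns that sample's value directly.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical wells: (easting km, northing km, WQI).
wells = [(0.0, 0.0, 310.0), (4.0, 0.0, 280.0), (0.0, 3.0, 350.0), (4.0, 3.0, 295.0)]
estimate = idw(wells, 2.0, 1.5)
print(f"Estimated WQI at (2.0, 1.5): {estimate:.1f}")
```

Evaluating such a function on a regular grid of query points yields the kind of continuous surface that a GIS renders as a water quality map.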
Groundwater is one of the most important natural resources, yet it is overexploited and extensively polluted by human activity. Drinking polluted water can have major consequences for human health, so groundwater quality must be assessed precisely and regularly before use. In this study, cluster analysis (CA), principal component analysis (PCA), and a fuzzy logic technique were used to analyse water quality at five monitoring stations in the Khemisset-Tiflet region. The CA classified the sampling sites into three groups, and the PCA identified temporal characteristics of water quality status. Group I includes stations characterized by high temperature and low DO, COD, and BOD₅ values. Group II includes stations characterized by high pH and low NO₃⁻, Cl⁻ and SO₄²⁻ concentrations and low turbidity. Group III includes stations characterized by high NO₃⁻, Cl⁻ and SO₄²⁻ concentrations, high turbidity and low pH. In addition, fuzzy logic was applied to reveal more information about groundwater quality. Water quality was best in spring and winter; the parameters responsible for its deterioration are NO₃⁻, Cl⁻, SO₄²⁻ and turbidity.
Air pollution is currently one of the main global threats to people and the environment. It can be forecast with artificial intelligence models, including artificial neural networks. The article presents a smog forecasting model that uses artificial neural networks and is based on PM10 concentrations in Nowa Ruda in 2019-2020 and on meteorological data. A multilayer perceptron neural network was used for forecasting. Cluster analysis was used to improve the quality of the model and yielded a more accurate forecast. The research shows that using cluster analysis to group PM10 values according to the current minimum temperature significantly affects forecast quality. This is because low air temperature, which forces home heating, correlates with an increase in low-stack emissions. The proposed methodology produced neural PM10 prediction models in which the correlation between observed and predicted data was r = 0.99 and the mean squared error (MSE) ranged from 0.021 to 0.159. Such accurate forecasting of air pollution may help to improve the quality of life and to protect society against smog.
The introduction of food delivery apps, facilitated by the global pandemic, has significantly disrupted the hospitality industry. However, how consumers use mobile applications in the context of daily food choices and consumption has not been fully explored. Using data collected through an online questionnaire completed by 165 food delivery app subscribers, k-means cluster analysis was performed to classify users based on their internal motivations. The results reveal three distinct groups: Health-conscious Eaters, Food Enthusiasts, and Lifetime Diners. Practically, this exploratory study helps food delivery app (FDA) providers to identify customers better, potentially optimizing marketing initiatives and maximizing profitability.
In the two previous issues of the monthly „Napędy i Sterowanie” I described what artificial intelligence (AI) is. To liven up the narrative, I compared artificial intelligence to an archipelago of islands and described the individual AI methods as islands (understood metaphorically, of course, but by convention written without quotation marks). In last year's December issue of NiS I described symbolic methods, neural networks and expert systems in this way. In the January issue I presented fuzzy sets and fuzzy logic, rough sets and pattern recognition. Today I turn to several further methods, again described as islands but presented rigorously by stating the most important features of each method. We begin with cluster analysis methods.
The paper examines the impact of the COVID-19 pandemic on macroeconomic activity in selected European countries. The study is based on monthly and quarterly indicators of GDP, unemployment rates and key indicators of the tourism sector. To show how COVID-19 has affected these macroeconomic variables, statistical data from three periods are compared: the pre-pandemic fourth quarter of 2019, which serves as the reference period; the first quarter of 2020, which marks the beginning of the pandemic; and the second quarter of 2020, during which the pandemic spread to all the analyzed countries. The following statistical techniques are used: regression analysis, hierarchical agglomerative clustering, the k-means method, and selected non-parametric tests (the Kruskal-Wallis test for a selected group of countries and the Kolmogorov-Smirnov test for a selected pair of countries). The results show a significant impact of the pandemic on the level of gross domestic product, the unemployment rate and the tourism sector. In most cases, a correlation between the incidence of COVID-19 infections, the unemployment rate and GDP is observed. The statistical techniques also make it possible to demonstrate the similarities and differences in how the economies responded to the COVID-19 pandemic. The Central Statistical Offices of the selected countries are the main data source, and Statistica version 13.3 is used for all calculations.