Search results
Searched in keywords: k-means clustering
Results found: 18
EN
Tool condition affects tolerances and energy consumption and hence needs to be monitored. Artificial intelligence (AI) based data-driven techniques for tool condition determination are proposed. Unfortunately, data-driven techniques are data-hungry. This paper proposes a methodology for classification based on unsupervised learning using limited unlabeled training data. The work presents a multi-class classification problem for tool condition monitoring. Principal component analysis (PCA) is employed for dimensionality reduction, and the principal components (PCs) are used as input for classification using k-means clustering. Newly collected data is then projected onto the PC space and classified using the clusters from the training. The methodology has been applied for classification of tool faults into 6 classes in a vertical milling center. The use of limited input parameters from the user makes the method ideal for monitoring a large number of machines with minimal human intervention. Furthermore, due to the small amount of data needed for training, the method has the potential to be transferable.
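A minimal sketch of the pipeline this abstract describes, assuming scikit-learn and synthetic feature vectors; the 12 input features and 3 principal components are illustrative assumptions, only the 6-class count comes from the abstract:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(size=(120, 12))  # unlabeled training feature vectors (assumed)
X_new = rng.normal(size=(10, 12))     # newly collected measurements

scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=3).fit(scaler.transform(X_train))  # dimensionality reduction
pcs_train = pca.transform(scaler.transform(X_train))

kmeans = KMeans(n_clusters=6, n_init=10, random_state=0).fit(pcs_train)  # 6 fault classes

# Project new data onto the PC space and classify it with the trained clusters.
pcs_new = pca.transform(scaler.transform(X_new))
print(kmeans.predict(pcs_new))
```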
EN
Machine learning has been widely used in manufacturing, leading to significant advances in diverse problems, including the prediction of wear and remaining useful life (RUL) of machine tools. However, the data used in many cases correspond to simple and stable processes that differ from practical applications. In this work, a novel dataset consisting of eight cutting tools with complex tool paths is used. The time series of the tool paths, corresponding to the three-dimensional position of the cutting tool, are grouped according to their shape. Three unsupervised clustering techniques are applied, resulting in the identification of DBA-k-means as the most appropriate technique for this case. The clustering process helps to identify training and testing data with similar tool paths, which is then applied to build a simple two-feature prediction model with the same level of precision for RUL prediction as a more complex four-feature prediction model. This work demonstrates that by properly selecting the methodology and number of clusters, tool paths can be effectively classified, which can later be used in prediction problems in more complex settings.
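A minimal sketch of DBA-k-means on three-dimensional tool paths, using the tslearn library; the random-walk "paths" and the cluster count are assumptions for illustration:

```python
from tslearn.clustering import TimeSeriesKMeans
from tslearn.generators import random_walks

# 24 synthetic "tool paths": 200 samples of the (x, y, z) tool position each
X = random_walks(n_ts=24, sz=200, d=3, random_state=0)

# metric="dtw" makes tslearn average each cluster with DBA (DTW Barycenter Averaging)
model = TimeSeriesKMeans(n_clusters=4, metric="dtw", max_iter=10, random_state=0)
labels = model.fit_predict(X)
print(labels)  # similar paths share a cluster index
```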
EN
In this paper, the authors present an algorithm for determining the location of wireless network small cells in a dense urban environment. The algorithm uses machine learning methods, such as k-means clustering and spectral clustering, together with a highly accurate propagation channel model created using the ray tracing method. The authors compared two approaches to the small cell location selection process: one based on the assumption that end terminals may be arbitrarily assigned to stations, and the other assuming that the assignment is based on the received signal power. Mean bitrate values are derived for comparing the different scenarios. The results show an improvement over the baseline results. The paper concludes that machine learning algorithms may be useful for small cell location selection as well as for allocating users to small cell base stations.
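A minimal sketch of the arbitrary-assignment variant, assuming k-means on synthetic terminal coordinates stands in for the ray-tracing-driven setup:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
users = rng.uniform(0, 1000, size=(500, 2))  # terminal positions in metres (assumed)

k = 12  # number of small cells to place
km = KMeans(n_clusters=k, n_init=10, random_state=1).fit(users)
cell_sites = km.cluster_centers_  # proposed small cell locations
assignment = km.labels_           # terminal-to-cell assignment (nearest centroid)
print(cell_sites[:3])
print(assignment[:10])
```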
EN
This paper presents an unsupervised change detection method that produces a more accurate change map from imbalanced SAR images of the same land cover. The method uses the PSO algorithm to segment the image into layers, which are classified with a Gabor wavelet filter and then clustered with k-means to generate a new change map. Tests confirm the method's effectiveness and efficiency by comparing the obtained results with those of other methods. Integrating PSO with the Gabor filter and k-means improves the accuracy of detecting even the slightest changes in the objects and terrain of a SAR image, and also reduces the processing time.
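A minimal sketch of the Gabor + k-means stage, assuming a synthetic image pair and a log-ratio change indicator; the PSO segmentation layer is omitted:

```python
import numpy as np
from skimage.filters import gabor
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
img1 = rng.gamma(2.0, 1.0, size=(128, 128))  # two co-registered SAR intensity images
img2 = img1.copy()
img2[40:70, 40:70] *= 3.0                    # an artificially changed region

log_ratio = np.log1p(img2) - np.log1p(img1)  # classic SAR change indicator

# Per-pixel features: Gabor responses at a few orientations plus the log-ratio itself.
feats = [gabor(log_ratio, frequency=0.2, theta=t)[0] for t in (0.0, np.pi / 4, np.pi / 2)]
X = np.stack([f.ravel() for f in feats] + [log_ratio.ravel()], axis=1)

# Two clusters: "changed" vs "unchanged" pixels.
change_map = KMeans(n_clusters=2, n_init=10, random_state=2).fit_predict(X).reshape(img1.shape)
print(change_map.sum(), "pixels in cluster 1")
```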
EN
Nuclear power plant process systems have developed greatly over the years. As a large amount of data is generated from Distributed Control Systems (DCS) with fast computational speed and large storage facilities, smart systems have taken over analysis of the process. These systems are built using data mining concepts to understand the various stable operating regimes of the processes, identify key performance factors, make estimates, and suggest ways for operators to optimize the process. Association rule mining is a data-mining concept frequently used in e-commerce for suggesting closely related and frequently bought products to customers. It also has very wide application in industries such as bioinformatics, nuclear sciences, trading, and marketing. This paper deals with the application of these techniques for identification and estimation of key performance variables of a lubrication system designed for a 2.7 MW centrifugal pump used for reactor cooling in a typical 500 MWe nuclear power plant. The paper dwells in detail on predictive model building using three models based on association rules for steady-state estimation of key performance indicators (KPIs) of the process. It also covers the evaluation of the prediction models with various metrics and the selection of the best model.
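A minimal sketch of association rule mining over discretised process states, using mlxtend's apriori and association_rules; the variable names, bins, and thresholds are invented for illustration and are not the paper's models:

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Each row: one steady-state snapshot of the lubrication system, one-hot encoded
# into discretised states (variable names and bins are invented).
snapshots = pd.DataFrame({
    "oil_temp_high":  [1, 1, 0, 1, 0, 1, 1, 0],
    "oil_press_low":  [1, 1, 0, 1, 0, 1, 0, 0],
    "flow_normal":    [0, 0, 1, 0, 1, 0, 1, 1],
    "vibration_high": [1, 0, 0, 1, 0, 1, 0, 0],
}).astype(bool)

itemsets = apriori(snapshots, min_support=0.3, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.8)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```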
EN
Any type of biomedical screening generates large amounts of data. As a rule, these data are unprocessed and might cause problems during analysis and interpretation, owing to inaccuracies and artifacts that distort the data. That is why it is crucial to make sure that the biomedical information under analysis is of high quality, in order to avoid possibly wrong results or an incorrect diagnosis. Receiving qualitative and trustworthy biomedical data is a necessary condition for high-quality data assessment and diagnostics. Neural networks as a computing system in data analysis provide recognizable and clear datasets. Without such data, it becomes extremely difficult to make a diagnosis, predict the course of the disease, or anticipate the result of treatment. The object of this research was to define, describe, and test a new approach to the analysis and preprocessing of biomedical images, based on segmentation. Different metrics for assessing image quality, depending on the purpose of the research, were also summarized. Based on the collected data, the advantages and disadvantages of each of the methods were identified. The proposed method of analysis and noise reduction was applied to the results of computed tomography lung screening. Using appropriate evaluation metrics, the obtained results were evaluated quantitatively and qualitatively. As a result, the expediency of applying the proposed algorithm was proven.
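A minimal sketch of the quantitative-evaluation step, assuming a synthetic CT-like slice and a simple median-filter denoiser as a stand-in for the paper's segmentation-based preprocessing:

```python
import numpy as np
from scipy.ndimage import median_filter
from skimage.metrics import peak_signal_noise_ratio

rng = np.random.default_rng(3)
clean = np.clip(rng.normal(0.5, 0.1, size=(256, 256)), 0, 1)         # stand-in CT slice
noisy = np.clip(clean + rng.normal(0, 0.05, size=clean.shape), 0, 1)

denoised = median_filter(noisy, size=3)  # placeholder for the paper's preprocessing
print("PSNR noisy:    %.2f dB" % peak_signal_noise_ratio(clean, noisy, data_range=1.0))
print("PSNR denoised: %.2f dB" % peak_signal_noise_ratio(clean, denoised, data_range=1.0))
```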
7
Training subset selection for support vector regression
EN
As more and more data become available, training a machine learning model can be extremely intractable, especially for complex models like Support Vector Regression (SVR), training of which requires solving a large quadratic programming optimization problem. Selecting a small data subset that can effectively represent the characteristic features of the training data and preserve their distribution is an efficient way to solve this problem. This paper proposes a systematic approach to select the best representative data for SVR training. The distributions of both predictor and response variables are preserved in the selected subset via a 2-layer data clustering strategy. A 2-layer step-wise greedy algorithm is introduced to select the best data points for constructing a reduced training set. The proposed method has been applied to predicting decks' win rates in the Clash Royale Challenge, in which 10 subsets containing hundreds of data examples were selected from 100k examples for training 10 SVR models to maximize their prediction performance evaluated using the R-squared metric. Our final submission, with an R2 score of 0.225682, won 3rd place among over 1200 solutions submitted by 115 teams.
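A minimal sketch of cluster-based representative selection for SVR, assuming scikit-learn and synthetic data; the paper's 2-layer clustering and step-wise greedy refinement are simplified to a single k-means pass:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin
from sklearn.svm import SVR

rng = np.random.default_rng(4)
X = rng.normal(size=(10000, 8))
y = 2 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.1, size=10000)

k = 300  # size of the reduced training set
km = KMeans(n_clusters=k, n_init=4, random_state=4).fit(X)
idx = pairwise_distances_argmin(km.cluster_centers_, X)  # nearest real point per centroid

svr = SVR(C=10.0, epsilon=0.05).fit(X[idx], y[idx])      # train on ~3% of the data
print("R^2 on the full set:", round(svr.score(X, y), 3))
```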
8
Efficient support vector regression with reduced training data
EN
Support Vector Regression (SVR), as a supervised machine learning algorithm, has gained popularity in various fields. However, the quadratic complexity of SVR in the number of training examples prevents it from many practical applications with large training datasets. This paper aims to explore efficient ways to maximize the prediction accuracy of SVR with the minimum number of training examples. For this purpose, a clustered greedy strategy and a Genetic Algorithm (GA) based approach are proposed for optimal subset selection. The performance of the developed methods has been illustrated in the context of the Clash Royale Challenge 2019, concerned with decks' win rate prediction. The training dataset of 100,000 examples was reduced to hundreds, which were fed to SVR training to maximize model prediction performance measured by the validation R2 score. Our approach achieved the second highest score among over a hundred participating teams in this challenge.
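A minimal sketch of a GA-style subset search in the spirit of the abstract, with invented population sizes, rates, and synthetic data; fitness is the validation R2 of an SVR trained on the candidate subset:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(5)
X = rng.normal(size=(5000, 8))
y = X[:, 0] - 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.1, size=5000)
X_tr, y_tr, X_val, y_val = X[:4000], y[:4000], X[4000:], y[4000:]

SUBSET, POP, GENS = 200, 20, 15

def fitness(idx):
    # validation R^2 of an SVR trained on the candidate subset
    return SVR(C=10.0).fit(X_tr[idx], y_tr[idx]).score(X_val, y_val)

def random_subset():
    return rng.choice(len(X_tr), SUBSET, replace=False)

pop = [random_subset() for _ in range(POP)]
for _ in range(GENS):
    parents = sorted(pop, key=fitness, reverse=True)[: POP // 2]  # truncation selection
    children = []
    for a, b in zip(parents[::2], parents[1::2]):
        pool = np.unique(np.concatenate([a, b]))                  # crossover: merge parents
        child = rng.choice(pool, SUBSET, replace=False)
        swap = rng.integers(0, SUBSET, size=SUBSET // 20)
        child[swap] = rng.integers(0, len(X_tr), size=len(swap))  # mutation: swap in new points
        children.append(child)
    pop = parents + children
    pop += [random_subset() for _ in range(POP - len(pop))]       # fresh random individuals

best = max(pop, key=fitness)
print("best validation R^2:", round(fitness(best), 3))
```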
EN
This article addresses the image processing of surface uniformity and thermally bonded point uniformity in polypropylene spunbonded non-wovens. The investigated samples had two different weights and three levels of non-uniformity. An image processing method based on the k-means clustering algorithm was applied to produce clustered images. The best clustering procedure was selected using the lowest Davies-Bouldin index. The peak signal-to-noise ratio (PSNR) image quality evaluation method was used to choose the best binary image. Then, the non-woven surface uniformity was calculated using the quadrant method. The uniformity of thermally bonded points was calculated through an image processing method based on morphological operators. The relationships between the numerical outcomes and the empirical results of tensile tests were investigated. The results of image processing and tensile behavior showed that the surface uniformity and the uniformity of thermally bonded points have great impact on tensile properties at the selected weights and non-uniformity levels. Thus, a sample with a higher level of uniformity and, consequently, more regular bonding points with a higher bonding percentage exhibits the best tensile properties.
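A minimal sketch of selecting the clustering by the lowest Davies-Bouldin index, assuming scikit-learn and a synthetic stand-in image; the PSNR and quadrant-uniformity steps are not reproduced:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

rng = np.random.default_rng(6)
img = rng.uniform(0, 1, size=(100, 100))  # stand-in for a non-woven surface image
X = img.reshape(-1, 1)                    # cluster pixel intensities

best = None
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=6).fit_predict(X)
    db = davies_bouldin_score(X, labels)
    if best is None or db < best[1]:
        best = (k, db, labels)

k, db, labels = best
clustered = labels.reshape(img.shape)     # clustered image with the chosen k
print("chosen k:", k, "Davies-Bouldin:", round(db, 3))
```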
EN
Data obtained through the monitoring of the water environment often includes a number of indicators, and is frequently collected from a large area or over a long period of time. Analysis of such data can be problematic. The division of elements which have a certain degree of similarity into subgroups may facilitate data analysis and provide indications as to the direction of the analysis. One tool for the separation of such groups of similar elements is cluster analysis. This paper describes the two most commonly used cluster analysis algorithms and summarises the results of several applications of cluster analysis in water monitoring.
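A minimal sketch, assuming the two algorithms meant are k-means and hierarchical (Ward) clustering, applied to standardised synthetic water-quality indicators:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
# rows: 40 sampling sites; columns (assumed): pH, conductivity, dissolved O2, nitrate
X = StandardScaler().fit_transform(rng.normal(size=(40, 4)))  # standardise mixed units

kmeans_groups = KMeans(n_clusters=3, n_init=10, random_state=7).fit_predict(X)
hier_groups = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)
print(kmeans_groups)
print(hier_groups)
```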
EN
Purpose: Automatic Optical Inspection (AOI) systems used in the electronics industry have been primarily developed to inspect soldering defects of Surface Mount Devices (SMD) on a Printed Circuit Board (PCB). However, no commercially available AOI system exists that can be integrated into a desktop soldering robotic system and is capable of identifying soldering defects of Through Hole Technology (THT) solder joints along with the soldering process. In our research, we have implemented an AOI platform that is capable of performing automatic quality assurance of THT solder joints much more efficiently. In this paper, we present a novel approach to identifying soldering defects of THT solder joints based on the location of the THT component lead top, and the methodologies that can be used to precisely identify and localize the THT component lead inside a solder joint. Design/methodology/approach: We discuss the importance of lead top localization and present a detailed description of the methodologies that can be used to precisely segment and localize the THT lead top inside the solder joint. Findings: The precise localization of the THT lead top makes the soldering quality assurance process more accurate. According to the analysis carried out in this paper, a combination of template matching algorithms and colour model transformation provides the most accurate outcome in localizing the component lead top inside the solder joint. Research limitations/implications: When the component lead top is fully covered by solder, the implemented methodologies will not be able to identify its actual location. In such a case, if the segmented and detected lead top locations differ, a decision is made based on the direction in which the solder iron tip touches the solder pad. Practical implications: The methodologies presented in this paper can be effectively used for precise localization of the component lead top inside the solder joint. The precise identification of the component lead top gives the implemented AOI system a very precise quality assurance capability. Originality/value: This research proposes a novel, efficient approach to identifying soldering defects of THT solder joints based on the component lead top. The value of this paper is high, since we have considered all the possibilities that may appear on a solder joint in a practical environment.
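A minimal sketch of the template matching plus colour model transformation combination the findings name, assuming OpenCV, synthetic stand-in images, and an invented HSV threshold for bright metallic pixels:

```python
import cv2
import numpy as np

rng = np.random.default_rng(8)
joint = rng.integers(0, 255, size=(120, 120, 3), dtype=np.uint8)  # stand-in joint image
template = joint[40:60, 50:70].copy()                             # stand-in lead-top template

# 1) Template matching: the best-scoring window gives the coarse lead-top location.
res = cv2.matchTemplate(joint, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, top_left = cv2.minMaxLoc(res)

# 2) Colour-model transformation: segment bright, low-saturation (metallic) pixels
#    in HSV space inside the matched region (threshold values are invented).
h, w = template.shape[:2]
roi = joint[top_left[1]:top_left[1] + h, top_left[0]:top_left[0] + w]
hsv = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, (0, 0, 120), (180, 60, 255))

ys, xs = np.nonzero(mask)
if xs.size:
    cx, cy = top_left[0] + int(xs.mean()), top_left[1] + int(ys.mean())
    print("estimated lead top at", (cx, cy), "match score %.2f" % max_val)
```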
EN
This paper presents an implementation of the k-means clustering method to segment cross sections of X-ray micro-tomographic images of lamellar titanium alloys. It proposes an approach for estimating the optimal number of clusters by analyzing the histogram of the local orientation map of the image, and for choosing the cluster centroids used to initialize k-means. This is compared with the classical method, which uses random initial cluster coordinates.
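A minimal sketch of the proposed initialisation, assuming SciPy/scikit-learn and a synthetic orientation map: histogram peaks supply both the cluster count and the initial centroids, instead of random coordinates:

```python
import numpy as np
from scipy.signal import find_peaks
from sklearn.cluster import KMeans

rng = np.random.default_rng(9)
# stand-in local orientation map (degrees) with three dominant lamellae directions
orient = np.concatenate([rng.normal(m, 5, 400) for m in (20, 80, 140)])

hist, edges = np.histogram(orient, bins=90, range=(0, 180))
peaks, _ = find_peaks(hist, height=hist.max() * 0.3, distance=10)
centers = ((edges[peaks] + edges[peaks + 1]) / 2).reshape(-1, 1)  # one centroid per peak

km = KMeans(n_clusters=len(peaks), init=centers, n_init=1).fit(orient.reshape(-1, 1))
print("estimated number of clusters:", len(peaks))
print("centroids:", km.cluster_centers_.ravel())
```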
EN
In this article we propose a new clustering algorithm for combinations of continuous and nominal data. The proposed algorithm is based on embedding the nominal data into the unit sphere with a quadrance metric, and adapting the general k-means clustering algorithm to the embedded data. It is also shown that the distortion of the new embedding with respect to the Hamming metric is less than that of the other considered possibilities. A series of numerical experiments on real and synthetic datasets shows that the proposed algorithm provides a comparable alternative to other clustering algorithms for combinations of continuous and nominal data.
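A minimal sketch of the embedding idea, assuming one-hot codes on the unit sphere so that the quadrance (squared Euclidean distance) between categories mirrors the Hamming distance; the paper's exact construction and scaling are not reproduced:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(10)
X_cont = rng.normal(size=(200, 2))                           # continuous attributes
X_nom = rng.choice(["red", "green", "blue"], size=(200, 2))  # nominal attributes

# One-hot codes lie on the unit sphere: the quadrance between two distinct
# categories is 2 and between equal categories 0, i.e. proportional to Hamming.
onehot = OneHotEncoder(sparse_output=False).fit_transform(X_nom)
X = np.hstack([StandardScaler().fit_transform(X_cont), onehot])

labels = KMeans(n_clusters=3, n_init=10, random_state=10).fit_predict(X)
print(np.bincount(labels))
```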
14
Data mining approach to image feature extraction in old painting restoration
EN
In this paper a new approach to image segmentation is discussed. A model based on a data mining algorithm operating at the pixel level of an image was introduced and implemented to solve the task of identifying craquelure and retouch traces in digital images of artworks. Both craquelure and retouch identification are important steps in the art restoration process. Since the main goal is to classify and understand the cause of damage, as well as to forecast its further enlargement, a proper tool for precise detection of the damaged area is needed. However, the complex nature of the pattern is the reason why a simple, universal detection algorithm cannot always be implemented. The algorithms presented in this work apply mining structures which depend on an expandable set of attributes forming a feature vector, and thus offer a flexible structure for analysis. The result obtained by our method in craquelure segmentation improved on the results achieved by mathematical morphology methods, which was confirmed by a qualitative analysis.
EN
Accurate models for electric power load forecasting are essential to the operation and planning of the electric industry. They have many applications including energy purchasing, generation, distribution, and contract evaluation. This paper proposes methods of short-term load forecasting using k-means clustering. Two approaches are presented based on the similarity of the load sequence patterns. In the first one, each cluster is created from two preprocessed sequences of the load time series: one preceding the forecast moment and the forecasted one. In the forecast procedure only the first part is presented to the model. The second, forecasted part is reconstructed from the cluster closest to the first part. In the second approach both sequences are divided into clusters independently. After clustering, the empirical probabilities that the forecasted sequence is associated with cluster j when the corresponding input sequence is associated with cluster i are calculated. The forecasted sequence for a new input sequence is formed from cluster centroids using these conditional probabilities. The suitability of the proposed approaches is illustrated through an application to real load data.
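A minimal sketch of the second approach, assuming synthetic daily load profiles: input and forecasted sequences are clustered independently, the conditional probabilities P(j | i) are estimated empirically, and a forecast is the probability-weighted combination of forecast-cluster centroids:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(11)
days = rng.normal(size=(400, 24)).cumsum(axis=1)  # stand-in daily load profiles
X_in, X_out = days[:-1], days[1:]                 # day t as input, day t+1 as target

ki = kj = 8
km_in = KMeans(n_clusters=ki, n_init=10, random_state=11).fit(X_in)
km_out = KMeans(n_clusters=kj, n_init=10, random_state=11).fit(X_out)

# empirical conditional probabilities P(forecast cluster j | input cluster i)
P = np.zeros((ki, kj))
for i, j in zip(km_in.labels_, km_out.labels_):
    P[i, j] += 1
P /= P.sum(axis=1, keepdims=True)

new_input = days[-1]                              # sequence preceding the forecast
i = km_in.predict(new_input.reshape(1, -1))[0]
forecast = P[i] @ km_out.cluster_centers_         # weighted combination of centroids
print(forecast[:6])
```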
EN
In this paper, we present a novel approach to building a probabilistic model of the data set, which is further used by the K-means clustering algorithm. Considering K-means with respect to the probabilistic model requires incorporating a probabilistic distance, which provides a measure of similarity between two probability distributions, as the distance measure. We use various kinds of probabilistic distances in order to evaluate their effectiveness when applied to the algorithm with the proposed model of the analyzed data. Further, we report the results of experiments with the discussed clustering algorithm in the field of sound recognition and select the probabilistic distances that correspond to the highest clustering performance. As a reference technique, we used the traditional K-means algorithm with the most commonly employed Euclidean distance. Our experiments have shown that the presented method outperforms the traditional K-means algorithm, regardless of the statistical distance applied.
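A minimal sketch of K-means over probability distributions, using the Hellinger distance as one example probabilistic distance; the paper's data model and its full set of distances are not reproduced:

```python
import numpy as np

rng = np.random.default_rng(12)
hists = rng.dirichlet(np.ones(16), size=300)  # 300 objects as 16-bin distributions

def hellinger(p, q):
    # Hellinger distance between distributions, broadcast over the last axis
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2, axis=-1))

def kmeans_prob(X, k, iters=50):
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = hellinger(X[:, None, :], centers[None, :, :])  # (n, k) distance matrix
        labels = d.argmin(axis=1)
        new = np.stack([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        new /= new.sum(axis=1, keepdims=True)              # keep centroids on the simplex
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

labels, centers = kmeans_prob(hists, k=4)
print(np.bincount(labels))
```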
EN
This paper presents an application of k-means clustering in the preliminary data analysis that preceded the choice of input variables for a system supporting decisions about stock purchase or sale on capital markets. A model forecasting share prices of companies in the food-processing sector quoted on the Warsaw Stock Exchange was created in STATISTICA 7.1. It was based on neural modeling, allowed for the assessment of the direction of change in security values (increase, decrease), and generated a quantitative forecast of their future price.
18
A Refined VQ-Based Image Compression Method
EN
This paper presents a refined VQ-based image compression method which modifies the traditional VQ-based approach. The refined method further losslessly encodes the compression data generated by the traditional VQ-based method into a set of compression codes. The index image generated by the refined method is a contracted version of the compressed image, which can be used as a preview of the compressed image in image management. Although the PSNRs of the reconstructed images decoded by the refined and the traditional VQ-based methods are the same, the refined method provides better storage efficiency.
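A minimal sketch of the traditional VQ stage the refinement builds on, assuming a k-means codebook over 4x4 blocks; the index image doubles as the contracted preview, and the refined lossless coding stage is not reproduced:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(13)
img = rng.integers(0, 256, size=(128, 128)).astype(float)  # stand-in grey image

B = 4  # block size
blocks = (img.reshape(128 // B, B, 128 // B, B)
             .swapaxes(1, 2).reshape(-1, B * B))           # 4x4 blocks as vectors

km = KMeans(n_clusters=64, n_init=4, random_state=13).fit(blocks)
index_image = km.labels_.reshape(128 // B, 128 // B)       # 32x32 contracted preview

recon = (km.cluster_centers_[km.labels_]
           .reshape(128 // B, 128 // B, B, B)
           .swapaxes(1, 2).reshape(128, 128))
mse = np.mean((img - recon) ** 2)
print("PSNR: %.2f dB" % (10 * np.log10(255 ** 2 / mse)))
```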