Search results - BazTech

1

Clustering based on eigenvectors of the adjacency matrix

Lucińska M., Wierzchoń S. T.

International Journal of Applied Mathematics and Computer Science | 2018 | Vol. 28, no. 4 | 771--786

EN

The paper presents a novel spectral algorithm, EVSA (eigenvector structure analysis), which uses eigenvalues and eigenvectors of the adjacency matrix to discover clusters. Based on matrix perturbation theory and properties of graph spectra, we show that the adjacency matrix can be more suitable for partitioning than Laplacian matrices. The main problem with using the adjacency matrix is selecting the appropriate eigenvectors. We therefore propose an approach based on analysis of the adjacency matrix spectrum and pairwise correlations between eigenvectors. The formulated rules and heuristics allow choosing the right eigenvectors to represent clusters, i.e., automatically establishing the number of groups. The algorithm requires only one parameter, the number of nearest neighbors. Unlike many other spectral methods, our solution does not need an additional clustering algorithm for final partitioning. We evaluate the proposed approach using real-world datasets of different sizes. Its performance is competitive with both standard and recent solutions that require the number of clusters to be given as an input parameter.

2

Modelling the demand for cement: The case of Poland and Spain

Wiliński D., Kantorowicz J., Wierzchoń S. T.

Journal of Building Chemistry | 2016 | Vol. 1, iss. 1 | 69-83

EN

The paper develops a new tool for forecasting the demand for cement and tests it on data from Poland and Spain. Predicting the demand for cement is a key issue for cement manufacturers. Forecasting this demand helps businesses determine, among other things, the level of production, the future revenue stream and the purchase of raw materials. The hybrid models employed in this paper consist of a Seasonal Autoregressive Integrated Moving Average with Exogenous Variables (SARIMAX) model and an Artificial Neural Network (ANN). The SARIMAX model was initially used to forecast the demand for cement. The resulting forecasting errors were further corrected with the ANN, which was built to account for the nonlinear tendencies that the SARIMAX technique could not identify. The forecasting errors from the hybrid model were compared with the errors from the ARIMA-type and ANN models working separately. The results indicate that the hybrid models outperform the models used separately. If implemented, this methodology may become a powerful decision-making tool for the cement industry.

3

Accelerating PageRank computations

Wierzchoń S. T., Kłopotek M. A., Ciesielski K., Czerski D., Dramiński M.

Control and Cybernetics | 2011 | Vol. 40, no 2 | 259-274

EN

Different methods for computing PageRank vectors are analysed. In particular, we note the opposite behavior of the power method and the Monte Carlo method. Further, a method of reducing the number of iterations of the power method is suggested.
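For readers unfamiliar with the two families of methods named in this abstract, the following is a minimal sketch and not the implementation analysed in the paper. The function names, the damping factor of 0.85, the uniform handling of dangling nodes and the example graph are all assumptions made for illustration. The Monte Carlo variant shown here estimates ranks from visit counts of random walks that stop with probability 1 - damping at each step.

```python
import random


def pagerank_power(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power method: iterate the PageRank update until the vector stabilises.

    `links` maps every node to the list of nodes it points to (hypothetical format).
    Returns the rank dictionary and the number of iterations performed.
    """
    nodes = sorted(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for iteration in range(1, max_iter + 1):
        # Rank held by dangling nodes (no out-links) is spread uniformly (assumption).
        dangling = sum(rank[v] for v in nodes if not links[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in links[v]:
                new[w] += damping * rank[v] / len(links[v])
        change = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if change < tol:
            break
    return rank, iteration


def pagerank_monte_carlo(links, damping=0.85, walks_per_node=2000, seed=0):
    """Monte Carlo: estimate ranks from visit counts of terminating random walks."""
    rng = random.Random(seed)
    nodes = sorted(links)
    visits = {v: 0 for v in nodes}
    for start in nodes:
        for _ in range(walks_per_node):
            v = start
            while True:
                visits[v] += 1
                if rng.random() > damping:
                    break  # the walk stops with probability 1 - damping
                # Follow a random out-link; jump uniformly from a dangling node.
                v = rng.choice(links[v]) if links[v] else rng.choice(nodes)
    total = sum(visits.values())
    return {v: visits[v] / total for v in nodes}
```

On a small graph such as `{"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}` the two estimates can be compared node by node; the power method returns a deterministic vector, while the Monte Carlo estimate depends on the number of walks and the seed.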

4

On the distance norms for detecting anomalies in multidimensional datasets

Chmielewski A., Wierzchoń S. T.

Zeszyty Naukowe Politechniki Białostockiej. Informatyka | 2007 | Z. 2 | 39-49

EN

One of the key parameters of anomaly detection algorithms is the metric (norm) used to calculate the distance between any two samples, which reflects their proximity. It is especially important when we operate on real-valued, high-dimensional datasets, e.g., when we deal with the problem of detecting intruders in computer networks. As observed, the most popular Euclidean norm becomes meaningless in spaces of more than 15 dimensions. This means that other norms should be investigated to improve the effectiveness of real-valued negative selection algorithms. In this paper we present results for the following norms: Minkowski, fractional distance and cosine.

PL

One of the key parameters of anomaly detection algorithms is the metric (norm) used to compute the distance between two samples, which reflects their similarity. It is particularly important when operating on high-dimensional datasets, such as those encountered when detecting intruders in computer networks. It has been observed that the most commonly used Euclidean norm becomes useless in spaces of more than 15 dimensions. This makes it necessary to use other norms that would increase the effectiveness of the real-valued negative selection algorithm. In the paper we present results obtained for the Minkowski norm, L_m, with the parameter m varying in the range (0, 2], and for the cosine distance.
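As an illustration of the distance measures named in this record, the sketch below computes the Minkowski distance L_m for any m > 0 (values of m below 1 give the so-called fractional distance, which is not a true norm) and the cosine distance. The function names are hypothetical, and defining cosine distance as one minus cosine similarity is an assumption; the paper may use a different convention.

```python
import math


def minkowski(x, y, m):
    """L_m distance between two equal-length vectors; 0 < m < 1 is 'fractional'."""
    if m <= 0:
        raise ValueError("m must be positive")
    return sum(abs(a - b) ** m for a, b in zip(x, y)) ** (1.0 / m)


def cosine_distance(x, y):
    """One minus the cosine of the angle between x and y (assumed convention)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_x * norm_y)
```

For example, `minkowski((0, 0), (3, 4), 2)` is the familiar Euclidean distance, while `minkowski((0, 0), (3, 4), 0.5)` weights small coordinate differences more heavily relative to large ones.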

5

Dual representation of samples for negative selection issues

Chmielewski A., Wierzchoń S. T.

Computer Assisted Mechanics and Engineering Sciences | 2007 | Vol. 14, No. 4 | 579-590

EN

This paper presents a new dual model combining binary and real-valued representations of samples for negative selection algorithms. Recent research shows that the two types of encoding can produce quite good results for some types of datasets when they are applied separately in such algorithms. Besides a number of efficient algorithms, various affinity (or similarity) functions fitted to particular implementations have been investigated. Based on a series of experiments, we propose a dual representation that overcomes some of the existing drawbacks of these algorithms and significantly speeds up the classification process. This new model was designed mainly for detecting anomalies in real-time applications, where the time of classification is crucial, e.g., intrusion detection systems.

6

Experiments with the V-Detector algorithm

Chmielewski A., Wierzchoń S. T.

Systems Science | 2006 | Vol. 32, no 4 | 55-63

EN

V-Detector is a real-valued negative selection algorithm designed to detect anomalies in datasets containing real-valued data. Many of the previous experiments focused on analysing the usability of this algorithm for detecting intruders in computer networks. An Intrusion Detection System (IDS) should be efficient and reliable because of the large number of network connections and their diversity. Additionally, every connection is described as a record containing tens of numerical and symbolic attributes. We show that, by choosing an appropriate representation of "typical" connections and a smart decomposition of the learning data, it is possible to obtain a quite efficient and cheap algorithm for detecting non-typical connections.
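The general idea behind real-valued negative selection with variable-sized detectors can be sketched as follows. This is a simplified illustration under stated assumptions, not the experimental setup of the paper: samples are assumed to be scaled to the unit hypercube, the Euclidean distance is used, and the function names, the self radius and the number of detectors are hypothetical choices.

```python
import math
import random


def train_detectors(self_samples, self_radius=0.05, n_detectors=200,
                    seed=0, max_tries=100000):
    """Generate detectors as (centre, radius) pairs that avoid all self samples.

    A random candidate centre is kept only if it lies outside the self radius
    of every "typical" sample; its own radius then reaches up to the nearest
    self region, so detectors far from the data cover more space.
    """
    rng = random.Random(seed)
    dim = len(self_samples[0])
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        centre = tuple(rng.random() for _ in range(dim))
        nearest = min(math.dist(centre, s) for s in self_samples)
        if nearest > self_radius:
            detectors.append((centre, nearest - self_radius))
    return detectors


def is_anomalous(sample, detectors):
    """A sample is flagged when it falls inside at least one detector."""
    return any(math.dist(sample, centre) <= radius for centre, radius in detectors)
```

By construction no training sample can fall inside a detector, so false alarms on the training data are excluded; how well the detectors cover the remaining space depends on their number and on the random seed.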

7

Map Quality Measurements for GNG and SOM based Document Collection Maps

Kłopotek M. A., Wierzchoń S. T., Ciesielski K., Dramiński M., Czerski D., Kujawiak M.

Studia Informatica : systems and information technology | 2006 | Vol. 1(7) | 65--76

EN

The paper presents a proposal of a set of measures for comparison of maps of document collections as well as preliminary results concerning evaluation of their usefulness and expressive power.

8

A new immune algorithm for classification static and dynamic data

Wierzchoń S. T., Kużelewska U.

Studia Informatica : systems and information technology | 2004 | Vol. 1(3) | 107--116

EN

In this paper we present a new algorithm for exploratory data analysis. It can be used for automated cluster extraction in static as well as dynamically changing data sets. The description of the algorithm is followed by a short overview of immune-based approaches to data analysis and machine learning. The entire algorithm is briefly described in Section 3. For multidimensional data, the problem of visualizing them is discussed; the algorithm reflects the topological structure of the extracted clusters rather than the true data location in multidimensional space. Section 5 briefly describes numerical experiments with static and dynamically changing data sets. Section 6 concludes the paper and Section 7 describes future developments.

9

In search for new applications of the evidence theory

Kłopotek M. A., Wierzchoń S. T.

Studia Informatica : systems and information technology | 2003 | Vol. 1(1) | 27--36

EN

This paper is concerned with seeking new applications for the Dempster-Shafer Theory (DST) that are by their nature better suited to the axiomatic framework of this theory. In particular, wafer processing on an integrated circuit production line, chemical product quality evaluation, etc. are considered. Some extensions to the basic DST formalism are envisaged.

10

Distributed enumeration protocol for valuation based systems

Kłopotek M. A., Wierzchoń S. T.

Zeszyty Naukowe Politechniki Białostockiej. Informatyka | 2002 | Z.1 | 83-95

EN

The paper presents a new algorithm for the problem of an enumeration protocol for nodes in a network. The new algorithm, contrary to previous ones, is local both in information access (neighbourhood only) and information stored (proportional to the number of neighbours). This property is achieved at the expense of the type of connectivity the network is assumed to exhibit.

PL

The paper presents a new algorithm for enumerating the nodes of a network. Unlike previous algorithms, it is local both in terms of information access (only information coming from the neighbours of the currently processed node is taken into account) and information storage (the amount of information is proportional to the number of neighbours of a given node). Locality is achieved by restricting attention to the family of triangulated graphs, which play a fundamental role in the theory of Bayesian networks. The latter are generalized by valuation-based systems, also called graphical expert systems, i.e., graph structures used to represent non-deterministic dependencies between variables (which correspond to the nodes of the graph).

11

Function optimization by the immune metaphor

Wierzchoń S. T.

TASK Quarterly : scientific bulletin of Academic Computer Centre in Gdansk | 2002 | Vol. 6, No 3 | 493-508

EN

The main goal of the immune system is to protect an organism against pathogens. To be able to recognize unknown (i.e., never seen) pathogens, the immune system applies a number of methods that maintain sufficient diversity of its receptors. The most important methods are clonal selection and suppression of ineffective receptors. As a result, the immune system exhibits the affinity maturation property: during its functioning it continuously improves its ability to recognize new types of pathogens. This idea has found many interesting computer-oriented applications. In this paper a simple and easy-to-implement algorithm for the optimization of multi-modal as well as non-stationary functions is proposed. It is based on clonal selection and cell suppression mechanisms. Empirical results confirming its usability for the optimization of uni-modal, multi-modal and non-stationary functions are presented, and a review of other immunity-based approaches is given.
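The two mechanisms named in this abstract, clonal selection and suppression, can be illustrated with a minimal one-dimensional sketch. It is not the algorithm proposed in the paper: the function name, the Gaussian mutation, the fixed suppression radius and all default parameter values are assumptions chosen for illustration, and the sketch maximizes a stationary function only.

```python
import random


def immune_optimize(f, low, high, pop_size=20, clones=5, generations=100,
                    sigma=0.1, suppress_radius=0.05, seed=0):
    """Maximize f on [low, high]; return surviving cells, best first."""
    rng = random.Random(seed)
    cells = [rng.uniform(low, high) for _ in range(pop_size)]
    survivors = []
    for _ in range(generations):
        # Clonal selection: mutate copies of each cell and keep the fittest variant
        # (the parent itself competes, so a cell never gets worse).
        improved = []
        for x in cells:
            variants = [x] + [min(high, max(low, x + rng.gauss(0.0, sigma)))
                              for _ in range(clones)]
            improved.append(max(variants, key=f))
        # Suppression: among cells closer than suppress_radius keep only the best.
        improved.sort(key=f, reverse=True)
        survivors = []
        for x in improved:
            if all(abs(x - s) >= suppress_radius for s in survivors):
                survivors.append(x)
        # Suppressed cells are replaced by random newcomers to preserve diversity.
        newcomers = [rng.uniform(low, high)
                     for _ in range(pop_size - len(survivors))]
        cells = survivors + newcomers
    return survivors
```

Because suppression keeps well-separated cells, the returned list can hold several distinct local optima of a multi-modal function, for instance `immune_optimize(lambda x: -(x - 1.0) ** 2, -5.0, 5.0)` for a single-peak test case.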

12

Evolutionary algorithm for learning Bayesian structures from data

Kozłowski M., Wierzchoń S. T.

TASK Quarterly : scientific bulletin of Academic Computer Centre in Gdansk | 2002 | Vol. 6, No 3 | 509-521

EN

In this paper we report an evolutionary approach to learning Bayesian networks from data. We explain the reasons for adopting such a non-deterministic approach. We analyze the weaknesses of previous works and come to the conclusion that we should operate in the search space native to the problem, i.e., in the space of directed acyclic graphs instead of the standard space of binary strings. This requires adapting evolutionary methodology to very specific needs. We propose a new data representation and an implementation of generalized genetic operators, and then we present an efficient algorithm capable of learning complex networks without additional assumptions. We discuss the results obtained with this algorithm. The approach presented in this paper can be extended to absorb suggestions from experts or information obtained by means of data preprocessing.