Search results - Biblioteka Nauki

1

100%

Dudek A. , University of Economics in Wrocław D. o. E. a. C. S.

EN

Symbolic Data Analysis is an extension of multivariate analysis that deals with data represented in an extended form. Each cell in a symbolic data table (symbolic variable) can contain data in the form of a single quantitative value, a categorical value, an interval, a multivalued variable, or a multivalued variable with weights. Variables can be taxonomic, hierarchically dependent, or logically dependent. Because of this extended data representation, Symbolic Data Analysis introduces new methods and also adapts traditional methods so that symbolic data can be used as their input. The article shows how the “classical” Bayesian discrimination rule can be adapted to deal with data of different symbolic types, and presents kernel intensity measures for symbolic data and methods of obtaining class membership probabilities. An example of using symbolic discriminant analysis for electronic mail filtering is given.

PL

Symbolic data analysis extends the methods of multivariate statistical analysis with respect to the way data are represented. Each cell in a symbolic data table (symbolic variable) can represent data in the form of numbers, qualitative (textual) data, numeric intervals, a set of values, or a set of values with weights. Variables can also represent a tree (taxonomic) structure and be hierarchically or logically dependent. Because of this representation, symbolic data analysis introduces new processing methods and implements traditional methods in such a way that symbolic data can serve as their input. The article shows how “classical” Bayesian analysis can be adapted to different types of symbolic data by means of a kernel intensity estimator for symbolic objects. It concludes with an example of applying discriminant analysis of symbolic objects to filtering incoming electronic mail.
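The Bayesian rule with kernel intensity estimates described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions, not the article's implementation: the objects are restricted to interval-valued variables, the dissimilarity is a sum of per-variable Hausdorff distances, the kernel is Gaussian-shaped, and the priors are class frequencies. All function names and the bandwidth `h` are hypothetical.

```python
import math

def hausdorff_interval_distance(x, y):
    """Sum over variables of the Hausdorff distance between two intervals.

    x, y: lists of (lower, upper) pairs. Assumed dissimilarity for this sketch.
    """
    return sum(max(abs(a_lo - b_lo), abs(a_hi - b_hi))
               for (a_lo, a_hi), (b_lo, b_hi) in zip(x, y))

def kernel_intensity(x, sample, h):
    """Average of a Gaussian-shaped kernel applied to the dissimilarities
    between x and every object in the sample (bandwidth h is an assumption)."""
    return sum(math.exp(-0.5 * (hausdorff_interval_distance(x, s) / h) ** 2)
               for s in sample) / len(sample)

def posterior_probabilities(x, classes, h=1.0):
    """Bayes rule: posterior of class k is proportional to prior_k * intensity_k(x).

    classes: dict mapping a class label to its list of training objects.
    Priors are taken as relative class frequencies (an assumption).
    """
    total = sum(len(objs) for objs in classes.values())
    scores = {label: len(objs) / total * kernel_intensity(x, objs, h)
              for label, objs in classes.items()}
    norm = sum(scores.values())
    return {label: score / norm for label, score in scores.items()}
```

A new object is assigned to the class with the largest posterior probability; in a mail-filtering setting the labels could be, for instance, "spam" and "ham".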

2

Kohonen self-organizing maps for symbolic objects

100%

Dudek A. , Chair of Econometrics and Informatics U. o. E.

Acta Universitatis Lodziensis. Folia Oeconomica

2008

vol. 216

EN

Visualizing data in the form of illustrative diagrams and searching these diagrams for structures, clusters, trends, dependencies, etc. is one of the main aims of multivariate statistical analysis. In the case of symbolic data (e.g. data in the form of single quantitative values, categorical values, intervals, multi-valued variables, or multi-valued variables with weights), some well-known methods are provided by suitable 'symbolic' adaptations of classical methods such as principal component analysis or factor analysis. An alternative visualization of symbolic data is obtained by constructing a Kohonen map. Instead of displaying the individual items k = 1,..., n as n points or rectangles in a two-dimensional space, the n items are first clustered into m mini-clusters, and these mini-clusters are then assigned to the vertices of a rectangular lattice of points in the plane such that 'similar' clusters are represented by neighbouring vertices in the lattice.

3

Classification via spectral clustering

100%

Dudek A. , Uniwersytet Ekonomiczny we Wrocławiu; Wydział Ekonomii Z. i. T. K. E. i. I.

vol. 235

PL

Spectral clustering is a cluster analysis method that has been developing since the end of the last century. Despite its sometimes limited theoretical foundation, the method gives very good empirical results on both test sets and real data sets. The article presents the most important steps of the spectral clustering algorithm and points out situations in which the algorithm gives much better results (measured by the Rand index) than other cluster analysis methods. It concludes with recommendations on the situations in which this clustering technique is worth using.
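The abstract does not list the steps themselves, so the following is a minimal sketch of one widely used variant (Gaussian affinity, normalized affinity matrix, leading eigenvectors, k-means on the embedded rows). The choice of variant, the parameter `sigma`, and the deterministic farthest-point initialisation of k-means are assumptions made for illustration; the sketch also assumes that no point is so isolated that its affinities all underflow to zero.

```python
import numpy as np

def spectral_clustering(X, k, sigma=1.0, n_iter=50):
    """Cluster the rows of X into k groups; returns an array of labels."""
    X = np.asarray(X, dtype=float)
    # Step 1: Gaussian affinity matrix with a zero diagonal.
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    A = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # Step 2: symmetric normalisation D^(-1/2) A D^(-1/2).
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Step 3: eigenvectors of the k largest eigenvalues (eigh sorts ascending).
    _, eigvecs = np.linalg.eigh(L)
    Y = eigvecs[:, -k:]
    # Step 4: scale every row of the embedding to unit length.
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    # Step 5: plain k-means on the rows, farthest-point initialisation.
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels
```

The eigendecomposition in step 3 dominates the cost for large data sets, which is why the method is usually compared with cheaper alternatives such as k-means.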

4

Multidimensional Scaling for Symbolic Interval Data

100%

Dudek A. , Chair of Econometrics and Informatics U. o. E.

Acta Universitatis Lodziensis. Folia Oeconomica

2009

vol. 228

PL

The main aim of multidimensional scaling is to represent relations between objects in a multidimensional space as distances in a 2- or 3-dimensional space. The input to multidimensional scaling procedures is usually a symmetric square matrix indicating relations (similarities or dissimilarities) between the objects of a set. There are many classical multidimensional scaling techniques, but all of them require each cell of this matrix to contain a single numeric value. Denoeux and Masson (2002) proposed an extension of classical multidimensional scaling to symbolic data in the form of numeric intervals. The input to their INTERSCAL algorithm is a table containing the minimum and maximum distances between the hyper-rectangles representing the objects. The same approach is used in the SYMSCAL and I-SCAL algorithms proposed by Groenen et al. (2005). The article presents the most important multidimensional scaling algorithms for symbolic interval data, together with examples of their application to symbolic data from the repository at http://www.ceremade.dauphine.fr/~touati/sodas-pagegarde.htm.

EN

The aim of multidimensional scaling is to represent dissimilarities among objects in a high-dimensional space as distances in a low-dimensional (usually 2- or 3-dimensional) space. The input to a multidimensional scaling procedure is usually a square, symmetric matrix indicating relationships (similarities or dissimilarities) among a set of items. There are many techniques of classical multidimensional scaling, but all of them assume that each entry in the relationship matrix is a single numeric value. Denoeux and Masson (2002) proposed extending multidimensional scaling to symbolic interval data. The input to their INTERSCAL algorithm is an interval dissimilarity table containing the minimum and maximum distances between the hyper-rectangles representing the objects. The same approach is used in the SYMSCAL and I-SCAL algorithms proposed by Groenen et al. (2005). The article presents the main multidimensional scaling algorithms for symbolic interval data, along with examples on data sets taken from a symbolic data repository (http://www.ceremade.dauphine.fr/~touati/sodas-pagegarde.htm).
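The interval dissimilarity table used as input by these algorithms can be illustrated with a short sketch. It assumes Euclidean distance and objects given as lists of (lower, upper) pairs, one per variable; the function names are hypothetical, and the scaling step itself (INTERSCAL, SYMSCAL, I-SCAL) is not shown.

```python
import math

def interval_distance_bounds(rect_a, rect_b):
    """Minimum and maximum Euclidean distance between two hyper-rectangles.

    rect_a, rect_b: lists of (lower, upper) pairs, one pair per dimension.
    """
    min_sq = 0.0
    max_sq = 0.0
    for (a_lo, a_hi), (b_lo, b_hi) in zip(rect_a, rect_b):
        # Smallest gap in this dimension: zero when the intervals overlap.
        gap = max(0.0, a_lo - b_hi, b_lo - a_hi)
        # Largest separation in this dimension: between opposite endpoints.
        span = max(abs(a_hi - b_lo), abs(b_hi - a_lo))
        min_sq += gap ** 2
        max_sq += span ** 2
    return math.sqrt(min_sq), math.sqrt(max_sq)

def interval_dissimilarity_table(rects):
    """Table whose entry [i][j] is the (min, max) distance pair for objects i and j."""
    n = len(rects)
    return [[interval_distance_bounds(rects[i], rects[j]) for j in range(n)]
            for i in range(n)]
```

In this sketch the diagonal entries have a minimum of zero and a maximum equal to the diagonal of the object's own hyper-rectangle.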

5

Classification of Large Data Sets. Comparison of Performance of Chosen Algorithms

100%

Dudek A. , Wrocław U. o. E.

Acta Universitatis Lodziensis. Folia Oeconomica

2013

vol. 285

EN

Researchers who analyze large (> 100,000 objects) data sets with cluster analysis methods often face the problem of the computational complexity of algorithms, which sometimes makes it impossible to complete the analysis in an acceptable time. A common solution is to use less computationally complex algorithms (such as k-means), which in turn can in many cases give much worse results than, for example, algorithms using eigenvalue decomposition. The results of real analyses of such data sets are therefore usually a compromise between quality and the computational capabilities of computers. This article attempts to present the current state of knowledge on the classification of large data sets and to identify directions of development and open problems.

PL

Researchers who use cluster analysis methods to analyze large (> 100,000 objects) data sets often face the problem of the computational complexity of algorithms, which sometimes makes it impossible to carry out the analysis in an acceptable time. One solution is to use less computationally complex algorithms (hierarchical agglomerative, k-means), which in turn can in many situations give markedly worse results than, for example, algorithms using eigenvalue decomposition. The results of real analyses of such data sets are therefore usually a compromise between quality and the computational capabilities of computers. The article attempts to present the current state of knowledge on the classification of large data sets and to indicate directions of development and open problems.

6

The fuzzy TOPSIS method and its implementation in the R programme

63%

Dudek A. , Jefmański B.

Informatyka Ekonomiczna

2015

no. 1(35)

19-27

EN

The TOPSIS method (Technique for Order Preference by Similarity to Ideal Solution), suggested by Hwang and Yoon [1981], belongs to the group of pattern-based linear ordering methods for multidimensional objects. A characteristic feature of this method is the way the values of the synthetic criterion are evaluated, which takes into account the distance of an evaluated object from both a positive-ideal solution and a negative-ideal solution. The fuzzy TOPSIS method enables the linear ordering of objects described by linguistic variables whose values are expressed as triangular fuzzy numbers. This article presents a way of estimating the synthetic measure in the R environment, according to the assumptions of the fuzzy TOPSIS method proposed by Chen [2000]. The scripts included in the article make it possible to carry out the individual stages of this method.
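The stages of the fuzzy TOPSIS method can be sketched as below. This is a Python illustration, not the R scripts from the article. It assumes that every criterion is a benefit criterion, that ratings and weights are triangular fuzzy numbers given as (a, b, c) tuples, and that the ideal and negative-ideal solutions are (1, 1, 1) and (0, 0, 0) for each criterion; the function names are hypothetical.

```python
import math

def vertex_distance(a, b):
    """Vertex-method distance between two triangular fuzzy numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / 3.0)

def fuzzy_topsis(ratings, weights):
    """Closeness coefficients for objects rated with triangular fuzzy numbers.

    ratings[i][j]: (a, b, c) rating of object i on criterion j.
    weights[j]:    (a, b, c) fuzzy weight of criterion j.
    Assumes benefit criteria only (larger ratings are better).
    """
    n_crit = len(weights)
    # Normalise each column by its largest upper bound.
    c_star = [max(row[j][2] for row in ratings) for j in range(n_crit)]
    closeness = []
    for row in ratings:
        d_pos = 0.0
        d_neg = 0.0
        for j in range(n_crit):
            normalised = tuple(v / c_star[j] for v in row[j])
            # Weighted rating: component-wise product with the fuzzy weight.
            weighted = tuple(r * w for r, w in zip(normalised, weights[j]))
            d_pos += vertex_distance(weighted, (1.0, 1.0, 1.0))
            d_neg += vertex_distance(weighted, (0.0, 0.0, 0.0))
        closeness.append(d_neg / (d_pos + d_neg))
    return closeness
```

Objects are then ordered by decreasing closeness coefficient, so a higher value means the object is nearer the positive-ideal solution and farther from the negative-ideal one.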

7

Regression analysis for interval-valued symbolic data versus noisy variables and outliers

63%

Pełka M. , Dudek A.

Econometrics. Ekonometria. Advances in Applied Data Analytics

2016

no. 2 (52)

35-42

EN

Regression analysis is perhaps the best known and most widely used method for the analysis of dependence; that is, for examining the relationship between a set of independent variables (X’s) and a single dependent variable (Y). In general, the regression model is a linear combination of independent variables that corresponds as closely as possible to the dependent variable [Lattin, Carroll, Green 2003, p. 38]. The aim of the article is to present two suitable adaptations of regression analysis for symbolic interval-valued data (the centre method and the centre and range method) and to compare their usefulness when dealing with noisy variables and/or outliers. The empirical part of the paper presents the results of simulation studies based on artificial and real data, both without and with noisy variables and/or outliers. The results are compared according to the values of two coefficients of determination, $R_L^2$ and $R_U^2$. The results show that the centre and range method usually obtains better results even when the data set contains noisy variables and outliers, but in some cases the centre method obtains better results than the centre and range method.
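The difference between the two adaptations can be illustrated with a minimal sketch for a single interval-valued predictor. It assumes ordinary least squares, intervals given as (lower, upper) pairs, and the usual formulation in which the centre method fits one model on interval midpoints and the centre and range method adds a second model on half-ranges. The function names are hypothetical, the reordering of predicted bounds in the centre method is a convenience added here, and the sketch does not guard against a negative predicted half-range.

```python
def ols(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def centres(intervals):
    return [(lo + hi) / 2.0 for lo, hi in intervals]

def half_ranges(intervals):
    return [(hi - lo) / 2.0 for lo, hi in intervals]

def centre_method(X, Y):
    """Fit on midpoints; predict each bound by applying the model to the
    corresponding bound of the predictor interval."""
    b0, b1 = ols(centres(X), centres(Y))
    def predict(interval):
        lo, hi = interval
        bounds = (b0 + b1 * lo, b0 + b1 * hi)
        return min(bounds), max(bounds)  # reorder if the slope is negative
    return predict

def centre_range_method(X, Y):
    """Fit one model on midpoints and a second on half-ranges; the predicted
    interval is predicted centre -/+ predicted half-range."""
    c0, c1 = ols(centres(X), centres(Y))
    r0, r1 = ols(half_ranges(X), half_ranges(Y))
    def predict(interval):
        lo, hi = interval
        centre = c0 + c1 * (lo + hi) / 2.0
        half = r0 + r1 * (hi - lo) / 2.0
        return centre - half, centre + half
    return predict
```

Coefficients of determination for the lower and upper bounds can then be computed separately by comparing the predicted bounds with the observed ones.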