Search results - BazTech

1

On the analysis of correlation between nominal data and numerical data

Gniazdowski Zenon

Zeszyty Naukowe Warszawskiej Wyższej Szkoły Informatyki | 2022 | no. 27 | 57-82

EN

The article investigates the possibility of measuring the strength of a linear correlation relationship between nominal data and numerical data. Correlation coefficients for variables coded with real numbers as well as for variables coded with complex numbers were studied. For variables coded with real numbers, unambiguous measures of real linear correlation were obtained. In the case of complex coding, it has been observed that the obtained complex correlation coefficients change with the permutation of the phases in the complex numbers used to code classes of elements with equal cardinalities. It was found that a necessary condition for linear correlation is the possibility of linear ordering of a set with data. Since linear order is not possible in the set of complex numbers, complex correlation coefficients cannot be used as a measure of linear correlation. In the event of such a situation, a substitute action was suggested that would prevent equal cardinality of classes of identical elements contained in the set with nominal data. This action would consist in the correction of data, analogous to the correction during preprocessing or cleaning of data containing missing or outlier values.
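
The phase-permutation effect mentioned in the abstract can be illustrated with a minimal sketch. The coding below is hypothetical and chosen only for illustration (it is not necessarily the scheme used in the article): classes "a" and "b" have equal cardinality, so they share a modulus and differ only in phase. The names `complex_corr` and `encode`, the moduli, and the sample data are all assumptions.

```python
import cmath
import math

def complex_corr(x, y):
    """Pearson-style correlation between complex codes x and real values y.

    Because y is real, no complex conjugation is needed in the covariance.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum(abs(a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def encode(labels, phase_a, phase_b):
    """Hypothetical coding: equal-cardinality classes a and b get the same
    modulus and different phases; class c gets a real code."""
    codes = {
        "a": cmath.rect(2.0, phase_a),
        "b": cmath.rect(2.0, phase_b),
        "c": complex(1.0, 0.0),
    }
    return [codes[label] for label in labels]

# Illustrative data: a nominal attribute and a numerical attribute.
labels = ["a", "a", "b", "b", "c"]
values = [1.0, 2.0, 5.0, 6.0, 9.0]

# Swap the phases assigned to the two tied classes.
r1 = complex_corr(encode(labels, 0.0, math.pi / 2), values)
r2 = complex_corr(encode(labels, math.pi / 2, 0.0), values)

print(r1, abs(r1))
print(r2, abs(r2))
```

In this toy example the two coefficients differ in both value and modulus, although the only change is which of the two tied classes receives which phase. This is consistent with the abstract's observation that the complex coefficient is not an unambiguous measure.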

2

Numerical Coding of Nominal Data

Gniazdowski Z., Grabowski M.

Zeszyty Naukowe Warszawskiej Wyższej Szkoły Informatyki | 2015 | no. 12 | 53-61

EN

In this paper, a novel approach to coding nominal data is proposed. For the given nominal data, a rank in the form of a complex number is assigned. The proposed method does not lose any information about the attribute and brings other, previously unknown properties. The approach based on these new properties can be used for classification. The analyzed example shows that classification using coded nominal data, or both numerical and coded nominal data, is more effective than classification that uses only numerical data.

3

Grabowski M., Korpusik M.

Zeszyty Naukowe Warszawskiej Wyższej Szkoły Informatyki | 2013 | no. 10 | 25-37

EN

The analytical paradigm of classification theory deals with continuous data only. Difficulties emerge when data records mix continuous and nominal attributes. Usually, the analytical paradigm treats nominal attributes as continuous ones via numerical coding of nominal values (often somewhat ad hoc). We propose a way of keeping nominal values within the analytical paradigm without pretending that they are continuous. The core idea is that the information hidden in nominal values influences the metric (or similarity function) between records of continuous and nominal data. Adaptation finds the relevant parameters that influence the metric between data records. Our approach works well for classifier induction algorithms in which the metric or similarity is generic, for instance the k-nearest-neighbor (k-NN) algorithm or the support of decision tree induction by a similarity function between data, which is proposed here. The k-NN algorithm working with continuous and nominal data behaves considerably better when nominal values are processed by our approach. Algorithms of the analytical paradigm that use linear and probabilistic machinery, such as discriminant adaptive nearest-neighbor or Fisher’s linear discriminant analysis, cause some difficulties. We propose some possible ways to overcome these obstacles for the adaptive nearest-neighbor algorithm.
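
The idea that nominal values influence the metric between records can be sketched as follows. This is not the authors' metric: it is a simple, commonly used mixed distance (squared Euclidean differences on continuous attributes plus a weighted mismatch penalty on nominal attributes), assumed here only for illustration. The names `mixed_distance` and `knn_predict`, the weights, and the toy data are hypothetical, and the weights are set by hand rather than found by adaptation.

```python
import math
from collections import Counter

def mixed_distance(a, b, weights):
    """Distance between records of the form (continuous_tuple, nominal_tuple).

    weights holds one mismatch penalty per nominal attribute (an assumption;
    in the abstract such parameters are found by adaptation).
    """
    cont = sum((p - q) ** 2 for p, q in zip(a[0], b[0]))
    nom = sum(w for p, q, w in zip(a[1], b[1], weights) if p != q)
    return math.sqrt(cont + nom)

def knn_predict(train, query, weights, k=1):
    """Majority vote among the k training records nearest to query."""
    nearest = sorted(
        train, key=lambda item: mixed_distance(item[0], query, weights)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data: two continuous attributes and one nominal attribute per record.
train = [
    (((1.0, 1.0), ("red",)), "A"),
    (((1.2, 0.9), ("red",)), "A"),
    (((1.1, 1.1), ("blue",)), "B"),
    (((0.9, 1.0), ("blue",)), "B"),
]
query = ((1.0, 1.0), ("blue",))

# With a nonzero penalty the nominal attribute shapes the neighborhood;
# with a zero penalty only the continuous attributes count.
pred_with_nominal = knn_predict(train, query, weights=(4.0,), k=1)
pred_without = knn_predict(train, query, weights=(0.0,), k=1)
print(pred_with_nominal, pred_without)
```

On this toy data the nearest neighbor changes from a class "A" record to a class "B" record once the nominal mismatch penalty is switched on, which is the kind of influence on the metric the abstract describes.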