Dyskretyzacja danych numerycznych metodami przekształceń boolowskich

Jankowski, C.; Borowik, G.; Kowalski, K.

Title variants

Discretization of numerical data using Boolean transformations

Abstract

Discretization is one of the basic steps in preprocessing decision tables. Transforming continuous attribute values into discrete intervals enables further analysis with data mining methods, so the accuracy of predictions made with the induced decision rules depends on the quality of discretization. The paper describes a method for discretizing numerical data in decision tables using Boolean transformations and shows that algorithms derived from logic synthesis yield good-quality discretization.
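The abstract does not spell out the algorithm. As a rough illustration of how discretization can be cast as a Boolean problem, the sketch below uses the classic discernibility formulation from the rough-set literature (cf. [27]): every pair of objects with different decisions yields a clause listing the candidate cuts that separate them, and any set of cuts that hits every clause is a valid discretization. The classical route turns this CNF into a DNF whose prime implicants are the minimal cut sets; for brevity the sketch picks cuts greedily instead. It is not the authors' complementation-based algorithm [3], and all function names (`candidate_cuts`, `discernibility_clauses`, `greedy_cuts`, `discretize`) are hypothetical.

```python
from bisect import bisect_left
from itertools import combinations


def candidate_cuts(values):
    """Midpoints between consecutive distinct values of one numeric attribute."""
    vs = sorted(set(values))
    return [(lo + hi) / 2 for lo, hi in zip(vs, vs[1:])]


def discernibility_clauses(table, decisions):
    """One clause per pair of objects with different decisions.

    A clause is the set of candidate cuts (attribute index, threshold) lying
    strictly between the two objects' values; a valid discretization must
    contain at least one cut from every clause (a CNF over cut variables).
    """
    n_attrs = len(table[0])
    cuts = [candidate_cuts([row[a] for row in table]) for a in range(n_attrs)]
    clauses = []
    for i, j in combinations(range(len(table)), 2):
        if decisions[i] == decisions[j]:
            continue  # same decision: these objects need not be separated
        clause = frozenset(
            (a, c)
            for a in range(n_attrs)
            for c in cuts[a]
            if min(table[i][a], table[j][a]) < c < max(table[i][a], table[j][a])
        )
        if not clause:
            raise ValueError(f"objects {i} and {j} are identical but have different decisions")
        clauses.append(clause)
    return clauses


def greedy_cuts(clauses):
    """Greedy hitting set: repeatedly take the cut that satisfies the most clauses.

    This approximates, but does not guarantee, a minimal set of cuts.
    """
    remaining = list(clauses)
    chosen = set()
    while remaining:
        counts = {}
        for clause in remaining:
            for cut in clause:
                counts[cut] = counts.get(cut, 0) + 1
        # Ties are broken by lower attribute index, then lower threshold.
        best = max(counts, key=lambda k: (counts[k], -k[0], -k[1]))
        chosen.add(best)
        remaining = [cl for cl in remaining if best not in cl]
    return sorted(chosen)


def discretize(table, chosen):
    """Replace each value by its interval index (number of chosen thresholds below it)."""
    per_attr = {}
    for a, c in chosen:
        per_attr.setdefault(a, []).append(c)
    for thresholds in per_attr.values():
        thresholds.sort()
    return [[bisect_left(per_attr.get(a, []), v) for a, v in enumerate(row)]
            for row in table]


if __name__ == "__main__":
    table = [[1.0, 5.0], [2.0, 1.0], [3.0, 5.0], [4.0, 1.0]]
    decisions = [0, 1, 0, 1]
    cuts = greedy_cuts(discernibility_clauses(table, decisions))
    print(cuts)                     # [(1, 3.0)]
    print(discretize(table, cuts))  # [[0, 1], [0, 0], [0, 1], [0, 0]]
```

On the toy table a single cut on attribute 1 separates every pair of objects with different decisions, so attribute 0 collapses to one interval; this shows how discretization can double as attribute reduction. On larger tables, exact minimization of the CNF (the logic-synthesis side of the problem) can yield fewer cuts than the greedy choice.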

Keywords

discretization, data pre-processing, data mining, Boolean algebra

Publisher

Wydawnictwo SIGMA-NOT

Journal

Przegląd Telekomunikacyjny + Wiadomości Telekomunikacyjne

Year

2014

Issue

no. 10

Pages

1334–1342

Physical description

Bibliography (35 items), figures, tables

Contributors

author

Jankowski C.

C.Jankowski@stud.elka.pw.edu.pl

Institute of Telecommunications, Faculty of Electronics and Information Technology, Warsaw University of Technology

author

Borowik G.

gborowik@tele.pw.edu.pl

Institute of Telecommunications, Faculty of Electronics and Information Technology, Warsaw University of Technology

author

Kowalski K.

Institute of Telecommunications, Faculty of Electronics and Information Technology, Warsaw University of Technology

References

[1] Abouabdalla O.: False Positive Reduction in Intrusion Detection System: A Survey, Proc. of IC-BNMT 2009, 2009
[2] Bache K., Lichman M.: UCI Machine Learning Repository [http://archive.ics.uci.edu/ml], Irvine, CA: University of California, School of Information and Computer Science, accessed November 2013
[3] Borowik G.: Boolean function complementation based algorithm for data discretization, in: Moreno-Diaz R., Pichler F.R., Quesada-Arencibia A. (eds.) Computer Aided Systems Theory - EUROCAST 2013, vol. 8112, Springer, Heidelberg, 2013
[4] Borowik G.: Data mining approach for decision and classification systems using logic synthesis algorithms, in: Klempous R., Nikodem J., Jacak W., Chaczko Z. (eds.) Advanced Methods and Applications in Computational Intelligence, Topics in Intelligent Engineering and Informatics, vol. 6, Springer International Publishing, 2014, doi: 10.1007/978-3-319-01436-4_1
[5] Borowik G., Łuba T.: Fast algorithm of attribute reduction based on the complementation of Boolean function, in: Klempous R., Nikodem J., Jacak W., Chaczko Z. (eds.) Advanced Methods and Applications in Computational Intelligence, Topics in Intelligent Engineering and Informatics, vol. 6, Springer International Publishing, 2014
[6] Borowik G., Łuba T., Zydek D.: Features Reduction Using Logic Minimization Techniques, International Journal of Electronics and Telecommunications, vol. 58, no. 1, 2012
[7] Bouckaert R.R., Frank E., Hall M., Kirkby R., Reutemann P., Seewald A., Scuse D.: WEKA Manual for Version 3-6-10, 2013
[8] Brayton R.K., Hachtel G.D., McMullen C.T., Sangiovanni-Vincentelli A.: Logic Minimization Algorithms for VLSI Synthesis, Kluwer Academic Publishers, 1984
[9] Carletta J.: Assessing agreement on classification tasks: the kappa statistic, Computational Linguistics, vol. 22, 1996
[10] Domingos P.: MetaCost: A General Method for Making Classifiers Cost-Sensitive, Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining, ACM Press, 1999
[11] Dougherty J., Kohavi R., Sahami M.: Supervised and Unsupervised Discretization of Continuous Features, Machine Learning: Proceedings of the Twelfth International Conference, 1995
[12] Ekbal A.: Improvement of Prediction Accuracy Using Discretization and Voting Classifier, The 18th International Conference on Pattern Recognition, IEEE, 2006
[13] Fayyad U., Piatetsky-Shapiro G., Smyth P.: From Data Mining to Knowledge Discovery in Databases, AI Magazine, vol. 17, no. 3, 1996
[14] Fayyad U.M., Irani K.B.: Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning, Proceedings of the International Joint Conference on Artificial Intelligence, 1993
[15] Frank E., Witten I.H.: Making Better Use of Global Discretization, Proceedings of the 16th International Conference on Machine Learning, 1999
[16] Grzenda M.: Towards the Reduction of Data Used for the Classification of Network Flows, Lecture Notes in Computer Science, vol. 7209, Springer, 2012
[17] Holmes G., Donkin A., Witten I.H.: WEKA: a machine learning workbench, Proceedings of the 1994 Second Australian and New Zealand Conference on Intelligent Information Systems, 1994
[18] Komorowski J., Polkowski L., Skowron A.: Rough Sets: A Tutorial, 1998
[19] Kotsiantis S., Kanellopoulos D.: Discretization Techniques: A recent survey, GESTS International Transactions on Computer Science and Engineering, vol. 32, no. 1, 2006
[20] Liu H., Hussain F., Tan C.L., Dash M.: Discretization: An Enabling Technique, Data Mining and Knowledge Discovery, vol. 6, 2002
[21] Liu H., Setiono R.: Feature selection via discretization, IEEE Transactions on Knowledge and Data Engineering, vol. 9, 1997
[22] Łuba T.: Synteza układów logicznych, Oficyna Wydawnicza Politechniki Warszawskiej, Warszawa, 2005
[23] Łuba T. et al.: Rola i znaczenie syntezy logicznej w eksploracji danych dla potrzeb telekomunikacji i medycyny, Przegląd Telekomunikacyjny i Wiadomości Telekomunikacyjne, no. 5, 2014
[24] Mańkowski M., Łuba T., Borowik G., Jankowski C.: Indukcja reguł decyzyjnych z dwustopniowym procesem selekcji reguł, Przegląd Telekomunikacyjny i Wiadomości Telekomunikacyjne, no. 7, 2014
[25] Nguyen H.S., Nguyen S.H.: Discretization methods in data mining, Rough Sets in Knowledge Discovery, Physica-Verlag, Heidelberg, 1998
[26] Nguyen H.S.: Approximate Boolean Reasoning: Foundations and Applications in Data Mining, Lecture Notes in Computer Science, vol. 4100, 2006
[27] Nguyen H.S.: Discretization of Real Value Attributes: A Boolean Reasoning Approach, PhD thesis, University of Warsaw, 1997
[28] Othman M.F., Yau T.M.S.: Comparison of Different Classification Techniques Using WEKA for Breast Cancer, 3rd Kuala Lumpur International Conference on Biomedical Engineering 2006, IFMBE Proceedings, vol. 15, 2007
[29] Pawlak Z., Skowron A.: Rough sets and Boolean reasoning, Information Sciences, vol. 177, no. 1, 2007
[30] Peng L., Qing W., Yujia G.: Study on Comparison of Discretization Methods, International Conference on Artificial Intelligence and Computational Intelligence, 2009
[31] Pyle D.: Data Preparation for Data Mining, Morgan Kaufmann Publishers, Los Altos, California, 1999
[32] Viera A.J., Garrett J.M.: Understanding interobserver agreement: the kappa statistic, Family Medicine, vol. 37, no. 5, 2005
[33] Weiss G.: Data Mining in the Telecommunications Industry, Encyclopedia of Data Warehousing and Mining, Second Edition, Chapter 76, 2009
[34] Zadnik M., Michlovsky Z.: Is Spam Visible in Flow-Level Statistics?, CESNET National Research and Education Network, Technical Report 6/2008
[35] Zakrzewicz M.: Data Mining i odkrywanie wiedzy w bazach danych, Proceedings of the PLOUG'97 conference, 1997

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-e84a926c-3e19-4b8e-b4db-ae3270c25753