Identifiers
Title variants
Comparison of data discretization methods in learning probabilistic models (Polish: Porównanie metod dyskretyzacji danych w uczeniu modeli probabilistycznych)
Publication languages
Abstracts
Many statistical methods and machine learning algorithms can handle only discrete attributes, which is why discretization of numerical data is an important part of pre-processing. This paper presents results on the problem of data discretization in learning the quantitative part of probabilistic models. Four data sets taken from the UCI Machine Learning Repository were used to learn the quantitative part of Bayesian networks. The continuous variables were discretized using two supervised and two unsupervised discretization methods. The main goal of this paper was to study whether the discretization method applied to a given data set influences the model's reliability. Accuracy was defined as the percentage of correctly classified records.
Machine learning algorithms are very often not designed to work with continuous variables. For this reason, data discretization is an important part of pre-processing. The article presents the results of work on the problem of data discretization in learning probabilistic models. Four data sets downloaded from the UCI machine learning repository were used to learn the parameters of the quantitative part of Bayesian networks. The continuous variables present in the selected data sets were discretized using two supervised and two unsupervised methods. The main goal of this article was to examine whether the discretization method applied to a given data set affects the reliability of the model. The accuracy of the methods was defined as the percentage of correctly classified records.
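The record does not name the four discretization methods that were compared. As a hedged illustration only, the sketch below implements two widely used unsupervised methods, equal-width and equal-frequency binning, together with the accuracy measure defined in the abstract (percentage of correctly classified records). All function names are hypothetical and this is not the authors' code; a supervised method would additionally use the class labels when choosing cut points.

```python
from bisect import bisect_right


def equal_width_cut_points(values, k):
    """Hypothetical helper: k-1 interior cut points splitting [min, max] into k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_cut_points(values, k):
    """Hypothetical helper: up to k-1 cut points so each interval holds roughly len(values)/k values."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for i in range(1, k):
        c = ordered[(i * n) // k]
        # Ties can produce repeated cut points; keep only strictly increasing ones.
        if not cuts or c > cuts[-1]:
            cuts.append(c)
    return cuts


def discretize(values, cuts):
    """Map each value to a bin index 0..len(cuts); a value equal to a cut point goes to the upper bin."""
    return [bisect_right(cuts, v) for v in values]


def accuracy(predicted, actual):
    """Percentage of correctly classified records."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 100.0]  # skewed toy attribute, assumed for illustration

    ew = equal_width_cut_points(data, 2)
    print(ew, discretize(data, ew))  # [50.5] [0, 0, 0, 0, 1]

    ef = equal_frequency_cut_points(data, 2)
    print(ef, discretize(data, ef))  # [3.0] [0, 0, 1, 1, 1]

    print(accuracy(["a", "b", "b", "a"], ["a", "b", "a", "a"]))  # 75.0
```

On skewed data the two unsupervised methods diverge sharply: the single outlier stretches the equal-width interval so that four of five values share one bin, while equal-frequency binning balances the counts. Differences like this are one reason the choice of discretization method can affect the learned conditional probability tables.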
Journal
Year
Volume
Pages
177–192
Physical description
Bibliography: 27 items, figures, tables, charts.
Authors
author
- Faculty of Computer Science, Bialystok University of Technology, Białystok, Poland
author
- Faculty of Computer Science, Bialystok University of Technology, Białystok, Poland
Bibliography
- [1] R. Abraham, J. B. Simha, S. S. Iyengar, A comparative analysis of discretization methods for Medical Datamining with Naïve Bayesian classifier, Information Technology, 2006.
- [2] E. Cantú-Paz, Supervised and unsupervised discretization methods for evolutionary algorithms, In Proc. of the Genetic and Evolutionary Computation Conference, pp. 213–216, 2001.
- [3] J. Dougherty, R. Kohavi, M. Sahami, Supervised and Unsupervised Discretization of Continuous Features, Machine Learning: Proceedings of the Twelfth International Conference, 1995.
- [4] A. Ekbal, Improvement of Prediction Accuracy Using Discretization and Voting Classifier, The 18th International Conference on Pattern Recognition, IEEE, 2006.
- [5] N. Friedman, D. Geiger, M. Goldszmidt, Bayesian network classifiers, Machine Learning 29 (1997), 131–163.
- [6] S. García, J. Luengo, J. A. Sáez, V. López, F. Herrera, Survey of discretization techniques: Taxonomy and empirical analysis in supervised learning, IEEE Transactions on Knowledge and Data Engineering, vol. 25(4), pp. 734–750, 2013.
- [7] M. Hacibeyoglu, M. H. Ibrahim, Comparison of the effect of unsupervised and supervised discretization methods on classification process, International Journal of Intelligent Systems and Applications in Engineering, vol. 4(1), pp. 105–108, 2016.
- [8] F. Hussain, H. Liu, C. L. Tan, M. Dash, Discretization: An enabling technique, Data Mining and Knowledge Discovery, vol. 6, p. 393, 2002.
- [9] F. Kaya, Discretizing Continuous Features for Naive Bayes and C4.5 Classifiers, University of Maryland publications: College Park, MD, USA, 2008.
- [10] R. Kerber, ChiMerge: Discretization of numeric attributes, In Proc. Tenth National Conference on Artificial Intelligence, pp. 123–128, MIT Press, 1992.
- [11] S. Kotsiantis, D. Kanellopoulos, Discretization Techniques: A recent survey, GESTS International Transactions on Computer Science and Engineering, vol. 32(1), 2006.
- [12] K. Lavangnananda, S. Chattanachot, Study of discretization methods in classification, 9th International Conference on Knowledge and Smart Technology, pp. 50–55, IEEE, 2017.
- [13] P. Lehtinen, M. Saarela, T. Elomaa, Online ChiMerge Algorithm, Springer, 2012.
- [14] D. M. Maslove, T. Podchiyska, H. J. Lowe, Discretization of continuous features in clinical datasets, J Am Med Inform Assoc., vol. 20(3), pp. 544–553, 2013.
- [15] I. Mitov, I. Krassimira, M. Krassimir, V. Velychko, P. Stanchev, K. Vanhoof, Comparison of discretization methods for preprocessing data for pyramidal growing network classification method, Information Science & Computing, International Book Series, Number 14, pp. 31–39, 2009.
- [16] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann Publishers, Inc., San Mateo, CA, 1988.
- [17] S. Ramírez-Gallego, S. García, H. Mouriño-Talín, D. Martínez-Rego, V. Bolón-Canedo, A. Alonso-Betanzos, J. M. Benítez, F. Herrera, Data discretization: taxonomy and big data challenge, WIREs Data Mining and Knowledge Discovery, 2015.
- [18] A. Rayner, Discretization Numerical Data for Relational Data with One-to-Many Relations.
- [19] P. Spirtes, C. Glymour, R. Scheines, Causation, Prediction, and Search, Springer-Verlag, New York, 1993.
- [20] C. Zeynel, Y. Figen, Comparison of Chi-square based algorithms for discretization of continuous chicken egg quality traits, Journal of Agricultural Informatics, vol. 8, pp. 13–22, 2017.
- [21] C. Zeynel, Y. Figen, Unsupervised Discretization of Continuous Variables in a Chicken Egg Quality Traits Dataset, Turkish Journal of Agriculture – Food Science and Technology, vol. 5, pp. 315–320, 2017.
- [22] BayesFusion, LLC, [https://www.bayesfusion.com/], Accessed 19-08-2017.
- [23] UCI Repository of machine learning databases, [http://archive.ics.uci.edu/ml/datasets.html], Accessed 05-04-2017.
- [24] V. Lohweg, University of Applied Sciences Ostwestfalen-Lippe, [https://archive.ics.uci.edu/ml/datasets/Banknote+authentication], Accessed 03-07-2017.
- [25] Hungarian Institute of Cardiology, Budapest: Andras Janosi, M.D.; University Hospital, Zurich, Switzerland: William Steinbrunn, M.D.; University Hospital, Basel, Switzerland: Matthias Pfisterer, M.D.; V.A. Medical Center, Long Beach and Cleveland Clinic Foundation: Robert Detrano, M.D., Ph.D., [https://archive.ics.uci.edu/ml/datasets/heart+disease], Accessed 10-07-2017.
- [26] Vision Group, University of Massachusetts, [https://archive.ics.uci.edu/ml/datasets/Statlog+(Image+segmentation)], Accessed 01-06-2017.
- [27] W. J. Nash, T. L. Sellers, S. R. Talbot, A. J. Cawthorn, W. B. Ford, The Population Biology of Abalone (_Haliotis_ species) in Tasmania. I. Blacklip Abalone (_H. rubra_) from the North Coast and Islands of Bass Strait, Sea Fisheries Division, Technical Report No. 48 (ISSN 1034-3288), 1994. [https://archive.ics.uci.edu/ml/datasets/abalone], Accessed 12-07-2017.
Notes
The work was partially carried out as part of research project S/WI/2/2018.
Record prepared under agreement 509/P-DUN/2018 with funds from the Ministry of Science and Higher Education (MNiSW) allocated to science dissemination activities (2019).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-b2923c0e-3c15-4a05-aaa4-89dd19cd0a14