Article title

Alternatives for Greedy Discrete Subsampling: Various Approaches Including Cluster Subsampling of COVID-19 Data With No Response Variable

Conference
16th Federated Conference on Computer Science and Information Systems (2-5 September 2021, online)
Publication languages
EN
Abstracts
EN
An exhaustive selection over all possible combinations of n = 400 out of N = 698 observations of a COVID-19 dataset was used as a benchmark. Building a random set of subsamples and choosing the one that minimized the averaged sum of squares of each variable's category frequencies returned results similar to those of a "forward" subselection, which reduces the dataset one observation at a time so that the same metric keeps decreasing. Both approaches performed similarly to k-means clustering (with a random number of clusters) of the original dataset's observations, followed by drawing a subsample from each cluster proportionally to its size. However, the approaches differ significantly in asymptotic time complexity.
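The first two approaches named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`balance_metric`, `best_random_subsample`, `forward_subselection`) are hypothetical, the metric is assumed to use relative category frequencies, and the forward step is assumed to drop, at each iteration, the observation whose removal yields the lowest metric.

```python
import random
from collections import Counter


def balance_metric(rows):
    """Average, over variables, of the sum of squared relative category
    frequencies (assumed reading of the abstract's metric). Lower values
    mean the categories are more evenly represented."""
    n = len(rows)
    n_vars = len(rows[0])
    total = 0.0
    for j in range(n_vars):
        counts = Counter(row[j] for row in rows)
        total += sum((c / n) ** 2 for c in counts.values())
    return total / n_vars


def best_random_subsample(rows, n, n_draws, seed=0):
    """Draw n_draws random subsamples of size n and keep the one with the
    lowest metric."""
    rng = random.Random(seed)
    best_idx, best_score = None, float("inf")
    for _ in range(n_draws):
        idx = rng.sample(range(len(rows)), n)
        score = balance_metric([rows[i] for i in idx])
        if score < best_score:
            best_idx, best_score = sorted(idx), score
    return best_idx, best_score


def forward_subselection(rows, n):
    """Remove one observation at a time, each time dropping the one whose
    removal gives the lowest metric, until n observations remain
    (assumed greedy rule)."""
    keep = list(range(len(rows)))
    while len(keep) > n:
        drop = min(
            keep,
            key=lambda i: balance_metric([rows[k] for k in keep if k != i]),
        )
        keep.remove(drop)
    return keep, balance_metric([rows[i] for i in keep])


if __name__ == "__main__":
    # Toy categorical data (hypothetical, not the COVID-19 dataset).
    data = [("a", "x")] * 6 + [("b", "y")] * 2
    print(best_random_subsample(data, n=4, n_draws=200))
    print(forward_subselection(data, n=4))
```

In this sketch each metric evaluation scans every row and variable, so the random search costs time proportional to the number of draws, while the greedy removal re-evaluates the metric for every remaining candidate at every step and grows much faster with N. This is in line with the abstract's remark that the approaches differ in asymptotic time complexity.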
Pages
103--111
Physical description
Bibliography: 15 items, formulas, charts, illustrations.
Authors
  • Department of Statistics and Probability, Faculty of Informatics and Statistics, University of Economics, nám. W. Churchilla 4, 130 67 Prague, Czech Republic
  • Institute of Biophysics and Informatics, First Faculty of Medicine Charles University, Salmovská 1, Prague, Czech Republic
  • Department of Statistics and Probability, Faculty of Informatics and Statistics, University of Economics, nám. W. Churchilla 4, 130 67 Prague, Czech Republic
author
  • Department of Statistics and Probability, Faculty of Informatics and Statistics, University of Economics, nám. W. Churchilla 4, 130 67 Prague, Czech Republic
author
  • Department of Statistics and Probability, Faculty of Informatics and Statistics, University of Economics, nám. W. Churchilla 4, 130 67 Prague, Czech Republic
Bibliography
  • 1. Peter C. Austin. “An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies”. In: Multivariate Behavioral Research 46.3 (May 2011), pp. 399–424. URL: https://doi.org/10.1080/00273171.2011.568786.
  • 2. Santhosh Pathical and Gursel Serpen. “Comparison of subsampling techniques for random subspace ensembles”. In: 2010 International Conference on Machine Learning and Cybernetics. IEEE, July 2010. URL: http://dx.doi.org/10.1109/icmlc.2010.5581032.
  • 3. Elizabeth A. Stuart. “Matching Methods for Causal Inference: A Review and a Look Forward”. In: Statistical Science 25.1 (Feb. 2010). URL: https://doi.org/10.1214/09-sts313.
  • 4. Sarda Sahney, Michael J. Benton, and Paul A. Ferry. “Links between global taxonomic diversity, ecological diversity and the expansion of vertebrates on land”. In: Biology Letters 6.4 (Jan. 2010), pp. 544–547. DOI: 10.1098/rsbl.2009.1024. URL: https://doi.org/10.1098/rsbl.2009.1024.
  • 5. David MacKay. Information theory, inference, and learning algorithms. Cambridge, UK New York: Cambridge University Press, 2003. ISBN: 0-521-64298-1.
  • 6. Lubomír Štěpánek, Filip Habarta, Ivana Malá, et al. “Analysis of asymptotic time complexity of an assumption-free alternative to the log-rank test”. In: Proceedings of the 2020 Federated Conference on Computer Science and Information Systems. IEEE, Sept. 2020. URL: https://doi.org/10.15439/2020f198.
  • 7. Malay K. Pakhira. “A Linear Time-Complexity k-Means Algorithm Using Cluster Shifting”. In: 2014 International Conference on Computational Intelligence and Communication Networks. IEEE, Nov. 2014. URL: https://doi.org/10.1109/cicn.2014.220.
  • 8. J. C. Gower. “A General Coefficient of Similarity and Some of Its Properties”. In: Biometrics 27.4 (Dec. 1971), p. 857. URL: https://doi.org/10.2307/2528823.
  • 9. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2017. URL: https://www.R-project.org/.
  • 10. Lubomír Štěpánek, Pavel Kasal, and Jan Měšt’ák. “Evaluation of facial attractiveness for purposes of plastic surgery using machine-learning methods and image analysis”. In: 2018 IEEE 20th International Conference on e-Health Networking, Applications and Services (Healthcom). IEEE, Sept. 2018. DOI: 10.1109/healthcom.2018.8531195. URL: https://doi.org/10.1109/healthcom.2018.8531195.
  • 11. Lubomír Štěpánek, Pavel Kasal, and Jan Měšt’ák. “Machine-learning at the service of plastic surgery: a case study evaluating facial attractiveness and emotions using R language”. In: Proceedings of the 2019 Federated Conference on Computer Science and Information Systems. IEEE, Sept. 2019. URL: https://doi.org/10.15439/2019f264.
  • 12. Lubomír Štěpánek, Pavel Kasal, and Jan Měšt’ák. “Machine-Learning and R in Plastic Surgery – Evaluation of Facial Attractiveness and Classification of Facial Emotions”. In: Advances in Intelligent Systems and Computing. Springer International Publishing, Sept. 2019, pp. 243–252. URL: https://doi.org/10.1007/978-3-030-30604-5_22.
  • 13. Lubomír Štěpánek, Pavel Kasal, and Jan Měšt’ák. “Machine-learning at the service of plastic surgery: a case study evaluating facial attractiveness and emotions using R language”. In: Proceedings of the 2019 Federated Conference on Computer Science and Information Systems. IEEE, Sept. 2019. URL: https://doi.org/10.15439/2019f264.
  • 14. Lubomír Štěpánek, Pavel Kasal, and Jan Měšt’ák. “Evaluation of Facial Attractiveness after Undergoing Rhinoplasty Using Tree-based and Regression Methods”. In: 2019 E-Health and Bioengineering Conference (EHB). IEEE, Nov. 2019. URL: https://doi.org/10.1109/ehb47216.2019.8969932.
  • 15. Lubomír Štěpánek, Filip Habarta, Ivana Malá, et al. “A Machine-learning Approach to Survival Time-event Predicting: Initial Analyses using Stomach Cancer Data”. In: 2020 International Conference on e-Health and Bioengineering (EHB). IEEE, Oct. 2020. URL: https://doi.org/10.1109/ehb50910.2020.9280301.
Notes
1. This paper is supported by the grant OP VVV IGA/A, CZ.02.2.69/0.0/0.0/19_073/0016936 with no. 18/2021, which has been provided by the Internal Grant Agency of the Prague University of Economics and Business.
2. Preface
3. Session: 14th International Workshop on Computational Optimization
4. Communication Papers
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-14854c1a-0802-41e4-a816-5b65d5ba916e