

Article title

Enhancing naive classifier for positive unlabeled data based on logistic regression approach

Identifiers
Title variants
Publication languages
EN
Abstracts
EN
It is argued that for the analysis of Positive Unlabeled (PU) data under the Selected Completely At Random (SCAR) assumption it is fruitful to view the problem as the fitting of a misspecified model to the data. Namely, it is shown that results on misspecified fits imply that, when the posterior probability of the response is modelled by logistic regression, fitting logistic regression to the observable PU data, which does not follow this model, still yields a vector of estimated parameters that is approximately collinear with the true parameter vector. This observation, together with choosing the intercept of the classifier by optimising an analogue of the F1 measure, yields a classifier which performs on par with or better than its competitors on several real data sets considered.
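A minimal sketch of the procedure described in the abstract is given below; it is not the authors' implementation. It assumes that ordinary logistic regression (scikit-learn) is fitted to the observed PU labels s (1 = labeled positive, 0 = unlabeled), and that only the decision threshold on the linear score (equivalently, the intercept) is then tuned by maximising a PU-estimable analogue of the F1 measure. The Lee-Liu criterion recall^2 / Pr(predicted positive) is used here as that analogue, and the threshold grid is an arbitrary illustrative choice.

import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_pu_classifier(X, s, n_thresholds=200):
    """Fit logistic regression to PU labels and tune the decision threshold.

    X : feature matrix, shape (n_samples, n_features)
    s : observed PU labels, 1 = labeled positive, 0 = unlabeled
    """
    s = np.asarray(s)

    # Step 1: naive fit to the observed PU labels. By the misspecification
    # argument the fitted coefficient vector is approximately collinear with
    # the true one, so only the intercept/threshold needs correction.
    clf = LogisticRegression(max_iter=1000).fit(X, s)
    scores = clf.decision_function(X)          # linear scores w^T x + b

    # Step 2: scan thresholds and keep the one maximising the PU analogue of
    # F1: recall(t)^2 / Pr(score > t), both estimable without true labels.
    labeled = s == 1
    best_t, best_val = None, -np.inf
    for t in np.quantile(scores, np.linspace(0.01, 0.99, n_thresholds)):
        pred = scores > t
        recall = pred[labeled].mean()          # recall on labeled positives
        pred_rate = pred.mean()                # Pr(predicted positive)
        if pred_rate > 0 and recall ** 2 / pred_rate > best_val:
            best_t, best_val = t, recall ** 2 / pred_rate
    return clf, best_t


# Usage on new data: predict positive when clf.decision_function(X_new) > best_t.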
Year
Volume
Pages
225--233
Physical description
Bibliography: 14 items, formulas, tables
Authors
  • Warsaw University of Technology, Faculty of Mathematics and Information Science, Koszykowa 75, 00-662 Warsaw, Poland
  • Institute of Computer Science, Polish Academy of Sciences, Jana Kazimierza 5, 01-248 Warsaw, Poland
  • Warsaw University of Technology, Faculty of Mathematics and Information Science, Koszykowa 75, 00-662 Warsaw, Poland
Bibliography
  • 1. J. Bekker and J. Davis. Learning from positive and unlabeled data: a survey. Machine Learning, 109(4):719–760, April 2020. http://dx.doi.org/10.1007/S10994-020-05877-5.
  • 2. J. Bekker and J. Davis. Estimating the class prior in positive and unlabeled data through decision tree induction. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1):2712–2719, April 2018. https://doi.org/10.1609/aaai.v32i1.11715.
  • 3. T. Cover and J. Thomas. Elements of Information Theory. Wiley, New York, NY, 1991. http://dx.doi.org/10.1002/047174882X.
  • 4. C. Elkan and K. Noto. Learning classifiers from only positive and unlabeled data. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 213–220, August 2008. http://dx.doi.org/10.1145/1401890.1401920.
  • 5. E. Fowlkes and C. Mallows. A method for comparing two hierarchical clusterings. Journal of the American Statistical Association, 78:573–586, 1981. https://doi.org/10.2307/2288117.
  • 6. M. Łazecka, J. Mielniczuk, and P. Teisseyre. Estimating the class prior for positive and unlabelled data via logistic regression. Advances in Data Analysis and Classification, 15(4):1039–1068, June 2021. http://dx.doi.org/10.1007/S11634-021-00444-9.
  • 7. W. Lee and B. Liu. Learning with positive and unlabeled examples using weighted logistic regression. In Proceedings of the Twentieth International Conference on Machine Learning, ICML ’03, pages 448–455, San Francisco, CA, USA, 2003. Morgan Kaufmann Publishers Inc.
  • 8. K-C. Li and N. Duan. Regression analysis under link violation. The Annals of Statistics, 17(3):1009–1052, 1989. http://dx.doi.org/10.1214/aos/1176347254.
  • 9. P. Ruud. Sufficient conditions for the consistency of maximum likelihood estimation despite misspecification of distribution in multinomial discrete choice models. Econometrica, 51:225–228, 1983. http://dx.doi.org/10.2307/1912257.
  • 10. S. Tabatabaei, J. Klein, and M. Hoogendoorn. Estimating the F1 score for learning from positive and unlabeled examples. In LOD 2020. Springer, Cham, 2020. https://doi.org/10.1007/978-3-030-64583-0_15.
  • 11. P. Teisseyre, J. Mielniczuk, and M. Łazecka. Different strategies of fitting logistic regression for positive and unlabeled data. In Proceedings of the International Conference on Computational Science ICCS’20, pages 3–17, Cham, 2020. Springer International Publishing. https://doi.org/10.1007/978-3-030-50423-6_1.
  • 12. Q. Vuong. Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, 57:307–333, 1989. https://doi.org/10.2307/1912557.
  • 13. A. Wawrzenczyk and J. Mielniczuk. Strategies for fitting logistic regression for positive and unlabeled data revisited. Int. J. Appl. Math. Comp. Sci., pages 299–309, 2022. https://doi.org/10.34768/amcs-2022-0022.
  • 14. H. White. Maximum likelihood estimation of misspecified models. Econometrica, 50(1):1–25, 1982. https://doi.org/10.2307/1912526.
Notes
1. Main Track Regular Papers
2. Record developed with funds from the Ministry of Education and Science (MEiN), agreement no. SONP/SP/546092/2022, under the programme "Społeczna odpowiedzialność nauki" ("Social Responsibility of Science") - module: popularisation of science and promotion of sport (2024).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-e1364cc5-43e1-48dd-a218-26bca656a441