Article title

Revisiting strategies for fitting logistic regression for positive and unlabeled data

Publication language
EN
Abstract (EN)
Positive unlabeled (PU) learning is an important problem motivated by the occurrence of this type of partial observability in many applications. The present paper reconsiders recent advances in the parametric modeling of PU data based on empirical likelihood maximization and argues that they can be significantly improved. The proposed approach is based on the fact that the likelihood for the logistic fit and an unknown labeling frequency can be expressed as the sum of a convex and a concave function, and this decomposition is given explicitly. This allows methods such as the concave-convex procedure (CCCP) or its variant, the disciplined convex-concave procedure (DCCP), to be applied. We show by analyzing real data sets that solving the optimization problem with the DCCP yields significant improvements in posterior probability and label frequency estimation over the best available competitors.
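To make the abstract's idea concrete, below is a minimal numpy/scipy sketch of a CCCP loop [17] for PU logistic regression under the SCAR model P(s=1 | x) = c·σ(xᵀβ), where c is the unknown label frequency. The convex-concave split used here, obtained by reparameterizing γ = log(1−c), is one natural choice and is not necessarily the explicit decomposition derived in the paper; the function name cccp_pu_logistic and all defaults are illustrative.

```python
# Illustrative CCCP sketch for PU logistic regression under SCAR:
#   P(s=1 | x) = c * sigmoid(x @ beta), label frequency c unknown.
# With gamma = log(1 - c), the observed log-likelihood splits into
#   concave part: sum_{s=1}[log(1-e^gamma) + x@beta] - sum_all log(1+e^{x@beta})
#   convex part:  sum_{s=0} log(1 + e^{x@beta + gamma})
# CCCP repeatedly maximizes the concave part plus a tangent of the convex part.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def cccp_pu_logistic(X, s, n_iter=50, tol=1e-6):
    """Return (beta, c) maximizing the PU likelihood via a CCCP loop.

    X : (n, d) design matrix (add an intercept column yourself),
    s : (n,) array of 0/1 observed labels; c = P(s=1 | y=1).
    """
    n, d = X.shape
    pos, unl = s == 1, s == 0
    theta = np.append(np.zeros(d), np.log(0.5))   # theta = (beta, gamma)

    def neg_surrogate(theta, grad_B):
        beta, gamma = theta[:-1], theta[-1]
        t = X @ beta
        concave = (pos.sum() * np.log1p(-np.exp(gamma))   # log(c) terms
                   + t[pos].sum()
                   - np.logaddexp(0.0, t).sum())
        # linearization (tangent) of the convex part at the previous iterate
        return -(concave + grad_B @ theta)

    for _ in range(n_iter):
        # gradient of the convex part sum_{s=0} log(1 + e^{t + gamma})
        w = expit(X[unl] @ theta[:-1] + theta[-1])
        grad_B = np.append(X[unl].T @ w, w.sum())
        res = minimize(neg_surrogate, theta, args=(grad_B,),
                       method="L-BFGS-B",
                       bounds=[(None, None)] * d + [(None, -1e-8)])
        if np.linalg.norm(res.x - theta) < tol:
            theta = res.x
            break
        theta = res.x
    beta, gamma = theta[:-1], theta[-1]
    return beta, 1.0 - np.exp(gamma)              # c = 1 - e^gamma
```

Since each tangent minorizes the convex part, every iteration is a minorize-maximize step and the likelihood increases monotonically. The paper itself solves the problem with DCCP [13], for which the cvxpy-based dccp package could be used in place of this hand-rolled loop.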
Pages
299–309
Physical description
Bibliography: 17 items, tables, graphs.
Authors
  • Institute of Computer Science, Polish Academy of Sciences, Jana Kazimierza 5, 01-248 Warsaw, Poland
  • Institute of Computer Science, Polish Academy of Sciences, Jana Kazimierza 5, 01-248 Warsaw, Poland
  • Faculty of Mathematics and Information Sciences, Warsaw University of Technology, Koszykowa 5, 00-662 Warsaw, Poland
Bibliography
  • [1] Bahorik, A.L., Newhill, C.E., Queen, C.C. and Eack, S.M. (2014). Under-reporting of drug use among individuals with schizophrenia: Prevalence and predictors, Psychological Medicine 44(12): 61–69, DOI: 10.1017/S0033291713000548.
  • [2] Bekker, J. and Davis, J. (2018). Estimating the class prior in positive and unlabeled data through decision tree induction, Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, USA 32(1): 2712–2719.
  • [3] Bekker, J. and Davis, J. (2020). Learning from positive and unlabeled data: A survey, Machine Learning 109(4): 719–760, DOI: 10.1007/s10994-020-05877-5.
  • [4] Bekker, J., Robberechts, P. and Davis, J. (2019). Beyond the selected completely at random assumption for learning from positive and unlabeled data, in U. Brefeld et al. (Eds), Proceedings of the 2019 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Springer, Cham, pp. 71–85, DOI: 10.1007/978-3-030-46147-8_5.
  • [5] Cover, T. and Thomas, J. (1991). Elements of Information Theory, Wiley, New York, DOI: 10.1002/047174882X.
  • [6] Elkan, C. and Noto, K. (2008). Learning classifiers from only positive and unlabeled data, Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, USA, pp. 213–220, DOI: 10.1145/1401890.1401920.
  • [7] Łazęcka, M., Mielniczuk, J. and Teisseyre, P. (2021). Estimating the class prior for positive and unlabelled data via logistic regression, Advances in Data Analysis and Classification 15(4): 1039–1068, DOI: 10.1007/s11634-021-00444-9.
  • [8] Lipp, T. and Boyd, S. (2016). Variations and extension of the convex-concave procedure, Optimization and Engineering 17(2): 263–287, DOI: 10.1007/s11081-015-9294-x.
  • [9] Liu, B., Dai, Y., Li, X., Lee, W.S. and Yu, P.S. (2003). Building text classifiers using positive and unlabeled examples, Proceedings of the 3rd IEEE International Conference on Data Mining, ICDM’03, Melbourne, USA, pp. 179–186, DOI: 10.1109/ICDM.2003.1250918.
  • [10] Na, B., Kim, H., Song, K., Joo, W., Kim, Y.-Y. and Moon, I.-C. (2020). Deep generative positive-unlabeled learning under selection bias, Proceedings of the 29th ACM International Conference on Information and Knowledge Management, CIKM’20, Virtual Event, Ireland, pp. 1155–1164, DOI: 10.1145/3340531.3411971.
  • [11] Scott, C., Blanchard, G. and Handy, G. (2013). Classification with asymmetric label noise: Consistency and maximal denoising, Proceedings of Machine Learning Research 30(2013): 1–23.
  • [12] Sechidis, K., Sperrin, M., Petherick, E.S., Luján, M. and Brown, G. (2017). Dealing with under-reported variables: An information theoretic solution, International Journal of Approximate Reasoning 85(1): 159–177, DOI: 10.1016/j.ijar.2017.04.002.
  • [13] Shen, X., Diamond, S., Gu, Y. and Boyd, S. (2016). Disciplined convex-concave programming, Proceedings of the 2016 IEEE 55th Conference on Decision and Control (CDC), Las Vegas, USA, pp. 1009–1014, DOI: 10.1109/CDC.2016.7798400.
  • [14] Teisseyre, P., Mielniczuk, J. and Łazęcka, M. (2020). Different strategies of fitting logistic regression for positive and unlabelled data, in V.V. Krzhizhanovskaya et al. (Eds), Proceedings of the International Conference on Computational Science ICCS’20, Springer International Publishing, Cham, pp. 3–17, DOI: 10.1007/978-3-030-50423-6_1.
  • [15] Ward, G., Hastie, T., Barry, S., Elith, J. and Leathwick, J. (2009). Presence-only data and the EM algorithm, Biometrics 65(2): 554–563, DOI: 10.1111/j.1541-0420.2008.01116.x.
  • [16] Yang, P., Li, X., Chua, H., Kwoh, C. and Ng, S. (2014). Ensemble positive unlabeled learning for disease gene identification, PLOS ONE 9(5): 1–11, DOI: 10.1371/journal.pone.0097079.
  • [17] Yuille, A. and Rangarajan, A. (2003). The concave-convex procedure, Neural Computation 15(4): 915–936, DOI: 10.1162/08997660360581958.
YADDA identifier
bwmeta1.element.baztech-fe693536-b40e-4d72-9273-823941275b32