Can Unlabelled Data Improve AI Applications? A Comparative Study on Self-Supervised Learning in Computer Vision

Bauer, Markus; Augenstein, Christoph

doi:10.15439/2023F8371

Article - details

Article title

Can Unlabelled Data Improve AI Applications? A Comparative Study on Self-Supervised Learning in Computer Vision

Authors

Bauer Markus, Augenstein Christoph

Selected full texts from this journal

http://annals-csis.org

Identifiers

DOI

10.15439/2023F8371

Title variants

Publication languages

Abstracts

Artificial Intelligence (AI) is currently a highly investigated area of study and has already become an indispensable component of a wide range of business models and applications. One major downside of current supervised AI approaches is the need for numerous annotated data points to train the models. Self-supervised learning (SSL) circumvents the need for annotation by creating supervision signals, such as labels, from the data itself rather than requiring experts for this task. Current approaches mainly use generative methods, such as autoencoders, and joint embedding architectures to fulfil this task. Recent works report results comparable to supervised learning in downstream scenarios such as classification after SSL pretraining. To achieve this, modifications are typically required to adapt the approach to the specific downstream task. Yet current review works have paid little attention to the practical implications of using SSL. Thus, we investigated and implemented popular SSL approaches suitable for downstream tasks such as classification, drawn from an initial collection of more than 400 papers. We evaluate a selection of these approaches under real-world dataset conditions and in direct comparison to the supervised learning scenario. We conclude that SSL has the potential to catch up with supervised learning if the right training methods are identified and applied. Furthermore, we introduce future directions for SSL research, as well as current limitations in real-world applications.
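The idea of deriving labels from the data itself can be illustrated with a rotation-prediction pretext task, one commonly used SSL setup. The following is only a minimal sketch under stated assumptions and not the authors' implementation: images are stood in for by small nested lists, and the names `rotate90` and `make_rotation_pretext` are hypothetical helpers introduced here for illustration.

```python
from typing import List, Tuple

# Hypothetical stand-in for a grayscale image: a 2-D grid of pixel values.
Image = List[List[int]]


def rotate90(img: Image) -> Image:
    """Rotate a 2-D grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def make_rotation_pretext(images: List[Image]) -> List[Tuple[Image, int]]:
    """Turn unlabelled images into (rotated image, rotation index) pairs.

    The label k in {0, 1, 2, 3} means "rotated by k * 90 degrees". It is
    derived from the data itself, so no human annotation is involved.
    """
    samples: List[Tuple[Image, int]] = []
    for img in images:
        rotated = img
        for k in range(4):
            samples.append((rotated, k))
            rotated = rotate90(rotated)
    return samples


# Illustrative input only: one tiny "image" without any label.
unlabelled = [[[1, 2], [3, 4]]]
pretext = make_rotation_pretext(unlabelled)
```

A classifier trained to predict `k` from the rotated image would then serve as the pretrained model for a downstream task; the training step itself is outside the scope of this sketch.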

Keywords

training, computer vision, computational modeling, supervised learning, focusing, self-supervised learning, machine learning

Publisher

Polskie Towarzystwo Informatyczne

Journal

Annals of Computer Science and Information Systems

Year

2023

Volume

Vol. 35

Pages

93–101

Physical description

Bibliography: 54 items, figures, tables.

Creators

author

Bauer Markus

bauer@wifa.uni-leipzig.de

Center for Scalable Data Analytics and Artificial Intelligence, Humboldtstraße 25, 04105 Leipzig, Germany

author

Augenstein Christoph

augenstein@wifa.uni-leipzig.de

Center for Scalable Data Analytics and Artificial Intelligence, Humboldtstraße 25, 04105 Leipzig, Germany

Notes

1. Main Track Regular Papers

2. Record prepared with funds from MEiN, agreement no. SONP/SP/546092/2022, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science) - module: Popularisation of science and promotion of sport (2024).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-6d53b2a4-20be-405c-9cfe-328f757c31ef