Learning abstract visual reasoning via task decomposition: A case study in Raven progressive matrices

Kwiatkowski, Jakub; Krawiec, Krzysztof

doi:10.61822/amcs-2024-0022

Artykuł - szczegóły

Tytuł artykułu

Learning abstract visual reasoning via task decomposition: A case study in Raven progressive matrices

Autorzy

Kwiatkowski Jakub , Krawiec Krzysztof

Treść / Zawartość

Pełne teksty:

10_kwiatkowski_krawiec_learning_abstract_visual_reasoning_via_task_decomposition_2024_2.pdf

Pobierz

Identyfikatory

DOI

10.61822/amcs-2024-0022

Warianty tytułu

Języki publikacji

Abstrakty

Learning to perform abstract reasoning often requires decomposing the task in question into intermediate subgoals that are not specified upfront, but need to be autonomously devised by the learner. In Raven progressive matrices (RPMs), the task is to choose one of the available answers given a context, where both the context and answers are composite images featuring multiple objects in various spatial arrangements. As this high-level goal is the only guidance available, learning to solve RPMs is challenging. In this study, we propose a deep learning architecture based on the transformer blueprint which, rather than directly making the above choice, addresses the subgoal of predicting the visual properties of individual objects and their arrangements. The multidimensional predictions obtained in this way are then directly juxtaposed to choose the answer. We consider a few ways in which the model parses the visual input into tokens and several regimes of masking parts of the input in self-supervised training. In experimental assessment, the models not only outperform state-of-the-art methods but also provide interesting insights and partial explanations about the inference. The design of the method also makes it immune to biases that are known to be present in some RPM benchmarks.

Słowa kluczowe

abstract visual reasoning Raven progressive matrices machine learning problem decomposition

abstrakcyjne rozumowanie wizualne uczenie maszynowe rozkład problemu

Wydawca

Oficyna Wydawnicza Uniwersytetu Zielonogórskiego

Czasopismo

International Journal of Applied Mathematics and Computer Science

Rocznik

2024

Tom

Vol. 34, no. 2

Strony

309--321

Opis fizyczny

Bibliogr. 24 poz., rys., tab.

Twórcy

autor

Kwiatkowski Jakub

jakub.k.kwiatkowski@doctorate.put.poznan.pl

Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznań, Poland

autor

Krawiec Krzysztof

krawiec@cs.put.poznan.pl

Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznań, Poland

Bibliografia

[1] Barrett, D., Hill, F., Santoro, A., Morcos, A. and Lillicrap, T. (2018). Measuring abstract reasoning in neural networks, in J. Dy and A. Krause (Eds), Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research, Vol. 80, PMLR, Cambridge, pp. 511-520.
[2] Benny, Y., Pekar, N. and Wolf, L. (2021). Scale-localized abstract reasoning, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, USA, pp. 12557-12565.
[3] Bongard, M. (1970). Pattern Recognition, Spartan Books, Baltimore.
[4] Defays, D. (1995). Numbo: A study in cognition and recognition, https://www.researchgate.net/publication/262363566_Numbo_a_study_in_cognition_and_recognition.
[5] Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K. and Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database, 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, USA, pp. 248-255.
[6] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J. and Houlsby, N. (2020). An image is worth 16x16 words: Transformers for image recognition at scale, arXiv: 2010.11929.
[7] Hahne, L., Lüddecke, T., Wörgötter, F. and Kappel, D. (2019). Attention on abstract visual reasoning, CoRR: abs/1911.05990.
[8] Hofstadter, D.R. (1995). Fluid Concepts & Creative Analogies: Computer Models of the Fundamental Mechanisms of Thought, Basic Books, New York.
[9] Hu, S., Ma, Y., Liu, X., Wei, Y. and Bai, S. (2020). Hierarchical rule induction network for abstract visual reasoning, https://www.researchgate.net/publication/339324056_Hierarchical_Rule_Induction_Network_for_Abstract_Visual_Reasoning.
[10] Hu, S., Ma, Y., Liu, X., Wei, Y. and Bai, S. (2021). Stratified rule-aware network for abstract visual reasoning, Proceedings of the AAAI Conference on Artificial Intelligence, pp. 1567-1574, (virtual).
[11] Kim, Y., Shin, J., Yang, E. and Hwang, S.J. (2020). Few-shot visual reasoning with meta-analogical contrastive learning, in H. Larochelle et al. (Eds), Advances in Neural Information Processing Systems, Vol. 33, Curran Associates, Inc., Red Hook, pp. 16846-16856.
[12] Lei Ba, J., Kiros, J.R. and Hinton, G.E. (2016). Layer normalization, arXiv: 1607.06450.
[13] Luo, W., Li, Y., Urtasun, R. and Zemel, R. (2017). Understanding the effective receptive field in deep convolutional neural networks, arXiv: 1701.04128.
[14] Małkiński, M. and Mańdziuk, J. (2022a). Deep learning methods for abstract visual reasoning: A survey on Raven’s progressive matrices, arXiv: 2201.12382.
[15] Małkiński, M. and Mańdziuk, J. (2022b). Multi-label contrastive learning for abstract visual reasoning, IEEE Transactions on Neural Networks and Learning Systems 35(2): 1941-1953, DOI: 10.1109/TNNLS.2022.3185949.
[16] Raven, J.C. (1936). Mental Tests Used in Genetic, the Performance of Related Individuals on Tests Mainly Educative and Mainly Reproductive, MSc thesis, University of London, London.
[17] Spratley, S., Ehinger, K. and Miller, T. (2020). A closer look at generalisation in Raven, Computer Vision, ECCV 2020: 16th European Conference, Glasgow, UK, pp. 601-616, DOI: 10.1007/978-3-030-58583-9_36.
[18] Tan, M. and Le, Q. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks, in K. Chaudhuri and R. Salakhutdinov (Eds), Proceedings of the 36th International Conference on Machine Learning, Proceedings of Machine Learning Research, Vol. 97, PMLR, Cambridge, pp. 6105-6114.
[19] Tan, M. and Le, Q.V. (2021). EfficientNetV2: Smaller models and faster training, in M. Meila and T. Zhang (Eds), Proceedings of the 38th International Conference on Machine Learning, ICML 2021, Proceedings of Machine Learning Research, Vol. 139, PMLR, Cambrige, pp. 10096-10106.
[20] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L. and Polosukhin, I. (2017). Attention is all you need, in I. Guyon et al. (Eds), Advances in Neural Information Processing Systems, Vol. 30, Curran Associates, Inc., Red Hook.
[21] Wu, Y., Dong, H., Grosse, R.B. and Ba, J. (2020). The scattering compositional learner: Discovering objects, attributes, relationships in analogical reasoning, CoRR: abs/2007.04212.
[22] Zhang, C., Gao, F., Jia, B., Zhu, Y. and Zhu, S.-C. (2019a). Raven: A dataset for relational and analogical visual reasoning, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, USA, pp. 5312-5322.
[23] Zhang, C., Jia, B., Gao, F., Zhu, Y., Lu, H. and Zhu, S.-C. (2019b). Learning perceptual inference by contrasting, in H. Wallach et al. (Eds), Advances in Neural Information Processing Systems, Vol. 32, Curran Associates, Inc., Red Hook.
[24] Zhuo, T. and Kankanhalli, M.S. (2021). Effective abstract reasoning with dual-contrast network, 9th International Conference on Learning Representations, ICLR 2021, (virtual).

Typ dokumentu

Bibliografia

Identyfikator YADDA

bwmeta1.element.baztech-a85010dd-72cd-4d78-b58e-76ad0c8fac7d