Article title

Feature map augmentation to improve scale invariance in convolutional neural networks

Content
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
Introducing variation in the training dataset through data augmentation is a popular technique for making Convolutional Neural Networks (CNNs) spatially invariant, but it increases dataset volume and computation cost. Instead of augmenting the data, augmentation of feature maps is proposed to introduce variation in the features extracted by a CNN. To achieve this, a rotation transformer layer called the Rotation Invariance Transformer (RiT) is developed, which applies rotation transformations to augment CNN features. The RiT layer can augment the output features of any convolution layer within a CNN, but it is most effective when placed at the output of the final convolution layer. We test RiT on a scale-invariance task, classifying scaled images from benchmark datasets. Our results show promising improvements in the network's ability to be scale invariant whilst keeping the model's computation cost low.
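The abstract gives no implementation details, but the core idea can be sketched in a few lines. The following PyTorch snippet is a minimal illustration, not the paper's actual design: it assumes an RiT-style layer that rotates incoming feature maps by fixed multiples of 90 degrees and concatenates the rotated copies with the originals along the channel axis, placed after the final convolution layer as the abstract recommends. The angle set, the concatenation strategy, and the toy backbone are all illustrative assumptions.

    import torch
    import torch.nn as nn

    class RiT(nn.Module):
        # Illustrative rotation-transformer layer (assumed behaviour):
        # augments feature maps with rotated copies of themselves.
        def __init__(self, angles=(90, 180, 270)):
            super().__init__()
            self.angles = angles  # multiples of 90° rotate losslessly

        def forward(self, x):  # x: (batch, channels, H, W), H == W assumed
            rotated = [torch.rot90(x, k=a // 90, dims=(2, 3)) for a in self.angles]
            # Concatenate the original and rotated maps along the channel axis.
            return torch.cat([x] + rotated, dim=1)

    # Hypothetical placement at the output of the final convolution layer:
    model = nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        RiT(),                         # 64 -> 256 channels (original + 3 rotations)
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(256, 10),            # 10-class classifier head
    )

Restricting the rotations to multiples of 90 degrees avoids interpolation artifacts and adds only a channel-concatenation cost, which is consistent with the abstract's claim of low computation overhead.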
Year
Pages
51–74
Physical description
Bibliography: 41 items, figures
Authors
author
  • School of Technology, Engineering, Mathematics and Physics, University of the South Pacific, Laucala Bay Road, Suva, Fiji
  • Faculty of Science and Technology, University of Canberra, Canberra, ACT, 2617, Australia
Bibliography
  • [1] J. DiCarlo, D. Zoccolan, and N. C. Rust, How does the brain solve visual object recognition? Neuron, vol. 73, pp. 415–434, Feb. 2012.
  • [2] D. Kumar, D. Sharma, and R. Goecke, Feature map augmentation to improve rotation invariance in convolutional neural networks, in Advanced Concepts for Intelligent Vision Systems, J. Blanc-Talon, P. Delmas, W. Philips, D. Popescu, and P. Scheunders, Eds. Cham: Springer International Publishing, 2020, pp. 348–359.
  • [3] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner et al., Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
  • [4] K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556, 2014.
  • [5] K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
  • [6] A. Krizhevsky, G. Hinton et al., Learning multiple layers of features from tiny images, Citeseer, Tech. Rep., 2009.
  • [7] H. Xiao, K. Rasul, and R. Vollgraf, Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms, arXiv, Tech. Rep., 2017.
  • [8] F. F. Li, A. Karpathy, and J. Johnson, Tiny ImageNet Visual Recognition Challenge, https://tinyimagenet.herokuapp.com/, 2019, [Online; accessed 30-Dec-2019].
  • [9] A. Shaw, Imagewoof dataset, https://github.com/fastai/imagenette/blob/master/README.md, 2019, [Online; accessed 10-Dec-2019].
  • [10] M. Riesenhuber and T. Poggio, Hierarchical models of object recognition in cortex, Nature Neuroscience, vol. 2, pp. 1019–1025, 1999.
  • [11] T. Serre, L. Wolf, S. Bileschi, M. Riesenhuber, and T. Poggio, Robust object recognition with cortex-like mechanisms, IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 3, pp. 411–426, Mar. 2007. [Online]. Available: http://dx.doi.org/10.1109/TPAMI.2007.56
  • [12] T. Serre, Hierarchical Models of the Visual System, in Encyclopedia of Computational Neuroscience, D. Jaeger and R. Jung, Eds. New York, NY: Springer New York, 2013, pp. 1–12.
  • [13] T. Poggio and T. Serre, Models of visual cortex, Scholarpedia, vol. 8, no. 4, p. 3516, 2013, revision #149958.
  • [14] P. M. Bays, A signature of neural coding at human perceptual limits, Journal of Vision, vol. 16, no. 11, pp. 4–4, Sep. 2016. [Online]. Available: https://doi.org/10.1167/16.11.4
  • [15] D. H. Hubel and T. N. Wiesel, Receptive fields of single neurons in the cat’s striate cortex, J. Physiol, vol. 148, pp. 574–591, Apr. 1959.
  • [16] Q. Zhao, T. Sheng, Y. Wang, Z. Tang, Y. Chen, L. Cai, and H. Ling, M2Det: A single-shot object detector based on multi-level feature pyramid network, in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 9259–9266.
  • [17] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, Going deeper with convolutions, in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 1–9.
  • [18] R. Girshick, Fast R-CNN, in Proceedings of the IEEE international conference on computer vision, 2015, pp. 1440–1448.
  • [19] N. Van Noord and E. Postma, Learning scale-variant and scale-invariant features for deep image classification, Pattern Recognition, vol. 61, pp. 583–592, 2017.
  • [20] A. Kanazawa, A. Sharma, and D. W. Jacobs, Locally scale-invariant convolutional neural networks, CoRR, vol. abs/1412.5104, 2014.
  • [21] D. Marcos, B. Kellenberger, S. Lobry, and D. Tuia, Scale equivariance in CNNs with vector fields, arXiv preprint arXiv:1807.11783, 2018.
  • [22] L. Ou, Z. Chen, J. Lu, and Y. Luo, Regularizing CNN via feature augmentation, in International Conference on Neural Information Processing. Springer, 2017, pp. 325–332.
  • [23] T. DeVries and G. W. Taylor, Dataset augmentation in feature space, arXiv preprint arXiv:1702.05538, 2017.
  • [24] B. Bayar and M. C. Stamm, Augmented convolutional feature maps for robust CNN-based camera model identification, in 2017 IEEE International Conference on Image Processing (ICIP). IEEE, 2017, pp. 4098–4102.
  • [25] D. Marcos, M. Volpi, and D. Tuia, Learning rotation invariant convolutional filters for texture classification, in 2016 23rd International Conference on Pattern Recognition (ICPR). IEEE, 2016, pp. 2012–2017.
  • [26] M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu, Spatial transformer networks, in Advances in Neural Information Processing Systems 28. Curran Associates, Inc., 2015, pp. 2017–2025.
  • [27] L. Finnveden, Y. Jansson, and T. Lindeberg, The problems with using STNs to align CNN feature maps, arXiv preprint arXiv:2001.05858, 2020.
  • [28] Y. Gong, L. Wang, R. Guo, and S. Lazebnik, Multiscale orderless pooling of deep convolutional activation features, in European conference on computer vision. Springer, 2014, pp. 392–407.
  • [29] S. Zagoruyko and N. Komodakis, Learning to compare image patches via convolutional neural networks, in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 4353–4361.
  • [30] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, Rethinking the inception architecture for computer vision, in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 2818–2826.
  • [31] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, Inception-v4, Inception-ResNet and the impact of residual connections on learning, in Thirty-First AAAI Conference on Artificial Intelligence, 2017.
  • [32] D. Kumar and D. Sharma, Distributed information integration in convolutional neural networks, in Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications – Volume 5: VISAPP. SciTePress, 2020, pp. 491–498.
  • [33] D. Kumar and D. Sharma, Feature map upscaling to improve scale invariance in convolutional neural networks, in Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, vol. 5. SciTePress, Feb. 2021, pp. 113–122.
  • [34] J. Heaton, Introduction to Neural Networks for Java, 2nd ed. Heaton Research, Inc., 2008.
  • [35] H. Hosseini, B. Xiao, M. Jaiswal, and R. Poovendran, On the limitation of convolutional neural networks in recognizing negative images, in 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, 2017, pp. 352–358.
  • [36] D. Kumar, Multi-modal information extraction and fusion with convolutional neural networks for classification of scaled images, Ph.D. dissertation, University of Canberra, Canberra, Australia, 2020.
  • [37] D. Kumar and D. Sharma, Multi-modal information extraction and fusion with convolutional neural networks, in 2020 International Joint Conference on Neural Networks (IJCNN). IEEE World Congress on Computational Intelligence (IEEE WCCI), 2020, pp. 1–9.
  • [38] P. P. Tanner, P. Jolicoeur, W. B. Cowan, K. Booth, and F. D. Fishman, Antialiasing: A technique for smoothing jagged lines on a computer graphics image—an implementation on the Amiga, Behavior Research Methods, Instruments, & Computers, vol. 21, no. 1, pp. 59–66, 1989.
  • [39] T. G. Dietterich, Approximate statistical tests for comparing supervised classification learning algorithms, Neural computation, vol. 10, no. 7, pp. 1895–1923, 1998.
  • [40] R. Meyes, M. Lu, C. W. de Puiseau, and T. Meisen, Ablation studies in artificial neural networks, arXiv preprint arXiv:1901.08644, 2019.
  • [41] R. Annunziata, C. Sagonas, and J. Calì, DeSTNet: Densely fused spatial transformer networks, arXiv preprint arXiv:1807.04050, 2018.
Notes
Record created with funding from the Polish Ministry of Education and Science (MEiN), agreement no. SONP/SP/546092/2022, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science), module: popularisation of science and promotion of sport (2022-2023).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-b7dc91ad-24b3-4fb6-9be2-0c1babd4e76a