Article title

High-level and Low-level Feature Set for Image Caption Generation with Optimized Convolutional Neural Network

Content
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
Automatic creation of image descriptions, i.e. captioning of images, is an important topic in artificial intelligence (AI) that bridges the gap between computer vision (CV) and natural language processing (NLP). Currently, neural networks are becoming increasingly popular in image captioning, and researchers are looking for more efficient models for CV and sequence-to-sequence systems. This study focuses on a new image caption generation model that is divided into two stages. Initially, low-level features, such as contrast, sharpness and color, and their high-level counterparts, such as motion and facial impact score, are extracted. Then, an optimized convolutional neural network (CNN) is used to generate captions from the images. To enhance the accuracy of the process, the weights of the CNN are optimally tuned via spider monkey optimization with sine chaotic map evaluation (SMO-SCME). The performance of the proposed method is evaluated using a variety of metrics.
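As a concrete illustration of two ideas mentioned in the abstract, the sketch below is a minimal, hypothetical Python fragment and not the authors' implementation. It assumes that the "low-level features" can be approximated by simple NumPy statistics (contrast as intensity standard deviation, sharpness as gradient-energy variance, color as per-channel mean) and that "sine chaotic map evaluation" replaces the uniform random coefficients of a spider-monkey-style position update with values drawn from a sine map. All names (low_level_features, sine_chaotic_sequence, smo_style_update) are invented for this example.

    # Illustrative sketch only; not the method described in the paper.
    import numpy as np

    def low_level_features(image):
        """Contrast, sharpness and mean color of an RGB image scaled to [0, 1]."""
        gray = image.mean(axis=2)                       # naive grayscale conversion
        contrast = gray.std()                           # global contrast proxy
        gy, gx = np.gradient(gray)
        sharpness = (gx ** 2 + gy ** 2).var()           # gradient-energy proxy for sharpness
        mean_color = image.reshape(-1, 3).mean(axis=0)  # per-channel mean color
        return {"contrast": contrast, "sharpness": sharpness, "color": mean_color}

    def sine_chaotic_sequence(n, x0=0.7):
        """Sine map x_{k+1} = sin(pi * x_k); produces a chaotic sequence in (0, 1)."""
        seq, x = np.empty(n), x0
        for k in range(n):
            x = np.sin(np.pi * x)
            seq[k] = x
        return seq

    def smo_style_update(position, local_leader, random_member):
        """One SMO-like local-leader update with chaotic coefficients in place of uniform noise."""
        r1, r2 = sine_chaotic_sequence(2, x0=np.random.uniform(0.1, 0.9))
        return (position
                + r1 * (local_leader - position)
                + (2.0 * r2 - 1.0) * (random_member - position))

In the paper's setting, such an update would perturb a vector of CNN weights during training; in this sketch the three arguments may be any NumPy arrays of equal shape, e.g. smo_style_update(w, w_best, w_rand).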
Keywords
Year
Volume
Pages
67–74
Physical description
Bibliography: 41 items, figures, charts
Authors
  • Department of Computer Science and Engineering, Sir Padampat Singhania University, India
author
  • Department of Computer Science and Engineering, Sir Padampat Singhania University, India
author
  • Department of Computer Science and Engineering, Sir Padampat Singhania University, India
  • Department of Computer Science and Engineering, Sir Padampat Singhania University, India
Bibliography
  • [1] Z. Deng, Z. Jiang, R. Lan, W. Huang, and X. Luo, “Image captioning using DenseNet network and adaptive attention”, Signal Processing: Image Communication, vol. 85, 2020 (DOI: 10.1016/j.image.2020.115836).
  • [2] J. Su, J. Tang, Z. Lu, X. Han, and H. Zhang, “A neural image captioning model with caption-to-images semantic constructor”, Neurocomputing, vol. 367, 2019, pp. 144–151 (DOI: 10.1016/j.neucom.2019.08.012).
  • [3] S. Bang and H. Kim, “Context-based information generation for managing UAV-acquired data using image captioning”, Automation in Construction, vol. 112, 2020 (DOI: 10.1016/j.autcon.2020.103116).
  • [4] H. Wang, H. Wang, and K. Xu, “Evolutionary recurrent neural network for image captioning”, Neurocomputing, vol. 401, pp. 249–256, 2020 (DOI: 10.1016/j.neucom.2020.03.087).
  • [5] R. Li, H. Liang, Y. Shi, F. Feng, and X. Wang, “Dual-CNN: A convolutional language decoder for paragraph image captioning”, Neurocomputing, vol. 396, pp. 92–101, 2020 (DOI: 10.1016/j.neucom.2020.02.041).
  • [6] J. Guan and E. Wang, “Repeated review based image captioning for image evidence review”, Signal Processing: Image Communication, vol. 63, pp. 141–148, 2018 (DOI: 10.1016/j.image.2018.02.005).
  • [7] A. Singh, T.D. Singh, and S. Bandyopadhyay, “An encoder-decoder based framework for Hindi image caption generation”, Multimedia Tools and Applications, vol. 80, pp. 35721–35740, 2021 (DOI: 10.1007/s11042-021-11106-5).
  • [8] Ph. Kinghorn, L. Zhang, and L. Shao, “A region-based image caption generator with refined descriptions”, Neurocomputing, vol. 272, pp. 416–424, 2018 (DOI: 10.1016/j.neucom.2017.07.014).
  • [9] Q. Liu, Y. Chen, J. Wang, and S. Zhang, “Multi-view pedestrian captioning with an attention topic CNN model”, Computers in Industry, vol. 97, pp. 47–53, 2018 (DOI: 10.1016/j.compind.2018.01.015).
  • [10] G. Christie, A. Laddha, A. Agrawal, S. Antol, and D. Batra, “Resolving vision and language ambiguities together: Joint segmentation & prepositional attachment resolution in captioned scenes”, Computer Vision and Image Understanding, vol. 163, pp. 101–112, 2017 (DOI: 10.1016/j.cviu.2017.09.001).
  • [11] F. Xiao, X. Gong, Y. Zhang, Y. Shen, and X. Gao, “DAA: Dual LSTMs with adaptive attention for image captioning”, Neurocomputing, vol. 364, pp. 322–329, 2019 (DOI: 10.1016/j.neucom.2019.06.085).
  • [12] G. Huang and H. Hu, “c-RNN: A Fine-Grained Language Model for Image Captioning”, Neural Processing Letters, 2018 (DOI: 10.1007/s11063-018-9836-2).
  • [13] C. Wu, Y. Wei, X. Chu, F. Su, and L. Wang, “Modeling visual and word-conditional semantic attention for image captioning”, Signal Processing: Image Communication, vol. 67, pp. 100–107, 2018 (DOI: 10.1016/j.image.2018.06.002).
  • [14] J. Yang, Y. Sun, J. Liang, B. Ren, and S. Lai, “Image captioning by incorporating affective concepts learned from both visual and textual components”, Neurocomputing, 2018 (DOI: 10.1016/j.neucom.2018.03.078).
  • [15] T. Yinghua and C.S. Chee, “Phrase-based Image Caption Generator with Hierarchical LSTM Network”, Neurocomputing, 2018 (DOI: 10.1016/j.neucom.2018.12.026).
  • [16] A. Yuan, X. Li, and X. Lu, “3G structure for image caption generation”, Neurocomputing, 2018 (DOI: 10.1016/j.neucom.2018.10.059).
  • [17] Ch. Fan, Z. Zhang, and D.J. Crandall, “Deepdiary: Lifelogging image captioning and summarization”, Journal of Visual Communication and Image Representation, vol. 55, pp. 40–55, 2018 (DOI: 10.1016/j.jvcir.2018.05.008).
  • [18] X. Chen, M. Zhang, Z. Wang, L. Zuo, and Y. Yang, “Leveraging Unpaired Out-of-Domain Data for Image Captioning”, Pattern Recognition Letters, In press, accepted manuscript, 2018 (DOI: 10.1016/j.patrec.2018.12.018).
  • [19] Z. Ye, et al., “A novel automatic image caption generation using bidirectional long-short term memory framework”, Multimedia Tools and Applications, vol. 80, pp. 25557–25582, 2021 (DOI: 10.1007/s11042-021-10632-6).
  • [20] H. Zhang et al., “Novel model to integrate word embeddings and syntactic trees for automatic caption generation from images”, Soft Computing, vol. 24, pp. 1377–1397, 2020 (DOI: 10.1007/s00500-019-03973-w).
  • [21] C. Sur, “AACR: Feature Fusion Effects of Algebraic Amalgamation Composed Representation on (De)Compositional Network for Caption Generation for Images”, SN Computer Science, vol. 1, 229, 2020 (DOI: 10.1007/s42979-020-00238-4).
  • [22] C. Shan, A. Gaoyun, Z. Zhenxing, and R. Qiuqi, “Interactions guided generative adversarial network for unsupervised image captioning”, Neurocomputing, vol. 417, pp. 419–431, 2020 (DOI: 10.1016/j.neucom.2020.08.019).
  • [23] Y. Wei, L. Wang, and C. Wu, “Multi-Attention Generative Adversarial Network for image captioning”, Neurocomputing, vol. 387, pp. 91–99, 2019 (DOI: 10.1016/j.neucom.2019.12.073).
  • [24] M. Yang et al., “An Ensemble of Generation- and Retrieval-Based Image Captioning With Dual Generator Generative Adversarial Network”, IEEE Transactions on Image Processing, vol. 29, pp. 9627–9640, 2020 (DOI: 10.1109/TIP.2020.3028651).
  • [25] D. Zhao, Z. Chang, and S. Guo, “A multimodal fusion approach for image captioning”, Neurocomputing, vol. 329, pp. 476–485, 2019 (DOI: 10.1016/j.neucom.2018.11.004).
  • [26] S. Ding, S. Qu, and S. Wan, “Image caption generation with high-level image features”, Pattern Recognition Letters, vol. 123, pp. 89–95, 2019 (DOI: 10.1016/j.patrec.2019.03.021).
  • [27] S.R. Kodituwakku, “Comparison of Color Features for Image Retrieval”, Indian Journal of Computer Science and Engineering, vol. 1, no. 3, pp. 207–211 (http://www.ijcse.com/docs/IJCSE10-01-03-06.pdf).
  • [28] https://photography.tutsplus.com/tutorials/whatis-image-sharpening--cms-26627.
  • [29] T. Bouwmans, C. Silva, C. Marghes, M.S. Zitouni, H. Bhaskar, and C. Frelicot, “On the role and the importance of features for background modeling and foreground detection”, Computer Science Review, vol. 28, pp. 26–91, 2018 (ISSN 15740137, DOI: 10.1016/j.cosrev.2018.01.004).
  • [30] https://en.wikipedia.org/wiki/Motion_analysis.
  • [31] S. Harish, G. Hazrati, and J.C. Bansal, “Spider Monkey Optimization Algorithm”, 2019 (DOI: 10.1007/978-3-319-91341-4_4).
  • [32] B.R. Rajakumar, “Impact of Static and Adaptive Mutation Techniques on Genetic Algorithm”, International Journal of Hybrid Intelligent Systems, vol. 10, no. 1, pp. 11–22, 2013 (DOI: 10.3233/HIS-120161).
  • [33] B.R. Rajakumar, “Static and Adaptive Mutation Techniques for Genetic algorithm: A Systematic Comparative Analysis”, International Journal of Computational Science and Engineering, vol. 8, no. 2, pp. 180–193, 2013 (DOI: 10.1504/IJCSE.2013.053087).
  • [34] S.M. Swamy, B.R. Rajakumar and I.R. Valarmathi, “Design of Hybrid Wind and Photovoltaic Power System using Opposition-based Genetic Algorithm with Cauchy Mutation”, IET Chennai Fourth International Conference on Sustainable Energy and Intelligent Systems (SEISCON 2013), 2013 (DOI: 10.1049/ic.2013.0361).
  • [35] A. George and B.R. Rajakumar, “APOGA: An Adaptive Population Pool Size based Genetic Algorithm”, AASRI Procedia – 2013 AASRI Conference on Intelligent Systems and Control (ISC 2013), vol. 4, pp. 288–296, 2013 (DOI: 10.1016/j.aasri.2013.10.043).
  • [36] B.R. Rajakumar and A. George, “A New Adaptive Mutation Technique for Genetic Algorithm”, in Proceedings of IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), pp. 1–7, 2012 (DOI: 10.1109/ICCIC.2012.6510293).
  • [37] M.B. Wagh and N. Gomathi, “Improved GWO-CS Algorithm-Based Optimal Routing Strategy in VANET”, Journal of Networking and Communication Systems, vol. 2, no. 1, pp. 34–42, 2019 (DOI: 10.46253/jnacs.v2i1.a4).
  • [38] S. Halbhavi, S.F. Kodad, S.K. Ambekar, and D. Manjunath, “Enhanced Invasive Weed Optimization Algorithm with Chaos Theory for Weightage based Combined Economic Emission Dispatch”, Journal of Computational Mechanics, Power System and Control, vol. 2, no. 3, pp. 19–27, 2019 (DOI: 10.46253/jcmps.v2i3.a3).
  • [39] A.N. Jadhav and N. Gomathi, “DIGWO: Hybridization of Dragonfly Algorithm with Improved Grey Wolf Optimization Algorithm for Data Clustering”, Multimedia Research, vol. 2, no. 3, pp. 1–11, 2019 (DOI: 10.46253/j.mr.v2i3.a1).
  • [40] https://www.kaggle.com/ming666/flicker8k-dataset.
  • [41] S. Ding, et al., “Image caption generation with high-level image features”, Pattern Recognition Letters, vol. 123, pp. 89–95, 2019 (DOI: 10.1016/j.patrec.2019.03.021).
Notes
Record created with funding from the Ministry of Education and Science (MEiN), agreement no. SONP/SP/546092/2022, under the "Społeczna odpowiedzialność nauki" (Social Responsibility of Science) programme, module: popularization of science and promotion of sport (2022-2023).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-5cea2132-e4ef-4ca9-b0d8-ce1d99d44f88