Article title

High-level and Low-level Feature Set for Image Caption Generation with Optimized Convolutional Neural Network

Content
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
Automatic creation of image descriptions, i.e. captioning of images, is an important topic in artificial intelligence (AI) that bridges the gap between computer vision (CV) and natural language processing (NLP). Currently, neural networks are becoming increasingly popular in image captioning, and researchers are looking for more efficient models for CV and sequence-to-sequence systems. This study focuses on a new image caption generation model that is divided into two stages. Initially, low-level features, such as contrast, sharpness and color, and their high-level counterparts, such as motion and facial impact score, are extracted. Then, an optimized convolutional neural network (CNN) is used to generate captions from the images. To enhance the accuracy of the process, the weights of the CNN are optimally tuned via spider monkey optimization with sine chaotic map evaluation (SMO-SCME). The performance of the proposed method is evaluated using a variety of metrics.
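As a concrete illustration of two ideas mentioned in the abstract, the sketch below is a minimal, hypothetical Python fragment and not the authors' implementation. It assumes that the "low-level features" can be approximated by simple NumPy statistics (contrast as intensity standard deviation, sharpness as gradient-energy variance, color as per-channel mean) and that "sine chaotic map evaluation" replaces the uniform random coefficients of a spider-monkey-style position update with values drawn from a sine map. All names (low_level_features, sine_chaotic_sequence, smo_style_update) are invented for this example.

    # Illustrative sketch only; not the method described in the paper.
    import numpy as np

    def low_level_features(image):
        """Contrast, sharpness and mean color of an RGB image scaled to [0, 1]."""
        gray = image.mean(axis=2)                       # naive grayscale conversion
        contrast = gray.std()                           # global contrast proxy
        gy, gx = np.gradient(gray)
        sharpness = (gx ** 2 + gy ** 2).var()           # gradient-energy proxy for sharpness
        mean_color = image.reshape(-1, 3).mean(axis=0)  # per-channel mean color
        return {"contrast": contrast, "sharpness": sharpness, "color": mean_color}

    def sine_chaotic_sequence(n, x0=0.7):
        """Sine map x_{k+1} = sin(pi * x_k); produces a chaotic sequence in (0, 1)."""
        seq, x = np.empty(n), x0
        for k in range(n):
            x = np.sin(np.pi * x)
            seq[k] = x
        return seq

    def smo_style_update(position, local_leader, random_member):
        """One SMO-like local-leader update with chaotic coefficients in place of uniform noise."""
        r1, r2 = sine_chaotic_sequence(2, x0=np.random.uniform(0.1, 0.9))
        return (position
                + r1 * (local_leader - position)
                + (2.0 * r2 - 1.0) * (random_member - position))

In the paper's setting, such an update would perturb a vector of CNN weights during training; in this sketch the three arguments may be any NumPy arrays of equal shape, e.g. smo_style_update(w, w_best, w_rand).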
Keywords
Year
Volume
Pages
67–74
Physical description
Bibliography: 41 items, figures, charts
Authors
  • Department of Computer Science and Engineering, Sir Padampat Singhania University, India
author
  • Department of Computer Science and Engineering, Sir Padampat Singhania University, India
author
  • Department of Computer Science and Engineering, Sir Padampat Singhania University, India
  • Department of Computer Science and Engineering, Sir Padampat Singhania University, India
Bibliography
  • [1] Z. Deng, Z. Jiang, R. Lan, W. Huang, and X. Luo, “Image captioning using DenseNet network and adaptive attention”, Signal Processing: Image Communication, vol. 85, 2020 (DOI: 10.1016/j.image.2020.115836).
  • [2] J. Su, J. Tang, Z. Lu, X. Han, and H. Zhang, “A neural image captioning model with caption-to-images semantic constructor”, Neurocomputing, vol. 367, 2019, pp. 144–151 (DOI: 10.1016/j.neucom.2019.08.012).
  • [3] S. Bang and H. Kim, “Context-based information generation for managing UAV-acquired data using image captioning”, Automation in Construction, vol. 112, 2020 (DOI: 10.1016/j.autcon.2020.103116).
  • [4] H. Wang, H. Wang, and K. Xu, “Evolutionary recurrent neural network for image captioning”, Neurocomputing, vol. 401, pp. 249–256, 2020 (DOI: 10.1016/j.neucom.2020.03.087).
  • [5] R. Li, H. Liang, Y. Shi, F. Feng, and X. Wang, “Dual-CNN: A convolutional language decoder for paragraph image captioning”, Neurocomputing, vol. 396, pp. 92–101, 2020 (DOI: 10.1016/j.neucom.2020.02.041).
  • [6] J. Guan and E. Wang, “Repeated review based image captioning for image evidence review”, Signal Processing: Image Communication, vol. 63, pp. 141–148, 2018 (DOI: 10.1016/j.image.2018.02.005).
  • [7] A. Singh, T.D. Singh, and S. Bandyopadhyay, “An encoder-decoder based framework for Hindi image caption generation”, Multimedia Tools and Applications, vol. 80, pp. 35721–35740, 2021 (DOI: 10.1007/s11042-021-11106-5).
  • [8] Ph. Kinghorn, L. Zhang, and L. Shao, “A region-based image caption generator with refined descriptions”, Neurocomputing, vol. 272, pp. 416–424, 2018 (DOI: 10.1016/j.neucom.2017.07.014).
  • [9] Q. Liu, Y. Chen, J. Wang, and S. Zhang, “Multi-view pedestrian captioning with an attention topic CNN model”, Computers in Industry, vol. 97, pp. 47–53, 2018 (DOI: 10.1016/j.compind.2018.01.015).
  • [10] G. Christie, A. Laddha, A. Agrawal, S. Antol, and D. Batra, “Resolving vision and language ambiguities together: Joint segmentation & prepositional attachment resolution in captioned scenes”, Computer Vision and Image Understanding, vol. 163, pp. 101–112, 2017 (DOI: 10.1016/j.cviu.2017.09.001).
  • [11] F. Xiao, X. Gong, Y. Zhang, Y. Shen, and X. Gao, “DAA: Dual LSTMs with adaptive attention for image captioning”, Neurocomputing, vol. 364, pp. 322–329, 2019 (DOI: 10.1016/j.neucom.2019.06.085).
  • [12] G. Huang and H. Hu, “c-RNN: A Fine-Grained Language Model for Image Captioning”, Neural Processing Letters, 2018 (DOI: 10.1007/s11063-018-9836-2).
  • [13] C. Wu, Y. Wei, X. Chu, F. Su, and L. Wang, “Modeling visual and word-conditional semantic attention for image captioning”, Signal Processing: Image Communication, vol. 67, pp. 100–107, 2018 (DOI: 10.1016/j.image.2018.06.002).
  • [14] J. Yang, Y. Sun, J. Liang, B. Ren, and S. Lai, “Image captioning by incorporating affective concepts learned from both visual and textual components”, Neurocomputing, 2018 (DOI: 10.1016/j.neucom.2018.03.078).
  • [15] T. Yinghua and C.S. Chee, “Phrase-based Image Caption Generator with Hierarchical LSTM Network”, Neurocomputing, 2018 (DOI: 10.1016/j.neucom.2018.12.026).
  • [16] A. Yuan, X. Li, and X. Lu, “3G structure for image caption generation”, Neurocomputing, 2018 (DOI: 10.1016/j.neucom.2018.10.059).
  • [17] Ch. Fan, Z. Zhang, and D.J. Crandall, “Deepdiary: Lifelogging image captioning and summarization”, Journal of Visual Communication and Image Representation, vol. 55, pp. 40–55, 2018 (DOI: 10.1016/j.jvcir.2018.05.008).
  • [18] X. Chen, M. Zhang, Z. Wang, L. Zuo, and Y. Yang, “Leveraging Unpaired Out-of-Domain Data for Image Captioning”, Pattern Recognition Letters, In press, accepted manuscript, 2018 (DOI: 10.1016/j.patrec.2018.12.018).
  • [19] Z. Ye, et al., “A novel automatic image caption generation using bidirectional long-short term memory framework”, Multimedia Tools and Applications, vol. 80, pp. 25557–25582, 2021 (DOI: 10.1007/s11042-021-10632-6).
  • [20] H. Zhang et al., “Novel model to integrate word embeddings and syntactic trees for automatic caption generation from images”, Soft Computing, vol. 24, pp. 1377–1397, 2020 (DOI: 10.1007/s00500-019-03973-w).
  • [21] C. Sur, “AACR: Feature Fusion Effects of Algebraic Amalgamation Composed Representation on (De)Compositional Network for Caption Generation for Images”, SN Computer Science, vol. 1, 229, 2020 (DOI: 10.1007/s42979-020-00238-4).
  • [22] C. Shan, A. Gaoyun, Z. Zhenxing, and R. Qiuqi, “Interactions guided generative adversarial network for unsupervised image captioning”, Neurocomputing, vol. 417, pp. 419–431, 2020 (DOI: 10.1016/j.neucom.2020.08.019).
  • [23] Y. Wei, L. Wang, and C. Wu, “Multi-Attention Generative Adversarial Network for image captioning”, Neurocomputing, vol. 387, pp. 91–99, 2019 (DOI: 10.1016/j.neucom.2019.12.073).
  • [24] M. Yang et al., “An Ensemble of Generation- and Retrieval-Based Image Captioning With Dual Generator Generative Adversarial Network”, IEEE Transactions on Image Processing, vol. 29, pp. 9627–9640, 2020 (DOI: 10.1109/TIP.2020.3028651).
  • [25] D. Zhao, Z. Chang, and S. Guo, “A multimodal fusion approach for image captioning”, Neurocomputing, vol. 329, pp. 476–485, 2019 (DOI: 10.1016/j.neucom.2018.11.004).
  • [26] S. Ding, S. Qu, and S. Wan, “Image caption generation with high-level image features”, Pattern Recognition Letters, vol. 123, pp. 89–95, 2019 (DOI: 10.1016/j.patrec.2019.03.021).
  • [27] S.R. Kodituwakku, “Comparison of Color Features for Image Retrieval”, Indian Journal of Computer Science and Engineering, vol. 1, no. 3, pp. 207–211 (http://www.ijcse.com/docs/IJCSE10-01-03-06.pdf).
  • [28] https://photography.tutsplus.com/tutorials/whatis-image-sharpening--cms-26627.
  • [29] T. Bouwmans, C. Silva, C. Marghes, M.S. Zitouni, H. Bhaskar, and C. Frelicot, “On the role and the importance of features for background modeling and foreground detection”, Computer Science Review, vol. 28, pp. 26–91, 2018 (ISSN 15740137, DOI: 10.1016/j.cosrev.2018.01.004).
  • [30] https://en.wikipedia.org/wiki/Motion_analysis.
  • [31] S. Harish, G. Hazrati, and J.C. Bansal, “Spider Monkey Optimization Algorithm”, 2019 (DOI: 10.1007/978-3-319-91341-4_4).
  • [32] B.R. Rajakumar, “Impact of Static and Adaptive Mutation Techniques on Genetic Algorithm”, International Journal of Hybrid Intelligent Systems, vol. 10, no. 1, pp. 11–22, 2013 (DOI: 10.3233/HIS-120161).
  • [33] B.R. Rajakumar, “Static and Adaptive Mutation Techniques for Genetic algorithm: A Systematic Comparative Analysis”, International Journal of Computational Science and Engineering, vol. 8, no. 2, pp. 180–193, 2013 (DOI: 10.1504/IJCSE.2013.053087).
  • [34] S.M. Swamy, B.R. Rajakumar and I.R. Valarmathi, “Design of Hybrid Wind and Photovoltaic Power System using Opposition-based Genetic Algorithm with Cauchy Mutation”, IET Chennai Fourth International Conference on Sustainable Energy and Intelligent Systems (SEISCON 2013), 2013 (DOI: 10.1049/ic.2013.0361).
  • [35] A. George and B.R. Rajakumar, “APOGA: An Adaptive Population Pool Size based Genetic Algorithm”, AASRI Procedia – 2013 AASRI Conference on Intelligent Systems and Control (ISC 2013), vol. 4, pp. 288–296, 2013 (DOI: 10.1016/j.aasri.2013.10.043).
  • [36] B.R. Rajakumar and A. George, “A New Adaptive Mutation Technique for Genetic Algorithm”, in Proceedings of IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), pp. 1–7, 2012 (DOI: 10.1109/ICCIC.2012.6510293).
  • [37] M.B. Wagh and N. Gomathi, “Improved GWO-CS Algorithm-Based Optimal Routing Strategy in VANET”, Journal of Networking and Communication Systems, vol. 2, no. 1, pp. 34–42, 2019 (DOI: 10.46253/jnacs.v2i1.a4).
  • [38] S. Halbhavi, S.F. Kodad, S.K. Ambekar, and D. Manjunath, “Enhanced Invasive Weed Optimization Algorithm with Chaos Theory for Weightage based Combined Economic Emission Dispatch”, Journal of Computational Mechanics, Power System and Control, vol. 2, no. 3, pp. 19–27, 2019 (DOI: 10.46253/jcmps.v2i3.a3).
  • [39] A.N. Jadhav and N. Gomathi, “DIGWO: Hybridization of Dragonfly Algorithm with Improved Grey Wolf Optimization Algorithm for Data Clustering”, Multimedia Research, vol. 2, no. 3, pp. 1–11, 2019 (DOI: 10.46253/j.mr.v2i3.a1).
  • [40] https://www.kaggle.com/ming666/flicker8k-dataset.
  • [41] S. Ding, et al., “Image caption generation with high-level image features”, Pattern Recognition Letters, vol. 123, pp. 89–95, 2019 (DOI: 10.1016/j.patrec.2019.03.021).
Notes
Record created with funding from the Ministry of Education and Science (MEiN), agreement no. SONP/SP/546092/2022, under the "Społeczna odpowiedzialność nauki" (Social Responsibility of Science) programme, module: popularization of science and promotion of sport (2022-2023).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-5cea2132-e4ef-4ca9-b0d8-ce1d99d44f88