Automatic creation of image descriptions, i.e. cap- tioning of images, is an important topic in artificial intelligence (AI) that bridges the gap between computer vision (CV) and natural language processing (NLP). Currently, neural networks are becoming increasingly popular in captioning images and researchers are looking for more efficient models for CV and sequence-sequence systems. This study focuses on a new image caption generation model that is divided into two stages. Ini- tially, low-level features, such as contrast, sharpness, color and their high-level counterparts, such as motion and facial impact score, are extracted. Then, an optimized convolutional neural network (CNN) is harnessed to generate the captions from im- ages. To enhance the accuracy of the process, the weights of CNN are optimally tuned via spider monkey optimization with sine chaotic map evaluation (SMO-SCME). The development of the proposed method is evaluated with a diversity of metrics.
2
Dostęp do pełnego tekstu na zewnętrznej witrynie WWW
Image captioning is the task of generating semantically and grammatically correct caption for a given image. Captioning model usually has an encoder-decoder structure where encoded image is decoded as list of words being a consecutive elements of the descriptive sentence. In this work, we investigate how encoding of the input image and way of coding words affects the result of the training of the encoder-decoder captioning model. We performed experiments with image encoding using 10 all-purpose popular backbones and 2 types of word embeddings. We compared those models using most popular image captioning evaluation metrics. Our research shows that the model's performance highly depends on the optimal combination of the neural image feature extractor and language processing model. The outcome of our research are applicable in all the research works that lead to the developing the optimal encoder- decoder image captioning model.
JavaScript jest wyłączony w Twojej przeglądarce internetowej. Włącz go, a następnie odśwież stronę, aby móc w pełni z niej korzystać.