Detecting objects using Rolling Convolution and Recurrent Neural Network

Huang, WenQing; Huang, MingZhu; Wang, YaMing

doi:10.24425/ijet.2019.126313

Abstract

Most existing object detection algorithms rely on region proposals to search for targets in an image. The most effective region proposal methods usually require thousands of candidate regions to achieve a high recall rate, which lowers detection efficiency. Although recent region proposal network approaches have yielded good results with only hundreds of proposals, they still struggle with small objects and precise localization, mainly because they rely on coarse features. We therefore propose a new method that extracts more effective global and multi-scale features to improve detection performance. Because feature maps produced by successive convolutions lose the resolution needed to detect small objects as they gain deeper semantic information, we use rolling convolution (RC) to maintain the high resolution of low-level feature maps and explore objects in greater detail, even without a structure dedicated to combining the features of multiple convolutional layers. Furthermore, we place a recurrent neural network composed of multiple gated recurrent units (GRUs) on top of the convolutional layers to highlight useful global context locations that assist object detection. In experiments on benchmark datasets, the proposed method achieved 78.2% mAP on PASCAL VOC 2007 and 72.3% mAP on PASCAL VOC 2012. These experiments indicate that the method reaches an advanced level of detection performance.
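The abstract does not give the GRU arrangement in detail, so the following is only a minimal sketch of the general idea, not the authors' implementation: a standard GRU cell is swept left to right over one row of feature vectors so that each position's state carries context from the positions before it. The function names, the parameter dictionary `p`, and the toy weights are all hypothetical and chosen for illustration.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(m, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def gru_step(x, h, p):
    """One standard GRU update; p holds hypothetical weights Wz, Uz, Wr, Ur, Wh, Uh."""
    # Update gate: how much of the new candidate replaces the old state.
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    # Reset gate: how much of the old state feeds the candidate.
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b) for a, b in zip(matvec(p["Wh"], x), matvec(p["Uh"], rh))]
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def sweep_row(row, p, hidden_size):
    """Run the GRU left to right over a row of feature vectors; return every state."""
    h = [0.0] * hidden_size
    states = []
    for x in row:
        h = gru_step(x, h, p)
        states.append(h)
    return states

def zeros(n, m):
    return [[0.0] * m for _ in range(n)]

# Toy weights (assumption): gates fixed at 0.5, candidate equals tanh of the input.
identity = [[1.0, 0.0], [0.0, 1.0]]
params = {"Wz": zeros(2, 2), "Uz": zeros(2, 2), "Wr": zeros(2, 2),
          "Ur": zeros(2, 2), "Wh": identity, "Uh": zeros(2, 2)}

# Only the first position has a non-zero feature; later states still remember it.
states = sweep_row([[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]], params, hidden_size=2)
```

With these toy weights the first component of the state halves at each later position instead of dropping to zero, which is the sense in which a recurrent sweep spreads information from one location to others. A real detector would learn the weights and would typically sweep in several directions.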

Keywords

multi-scale features; global context information; rolling convolution; recurrent neural network

Publisher

Polish Academy of Sciences, Committee of Electronics and Telecommunication

Journal

International Journal of Electronics and Telecommunications

Year

2019

Volume

Vol. 65, No. 2

Pages

293-301

Physical description

30 references, figures, tables, photographs.

Contributors

author

Huang WenQing

patternrecog@163.com

School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou, China

author

Huang MingZhu

851489278@qq.com

School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou, China

author

Wang YaMing

yamingwang2000@163.com

School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou

References

[1] Wöhler C, Anlauf J K. An adaptable time-delay neural-network algorithm for image sequence analysis[J]. IEEE Transactions on Neural Networks, 1999, 10(6):1531-1536.
[2] Dalal N, Triggs B. Histograms of oriented gradients for human detection[C]// Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. IEEE, 2005:886-893.
[3] Laptev I. Improvements of Object Detection Using Boosted Histograms[C]// British Machine Vision Conference 2006, Edinburgh, UK, September. DBLP, 2006:949-958.
[4] Shet V D, Neumann J, Ramesh V, et al. Bilattice-based Logical Reasoning for Human Detection[C]// Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on. IEEE, 2007:1-8.
[5] Zhang L, Wu B, Nevatia R. Detection and Tracking of Multiple Humans with Extensive Pose Articulation[C]// IEEE International Conference on Computer Vision. IEEE, 2007:1-8.
[6] Azizpour H, Laptev I. Object Detection Using Strongly-Supervised Deformable Part Models[M]// Computer Vision - ECCV 2012. Springer Berlin Heidelberg, 2012:836-849.
[7] Dalal N, Triggs B, Schmid C. Human detection using oriented histograms of flow and appearance[J]. 2006, 3952:428-441.
[8] Dollar P, Wojek C, Schiele B, et al. Pedestrian Detection: An Evaluation of the State of the Art[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(4):743.
[9] Krizhevsky, A., Sutskever, I., Hinton, G.: Imagenet classification with deep convolutional neural networks. In: Neural Information Processing Systems. (2012) 1106-1114
[10] Lin, M., Chen, Q., Yan, S.: Network in network. In: International Conference on Learning Representations. (2014)
[11] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. arXiv preprint arXiv:1409.4842 (2014)
[12] Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
[13] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385 (2015)
[14] Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. (2014) 580-587
[15] Uijlings, J., van de Sande, K., Gevers, T., Smeulders, A.: Selective search for object recognition. International Journal of Computer Vision 104(2) (2013) 154-171
[16] Zitnick, C.L., Dollár, P.: Edge boxes: Locating object proposals from edges. In: European Conference on Computer Vision. (2014) 391-405
[17] Arbeláez, P., Pont-Tuset, J., Barron, J., Marques, F., Malik, J.: Multiscale combinatorial grouping. In: IEEE Conference on Computer Vision and Pattern Recognition. (2014) 328-335
[18] Girshick, R.: Fast R-CNN. In: IEEE Conference on Computer Vision and Pattern Recognition. (2015) 1440-1448
[19] Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. In: Neural Information Processing Systems. (2015) 91-99
[20] Liang, X., Wei, Y., Shen, X., Jie, Z., Feng, J., Lin, L., Yan, S.: Reversible recursive instance-level object segmentation. arXiv preprint arXiv:1511.04517 (2015)
[21] Zeng, X., Ouyang, W., Wang, X.: Window-object relationship guided representation learning for generic object detections. arXiv preprint arXiv:1512.02736 (2015)
[22] Gidaris, S., Komodakis, N.: Object detection via a multi-region and semantic segmentation-aware CNN model. In: IEEE International Conference on Computer Vision. (2015) 1134-1142
[23] Long, J., Shelhamer, E., Darrell, T. Fully convolutional networks for semantic segmentation. In CVPR, 2015.
[24] Hariharan, B., Arbeláez, P., Girshick, R., Malik, J. Hypercolumns for object segmentation and fine-grained localization. In CVPR, 2015.
[25] Kong, T., Yao, A., Chen, Y., Sun, F. Hypernet: Towards accurate region proposal generation and joint object detection. In CVPR, 2016.
[26] Liu, W., Rabinovich, A., Berg, A.C. ParseNet: Looking wider to see better. In ICLR workshop, 2016.
[27] Bell, S., Zitnick, C.L., Bala, K., Girshick, R. Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks. In CVPR, 2016.
[28] Cai, Z., Fan, Q., Feris, R.S., Vasconcelos, N. A unified multi-scale deep convolutional neural network for fast object detection. In ECCV, 2016.
[29] Li J, Wei Y, Liang X, et al. Attentive Contexts for Object Detection[J]. IEEE Transactions on Multimedia, 2017, 19(5):944-954.
[30] Stewart, R., Andriluka, M. End-to-end people detection in crowded scenes. arXiv preprint arXiv:1506.04878 (2015).

Notes

1. Record prepared under agreement 509/P-DUN/2018 with funds from the Ministry of Science and Higher Education (MNiSW) allocated to science dissemination activities (2019).

2. This work was supported by the Natural Science Foundation of Zhejiang Province (LZ15F020004), the National Natural Science Foundation of China (61272311), and the 521 Project of Zhejiang Sci-Tech University.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-a71f6b6a-e140-4c0c-8ba2-18a54c673f9b