A hierarchical inferential method for indoor scene classification

Jiang, J.; Liu, P.; Ye, Z.; Zhao, W.; Tang, X.

doi:10.1515/amcs-2017-0059

Article - details

Article title

A hierarchical inferential method for indoor scene classification

Authors

Jiang J., Liu P., Ye Z., Zhao W., Tang X.

Content

Full texts:

14_jiang_liu_ye_zhao_tang_a_hierarchical_inferential_2017_4.pdf

Identifiers

DOI

10.1515/amcs-2017-0059

Abstracts

Indoor scene classification forms a basis for scene interaction for service robots. The task is challenging because the layout and decoration of a scene vary considerably. Previous knowledge-based methods commonly ignore visual attributes when constructing the knowledge base, which restricts classification performance. We propose a semantic hierarchy structure that describes the similarities between different parts of scenes in a fine-grained way. Besides the commonly used semantic features, visual attributes are also used to construct the knowledge base. Inspired by the processes of human cognition and the characteristics of indoor scenes, we propose an inferential framework based on the Markov logic network. The framework is evaluated on a popular indoor scene dataset, and the experimental results demonstrate its effectiveness.

Keywords

indoor scene classification; semantic hierarchical structure; rule-based inference; Markov logic network

hierarchical structure; rule-based inference system; Markov logic network

Publisher

Oficyna Wydawnicza Uniwersytetu Zielonogórskiego

Journal

International Journal of Applied Mathematics and Computer Science

Year

2017

Volume

Vol. 27, no. 4

Pages

839–852

Physical description

Bibliography: 77 items, figures, tables, charts.

Contributors

author

Jiang J.

School of Computer Science and Technology, Harbin Institute of Technology, No. 92 West Dazhi Street, Harbin, China

author

Liu P.

School of Computer Science and Technology, Harbin Institute of Technology, No. 92 West Dazhi Street, Harbin, China

author

Ye Z.

School of Computer Science and Technology, Harbin Institute of Technology, No. 92 West Dazhi Street, Harbin, China

author

Zhao W.

zhaowei@hit.edu.cn

School of Computer Science and Technology, Harbin Institute of Technology, No. 92 West Dazhi Street, Harbin, China

author

Tang X.

School of Computer Science and Technology, Harbin Institute of Technology, No. 92 West Dazhi Street, Harbin, China

Bibliography

[1] Alleysson, D., Susstrunk, S. and Herault, J. (2005). Linear demosaicing inspired by the human visual system, IEEE Transactions on Image Processing 14(4): 439–449.
[2] Banerji, S., Sinha, A. and Liu, C. (2013). New image descriptors based on color, texture, shape, and wavelets for object and scene image classification, Neurocomputing 117(0): 173–185.
[3] Bannour, H. and Hudelot, C. (2012a). Building Semantic Hierarchies Faithful to Image Semantics, Lecture Notes in Computer Science, Vol. 7131, Springer, Berlin/Heidelberg, pp. 4–15.
[4] Bannour, H. and Hudelot, C. (2012b). Hierarchical image annotation using semantic hierarchies, Proceedings of the 21st ACM International Conference on Information and Knowledge Management, Maui, HI, USA, pp. 2431–2434.
[5] Bell, S., Lawrence Zitnick, C., Bala, K. and Girshick, R. (2016). Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, pp. 2874–2883.
[6] Bottou, L. (2013). From machine learning to machine reasoning, Machine Learning 94(2): 133–149.
[7] Carneiro, G., Chan, A.B., Moreno, P.J. and Vasconcelos, N. (2007). Supervised learning of semantic classes for image annotation and retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence 29(3): 394–410.
[8] Chaojie, W., Jun, Y. and Dapeng, T. (2013). High-level attributes modeling for indoor scenes classification, Neurocomputing 121: 337–343.
[9] Chaves, R., Ramírez, J., Górriz, J. and Illán, I. (2012). Functional brain image classification using association rules defined over discriminant regions, Pattern Recognition Letters 33(12): 1666–1672.
[10] Csurka, G., Dance, C., Fan, L., Willamowski, J. and Bray, C. (2004). Visual categorization with bags of keypoints, Workshop on Statistical Learning in Computer Vision, Prague, Czech Republic, Vol. 1, pp. 1–2.
[11] Delaigle, J., Devleeschouwer, C., Macq, B. and Langendijk, L. (2002). Human visual system features enabling watermarking, 2002 IEEE International Conference on Multimedia and Expo. ICME ’02, Los Angeles, CA, USA, Vol. 2, pp. 489–492.
[12] Deng, J., Berg, A.C. and Fei-Fei, L. (2011). Hierarchical semantic indexing for large scale image retrieval, 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Denver, CO, USA, pp. 785–792.
[13] Dixit, M., Chen, S., Gao, D., Rasiwasia, N. and Vasconcelos, N. (2015). Scene classification with semantic fisher vectors, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, pp. 2974–2983.
[14] Escobar, M.-J. and Kornprobst, P. (2012). Action recognition via bio-inspired features: The richness of center–surround interaction, Computer Vision and Image Understanding 116(5): 593–605.
[15] Farhadi, A., Endres, I., Hoiem, D. and Forsyth, D. (2009). Describing objects by their attributes, IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, Miami, FL, USA, pp. 1778–1785.
[16] Faria, D.R., Trindade, P., Lobo, J. and Dias, J. (2014). Knowledge-based reasoning from human grasp demonstrations for robot grasp synthesis, Robotics and Autonomous Systems 62(6): 794–817.
[17] Fei-Fei, L. and Perona, P. (2005). A Bayesian hierarchical model for learning natural scene categories, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, San Diego, CA, USA, Vol. 2, pp. 524–531.
[18] Felzenszwalb, P.F. and McAllester, D. (2007). The generalized A* architecture, Journal of Artificial Intelligence Research pp. 153–190.
[19] Felzenszwalb, P., Girshick, R. and McAllester, D. (2010a). Cascade object detection with deformable part models, 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, pp. 2241–2248.
[20] Felzenszwalb, P., Girshick, R., McAllester, D. and Ramanan, D. (2010b). Object detection with discriminatively trained part-based models, IEEE Transactions on Pattern Analysis and Machine Intelligence 32(9): 1627–1645.
[21] Felzenszwalb, P., McAllester, D. and Ramanan, D. (2008). A discriminatively trained, multiscale, deformable part model, IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008, Anchorage, AK, USA, pp. 1–8.
[22] Feng, Q., Yuan, C., Pan, J.S., Yang, J.F., Chou, Y.T., Zhou, Y. and Li, W. (2017). Superimposed sparse parameter classifiers for face recognition, IEEE Transactions on Cybernetics 47(2): 378–390.
[23] Feng, Q. and Zhou, Y. (2016). Kernel regularized data uncertainty for action recognition, IEEE Transactions on Circuits and Systems for Video Technology PP(99): 1–1.
[24] Feng, Q., Zhou, Y. and Lan, R. (2016). Pairwise linear regression classification for image set retrieval, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, pp. 4865–4872.
[25] Girshick, R.B., Felzenszwalb, P.F. and McAllester, D.A. (2011). Object detection with grammar models, in J. Shawe-Taylor et al. (Eds.), Advances in Neural Information Processing Systems 24, Curran Associates, Inc., Granada, pp. 442–450.
[26] Gupta, P., Arrabolu, S.S., Brown, M. and Savarese, S. (2009). Video scene categorization by 3D hierarchical histogram matching, IEEE 12th International Conference on Computer Vision, Kyoto, Japan, pp. 1655–1662.
[27] Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P. and Witten, I.H. (2009). The Weka data mining software: An update, ACM SIGKDD Explorations Newsletter 11(1): 10–18.
[28] He, K., Zhang, X., Ren, S. and Sun, J. (2016). Deep residual learning for image recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, pp. 770–778.
[29] Hoiem, D., Efros, A.A. and Hebert, M. (2005). Automatic photo pop-up, ACM SIGGRAPH 2005, Los Angeles, CA, USA, pp. 577–584.
[30] Hosang, J., Benenson, R., Dollár, P. and Schiele, B. (2016). What makes for effective detection proposals?, IEEE Transactions on Pattern Analysis and Machine Intelligence 38(4): 814–830.
[31] Huang, K., Tao, D., Yuan, Y., Li, X. and Tan, T. (2011). Biologically inspired features for scene classification in video surveillance, IEEE Transactions on Systems, Man, and Cybernetics B: Cybernetics 41(1): 307–313.
[32] Li, L.-J., Su, H., Fei-Fei, L. and Xing, E.P. (2010). Object bank: A high-level image representation for scene classification and semantic feature sparsification, in J. Lafferty et al. (Eds.), Advances in Neural Information Processing Systems 23, Curran Associates, Inc., Cambridge, pp. 1378–1386.
[33] Kembhavi, A., Yeh, T. and Davis, L.S. (2010). Why did the person cross the road (there)? Scene understanding using probabilistic logic models and common sense reasoning, in K. Daniilidis et al. (Eds.), Computer Vision—ECCV 2010: 11th European Conference on Computer Vision, Part II, Springer, Berlin/Heidelberg, pp. 693–706.
[34] Khan, S., Bennamoun, M., Sohel, F. and Togneri, R. (2014). Geometry Driven Semantic Labeling of Indoor Scenes, Lecture Notes in Computer Science, Vol. 8689, Springer International Publishing, Berlin, pp. 679–694.
[35] Kong, T., Yao, A., Chen, Y. and Sun, F. (2016). Hypernet: Towards accurate region proposal generation and joint object detection, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, pp. 845–853.
[36] Lazebnik, S., Schmid, C. and Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New York, NY, USA, Vol. 2, pp. 2169–2178.
[37] Li, L.-J., Wang, C., Lim, Y., Blei, D.M. and Fei-Fei, L. (2010). Building and using a semantivisual image hierarchy, 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, pp. 3336–3343.
[38] Li, L.-J., Su, H., Lim, Y. and Fei-Fei, L. (2014). Object bank: An object-level image representation for high-level visual recognition, International Journal of Computer Vision 107(1): 20–39.
[39] Lin, D., Lu, C., Liao, R. and Jia, J. (2014). Learning important spatial pooling regions for scene classification, 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, pp. 3726–3733.
[40] Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y. and Berg, A.C. (2016). SSD: Single Shot Multi-Box Detector, Springer International Publishing, Cham, pp. 21–37.
[41] Liu, Z. and von Wichert, G. (2013). Applying rule-based context knowledge to build abstract semantic maps of indoor environments, 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Tokyo, Japan, pp. 5141–5147.
[42] Lorenza Saitta, J.-D.Z. (2013). Abstraction in Artificial Intelligence and Complex Systems, Springer, New York, NY.
[43] Marszalek, M. and Schmid, C. (2007). Semantic hierarchies for visual object recognition, IEEE Conference on Computer Vision and Pattern Recognition, CVPR’07, Minneapolis, MN, USA, pp. 1–7.
[44] MIT (n.d.) Indoor scene recognition. Dataset, http://web.mit.edu/torralba/www/indoor.html.
[45] Mottaghi, R., Fidler, S., Yao, J., Urtasun, R. and Parikh, D. (2013). Analyzing semantic segmentation using hybrid human-machine CRFs, 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA, pp. 3143–3150.
[46] Neville, J. and Jensen, D. (2007). Relational dependency networks, Journal of Machine Learning Research 8: 653–692.
[47] Nguyen, D.T., Ogunbona, P.O. and Li, W. (2013). A novel shape-based non-redundant local binary pattern descriptor for object detection, Pattern Recognition 46(5): 1485–1500.
[48] Penatti, O.A., Silva, F.B., Valle, E., Gouet-Brunet, V. and Torres, R.d.S. (2014). Visual word spatial arrangement for image retrieval and classification, Pattern Recognition 47(2): 705–720.
[49] Porway, J., Wang, Q. and Zhu, S.C. (2010). A hierarchical and contextual model for aerial image parsing, International Journal of Computer Vision 88(2): 254–283.
[50] Quattoni, A. and Torralba, A. (2009). Recognizing indoor scenes, IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, Miami, FL, USA, pp. 413–420.
[51] Ren, X. and Ramanan, D. (2013). Histograms of sparse codes for object detection, 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, pp. 3246–3253.
[52] Ribeiro, M.X., Bugatti, P.H., Traina Jr, C., Marques, P.M.A., Rosa, N.A. and Traina, A.J.M. (2009). Supporting content-based image retrieval and computer-aided diagnosis systems with association rule-based techniques, Data and Knowledge Engineering 68(12): 1370–1382.
[53] Richardson, M. and Domingos, P. (2006). Markov logic networks, Machine Learning 62(1): 107–136.
[54] Rigamonti, R., Brown, M.A. and Lepetit, V. (2011). Are sparse representations really relevant for image classification?, 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, USA, pp. 1545–1552.
[55] Rigamonti, R., Sironi, A., Lepetit, V. and Fua, P. (2013). Learning separable filters, 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA, pp. 2754–2761.
[56] Sadovnik, A. and Chen, T. (2011). Pictorial structures for object recognition and part labeling in drawings, 18th IEEE International Conference on Image Processing (ICIP), Brussels, Belgium, pp. 3613–3616.
[57] Sharif Razavian, A., Azizpour, H., Sullivan, J. and Carlsson, S. (2014). CNN features off-the-shelf: An astounding baseline for recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, pp. 806–813.
[58] Shotton, J., Blake, A. and Cipolla, R. (2005). Contour-based learning for object detection, 10th IEEE International Conference on Computer Vision, ICCV 2005, Beijing, China, Vol. 1, pp. 503–510.
[59] Siagian, C. and Itti, L. (2007). Rapid biologically-inspired scene classification using features shared with visual attention, IEEE Transactions on Pattern Analysis and Machine Intelligence 29(2): 300–312.
[60] Singla, P. and Domingos, P. (2006). Entity resolution with Markov logic, 6th International Conference on Data Mining, ICDM’06, Hong Kong, China, pp. 572–582.
[61] Tang, J., Zha, Z.-J., Tao, D. and Chua, T.-S. (2012). Semantic-gap-oriented active learning for multilabel image annotation, IEEE Transactions on Image Processing 21(4): 2354–2360.
[62] Tang, T. and Qiao, H. (2014). Improving invariance in visual classification with biologically inspired mechanism, Neurocomputing 133(8): 328–341.
[63] Teo, C.L., Fermüller, C. and Aloimonos, Y. (2015). A Gestaltist approach to contour-based object recognition: Combining bottom-up and top-down cues, International Journal of Robotics Research 34(4-5): 627–652.
[64] Vondrick, C., Khosla, A., Malisiewicz, T. and Torralba, A. (2013). HOGgles: Visualizing object detection features, IEEE International Conference on Computer Vision, Sydney, Australia, pp. 1–8.
[65] Welter, P., Riesmeier, J., Fischer, B., Grouls, C., Kuhl, C. and Deserno (né Lehmann), T.M. (2011). Bridging the integration gap between imaging and information systems: A uniform data concept for content-based image retrieval in computer-aided diagnosis, Journal of the American Medical Informatics Association 18(4): 506–510.
[66] Xie, L., Tian, Q., Wang, M. and Zhang, B. (2014a). Spatial pooling of heterogeneous features for image classification, IEEE Transactions on Image Processing 23(5): 1994–2008.
[67] Xie, L., Wang, J., Guo, B., Zhang, B. and Tian, Q. (2014b). Orientational pyramid matching for recognizing indoor scenes, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, pp. 3734–3741.
[68] Xu, M. and Petrou, M. (2010). Learning logic rules for scene interpretation based on Markov logic networks, ACCV 9th Asian Conference on Computer Vision, Xi’an, China, pp. 341–350.
[69] Xu, M., Petrou, M. and Lu, J. (2011). Learning logic rules for the tower of knowledge using Markov logic networks, International Journal of Pattern Recognition and Artificial Intelligence 25(06): 889–907.
[70] Ye, Z., Liu, P., Zhao, W. and Tang, X. (2015). Cognition inspired framework for indoor scene annotation, Journal of Electronic Imaging 24(5): 053013.
[71] Yu, J., Rui, Y., Tang, Y.Y. and Tao, D. (2014). High-order distance-based multiview stochastic learning in image classification, IEEE Transactions on Cybernetics 44(12): 2431–2442.
[72] Yu, J., Tao, D., Rui, Y. and Cheng, J. (2013). Pairwise constraints based multiview features fusion for scene classification, Pattern Recognition 46(2): 483–496.
[73] Yu, J., Tao, D. and Wang, M. (2012a). Adaptive hypergraph learning and its application in image classification, IEEE Transactions on Image Processing 21(7): 3262–3272.
[74] Yu, J., Wang, M. and Tao, D. (2012b). Semisupervised multiview distance metric learning for cartoon synthesis, IEEE Transactions on Image Processing 21(11): 4636–4648.
[75] Zhang, C., Liu, J., Tian, Q., Liang, C. and Huang, Q. (2013). Beyond visual features: A weak semantic image representation using exemplar classifiers for classification, Neurocomputing 120(0): 318–324.
[76] Zhou, L., Zhou, Z. and Hu, D. (2013). Scene classification using a multi-resolution bag-of-features model, Pattern Recognition 46(1): 424–433.
[77] Zhu, Y., Fathi, A. and Fei-Fei, L. (2014). Reasoning about object affordances in a knowledge base representation, in D. Fleet et al. (Eds.), Computer Vision ECCV 2014, Lecture Notes in Computer Science, Vol. 8690, Springer International Publishing, Zurich, pp. 408–424.

Notes

Prepared with funds from the Ministry of Science and Higher Education (MNiSW) under agreement 812/P-DUN/2016 for science dissemination activities (2017 tasks).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-7d6fbd7c-5e3c-49e1-bd8f-9ad88c86d28b