Utilizing relevant RGB-D data to help recognize RGB images in the target domain

Gao, Depeng; Liu, Jiafeng; Wu, Rui; Cheng, Dansong; Fan, Xiaopeng; Tang, Xianglong

doi:10.2478/amcs-2019-0045

Abstract

With the advent of 3D cameras, it has become easy to capture depth information along with RGB images, and this information is helpful in various computer vision tasks. However, using these RGB-D images to help recognize RGB images captured by conventional cameras poses two challenges: first, depth images are missing at the testing stage; second, the training and test data are drawn from different distributions because they are captured with different equipment. To address both challenges jointly, we propose an asymmetrical transfer learning framework in which three classifiers are trained, using the RGB and depth images in the source domain and the RGB images in the target domain, under a structural risk minimization criterion and regularization theory. A cross-modality co-regularizer constrains the two source classifiers to be consistent with each other, which increases accuracy. Moreover, an L2,1-norm cross-domain co-regularizer magnifies significant visual features and suppresses insignificant ones in the weight vectors of the two RGB classifiers. Through the cross-modality and cross-domain co-regularizers, knowledge of the RGB-D images in the source domain is transferred to the target domain to improve the target classifier. Experimental results show that the proposed method is among the most effective of the methods compared.
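The abstract names the ingredients of the framework but not its exact formulas. Purely as an illustration, the Python sketch below writes down one plausible objective of that shape: three linear classifiers (source RGB, source depth, target RGB), a squared hinge loss standing in for the structural-risk term, a squared-disagreement cross-modality co-regularizer on paired source samples, and an L2,1-norm cross-domain co-regularizer on the two RGB weight vectors, plus the row-wise proximal step commonly used to optimize an L2,1 term. The loss choice, the trade-off weights `lam`, `gamma` and `mu`, and every function name are assumptions for illustration, not the authors' formulation.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(w: Vector, x: Vector) -> float:
    """Linear classifier score <w, x>."""
    return sum(wi * xi for wi, xi in zip(w, x))


def squared_hinge(w: Vector, X: Sequence[Vector], y: Sequence[int]) -> float:
    """Empirical risk sum_i max(0, 1 - y_i <w, x_i>)^2, labels y_i in {-1, +1}.
    Assumed stand-in for the paper's loss."""
    return sum(max(0.0, 1.0 - yi * dot(w, xi)) ** 2 for xi, yi in zip(X, y))


def cross_modality(w_rgb: Vector, w_depth: Vector,
                   X_rgb: Sequence[Vector], X_depth: Sequence[Vector]) -> float:
    """Squared disagreement between the source RGB and source depth classifiers
    on paired RGB-D samples (depth is needed only at training time)."""
    return sum((dot(w_rgb, xr) - dot(w_depth, xd)) ** 2
               for xr, xd in zip(X_rgb, X_depth))


def l21_norm(w_a: Vector, w_b: Vector) -> float:
    """||W||_{2,1} for the d x 2 matrix W = [w_a, w_b]: the sum over features
    (rows) of the Euclidean norm of that feature's two weights."""
    return sum(math.hypot(a, b) for a, b in zip(w_a, w_b))


def l21_prox(w_a: Vector, w_b: Vector, tau: float) -> Tuple[List[float], List[float]]:
    """Proximal operator of tau * ||W||_{2,1} (row-wise group soft-thresholding).
    A feature whose joint weight norm is at most tau is zeroed in BOTH RGB
    classifiers; every other feature is shrunk toward zero."""
    out_a: List[float] = []
    out_b: List[float] = []
    for a, b in zip(w_a, w_b):
        norm = math.hypot(a, b)
        scale = max(0.0, 1.0 - tau / norm) if norm > 0.0 else 0.0
        out_a.append(scale * a)
        out_b.append(scale * b)
    return out_a, out_b


def objective(ws_rgb, ws_depth, wt_rgb,
              Xs_rgb, Xs_depth, ys, Xt_rgb, yt,
              lam=0.1, gamma=1.0, mu=0.5) -> float:
    """Hypothetical joint objective: three empirical risks, ridge penalties,
    a cross-modality term on the source domain, and an L2,1 cross-domain term
    coupling the source and target RGB classifiers."""
    risk = (squared_hinge(ws_rgb, Xs_rgb, ys)
            + squared_hinge(ws_depth, Xs_depth, ys)
            + squared_hinge(wt_rgb, Xt_rgb, yt))
    ridge = lam * sum(dot(w, w) for w in (ws_rgb, ws_depth, wt_rgb))
    return (risk + ridge
            + gamma * cross_modality(ws_rgb, ws_depth, Xs_rgb, Xs_depth)
            + mu * l21_norm(ws_rgb, wt_rgb))


if __name__ == "__main__":
    # Toy data: 2-D RGB features, 1-D depth features, two source samples, one target sample.
    Xs_rgb, Xs_depth, ys = [[1.0, 0.0], [0.0, 1.0]], [[1.0], [-1.0]], [1, -1]
    Xt_rgb, yt = [[0.5, 0.5]], [1]
    zero2, zero1 = [0.0, 0.0], [0.0]
    # All-zero weights: every sample has hinge loss 1, so the value is 2 + 2 + 1 = 5.
    print(objective(zero2, zero1, zero2, Xs_rgb, Xs_depth, ys, Xt_rgb, yt))
    # Feature 0 (joint norm 5 > tau) is shrunk; feature 1 (norm about 0.14) is zeroed.
    print(l21_prox([3.0, 0.1], [4.0, 0.1], tau=1.0))
```

The L2,1 term couples the two RGB weight vectors feature by feature, so under this sketch a visual feature is either retained by both RGB classifiers or suppressed in both. Swapping the squared hinge for another convex loss leaves that structure intact.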

Keywords

object recognition, RGB-D image, transfer learning, privileged information

Publisher

Oficyna Wydawnicza Uniwersytetu Zielonogórskiego

Journal

International Journal of Applied Mathematics and Computer Science

Year

2019

Volume

Vol. 29, no. 3

Pages

611–621

Physical description

Bibliography: 41 items; figures, tables, charts

Contributors

Author

Gao Depeng

School of Computer Science and Technology, Harbin Institute of Technology, No. 92 Xidazhi Street, Harbin, China

Author

Liu Jiafeng

School of Computer Science and Technology, Harbin Institute of Technology, No. 92 Xidazhi Street, Harbin, China

Author

Wu Rui

School of Computer Science and Technology, Harbin Institute of Technology, No. 92 Xidazhi Street, Harbin, China

Author

Cheng Dansong

School of Computer Science and Technology, Harbin Institute of Technology, No. 92 Xidazhi Street, Harbin, China

Author

Fan Xiaopeng

School of Computer Science and Technology, Harbin Institute of Technology, No. 92 Xidazhi Street, Harbin, China

Author

Tang Xianglong

tangxl@hit.edu.cn

School of Computer Science and Technology, Harbin Institute of Technology, No. 92 Xidazhi Street, Harbin, China

References

[1] Argyriou, A., Evgeniou, T. and Pontil, M. (2008). Convex multi-task feature learning, Machine Learning 73(3): 243–272.
[2] Axler, S. (1997). Linear Algebra Done Right, Undergraduate Texts in Mathematics, Vol. 2, Springer, New York, NY.
[3] Belkin, M., Niyogi, P. and Sindhwani, V. (2006). Manifold regularization: A geometric framework for learning from labeled and unlabeled examples, Journal of Machine Learning Research 7: 2399–2434.
[4] Bo, L., Ren, X. and Fox, D. (2013). Multipath sparse coding using hierarchical matching pursuit, 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, pp. 660–667.
[5] Chen, L., Li, W. and Xu, D. (2014). Recognizing RGB images by learning from RGB-D data, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, pp. 1418–1425.
[6] Dai, W., Yang, Q., Xue, G.R. and Yu, Y. (2007). Boosting for transfer learning, International Conference on Machine Learning, Corvallis, OR, USA, pp. 193–200.
[7] Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K. and Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database, IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, Miami, FL, USA, pp. 248–255.
[8] Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E. and Darrell, T. (2013). DeCAF: A deep convolutional activation feature for generic visual recognition, Proceedings of the 31st International Conference on Machine Learning, Beijing, China, pp. 647–655.
[9] Evgeniou, T. and Pontil, M. (2004). Regularized multi-task learning, 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA, pp. 109–117.
[10] Feyereisl, J. and Aickelin, U. (2012). Privileged information for data clustering, Information Sciences 194: 4–23.
[11] Fouad, S., Tino, P., Raychaudhury, S. and Schneider, P. (2013). Incorporating privileged information through metric learning, IEEE Transactions on Neural Networks and Learning Systems 24(7): 1086–1098.
[12] Gehler, P.V. and Nowozin, S. (2009). Let the kernel figure it out: Principled learning of pre-processing for kernel classifiers, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, pp. 2836–2843.
[13] Goswami, G., Vatsa, M. and Singh, R. (2014). RGB-D face recognition with texture and attribute features, IEEE Transactions on Information Forensics and Security 9(10): 1629–1640.
[14] Griffin, G., Holub, A. and Perona, P. (2007). Caltech-256 object category dataset, California Institute of Technology, Pasadena, CA.
[15] Hadfield, S. and Bowden, R. (2013). Hollywood 3D: Recognizing actions in 3D natural scenes, IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, pp. 3398–3405.
[16] Huynh, T., Min, R. and Dugelay, J.L. (2012). An efficient LBP-based descriptor for facial depth images applied to gender recognition using RGB-D face data, Proceedings of the Asian Conference on Computer Vision, Tokyo, Japan, pp. 133–145.
[17] Janoch, A., Karayev, S., Jia, Y., Barron, J.T., Fritz, M., Saenko, K. and Darrell, T. (2013). A category-level 3D object dataset: Putting the Kinect to work, in A. Fossati et al. (Eds), Consumer Depth Cameras for Computer Vision, Springer, London, pp. 141–165.
[18] Jiang, J. and Zhai, C.X. (2007). Instance weighting for domain adaptation in NLP, Meeting of the Association of Computational Linguistics, Prague, Czech Republic, pp. 264–271.
[19] Kovashka, A. and Grauman, K. (2010). Learning a hierarchy of discriminative space-time neighborhood features for human action recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, pp. 2046–2053.
[20] Kulis, B., Saenko, K. and Darrell, T. (2011). What you saw is not what you get: Domain adaptation using asymmetric kernel transforms, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, USA, pp. 1785–1792.
[21] Lai, K., Bo, L., Ren, X. and Fox, D. (2011). A large-scale hierarchical multi-view RGB-D object dataset, 2011 IEEE International Conference on Robotics and Automation, Shanghai, China, pp. 1817–1824.
[22] LeCun, Y., Bengio, Y. and Hinton, G. (2015). Deep learning, Nature 521(7553): 436–444.
[23] Li, W., Chen, L., Xu, D. and Gool, L.V. (2018). Visual recognition in RGB images and videos by learning from RGB-D data, IEEE Transactions on Pattern Analysis and Machine Intelligence PP(99): 1–1.
[24] Li, W., Duan, L., Xu, D. and Tsang, I.W. (2014). Learning with augmented features for supervised and semi-supervised heterogeneous domain adaptation, IEEE Transactions on Pattern Analysis and Machine Intelligence 36(6): 1134–1148.
[25] Li, X., Fang, M., Zhang, J.-J. and Wu, J. (2017). Domain adaptation from RGB-D to RGB images, Signal Processing 131: 27–35.
[26] Liu, J., Ji, S. and Ye, J. (2009). Multi-task feature learning via efficient l2,1-norm minimization, Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, Montreal, Canada, pp. 339–348.
[27] Long, M., Wang, J., Ding, G., Pan, S.J. and Yu, P.S. (2014). Adaptation regularization: A general framework for transfer learning, IEEE Transactions on Knowledge and Data Engineering 26(5): 1076–1089.
[28] Mihalkova, L., Huynh, T. and Mooney, R.J. (2007). Mapping and revising Markov logic networks for transfer learning, Proceedings of the 22nd AAAI Conference on Artificial Intelligence, Vancouver, Canada, pp. 608–614.
[29] Motiian, S. and Doretto, G. (2016). Information bottleneck domain adaptation with privileged information for visual recognition, Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, pp. 630–647.
[30] Motiian, S., Piccirilli, M., Adjeroh, D.A. and Doretto, G. (2016). Information bottleneck learning using privileged information for visual recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, pp. 1496–1505.
[31] Nuricumbo, J.R., Ali, H., Marton, Z.C. and Grzegorzek, M. (2015). Improving object classification robustness in RGB-D using adaptive SVMs, Multimedia Tools and Applications 75(12): 1–19.
[32] Pan, S.J. and Yang, Q. (2010). A survey on transfer learning, IEEE Transactions on Knowledge and Data Engineering 22(10): 1345–1359.
[33] Saenko, K., Kulis, B., Fritz, M. and Darrell, T. (2010). Adapting Visual Category Models to New Domains, Springer, Berlin/Heidelberg.
[34] Sharmanska, V., Quadrianto, N. and Lampert, C.H. (2013). Learning to rank using privileged information, Proceedings of the IEEE International Conference on Computer Vision, Portland, OR, USA, pp. 825–832.
[35] Sun, S. (2013). A survey of multi-view machine learning, Neural Computing and Applications 23(7–8): 2031–2038.
[36] Vapnik, V. and Vashist, A. (2009). A new learning paradigm: Learning using privileged information, Neural Networks 22(5): 544–557.
[37] Weiss, K., Khoshgoftaar, T.M. and Wang, D. (2016). A survey of transfer learning, Journal of Big Data 3(1): 9.
[38] Xiao, Y., Wu, S.Y. and He, B.S. (2013). A proximal alternating direction method for l2,1-norm least squares problem in multi-task feature learning, Journal of Industrial and Management Optimization 8(4): 1057–1069.
[39] Xu, Y., Pan, S.J., Xiong, H., Wu, Q., Luo, R., Min, H. and Song, H. (2017). A unified framework for metric transfer learning, IEEE Transactions on Knowledge and Data Engineering 29(6): 1158–1171.
[40] Yang, J., Yan, R. and Hauptmann, A.G. (2007). Cross-domain video concept detection using adaptive SVMs, Proceedings of the 15th ACM International Conference on Multimedia, Augsburg, Germany, pp. 188–197.
[41] Yu, K. and Fu, Y. (2016). Discriminative relational representation learning for RGB-D action recognition, IEEE Transactions on Image Processing 25(6): 2856–2865.

Notes

Record prepared under agreement 509/P-DUN/2018, funded by the Polish Ministry of Science and Higher Education (MNiSW) from resources allocated to science dissemination activities (2019).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-461a9f37-2097-44ee-85b7-7e9274d9a55c