

Article title

Enhancing Biometric Security with Bimodal Deep Learning and Feature-level Fusion of Facial and Voice Data

Publication languages
EN
Abstracts
EN
Recent research in biometric technologies underscores the benefits of multimodal systems, which combine multiple traits to enhance security by making it harder to replicate a genuine user's samples. Building on this, we present a bimodal deep learning network (BDLN or BNet) that integrates the facial and voice modalities. Voice features are extracted with the SincNet architecture, and facial image features are obtained from convolutional layers. The proposed network fuses these two feature vectors by either averaging or concatenation. A densely connected layer then processes the combined vector to produce a dual-modal vector that encapsulates the user's distinctive features. For identification, this dual-modal vector is passed through a further dense layer with a softmax activation function. The system achieved an identification accuracy of 99% and a low equal error rate (EER) of 0.13% for verification. These results, obtained on the VidTIMIT and BIOMEX-DB datasets, highlight the effectiveness of the proposed bimodal approach in improving biometric security.
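The fusion pipeline summarized in the abstract can be illustrated with a short plain-Python sketch. This is a minimal, hypothetical illustration and not the authors' implementation. The following are all assumptions: the embedding sizes, the ReLU activation on the fusion layer, the untrained random weights, the three enrolled users, and the helper names `fuse`, `dense`, `random_layer` and `equal_error_rate`. In the real system the face and voice vectors would come from trained convolutional layers and SincNet. The EER helper uses a simple threshold sweep, one common way to approximate the operating point where the false acceptance and false rejection rates are equal.

```python
import math
import random


def fuse(face_vec, voice_vec, method="concat"):
    """Feature-level fusion of a face embedding and a voice embedding (hypothetical helper)."""
    if method == "average":
        # Element-wise mean: both embeddings must have the same dimension.
        if len(face_vec) != len(voice_vec):
            raise ValueError("averaging requires equal-length feature vectors")
        return [(f + v) / 2.0 for f, v in zip(face_vec, voice_vec)]
    if method == "concat":
        # Concatenation: the fused length is the sum of both lengths.
        return list(face_vec) + list(voice_vec)
    raise ValueError(f"unknown fusion method: {method!r}")


def dense(x, weights, bias, activation=None):
    """Fully connected layer y = act(W x + b); `weights` holds one row per output unit."""
    y = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    if activation == "relu":
        y = [max(0.0, v) for v in y]
    elif activation == "softmax":
        m = max(y)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in y]
        total = sum(exps)
        y = [e / total for e in exps]
    return y


def random_layer(n_in, n_out, rng):
    """Untrained, randomly initialised weights; a real model would learn these."""
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping the threshold over all observed scores.

    A trial is accepted when score >= threshold. FAR is the fraction of impostor
    scores accepted, FRR the fraction of genuine scores rejected; the EER is
    taken at the threshold where the two rates are closest.
    """
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer


# Usage with stand-in embeddings (random numbers, not real CNN/SincNet outputs).
rng = random.Random(0)
face = [rng.random() for _ in range(8)]
voice = [rng.random() for _ in range(8)]
fused = fuse(face, voice, "concat")              # 16 values
w1, b1 = random_layer(len(fused), 4, rng)
dual_modal = dense(fused, w1, b1, "relu")        # dual-modal vector
w2, b2 = random_layer(len(dual_modal), 3, rng)   # 3 hypothetical enrolled users
probs = dense(dual_modal, w2, b2, "softmax")     # identification probabilities
predicted_user = probs.index(max(probs))
```

Concatenation keeps every dimension of both embeddings and allows them to differ in size, at the cost of a wider input to the next dense layer. Averaging keeps the input size fixed but only makes sense when both embeddings share the same dimension and a comparable scale.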
Year
Volume
Pages
31–42
Physical description
Bibliography: 58 items, figures, charts
Authors
author
  • Laboratory of TIT, Department of Electrical Engineering, Tahri Mohammed University of Bechar, Bechar, Algeria
  • Tahri Mohammed University of Bechar, Bechar, Algeria
Bibliography
  • [1] S.A. Abdulrahman and B. Alhayani, “A Comprehensive Survey on the Biometric Systems Based on Physiological and Behavioral Characteristics”, Materials Today: Proceedings, vol. 80, pp. 2642–2646, 2023 (https://doi.org/10.1016/j.matpr.2021.07.005).
  • [2] S.K.S. Modak and V.K. Jha, “Multibiometric Fusion Strategy and its Applications: A Review”, Information Fusion, vol. 49, pp. 174–204, 2019 (https://doi.org/10.1016/j.inffus.2018.11.018).
  • [3] D. Patel, S. Patel, A.A. Thadeshwar, and R. Chaturvedi, “Multimodal Biometric Systems: A Review”, International Journal of Advanced Research in Computer Science, vol. 9, no. 2, pp. 361–365, 2018 (https://doi.org/10.26483/ijarcs.v9i2.5742).
  • [4] H. Mandalapu et al., “Audio-visual Biometric Recognition and Presentation Attack Detection: A Comprehensive Survey”, IEEE Access, vol. 9, pp. 37431–37455, 2021 (https://doi.org/10.1109/access.2021.3063031).
  • [5] M. Singh, R. Singh, and A. Ross, “A Comprehensive Overview of Biometric Fusion”, Information Fusion, vol. 52, pp. 187–205, 2019 (https://doi.org/10.1016/j.inffus.2018.12.003).
  • [6] N. Alay and H.H. Al-Baity, “Deep Learning Approach for Multimodal Biometric Recognition System Based on Fusion of Iris, Face, and Finger Vein Traits”, Sensors, vol. 20, no. 19, art. no. 5523, 2020 (https://doi.org/10.3390/s20195523).
  • [7] S. Shakil, D. Arora, and T. Zaidi, “Feature Based Classification of Voice Based Biometric Data Through Machine Learning Algorithm”, Materials Today: Proceedings, vol. 51, pp. 240–247, 2022 (https://doi.org/10.1016/j.matpr.2021.05.261).
  • [8] N.D. Al-Shakarchy, H.K. Obayes, and Z.N. Abdullah, “Person Identification Based on Voice Biometric Using Deep Neural Network”, International Journal of Information Technology, vol. 15, no. 2, pp. 789–795, 2023 (https://doi.org/10.1007/s41870-022-01142-1).
  • [9] N.K. Benamara, E. Zigh, T.B. Stambouli, and M. Keche, “Towards a Robust Thermal-visible Heterogeneous Face Recognition Approach Based on a Generative Cycle Adversarial Network”, International Journal of Interactive Multimedia and Artificial Intelligence, vol. 7, no. 4, pp. 132–145, 2022 (https://doi.org/10.9781/ijimai.2021.12.003).
  • [10] D.M. Jiménez-Bravo et al., “Edge Face Recognition System Based on One-shot Augmented Learning”, International Journal of Interactive Multimedia and Artificial Intelligence, vol. 7, no. 6, pp. 31–44, 2022 (https://doi.org/10.9781/ijimai.2022.09.001).
  • [11] A. Alcaide et al., “LIPSNN: A Light Intrusion-proving Siamese Neural Network Model for Facial Verification”, International Journal of Interactive Multimedia and Artificial Intelligence, vol. 7, no. 4, pp. 121–131, 2022 (https://doi.org/10.9781/ijimai.2021.11.003).
  • [12] V. Talreja, M.C. Valenti, and N.M. Nasrabadi, “Multibiometric Secure System Based on Deep Learning”, 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP), Montreal, Canada, 2017 (https://doi.org/10.1109/globalsip.2017.8308652).
  • [13] Q. Zhang, H. Li, Z. Sun, and T. Tan, “Deep Feature Fusion for Iris and Periocular Biometrics on Mobile Devices”, IEEE Transactions on Information Forensics and Security, vol. 13, no. 11, pp. 2897–2912, 2018 (https://doi.org/10.1109/tifs.2018.2833033).
  • [14] Y. Xin et al., “Multimodal Feature-level Fusion for Biometrics Identification System on IoMT Platform”, IEEE Access, vol. 6, pp. 21418–21426, 2018 (https://doi.org/10.1109/access.2018.2815540).
  • [15] V.V. Khryashchev, A.I. Topnikov, A.F. Stefanidi, and A.L. Priorov, “Bimodal Person Identification Using Voice Data and Face Images”, Eleventh International Conference on Machine Vision (ICMV 2018), Munich, Germany, 2019 (https://doi.org/10.1117/12.2523138).
  • [16] A. Abozaid, A. Haggag, H. Kasban, and M. Eltokhy, “Multimodal Biometric Scheme for Human Authentication Technique Based on Voice and Face Recognition Fusion”, Multimedia Tools and Applications, vol. 78, pp. 16345–16361, 2019 (https://doi.org/10.1007/s11042-018-7012-3).
  • [17] O. Olazabal et al., “Multimodal Biometrics for Enhanced IoT Security”, 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, USA, 2019 (https://doi.org/10.1109/ccwc.2019.8666599).
  • [18] X. Zhang et al., “An Efficient Android-based Multimodal Biometric Authentication System with Face and Voice”, IEEE Access, vol. 8, pp. 102757–102772, 2020 (https://doi.org/10.1109/access.2020.2999115).
  • [19] E. Al Alkeem et al., “Robust Deep Identification Using ECG and Multimodal Biometrics for Industrial Internet of Things”, Ad Hoc Networks, vol. 121, art. no. 102581, 2021 (https://doi.org/10.1016/j.adhoc.2021.102581).
  • [20] M. Leghari et al., “Deep Feature Fusion of Fingerprint and Online Signature for Multimodal Biometrics”, Computers, vol. 10, no. 2, art. no. 21, 2021 (https://doi.org/10.3390/computers10020021).
  • [21] C.F.F. Costa-Filho, J.V. Negreiro, and M.G.F. Costa, “Multimodal Biometric System Based on Autoencoders and Learning Vector Quantization”, Brazilian Congress on Biomedical Engineering, Vitoria, Brazil, 2020 (https://doi.org/10.1007/978-3-030-70601-2_236).
  • [22] C. Kamlaskar and A. Abhyankar, “Feature Level Fusion Framework for Multimodal Biometric System Based on CCA with SVM Classifier and Cosine Similarity Measure”, Australian Journal of Electrical and Electronics Engineering, vol. 20, no. 2, pp. 205–218, 2023 (https://doi.org/10.1080/1448837x.2022.2129147).
  • [23] Z. Zhang, H. Lu, P. Sang, and J. Wang, “MultiBioGM: A Hand Multimodal Biometric Model Combining Texture Prior Knowledge to Enhance Generalization Ability”, in: Biometric Recognition (CCBR 2023), pp. 106–115, 2023 (https://doi.org/10.1007/978-981-99-8565-4_11).
  • [24] V. Gurunathan and R. Sudhakar, “Multimodal Biometric System Using Palm Vein and Ear Images”, Proceeding of International Conference on Computer Visions and Robotics, pp. 439–451, 2023 (https://doi.org/10.1007/978-981-99-4577-1_36).
  • [25] T. Hafs, H. Zehir, A. Hafs, and A. Nait-Ali, “Multimodal Biometric System Based on the Fusion in Score of Fingerprint and Online Handwritten Signature”, Applied Computer Systems, vol. 28, no. 1, pp. 37–49, 2023 (https://doi.org/10.2478/acss-2023-0006).
  • [26] M. Ravanelli and Y. Bengio, “Speaker Recognition from Raw Waveform with SincNet”, 2018 IEEE Spoken Language Technology Workshop (SLT), Athens, Greece, 2018 (https://doi.org/10.1109/slt.2018.8639585).
  • [27] Y. Badr, P. Mukherjee, and S.M. Thumati, “Speech Emotion Recognition using MFCC and Hybrid Neural Networks”, Proceedings of the 13th International Joint Conference on Computational Intelligence, pp. 366–373, 2021 (https://doi.org/10.5220/0010707400003063).
  • [28] A.K. Dubey and V. Jain, “Comparative Study of Convolution Neural Network’s ReLU and Leaky-ReLU Activation Functions”, in: Applications of Computing, Automation and Wireless Systems in Electrical Engineering, pp. 873–880, 2019 (https://doi.org/10.1007/978-981-13-6772-4_76).
  • [29] D.B. Jadhav, G.S. Chavan, V.C. Bagal, and R.R. Manza, “Review on Multimodal Biometric Recognition System Using Machine Learning”, Artificial Intelligence and Applications, vol. 20, pp. 1–7, 2023 (https://doi.org/10.47852/bonview3202593).
  • [30] C. Sanderson and B.C. Lovell, “Multi-region Probabilistic Histograms for Robust and Scalable Identity Inference”, in: Advances in Biometrics (Conference Proceedings), pp. 199–208, 2009 (https://doi.org/10.1007/978-3-642-01793-3_21).
  • [31] D. Snyder, D. Povey, and G. Chen, “MUSAN: A Music, Speech, and Noise Corpus”, ArXiv, 2015 (https://doi.org/10.48550/arxiv.1510.08484).
  • [32] A. Zelinsky, “Learning OpenCV–Computer Vision with the OpenCV Library”, IEEE Robotics & Automation Magazine, vol. 16, no. 3, p. 100, 2009 (https://doi.org/10.1109/mra.2009.933612).
  • [33] M. Wang, Z. Wang, and J. Li, “Deep Convolutional Neural Network Applies to Face Recognition in Small and Medium Databases”, 2017 4th International Conference on Systems and Informatics (ICSAI), Hangzhou, China, 2017 (https://doi.org/10.1109/icsai.2017.8248499).
  • [34] P. Ke, M. Cai, H. Wang, and J. Chen, “A Novel Face Recognition Algorithm Based on the Combination of LBP and CNN”, 2018 14th IEEE International Conference on Signal Processing (ICSP), Beijing, China, 2018 (https://doi.org/10.1109/icsp.2018.8652477).
  • [35] Q. Xu and N. Zhao, “A Facial Expression Recognition Algorithm Based on CNN and LBP Feature”, 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), Chongqing, China, 2020 (https://doi.org/10.1109/itnec48623.2020.9084763).
  • [36] A.B. Jung et al., “Imgaug”, GitHub: San Francisco, USA, 2020 (https://github.com/aleju/imgaug).
  • [37] J.-M. Cheng and H.-C. Wang, “A Method of Estimating the Equal Error Rate for Automatic Speaker Verification”, 2004 International Symposium on Chinese Spoken Language Processing, Hong Kong, China, 2004 (https://doi.org/10.1109/CHINSL.2004.1409642).
  • [38] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition”, IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016 (https://doi.org/10.1109/cvpr.2016.90).
  • [39] I. Aliyu, M.A. Bomoi, and M. Maishanu, “A Comparative Study of Eigenface and Fisherface Algorithms Based on OpenCV and Sci-kit Libraries Implementations”, International Journal of Information Engineering & Electronic Business, vol. 14, no. 3, pp. 30–40, 2022 (https://doi.org/10.5815/ijieeb.2022.03.04).
  • [40] D. Snyder et al., “X-vectors: Robust DNN Embeddings for Speaker Recognition”, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, Canada, 2018 (https://doi.org/10.1109/icassp.2018.8461375).
  • [41] Y. Kortli, M. Jridi, A. Al Falou, and M. Atri, “Face Recognition Systems: A Survey”, Sensors, vol. 20, no. 2, art. no. 342, 2020 (https://doi.org/10.3390/s20020342).
  • [42] A. Verma, A. Goyal, N. Kumar, and H. Tekchandani, “Face Recognition: A Review and Analysis”, in: Computational Intelligence in Data Mining (Conference Proceedings), pp. 195–210, 2022 (https://doi.org/10.1007/978-981-16-9447-9_15).
  • [43] Z. Bai and X.-L. Zhang, “Speaker Recognition Based on Deep Learning: An Overview”, Neural Networks, vol. 140, pp. 65–99, 2021 (https://doi.org/10.1016/j.neunet.2021.03.004).
  • [44] A.Q. Ohi, M.F. Mridha, M.A. Hamid, and M.M. Monowar, “Deep Speaker Recognition: Process, Progress, and Challenges”, IEEE Access, vol. 9, pp. 89619–89643, 2021 (https://doi.org/10.1109/access.2021.3090109).
  • [45] M. Gofman et al., “Multimodal Biometrics via Discriminant Correlation Analysis on Mobile Devices”, International Conference on Security and Management (SAM), Las Vegas, USA, 2018.
  • [46] R. Ramachandra et al., “Smartphone Multimodal Biometric Authentication: Database and Evaluation”, ArXiv, 2019 (https://doi.org/10.48550/arXiv.1912.02487).
  • [47] G. Antipov, N. Gengembre, O.L. Blouch, and G.L. Lan, “Automatic Quality Assessment for Audio-visual Verification Systems: The LOVe Submission to NIST SRE Challenge 2019”, ArXiv, 2020 (https://doi.org/10.48550/arXiv.2008.05889).
  • [48] S.O. Sadjadi et al., “The 2019 NIST Audio-visual Speaker Recognition Evaluation”, The Speaker and Language Recognition Workshop: Odyssey 2020, Tokyo, Japan, 2020 (https://doi.org/10.21437/odyssey.2020-37).
  • [49] M. Liu et al., “Exploring Deep Learning for Joint Audio-visual Lip Biometrics”, ArXiv, 2021 (https://doi.org/10.48550/arXiv.2104.08510).
  • [50] G. Fenu and M. Marras, “Demographic Fairness in Multimodal Biometrics: A Comparative Analysis on Audio-visual Speaker Recognition Systems”, Procedia Computer Science, vol. 198, pp. 249–254, 2022 (https://doi.org/10.1016/j.procs.2021.12.236).
  • [51] M.S. Saeed et al., “Single-branch Network for Multimodal Training”, 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023 (https://doi.org/10.1109/ICASSP49357.2023.10097207).
  • [52] G.P. Rajasekhar and J. Alam, “Audio-visual Speaker Verification via Joint Cross-attention”, International Conference on Speech and Computer, Dharwad, India, 2023 (https://doi.org/10.1007/978-3-031-48312-7_2).
  • [53] R. Tao et al., “Multi-stage Face-voice Association Learning with Keynote Speaker Diarization”, ArXiv, 2024 (https://doi.org/10.48550/arXiv.2407.17902).
  • [54] M. Abdrakhmanova et al., “One Model to Rule Them All: A Universal Transformer for Biometric Matching”, IEEE Access, vol. 12, pp. 96729–96739, 2024 (https://doi.org/10.1109/ACCESS.2024.3426602).
  • [55] A. Farhadipour, M. Chapariniya, T. Vukovic, and V. Dellwo, “Comparative Analysis of Modality Fusion Approaches for Audio-visual Person Identification and Verification”, ArXiv, 2024 (https://doi.org/10.48550/arXiv.2409.00562).
  • [56] C. Wang, H. Zhu, and L. Xu, “Research on the Improvement of the Target Speaker Recognition System Based on Dual-Modal Fusion”, 2024 5th International Conference on Computer Vision, Image and Deep Learning (CVIDL), Zhuhai, China, 2024 (https://doi.org/10.1109/cvidl62147.2024.10603613).
  • [57] Y. Jiang et al., “Target Speech Diarization with Multimodal Prompts”, ArXiv, 2024 (https://doi.org/10.48550/arXiv.2406.07198).
  • [58] C. Peng, L. He, and D. Su, “Fuse after Align: Improving Face-voice Association Learning via Multimodal Encoder”, ArXiv, 2024 (https://doi.org/10.48550/arXiv.2404.09509).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-ea561b8a-f1c2-4f4a-a9ca-b6767097e821