Search results
Searched in keywords: late fusion
Results found: 4
EN
In the domain of affective computing, emotional expressions play an important role. Facial expressions and other visual cues are the primary means of conveying a human's emotional state, and they do so more convincingly than any other cue. With advances in deep learning, convolutional neural networks (CNNs) can automatically extract features from visual cues; however, variable-sized and biased datasets remain a vital challenge for deploying deep models, and the dataset used for training plays a significant role in the results obtained. In this paper, we propose a multi-model hybrid ensemble weighted adaptive approach with decision-level fusion for personalized affect recognition based on visual cues. We use a CNN trained from scratch and a pre-trained ResNet-50 model for transfer learning; the weights of the VGGFace model are used to initialize ResNet-50 before fine-tuning. The proposed system shows a significant improvement in test accuracy for affective state recognition compared with either the singleton CNN developed from scratch or the transfer-learned model alone. The proposed methodology is validated on the Karolinska Directed Emotional Faces (KDEF) dataset with 77.85% accuracy. The obtained results are promising compared with existing state-of-the-art methods.
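The decision-level fusion described above can be sketched as a weighted average of each model's class probabilities. This is a minimal illustration, not the paper's exact scheme: the function name, the use of validation accuracy to derive the adaptive weights, and all numbers are assumptions for demonstration.

```python
def fuse_predictions(prob_lists, val_accuracies):
    """Weighted average of per-model class-probability vectors.

    Each model's weight is proportional to its validation accuracy,
    so stronger models contribute more to the fused decision
    (hypothetical weighting rule, for illustration only).
    """
    total = sum(val_accuracies)
    weights = [a / total for a in val_accuracies]
    fused = [0.0] * len(prob_lists[0])
    for probs, w in zip(prob_lists, weights):
        for i, p in enumerate(probs):
            fused[i] += w * p
    return fused

# Illustrative softmax outputs for one face image (values invented):
cnn_probs = [0.60, 0.30, 0.10]     # CNN trained from scratch
resnet_probs = [0.20, 0.70, 0.10]  # fine-tuned ResNet-50
fused = fuse_predictions([cnn_probs, resnet_probs], [0.70, 0.78])
predicted_class = fused.index(max(fused))
```

Because the weights sum to one, the fused vector is itself a valid probability distribution, and the ensemble can overturn a single model's mistaken top class.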
EN
Breast cancer is a leading cause of death among women. Early detection can significantly reduce mortality and improve prognosis, and mammography is the first-line procedure for early diagnosis. Early conventional Computer-Aided Diagnosis (CADx) systems for breast lesion diagnosis were based on single-view information only. Over the last decade, two mammographic views, Medio-Lateral Oblique (MLO) and Cranio-Caudal (CC), have been used in CADx systems, and the most recent studies show the effectiveness of training CADx systems on four views with a feature-fusion strategy for the classification task. In this paper, we propose an end-to-end Multi-View Attention-based Late Fusion (MVALF) CADx system that fuses the predictions of four view models, each trained separately. These separate models have different predictive ability for each class, and an appropriate fusion of the multi-view models can achieve better diagnostic performance, so it is necessary to assign proper weights to the multi-view classification models. To resolve this issue, an attention-based weighting mechanism is adopted to assign proper weights to the trained models in the fusion strategy. The proposed methodology is used to classify mammograms as normal, mass, calcification, and malignant or benign masses. The publicly available CBIS-DDSM and mini-MIAS datasets are used for experimentation. The results show that our proposed system achieved an AUC of 0.996 for normal vs. abnormal, 0.922 for mass vs. calcification, and 0.896 for malignant vs. benign masses. For malignant vs. benign masses, the proposed approach gives superior results, higher than those of single-view, two-view, and four-view early-fusion-based systems. The overall results at each level show the potential of multi-view late fusion with transfer learning in the diagnosis of breast cancer.
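The attention-based weighting described above can be sketched as a softmax over per-view attention scores, followed by a weighted sum of the four view models' outputs. This is a hypothetical sketch: the function names, the fixed (rather than learned) attention scores, and all probability values are invented for illustration.

```python
import math

def softmax(scores):
    """Normalize attention scores into fusion weights that sum to one."""
    exps = [math.exp(s) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def attention_late_fusion(view_probs, attention_scores):
    """Weighted sum of the view models' class-probability vectors."""
    w = softmax(attention_scores)
    n_classes = len(view_probs[0])
    return [sum(w[v] * view_probs[v][c] for v in range(len(view_probs)))
            for c in range(n_classes)]

# Illustrative [P(normal), P(abnormal)] from the four view models (invented):
views = [
    [0.30, 0.70],  # left CC
    [0.25, 0.75],  # left MLO
    [0.40, 0.60],  # right CC
    [0.35, 0.65],  # right MLO
]
scores = [1.2, 0.8, 0.5, 0.9]  # per-view attention scores (hypothetical;
                               # in the paper these would be learned)
fused_view = attention_late_fusion(views, scores)
```

In a trained system the attention scores would be produced by a small learned module, so views that are more informative for a given case receive larger fusion weights.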
EN
Research on the design of robust multimodal speech recognition systems that make use of acoustic and visual cues, extracted with relatively noise-robust alternate speech sensors, has been gaining interest in the speech processing community. The primary objective of this work is to study the exclusive influence of the Lombard effect on the automatic recognition of confusable syllabic consonant-vowel units of the Hindi language, as a step towards building robust multimodal ASR systems for adverse environments in the context of Indian languages, which are syllabic in nature. The dataset for this work comprises 145 confusable consonant-vowel (CV) syllabic units of Hindi recorded simultaneously through three modalities that capture acoustic and visual speech cues: a normal acoustic microphone (NM), a throat microphone (TM), and a camera that captures the associated lip movements. The Lombard effect is induced by feeding crowd noise into the speaker's headphones during recording. Convolutional Neural Network (CNN) models are built to categorise the CV units by their place of articulation (POA), manner of articulation (MOA), and vowels, under both clean and Lombard conditions; for validation, corresponding Hidden Markov Models (HMMs) are also built and tested. Unimodal Automatic Speech Recognition (ASR) systems built with each of the three speech cues from Lombard speech show a loss in recognition of MOA and vowels, while POA recognition gets a boost in all systems due to the Lombard effect. Combining the three complementary speech cues into bimodal and trimodal ASR systems reduces the recognition loss for MOA and vowels compared with the unimodal systems, while POA recognition still benefits from the Lombard effect. A bimodal system using only the alternate acoustic and visual cues is proposed, which discriminates place and manner of articulation better than even the standard ASR system.
Among the multimodal ASR systems studied, the proposed trimodal system based on Lombard speech gives the best recognition accuracies of 98%, 95%, and 76% for vowels, MOA, and POA, respectively, with an average improvement of 36% over the unimodal ASR systems and 9% over the bimodal ASR systems.
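One common way to combine the three streams at the decision level, sketched here as an assumption rather than the paper's documented rule, is a weighted sum of per-stream log-posteriors (a log-linear, or product-rule, combination). All class posteriors and stream weights below are invented for illustration.

```python
import math

def log_linear_fusion(stream_probs, stream_weights):
    """Combine per-stream class posteriors via a weighted sum of
    log-probabilities, then renormalize into a distribution."""
    n_classes = len(stream_probs[0])
    log_scores = [
        sum(w * math.log(p[c]) for p, w in zip(stream_probs, stream_weights))
        for c in range(n_classes)
    ]
    m = max(log_scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in log_scores]
    z = sum(exps)
    return [e / z for e in exps]

# Illustrative posteriors over three MOA classes (values invented):
nm = [0.50, 0.30, 0.20]   # normal acoustic microphone stream
tm = [0.40, 0.40, 0.20]   # throat microphone stream
vis = [0.45, 0.25, 0.30]  # lip-movement (visual) stream
fused_moa = log_linear_fusion([nm, tm, vis], [1.0, 1.0, 1.0])
```

With equal weights this reduces to the product rule; unequal weights let a more reliable stream (e.g. the throat microphone under Lombard noise) dominate the trimodal decision.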
EN
Environmental microorganisms (EMs) are single-celled or multi-cellular microscopic organisms living in the environment. They are crucial to nutrient recycling in ecosystems, as they act as decomposers, and the occurrence of certain EMs and their species is a very informative indicator of environmental quality. However, manual recognition of EMs in microbiological laboratories is very time-consuming and expensive. Therefore, this article proposes an automatic EM classification system based on content-based image analysis (CBIA) techniques. Our approach starts with image segmentation to determine the region of interest (the EM shape). The EM is then described by four different shape descriptors, among which the Internal Structure Histogram (ISH), a new and original shape feature extraction technique introduced in this paper, turns out to possess the most discriminative properties in this application domain. Afterwards, a support vector machine (SVM) is constructed for each descriptor to distinguish different classes of EMs. Finally, the results of the SVMs trained on all four feature spaces are fused to obtain the final classification result. Experimental results confirm the effectiveness and practicability of our automatic EM classification system.
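The final fusion of the four descriptor-specific SVMs can be sketched, for instance, as a majority vote with confidence-based tie-breaking. This is one plausible fusion rule, not necessarily the one used in the paper; the function name, tie-break rule, species labels, and confidence values are all illustrative assumptions.

```python
from collections import Counter

def fuse_classifiers(predictions, confidences):
    """Majority vote across descriptor-specific classifiers; ties are
    broken by the summed confidence of the tied classes."""
    counts = Counter(predictions)
    top = max(counts.values())
    tied = [cls for cls, n in counts.items() if n == top]
    if len(tied) == 1:
        return tied[0]
    conf_sum = {cls: sum(c for p, c in zip(predictions, confidences) if p == cls)
                for cls in tied}
    return max(conf_sum, key=conf_sum.get)

# One SVM per shape descriptor (ISH plus three others);
# labels and confidences are invented:
preds = ["Actinophrys", "Arcella", "Actinophrys", "Aspidisca"]
confs = [0.9, 0.6, 0.7, 0.8]
em_class = fuse_classifiers(preds, confs)
```

Voting at the decision level lets each descriptor's SVM operate in its own feature space, so no joint feature vector has to be constructed across the four heterogeneous descriptors.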