Wyniki wyszukiwania - BazTech

1

Fault Diagnosis of Suspension System Based on Spectrogram Image and Vision Transformer

Arun Balaji, Venkatesh Naveen S, Sugumaran V

Eksploatacja i Niezawodność

|

2024

|

Vol. 26, no. 1

art. no. 174860

EN

The suspension system in an automobile is essential for comfort and control. Implementing a monitoring system is crucial to ensure proper function, prevent accidents, maintain performance, and reduce both downtime and costs. Traditionally, diagnosing faults in suspension systems has relied on specialized setups and vibration analysis. The conventional approach typically involves either wavelet analysis or a machine learning approach. While these methods are effective, they often demand specialized expertise and time consumable. Alternatively, using deep learning for suspension system fault diagnosis enables faster and more precise real-time fault detection. This study explores the use of vision transformers as an innovative approach to fault diagnosis in suspension systems, utilizing spectrogram images. The process involves extracting spectrogram images from vibration signals, which serve as inputs for the vision transformer model. The test results demonstrate that the proposed fault diagnosis system achieves an impressive accuracy rate of 98.12% in identifying faults.

2

Exploring automated object detection methods for manholes using classical computer vision and deep learning for autonomous vehicles

Rao Shika, Mitnala Nitya

Machine Graphics and Vision

|

2023

|

Vol. 32, No. 1

25--53

EN

Open, broken, and improperly closed manholes can pose problems for autonomous vehicles and thus need to be included in obstacle avoidance and lane-changing algorithms. In this work, we propose and compare multiple approaches for manhole localization and classification like classical computer vision, convolutional neural networks like YOLOv3 and YOLOv3-Tiny, and vision transformers like YOLOS and ViT. These are analyzed for speed, computational complexity, and accuracy in order to determine the model that can be used with autonomous vehicles. In addition, we propose a size detection pipeline using classical computer vision to determine the size of the hole in an improperly closed manhole with respect to the manhole itself. The evaluation of the data showed that convolutional neural networks are currently better for this task, but vision transformers seem promising.

3

An efficient pedestrian attribute recognition system under challenging conditions

Nguyen Ha X., Hoang Dong N., Tran Tuan A., Dang Tuan M.

Machine Graphics and Vision

|

2023

|

Vol. 32, No. 2

3--18

EN

In this work, an efficient pedestrian attribute recognition system is introduced. The system is based on a novel processing pipeline that combines the best-performing attribute extraction model with an efficient attribute filtering algorithm using keypoints of human pose. The attribute extraction models are developed based on several state-of-the-art deep networks via transfer learning techniques, including ResNet50, Swin-transformer, and ConvNeXt. Pre-trained models of these networks are fine-tuned using the Ensemble Pedestrian Attribute Recognition (EPAR) dataset. Several optimization techniques, including the advanced optimizer Adam with Decoupled Weight Decay Regularization (AdamW), Random Erasing (RE), and weighted loss functions, are adopted to solve issues of data unbalancing or challenging conditions like partial and occluded bodies. Experimental evaluations are performed via EPAR that contains 26 993 images of 1477 person IDs, most of which are in challenging conditions. The results show that the ConvNeXt-v2-B outperforms other networks; mean accuracy (mA) reaches 85.57%, and other indices are also the highest. The addition of AdamW or RE can improve accuracy by 1-2%. The use of new loss functions can solve the issue of data unbalancing, in which the accuracy of data-less attributes improves by a maximum of 14% in the best case. Significantly, when the attribute filtering algorithm is applied, the results are dramatically improved, and mA reaches an excellent value of 94.85%. Utilizing the state-of-the-art attribute extraction model with optimization techniques on the large-scale and diverse dataset and attribute filtering has shown a good approach and thus has a high potential for practical applications.

4

Transformer-based cross-modal multi-contrast network for ophthalmic diseases diagnosis

Yu Yang, Zhu Hongqing

Biocybernetics and Biomedical Engineering

|

2023

|

Vol. 43, no. 3

507--527

EN

Automatic diagnosis of various ophthalmic diseases from ocular medical images is vital to support clinical decisions. Most current methods employ a single imaging modality, especially 2D fundus images. Considering that the diagnosis of ophthalmic diseases can greatly benefit from multiple imaging modalities, this paper further improves the accuracy of diagnosis by effectively utilizing cross-modal data. In this paper, we propose Transformerbased cross-modal multi-contrast network for efficiently fusing color fundus photograph (CFP) and optical coherence tomography (OCT) modality to diagnose ophthalmic diseases. We design multi-contrast learning strategy to extract discriminate features from crossmodal data for diagnosis. Then channel fusion head captures the semantically shared information across different modalities and the similarity features between patients of the same category. Meanwhile, we use a class-balanced training strategy to cope with the situation that medical datasets are usually class-imbalanced. Our method is evaluated on public benchmark datasets for cross-modal ophthalmic disease diagnosis. The experimental results demonstrate that our method outperforms other approaches. The codes and models are available at https://github.com/ecustyy/tcmn.

5

TL-med: A Two-stage transfer learning recognition model for medical images of COVID-19

Meng Jiana, Tan Zhiyong, Yu Yuhai, Wang Pengjie, Liu Shuang

Biocybernetics and Biomedical Engineering

|

2022

|

Vol. 42, no. 3

842--855

EN

The recognition of medical images with deep learning techniques can assist physicians in clinical diagnosis, but the effectiveness of recognition models relies on massive amounts of labeled data. With the rampant development of the novel coronavirus (COVID-19) worldwide, rapid COVID-19 diagnosis has become an effective measure to combat the outbreak. However, labeled COVID-19 data are scarce. Therefore, we propose a two-stage transfer learning recognition model for medical images of COVID-19 (TL-Med) based on the concept of ‘‘generic domain-target-related domain-target domain”. First, we use the Vision Transformer (ViT) pretraining model to obtain generic features from massive heterogeneous data and then learn medical features from large-scale homogeneous data. Two-stage transfer learning uses the learned primary features and the underlying information for COVID-19 image recognition to solve the problem by which data insufficiency leads to the inability of the model to learn underlying target dataset information. The experimental results obtained on a COVID-19 dataset using the TL-Med model produce a recognition accuracy of 93.24%, which shows that the proposed method is more effective in detecting COVID-19 images than other approaches and may greatly alleviate the problem of data scarcity in this field.

6

Explainable COVID-19 detection using fractal dimension and vision transformer with Grad-CAM on cough sounds

Sobahi Nebras, Atila Orhan, Deniz Erkan, Sengur Abdulkadir, Acharya U. Rajendra

Biocybernetics and Biomedical Engineering

|

2022

|

Vol. 42, no. 3

1066--1080

EN

The polymerase chain reaction (PCR) test is not only time-intensive but also a contact method that puts healthcare personnel at risk. Thus, contactless and fast detection tests are more valuable. Cough sound is an important indicator of COVID-19, and in this paper, a novel explainable scheme is developed for cough sound-based COVID-19 detection. In the presented work, the cough sound is initially segmented into overlapping parts, and each segment is labeled as the input audio, which may contain other sounds. The deep Yet Another Mobile Network (YAMNet) model is considered in this work. After labeling, the segments labeled as cough are cropped and concatenated to reconstruct the pure cough sounds. Then, four fractal dimensions (FD) calculation methods are employed to acquire the FD coefficients on the cough sound with an overlapped sliding window that forms a matrix. The constructed matrixes are then used to form the fractal dimension images. Finally, a pretrained vision transformer (ViT) model is used to classify the constructed images into COVID-19, healthy and symptomatic classes. In this work, we demonstrate the performance of the ViT on cough sound-based COVID-19, and a visual explainability of the inner workings of the ViT model is shown. Three publically available cough sound datasets, namely COUGHVID, VIRUFY, and COSWARA, are used in this study. We have obtained 98.45%, 98.15%, and 97.59% accuracy for COUGHVID, VIRUFY, and COSWARA datasets, respectively. Our developed model obtained the highest performance compared to the state-of-the-art methods and is ready to be tested in real-world applications.