In recent years, due to the proliferation of inertial measurement units (IMUs) in mobile devices such as smartphones, attitude estimation using inertial and magnetic sensors has been the subject of considerable research. Traditional methods involve probabilistic and iterative state estimation; however, these approaches do not generalize well over continuously changing motion dynamics and environmental conditions. Therefore, this paper proposes a deep learning-based approach for attitude estimation. This approach segments data from sensors into different windows and estimates attitude by separately extracting local features and global features from sensor data using a residual network (ResNet18) and a long short-term memory network (LSTM). To improve the accuracy of attitude estimation, a multi-scale attention mechanism is designed within ResNet18 to capture finer temporal information in the sensor data. The experimental results indicate that the accuracy of attitude estimation using this method surpasses that of other methods proposed in recent years.
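As a much-simplified illustration of the multi-scale attention idea (not the authors' implementation; the window shape, the set of scales, and the dot-product scoring rule are all assumptions made for this sketch), attention weights can be computed over an IMU window at several temporal scales and the attended features averaged:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multiscale_attention(window, scales=(1, 2)):
    """Weight each time step of an IMU window by attention scores
    computed at several temporal scales, then average across scales.

    window: (T, C) array of T time steps with C sensor channels.
    Returns a (C,) attended feature vector.
    """
    T, C = window.shape
    pooled = np.zeros(C)
    for s in scales:
        # average-pool the window at scale s to coarsen the time axis
        Ts = T // s
        coarse = window[:Ts * s].reshape(Ts, s, C).mean(axis=1)  # (Ts, C)
        # score each coarse step against the window's mean channel profile
        scores = softmax(coarse @ coarse.mean(axis=0))           # (Ts,)
        pooled += scores @ coarse                                # (C,)
    return pooled / len(scales)
```

Finer scales keep per-sample detail while coarser scales emphasize slower motion patterns, which is the intuition behind capturing "finer temporal information" at multiple resolutions.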
The engine bleed air system (BAS) is one of the important systems for civil aircraft, and fault prediction of BAS is necessary to improve aircraft safety and the operator's profit. A dual-stage two-phase attention-based encoder-decoder (DSTP-ED) prediction model is proposed for BAS normal state estimation. Unlike traditional ED networks, the DSTP-ED combines spatial and temporal attention to better capture the spatiotemporal relationships to achieve higher prediction accuracy. Five data-driven algorithms, autoregressive integrated moving average (ARIMA), support vector regression (SVR), long short-term memory (LSTM), ED, and DSTP-ED, are applied to build prediction models for BAS. The comparison experiments show that the DSTP-ED model outperforms the other four data-driven models. An exponentially weighted moving average (EWMA) control chart is used as the evaluation criterion for the BAS failure warning. An empirical study based on Quick Access Recorder (QAR) data from Airbus A320 series aircraft demonstrates that the proposed method can effectively predict failures.
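The EWMA control chart used as the failure-warning criterion can be sketched as follows. This is the generic textbook EWMA applied to prediction residuals (actual minus model-predicted values); estimating the in-control mean and standard deviation from the residuals themselves, and the parameter values, are assumptions of the sketch, not the paper's settings:

```python
import numpy as np

def ewma_chart(residuals, lam=0.2, L=3.0):
    """Exponentially weighted moving average control chart.

    residuals: 1-D array of prediction residuals (actual - predicted).
    lam: EWMA smoothing factor in (0, 1]; L: control-limit width in sigmas.
    Returns (z, ucl, lcl, alarms), where alarms flags points outside limits.
    """
    x = np.asarray(residuals, dtype=float)
    mu, sigma = x.mean(), x.std(ddof=1)
    z = np.empty_like(x)
    z_prev = mu                       # chart starts at the process mean
    for i, xi in enumerate(x):
        z_prev = lam * xi + (1 - lam) * z_prev
        z[i] = z_prev
    t = np.arange(1, len(x) + 1)
    half = L * sigma * np.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
    ucl, lcl = mu + half, mu - half
    alarms = (z > ucl) | (z < lcl)
    return z, ucl, lcl, alarms
```

A sustained drift of the residuals away from the in-control mean pushes the EWMA statistic across a control limit, which is what turns the normal-state prediction model into a failure-warning system.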
Against the backdrop of today's green development, a core task of the financial sector at all levels is to improve resource utilisation and guide industries toward high-quality development, in particular by channelling funds away from high-pollution, energy-intensive industries and into green, high-technology sectors, so that the economy develops in harmony with resources and the environment. This paper proposes a green financial text classification model based on machine learning. The model consists of four modules: an input module, a data analysis module, a data category module, and a classification module. The data analysis module and the data category module extract information from the input text and from the green financial categories, respectively, and the two types of information are fused by an attention mechanism to classify green financial data within general financial data. Extensive experiments on financial text datasets collected from the Internet demonstrate the superiority of the proposed green financial text classification method.
With the deepening of green and sustainable development and the rapid growth of the social economy, the modern logistics industry has developed to an unprecedented level. In the logistics supply chain, because the items inside arriving cartons are of high value, appearance inspection must be carried out before warehousing. However, manual inspection is slow and unreliable, wasting manpower and packaging carton resources, which is not conducive to sustainable development. To address these problems, this paper designs a carton packaging quality defect detection system for the logistics supply chain, based on an improved Single Shot MultiBox Detector (SSD), in the context of green sustainable development. An Implicit Feature Pyramid Network (IFPN) is introduced into SSD to improve the model's feature extraction ability, and a multiscale attention mechanism is introduced to collect more feature information. Experiments show that the system reaches an mAP of 0.9662 at 36 FPS on a self-built dataset, enabling detection of appearance defects in logistics cartons and helping to promote green, sustainable development.
Urine microscopy is an essential diagnostic tool for kidney and urinary tract diseases, with automated analysis of urinary sediment particles improving diagnostic efficiency. However, some urinary sediment particles remain challenging to identify due to individual variations, blurred boundaries, and unbalanced samples. This research aims to mitigate the adverse effects of these factors while improving multi-class detection performance. We propose an innovative model based on improved YOLOX for detecting urine sediment particles (YUS-Net). The combination of urine sediment data augmentation and overall pre-trained weights enhances the model's optimization potential. Furthermore, we incorporate an attention module into the critical feature transfer path and employ a novel loss function, Varifocal loss, to facilitate the extraction of discriminative features, which assists in the identification of densely distributed small objects. On the USE dataset, YUS-Net achieves a mean Average Precision (mAP) of 96.07%, 99.35% average precision, and 96.77% average recall, with a latency of 26.13 ms per image. The specific metrics for each category are as follows: cast: 99.66% AP; cryst: 100% AP; epith: 92.31% AP; epithn: 100% AP; eryth: 92.31% AP; leuko: 99.90% AP; mycete: 99.96% AP. With a practical network structure, YUS-Net achieves efficient, accurate, end-to-end urinary sediment particle detection. The model takes native high-resolution images as input without additional steps. Finally, a data augmentation strategy appropriate for the urinary microscopic image domain is established, which provides a novel approach for applying other methods to urine microscopic images.
Face recognition technology is widely used in many aspects of people's lives. However, the accuracy of face recognition drops greatly when objects such as masks and sunglasses obscure the face. Wearing masks in public has been a crucial approach to preventing illness, especially since the COVID-19 outbreak, and this poses challenges to applications such as face recognition. Therefore, removing masks via image inpainting has become a hot topic in computer vision. Deep learning-based image inpainting techniques have achieved noticeable results, but the restored images still suffer from problems such as blurring and inconsistency. To address these problems, this paper proposes an improved inpainting model based on a generative adversarial network: attention mechanisms are added to the sampling module of the pix2pix network, and the residual module is improved by adding convolutional branches. The improved inpainting model can not only effectively restore faces obscured by face masks, but also inpaint randomly obscured face images. To further validate the generality of the inpainting model, tests are conducted on the CelebA, Paris Street and Place2 datasets, and the experimental results show significant improvements in both SSIM and PSNR.
Seismic data collected from desert areas contain a large amount of low-frequency random noise whose waveforms are similar to the effective signals. These complex noise characteristics make it difficult to identify and recover seismic signals, which adversely affects subsequent seismic data processing and imaging. To recover complex seismic events from low-frequency random noise, we propose an attention-mechanism-guided deep convolutional autoencoder network (ADCAE) that assigns different importance to features at different spatial positions. In ADCAE, an attention module (AM) is connected to the deep convolutional autoencoder network (DCAE) through soft-thresholded symmetric skip connections, which enhances the feature extraction ability. By combining the global features of the input data with the local features output by DCAE, AM generates an attention weight matrix that assigns different weights to the features associated with seismic events and random noise during training. In this way, AM guides the update of the target gradient, thus retaining the complex structure of the seismic events in the denoised results and improving the training efficiency of the model. ADCAE is applied to synthetic and field seismic data, and the denoised results show that it achieves satisfactory performance in signal recovery and low-frequency random noise suppression at low signal-to-noise ratios.
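A heavily simplified view of such an attention weight matrix (the real AM is learned end-to-end; gating the summed global and local feature maps with a sigmoid is only an illustrative assumption for this sketch):

```python
import numpy as np

def attention_gate(global_feat, local_feat):
    """Element-wise attention weighting: combine global features of the
    input with the autoencoder's local features, squash with a sigmoid
    to get per-position weights in (0, 1), and reweight the local map.
    """
    w = 1.0 / (1.0 + np.exp(-(global_feat + local_feat)))  # weight matrix
    return w * local_feat, w
```

Positions where global and local evidence agree on a seismic event receive weights near 1 and are preserved, while noise-dominated positions are attenuated.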
This work proposes a segmentation-free approach to Arabic Handwritten Text Recognition (AHTR): an attention-based Convolutional Neural Network - Recurrent Neural Network - Connectionist Temporal Classification (CNN-RNN-CTC) deep learning architecture. The model receives an image as input and, through a CNN, produces a sequence of essential features, which are transferred to an attention-based Bidirectional Long Short-Term Memory network (BLSTM). The BLSTM outputs the feature sequence in order, and the attention mechanism selects the relevant information from the feature sequences. The selected information is then fed to the CTC layer, enabling loss calculation and transcription prediction. The contribution lies in extending the CNN with dropout layers, batch normalization, and dropout regularization parameters to prevent over-fitting. The output of the RNN block is passed through an attention mechanism to exploit the most relevant parts of the input sequence in a flexible manner. This solution improves on previous methods by increasing the CNN's speed and performance and controlling model over-fitting. The proposed system achieves a best accuracy of 97.1% on the IFN-ENIT Arabic script database, which is competitive with the current state of the art. It was also tested on the modern English handwriting of the IAM database, attaining a Character Error Rate of 2.9%, which confirms the model's script independence.
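The CTC stage's best-path decoding, which turns per-frame label predictions into a transcription, can be sketched independently of the network (the blank index and integer label encoding are assumptions of the sketch):

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse a frame-level best-path label sequence into a
    transcription: merge runs of repeated labels, then drop the
    CTC blank symbol that separates genuine repeats.
    """
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out
```

For example, the frame sequence `[0, 1, 1, 0, 1, 2, 2, 0]` decodes to `[1, 1, 2]`: the blank between the two 1s keeps them as distinct characters, while the repeated 2s collapse into one.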
Convolutional neural networks have achieved tremendous success in the areas of image processing and computer vision. However, they experience problems with low-frequency information such as semantic and category content and background color, and high-frequency information such as edge and structure. We propose an efficient and accurate deep learning framework called the multi-frequency feature extraction and fusion network (MFFNet) to perform image processing tasks such as deblurring. MFFNet is aided by edge and attention modules to restore high-frequency information and overcomes the multiscale parameter problem and the low-efficiency issue of recurrent architectures. It handles information from multiple paths and extracts features such as edges, colors, positions, and differences. Then, edge detectors and attention modules are aggregated into units to refine and learn knowledge, and efficient multi-learning features are fused into a final perception result. Experimental results indicate that the proposed framework achieves state-of-the-art deblurring performance on benchmark datasets.
As a fundamental component of other Intelligent Transportation Systems (ITS) applications, short-term traffic volume prediction plays an important role in various intelligent transportation tasks, such as traffic management, traffic signal control and route planning. Although neural-network-based traffic prediction methods can produce good results, most of these models cannot be explained in an intuitive way. In this paper, we not only propose a model that increases the short-term prediction accuracy of traffic volume, but also improve the interpretability of the model by analyzing the internal attention scores it learns. We propose a spatiotemporal attention mechanism-based multistep traffic volume prediction model (SAMM). Inside the model, an LSTM-based encoder-decoder network with a hybrid attention mechanism is introduced, consisting of spatial attention and temporal attention. At the first level, local and global spatial attention mechanisms, considering micro traffic evolution and macro pattern similarity respectively, capture and amplify features from the highly correlated entrance stations. At the second level, a temporal attention mechanism amplifies features from the time steps that contribute most to the future exit volume. Considering the time-dependent characteristics and the continuity of the recent traffic volume trend, timestamp features and the historical exit volume series of the target stations are included as external inputs. An experiment is conducted using data from the highway toll collection system of Guangdong Province, China. By extracting and analyzing the weights of the spatial and temporal attention layers, the contributions of the intermediate parameters are revealed and explained with knowledge acquired from historical statistics.
The results show that the proposed model outperforms the state-of-the-art model by 29.51% in terms of MSE, 13.93% in terms of MAE, and 5.69% in terms of MAPE. The effectiveness of the Encoder-Decoder framework and the attention mechanism are also verified.
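The temporal attention level can be illustrated with a generic dot-product attention step (the paper's exact scoring function is not given here, so this form is an assumption); the returned weights are the kind of interpretable scores analyzed above:

```python
import numpy as np

def temporal_attention(enc_states, dec_state):
    """Dot-product temporal attention: score each encoder time step
    against the current decoder state, normalize with softmax, and
    return the context vector plus the per-step weights.

    enc_states: (T, d) encoder hidden states; dec_state: (d,) query.
    """
    scores = enc_states @ dec_state                  # (T,) raw scores
    scores -= scores.max()                           # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax, sums to 1
    context = weights @ enc_states                   # (d,) context vector
    return context, weights
```

Because the weights form a probability distribution over time steps, inspecting which steps receive the largest weights is exactly how attention-based models are made interpretable.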
Speech emotion recognition (SER) is a complicated and challenging task in human-computer interaction because it is difficult to find a feature set that fully discriminates emotional states. The FFT is conventionally used to process the raw signal when extracting low-level descriptor features such as short-time energy, fundamental frequency, formants, MFCCs (mel frequency cepstral coefficients) and so on. However, these features are built in the frequency domain and ignore information from the temporal domain. In this paper, we propose a novel framework that combines a multi-layer wavelet sequence set, obtained from wavelet packet reconstruction (WPR), with a conventional feature set into a mixed feature set for emotion recognition using recurrent neural networks (RNN) with an attention mechanism. In addition, since silent frames have a detrimental effect on SER, we adopt voice activity detection based on the autocorrelation function to eliminate emotionally irrelevant frames. We show that the proposed algorithm significantly outperforms traditional feature sets in predicting spontaneous emotional states on the IEMOCAP corpus and the EMODB database, and achieves better classification in both speaker-independent and speaker-dependent experiments. Notably, we finally obtain accuracies of 62.52% and 77.57% in the speaker-independent (SI) setting, and 66.90% and 82.26% in the speaker-dependent (SD) setting.
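The autocorrelation-based voice activity detection step can be sketched as follows (the frame length, hop size, minimum lag and threshold are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def voiced_frames(signal, frame_len=256, hop=128, acf_thresh=0.3):
    """Flag frames whose normalized autocorrelation has a strong
    non-zero-lag peak: a crude voiced/unvoiced decision used to drop
    emotion-irrelevant silent frames before feature extraction.
    """
    flags = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        frame = frame - frame.mean()
        # one-sided autocorrelation, lag 0 first
        acf = np.correlate(frame, frame, mode='full')[frame_len - 1:]
        if acf[0] <= 0:                 # silent / constant frame
            flags.append(False)
            continue
        acf = acf / acf[0]              # normalize so acf[0] == 1
        # skip very short lags, keep frames with a periodic peak
        flags.append(acf[20:].max() > acf_thresh)
    return np.array(flags)
```

Periodic (voiced) frames produce a strong autocorrelation peak at the pitch period, while silence and broadband noise do not, so thresholding the peak discards the frames that carry no emotional content.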
Specific emitter identification (SEI) distinguishes individual radio transmitters using subtle features of the received waveform, and is therefore used extensively in both military and civilian fields. However, traditional identification methods require extensive prior knowledge, are time-consuming, and are adversely affected by various factors when identifying communication radiation source signals in complex environments. To overcome the weak robustness of hand-crafted feature methods, many researchers have applied deep learning image-identification techniques to radiation source identification. However, classification methods based on real-valued neural networks cannot extract the In-phase/Quadrature (I/Q) correlation information in electromagnetic signals. To address these shortcomings, this paper proposes a new deep-learning-based SEI framework. In the proposed framework, a complex-valued residual network structure first mines the correlation between the in-phase and quadrature components of the radio-frequency baseband signal. Then, a one-dimensional convolutional layer directly extracts features from the one-dimensional time-domain signal sequence, and an attention mechanism unit weights the extracted features according to their importance. Experiments show that the proposed framework, combining complex-valued residual networks with an attention mechanism, achieves high accuracy and superior performance in identifying communication radiation source signals.
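The core of a complex-valued layer is that one complex convolution expands into four real convolutions, which is how it captures the I/Q correlation a real-valued network misses. A minimal sketch on an I/Q sequence (not the authors' network; the kernel here stands in for a learned complex weight):

```python
import numpy as np

def complex_conv1d(iq, w):
    """1-D complex convolution over an I/Q sequence, expanded into the
    four real convolutions a complex-valued layer is built from:
    (a + jb) * (c + jd) -> (a*c - b*d) + j(a*d + b*c).
    """
    a, b = iq.real, iq.imag      # in-phase and quadrature components
    c, d = w.real, w.imag        # real and imaginary kernel parts
    real = np.convolve(a, c, 'valid') - np.convolve(b, d, 'valid')
    imag = np.convolve(a, d, 'valid') + np.convolve(b, c, 'valid')
    return real + 1j * imag
```

The cross terms (a*d and b*c) mix the I and Q channels, which is exactly the inter-channel information a purely real-valued convolution over stacked I/Q rows cannot represent.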