The position weight matrix (PWM) is one possible representation of the similarities shared by a group of data in pattern recognition problems. A PWM is convenient for describing data that are represented over the same alphabet. PWMs are well suited to motif recognition and are widely used in computational genomics, where a specific DNA or protein sequence motif is frequently described by an associated PWM obtained from raw data. In this paper we introduce a new method that improves the recognition of motif-like patterns compared with recognition based on the originally determined PWM. The method uses the information contained in the original PWM for a specific motif to cluster the training data with a neural network, and then generates a set of PWMs, one for each cluster. The resulting description, consisting of the clustering neural network together with the associated collection of PWMs, is much more efficient in pattern recognition problems than the description based on the original PWM alone. The new method allows a significant reduction in the level of false recognition. It is tested on the cap site components of promoters in vertebrate DNA sequences.
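As an illustration of the idea only, the following minimal Python sketch builds a log-odds PWM from aligned DNA sequences and then builds one PWM per cluster. It is not the authors' implementation: the cluster labels are assumed to be given (they stand in for the output of the clustering neural network, which is not reproduced here), and the function names, the pseudocount of 1, and the uniform background of 0.25 are all assumptions made for the example.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumed DNA alphabet


def build_pwm(seqs, pseudocount=1.0, background=0.25):
    """Log-odds PWM from equal-length aligned sequences (assumed scoring scheme)."""
    total = len(seqs) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(len(seqs[0])):
        counts = Counter(s[pos] for s in seqs)
        # One column: log2 of smoothed base frequency over background frequency.
        pwm.append({
            base: math.log2((counts[base] + pseudocount) / total / background)
            for base in ALPHABET
        })
    return pwm


def score(pwm, seq):
    """Sum of the per-position log-odds weights for a candidate sequence."""
    return sum(column[base] for column, base in zip(pwm, seq))


def cluster_pwms(seqs, labels):
    """One PWM per cluster; labels are assumed to come from a separate clustering step."""
    groups = {}
    for seq, label in zip(seqs, labels):
        groups.setdefault(label, []).append(seq)
    return {label: build_pwm(group) for label, group in groups.items()}


def best_cluster_score(pwms, seq):
    """Score a sequence against every cluster PWM and keep the highest score."""
    return max(score(pwm, seq) for pwm in pwms.values())
```

In this sketch a candidate is scored against the best-matching cluster PWM rather than against a single PWM averaged over all training data, so training sets that mix distinct sub-patterns are not blurred into one matrix. Taking the maximum over clusters is a simplification chosen for the example; in the method described above, the clustering network itself selects the cluster.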