Article title

Neural network clustering of DNA data for improved PWM determination

Authors
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
One possible representation of the similarities shared by a group of data in pattern recognition problems is the position weight matrix (PWM). A PWM is convenient for describing data that are represented over the same alphabet. PWMs are well suited to motif recognition and are widely used in computational genomics. A specific motif of a DNA or protein sequence is frequently described by an associated PWM obtained from raw data. In this paper we introduce a new method that improves the recognition of motif-like patterns compared with recognition based on the originally determined PWM. The method uses the information contained in the original PWM for a specific motif to perform neural network clustering of the training data, and then generates a set of PWMs, one for each cluster. Such a description, consisting of the clustering neural network together with the associated collection of PWMs, is much more efficient in pattern recognition problems than the one based on the original PWM used in the motif description. The new method allows a significant reduction in the level of false recognition. The method is tested on the cap site components of promoters in vertebrate DNA sequences.
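The idea in the abstract (cluster the training sequences, then keep one PWM per cluster) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the abstract does not specify the network, so a simple winner-take-all competitive-learning layer over one-hot encoded sequences stands in for it, and the sketch clusters raw sequences without using information from the original PWM. The function names (`build_pwm`, `train_prototypes`, `clustered_score`), the pseudocount, the uniform background, and the toy sequences are all hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def build_pwm(seqs, pseudocount=1.0, background=0.25):
    """Log-odds PWM (one dict per position) from equal-length aligned sequences.
    Assumes a uniform background and a non-empty list of sequences."""
    total = len(seqs) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(len(seqs[0])):
        counts = Counter(s[pos] for s in seqs)
        pwm.append({b: math.log2((counts[b] + pseudocount) / total / background)
                    for b in ALPHABET})
    return pwm

def score(pwm, seq):
    """Sum of per-position log-odds for a candidate sequence."""
    return sum(col[b] for col, b in zip(pwm, seq))

def one_hot(seq):
    """Flatten a sequence into a 4-per-position indicator vector."""
    return [1.0 if b == a else 0.0 for b in seq for a in ALPHABET]

def nearest(prototypes, seq):
    """Index of the prototype closest (squared Euclidean) to the sequence."""
    x = one_hot(seq)
    dists = [sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in prototypes]
    return dists.index(min(dists))

def train_prototypes(seqs, k=2, epochs=20, lr=0.3):
    """Winner-take-all competitive learning (a stand-in for the paper's network).
    Prototypes are initialised deterministically from evenly spaced sequences."""
    prototypes = [one_hot(seqs[i * len(seqs) // k]) for i in range(k)]
    for _ in range(epochs):
        for s in seqs:
            w = prototypes[nearest(prototypes, s)]
            for j, xj in enumerate(one_hot(s)):
                w[j] += lr * (xj - w[j])  # move the winner toward the input
    return prototypes

def clustered_score(prototypes, pwms, seq):
    """Route the sequence to its cluster, then score with that cluster's PWM."""
    return score(pwms[nearest(prototypes, seq)], seq)

# Toy training set containing two sub-patterns (hypothetical data).
training = ["ACGT", "ACGA", "ACGT", "TTAA", "TTAC", "TTAA"]

single_pwm = build_pwm(training)
prototypes = train_prototypes(training)
labels = [nearest(prototypes, s) for s in training]
cluster_pwms = [build_pwm([s for s, c in zip(training, labels) if c == i])
                for i in range(len(prototypes))]
```

On this toy data the chimeric sequence "ACAA", which mixes the two sub-patterns, scores higher under `single_pwm` than the genuine training sequence "ACGT", whereas `clustered_score` ranks "ACGT" above "ACAA". This is the kind of false recognition that a per-cluster description is meant to reduce.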
Year
Pages
49-61
Physical description
Bibliography: 22 items.
Creators
author
Bibliography
  • [1] P. Baldi and S. Brunak, Bioinformatics: The Machine Learning Approach, MIT Press, Cambridge, MA, 1998.
  • [2] D. Barrick, K. Villanueba, J. Childs, R. Kalil, T. D. Schneider, C. E. Lawrence, L. Gold and G. D. Stormo, Quantitative analysis of ribosome binding sites in E. coli. Nucl. Acids Res., 22, 1287-1295, 1994.
  • [3] O. G. Berg and P. H. von Hippel, Selection of DNA binding sites by regulatory proteins: Statistical-mechanical theory and application to operators and promoters. J. Mol. Biol., 193, 723-750, 1987.
  • [4] O. G. Berg and P. H. von Hippel, Selection of DNA binding sites by regulatory proteins. II. The binding specificity of cyclic AMP receptor protein to recognition sites. J. Mol. Biol., 200, 709-723, 1988.
  • [5] P. Bucher, Weight matrix descriptions of four eukaryotic RNA polymerase II promoter elements derived from 502 unrelated promoter sequences. J. Mol. Biol., 212, 563-578, 1990.
  • [6] P. Bucher and E. N. Trifonov, Compilation and analysis of eukaryotic Pol II promoter sequences. Nucl. Acids Res., 14, 10009-10026, 1986.
  • [7] Q. K. Chen, G. Z. Hertz and G. Stormo, MATRIX SEARCH 1.0: a computer program that scans DNA sequences for transcriptional elements using a database of weight matrices. Computer Applications in Biosciences, 11, 563-566, 1995.
  • [8] M. Claverie and S. Audic, The statistical significance of nucleotide position-weight matrix matches. Computer Applications in Biosciences, 12, 431-439, 1996.
  • [9] J. W. Fickett and A. G. Hatzigeorgiou, Eukaryotic promoter recognition. Genome Research, 7 (9), 861-878, 1997.
  • [10] R. Harr, M. Haggstrom and P. Gustafsson, Search algorithm for pattern match analysis of nucleic acid sequence. Nucl. Acids Res., 11, 2943-2957, 1983.
  • [11] K. Hofmann and P. Bucher, The FHA domain: A putative nuclear signaling domain found in protein kinases and transcription factors. Trends Biochem. Sci., 20, 347-349, 1995.
  • [12] T. Kohonen, Self-Organizing Maps, Series in Information Sciences, Vol. 30, Springer, Heidelberg, Second Ed., 1997.
  • [13] A. G. Pedersen, P. Baldi, Y. Chauvin and S. Brunak, The biology of eukaryotic promoter prediction - a review. Computers & Chemistry, 23, 191-207, 1999.
  • [14] D. S. Prestridge, Predicting Pol II promoter sequences using transcription factor binding sites. J. Mol. Biol., 249, 923-932, 1995.
  • [15] D. S. Prestridge, Computer software for eukaryotic promoter analysis, internet based text, 1999.
  • [16] K. Quandt, K. Frech, H. Karas, E. Wingender and T. Werner, MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucl. Acids Res., 23, 4878-4884, 1995.
  • [17] J. D. Spragins, J. L. Hammond and K. Pawlikowski, Telecommunications: Protocols and Design, Addison-Wesley, Reading, Massachusetts, 1997.
  • [18] R. Staden, Computer methods to locate signals in nucleic acid sequences. Nucl. Acids Res., 12, 505-519, 1984.
  • [19] G. D. Stormo, T. D. Schneider, L. Gold and A. Ehrenfeucht, Use of the 'Perceptron' algorithm to distinguish translational initiation sites in E. coli. Nucl. Acids Res., 10, 2997-3011, 1982.
  • [20] V. Veljkovic and I. Slavic, Simple General-Model Pseudopotential. Phys. Rev. Lett., 29 (5), 105-107, 1972.
  • [21] V. Veljkovic, I. Ćosić, B. Dimitrijevic and D. Lalovic, Is It Possible to Analyze DNA and Protein Sequences by the Methods of Digital Signal Processing? IEEE Trans. Biomed. Eng., 32 (5), 337-341, 1985.
  • [22] E. Wingender, P. Dietze, H. Karas and R. Knüppel, TRANSFAC: a database on transcription factors and their DNA binding sites. Nucl. Acids Res., 24, 238-241, 1996.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-article-LOD7-0028-0041