Position weight matrix model as a tool for the study of regulatory elements distribution across the DNA sequence
Ab initio prediction of DNA regulatory regions known as transcription factor binding sites (TFBS) is a major challenge in modern bioinformatics. Although the currently available methods are not perfect, they are fairly reliable and can be used to search for new potential protein-DNA interaction sites. The biggest problem of ab initio approaches is the very high false positive rate among predicted sites, which results mainly from the fact that TFBS are very short and highly degenerate. As a consequence, matching sequences can occur by chance every few hundred bases, which makes computational prediction extremely difficult if one aims to reduce the false positive rate while keeping the highest possible sensitivity for biologically meaningful sequence regions. In this work we present a new application that can be used to predict TFBS regions in very large datasets based on position weight matrix (PWM) models, using one of the most popular prediction methods. The application was used to estimate the concentration of TFBS in a set of nearly 2,200 unique human gene promoter sequences. The study revealed that the concentration of TFBS is constant at distances greater than 1 kbp from the transcription initiation site, but decreases rapidly closer to the transcription initiation site. The decreasing TFBS concentration in the vicinity of genes might result from evolutionary selection, which retains only those sites that interact with proteins forming part of a specific regulatory mechanism required for cell survival.
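The kind of PWM-based scan described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the application presented in the paper: the function names (`build_pwm`, `scan`, `hits_per_kb`), the uniform background frequencies, the pseudocount of 1, the toy TATA-like sites and the score threshold are all hypothetical choices made for this example, and the input is assumed to contain only the uppercase letters A, C, G and T.

```python
import math

BASES = "ACGT"

def build_pwm(sites, background=None, pseudocount=1.0):
    """Build a log-odds PWM from aligned binding sites of equal length.

    Assumption: uniform background (0.25 per base) unless one is supplied.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}
    pwm = []
    for pos in range(len(sites[0])):
        # Start from the pseudocount so unseen bases get a finite score.
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background[b])
                    for b in BASES})
    return pwm

def scan(sequence, pwm, threshold):
    """Return (offset, score) for each window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(pwm, window))
        if score >= threshold:
            hits.append((start, score))
    return hits

def hits_per_kb(sequence, pwm, threshold):
    """Predicted sites per 1000 scanned window positions."""
    n_windows = len(sequence) - len(pwm) + 1
    if n_windows <= 0:
        return 0.0
    return 1000.0 * len(scan(sequence, pwm, threshold)) / n_windows

# Toy example: four hypothetical TATA-like sites and a short sequence.
example_sites = ["TATAAA", "TATAAA", "TATATA", "TATAAA"]
example_pwm = build_pwm(example_sites)
example_hits = scan("GGGCTATAAAGGC", example_pwm, threshold=6.0)
```

Lowering the threshold in such a scan increases sensitivity but also the number of chance matches, which is the trade-off the abstract describes. Computing `hits_per_kb` separately for bins at different distances from the transcription initiation site would give a concentration profile of the kind discussed above.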
Bibliography: 33 items, figures, formulas
-  T. R. GREGORY, ET AL.: Eukaryotic genome size databases. Nucleic Acids Res., 35 (2007), D332-D338.
-  D. S. LATCHMAN: Transcription factors: an overview. Int. J. Biochem. Cell Biol., 29 (1997), 1305-1312.
-  M. KARIN: Too many transcription factors: positive and negative interactions. New Biol., 2 (1990), 126-131.
-  R. G. ROEDER: The role of general initiation factors in transcription by RNA polymerase II. Trends Biochem. Sci., 21 (1996), 327-335.
-  D. B. NIKOLOV and S. K. BURLEY: RNA polymerase II transcription initiation: a structural view. Proc. Natl. Acad. Sci. USA, 94 (1997), 15-22.
-  T. I. LEE and R. A. YOUNG: Transcription of eukaryotic protein-coding genes. Annu. Rev. Genet., 34 (2000), 77-137.
-  M. M. BABU, ET AL.: Structure and evolution of transcriptional regulatory networks. Curr. Opin. Struct. Biol., 14 (2004), 283-291.
-  A. H. BRIVANLOU and J. E. DARNELL, JR.: Signal transduction and the control of gene expression. Science, 295 (2002), 813-818.
-  M. LEVINE and R. TJIAN: Transcription regulation and animal diversity. Nature, 424 (2003), 147-151.
-  K. K. BARTHEL and X. LIU: A transcriptional enhancer from the coding region of ADAMTS5. PLoS One, 3 (2008), e2184.
-  G. GILL: Regulation of the initiation of eukaryotic transcription. Essays Biochem., 37 (2001), 33-43.
-  G. J. NARLIKAR, H. Y. FAN and R. E. KINGSTON: Cooperation between complexes that regulate chromatin structure and transcription. Cell, 108 (2002), 475-487.
-  L. XU, C. K. GLASS and M. G. ROSENFELD: Coactivator and corepressor complexes in nuclear receptor function. Curr. Opin. Genet. Dev., 9 (1999), 140-147.
-  J. M. WONG and E. BATEMAN: TBP-DNA interactions in the minor groove discriminate between A:T and T:A base pairs. Nucleic Acids Res., 22 (1994), 1890-1896.
-  F. MUKUMOTO, ET AL.: DNA sequence requirement of a TATA element-binding protein from Arabidopsis for transcription in vitro. Plant Mol. Biol., 23 (1993), 995-1003.
-  W. H. DAY and F. R. MCMORRIS: Critical comparison of consensus methods for molecular sequences. Nucleic Acids Res., 20 (1992), 1093-1099.
-  J. M. CLAVERIE and S. AUDIC: The statistical significance of nucleotide position-weight matrix matches. Comput. Appl. Biosci., 12 (1996), 431-439.
-  G. D. STORMO, T. D. SCHNEIDER and L. M. GOLD: Characterization of translational initiation sites in E. coli. Nucleic Acids Res., 10 (1982), 2971-2996.
-  A. SANDELIN, W. W. WASSERMAN and B. LENHARD: ConSite: web-based prediction of regulatory elements using cross-species comparison. Nucleic Acids Res., 32 (2004), W249-W252.
-  P. BUCHER: Weight matrix descriptions of four eukaryotic RNA polymerase II promoter elements derived from 502 unrelated promoter sequences. J. Mol. Biol., 212 (1990), 563-578.
-  K. MASUDA, ET AL.: Androgen receptor binding sites identified by a GREF_GATA model. J. Mol. Biol., 353 (2005), 763-771.
-  G. BERNARDI: Isochores and the evolutionary genomics of vertebrates. Gene, 241 (2000), 3-17.
-  H. TOUZET and J. S. VARRÉ: Efficient and accurate P-value computation for Position Weight Matrices. Algorithms for Molecular Biology, 2 (2007).
-  J. ZHANG, B. JIANG, M. LI, J. TROMP, X. ZHANG and M. Q. ZHANG: Computing exact P-values for DNA motifs. Bioinformatics, 23 (2007), 531-537.
-  Y. BARASH, ET AL.: Modeling dependencies in protein-DNA binding sites. Proceedings of the seventh annual international conference on Research in computational molecular biology, Berlin 2003, pp. 28-37.
-  X. ZHAO, H. HUANG and T. P. SPEED: Finding short DNA motifs using permuted Markov models. J. Comput. Biol., 12 (2005), 894-906.
-  O. D. KING and F. P. ROTH: A non-parametric model for transcription factor binding sites. Nucleic Acids Res., 31 (2003), e116.
-  B. LENHARD and W. W. WASSERMAN: TFBS: Computational framework for transcription factor binding site analysis. Bioinformatics, 18 (2002), 1135-1136.
-  G. Z. HERTZ, G. W. HARTZELL, 3RD and G. D. STORMO: Identification of consensus patterns in unaligned DNA sequences known to be functionally related. Comput. Appl. Biosci., 6 (1990), 81-92.
-  A. E. KEL, ET AL.: MATCH: A tool for searching transcription factor binding sites in DNA sequences. Nucleic Acids Res., 31 (2003), 3576-3579.
-  G. G. LOOTS and I. OVCHARENKO: rVISTA 2.0: evolutionary analysis of transcription factor binding sites. Nucleic Acids Res., 32 (2004), W217-W221.
-  V. D. MARINESCU, I. S. KOHANE and A. RIVA: MAPPER: a search engine for the computational identification of putative transcription factor binding sites in multiple genomes. BMC Bioinformatics, 6 (2005), 79.
-  A. SANDELIN, ET AL.: JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res., 32 (2004), D91-D94.