The new SIMD Implementation of the Smith-Waterman Algorithm on Cell Microprocessor

Rudnicki, W.R.; Jankowski, A.; Modzelewski, A.; Piotrowski, A.; Zadrożny, A.

Artykuł - szczegóły

Tytuł artykułu

The new SIMD Implementation of the Smith-Waterman Algorithm on Cell Microprocessor

Autorzy

Rudnicki W.R. , Jankowski A. , Modzelewski A. , Piotrowski A. , Zadrożny A.

Wybrane pełne teksty z tego czasopisma

https://fi.episciences.org/

Identyfikatory

Warianty tytułu

Języki publikacji

Abstrakty

Algorithms for estimating similarity between two macromolecular sequences are of profound importance for molecular biology. The standard methods utilize so-called primary structure, that is a string of characters denoting the sequence of monomers in hetero-polymer. These methods find the substrings of maximal similarity, as defined by the so-called similarity matrix, for a pair of two molecules. The problem is solved either by the exact dynamic programming method, or by approximate heuristic methods. The approximate algorithms are almost two orders of magnitude faster in comparison with the standard version of the exact Smith-Waterman algorithm, when executed on the same hardware, hence the exact algorithm is relatively rarely used. Recently a very efficient implementation of Smith-Waterman algorithm utilizing SIMD extensions to the standard instruction set reduced the speed advantage of heuristic algorithms to factor of three. Here we present an improved implementation of the Smith-Waterman algorithm on the Cell processor. Implementation presented here achieves execution speed of approximately 9 GCUPS. The performance is independent on the scoring system. It is 4 to 10 times faster than best Smith-Waterman implementation running on a PC and 1.5 to 3 times faster than the same implementation running on Sony PlayStation 3. It is also 5 times faster than the recent implementation of the Smith-Waterman utilizing Nvidia GPU. Our implementation running on Sony PlayStation 3 has performance which is directly comparable with that of BLAST running on PC, being up to 4 times faster in the best case and no more than two times slower in the worst case. This performance level opens possibility for using the exact Smith-Waterman algorithm in applications, where currently approximate algorithms are used.

Słowa kluczowe

bioinformatics protein sequence alignment Cell processor

Wydawca

IOS Press

Czasopismo

Fundamenta Informaticae

Rocznik

2009

Tom

Vol. 96, nr 1/2

Strony

181--194

Opis fizyczny

Bibliogr. 23 poz., tab., wykr.

Twórcy

autor

Rudnicki W.R.

autor

Jankowski A.

autor

Modzelewski A.

autor

Piotrowski A.

autor

Zadrożny A.

Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, Pawińskiego 5A, 02-106 Warszawa, Poland, W.Rudnicki@icm.edu.pl

Bibliografia

[1] Watson, J. D., Crick, F. H. C.: A Structure for Deoxyribose Nucleic Acid, Nature, 171, 1953, 737-738.
[2] Crick, F. H., Barnett, L., Brenner, S., Watts-Tobin, R. J.: General nature of the genetic code for proteins, Nature, 192, 1961, 1227-1232.
[3] Nirenberg, M. W., Matthaei, J. H.: The Dependence of Cell-Free Protein Synthesis in E. coli upon Naturally Occurring or Synthetic Polyribonucleotides, Proceedings of the National Academy of Sciences, 47, 1961, 1588-1602.
[4] Söll, D., Ohtsuka, E., Jones, D. S., Lohrmann, R., Hayatsu, H., Nishimura, S., Khorana, H. G.: Studies on polynucleotides, XLIX. Stimulation of the binding of aminoacyl-sRNA's to ribosomes by ribotrinucleotides and a survey of codon assignments for 20 amino acids, Proceedings of the National Academy of Sciences, 54, 1965, 1378-85.
[5] Dayhoff, M.O., Schwartz, R.M., Orcutt, B.C.: A model of evolutionary change in proteins. In: Atlas of Protein Sequence and Structure. Edited by Dayhoff, M.O., vol. 5, National Biomedical Research Foundation, 1978.
[6] Needleman, S. B., Wunsch, C. D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins, Journal of Molecular Biology, 48, 1970, 443-53.
[7] Smith, T. F., Waterman, M. S.: Identification of common molecular subsequences, Journal of Molecular Biology, 147, 1981, 195-197.
[8] Gotoh, O.: An improved algorithm for matching biological sequences. Journal of Molecular Biology, 162, 1982, 705-708.
[9] Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W., Lipman, D. J.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Research, 25, 1997, 3389-3402.
[10] Pearson,W. R., Lipman, D. J.: Improved Tools for Biological Sequence Comparison, Proceedings of the National Academy of Sciences, 85, 1988, 2444-2448.
[11] Alpern, B., Carter, L., Gatlin, K. S.: Microparallelism and High-Performance Protein Matching. In: ACM/IEEE Supercomputing Conference, San Diego, California, 1995.
[12] Wozniak, A.: Using video-oriented instructions to speed up sequence comparison, Comput. Appl. Biosci., 13, 1997, 145-150.
[13] Rognes, T., Seeberg, E.: Six-fold speed-up of Smith-Waterman sequence database searches using parallel processing on common microprocessors, Bioinformatics (Oxford, England), 16, 2000, 699-706.
[14] Farrar, M.: Striped Smith-Waterman speeds database searches six times over other SIMD implementations. Bioinformatics (Oxford, England), 23, 2007, 156-161.
[15] Samuel, W., John, S., Leonid, O., Shoahib, K., Parry, H., Katherine, Y.: The potential of the Cell processor for scientific computing. In: 3rd Conference on Computing Frontiers, Ischia, Italy, ACM Press, 2006, 9-20.
[16] Brenner, S. E., Koehl, P., Levitt, M.: The ASTRAL compendium for protein structure and sequence analysis, Nucleic Acids Research, 28, 2000, 254-256.
[17] Henikoff, S., Henikoff, J.G.: Amino acid substitution matrices from protein blocks, Proceedings of the National Academy of Sciences, 89, 1992, 10915-10919.
[18] Dydel, S., Bała, P.: Large Scale Protein Sequence Alignment Using FPGA Reprogrammable Logic Devices. In: Field Programmable Logic and Application. Edited by Becker, J., Platzner, M., Vernalde, S., vol. 3203. Berlin / Heidelberg, Springer, 2004, 23-32.
[19] Oliver, T., Schmidt, B., Maskell, D.: Hyper customized processors for biosequence database scanning on FPGAs. In: ACM/SIGDA 13th International Symposium on Field-Programmable Gate Arrays, Monterey, USA, ACM, 2005, 229-237.
[20] Benkrid, K., Liu, Y., Benkrid, A.: High Performance Biosequence Database Scanning using FPGAs. In: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2007, Honolulu, USA, IEEE, 2007, pp. I-361-I-364.
[21] Van Court, T., Herbordt, M. C.: Families of FPGA-Based Algorithms for Approximate String Matching. In: Application-Specific Systems, Architectures, and Processors, ASAP'04, Galveston, USA, 2004, 354-364.
[22] Li, I. T., Shum, W., Truong, K.: 160-fold acceleration of the Smith-Waterman algorithm using a field programmable gate array (FPGA). BMC Bioinformatics, 8:185, 2007.
[23] Manavski, S. A., Valle, G.: CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment. BMC Bioinformatics, 9(Suppl 2):S10, 2008.

Typ dokumentu

Bibliografia

Identyfikator YADDA

bwmeta1.element.baztech-article-BUS8-0008-0047