2013 | Vol. 128, no. 3 | 255-280
Article title

Towards Scalable and Cost-aware Bioinformatics Workflow Execution in the Cloud: Recent Advances to the Tavaxy Workflow System

Title variants
Publication languages
EN
Abstracts
EN
Cloud-based scientific workflow systems can play an important role in the development of cost-effective bioinformatics analysis applications. So far, most efforts to support cloud computing in such workflow systems have focused on simply porting them to the cloud environment. The next steps are to optimize these systems to exploit the advantages of the cloud computing model, chiefly in terms of managing resource elasticity and the associated business model. In this paper, we introduce new advancements in designing scalable and cost-effective workflows in the cloud using the Tavaxy workflow system, focusing on genome analysis applications. We provide an overview of the system and describe its key cloud features, including the configuration and execution of complete workflows and/or specific sub-workflows in the cloud. Using real-world examples, we demonstrate the key elasticity management features of the system. These features are designed to support two common scenarios: (1) minimizing workflow execution time under budget constraints and (2) minimizing budget spending under workflow deadline constraints. We evaluate the effectiveness of our approach by conducting experiments on the Amazon EC2 cloud with dynamic pricing and variable heterogeneous resource allocation.
Publisher

Year
Pages
255-280
Physical description
Bibliography: 48 items, figures, tables.
Authors
author
author
  • Department of Computing, Imperial College London, London, England, m.ghanem@mdx.ac
Bibliography
  • [1]. Galaxy Published Page: Windshield Splatter, http://main.g2.bx.psu.edu/u/aun1/p/windshield-splatter.
  • [2]. Abouelhoda, M., Issa, S., Ghanem, M.: Tavaxy: integrating Taverna and Galaxy workflows with cloud computing support, BMC Bioinformatics, 13(1), 2012, 77+.
  • [3]. Afgan, E., Coraor, N., Chapman, B., Nekrutenko, A., Taylor, J.: Galaxy CloudMan: delivering cloud compute clusters, BMC Bioinformatics, 11 Suppl 12, 2010, S4+.
  • [4]. Angiuoli, S., Matalka, M., Gussman, A., Galens, K., Vangala, M., Riley, D., Arze, C., White, J., White, O., Fricke, W. F.: CloVR: a virtual machine for automated and portable sequence analysis from the desktop using cloud computing, BMC Bioinformatics, 12(1), 2011, 356+.
  • [5]. AWS: Amazon Web Services: http://aws.amazon.com.
  • [6]. Azure, W.: www.microsoft.com/windowsazure.
  • [7]. Bateman, A., Wood, M.: Cloud computing, Bioinformatics, 25, 2009, 1475.
  • [8]. Bradley, J., Brown, C., Carpenter, B., et al.: The OMII software distribution, All Hands Meeting, Humana Press, 2006.
  • [9]. Curcin, V., Ghanem, M.: Scientific workflow systems - can one size fit all?, Proceedings of CIBEC, IEEE, 2008.
  • [10]. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters, Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6, OSDI’04, USENIX Association, 2004.
  • [11]. Deelman, E., Singh, G., Su, M.-H., Blythe, J., Gil, Y., Kesselman, C., Mehta, G., Vahi, K., Berriman, G. B., Good, J., Laity, A., Jacob, J. C., Katz, D.: Pegasus: a Framework for Mapping Complex Scientific Workflows onto Distributed Systems, Scientific Programming, 3, 2005, 219-237.
  • [12]. Huson, D. H., Auch, A. F., Qi, J., Schuster, S.: MEGAN Analysis of Metagenomic Data, Genome Research, 17, 2007, 377-386.
  • [13]. DIAG-Data Intensive Academic Grid: http://diagcomputing.org.
  • [14]. Dudley, J., Butte, A.: In silico research in the era of cloud computing, Nature Biotechnology, 28, 2010, 1181-1185.
  • [15]. Elmroth, E., Hernandez, F., Tordsson, J.: Three fundamental dimensions of scientific workflow interoperability: Model of computation, language, and execution environment, Future Generation Computer Systems, 26(2), 2010, 245-256.
  • [16]. Fusaro, V., Patil, P., Gafni, E., Wall, D., Tonellato, P.: Biomedical Cloud Computing With Amazon Web Services, PLoS Computational Biology, 7(8), 2011, e1002147.
  • [17]. Ghanem, M., Curcin, V., Wendel, P., Guo, Y.: Building and using analytical workflows in Discovery Net, in: Data mining on the Grid, John Wiley and Sons, 2008.
  • [18]. Giardine, B., Riemer, C., Hardison, R., et al.: Galaxy: A platform for interactive large-scale genome analysis, Genome Research, 15(10), 2005, 1451-5.
  • [19]. Gilbert, J., Dupont, C.: Microbial Metagenomics: Beyond the Genome, Annual Review of Marine Science, 3, 2010, 347-371.
  • [20]. Han, R., Ghanem, M., Guo, L., Guo, Y., Osmond, M.: Enabling cost-aware and adaptive elasticity of multitier cloud applications, Future Generation Computer Systems, 2012.
  • [21]. Hull, D., Wolstencroft, K., Stevens, R., Goble, C., et al.: Taverna: a tool for building and running workflows of services, Nucleic Acids Research, 34, 2006, W729-32.
  • [22]. Juve, G., Deelman, E., Berriman, G., Berman, B., Maechling, P.: An Evaluation of the Cost and Performance of Scientific Workflows on Amazon EC2, J. Grid Comput., 10(1), 2012, 5-21.
  • [23]. Juve, G., Deelman, E., Vahi, K., Mehta, G., Berriman, B., Berman, B., Maechling, P.: Data Sharing Options for Scientific Workflows on Amazon EC2, Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, 2010.
  • [24]. Kahn, G., Macqueen, D.: Coroutines and networks of parallel processes, Information Processing 77, North Holland Publishing Company, 1977.
  • [25]. Kosakovsky Pond, S., Wadhawan, S., Chiaromonte, F., Ananda, G., Chung, W., Taylor, J., Nekrutenko, A., Team, T. G.: Windshield splatter analysis with the Galaxy metagenomic pipeline, Genome Research, 19(11), 2009, 2144-2153.
  • [26]. Kuehn, H., Liberzon, A., Reich, M., Mesirov, J.: Using GenePattern for gene expression analysis, Current Protocols in Bioinformatics, 12(7), 2008.
  • [27]. Langmead, B., Hansen, K., Leek, J.: Cloud-scale RNA-sequencing differential expression analysis with Myrna, Genome Biology, 11(8), 2010, R83+.
  • [28]. Langmead, B., Schatz, M., Lin, J., Pop, M., Salzberg, S.: Searching for SNPs with cloud computing, Genome Biology, 10(R134), 2009.
  • [29]. Langmead, B., Trapnell, C., Pop, M., Salzberg, S.: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome, Genome Biology, 10(3), 2009, R25+.
  • [30]. Linke, B., Giegerich, R., Goesmann, A.: Conveyor: a workflow engine for bioinformatics analyses, Bioinformatics, 27(7), 2011, 903-911.
  • [31]. Ludascher, B., Altintas, I., Berkley, C., Higgins, D., et al.: Scientific workflow management and the Kepler system, Concurrency and Computation: Practice and Experience, 18(10), 2006, 1039-1065.
  • [32]. Magellan-a cloud for Science: http://magellan.alcf.anl.gov.
  • [33]. Montagnat, J., Isnard, B., Glatard, T., Maheshwari, K., Fornarino, M.: A data-driven workflow language for grids based on array programming principles, Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science, WORKS ’09, 2009.
  • [34]. Oinn, T., Addis, M., Ferris, J., Marvin, D., et al.: Taverna: a tool for the composition and enactment of bioinformatics workflows, Bioinformatics, 20(17), 2004, 3045-54.
  • [35]. Petrosino, J., Highlander, S., Luna, R., Gibbs, R., Versalovic, J.: Metagenomic Pyrosequencing and Microbial Identification, Clinical Chemistry, 55(5), 2009, 856-866.
  • [36]. Rackspace: www.rackspace.com.
  • [37]. Reich, M., Liefeld, T., Gould, J., Lerner, J., Tamayo, P., Mesirov, J.: GenePattern 2.0, Nature Genetics, 38, 2006, 500-501.
  • [38]. Rowe, A., Kalaitzopoulos, D., Osmond, M., Ghanem, M., Guo, Y.: The Discovery Net system for high throughput bioinformatics, Bioinformatics, 19(90001), 2003, i225-i231.
  • [39]. Schatz, M., Langmead, B., Salzberg, S.: Cloud computing and the DNA data race, Nature Biotechnology, 28, 2010, 691-693.
  • [40]. Shah, S., He, D., Sawkins, J., Druce, J., Quon, G., Lett, D., Zheng, G., Xu, T., Ouellette, B.: Pegasys: software for executing and integrating analyses of biological sequences, BMC Bioinformatics, 5(40), 2004.
  • [41]. Shields, M.: Control-versus data-driven workflows, Springer, 2007, 167-173.
  • [42]. Stein, L.: The case for cloud computing in genome informatics, Genome Biology, 11(207), 2010.
  • [43]. Taylor, I., Shields, M., Wang, I., Harrison, A.: Visual Grid Workflow in Triana, J. Grid Computing, 3(3-4), 2005, 153-169.
  • [44]. Taylor, I., Shields, M., Wang, I., Harrison, A.: The Triana Workflow Environment: Architecture and Applications, in: Workflows for e-Science, Springer, 2007, 320-339.
  • [45]. Venter, J., Remington, K., Heidelberg, J., Halpern, A., Rusch, D., Eisen, J., Wu, D., Paulsen, I., Nelson, K., Nelson, W., et al.: Environmental genome shotgun sequencing of the Sargasso Sea, Science, 304, 2004, 66-74.
  • [46]. Voelkerding, K., Dames, S., Durtschi, J.: Next-generation sequencing: from basic research to diagnostics, Clinical Chemistry, 55(4), 2009, 641-58.
  • [47]. Wall, D., Kudtarkar, P., Fusaro, V., Pivovarov, R., Patil, P., Tonellato, P.: Cloud computing for comparative genomics, BMC Bioinformatics, 11, 2010, 259.
  • [48]. Zhang, Z., Schwartz, S., Wagner, L., Miller, W.: A greedy algorithm for aligning DNA sequences, Journal of Computational Biology, 7(1-2), 2000, 203-214.
Document type
Bibliography
Identifiers
YADDA identifier
bwmeta1.element.baztech-bf57973e-f822-4d28-be7f-162d2b83cc4e