A novel adaptive checkpointing method based on information obtained from workflow structure

Kail, E.; Kacsuk, P.; Kozlovszky, M.

doi:10.7494/csci.2016.17.3.387

Article - details

Article title

A novel adaptive checkpointing method based on information obtained from workflow structure

Authors

Kail E., Kacsuk P., Kozlovszky M.

Identifiers

DOI

10.7494/csci.2016.17.3.387

Abstracts

Scientific workflows are data- and compute-intensive; thus, they may run for days or even weeks on parallel and distributed infrastructures such as grids, supercomputers, and clouds. In these high-performance computing infrastructures, the number of failures that can arise during scientific-workflow enactment can be high, so the use of fault-tolerance techniques is unavoidable. The most frequently used fault-tolerance technique is taking checkpoints from time to time; when a failure is detected, the last consistent state is restored. One of the most critical factors affecting the effectiveness of the checkpointing method is the checkpointing interval. In this work, we propose a Static (Wsb) and an Adaptive (AWsb) Workflow Structure Based checkpointing algorithm. Our results showed that, compared to the optimal checkpointing strategy, the static algorithm may decrease the checkpointing overhead by as much as 33% without affecting the total processing time of workflow execution. The adaptive algorithm may further decrease this overhead while keeping the overall processing time at its necessary minimum.
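
The role of the checkpointing interval can be illustrated with the classical first-order approximation by Young (reference [11] in the bibliography), which estimates the interval as sqrt(2 * C * MTBF). The sketch below is only an illustration of that well-known formula. It is not the Wsb or AWsb algorithm, which this record does not describe, and the function name `young_interval` and its parameters are hypothetical names chosen for the example.

```python
import math


def young_interval(checkpoint_cost: float, mtbf: float) -> float:
    """Young's first-order approximation of the checkpoint interval.

    checkpoint_cost: time needed to write one checkpoint (assumed constant).
    mtbf: mean time between failures, in the same time unit.
    Both parameter names are illustrative assumptions, not taken from the paper.
    """
    if checkpoint_cost <= 0 or mtbf <= 0:
        raise ValueError("checkpoint_cost and mtbf must be positive")
    # Interval that balances checkpoint overhead against expected rework.
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


if __name__ == "__main__":
    # Example: a checkpoint costs 2 time units, failures occur every 100 units on average.
    print(young_interval(2.0, 100.0))
```

A shorter interval means more time spent writing checkpoints, while a longer one means more work lost after each failure; the formula trades these two costs against each other under the assumption of a constant checkpoint cost and a known MTBF.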

Keywords

scientific workflow, checkpoint, dynamic execution

Publisher

Wydawnictwa AGH

Journal

Computer Science

Year

2016

Volume

Vol. 17 (3)

Pages

387–406

Physical description

Bibliography: 11 items, figures, charts, tables.

Contributors

author

Kail E.

kail.eszter@nik.uni-obuda.hu

Obuda University, John von Neumann Faculty of Informatics, 1034 Bécsi str. 96/b., Budapest, Hungary

author

Kacsuk P.

peter.kacsuk@sztaki.mta.hu

University of Westminster, 115 New Cavendish Street, London, United Kingdom
MTA SZTAKI, 1518 Budapest, Hungary

author

Kozlovszky M.

kozlovszky.miklos@nik.uni-obuda.hu

Obuda University, John von Neumann Faculty of Informatics, Biotech Lab, 1034 Bécsi str. 96/b., Budapest, Hungary
MTA SZTAKI, 1518 Budapest, Hungary

Bibliography

[1] Di S., Robert Y., Vivien F., Kondo D., Wang C.L., Cappello F.: Optimization of Cloud Task Processing with Checkpoint-restart Mechanism. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC ’13, pp. 64:1–64:12, ACM, New York, NY, USA, 2013, http://doi.acm.org/10.1145/2503210.2503217.
[2] Garg R., Singh A.: Fault Tolerance in Grid Computing: State of the art and open issues. International Journal of Computer Science and Engineering Survey (IJCSES), vol. 2, pp. 88–97, 2011.
[3] Hwang S., Kesselman C.: Grid workflow: a flexible failure handling framework for the grid. High Performance Distributed Computing, 2003. Proceedings. 12th IEEE International Symposium on, pp. 126–137, 2003.
[4] Jhawar R., Piuri V., Santambrogio M.: Fault Tolerance Management in Cloud Computing: A System-Level Perspective. IEEE Systems Journal, vol. 7(2), pp. 288–297, 2013.
[5] Kail E., Kacsuk P., Kozlovszky M.: New aspect of investigating fault sensitivity of scientific workflows. Intelligent Engineering Systems (INES), 2015 IEEE 19th International Conference on, pp. 185–188, 2015.
[6] Meroufel B., Belalem G.: Adaptive time-based coordinated checkpointing for cloud computing workflows. Scalable Computing: Practice and Experience, vol. 15, 2014.
[7] Meroufel B., Belalem G.: Policy Driven Initiator in Coordination Checkpointing Strategies. Recent Advances in Telecommunications, Informatics And Educational Technologies, Proceeding of the 5th European Conference of Computer Science, pp. 146–153, WSEAS, 2014.
[8] Pietri I., Juve G., Deelman E., Sakellariou R.: A Performance Model to Estimate Execution Time of Scientific Workflows on the Cloud. Proceedings of the 9th Workshop on Workflows in Support of Large-Scale Science, WORKS ’14, pp. 11–19, IEEE Press, Piscataway, NJ, USA, 2014, http://dx.doi.org/10.1109/WORKS.2014.12.
[9] Starlinger J., Cohen-Boulakia S., Khanna S., Davidson S., Leser U.: Layer Decomposition: An Effective Structure-based Approach for Scientific Workflow Similarity. Proc. of the 10th IEEE International Conference in eScience, 2014.
[10] Therasa.S A.L., Sumathi.G, Dalya.S A.: Dynamic Adaptation of Checkpoints and Rescheduling in Grid Computing. International Journal of Computer Applications, vol. 2(3), pp. 95–99, 2010, published by Foundation of Computer Science.
[11] Young J.W.: A First Order Approximation to the Optimum Checkpoint Interval. Commun. ACM, vol. 17(9), pp. 530–531, 1974, http://doi.acm.org/10.1145/361147.361115.

Notes

Prepared with funds from the Ministry of Science and Higher Education (MNiSW) under agreement 812/P-DUN/2016 for science dissemination activities.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-3dc05a0d-2841-4b8e-a9ab-9f27bb7f27fa