Assignment of tasks to machines under data replication with a tie to Steiner systems

Wojciechowski, Paweł; Kasprzak, Marta

doi:10.61822/amcs-2024-0019

Artykuł - szczegóły

Tytuł artykułu

Assignment of tasks to machines under data replication with a tie to Steiner systems

Autorzy

Wojciechowski Paweł , Kasprzak Marta

Treść / Zawartość

Pełne teksty:

07_wojciechowski_kasprzak_assignment_of_tasks_to_machines_under_dsta_replication_2024_2.pdf

Pobierz

Identyfikatory

DOI

10.61822/amcs-2024-0019

Warianty tytułu

Języki publikacji

Abstrakty

In the paper a problem of assignment of tasks to machines is formulated and solved, where a criterion of data replication is used and a large size of data imposes additional constraints. This problem is met in practice when dealing with large genomic files or other types of vast data. The necessity of comparing all pairs of files within a big set of DNA sequencing results, which we collected, maintained, and analyzed within a national genomic project, brought us to the proposed results. This problem resembles that of generating a particular Steiner system, and a mechanism observed there is employed in one of our algorithms. Based on the problem complexity, we propose two heuristic algorithms, which work very well even for instances with tight constraints and a heterogeneous environment defined. In addition, we propose a simplified method, nevertheless capable of finding very good solutions and surpassing the algorithms in some special cases. The methods are validated in tests on a wide set of instances, where values of parameters reflect our real-world application and where their usefulness is proven.

Słowa kluczowe

task assignment genomic data processing integer linear programming Steiner system heuristic algorithm

przydział zadań dane genomowe programowanie liniowe algorytm heurystyczny

Wydawca

Oficyna Wydawnicza Uniwersytetu Zielonogórskiego

Czasopismo

International Journal of Applied Mathematics and Computer Science

Rocznik

2024

Tom

Vol. 34, no. 2

Strony

263--275

Opis fizyczny

Bibliogr. 16 poz., rys., tab., wykr.

Twórcy

autor

Wojciechowski Paweł

pwojciechowski@cs.put.poznan.pl

Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznań, Poland
Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznań, Poland

autor

Kasprzak Marta

mkasprzak@cs.put.poznan.pl

Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznań, Poland

Bibliografia

[1] Babai, L. and Wilmes, J. (2013). Quasipolynomial-time canonical form for Steiner designs, Proceedings of the 45th Annual ACM Symposium on Theory of Computing (STOC’13), Palo Alto, USA, pp. 261-270.
[2] Berlin, K., Koren, S., Chin, C., Drake, J., Landolin, J. and Phillippy, A. (2015). Assembling large genomes with single-molecule sequencing and locality-sensitive hashing, Nature Biotechnology 33(6): 623-630.
[3] Blazewicz, J., Kasprzak, M., Kierzynka, M., Frohmberg, W., Swiercz, A., Wojciechowski, P. and Zurkowski, P. (2018). Graph algorithms for DNA sequencing - Origins, current models and the future, European Journal of Operational Research 264(3): 799-812.
[4] Bogdanowicz, D. and Giaro, K. (2013). On a matching distance between rooted phylogenetic trees, International Journal of Applied Mathematics and Computer Science 23(3): 669-684, DOI: 10.2478/amcs-2013-0050.
[5] Bose, R. (1939). On the construction of balanced incomplete block designs, Annals of Eugenics 9: 353-399.
[6] Briquet, C., Dalem, X., Jodogne, S. and de Marneffe, P.-A. (2007). Scheduling data-intensive bags of tasks in P2P grids with bittorrent-enabled data distribution, Proceedings of the 2nd Workshop on Use of P2P, GRID and Agents for the Development of Content Networks (UPGRADE’07), Monterey, USA, pp. 39-48.
[7] Gopalakrishnan, S. and Caccamo, M. (2006). Task partitioning with replication upon heterogeneous multiprocessor systems, Proceedings of the 12th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS’06), San Jose, USA, pp. 199-207.
[8] Grannell, M. and Griggs, T. (1994). An introduction to Steiner systems, Mathematical Spectrum 26(3): 74-80.
[9] Jain, C., Rodriguez-R, L., Phillippy, A., Konstantinidis, K. and Aluru, S. (2018). High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries, Nature Communications 9: 5114.
[10] Majdzik, P. (2022). A feasible schedule for parallel assembly tasks in flexible manufacturing systems, International Journal of Applied Mathematics and Computer Science 32(1): 51-63, DOI: 10.34768/amcs-2022-0005.
[11] Nukarapu, D., Tang, B., Wang, L. and Lu, S. (2011). Data replication in data intensive scientific applications with performance guarantee, IEEE Transactions on Parallel and Distributed Systems 22(8): 1299-1306.
[12] Nurk, S., Walenz, B., Rhie, A., Vollger, M., Logsdon, G., Grothe, R., Miga, K., Eichler, E., Phillippy, A. and Koren, S. (2020). HiCanu: Accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads, Genome Research 30(9): 1291-1305.
[13] Ondov, B., Treangen, T., Melsted, P., Mallonee, A., Bergman, N., Koren, S. and Phillippy, A. (2016). Mash: Fast genome and metagenome distance estimation using MinHash, Genome Biology 17: 132.
[14] Povoa, M. and Xavier, E. (2018). Approximation algorithms and heuristics for task scheduling in data-intensive distributed systems, International Transactions in Operational Research 25(5): 1417-1441.
[15] Reid, C. and Rosa, A. (2010). Steiner systems S(2, 4, v)-A survey, The Electronic Journal of Combinatorics #DS18: 1-34.
[16] Wojciechowski, P., Krause, K., Lukasiak, P. and Blazewicz, J. (2021). The correctness of large scale analysis of genomic data, Foundations of Computing and Decision Sciences 46(4): 423-436.

Typ dokumentu

Bibliografia

Identyfikator YADDA

bwmeta1.element.baztech-dce487d3-0a70-4d44-89a3-e7178c1bcc8e