Identyfikatory
Warianty tytułu
Języki publikacji
Abstrakty
In the paper a problem of assignment of tasks to machines is formulated and solved, where a criterion of data replication is used and a large size of data imposes additional constraints. This problem is met in practice when dealing with large genomic files or other types of vast data. The necessity of comparing all pairs of files within a big set of DNA sequencing results, which we collected, maintained, and analyzed within a national genomic project, brought us to the proposed results. This problem resembles that of generating a particular Steiner system, and a mechanism observed there is employed in one of our algorithms. Based on the problem complexity, we propose two heuristic algorithms, which work very well even for instances with tight constraints and a heterogeneous environment defined. In addition, we propose a simplified method, nevertheless capable of finding very good solutions and surpassing the algorithms in some special cases. The methods are validated in tests on a wide set of instances, where values of parameters reflect our real-world application and where their usefulness is proven.
Rocznik
Tom
Strony
263--275
Opis fizyczny
Bibliogr. 16 poz., rys., tab., wykr.
Twórcy
autor
- Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznań, Poland
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznań, Poland
autor
- Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznań, Poland
Bibliografia
- [1] Babai, L. and Wilmes, J. (2013). Quasipolynomial-time canonical form for Steiner designs, Proceedings of the 45th Annual ACM Symposium on Theory of Computing (STOC’13), Palo Alto, USA, pp. 261-270.
- [2] Berlin, K., Koren, S., Chin, C., Drake, J., Landolin, J. and Phillippy, A. (2015). Assembling large genomes with single-molecule sequencing and locality-sensitive hashing, Nature Biotechnology 33(6): 623-630.
- [3] Blazewicz, J., Kasprzak, M., Kierzynka, M., Frohmberg, W., Swiercz, A., Wojciechowski, P. and Zurkowski, P. (2018). Graph algorithms for DNA sequencing - Origins, current models and the future, European Journal of Operational Research 264(3): 799-812.
- [4] Bogdanowicz, D. and Giaro, K. (2013). On a matching distance between rooted phylogenetic trees, International Journal of Applied Mathematics and Computer Science 23(3): 669-684, DOI: 10.2478/amcs-2013-0050.
- [5] Bose, R. (1939). On the construction of balanced incomplete block designs, Annals of Eugenics 9: 353-399.
- [6] Briquet, C., Dalem, X., Jodogne, S. and de Marneffe, P.-A. (2007). Scheduling data-intensive bags of tasks in P2P grids with bittorrent-enabled data distribution, Proceedings of the 2nd Workshop on Use of P2P, GRID and Agents for the Development of Content Networks (UPGRADE’07), Monterey, USA, pp. 39-48.
- [7] Gopalakrishnan, S. and Caccamo, M. (2006). Task partitioning with replication upon heterogeneous multiprocessor systems, Proceedings of the 12th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS’06), San Jose, USA, pp. 199-207.
- [8] Grannell, M. and Griggs, T. (1994). An introduction to Steiner systems, Mathematical Spectrum 26(3): 74-80.
- [9] Jain, C., Rodriguez-R, L., Phillippy, A., Konstantinidis, K. and Aluru, S. (2018). High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries, Nature Communications 9: 5114.
- [10] Majdzik, P. (2022). A feasible schedule for parallel assembly tasks in flexible manufacturing systems, International Journal of Applied Mathematics and Computer Science 32(1): 51-63, DOI: 10.34768/amcs-2022-0005.
- [11] Nukarapu, D., Tang, B., Wang, L. and Lu, S. (2011). Data replication in data intensive scientific applications with performance guarantee, IEEE Transactions on Parallel and Distributed Systems 22(8): 1299-1306.
- [12] Nurk, S., Walenz, B., Rhie, A., Vollger, M., Logsdon, G., Grothe, R., Miga, K., Eichler, E., Phillippy, A. and Koren, S. (2020). HiCanu: Accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads, Genome Research 30(9): 1291-1305.
- [13] Ondov, B., Treangen, T., Melsted, P., Mallonee, A., Bergman, N., Koren, S. and Phillippy, A. (2016). Mash: Fast genome and metagenome distance estimation using MinHash, Genome Biology 17: 132.
- [14] Povoa, M. and Xavier, E. (2018). Approximation algorithms and heuristics for task scheduling in data-intensive distributed systems, International Transactions in Operational Research 25(5): 1417-1441.
- [15] Reid, C. and Rosa, A. (2010). Steiner systems S(2, 4, v)-A survey, The Electronic Journal of Combinatorics #DS18: 1-34.
- [16] Wojciechowski, P., Krause, K., Lukasiak, P. and Blazewicz, J. (2021). The correctness of large scale analysis of genomic data, Foundations of Computing and Decision Sciences 46(4): 423-436.
Typ dokumentu
Bibliografia
Identyfikator YADDA
bwmeta1.element.baztech-dce487d3-0a70-4d44-89a3-e7178c1bcc8e