PL EN


Preferencje help
Widoczny [Schowaj] Abstrakt
Liczba wyników
Powiadomienia systemowe
  • Sesja wygasła!
Tytuł artykułu

Parallel cache-efficient code for computing the McCaskill partition functions

Wybrane pełne teksty z tego czasopisma
Identyfikatory
Warianty tytułu
Konferencja
Federated Conference on Computer Science and Information Systems (14 ; 01-04.09.2019 ; Leipzig, Germany)
Języki publikacji
EN
Abstrakty
EN
We present parallel tiled optimized McCaskill's partition functions computation code. That CPU and memory intensive dynamic programming task is within computational biology. To optimize code, we use the authorial source-to-source TRACO compiler and compare obtained code performance with that generated with the state-of-the-art PluTo compiler based on the affine transformations framework (ATF). For the considered task, PluTo is able to generate only serial highly cache efficient code without any parallelism. A TRACO tiling and parallelizing strategy uses the transitive closure of a dependence graph to avoid affine function calculation. First, for each loop nest statement, rectangular tiles are formed. Then those tiles are corrected to be valid under lexicographical order if necessary. A correction is carried out by means of applying transitive closure. The validity of tiles guarantees that the inter-tile dependence graph is acyclic. So, a valid schedule for target tiles can be derived and applied to generate parallel tiled code. For this purpose, the ISL scheduler is used. An experimental study carried out on a multi-core computer demonstrates considerable speed-up of generated code for the larger number of threads. Generated parallel tiled code overcomes that generated with the PluTo compiler.
Rocznik
Tom
Strony
207--210
Opis fizyczny
Bibliogr. 22 poz., wz., tab., wykr.
Twórcy
  • West Pomeranian University of Technology in Szczecin, ul. Zolnierska 49, 71-210 Szczecin, Poland
  • West Pomeranian University of Technology in Szczecin, ul. Zolnierska 49, 71-210 Szczecin, Poland
Bibliografia
  • 1. U. Bondhugula, V. Bandishti, and I. Pananilath, “Diamond tiling: Tiling techniques to maximize parallelism for stencil computations,” IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 5, pp. 1285–1298, May 2017. http://dx.doi.org/10.1109/tpds.2016.2615094
  • 2. U. Bondhugula, A. Acharya, and A. Cohen, “The pluto+ algorithm: A practical approach for parallelization and locality optimization of affine loop nests,” ACM Trans. Program. Lang. Syst., vol. 38, no. 3, pp. 12:1–12:32, Apr. 2016. http://dx.doi.org/10.1145/2896389
  • 3. U. Bondhugula et al., “A practical automatic polyhedral parallelizer and locality optimizer,” SIGPLAN Not., vol. 43, no. 6, pp. 101–113, Jun. 2008. http://dx.doi.org/10.1145/1379022.1375595 Http://pluto-compiler.sourceforge.net/.
  • 4. M. Palkowski and W. Bielecki, “Parallel tiled Nussinov RNA folding loop nest generated using both dependence graph transitive closure and loop skewing,” BMC Bioinformatics, vol. 18, no. 1, p. 290, 2017. doi: 10.1186/s12859-017-1707-8
  • 5. D. Wonnacott, T. Jin, and A. Lake, “Automatic tiling of “mostly-tileable” loop nests,” in 5th International Workshop on Polyhedral Compilation Techniques, Amsterdam, 2015.
  • 6. R. T. Mullapudi and U. Bondhugula, “Tiling for dynamic scheduling,” in Proceedings of the 4th International Workshop on Polyhedral Compilation Techniques, Vienna, Austria, Jan. 2014.
  • 7. M. Fekete, I. L. Hofacker, and P. F. Stadler, “Prediction of rna base pairing probabilities on massively parallel computers,” Journal of Computational Biology, vol. 7, no. 1-2, pp. 171–182, 2000. doi: 10.1089/10665270050081441
  • 8. J. Li, S. Ranka, and S. Sahni, “Multicore and GPU algorithms for Nussinov RNA folding,” BMC Bioinformatics, vol. 15, no. 8, p. S1, 2014. http://dx.doi.org/10.1186/1471-2105-15-S8-S1. [Online]. Available: http://dx.doi.org/10.1186/1471-2105-15-S8-S1
  • 9. M. Palkowski and W. Bielecki, “Tuning iteration space slicing based tiled multi-core code implementing Nussinov’s rna folding,” BMC Bioinformatics, vol. 19, no. 1, p. 12, Jan 2018.
  • 10. C. Zhao and S. Sahni, “Cache and energy efficient algorithms for nussinov’s rna folding,” BMC Bioinformatics, vol. 18, no. 15, p. 518, Dec 2017.
  • 11. W.Bielecki and M. Palkowski, “Tiling of arbitrarily nested loops by means of the transitive closure of dependence graphs,” International Journal of Applied Mathematics and Computer Science (AMCS), vol. Vol. 26, no. 4, p. 919939, December 2016. http://dx.doi.org/10.1515/amcs-2016-0065
  • 12. S. Verdoolaege, “Integer set library - manual,” Tech. Rep., 2011. [Online]. Available: www.kotnet.org/~skimo//isl/manual.pdf,
  • 13. W. Bielecki and M. Palkowski, “A parallelizing and optimizing compiler - traco,” 2013. [Online]. Available: http://traco.sourceforge.net
  • 14. W. Pugh and D. Wonnacott, “An exact method for analysis of value-based array data dependences,” in Sixth Annual Workshop on Programming Languages and Compilers for Parallel Computing. Springer-Verlag, 1993.
  • 15. M. Raden, S. M. Ali, O. S. Alkhnbashi, A. Busch, F. Costa, J. A. Davis, F. Eggenhofer, R. Gelhausen, J. Georg, S. Heyne, M. Hiller, K. Kundu, R. Kleinkauf, S. C. Lott, M. M. Mohamed, A. Mattheis, M. Miladi, A. S. Richter, S. Will, J. Wolff, P. R. Wright, and R. Backofen, “Freiburg RNA tools: a central online resource for RNA-focused research and teaching,” Nucleic Acids Research, vol. 46, no. W1, pp. W25–W29, 2018. doi: 10.1093/nar/gky329
  • 16. J. S. McCaskill, “The equilibrium partition function and base pair binding probabilities for rna secondary structure,” Biopolymers, vol. 29, no. 6-7, pp. 1105–1119. http://dx.doi.org/10.1002/bip.360290621. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/bip. 360290621
  • 17. W. Kelly, V. Maslov, W. Pugh, E. Rosser, T. Shpeisman, and D. Wonnacott, New User Interface for Petit and Other Extensions, 1996.
  • 18. W. Bielecki, K. Kraska, and T. Klimek, “Using basis dependence distance vectors in the modified Floyd–Warshall algorithm,” Journal of Combinatorial Optimization, vol. 30, no. 2, pp. 253–275, 2014.
  • 19. P. Feautrier, “Some efficient solutions to the affine scheduling problem: I. one-dimensional time,” Int. J. Parallel Program., vol. 21, no. 5, pp. 313–348, 1992.
  • 20. —, “Some efficient solutions to the affine scheduling problem: II. multidimensional time,” Int. J. Parallel Program., vol. 21, no. 5, pp. 389–420, 1992.
  • 21. M. Wolfe, “Loops skewing: The wavefront method revisited,” International Journal of Parallel Programming, vol. 15, no. 4, pp. 279–293, 1986.
  • 22. OpenMP Architecture Review Board, “OpenMP application program interface version 4.0,” 2012. [Online]. Available: http://www.openmp.org/mp-documents/OpenMP4.0RC1 final.pdf.
Uwagi
1. Track 2: Computer Science & Systems
2. Technical Session: 12th Workshop on Computer Aspects of Numerical Algorithms
3. Opracowanie rekordu ze środków MNiSW, umowa Nr 461252 w ramach programu "Społeczna odpowiedzialność nauki" - moduł: Popularyzacja nauki i promocja sportu (2020).
Typ dokumentu
Bibliografia
Identyfikator YADDA
bwmeta1.element.baztech-2e9af22a-6bd1-4dce-84b6-81fcf39d8e54
JavaScript jest wyłączony w Twojej przeglądarce internetowej. Włącz go, a następnie odśwież stronę, aby móc w pełni z niej korzystać.