We present parallel tiled optimized McCaskill's partition functions computation code. That CPU and memory intensive dynamic programming task is within computational biology. To optimize code, we use the authorial source-to-source TRACO compiler and compare obtained code performance with that generated with the state-of-the-art PluTo compiler based on the affine transformations framework (ATF). For the considered task, PluTo is able to generate only serial highly cache efficient code without any parallelism. A TRACO tiling and parallelizing strategy uses the transitive closure of a dependence graph to avoid affine function calculation. First, for each loop nest statement, rectangular tiles are formed. Then those tiles are corrected to be valid under lexicographical order if necessary. A correction is carried out by means of applying transitive closure. The validity of tiles guarantees that the inter-tile dependence graph is acyclic. So, a valid schedule for target tiles can be derived and applied to generate parallel tiled code. For this purpose, the ISL scheduler is used. An experimental study carried out on a multi-core computer demonstrates considerable speed-up of generated code for the larger number of threads. Generated parallel tiled code overcomes that generated with the PluTo compiler.
2
Dostęp do pełnego tekstu na zewnętrznej witrynie WWW
In this paper, we present the application of three automatic source-to-source compilers to code implementing McCaskill's bioinformatics algorithm. It computes propabilities of various substructures for RNA prediction. McCaskill's algorithm is compute and data intensive and it is within dynamic programming. A corresponding programming code exposes non-uniform dependences that complicates tiling of that code. The corresponding code is represented within the polyhedral model. Its optimization is still a challenging task for optimizing compilers employing multi-threaded loop tiling. To generate optimized code, we used the popular PLuTo compiler that finds and applies affine transformations, the TRACO compiler based on calculating the transitive closure of loop dependence graphs, and the newest polyhedral tool DAPT implementing space-time tiling. An experimental study fulfilled on two multi-core machines: an AMD Epyc with 64 threas and a 2x Intel Xeon Platinum 9242 with 192 threads demonstrates considerable speedup, high locality, and scalability for various problem sizes and the number of threads of generated codes by means of space-time tiling.
JavaScript jest wyłączony w Twojej przeglądarce internetowej. Włącz go, a następnie odśwież stronę, aby móc w pełni z niej korzystać.