GPU-based massively parallel implementation of metaheuristic algorithms

Nowotniak, R.; Kucharski, J.

Artykuł - szczegóły

Tytuł artykułu

GPU-based massively parallel implementation of metaheuristic algorithms

Autorzy

Nowotniak R. , Kucharski J.

Treść / Zawartość

Pełne teksty:

Pobierz

Identyfikatory

Warianty tytułu

Implementacja metaheurystyk przeszukiwania w środowisku obliczeń masowo równoległych na kartach graficznych

Języki publikacji

Abstrakty

In this paper, implementation of Quantum-Inspired Genetic Algorithm(QIGA) in massively parallel environment (Graphics Processing Units) has been presented. Contrary to many recent papers concerning parallel implementation of evolutionary algorithms, in this paper a novel approach has been taken. QIGA algorithm has been implemented entirely as a computational kernel. Parallelization of the algorithm has been performed on two levels: In a block of threads, each thread transforms a separate individual or different gene; In each block, separate populations with same or different parameters are evolved. Finally, the computations have been distributed to eight GPU devices, and over 400x speedup has been gained in comparison to sequential implementation of the algorithm in ANSI C on one Intel Core i7 2.93 GHz CPU core. Correctness of the results has been verified in statistical analysis. The presented approach can be applied to experimentation with a broad class of metaheuristics.

W artykule zostały przedstawione szczegóły implementacji kwantowo inspirowanego algorytmu genetycznego (QIGA) w środowisku obliczeń masowo równoległych na procesorach kart graficznych. W odróżnieniu od wielu dotychczasowych opracowań, prezentujących implementacje algorytmów ewolucyjnych w środowiskach obliczeń równoległych, w niniejszym artykule zostało zaproponowane nowatorskie podejście do implementacji algorytmu ewolucyjnego. Zrównoleglenie algorytmu zostało wykonane na dwóch poziomach: poszczególne osobniki w populacji lub poszczególne geny są przetwarzane przez osobne wątki w blokach, a w poszczególnych blokach przeprowadzany jest proces ewolucji populacji o tych samych lub różnych parametrach. Obliczenia zostały rozdzielone na osiem jednostek GPU, co pozwoliło na uzyskanie ponad 400-krotnego przyśpieszenia algorytmu w stosunku do sekwencyjnej implementacji w języku ANSI C na pojedynczym rdzeniu procesora Intel Core i7 2,93 GHz. Poprawność implementacji została zweryfikowana poprzez analizę statystyczną otrzymanych wyników. Zaproponowane podejście pozwala przyśpieszyć badanie dowolnych metaheurystyk przeszukiwania.

Słowa kluczowe

evolutionary computing genetic algorithms parallel algorithms GPGPU

obliczenia ewolucyjne algorytmy genetyczne algorytmy równoległe GPGPU

Wydawca

Wydawnictwa AGH

Czasopismo

Automatyka / Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie

Rocznik

2011

Tom

T. 15, z. 3

Strony

595--611

Opis fizyczny

Bibliogr. 44 poz., rys., wykr., tab.

Twórcy

autor

Nowotniak R.

Computer Engineering Department, Technical University of Lodz, Poland

autor

Kucharski J.

Computer Engineering Department, Technical University of Lodz, Poland

Bibliografia

[1] Banzhaf W., Harding S., Accelerating evolutionary computation with graphics processing units. Proceedings of the llth Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers, 2009.
[2] Bratley P., Fox B.L., ALGORITHM 659: implementing Sobols ąuasirandom seąuence generator. ACM Transactions on Mathematical Software (TOMS), ACM, 14, 1988, 88-100.
[3] Buck I., Foley T., Horn D., Sugerman J., Fatahalian K., Houston M., Hanrahan P., Brook for GPUs: stream computing on graphics hardware. ACM Transactions on Graphics (TOG), 2004.
[4] Cabido R., Montemayor A.S., Pantrigo J.J., High performance memetic algorithm particle filter for multiple object tracking on modern GPUs. Soft Computing-A Fusion of Foundations, Metho-dologies and Applications, Springer, 1-14.
[5] Fernando R. (ed.), GPU Gems: Programming Techniąues, Tips and Tricks for Real-Time Graphics. Pearson Higher Education, 2004.
[6] Fialka O., Cadik M., FFTand convolutionperformance in image filtering on GPU. Tenth International Conference on Information Visualization, 2006. IV 2006.
[7] Fok K.L., Wong T.T., Wong M.L., Evolutionary computing on consumer graphics hardware. In-telligent Systems, IEEE, 22, 2007, 69-78.
[8] Han K.H., Kim J.H., Genetic ąuantum algorithm and its application to combinatorial optimiza-tion problem. Proceedings of the 2000 Congress on Evolutionary Computation, 2000.
[9] Harris M.J., Fast fluid dynamics simulation on the GPU. GPU Gems, Citeseer, 1, 2004, 637-665.
[10] Kriiger F., Maitre O., Jimenez S., Baumes L., Collet P., Speedups between *70 and x 120 for a generic local search (memetic) algorithm on a single GPGPU chip. Applications of Evolutiona-ry Computation, Springer, 2010, 501-511.
[11] Kriiger J., Westermann R., Linear algebra operators for GPU implementation of numerical algorithms. ACM SIGGRAPH 2005 Courses, 2005.
[12] Lee M., Jeon J., Bae J., Jang H. S., Parallel implementation of a financial application on a GPU. Proceedings of the 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human, 2009.
[13] Li J., Zhang L., Liu L., A parallel immune algorithm based on fine-grained model with GPU-acceleration. Fourth International Conference on Innovative Computing, Information and Control (ICICIC), 2009.
[14] Li J.M., Wang X.J., He R.S., Chi Z.X.: An efficientfine-grainedparallelgenetic algorithm based on gpu-accelerated. IEEE Computer Society, 2007.
[15] Manavski S., Valle G., CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman seąuence alignment. BMC Bioinformatics, BioMed Central Ltd, 9, 2008, SIO.
[16] Marsaglia G., Xorshift rngs. Journal of Statistical Software, 8, 2003, 1-6.
[17] Matsumoto M., Nishimura T., Mersenne twister: a 623-dimensionally eąuidistributed uniform pseudo-random number generator. ACM Transactions on Modeling and Computer Simulation (TOMACS), ACM, 8, 1998, 3-30.
[18] Moreland K., Angel E., The FFT on a GPU. Proc. ofthe ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware 2003.
[19] Nickolls J., Buck I., Garland M., Skadron K., Scalableparallelprogramming with CUDA. Queue, ACM, 6, 2008, 40-53.
[20] Nvidia: Compute Unified Device Architecture Programming Guide. NVIDIA, Santa Clara, CA, 2007.
[21] Oh K.S., Jung K., GPU implementation of neural networks. Pattern Recognition, Elsevier, 37, 2004, 1311-1314.
[22] Owens J., GPU architecture overview. ACM SIGGRAPH, 2007.
[23] Owens J.D., Luebke D., Govindaraju N., Harris M., Kriiger J., Lefohn A.E., Purcell T.J., A Survey of General-Purpose Computation on Graphics Hardware. Computer Graphics Forum, 2007.
[24] Panneton F., L'ecuyer P., On the xorshift random number generators. ACM Transactions on Modeling and Computer Simulation (TOMACS), ACM, 15, 2005, 346-361.
[25] Pedersen M.E.H., Tuning & Simplifying Heuristical Optimization. University of Southampton, School of Engineering Sciences, 2010.
[26] Preis T., Virnau P., Paul W., Schneider J.J., Acceleratedfluctuation analysis by graphic cards and complex pattern formation in financial markets. New Journal of Physics, IOP Publishing, 11, 2009, 093024.
[27] Preis T., Virnau R, Paul W., Schneider J.J., GPU accelerated Monte Carlo simulation ofthe 2D and 3D Ising model. Journal of Computational Physics, Elsevier, 228, 2009, 4468-4477.
[28] Robilliard D., Marion-Poty V., Fonlupt C, Population parallel GP on the G80 GPU. Genetic Programming, Springer, 2008, 98-109.
[29] Stallman R., Pesch R. H., Shebs S. (ed.), Debugging with GDB: The GNU source-level debugger. Free Software Foundation, 1995.
[30] Stone J.E., Gohara D., Shi G., OpenCL: A parallel programming standard for heterogeneous computing systems. Computing in Science and Engineering, 12, 2010, 66-73.
[31] Tomov S., Dongarra J., Baboulin M., Towards dense linear algebra for hybrid GPU accelerated manycore systems. Parallel Computing, Elsevier, 36, 2010, 232-240.
[32] Tsuchiyama R., Nakamura T., Iizuka T., Asahara A., Miki S., The OpenCL Programming Book. Group, Fixstars Corporation, 2009.
[33] Wong M.L., Parallel multi-objective evolutionary algorithms on graphics processing units. Proc. of the llth Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers, 2009.
[34] Yang J., Wang Y., Chen Y., GPU accelerated molecular dynamics simulation of thermal conductirities. Journal of Computational Physics, Elsevier, 221, 2007, 799-804.
[35] Zeller A., Liitkehaus D., DDD - a free graphical front-end for UNIX debuggers. ACM Sigplan Notices, ACM, 31, 1996, 22-27.
[36] Zhang G., Quantum-inspired evolutionary algorithms: a survey and empirical study. Journal of Heuristics, Springer, 2010, 1-49.
[37] http://developer.download.nvidia.com/compute/cuda/4_0/toolkit/docs/CURAND_Library.pdf [2011-05-31].
[38] http://developer.download.nvidia.com/compute/cuda/4_0/toolkit/docs/CUDA_C_Programming_Guide.pdf [2011-05-31].
[39] http://www.lunarc.lu.se/Documents/nvidia-workshop/files/tutorial/CUDA_C_QuickRef.pdf [2011-05-31].
[40] http://developer.nvidia.com/nvidia-parallel-nsight [2011-05-31].
[41] http://developer.nvidia.com/nvidia-gpu-computing-documentation [2011-05-31].
[42] http://developer.nvidia.com/cuda-toolkit-40 [2011-05-31].
[43] http://www.hpca.uji.es/rCUDA [2011-05-31].
[44] http://www.netfort.gr.jp/~dancer/software/dsh.html.en [2011-05-31].

Typ dokumentu

Bibliografia

Identyfikator YADDA

bwmeta1.element.baztech-article-AGH1-0028-0136