Performance and productivity comparison of algorithm of decision tree generation implemented in SPARK and GASPI environments

Wyrzykowski, R.; Karoń, T.

Title variants

Performance and productivity comparison of algorithm of decision tree generation implemented in SPARK and GASPI environments

Abstract

This paper examines the performance and programming productivity of cloud computing with two different programming environments, SPARK and GASPI, applied to the parallel implementation of algorithms that explore large data sets. The ID3 algorithm for decision tree generation serves as the test case. All implementations were run on the Google Compute Engine platform.

Keywords

cloud computing; big data; decision tree; decision tree generation; ID3 algorithm; SPARK; GASPI; environments comparison; Google Compute Engine

Publisher

Warszawska Wyższa Szkoła Informatyki

Journal

Zeszyty Naukowe Warszawskiej Wyższej Szkoły Informatyki

Year

2016

Volume

No. 15

Pages

79–121

Physical description

Bibliography: 51 items, tables, charts.

Contributors

author

Wyrzykowski R.

tkaron@icis.pcz.pl

Politechnika Częstochowska, Instytut Informatyki Teoretycznej i Stosowanej

author

Karoń T.

tkaron@icis.pcz.pl

Politechnika Częstochowska, Instytut Informatyki Teoretycznej i Stosowanej

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-dcba30fd-5e55-444f-8a8f-836bf6459a43