Probability Density Functions for Calculating Approximate Aggregates

Gramacki, J.; Gramacki, A.

Article - details

Article title

Probability Density Functions for Calculating Approximate Aggregates

Authors

Gramacki J., Gramacki A.

Abstracts

In this paper we show how a probability density function (PDF) can be used to calculate approximate aggregates. The aggregates are obtained quickly, without scanning large amounts of data and without creating materialized aggregates (usually implemented as materialized views). Although the results are only approximate, the method is very fast and can be used successfully during the initial phase of data exploration. We include simple experimental results which demonstrate the effectiveness of the method, especially when the PDFs are typical, for example close to Gaussian (normal) ones. If the PDFs differ from a normal distribution, one can apply a suitable preliminary transformation to the input variables or estimate the PDFs with nonparametric methods, for example the so-called kernel estimators. The latter approach is used in the paper. To accelerate the calculations, one can use a graphics processing unit (GPU). We outline this approach in the last section of the paper and give some preliminary results, which are very promising.
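The idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic data, the sample size, the function names and the bandwidth rule (a normal reference rule of thumb) are all assumptions made for illustration. The sketch approximates a range COUNT by multiplying the table size by the probability mass that an estimated PDF assigns to the range, once with a fitted normal distribution and once with a Gaussian kernel density estimate built on a small sample.

```python
import math
import random
import statistics


def normal_cdf(z):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def approx_count_normal(n, mu, sigma, a, b):
    """Approximate COUNT(*) WHERE a <= x <= b from a fitted normal PDF."""
    return n * (normal_cdf((b - mu) / sigma) - normal_cdf((a - mu) / sigma))


def approx_count_kde(n, sample, a, b):
    """Approximate the same COUNT from a Gaussian kernel density estimate.

    The bandwidth uses a normal reference rule of thumb (an assumption;
    the paper may select the bandwidth differently).
    """
    h = 1.06 * statistics.stdev(sample) * len(sample) ** (-0.2)
    # Integrating each Gaussian kernel over [a, b] gives a CDF difference.
    mass = sum(normal_cdf((b - x) / h) - normal_cdf((a - x) / h) for x in sample)
    return n * mass / len(sample)


# Hypothetical data: a synthetic "column" of 100,000 values.
random.seed(0)
data = [random.gauss(50.0, 10.0) for _ in range(100_000)]
sample = random.sample(data, 1_000)  # only the sample is used for estimation

mu, sigma = statistics.mean(sample), statistics.stdev(sample)
a, b = 40.0, 60.0

exact = sum(1 for x in data if a <= x <= b)  # full scan, for comparison only
est_normal = approx_count_normal(len(data), mu, sigma, a, b)
est_kde = approx_count_kde(len(data), sample, a, b)
print(exact, round(est_normal), round(est_kde))
```

Once the PDF parameters (or the sample behind the kernel estimate) are stored, each range query costs a handful of CDF evaluations instead of a scan of the full table, which is the source of the speed-up the abstract describes. Other aggregates such as SUM or AVG would follow the same pattern by integrating x times the estimated PDF over the range.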

Keywords

approximate query processing; approximate aggregates; probability density function; kernel density estimation; graphics processing unit (GPU); general processing on GPUs (GPGPU); CUDA platform

Publisher

Wydawnictwo Politechniki Poznańskiej

Journal

Foundations of Computing and Decision Sciences

Year

2010

Volume

Vol. 35, No. 4

Pages

223--240

Physical description

Bibliography: 31 items

Contributors

author

Gramacki J.

author

Gramacki A.

Poznan University of Technology, Institute of Computer Science, Piotrowo 2, 60-965, Poznań, a.gramacki@iie.uz.zgora.pl

Bibliography

[1] W. Andrzejewski and R. Wrembel. GPU-PLWAH: GPU-based implementation of the PLWAH algorithm for compressing bitmaps. In Proceedings of the 3rd National Scientific Conference "Data Processing Technologies" (III Krajowa Konferencja Naukowa: Technologie Przetwarzania Danych, KKNTPD '10), pages 56-70. WNT, 2010.
[2] W. Andrzejewski and R. Wrembel. GPU-WAH: Applying GPUs to compressing bitmap indexes with word aligned hybrid. In Proceedings of the 21st International Conference (DEXA '10) Part II, volume LNCS 6262, pages 315-329. Springer, 2010.
[3] Churn dataset. http://www.sgi.com/tech/mlc/db/.
[4] T. Duong. ks: Kernel density estimation and kernel discriminant analysis for multivariate data in R. Journal of Statistical Software, 21(7), 2007.
[5] T. Duong and M. Hazelton. Cross-validation bandwidth matrices for multivariate kernel density estimation. Scandinavian Journal of Statistics, 32:485-506, 2005.
[6] J. Galindo. Handbook of Research on Fuzzy Information Processing in Databases. Hershey, PA, USA, 2008.
[7] N. K. Govindaraju, J. Gray, R. Kumar, and D. Manocha. GPUTeraSort: High performance graphics coprocessor sorting for large database management. In ACM SIGMOD International Conference on Management of Data, Chicago, United States, June 2006.
[8] N. K. Govindaraju, B. Lloyd, W. Wang, M. Lin, and D. Manocha. Fast computation of database operations using graphics processors. In Proceedings of the 2004 ACM SIGMOD international conference on Management of data (SIGMOD '04), pages 215-226, New York, NY, USA, 2004. ACM Press.
[9] A. Greß and G. Zachmann. GPU-ABiSort: Optimal parallel sorting on stream architectures. In The 20th IEEE International Parallel and Distributed Processing Symposium, page 45, April 2006.
[10] J. Han and M. Kamber. Data Mining: Concepts and Techniques. The Morgan Kaufmann Series in Data Management Systems, 2006.
[11] M. Harris. Optimizing Parallel Reduction in CUDA. http://developer.download.nvidia.com/compute/cuda/1_1/Website/projects/reduction/doc/reduction.pdf.
[12] T. Hayfield and J. S. Racine. np: Nonparametric kernel smoothing methods for mixed data types. http://cran.r-project.org/web/packages/np/index.html.
[13] Y. Ioannidis and V. Poosala. Histogram-based approximation of set-valued query answers. In Proceedings of the 25th International Conference on Very Large Data Bases, pages 174-185, 1999.
[14] M. Jarke, M. Lenzerini, Y. Vassiliou, and P. Vassiliadis. Fundamentals of Data Warehouses. Springer-Verlag, 2003.
[15] R. Kimball and M. Ross. The Data Warehouse Toolkit, Second Edition. John Wiley and Sons, Inc., 2002.
[16] Z. Królikowski. Hurtownie danych, logiczne i fizyczne struktury danych (in Polish). Wydawnictwo Politechniki Poznańskiej, Poznań, 2007.
[17] D. Larose. Discovering Statistics. W.H. Freeman, 2009.
[18] T. Lauer, A. Datta, Z. Khadikov, and C. Anselm. Exploring graphics processing units as parallel coprocessors for online aggregation. In Proceedings of the ACM Thirteenth International Workshop On Data Warehousing and OLAP (DOLAP '10), 2010.
[19] M. Lu, B. He, and Q. Luo. Supporting extended precision on graphics processors. In DaMoN '10: Proceedings of the Sixth International Workshop on Data Management on New Hardware, pages 19-26, New York, NY, USA, 2010. ACM.
[20] P. Płaszewski, P. Macioł, and K. Banaś. 3D finite element numerical integration on GPUs. In Proceedings of Xth International Conference, ICCS 2010, 2010.
[21] P. Płaszewski, P. Macioł, and K. Banaś. Finite element numerical integration on GPUs. In Proceedings of the 8th International Conference on Parallel Processing and Applied Mathematics, volume 6067. Springer, 2010.
[22] The R Project for Statistical Computing, http://www.r-project.org/.
[23] J. Shanmugasundaram, U. Fayyad, and P. Bradley. Compressed data cubes for OLAP aggregate query approximation on continuous dimensions. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 223-232, 1999.
[24] T. Shimobaba, T. Ito, N. Masuda, Y. Abe, Y. Ichihashi, H. Nakayama, N. Takada, A. Shiraki, and T. Sugie. Numerical calculation library for diffraction integrals using the graphic processing unit: the GPU-based wave optics library. Journal of Optics A: Pure and Applied Optics, 2008.
[25] B. Silverman. Density Estimation For Statistics And Data Analysis. Chapman and Hall / Monographs on Statistics and Applied Probability, London, 1986.
[26] J. Simonoff. Smoothing Methods in Statistics. Springer Series in Statistics, 1996.
[27] C. Sun, D. Agrawal, and A. E. Abbadi. Hardware acceleration for spatial selections and joins. In Proceedings of the 2003 ACM SIGMOD international conference on Management of data (SIGMOD '03), pages 455-466, New York, NY, USA, 2003. ACM Press.
[28] J. Vitter and M. Wang. Approximate computation of multidimensional aggregates of sparse data using wavelets. In Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, pages 193-204, 1999.
[29] M. Wand. Kernel Smoothing. Chapman and Hall / Monographs on Statistics and Applied Probability, London, 1995.
[30] Wine Quality dataset. http://archive.ics.uci.edu/ml/datasets/Wine+Quality.
[31] P. Zinterhof and P. Zinterhof jun. Gridification of the solution of high-dimensional improper integration and integral equations. Technical report, Salzburg University, 2010.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BPP2-0019-0047