Motivated by emerging database application scenarios in which data sets increasingly arrive in uncertain and imprecise formats, this paper proposes a decomposition framework for efficiently computing and querying multidimensional OLAP data cubes over probabilistic data, a model that captures such data well. The models and algorithms supported by the framework are formally presented and described in detail. They build on well-understood statistical and probabilistic tools and converge on the definition of probabilistic OLAP data cubes, the most prominent result of this research. Finally, we complete our analytical contribution with a Probability Distribution Function (PDF)-based approach, which draws on well-known probabilistic estimator theory, for efficiently querying probabilistic OLAP data cubes, together with a comprehensive experimental assessment and analysis over synthetic probabilistic databases.
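To make the idea of a PDF-valued cube cell concrete, the following is a minimal sketch and not the framework proposed in the paper. It assumes that each uncertain measure is a small discrete PDF and that the measures are independent. Under those assumptions the PDF of a SUM aggregate can be obtained by convolution, and point estimates such as the expected value and variance can be read off it. The names `cell_measures`, `sum_pdf`, `expected_value`, and `variance` are hypothetical and used only for illustration.

```python
from itertools import product

# Hypothetical toy data: each uncertain measure falling into one cube cell
# is a discrete PDF given as {value: probability}. Independence is assumed.
cell_measures = [
    {10.0: 0.5, 20.0: 0.5},
    {5.0: 0.2, 15.0: 0.8},
]


def sum_pdf(pdfs):
    """PDF of the SUM of independent discrete random variables (convolution)."""
    result = {0.0: 1.0}  # PDF of the empty sum
    for pdf in pdfs:
        combined = {}
        # Pair every partial sum with every outcome of the next measure.
        for (s, ps), (v, pv) in product(result.items(), pdf.items()):
            combined[s + v] = combined.get(s + v, 0.0) + ps * pv
        result = combined
    return result


def expected_value(pdf):
    """Mean of a discrete PDF."""
    return sum(v * p for v, p in pdf.items())


def variance(pdf):
    """Variance of a discrete PDF."""
    mu = expected_value(pdf)
    return sum(p * (v - mu) ** 2 for v, p in pdf.items())


# The aggregated cell stores a PDF rather than a single number.
cell_pdf = sum_pdf(cell_measures)
print(sorted(cell_pdf.items()))
print(expected_value(cell_pdf), variance(cell_pdf))
```

The exact convolution grows with the number of distinct sums, so it is practical only for small cells; that cost is one reason estimator-based querying is attractive for large cubes.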
Physical description
Bibliography: 56 items, tables, charts.
- ICAR-CNR and University of Calabria, 87036 Rende, Cosenza, Italy
- Department of Informatics and Telecommunications, University of Athens, 15784 Ilisia, Greece