Database query optimization with soft constraints

Gryz, J.

Artykuł - szczegóły

Tytuł artykułu

Database query optimization with soft constraints

Autorzy

Gryz J.

Identyfikatory

Warianty tytułu

Języki publikacji

Abstrakty

Relational database systems became the predominant technology for storing, handling, and quering data only after great improvement in the efficiency of query evaluation in such systems. The key factor in this improvement was the introduction and development of a number of query optimization techniques. Query optimizers draw upon many sources of information about the database to optimize queries. Among these sources, they employ integrity constraints in the query rewrite process. These rewrites have been seen to offer tremendous cost improvements for certain types of queries in standard, common workloads and databases. A disadvantage of these techniques though is that the semantic characterizations they require are not always available as integrity constraints associated with the database. Our key objective in this work is to discover regularities in stored data using data mining techniques, and then to extract and use them for the purpose of query optimization. We call such regularities soft constraints. Soft constraints are not meant to protect the integrity of the database as do integrity constraints; but like integrity constraints, they do semantically characterize the database. As certain types of integrity constraints are now used in query optimization, soft constraints can be used in the optimizer in the same way. If there are any usefull characterizations of the database valid with respect to the current state of the database and useful for the optimizer with respect to the workload, but which are not truly integrity constraints (that is, the database designer has no reason to specify these as rules), then these could be expressed as soft constraints.

Relacyjne bazy danych stały się dominująca technologią służącą przechowywaniu i przetwarzaniu danych dopiero wówczas, gdy znacząco usprawniono wykonywanie zapytań w takich systemach. Głównym czynnikiem postępu w tej dziedzinie było wprowadzenie i usprawnienie dużej liczby technik służących optymalizacji zapytań. Jedną z takich technik jest semantyczna optymalizacja zapytań. Polega ona na wykorzystaniu informacji semantycznej dostępnej w postaci ograniczeń integralnościowych. Główną przeszkodą, w drodze do pełnego wykorzystania możliwości zawartych w ograniczeniach w procesie optymalizacji jest problem braku ich specyfikacji w rzeczywistych bazach danych. W niniejszej pracy proponujemy zastosowanie w procesie optymalizacji nowego typu ograniczeń integralnościowych, tak zwanych miękkich ograniczeń integralnościowych. Ograniczeniami miękkimi nazywamy ograniczenia wykryte w rzeczywistych bazach danych przy użyciu technik eksploracji danych. Miękkie ograniczenia integralnościowe podobne są w swej formie do tradycyjnych ograniczeń integralnościowych, ale pełnią inną rolę. Nie specyfikują one formalnie legalnych stanów bazy danych i mogą zostać unieważnione przez kolejne operacje aktualizacji bazy danych. Główną ideą tej pracy jest teza, że tak zdefiniowane miękkie ograniczenia integralnościowe mogą być z powodzeniem wykorzystywane w procesie optymalizacji pytań.

Słowa kluczowe

database systems data mining query optimization

systemy baz danych eksploracja danych ekstrakcja danych wydobywanie danych optymalizacja kwerend

Wydawca

Oficyna Wydawnicza Politechniki Warszawskiej

Czasopismo

Prace Naukowe Politechniki Warszawskiej. Elektronika

Rocznik

2008

Tom

z. 163

Strony

1--111

Opis fizyczny

Bibliogr. 85 poz., tab., rys., wykr.

Twórcy

autor

Gryz J.

Wydział Elektroniki i Techniki Informacyjnych, Politechnika Warszawska

Bibliografia

[1] Swarup Acharya, Phillip B. Gibbons, Viswanath Poosala, and Sridhar Ramaswamy. Join synopses for approximate query answering. In Proceedings SIGMOD, Philadelphia, Pennsylvania, USA, pages 275-286, 1999.
[2] S. Agrawal, S. Chaudhuri, and V.R. Narasayya. Automated selection of materialized views and indexes in sql database. In Proc. of VLDB, pages 496-505, Cairo, Egypt, 2000.
[3] M.J. Atallah and Fredrickson G.N. A note on finding a maximum empty rectangle. Discrete Applied Mathematics, (13):87-91, 1986.
[4] R.G. Bello, K. Dias, A. Downing, J. Feenan Jr., W.D. Norcott, H. Sun, A. Witkowski, and M. Ziauddin. Materialized views in Oracle. In Proceedings of 24th VLDB, pages 659-664, 1998.
[5] Dina Bitton, Jeffrey Millman, and Solveig Torgersen. A feasibility and performance study of dependency inference. In Proceedings of the Fifth ICDE, pages 635-641, Los Angeles, California, USA, 1989. IEEE Computer Society.
[6] S. Ceri, P. Fraternali, S. Paraboschi, and L. Tanca. Automatic generation of production rules for integrity maintenance. TODS, 19(3):367-422, 1994.
[7] S. Ceri and J. Widom. Deriving production rules for constraint maintenance. In Proceedings of the. 16th VLDB, pages 577-589, Brisbane, Australia, 1990.
[8] U. Chakravarthy. J. Grant, and J. Minker. Logic-based approach to semantic query optimization. ACM TODS, 15(2):162-207, June 1990.
[9] S. Chaudhuri, R. Krishnamurthy, S. Potamianos, and K. Shim. Optimizing queries with materialized views. In Proceedings of the 11th ICDE, pages 190-200, Taipei, Taiwan, 1995. IEEE Computer Society.
[10] S. Chaudhuri and K. Shim. Including group-by in query optimization. In Proc. of VLDB, pages 354-366, 1994.
[11] Bernard Chazelle, Robert L. (Scot) Drysdale III, and D.T. Lee. Computing the largest empty rectangle. SIAM J. Comput., 15(1):550-555, 1986.
[12] I-Min. A. Chen and R.C. Lee. An approach to deriving object hierarchies from database schema and contents. In Proceedings of the 6th ISMIS, pages 112-121, Charlotte, NC, 1991.
[13] Q. Cheng, J. Gryz, F. Koo, C. Leung, L. Liu, X. Qian, and B. Schiefer. Implementation of two semantic query optimization techniques in DB2 UDB. In Proc. of the 25th VLDB, pages 687-698, Edinburgh, Scotland, 1999.
[14] W. Chu, R.C. Lee, and Q. Chen. Using type inference and iuduced rules to provide intensional answers. In Proceedings of the 7th ICDE, pages 396-403, Kobe, Japan, 1991.
[15] OLAP Council. APB-1 OLAP Benchmark Release II, November 1998. (www.olapcouncil.org).
[16] S. Dar, M. Franklin, B. Jonsson, D. Srivastava, and M. Tan. Semantic data caching and replacement. In Proceedings of 22nd VLDB, pages 330-341, Bombay, India, 1996. Morgan Kaufmann.
[17] IBM DB2 Universal Database. Administration Guide. 1998. Version 5.2 S10J-8157-01.
[18] J. Edmonds, J. Gryz. D. Liang, and R.J. Miller. Mining for Empty Rectangles in Large Data Sets (Extended Version). Technical Report CSRG-410, Department of Computer Science, University of Toronto, 2000.
[19] J. Edmonds, J. Gryz, D. Liang, and R.J. Miller. Mining for empty rectangles in large data sets. In Proceedings of the 8th ICDT. pages 174-188, London. UK, 2001.
[20] T. Feder and R. Motwani. Clique partitions, graph compressions and speeding-up algorithins. In Procecdings of the ACM Symposium on Theory of Computing, pages 123-133, 1991.
[21] M.R. Garey and D.S. Johnson. Computers and Intractability. W.H. Freeman and Company, New York, 1979.
[22] P. Godfrey, J. Grant, J. Gryz, and J. Minker. Integrity constraints: Semantics and applications. In G. Saake and J. Chomicki, editors, Logics for Databases and Information Systems, pages 265-307. Kluwer Academic, 1998.
[23] P. Godfrey and J. Gryz. A framework for intensional query optimization. In Proccedings of DDLP'96, pages 57-68, Bonn, Germany, September 1996.
[24] P. Godfrey and J. Gryz. Overview of dynamic query evaluation in intensional query optimization. In Proceedings of 5th DOOD, pages 425 426, Montreux, Switzerland, December 1997.
[25] P. Godfrey and J. Gryz. Answering queries by semantic caches. In Proceedigs of 10th DEXA. Florence, Italy, August 1999.
[26] P. Godfrey, J. Gryz, and J. Minker. Semantic query optimization for bottom-up evaluation. In Z. Ras and M. Michalewicz, editors, Proc. of the 9th ISMIS, pages 561-571, Zakopane, Poland, June 1996.
[27] P. Godfrey, J. Gryz, and C. Zuzarte. Exploiting constraint-like data characterizations in query optimization. In Proceedings of Sigmod, pages 582-592, Santa Barbara, CA, 2001.
[28] Parke Godfrey. An Architecture and Implementation for a Cooperative Database System. PhD thesis, University of Maryland at College Park, College Park, Maryland 20742, 1997.
[29] Parke Godfrey and Jarek Gryz. Partial evaluation of views. Journal of Intelligent Information Systems (JIIS), 16(1):21-40. Jan/Feb 2001.
[30] William H. Greene. Econometric Analysis. Prentice Hall, 1999.
[31] J. Gryz, L. Liu, and X. Qian. Semantic query optimization in DB2: Initial results. Technical Report CS-1999-01, Department of Computer Science, York University, Toronto, Canada, 1999.
[32] J. Gryz, B. Schiefer, J. Zheng, and C. Zuzarte. Discovery and application of check constraints in DB2. In Proceedings of ICDE, pages 551-556, Heidelberg, Germany, 2001.
[33] A. Gupta and I.S. Mumick. Maintenance of materialized views: Problems, techniques, and applications. Data Engineering Bulletin. 18(2):3-18. 1995.
[34] L.M. Haas et al. Starburst Mid-Flight: As the Dust Clears. IEEE TKDE, pages 143-160, March 1990.
[35] P. Haas, J. Naughton, S. Seshadri, and L. Stokes. Sampling-based estimation of the number of distinct values of an attribute. In Proceedings of VLDB, pages 311-322, Zurich, Switzerland, 1995.
[36] Alon Y. Halevy. Answering queries using views: A survey. VLDB J., 10(4):270-294, 2001.
[37] M.T. Hammer and S.B. Zdonik. Knowledge-based query processing. Proc. 6th VLDB, pages 137-147, October 1980.
[38] J. Han, Y. Cai, and N. Cercone. Knowledge discovery in databases: An attribute-oriented approach. In Proceedings of the 18th VLDB,. pages 547-559, Vancouver, Canada, 1992.
[39] C.N. Hsu and C.A. Knoblock. Using inductive learning to generate rules for semantic query optimization. In Advances in Knowledge Discovery and Data Mining, pages 425-445. AAAI/MIT Press, 1996.
[40] Y. Huhtala, J. Karkkainen, P. Porkka, and H. Toivonen. Efficient discovery of functional and approximate dependencies using partitions. In Proceedings of 14th CDE, pages 392-401, Orlando, FL, February 1998.
[41] Y.E. Ioannidis. Universality of serial histograms. In Proceedings of 19th VLDB, pages 256-267, Dublin, Ireland, 1993.
[42] Y.E. Ioannidis and S. Christodoulakis. Optimal histograms for limiting worst-case error propagation in the size of the join results. TODS, 18(4):709-748, 1993.
[43] Yannis E. Ioannidis and Viswanath Poosala. Balancing histogram optimality and practicality for query result size estimation. In Proceedings of the SIGMOD, San Jose, California, pages 233-244, 1995.
[44] M. Jarke, J. Clifford, and Y. Vassiliou. An optimizing PROLOG front-end to a relational query system. In SIGMOD. pages 296-306, 1984.
[45] J.J. King. Quist: A system for semantic query optimination in relational databases. In Proc. 7th VLDB, pages 510-517, Cannes, France, September 1981.
[46] L.V.S. Lakshmanan and H.J. Hernandez. Structural query optimization - a uniform framework for semantic query optimization in deductive databases. In Proc. PODS, pages 102-114, 1991.
[47] L.V.S. Lakshmanan and R. Missaoui. Pushing semantics inside recursion: A general framework for semantic optimization of recursive queries. In Proc. ICDE, pages 211 220, 1995.
[48] Ju-Hong Lee, Deok-Hwan Kim, and Chin-Wan Chung. Multi-dimensional selectivity estimation using compressed histogram information. In Proceedings SIGMOD, Philadelphia, Pennsylvania, USA, pages 205-214, 1999.
[49] S. Lee, L.J.Henschen, and G.Z. Qadah. Semantic query reformulation in deductive databases. In Proc. ICDE, pages 232-239, 1991.
[50] A.Y. Levy. A.O. Mendelzon, Y. Sagiv, and D. Srivastava. Answering queries using views. In Proceedings of the 14th PODS, pages 95-104, San Jose, California, 1995. ACM Press.
[51] A.Y. Levy, I. Mumick, and Y. Sagiv. Query optimization by predicate move-around. In Proc. of VLDB, pages 96-108, 1994.
[52] R. Lipton, J. Naughton, and D. Schneider. Practical selectivity estimation through adaptive sampling. In Proc. of Sigmod., pages 40 46, 1990.
[53] B. Liu, K. Wang, L.-F. Mun, and X.-Z. Qi. Using Decision Tree Induction for Discovering Holes in Data. In 5th Pacific Rim International Conference on Artificial Intelligence, pages 182-193, 1998.
[54] Bing Liu, Liang-Ping Ku, and Wynne Hsu. Discovering interesting holes in data. In Proceedings of IJCAI, pages 930-935, Nagoya, Japan, 1997. Morgan Kaufmann.
[55] S. Lopes. J.M. Petit, and L. Lakhal. Efficient discovery of functional dependencies and armstrong relations. In Proceedings of the 6th International Conference on Extending Database Technology, pages 350-364, Konstanz, Germany, 2000. Springer.
[56] H. Mannila and K-J Raiha. Algorithms for inferring functional dependencies from relations. Data and Knowledge Engineering, 12(l):83-89, 1994.
[57] M. Mannino, P. Chu, and T. Sager. Statistical profile estimation in database. ACM Computing Surveys, 20(3):191-221, 1988.
[58] Yossi Matias, Jeffrey Scott Vitter, and Min Wang. Wavelet-based histograms for selectivity estimation. In Proceedings SIGMOD, Seattle, Washington, USA, pages 448-459, 1998.
[59] I. Mumick and H. Pirahesh. Implementation of magic sets in Starburst. In Proc. SIGMOD, 1994.
[60] M. Muralikrishna and David J. DeWitt. Equi-depth histograms for estimating selectivity factors for multi-dimensional queries. In Proceedings of SIGMOD, Chicago, Illinois, pages 28 36, 1988.
[61] A. Namaad, W.L. Hsu, and D.T. Lee. On the maximum empty rectangle problem. Applied Discrete Mathematics, (8):267-277, 1984.
[62] N. Novelli and R. Cicchetti. Fun: An efficient algorithm for mining functional and embedded dependencies. In Proceedings of the 8th ICDT, pages 189-203, London, UK, 2001.
[63] SQL Reference Manual, Oracle 8i, Realease 8.1.5. 500 Oracle Parkway, Redwood City, CA 94065, 1999.
[64] M. Orlowski. A New Algorithm for the Largest Empty Rectangle Problem. Algorithmica, 5(1):65-73, 1990.
[65] G.N. Paulley and Per-Ake Larson. Exploiting uniqueness in query optimization. In Proceedings of the 10th ICDE, pages 68-79, 1994.
[66] H. Pirahesh, J.M. Hellerstein, and W. Hasan. Extensible/rule based query rewrite optimization in Starburst. In Proc. SIGMOD, pages 39-48, 1992.
[67] H. Pirahesh, T.Y.C. Leung, and W. Hasan. A rule engine for query transformation in Starburst and IBM DB2 C/S DBMS. In Proc. ICDE, pages 391-400, 1997.
[68] V. Poosala, Y. Ioannidis, P. Haas, and E. Shekita. Improved histograms for selectivity estimation of range predicates. In Proceedings of SIGMOD, pages 294-305, Montreal, Canada, 1996.
[69] Viswanath Poosala and Yannis E. Ioannidis. Selectivity estimation without the attribute value independence assumption. In VLDB'97, pages 480-495, 1997.
[70] I. Savnik and P. Flach. Bottom-up induction of functional dependencies from relations. In G. Piatetsky-Shapiro, editor, Knowledge Discovery in Databases, pages 284 290. Morgan Kaufman Pub., 1993.
[71] P.G. Selinger, M.M. Astrahan, D.D. Chainberlin, R.A. Lorie and T.G. Price. Access path election in a relational database management system. Proc. ACM-SIGMOD International Conference on Management of Data, pages 23-34, May 1979.
[72] T. Sellis and S. Ghosh. On the multiple-query optimization problem. TKDE, 2(2):262-266, June 1990.
[73] S. Seshadri et al. Cost-based optimization for magic: Algebra and implementation. In SIGMOD, pages 435 446, 1996.
[71] S. Shekar, B. Hamidzadeh, A. Kohli, and M. Coyle. Learning transformation rules for semantic query optimization. TKDE, 5(6):950-964, December 1993.
[75] S. Shekar, J. Srivastava, and S. Dutta. A formal model of trade-off between optimization and execution costs in semantic query optimization. In Proc. 14th VLDB, pages 457-467, Los Angeles, CA, 1988.
[70] S.T. Shenoy and Z.M. Ozsoyoglu. Design and implementation of a semantic query optimizer. IEEE Transactions on Knowledge and Data Engineering, 1(3):344 361, September 1989.
[77] M.D. Siegel. Automatic rule derivation for semantic query optimization. In Proceedings of the 2nd International Conference on Expert Database Systems, pages 371-386, Vienna, Virginia, 1988.
[78] D. Simmen, E. Shekita, and T. Malkems. Fundamental techniques for order optimization. In Proceedings of SIGMOD, pages 57-67, 1996.
[79] D. Srivastava, S. Dar, H.Y. Jagadish, and A. Levy. Answering queries with aggregation using views. In Proceedings of the 22nd VLDB, pages 318-329, Bombay, India, 1996.
[80] Nitin Thaper, Sudipto Guha, Piotr Indyk, and Nick Koudas. Dynamic multidimensional histograms. In Proceedings of SIGMOD, pages 428-439, 2002.
[81] B. Thuraisingham and W. Ford. Security constraint processing in a multilevel secure distributed database management system. TKDE, 7(2):274-293, April 1995.
[82] Transaction Processing Performance Council, 777 No. First Street, Suite 600, San Jose, CA 95112-6311, www.tpc.org TPC Benchmark D, 1.3.1 edition, February 1998.
[83] J. D. Ullman. Principles of Database and Knowledge-Base Systems. Principles of Computer Science Series. Computer Science Press, Rockville, Maryland 20850, 1988.
[84] Clement T. Yu and Wei Sun. Automatic knowledge acquisition and maintenance for semantic query optimization. IEEE Transactions on Knowledge and Data Engineering, 1(3):362-375, September 1989.
[85] D. Zilio, C. Zuzarte, S. Lightstone, W. Ma, G. Lohman, R. Cochrane, H. Pirahesh, L. Colby, J. Gryz, E. Alton, D. Liang, and G. Valentin. Recommending materialized views and indexes with ibm db2 design advisor. In Proceedings of 1st International Conference on Autonomic Computing, pages 180-188, New York, 2004.

Typ dokumentu

Bibliografia

Identyfikator YADDA

bwmeta1.element.baztech-article-PWA9-0029-0020