Privacy Preserving Database Generation for Database Application Testing

Wu, X.; Wang, Y.; Guo, S.; Zheng, Y.

Selected full texts from this journal

https://fi.episciences.org/

Abstract

Testing of database applications is of great importance. Although various studies have investigated testing techniques for database design, relatively few have explicitly addressed the testing of database applications, which requires a large amount of representative data. Because testing over live production databases is often infeasible due to the high risk of disclosing confidential information or incorrectly updating real data, in this paper we investigate the problem of generating synthetic databases based on a priori knowledge about production databases. Our approach is to fit the general location model using various characteristics (e.g., constraints, statistics, rules) extracted from a production database and then generate synthetic data using the learned model. The generated data is valid and similar to the real data in terms of statistical distribution; hence it can be used for functional and performance testing. Because the extracted characteristics may contain information that attackers could use to derive confidential information about individuals, we present our disclosure analysis method, which applies cell suppression for identity disclosure analysis and perturbation for value disclosure analysis.
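The fit-then-generate idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the textbook form of the general location model restricted to a single continuous attribute: categorical attributes define the cells of a contingency table with multinomial probabilities, and the continuous attribute is normal within each cell, with a cell-specific mean and one variance shared by all cells. The function names, the `dept` and `salary` attributes, and the sample rows are hypothetical, and the disclosure analysis step (cell suppression, perturbation) is not covered.

```python
import random
from collections import defaultdict


def fit_location_model(rows, cat_keys, num_key):
    """Estimate cell probabilities, cell means and a pooled variance.

    Assumption: one continuous attribute; a full general location model
    would use a mean vector per cell and a shared covariance matrix.
    """
    cells = defaultdict(list)
    for row in rows:
        cells[tuple(row[k] for k in cat_keys)].append(row[num_key])
    n = len(rows)
    probs = {c: len(v) / n for c, v in cells.items()}
    means = {c: sum(v) / len(v) for c, v in cells.items()}
    # Pooled within-cell sum of squares, shared by all cells.
    ss = sum((x - means[c]) ** 2 for c, v in cells.items() for x in v)
    dof = n - len(cells)
    variance = ss / dof if dof > 0 else 0.0
    return probs, means, variance


def generate_rows(model, cat_keys, num_key, size, seed=0):
    """Draw synthetic rows: pick a cell, then a normal value within it."""
    probs, means, variance = model
    rng = random.Random(seed)
    cells = list(probs)
    weights = [probs[c] for c in cells]
    sd = variance ** 0.5
    rows = []
    for cell in rng.choices(cells, weights=weights, k=size):
        row = dict(zip(cat_keys, cell))
        row[num_key] = rng.gauss(means[cell], sd)
        rows.append(row)
    return rows


# Hypothetical stand-in for statistics extracted from a production table.
production = [
    {"dept": "A", "salary": 10.0},
    {"dept": "A", "salary": 12.0},
    {"dept": "B", "salary": 20.0},
    {"dept": "B", "salary": 22.0},
    {"dept": "B", "salary": 24.0},
]
model = fit_location_model(production, ("dept",), "salary")
synthetic = generate_rows(model, ("dept",), "salary", size=100, seed=42)
```

Only the fitted summary (cell probabilities, means, variance) is needed to generate rows, which is why the abstract treats those extracted characteristics as the object of disclosure analysis.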

Keywords

data generation, disclosure analysis, statistical database modeling

Publisher

IOS Press

Journal

Fundamenta Informaticae

Year

2007

Volume

Vol. 78, no. 4

Pages

595-612

Physical description

Bibliography: 25 items, tables.

Creators

author

Wu X.

author

Wang Y.

author

Guo S.

author

Zheng Y.

Department of Computer Science, University of North Carolina at Charlotte, 9201 University City Blvd., Charlotte, NC 28223, USA, xwu@uncc.edu

Bibliography

[1] Adam, N. R., Wortman, J. C.: Security-control methods for statistical databases, ACM Computing Surveys, 21(4), Dec 1989, 515-556.
[2] Agrawal, D., Aggarwal, C.: On the design and quantification of privacy preserving data mining algorithms, Proceedings of the 20th Symposium on Principles of Database Systems, Santa Barbara, California, May 2001.
[3] Agrawal, R., Srikant, R.: Privacy-preserving data mining, Proceedings of the ACM SIGMOD International Conference on Management of Data, Dallas, Texas, May 2000.
[4] Chays, D., Dan, S., Frankl, P., Vokolos, F., Weyuker, E.: A framework for testing database applications, Proceedings of the ISSTA, Portland, Oregon, 2000.
[5] Chays, D., Deng, Y., Frankl, P., Dan, S., Vokolos, F., Weyuker, E.: AGENDA: a test generator for relational database applications, Technical report, Polytechnic University, 2002.
[6] Dobra, A., Fienberg, S. E.: Bounds for cell entries in contingency tables given marginal totals and decomposable graphs, PNAS, 97(22), 2000, 11885-11892.
[7] Dobra, A., Fienberg, S. E.: Bounds for cell entries in contingency tables induced by fixed marginal totals with applications to disclosure limitation, Statistical Journal of the United Nations ECE, 18, 2001, 363-371.
[8] Domingo-Ferrer, J.: Current directions in statistical data protection, Proceedings of the Statistical Data Protection, 1998.
[9] Fagan, J.: Cell suppression problem formulations - exact solution and heuristics, August 2001.
[10] Gray, J., Sundaresan, P., Englert, S., Baclawski, K., Weinberger, P. J.: Quickly generating billion-record synthetic databases, Proceedings of the ACM SIGMOD International Conference on Management of Data, June 1994.
[11] Grötschel, M., Lovász, L., Schrijver, A.: Geometric algorithms and combinatorial optimization, Springer, New York, 1988.
[12] Henrion, D., Tarbouriech, S., Arzelier, D.: LMI approximations for the radius of the intersection of ellipsoids: a survey, Journal of Optimization Theory and Applications, 108(1), 2001, 1-28.
[13] Johnson, R., Wichern, D.: Applied Multivariate Statistical Analysis, Prentice Hall, 1998.
[14] Leutenegger, S., Dias, D.: A modeling study of the TPC-C Benchmark, Proceedings of the ACM SIGMOD Conference on Management of Data, Washington, D.C., May 1993.
[15] Niagara: http://www.cs.wisc.edu/niagara/datagendownload.html.
[16] Poess, M., Stephens, J.: Generating thousand benchmark queries in seconds, Proceedings of the 30th VLDB Conference, 2004.
[17] Quest: http://www.quest.com/datafactory.
[18] Rizvi, S., Haritsa, J.: Privacy preserving association rule mining, Proceedings of the 28th International Conference on Very Large Data Bases, August 2002.
[19] Samarati, P.: Protecting respondents' identities in microdata release, IEEE Transactions on Knowledge and Data Engineering, 13(6), 2001, 1010-1027.
[20] Schafer, J.: Analysis of Incomplete Multivariate Data, Chapman & Hall, 1997.
[21] Stephens, J., Poess, M.: MUDD: A multi-dimensional data generator, Proceedings of the 4th International Workshop on Software and Performance, 2004.
[22] Wu, X., Sanghvi, C., Wang, Y., Zheng, Y.: Privacy aware data generation for testing database applications, Proc. of the 9th International Database Engineering and Application Symposium, July 2005.
[23] Wu, X., Wang, Y., Zheng, Y.: Privacy preserving database application testing, Proceedings of the ACM Workshop on Privacy in Electronic Society, 2003.
[24] Wu, X., Wang, Y., Zheng, Y.: Statistical database modeling for privacy preserving database generation, Proc. of the 15th International Symposium on Methodologies for Intelligent Systems, May 2005.
[25] Zheng, Z., Kohavi, R., Mason, L.: Real world performance of association rule algorithms, Proceedings of the 7th ACM-SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, August 2001.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BUS5-0010-0046