Article title

Privacy Preserving Database Generation for Database Application Testing

Authors
Publication languages
EN
Abstracts
EN
Testing of database applications is of great importance. Although various studies have investigated testing techniques for database design, relatively few efforts have explicitly addressed the testing of database applications, which requires a large amount of representative data. Because testing on live production databases is often infeasible due to the high risk of disclosing confidential information or incorrectly updating real data, in this paper we investigate the problem of generating synthetic databases based on a priori knowledge about production databases. Our approach fits the general location model using various characteristics (e.g., constraints, statistics, rules) extracted from a production database and then generates synthetic data from the learned model. The generated data is valid and similar to the real data in statistical distribution; hence it can be used for both functional and performance testing. Because the extracted characteristics may contain information that attackers could use to derive confidential information about individuals, we present a disclosure analysis method that applies cell suppression for identity disclosure analysis and perturbation for value disclosure analysis.
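The general location model combines a categorical part and a continuous part. The categorical attributes define the cells of a contingency table, each cell having a probability. Within each cell, the numeric attributes are multivariate normal with a cell-specific mean and a covariance matrix shared by all cells. Its parameters are therefore aggregates (cell proportions, per-cell means, a pooled covariance), which helps explain why statistics extracted from a production database can be enough to fit it. What follows is a minimal Python sketch under simplifying assumptions. It estimates the parameters by maximum likelihood from a handful of complete records rather than from extracted characteristics, and it ignores constraints, rules, and disclosure control. The function names `fit_glom`, `cholesky`, and `generate`, the record format, and the toy data are hypothetical illustrations, not the authors' implementation.

```python
import math
import random
from collections import Counter, defaultdict


def fit_glom(records):
    """Maximum-likelihood fit of a general location model.

    records: list of (cell, z) pairs, where cell is a tuple of categorical
    values and z is a list of numeric values (hypothetical input format).
    """
    n = len(records)
    dim = len(records[0][1])
    counts = Counter(cell for cell, _ in records)
    # Cell probabilities: relative frequency of each categorical combination.
    probs = {cell: c / n for cell, c in counts.items()}
    # Per-cell mean vectors of the numeric attributes.
    sums = defaultdict(lambda: [0.0] * dim)
    for cell, z in records:
        for i in range(dim):
            sums[cell][i] += z[i]
    means = {cell: [s / counts[cell] for s in vec] for cell, vec in sums.items()}
    # Covariance shared by all cells: pooled within-cell scatter divided by n.
    scatter = [[0.0] * dim for _ in range(dim)]
    for cell, z in records:
        d = [z[i] - means[cell][i] for i in range(dim)]
        for i in range(dim):
            for j in range(dim):
                scatter[i][j] += d[i] * d[j]
    cov = [[v / n for v in row] for row in scatter]
    return probs, means, cov


def cholesky(a):
    """Lower-triangular L with L L^T == a; assumes a is positive definite."""
    dim = len(a)
    low = [[0.0] * dim for _ in range(dim)]
    for i in range(dim):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def generate(probs, means, cov, k, rng):
    """Draw k records: pick a cell by its probability, then z ~ N(mean_cell, cov)."""
    cells = list(probs)
    low = cholesky(cov)
    dim = len(cov)
    synthetic = []
    for cell in rng.choices(cells, weights=[probs[c] for c in cells], k=k):
        e = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Correlated normal vector: mean plus L times independent standard normals.
        z = [means[cell][i] + sum(low[i][j] * e[j] for j in range(i + 1))
             for i in range(dim)]
        synthetic.append((cell, z))
    return synthetic


# Toy records: (gender, smoker) cell plus (age, salary); the values are made up.
records = [
    (("F", "Y"), [30.0, 50000.0]),
    (("F", "Y"), [40.0, 60000.0]),
    (("M", "N"), [25.0, 40000.0]),
    (("M", "N"), [35.0, 44000.0]),
]
probs, means, cov = fit_glom(records)
print(probs)  # {('F', 'Y'): 0.5, ('M', 'N'): 0.5}
print(cov)    # [[25.0, 17500.0], [17500.0, 14500000.0]]
synthetic = generate(probs, means, cov, 5, random.Random(42))
```

Sampling a cell first and then drawing the numeric attributes from that cell's normal distribution reproduces, up to sampling error, the joint frequencies of the categorical attributes together with the per-cell means and shared covariance of the numeric ones.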
Publisher
Year
Pages
595-612
Physical description
Bibliography: 25 items, tables
Creators
author
author
author
author
  • Department of Computer Science, University of North Carolina at Charlotte, 9201 University City Blvd., Charlotte, NC 28223, USA, xwu@uncc.edu
References
  • [1] Adam, N. R., Wortman, J. C.: Security-control methods for statistical databases, ACM Computing Surveys, 21(4), Dec 1989, 515-556.
  • [2] Agrawal, D., Aggarwal, C. C.: On the design and quantification of privacy preserving data mining algorithms, Proceedings of the 20th Symposium on Principles of Database Systems, Santa Barbara, California, May 2001.
  • [3] Agrawal, R., Srikant, R.: Privacy-preserving data mining, Proceedings of the ACM SIGMOD International Conference on Management of Data, Dallas, Texas, May 2000.
  • [4] Chays, D., Dan, S., Frankl, P., Vokolos, F., Weyuker, E.: A framework for testing database applications, Proceedings of the ISSTA, Portland, Oregon, 2000.
  • [5] Chays, D., Deng, Y., Frankl, P., Dan, S., Vokolos, F., Weyuker, E.: AGENDA: a test generator for relational database applications, Technical report, Polytechnic University, 2002.
  • [6] Dobra, A., Fienberg, S. E.: Bounds for cell entries in contingency tables given marginal totals and decomposable graphs, PNAS, 97(22), 2000, 11885-11892.
  • [7] Dobra, A., Fienberg, S. E.: Bounds for cell entries in contingency tables induced by fixed marginal totals with applications to disclosure limitation, Statistical Journal of the United Nations ECE, 18, 2001, 363-371.
  • [8] Domingo-Ferrer, J.: Current directions in statistical data protection, Proceedings of the Statistical Data Protection, 1998.
  • [9] Fagan, J.: Cell suppression problem formulations - exact solution and heuristics, August 2001.
  • [10] Gray, J., Sundaresan, P., Englert, S., Baclawski, K., Weinberger, P. J.: Quickly generating billion-record synthetic databases, Proceedings of the ACM SIGMOD International Conference on Management of Data, June 1994.
  • [11] Grötschel, M., Lovász, L., Schrijver, A.: Geometric algorithms and combinatorial optimization, Springer, New York, 1988.
  • [12] Henrion, D., Tarbouriech, S., Arzelier, D.: LMI approximations for the radius of the intersection of ellipsoids: a survey, Journal of Optimization Theory and Applications, 108(1), 2001, 1-28.
  • [13] Johnson, R., Wichern, D.: Applied Multivariate Statistical Analysis, Prentice Hall, 1998.
  • [14] Leutenegger, S., Dias, D.: A modeling study of the TPC-C Benchmark, Proceedings of the ACM SIGMOD Conference on Management of Data, Washington, D.C., May 1993.
  • [15] Niagara: http://www.cs.wisc.edu/niagara/datagendownload.html.
  • [16] Poess, M., Stephens, J.: Generating thousand benchmark queries in seconds, Proceedings of the 30th VLDB Conference, 2004.
  • [17] Quest: http://www.quest.com/datafactory.
  • [18] Rizvi, S., Haritsa, J.: Privacy preserving association rule mining, Proceedings of the 28th International Conference on Very Large Data Bases, August 2002.
  • [19] Samarati, P.: Protecting respondents' identities in microdata release, IEEE Transactions on Knowledge and Data Engineering, 13(6), 2001, 1010-1027.
  • [20] Schafer, J.: Analysis of Incomplete Multivariate Data, Chapman & Hall, 1997.
  • [21] Stephens, J., Poess, M.: MUDD: A multi-dimensional data generator, Proceedings of the 4th International Workshop on Software and Performance, 2004.
  • [22] Wu, X., Sanghvi, C., Wang, Y., Zheng, Y.: Privacy aware data generation for testing database applications, Proc. of the 9th International Database Engineering and Application Symposium, July 2005.
  • [23] Wu, X., Wang, Y., Zheng, Y.: Privacy preserving database application testing, Proceedings of the ACM Workshop on Privacy in Electronic Society, 2003.
  • [24] Wu, X., Wang, Y., Zheng, Y.: Statistical database modeling for privacy preserving database generation, Proc. of the 15th International Symposium on Methodologies for Intelligent Systems, May 2005.
  • [25] Zheng, Z., Kohavi, R., Mason, L.: Real world performance of association rule algorithms, Proceedings of the 7th ACM-SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, August 2001.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-article-BUS5-0010-0046