Article title

Similarity-Based Classification in Relational Databases

Publication languages
EN
Abstracts
EN
In this paper, we introduce a method for measuring the similarity of objects in a relational database (relational objects, for short). We also propose and investigate an algorithm, SC, for the classification of relational objects. Classification is based on the similarity of objects to predefined classes: an object to be classified is assigned to the class to which it is most similar. The similarity of an object to a class is understood as its similarity to a class representative. Several methods for computing the class representative are proposed. We test the algorithm on real and artificial databases and compare its results with those obtained by other algorithms known from the literature. We also present our approach in the context of granular computing.
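The classification rule described in the abstract (assign an object to the class whose representative it is most similar to) can be illustrated with a minimal sketch. This is not the SC algorithm itself: the abstract does not give the similarity measure or the representative-selection methods, so everything below is an assumption made for illustration. Objects are flattened to sets of (attribute, value) pairs, similarity is the Jaccard index, and the representative is a medoid; the names `jaccard`, `medoid`, `fit_representatives`, and `classify` are hypothetical.

```python
from collections import defaultdict


def jaccard(a: frozenset, b: frozenset) -> float:
    """Stand-in similarity (an assumption, not the paper's measure):
    overlap of two sets of (attribute, value) pairs."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def medoid(members: list) -> frozenset:
    """One possible class representative (assumed here): the member with
    the highest total similarity to all members of its class."""
    return max(members, key=lambda m: sum(jaccard(m, o) for o in members))


def fit_representatives(labelled: list) -> dict:
    """Group labelled objects by class and pick one representative per class."""
    by_class = defaultdict(list)
    for obj, label in labelled:
        by_class[label].append(obj)
    return {label: medoid(objs) for label, objs in by_class.items()}


def classify(obj: frozenset, representatives: dict) -> str:
    """Assign the object to the class whose representative is most similar."""
    return max(representatives, key=lambda c: jaccard(obj, representatives[c]))


# Toy data, invented purely for illustration.
train = [
    (frozenset({("color", "red"), ("shape", "round")}), "apple"),
    (frozenset({("color", "red"), ("shape", "round"), ("size", "small")}), "apple"),
    (frozenset({("color", "yellow"), ("shape", "long")}), "banana"),
    (frozenset({("color", "yellow"), ("shape", "long"), ("size", "small")}), "banana"),
]
reps = fit_representatives(train)
query = frozenset({("color", "red"), ("shape", "round"), ("size", "big")})
print(classify(query, reps))
```

Keeping a single representative per class means classifying an object costs one similarity computation per class rather than one per training object, which is the main practical appeal of representative-based schemes over nearest-neighbour search.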
Pages
187--213
Physical description
Bibliography: 54 items, tables, charts.
Contributors
author
  • Department of Computer Science, Białystok University of Technology, Wiejska 45A, 15-351 Białystok, Poland, p.honko@pb.edu.pl
Bibliography
  • [1] Agrawal, R., Srikant, R.: Fast algorithms for mining association rules, Proc. 20th International Conference on Very Large Data Bases (VLDB '94) (J. B. Bocca, M. Jarke, C. Zaniolo, Eds.), Morgan Kaufmann, San Francisco, 1994.
  • [2] Aha, D., Kibler, D., Albert, M.: Instance-based learning algorithms, Machine Learning, 6, 1991, 37-66.
  • [3] Alfred, R., Kazakov, D.: Discretization numbers for multiple-instances problem in relational database, Proc. 11th East-European Conference on Advances in Databases and Information Systems (ADBIS 2007), Lecture Notes in Artificial Intelligence 4690, Springer, 2007.
  • [4] Blockeel, H., Bruynooghe, M., Džeroski, S., Ramon, J., Struyf, J.: Hierarchical multi-classification, Proc. the ACM SIGKDD 2002 workshop on multi-relational data mining (MRDM 2002) (S. Dzeroski, L. D. Raedt, S. Wrobel, Eds.), University of Alberta, Edmonton, 2002.
  • [5] Blockeel, H., De Raedt, L.: Top-down induction of first-order logical decision trees, Artificial Intelligence, 101(1-2), 1998, 285-297.
  • [6] Bohnebeck, U., Horvath, T., Wrobel, S.: Term comparisons in first-order similarity measures, ILP '98: Proc. the 8th International Workshop on Inductive Logic Programming (D. Page, Ed.), Lecture Notes in Artificial Intelligence 1446, Springer-Verlag, Berlin, 1998.
  • [7] Carmagnac, F., Héroux, P., Trupin, E.: Distance based strategy for supervised document image classification, International Workshops on Statistical Pattern Recognition: SPR 2004, Lecture Notes in Computer Science 3138, Springer-Verlag, Berlin, 2004.
  • [8] Connolly, T., Begg, C.: Database Systems: A Practical Approach to Design, Implementation, and Management, Fourth edition, Addison-Wesley, 2005.
  • [9] Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, 2000.
  • [10] Database Document Understanding: http://archive.ics.uci.edu/ml/datasets/Document+Understanding.
  • [11] Database Family: ftp://ftp.cs.utexas.edu/pub/mooney/forte.
  • [12] Database Mutagenesis: http://www.doc.ic.ac.uk/_shm/mutagenesis.html.
  • [13] Debnath, A. K., Lopez de Compadre, R. L., Debnath, G., Shusterman, A. J., Hansch, C.: Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity, Journal of Medicinal Chemistry, 34(2), 1991, 786-797.
  • [14] Demir, B., Ertürk, S.: Phase correlation based supervised classification of hyperspectral images using multiple class representatives, IEEE International Conference on Geoscience and Remote Sensing Symposium, 2007.
  • [15] Džeroski, S.: Relational data mining applications: An overview, in: [17], 339-364.
  • [16] Džeroski, S.: Multi-relational data mining: An introduction, SIGKDD Explorations Newsletter, 5(1), 2003, 1-16.
  • [17] Džeroski, S., Lavrač, N., Eds.: Relational Data Mining, Springer, Berlin, 2001.
  • [18] Edelstein, H.: Introduction to Data Mining and Knowledge Discovery, Third edition, Two Crows Corporation, Potomac, 1999.
  • [19] Egghe, L., Michel, C.: Strong similarity measures for ordered sets of documents in information retrieval, Information Processing and Management, 38(6), 2002, 823-848.
  • [20] Emde, W., Wettschereck, D.: Relational instance-based learning, Proc. the 13th International Conference on Machine Learning (L. Saitta, Ed.), Morgan Kaufmann, San Francisco, 1996.
  • [21] Esposito, F., Malerba, D., Semeraro, G., Pazzani, M.: A machine learning approach to document understanding, Proc. the 2nd International Workshop on Multistrategy Learning (R. S. Michalski, G. Tecuci, Eds.), 1993.
  • [22] Estruch, V., Ferri, C., Hernandez-Orallo, J., Ramírez-Quintana, M.: Similarity functions for structured data. An application to decision trees, Inteligencia Artificial, Revista Iberoamericana de IA, 10(29), 2006, 109-121.
  • [23] Fayyad, U., Piatetsky-Shapiro, G., Smyth, P.: From data mining to knowledge discovery in databases, AI Magazine, 17(3), 1996, 37-54.
  • [24] Gärtner, T., Lloyd, J., Flach, P.: Kernels and distances for structured data, Machine Learning, 57(3), 2004, 205-232.
  • [25] Getoor, L., Friedman, N., Koller, D., Pfeffer, A.: Learning probabilistic relational models, in: [17], 307-335.
  • [26] Hońko, P.: Classification of complex structured objects on the base of similarity degrees, RSEISP '07: Proc. the International Conference on Rough Sets and Intelligent Systems Paradigms (M. Kryszkiewicz, J. F. Peters, H. Rybinski, A. Skowron, Eds.), Lecture Notes in Computer Science 4585, Springer-Verlag, Berlin-Heidelberg, 2007.
  • [27] Hońko, P.: Description and classification of complex structured objects by applying similarity measures, International Journal of Approximate Reasoning, 49(3), 2008, 539-554.
  • [28] Japkowicz, N.: Supervised learning with unsupervised output separation, Proc. the IASTED International Conference on Artificial Intelligence and Soft Computing (ASC 2002) (H. Leung, Ed.), ACTA Press, Anaheim, Calgary, Zurich, 2002.
  • [29] Kirsten, M., Wrobel, S., Horvath, T.: Distance based approaches to relational learning and clustering, in: [17], 213-230.
  • [30] Knobbe, A. J.: The Safarii multi-relational data mining environment, Proc. the 19th Belgian-Dutch Conference on Artificial Intelligence (BNAIC 2007) (M. M. Dastani, E. D. de Jong, Eds.), 2007.
  • [31] Knobbe, A. J., Ho, E. K. Y.: Numbers in multi-relational data mining, Proc. 9th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2005), Lecture Notes in Computer Science 3721, Springer, 2005.
  • [32] Kramer, S., Widmer, G.: Inducing classification and regression trees in first order logic, in: [17], 140-159.
  • [33] Kriegel, H.-P., Schubert, M.: Classification of websites as sets of feature vectors, Proc. International Conference on Databases and Applications (DBA 2004) (M. H. Hamza, Ed.), IASTED/ACTA Press, 2004.
  • [34] Lavrač, N., Džeroski, S.: Inductive Logic Programming: Techniques and Applications, Ellis Horwood, New York, 1994.
  • [35] Lin, T. Y.: Introduction to special issues on data mining and granular computing, International Journal of Approximate Reasoning, 40(1-2), 2005, 1-2.
  • [36] Lin, T. Y., Zadeh, L. A.: Special issue on granular computing and data mining, International Journal of Intelligent Systems, 19(7), 2004, 565-566.
  • [37] Maimon, O., Rokach, L. E.: The Data Mining and Knowledge Discovery Handbook, 2005.
  • [38] Muggleton, S.: Inverse entailment and Progol, New Generation Computing, 13(3-4), 1995, 245-286.
  • [39] Neri, F.: Evolutive modeling of TCP/IP network traffic for intrusion detection, Lecture Notes in Computer Science, 1803, 2000, 214-223.
  • [40] Page, D., Craven, M.: Biological applications of multi-relational data mining, SIGKDD Explorations Newsletter, 5(1), 2003, 69-79.
  • [41] Domingos, P.: Prospects and challenges for multi-relational data mining, SIGKDD Explorations Newsletter, 5(1), 2003, 80-83.
  • [42] Pedrycz, W., Skowron, A., Kreinovich, V. E.: Handbook of Granular Computing, Wiley & Sons, New York, 2008.
  • [43] Perlich, C., Provost, F.: ACORA: Distribution-based aggregation for relational learning from identifier attributes, Technical report ceder working paper ceder-04-04, Stern School of Business, New York University, 2004.
  • [44] Quinlan, J. R., Cameron-Jones, R. M.: FOIL: A midterm report, Proc. the 6th European Conference on Machine Learning (P. Brazdil, Ed.), Lecture Notes in Artificial Intelligence 667, Springer-Verlag, Berlin-Heidelberg, 1993.
  • [45] Ramon, J., Bruynooghe, M.: A polynomial time computable metric between point sets, Acta Informatica, 37(10), 2001, 765-780.
  • [46] Richards, B. L., Mooney, R. J.: Automated refinement of first-order Horn-clause domain theories, Machine Learning, 19(2), 1995, 95-131.
  • [47] Stepaniuk, J.: Rough-Granular Computing in Knowledge Discovery and Data Mining, Studies in Computational Intelligence 152, Springer-Verlag, Berlin-Heidelberg, 2008.
  • [48] Stepaniuk, J., Hońko, P.: Learning first-order rules: A rough set approach, Fundamenta Informaticae, 61(2), 2004, 139-157.
  • [49] Van Assche, A., Vens, C., Blockeel, H., Džeroski, S.: First order random forests: Learning relational classifiers with complex aggregates, Machine Learning, 64(1-3), 2006, 149-182.
  • [50] Woźnica, A., Kalousis, A., Hilario, M.: Distance-based learning over extended relational algebra structures (as late breaking papers), Proc. the 15th International Conference on Inductive Logic Programming (ILP 2005), 2005.
  • [51] Wróblewski, J.: Analyzing relational databases using rough set based methods, Proc. of 8th Information Processing and Management of Uncertainty in Knowledge-Based Systems Conference (IPMU 2000), 1, 2000.
  • [52] Yao, J. T.: Information granulation and granular relationships, Proc. the IEEE Conference on Granular Computing (X. Hu, Q. Liu, A. Skowron, T. Y. Lin, R. R. Yager, B. Zhang, Eds.), IEEE, 2005.
  • [53] Yao, Y. Y.: Granular computing: Basic issues and possible solutions, Proc. the 5th Joint Conference on Information Sciences, 2000.
  • [54] Zadeh, L. A.: Towards a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic, Fuzzy Sets and Systems, 90(2), 1997, 111-127.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-article-BUS8-0010-0068