Frequent Subtree Mining : An Overview

Chi, Y.; Muntz, R. R.; Nijssen, S.; Kok, J.N.

Ten serwis zostanie wyłączony 2025-02-11.

Nowa wersja platformy, zawierająca wyłącznie zasoby pełnotekstowe, jest już dostępna.
Przejdź na https://bibliotekanauki.pl

Artykuł - szczegóły

Czasopismo

Fundamenta Informaticae

2005 | Vol. 66, nr 1,2 | 161--198

Tytuł artykułu

Frequent Subtree Mining : An Overview

Autorzy

Chi, Y. , Muntz, R. R. , Nijssen, S. , Kok, J.N.

Wybrane pełne teksty z tego czasopisma

https://fi.episciences.org/

Warianty tytułu

Języki publikacji

Abstrakty

Mining frequent subtrees from databases of labeled trees is a new research field that has many practical applications in areas such as computer networks, Web mining, bioinformatics, XML document mining, etc. These applications share a requirement for the more expressive power of labeled trees to capture the complex relations among data entities. Although frequent subtree mining is a more difficult task than frequent itemset mining, most existing frequent subtree mining algorithms borrow techniques from the relatively mature association rule mining area. This paper provides an overview of a broad range of tree mining algorithms. We focus on the common theoretical foundations of the current frequent subtree mining algorithms and their relationship with their counterparts in frequent itemset mining. When comparing the algorithms, we categorize them according to their problem definitions and the techniques employed for solving various subtasks of the subtree mining problem. In addition, we also present a thorough performance study for a representative family of algorithms.

Słowa kluczowe

frequent subtree mining canonical representation a priori enumeration tree subtree isomorphism

Wydawca

Czasopismo

Fundamenta Informaticae

Rocznik

2005

Tom

Vol. 66, nr 1,2

Strony

161--198

Opis fizyczny

Bibliogr. 50 poz., rys., tab.

Twórcy

autor

Chi, Y.

Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA, ychi@cs.ucla.edu

autor

Muntz, R. R.

Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA, muntz@cs.ucla.edu

autor

Nijssen, S.

Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands, snijssen@liacs.nl

autor

Kok, J.N.

Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands, joost@liacs.nl

Bibliografia

[1] Agrawal, R., Srikant, R.: Fast Algorithms for Mining Association Rules, Proc. of the 20th Intl. Conf. on Very Large Databases (VLDB’94), September 1994.
[2] Aho, A. V., Hopcroft, J. E., Ullman, J. E.: The Design and Analysis of Computer Algorithms, Addison-Wesley, 1974.
[3] Aldous, J. M., Wilson, R. J.: Graphs and Applications, An Introductory Approach, Springer, 2000.
[4] Asai, T., Abe, K., Kawasoe, S., Arimura, H., Satamoto, H., Arikawa, S.: Efficient Substructure Discovery from Large Semi-Structured Data, 2nd SIAM Int. Conf. on Data Mining, April 2002.
[5] Asai, T., Arimura, H., Uno, T., Nakano, S.: Discovering Frequent Substructures in Large Unordered Trees, The 6th International Conference on Discovery Science, October 2003.
[6] Beyer, T., Hedetniemi, S. M.: Constant Time Generation of Rooted Trees, SIAM J. Computing, 9(4), 1980, 706–711.
[7] Chalmers, R., Almeroth, K.: Modeling the branching characteristics and efficiency gains of global multicast trees, Proceedings of the IEEE INFOCOM’2001, April 2001.
[8] Chalmers, R., Almeroth, K.: On the topology of multicast trees, Technical Report, UCSB, March 2002.
[9] Chi, Y., Xia, Y., Yang, Y., Muntz, R. R.: Mining Closed and Maximal Frequent Subtrees from Databases of Labeled Rooted Trees, IEEE Transactions on Knowledge and Data Engineering, (To appear).
[10] Chi, Y., Yang, Y., Muntz, R. R.: Mining Frequent Rooted Trees and Free Trees Using Canonical Forms, Knowledge and Information Systems, (To appear).
[11] Chi, Y., Yang, Y., Muntz, R. R.: Indexing and Mining Free Trees, Proceedings of the 2003 IEEE International Conference on Data Mining (ICDM’03), November 2003.
[12] Chi, Y., Yang, Y., Muntz, R. R.: HybridTreeMiner: An Efficient Algorithm for Mining Frequent Rooted Trees and Free Trees Using Canonical Forms, The 16th International Conference on Scientific and Statistical Database Management (SSDBM’04), June 2004.
[13] Chi, Y., Yang, Y., Xia, Y., Muntz, R. R.: CMTreeMiner: Mining Both Closed and Maximal Frequent Subtrees, The Eighth Pacific Asia Conference on Knowledge Discovery and Data Mining (PAKDD’04),May 2004.
[14] Chung, M. J.: 0(n2,5)_ Time Algorithm for Subgraph Homeomorphism Problem on Trees, Journal of Algorithms, 8, 1987, 106–112.
[15] Cui, J., Kim, J., Maggiorini, D., Boussetta, K., Gerla, M.: Aggregated Multicast–A Comparative Study, Proceedings of IFIP Networking 2002,May 2002.
[16] Garey, M. R., Johnson, D. S.: Computers and Intractability–A Guide to the Theory of NP-Completeness, W. H. Freeman And Company, New York, 1979.
[17] Gusfield, D.: Algorithms on strings, trees, and sequences: computer science and computational biology, Cambridge University Press, 1997.
[18] Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation, 2000 ACM SIGMOD Intl. Conference on Management of Data, May 2000.
[19] Huan, J., Wang, W., Prins, J.: Efficient Mining of Frequent Subgraph in the Presence of Isomorphism, Proc. 2003 Int. Conf. on Data Mining (ICDM’03), 2003.
[20] Inokuchi, A., Washio, T., Motoda, H.: An Apriori-based Algorithm for Mining Frequent Substructures from Graph Data, Proc. of the 4th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD’00), September 2000.
[21] Inokuchi, A., Washio, T., Nishimura, K., Motoda, H.: A Fast Algorithm for Mining Frequent Connected Subgraphs, Technical Report, IBM Research, Tokyo Research Laboratory, 2002.
[22] Kilpelainen, P.: Tree Matching Problems with Applications to Structured Text Databases, Department of Computer Science, University of Helsinki, 1992.
[23] Kudo, T.: FREQT: An implementation of FREQT, 2003, http://chasen.org/ taku/software/freqt/.
[24] Kuramochi, M., Karypis, G.: Frequent Subgraph Discovery, Proceedings of the 2001 IEEE International Conference on Data Mining (ICDM’01), November 2001.
[25] Kuramochi,M., Karypis, G.: An Efficient Algorithm for Discovering Frequent Subgraphs, Technical Report, University of Minesota, 2002.
[26] Luccio, F., Enriquez, A. M., Rieumont, P. O., Pagli, L.: Exact Rooted Subtree Matching in Sublinear Time, Technical Report TR-01-14, Universita Di Pisa, 2001.
[27] Luccio, F., Enriquez, A. M., Rieumont, P. O., Pagli, L.: Bottom-up Subtree Isomorphism for Unordered Labeled Trees, Technical Report TR-04-13, Universita Di Pisa, 2004.
[28] Mannila, H., Toivonen, H., Inkeri Verkamo, A.: Discovery of Frequent Episodes in Event Sequences, Data Mining and Knowledge Discovery, 1(3), 1997.
[29] Matula, D.: Subtree isomorphism in, 0(n5/2) Ann. Discrete Math., 2, 1978, 91–106.
[30] Nakano, S., Uno, T.: Efficient Generation of Rooted Trees, Technical Report NII-2003-005e, National Institute of Informatics, 2003.
[31] Nakano, S., Uno, T.: A Simple Constant Time Enumeration Algorithm for Free Trees, PSJ SIGNotes ALgorithms, number 091–002, 2003.
[32] Nijssen, S., Kok, J. N.: Efficient Discovery of Frequent Unordered Trees, First International Workshop on Mining Graphs, Trees and Sequences, 2003.
[33] Nijssen, S., Kok, J. N.: A Quickstart in Frequent StructureMining Can Make a Difference, Proc. of the 2004 Int. Conf. Knowledge Discovery and Data Mining (SIGKDD’04), August 2004.
[34] Reyner, S. W.: An analysis of a good algorithm for the subtree problem, SIAM J. Computing, 6(4), 1977, 730–732.
[35] Rückert, U., Kramer, S.: Frequent Free Tree Discovery in Graph Data, Special Track on Data Mining, ACM Symposium on Applied Computing (SAC’04), 2004.
[36] Shamir, R., Tsur, D.: Faster Subtree Isomorphism, Journal of Algorithms, 33, 1999, 267–280.
[37] Shasha, D., Wang, J. T. L., Shan, H., Zhang, K.: ATreeGrep: Approximate Searching in Unordered Trees, 14th International Conference on Scientific and Statistical Database Management (SSDBM’02), July 2002.
[38] Shasha, D., Wang, J. T. L., Zhang, S.: Unordered Tree Mining with Applications to Phylogeny, 20th International Conference on Data Engineering, 2004.
[39] Srikant, R., Agrawal, R.: Mining Sequential Patterns: Generalizations and Performance Improvements, Proc. 5th Int. Conf. Extending Database Technology, EDBT’96, 1996.
[40] Termier, A., Rousset, M.-C., Sebag, M.: TreeFinder: a First Step towards XML Data Mining, Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM’02), 2002.
[41] Valiente, G.: Algorithms on Trees and Graphs, Springer, 2002.
[42] Wang, K., Liu, H.: Discovering Typical Structures of Documents: A Road Map Approach, 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1998.
[43] Wright, R., Richmond, B., Odzlyzko, A., McKay, B.: Constant Time Generation of Free Trees, SIAM J. Computing, 15(4), 1986, 540–548.
[44] Xiao, Y., Yao, J.-F., Li, Z., Dunham, M.: Efficient Data Mining for Maximal Frequent Subtrees, Proceedings of the 2003 IEEE International Conference on Data Mining (ICDM’03), November 2003.
[45] Yan, X., Han, J.: gSpan: Graph-Based Substructure Pattern Mining, Proc. 2002 Int. Conf. on Data Mining (ICDM’02), 2002.
[46] Yan, X., Han, J.: CloseGraph: Mining Closed Frequent Graph Patterns, Proc. 2003 Int. Conf. Knowledge Discovery and Data Mining (SIGKDD’03), 2003.
[47] Yang, L. H., Lee, M. L., Hsu, W., Achary, S.: Mining Frequent Quer Patterns from XML Queries, Eighth International Conference on Database Systems for Advanced Applications (DASFAA ’03), 2003.
[48] Zaki, M. J.: Efficiently Mining Frequent Trees in a Forest, Proc. of the 2002 Int. Conf. Knowledge Discovery and Data Mining (SIGKDD’02), July 2002.
[49] Zaki, M. J.: Fast Vertical Mining Using Diffsets, Proc. of the 2003 Int. Conf. Knowledge Discovery and Data Mining (SIGKDD’03), 2003.
[50] Zaki, M. J., Aggarwal, C. C.: XRules: An Effective Structural Classifier for XML Data, Proc. of the 2003 Int. Conf. Knowledge Discovery and Data Mining (SIGKDD’03), 2003.

Typ dokumentu

Bibliografia

Identyfikatory

Identyfikator YADDA

bwmeta1.element.baztech-article-BUS2-0007-0025