Efficiently Mining Frequent Embedded Unordered Trees

Zaki, M. J.

Powiadomienia systemowe

Sesja wygasła!
Sesja wygasła!

Artykuł - szczegóły

Tytuł artykułu

Efficiently Mining Frequent Embedded Unordered Trees

Autorzy

Zaki M. J.

Wybrane pełne teksty z tego czasopisma

https://fi.episciences.org/

Identyfikatory

Warianty tytułu

Języki publikacji

Abstrakty

Mining frequent trees is very useful in domains like bioinformatics, web mining, mining semi-structured data, and so on. In this paper we introduce SLEUTH, an efficient algorithm for mining frequent, unordered, embedded subtrees in a database of labeled trees. The key contributions of our work are as follows: We give the first algorithm that enumerates all embedded, unordered trees. We propose a new equivalence class extension scheme to generate all candidate trees. We extend the notion of scope-list joins to compute frequency of unordered trees. We conduct performance evaluation on several synthetic and real datasets to show that SLEUTH is an efficient algorithm, which has performance comparable to TreeMiner, that mines only ordered trees.

Słowa kluczowe

tree mining embedded trees unordered trees

Wydawca

IOS Press

Czasopismo

Fundamenta Informaticae

Rocznik

2005

Tom

Vol. 66, nr 1,2

Strony

33--52

Opis fizyczny

Bliogr. 26 poz., wykr.

Twórcy

autor

Zaki M. J.

Computer Science Department Rensselaer Polytechnic Institute Troy NY 12180, USA, zaki@cs.rpi.edu

Bibliografia

[1] Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., Verkamo, A. I.: Fast Discovery of Association Rules, Advances in Knowledge Discovery and Data Mining (U. Fayyad, et al, Eds.), AAAI Press, Menlo Park, CA, 1996.
[2] Agrawal, R., Srikant, R.: Mining Sequential Patterns, 11th Intl. Conf. on Data Engg., 1995.
[3] Asai, T., Abe, K., Kawasoe, S., Arimura, H., Satamoto, H., Arikawa, S.: Efficient Substructure Discovery from Large Semi-structured Data, 2nd SIAM Int’l Conference on Data Mining, April 2002.
[4] Asai, T., Arimura, H., Uno, T., Nakano, S.: Discovering Frequent Substructures in Large Unordered Trees, 6th Int’l Conf. on Discovery Science, October 2003.
[5] Chi, Y., Yang, Y., Muntz, R. R.: Indexing and Mining Free Trees, 3rd IEEE International Conference on Data Mining, 2003.
[6] Chi, Y., Yang, Y., Muntz, R. R.: Hybrid Tree Miner: An Efficient Algorihtm for Mining Frequent Rooted Trees and Free Trees Using Canonical Forms, 16th International Conference on Scientific and Statistical Database Management, 2004.
[7] Chi, Y., Yang, Y., Xia, Y.,Muntz, R. R.: CMTreeMiner: Mining Both Closed and Maximal Frequent Subtrees, 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2004.
[8] Cole, R., Hariharan, R., Indyk, P.: Tree pattern matching and subset matching in deterministic O(nlog3n)_-time, 10th Symposium on Discrete Algorithms, 1999.
[9] Cook, D., Holder, L.: Substructure discovery using minimal description length and background knowledge, Journal of Artificial Intelligence Research, 1, 1994, 231–255.
[10] Dehaspe, L., Toivonen, H., King, R.: Finding frequent substructures in chemical compounds, 4th Intl. Conf. Knowledge Discovery and Data Mining, August 1998.
[11] Huan, J.,Wang,W., Prins, J.: EfficientMining of Frequent Subgraphs in the Presence of Isomorphism, IEEE Int’l Conf. on Data Mining, 2003.
[12] Inokuchi, A., Washio, T., Motoda, H.: An apriori-based algorithm for mining frequent substructures from graph data, 4th European Conference on Principles of Knowledge Discovery and Data Mining, September 2000.
[13] Kilpelainen, P., Mannila, H.: Ordered and unordered tree inclusion, SIAM J. of Computing, 24(2), 1995, 340–356.
[14] Kramer, S., Raedt, L. D., Helma, C.: Molecular Feature Mining in HIV data, Int’l Conf. on Knowledge Discovery and Data Mining, 2001.
[15] Kuramochi,M., Karypis, G.: Frequent Subgraph Discovery, 1st IEEE Int’l Conf. on Data Mining, November 2001.
[16] Morell, V.: Web-Crawling up the Tree of Life, Science, 273(5275), aug 1996, 568–570.
[17] Nijssen, S., Kok, J. N.: Efficient Discovery of Frequent Unordered Trees, 1st Int’l Workshop on Mining Graphs, Trees and Sequences, 2003.
[18] Shamir, R., Tsur, D.: Faster Subtree Isomorphism, Journal of Algorithms, 33, 1999, 267–280.
[19] Termier, A., Rousset, M.-C., Sebag, M.: TreeFinder: a First Step towards XML Data Mining, IEEE Int’l Conf. on Data Mining, 2002.
[20] Wang, K., Liu, H.: Discovering Typical Structures of Documents: A Road Map Approach, ACM SIGIR Conference on Information Retrieval, 1998.
[21] Xiao, Y., Yao, J.-F., Li, Z., Dunham, M. H.: Efficient Data Mining for Maximal Frequent Subtrees, International Conference on Data Mining, 2003.
[22] Yan, X., Han, J.: gSpan: Graph-based substructure pattern mining, IEEE Int’l Conf. on Data Mining, 2002.
[23] Yan, X., Han, J.: CloseGraph: Mining Closed Frequent Graph Patterns, ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, August 2003.
[24] Yoshida, K., Motoda, H.: CLIP: Concept Learning from Inference Patterns, Artificial Intelligence, 75(1), 1995, 63–92.
[25] Zaki, M. J.: Efficiently Mining Frequent Trees in a Forest, 8th ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining, July 2002.
[26] Zaki, M. J., Aggarwal, C.: Xrules: An effective structural classifier for XML data, 9th ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining, August 2003.

Typ dokumentu

Bibliografia

Identyfikator YADDA

bwmeta1.element.baztech-article-BUS2-0007-0020