XCleaner: A new method for clustering XML documents by structure

Brzeziński, D.; Leśniewska, A.; Morzy, T.; Piernik, M.

Article details

Content

Full texts:

httpwww_bg_utp_edu_plartcc2011brzezinski-et-al.pdf

Abstract

With data resources on the Internet growing rapidly, XML has become one of the most important standards for document management. Not only does it improve document exchange and storage, but it is also helpful in a variety of information retrieval tasks. Document clustering is one of the most interesting research areas that exploit the semi-structured nature of XML. In this paper, we put forward a new XML clustering algorithm that relies solely on document structure. We propose the use of maximal frequent subtrees and an operator called Satisfy/Violate to divide documents into groups. The algorithm is experimentally evaluated on real and synthetic data sets with promising results.

Keywords

XML, clustering, patterns

Publisher

Systems Research Institute, Polish Academy of Sciences

Journal

Control and Cybernetics

Year

2011

Volume

Vol. 40, no. 3

Pages

877-891

Physical description

Bibliography: 20 items, illustrations.

Authors

author

Brzeziński D.

author

Leśniewska A.

author

Morzy T.

author

Piernik M.

Institute of Computing Science, Poznań University of Technology, Piotrowo 2, 60-965 Poznań, Poland

Bibliography

Aggarwal, C.C., Ta, N., Wang, J., Feng, J. and Zaki, M.J. (2007) Xproj: a framework for projected structural clustering of xml documents. In: P. Berkhin, R. Caruana and X. Wu, eds., KDD. ACM, 46-55.
Barbosa, D., Mendelzon, A.O., Keenleyside, J. and Lyons, K.A. (2002) Toxgene: a template-based data generator for xml. In: M.J. Franklin, B. Moon and A. Ailamaki, eds., SIGMOD Conference. ACM, 616.
Candillier, L., Tellier, I. and Torre, F. (2005) Transforming xml trees for efficient classification and clustering. In: N. Fuhr, M. Lalmas, S. Malik and G. Kazai, eds., INEX. LNCS 3977, Springer, 469-480.
Chawathe, S.S. (1999) Describing and manipulating xml data. IEEE Data Eng. Bull., 22 (3), 3-9.
Chi, Y., Xia, Y., Yang, Y. and Muntz, R.R. (2005) Mining closed and maximal frequent subtrees from databases of labeled rooted trees. IEEE Trans. Knowl. Data Eng., 17 (2), 190-202.
Costa, G., Manco, G., Ortale, R. and Tagarelli, A. (2004) A tree-based approach to clustering xml documents by structure. In: J.-F. Boulicaut, F. Esposito, F. Giannotti and D. Pedreschi, eds., PKDD. LNCS 3202, Springer, 137-148.
Dalamagas, T., Cheng, T., Winkel, K.-J. and Sellis, T.K. (2004) Clustering xml documents by structure. In: G.A. Vouros and T. Panayiotopoulos, eds., SETN. LNCS 3025, Springer, 112-121.
Doucet, A. and Ahonen-Myka, H. (2002) Naïve clustering of a large xml document collection. In: N. Fuhr, N. Gövert, G. Kazai and M. Lalmas, eds., INEX Workshop. ERCIM, European Research Consortium for Informatics and Mathematics, 81-87.
Florescu, D. and Kossmann, D. (1999) Storing and querying xml data using an rdmbs. IEEE Data Eng. Bull., 22 (3), 27-34.
Jain, A.K., Murty, M.N. and Flynn, P.J. (1999) Data clustering: A review. ACM Comput. Surv., 31 (3), 264-323.
Johnson, S. (1967) Hierarchical clustering schemes. Psychometrika, 32 (3), 241-254.
Lee, M.-L., Yang, L.H., Hsu, W. and Yang, X. (2002) Xclust: clustering xml schemas for effective integration. In: CIKM. ACM, 292-299.
Lesniewska, A. (2009) Clustering xml documents by structure. In: J. Grundspenkis, M. Kirikova, Y. Manolopoulos and L. Novickis, eds., ADBIS (Workshops). LNCS 5968, Springer, 238-246.
Lian, W., Cheung, D.W.-L., Mamoulis, N. and Yiu, S.-M. (2004) An efficient and scalable algorithm for clustering xml documents by structure. IEEE Trans. Knowl. Data Eng., 16 (1), 82-96.
Milligan, G. and Cooper, M. (1985) An examination of procedures for determining the number of clusters in a data set. Psychometrika, 50, 159-179. doi: 10.1007/BF02294245.
Nayak, R. and Iryadi, W. (2006) Xmine: A methodology for mining xml structure. In: X. Zhou, J. Li, H.T. Shen, M. Kitsuregawa and Y. Zhang, eds., APWeb. LNCS 3841, Springer, 786-792.
Tan, P.-N., Steinbach, M. and Kumar, V. (2005) Introduction to Data Mining. Addison Wesley.
Tran, T., Nayak, R. and Bruza, P. (2008) Combining structure and content similarities for xml document clustering. In: J.F. Roddick, J. Li, P. Christen and P.J. Kennedy, eds., AusDM. CRPIT 87, Australian Computer Society, 219-226.
Widom, J. (1999) Data management for xml: Research directions. IEEE Data Eng. Bull., 22 (3), 44-52.
Zaki, M.J. (2002) Efficiently mining frequent trees in a forest. In: KDD. ACM, 71-80.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BATC-0009-0016