Clustering collections of XML documents having different structure types

Kozielski, M.

Article details

Title variant (Polish)

Grupowanie kolekcji dokumentów XML o różnych typach struktury

Abstract

The paper compares several clustering algorithms and XML structure encoding methods applied to clustering XML documents with different structure types. Clustering quality is evaluated with respect to how well the resulting partitions accelerate the execution of selective queries on XML collections. The results show that applying a multilevel clustering algorithm to XML documents with complex structure yields a partition of better quality than traditional clustering methods.
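The abstract does not name the encodings or algorithms it compares, so the following is only a rough illustration of the general idea, not the author's method. It assumes one simple structure encoding (the set of root-to-node tag paths of a document), swaps in a basic single-pass leader clustering with a Jaccard-similarity threshold (not the multilevel algorithm evaluated in the paper), and treats a "selective query" as a lookup of a single tag path. The function names `tag_paths`, `leader_clustering` and `clusters_to_scan`, the threshold of 0.5 and the toy documents are all hypothetical.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Encode a document's structure as the set of its root-to-node tag paths."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        paths.add(path)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return frozenset(paths)


def jaccard(a, b):
    """Jaccard similarity of two path sets (defined as 1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def leader_clustering(encodings, threshold=0.5):
    """Single-pass leader clustering: each document joins the first cluster whose
    leader is at least `threshold`-similar; otherwise it starts a new cluster."""
    leaders, clusters = [], []
    for idx, enc in enumerate(encodings):
        for c, leader in enumerate(leaders):
            if jaccard(enc, leader) >= threshold:
                clusters[c].append(idx)
                break
        else:
            leaders.append(enc)
            clusters.append([idx])
    return clusters


def clusters_to_scan(clusters, encodings, query_path):
    """Indices of clusters whose structural summary (union of member paths)
    contains `query_path`; every other cluster can be skipped safely."""
    hits = []
    for c, members in enumerate(clusters):
        summary = frozenset().union(*(encodings[i] for i in members))
        if query_path in summary:
            hits.append(c)
    return hits


# Toy collection with two structure types (hypothetical data).
docs = [
    "<protein><name/><seq/></protein>",
    "<protein><name/><seq/><organism/></protein>",
    "<article><title/><body><sec/></body></article>",
    "<article><title/><body/></article>",
]
encs = [tag_paths(d) for d in docs]
clusters = leader_clustering(encs, threshold=0.5)
print(clusters)                                                # [[0, 1], [2, 3]]
print(clusters_to_scan(clusters, encs, "/article/body/sec"))   # [1]
```

Pruning is safe in this sketch because a cluster's summary is the union of its members' paths, so a path absent from the summary cannot occur in any member. Structurally tighter clusters give smaller summaries and more skipped clusters, which suggests why query acceleration can serve as a measure of partition quality.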

Keywords

clustering; XML document clustering

Publisher

Wydawnictwo Politechniki Śląskiej

Journal

Studia Informatica

Year

2009

Volume

Vol. 30, No. 2A

Pages

229-243

Physical description

Bibliography: 20 items

Contributors

Author

Kozielski M.

Instytut Informatyki, Politechnika Śląska (Institute of Informatics, Silesian University of Technology), ul. Akademicka 16, 44-100 Gliwice, tel. (032) 237-21-51, michal.kozielski@polsl.pl

References

1. Bairoch A., Apweiler R., Wu C. H., Barker W. C., Boeckmann B., Ferro S., Gasteiger E., Huang H., Lopez R., Magrane M., Martin M. J., Natale D. A., O'Donovan C., Redaschi N., Yeh L. S.: The Universal Protein Resource (UniProt), Nucleic Acids Research, 2005, Vol. 33, p. D154-D159, http://www.uniprot.org/database/download.shtml.
2. Bouchon-Meunier B., Rifqi M., Bothorel S.: Towards general measures of comparison of objects, Fuzzy Sets and Systems, 1996, Vol. 84, p. 143-153.
3. Bourret R.: XML and Databases, http://www.rpbourret.com/xml/XMLAndDatabases.htm, 2005.
4. Bray T., Paoli J., Sperberg-McQueen C. M., Maler E., Yergeau F. (eds.): Extensible Markup Language (XML) 1.0 (Fourth Edition), W3C Recommendation 16 August 2006, edited in place 29 September 2006, http://www.w3.org/TR/2006/REC-xml-20060816/ (accessed 20.12.2007).
5. Ceravolo P., Nocerino M. C., Viviani M.: Knowledge Extraction from Semi-structured Data Based on Fuzzy Techniques, Knowledge-Based Intelligent Information and Engineering Systems, Lecture Notes in Computer Science, 2004, Vol. 3215, p. 328-334.
6. Denoyer L., Gallinari P.: Dataset used in the experiment, http://xmlmining.lip6.fr, 2006.
7. Ester M., Kriegel H. P., Sander J., Xu X.: A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining (KDD'96), 1996, p. 226-231.
8. Flesca S., Manco G., Masciari E., Pontieri L., Pugliese A.: Fast Detection of XML Structural Similarity, IEEE Transactions on Knowledge and Data Engineering, 2005, Vol. 17, No. 2, p. 160-175.
9. Han J., Kamber M.: Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, Academic Press, San Francisco, 2001.
10. Hand D., Mannila H., Smyth P.: Principles of Data Mining, WNT, Warszawa, 2005.
11. Jain A. K., Murty M. N., Flynn P. J.: Data Clustering: A Review, ACM Computing Surveys, 1999, Vol. 31, No. 3, p. 264-323.
12. Kozielski M.: Multilevel Conditional Fuzzy C-Means Clustering of XML Documents, Lecture Notes in Artificial Intelligence, Springer-Verlag, 2007, Vol. 4702, p. 532-539.
13. Kozielski M.: Application of Different Clustering Algorithms to Multilevel Clustering of XML Documents, TPD 2007 Conference Proceedings, Wydawnictwo Politechniki Poznańskiej, 2007, p. 59-70.
14. Lian W., Cheung D. W., Mamoulis N., Yiu S. M.: An Efficient and Scalable Algorithm for Clustering XML Documents by Structure, IEEE Transactions on Knowledge and Data Engineering, 2004, Vol. 16, No. 1, p. 82-96.
15. Łęski J.: Generalized Weighted Conditional Fuzzy Clustering, IEEE Transactions on Fuzzy Systems, 2003, Vol. 11, No. 6, p. 1-7.
16. Nayak R.: Fast and effective clustering of XML data using structural information, Knowledge and Information Systems, 2008, Vol. 14, No. 2, p. 197-215.
17. Nierman A., Jagadish H. V.: Evaluating Structural Similarity in XML Documents, Fifth International Workshop on the Web and Databases (WebDB 2002), 2002.
18. Pedrycz W.: Conditional Fuzzy C-Means, Pattern Recognition Letters, 1996, Vol. 17, p. 625-631.
19. Rocacher D.: On fuzzy bags and their application to flexible querying, Fuzzy Sets and Systems, 2003, Vol. 140, No. 1, p. 93-110.
20. Yoon J. P., Raghavan V., Chakilam V.: Bitmap Indexing-based Clustering and Retrieval of XML Documents, Proceedings of ACM SIGIR Workshop on Mathematical/Formal Methods in Information Retrieval, 2001.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BSL9-0027-0022