Journal
2009 | Vol. 97, no. 1/2 | pp. 215-234
Abstract
We make a case for using a tensor framework for hypertext mining. A tensor is a generalization of a vector, and the tensor framework discussed here generalizes the vector space model, which is widely used in the information retrieval and web mining literature. Most hypertext documents have an inherent internal tag structure and an external link structure, which makes multidimensional representations such as those offered by tensor objects desirable. We focus on the advantages of the Tensor Space Model, in which documents are represented using sixth-order tensors. We exploit the local structure and the neighborhood recommendation encapsulated by the proposed representation. We define a similarity measure for tensor objects corresponding to hypertext documents and evaluate it on mining tasks. The superior performance of the proposed methodology on clustering and classification of hypertext documents has been demonstrated. Experiments that apply different types of similarity measures to the different components of hypertext documents highlight the main advantage of the proposed model. It has been shown theoretically that the computational complexity of an algorithm operating on the tensor framework, with a tensor similarity measure as the distance, is at most that of the same algorithm operating on the vector space model, with a vector similarity measure as the distance.
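The idea of comparing documents component by component, with a possibly different measure per component, can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the abstract does not name the six components or define the similarity measure, so the component names, the choice of cosine and Jaccard, and the unweighted mean are hypothetical and are not the authors' method.

```python
import math
from typing import Callable, Dict

SparseVec = Dict[str, float]                      # term -> weight
Similarity = Callable[[SparseVec, SparseVec], float]


def cosine(a: SparseVec, b: SparseVec) -> float:
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard(a: SparseVec, b: SparseVec) -> float:
    """Jaccard similarity of the term sets, ignoring weights."""
    terms_a, terms_b = set(a), set(b)
    if not terms_a and not terms_b:
        return 0.0
    return len(terms_a & terms_b) / len(terms_a | terms_b)


# Hypothetical components: the abstract only says the tensor has order six.
MEASURES: Dict[str, Similarity] = {
    "url": jaccard,
    "anchor": cosine,
    "title": cosine,
    "text": cosine,
    "in_links": jaccard,
    "out_links": jaccard,
}


def component_similarity(
    doc_a: Dict[str, SparseVec],
    doc_b: Dict[str, SparseVec],
    measures: Dict[str, Similarity] = MEASURES,
) -> float:
    """Unweighted mean of per-component similarities (an assumed combination rule)."""
    scores = [
        measure(doc_a.get(name, {}), doc_b.get(name, {}))
        for name, measure in measures.items()
    ]
    return sum(scores) / len(scores)
```

Each component is compared only with its counterpart in the other document, so a term in the title never interacts with the same term in the body text. Under these assumptions the number of term comparisons is bounded by the total number of terms across components, which is in the spirit of the complexity claim in the abstract.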
Physical description
Bibliography: 33 items; tables.
Contributors
author
author
author
- Center for Soft Computing Research, Indian Statistical Institute, India, ssaha_r@isical.ac.in
Document type
Bibliography
Identifiers
YADDA identifier
bwmeta1.element.baztech-article-BUS8-0008-0070