Methods of clustering text documents

Gołębski, R.; Bembenik, R.; Chrabąszcz, M.

Article details

Abstract

This paper deals with methods of conceptual clustering of documents. It presents ways of representing documents so that they can be clustered with classical data mining (DM) algorithms, a survey of clustering algorithms, and specialized text clustering algorithms. The article also proposes new document representations as well as new specialized algorithms for clustering text documents.

Keywords

data mining, text mining, document clustering, text document representation methods

Publisher

Wydawnictwo Politechniki Częstochowskiej (Częstochowa University of Technology Press)

Journal

Informatyka Teoretyczna i Stosowana (Theoretical and Applied Computer Science)

Year

2003

Volume

Vol. 3, no. 4

Pages

179-196

Physical description

Bibliography: 29 items, 1 table.

Contributors

author

Gołębski R.

Institute of Computer Science, Warsaw University of Technology, ul. Nowowiejska 15/19, 00-665 Warszawa

author

Bembenik R.

R.Bembenik@ii.pw.edu.pl

Institute of Computer Science, Warsaw University of Technology, ul. Nowowiejska 15/19, 00-665 Warszawa

author

Chrabąszcz M.

Institute of Computer Science, Warsaw University of Technology, ul. Nowowiejska 15/19, 00-665 Warszawa

References

[1] Ahonen H., Heinonen O., Klemettinen M., Verkamo I., Applying data mining techniques in text analysis, Report C-1997-23, 1997.
[2] Allison L., Trie - definition, http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/Trie.html.
[3] Botafogo R.A., Cluster analysis for hypertext systems, Proceedings of the 16th Annual ACM SIGIR Conference on Research and Development in Information Retrieval, 1993, 116-125.
[4] Can F., Ozkarhan E.A., Dynamic cluster maintenance, 1989.
[5] Cutting D.R., Pedersen J.O., Karger D., Tukey J.W., Scatter/gather: A cluster-based approach to browsing large document collections, Proceedings of the Fifteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1992, 318-329.
[6] Dubes R.C., Cluster analysis and related issues, in: Handbook of Pattern Recognition & Computer Vision, World Scientific Publishing Co., Inc., River Edge, NJ, 1993, 3-32.
[7] Everitt B., Cluster Analysis, Halsted Press, New York, 1980.
[8] Gawrysiak P., Automatyczna kategoryzacja dokumentów [Automatic categorization of documents], PhD thesis, Warsaw University of Technology, Warsaw, 2001.
[9] He Q., A review of clustering algorithms as applied in IR, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, 1999.
[10] Hatzivassiloglou V., Gravano L., Maganti A., An investigation of linguistic features and clustering algorithms for topical document clustering, Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM Press, 2000, 224-231.
[11] Jain A.K., Dubes R.C., Algorithms for Clustering Data, Prentice-Hall Advanced Reference Series, Prentice-Hall, Inc., Upper Saddle River, NJ, 1988.
[12] Jain A.K., Murty M.N., Flynn P.J., Data clustering: a review, ACM Computing Surveys (CSUR), 31(3), 1999, 264-323.
[13] Joachims T., A probabilistic analysis of the Rocchio algorithm with TFIDF for text categorization, in: Fisher D.H. (ed.), Proceedings of ICML-97, 14th International Conference on Machine Learning, Nashville, US, Morgan Kaufmann Publishers, San Francisco, 1997, 143-151.
[14] Lovins J.B., Development of a stemming algorithm, Mechanical Translation and Computational Linguistics, 1968.
[15] Mannila H., Data mining: Machine learning, statistics, and databases, Statistical and Scientific Database Management, 1996, 2-9.
[16] Mladenić D., Machine learning on non-homogeneous, distributed text data, PhD thesis, University of Ljubljana, 1997.
[17] Modha D.S., Spangler W.S., Clustering hypertext with applications to web searching, ACM Conference on Hypertext, 2000, 143-152.
[18] Michalski R., Stepp R.E., Diday E., Automated construction of classifications: conceptual clustering versus numerical taxonomy, IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-5, 1983, 396-409.
[19] Mannila H., Toivonen H., Discovering generalized episodes using minimal occurrences, Knowledge Discovery and Data Mining, 1996, 146-151.
[20] Rasmussen E., Clustering algorithms, in: Information Retrieval: Data Structures and Algorithms, Prentice Hall, 1992.
[21] Ruger S., Gauch S., Feature reduction for document clustering and classification.
[22] Schütze H., Manning C., Foundations of Statistical Natural Language Processing, MIT Press, Cambridge, 2000.
[23] Wong W., Fu A., Incremental document clustering for web page classification, 2000.
[24] Willett P., Recent trends in hierarchic document clustering: a critical review, Information Processing & Management, 1988.
[25] Wulfekuhler M.R., Punch W.F., Finding salient features for personal Web page categories, Computer Networks and ISDN Systems, 29(8-13), 1997, 1147-1156.
[26] Weiss S.M., White B.F., Apte C., Lightweight document clustering, IBM Research Report RC-21684.
[27] Weiss S.M., White B.F., Apte C., Damerau F., Lightweight document matching for help-desk applications, IEEE Intelligent Systems, 15(2), 1999, 57-61.
[28] Zamir O., Etzioni O., Web document clustering: A feasibility demonstration, Research and Development in Information Retrieval, 1998, 46-54.
[29] Zipf G., Human Behavior and the Principle of Least Effort, Cambridge, 1949.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BPG5-0015-0055