Article title

Cluster analysis of medical text documents by using semi-clustering approach based on graph representation

Publication languages
EN
Abstracts
EN
The development of the Internet has resulted in an increasing number of online text repositories. In many cases, documents are assigned to more than one class, so automatic multi-label classification is needed. When the number of labels exceeds the number of documents, effective label space dimension reduction may significantly improve classification accuracy, which is a major priority in the medical field. In this paper, we propose document clustering for label selection. We use a semi-clustering method based on a graph representation, in which documents are represented by vertices and edge weights are calculated from their mutual similarity. Assigning documents to semi-clusters helps reduce the number of labels, and the reduced label set is then used in the multi-label classification process. The performance of the method is examined in experiments conducted on real medical datasets.
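The abstract describes the pipeline only at a high level. The Python sketch below illustrates the general idea under stated assumptions: documents become vertices of a weighted graph, edge weights are cosine similarities of bag-of-words vectors (the abstract says only "mutual similarity", so the measure and the `min_sim` threshold are assumptions), and semi-clusters are grown with the scoring rule of the Pregel semi-clustering algorithm [11], S_c = (I_c - f_B * B_c) / (V_c * (V_c - 1) / 2), where I_c and B_c are the summed weights of internal and boundary edges and V_c is the number of vertices in the semi-cluster. The function names, the parameters `c_max`, `v_max`, `f_b` and `max_steps`, and the zero score for single-vertex semi-clusters are hypothetical choices for illustration, not the authors' implementation.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bag-of-words term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_graph(docs, min_sim=0.1):
    """Vertices are document indices; an undirected edge of weight
    cosine(doc_i, doc_j) is kept when the similarity reaches min_sim (assumed threshold)."""
    vectors = [Counter(doc.lower().split()) for doc in docs]
    graph = {i: {} for i in range(len(docs))}
    for i, j in combinations(range(len(docs)), 2):
        w = cosine(vectors[i], vectors[j])
        if w >= min_sim:
            graph[i][j] = graph[j][i] = w
    return graph


def score(cluster, graph, f_b):
    """Semi-cluster score S_c = (I_c - f_B * B_c) / (V_c * (V_c - 1) / 2)."""
    n = len(cluster)
    if n == 1:
        return 0.0  # assumption: a lone vertex has no internal edges, so it scores 0
    internal = boundary = 0.0
    for v in cluster:
        for u, w in graph[v].items():
            if u in cluster:
                internal += w / 2  # each internal edge is visited from both endpoints
            else:
                boundary += w  # a boundary edge has exactly one endpoint inside
    return (internal - f_b * boundary) / (n * (n - 1) / 2)


def top(clusters, graph, f_b, k=None):
    """Distinct semi-clusters sorted by score (ties broken by member ids), first k kept."""
    return sorted(set(clusters), key=lambda c: (-score(c, graph, f_b), sorted(c)))[:k]


def semi_cluster(graph, c_max=2, v_max=3, f_b=0.1, max_steps=20):
    """Sequential simulation of Pregel-style semi-clustering supersteps (hypothetical parameters)."""
    outbox = {v: [frozenset([v])] for v in graph}  # superstep 0: each vertex sends {v}
    member = {v: [frozenset([v])] for v in graph}  # best semi-clusters that contain v
    for _ in range(max_steps):
        new_outbox, new_member = {}, {}
        for v in graph:
            received = [c for u in graph[v] for c in outbox[u]]
            # add v to every received semi-cluster that lacks it and is not full
            extended = [c | {v} for c in received if v not in c and len(c) < v_max]
            ranked = top(received + extended, graph, f_b)
            new_outbox[v] = ranked[:c_max]  # the best ones are sent to the neighbours
            new_member[v] = top([c for c in ranked if v in c] + member[v], graph, f_b, c_max)
        if new_member == member:  # halt once no vertex's semi-clusters change
            break
        outbox, member = new_outbox, new_member
    return member


if __name__ == "__main__":
    docs = [
        "tumor biopsy carcinoma",
        "carcinoma tumor staging",
        "insulin glucose diabetes",
        "diabetes insulin therapy",
    ]
    graph = build_graph(docs)
    for doc_id, clusters in semi_cluster(graph).items():
        # documents 0-1 and 2-3 end up sharing their best semi-cluster
        print(doc_id, [sorted(c) for c in clusters])
```

Because semi-clusters may overlap, a document can belong to several of them at once. The abstract does not state how semi-cluster membership is turned into a reduced label set; one hypothetical option would be to group the labels of documents that share a highest-scoring semi-cluster.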
Pages
213-224
Physical description
Bibliography: 14 items, figures, tables
Authors
author
  • Institute of Information Technology, Lodz University of Technology
  • Institute of Information Technology, Lodz University of Technology
  • Institute of Information Technology, Lodz University of Technology
Bibliography
  • [1] Tsoumakas G., Katakis I., Vlahavas I. (2008) Effective and Efficient Multilabel Classification in Domains with Large Number of Labels, Proceedings of ECML/PKDD Workshop on Mining Multidimensional Data, MMD’08, 30-44.
  • [2] Balasubramanian K., Lebanon G. (2012) The Landmark Selection Method for Multiple Output Prediction, Proceedings of the 29th International Conference on Machine Learning, Edinburgh, Scotland, UK, 983-990.
  • [3] Read J., Pfahringer B., Holmes G. (2008) Multi-label Classification Using Ensembles of Pruned Sets, Proceedings of 8th IEEE International Conference on Data Mining, 995-1000.
  • [4] Bi W., Kwok J. (2013) Efficient Multi-label Classification with Many Labels, Proceedings of the 30th International Conference on Machine Learning, Vol. 28, Atlanta, Georgia, USA, III-405-III-413.
  • [5] Hsu D., Kakade S.M., Langford J., Zhang T. (2009) Multi-label Prediction via Compressed Sensing, Bengio Y., Schuurmans D., Lafferty J.D., Williams C.K.I., Culotta A. [eds]: Advances in Neural Information Processing Systems 22, Curran Associates Inc., 772-780.
  • [6] Lin Z., Ding G., Hu M., Wang J. (2014) Multi-label Classification via Feature-aware Implicit Label Space Encoding, Proceedings of the 31st International Conference on Machine Learning, Vol. 32, Beijing, China, II-325-II-333.
  • [7] Chen Y.-N., Lin H.-T. (2012) Feature-aware Label Space Dimension Reduction for Multi-label Classification, Proceedings of the 25th International Conference on Neural Information Processing Systems, Vol. 1, Nevada, USA, 1529-1537.
  • [8] Herrera F., Charte F., Rivera A.J., del Jesus M.J. (2016) Multilabel Classification. Problem Analysis, Metrics and Techniques, Springer, Switzerland.
  • [9] Hangal S., MacLean D., Lam M.S., Heer J. (2010) All Friends are Not Equal: Using Weights in Social Graphs to Improve Search, Proceedings of the 4th ACM Workshop on Social Network Mining and Analysis, Washington, USA, 1-7.
  • [10] Andersen J.S., Zukunft O. (2016) Semi-Clustering that Scales: An Empirical Evaluation of GraphX, Proceedings of the 2016 IEEE International Congress on Big Data, San Francisco, USA, 333-336.
  • [11] Malewicz G., Austern M.H., Bik A.J.C., Dehnert J.C., Horn I., Leiser N., Czajkowski G. (2010) Pregel: A System for Large-Scale Graph Processing, Proceedings of the 2010 International Conference on Management of Data, New York, USA, 135-146.
  • [12] http://disi.unitn.it/moschitti/corpora.htm (accessed November 20, 2017)
  • [13] http://grafos.ml/okapi.html (accessed November 20, 2017)
  • [14] Boring C.C., Squires T.S., Tong T. (1991) Cancer statistics, 1991, CA: A Cancer Journal for Clinicians, 41(6), 19-36.
Notes
Record prepared under agreement 509/P-DUN/2018, funded by MNiSW (Polish Ministry of Science and Higher Education) resources allocated to the dissemination of science (2019).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-4363a028-f244-463e-8906-f30149e96458