A semi-automated approach to building text summarisation classifiers

Garcia-Constantino, M.; Coenen, F.; Noble, P. J.; Radford, A.; Setzkorn, C.

Article details

Article title

A semi-automated approach to building text summarisation classifiers

Authors

Garcia-Constantino M., Coenen F., Noble P. J., Radford A., Setzkorn C.

Abstract

An investigation into the extraction of useful information from the free text element of questionnaires, using a semi-automated summarisation extraction technique, is described. The summarisation technique utilises the concept of classification but with the support of domain/human experts during classifier construction. A realisation of the proposed technique, SARSET (Semi-Automated Rule Summarisation Extraction Tool), is presented and evaluated using real questionnaire data. The results of this evaluation are compared against the results obtained using two alternative techniques to build text summarisation classifiers. The first of these uses standard rule-based classifier generators, and the second is founded on the concept of building classifiers using secondary data. The results demonstrate that the proposed semi-automated approach outperforms the other two approaches considered.

Keywords

questionnaire data mining; text summarisation; text classification

Publisher

Komisja Informatyki Polskiej Akademii Nauk, Oddział w Gdańsku

Journal

Journal of Theoretical and Applied Computer Science

Year

2012

Volume

Vol. 6, no. 4

Pages

7-23

Physical description

Bibliography: 29 items, figures, tables.

Creators

author

Garcia-Constantino M.

author

Coenen F.

author

Noble P. J.

author

Radford A.

author

Setzkorn C.

Department of Computer Science, The University of Liverpool, United Kingdom, {mattgc,coenen,rtnorle,alanrad,c.setzkorn}@liverpool.ac.uk

Bibliography

[1] Abd-Elrahman, A., Andreu, M., Abbott, T.: Using text data mining techniques for understanding free-style question answers in course evaluation forms. Research in Higher Education Journal. Vol. 9, pp. 11-21, 2010.
[2] Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. Proceedings of the 20th International Conference on Very Large Data Bases, pp. 487-499, 1994.
[3] Baeza-Yates, R., Ribeiro-Neto, B.: Modern information retrieval. ACM press New York, Vol. 463, 1999.
[4] Chen, Y. L., Weng, C. H.: Mining fuzzy association rules from questionnaire data. Knowledge-Based Systems, Vol. 22, pp. 46-56, 2009.
[5] Coenen, F.: The LUCS-KDD TFP Association Rule Mining Algorithm. http://www.csc.liv.ac.uk/frans/KDD/Software/ Apriori TFP/aprioriTFP.html Department of Computer Science, The University of Liverpool, UK, 2004.
[6] Coenen, F.: The LUCS-KDD TFPC Classification Association Rule Mining Algorithm. http://www.csc.liv.ac.uk/frans/KDD/Software/ Apriori TFPC/aprioriTFPC.html Department of Computer Science, The University of Liverpool, UK, 2004.
[7] Cohen, W. W., Singer, Y.: A simple, fast, and effective rule learner. Proceedings of the National Conference on Artificial Intelligence, pp. 335-342, 1999.
[8] Garcia-Constantino, M. F., Coenen, F., Noble, P., Radford, A., Setzkorn, C., Tierney, A.: An Investigation Concerning the Generation of Text Summarisation Classifiers using Secondary Data. Seventh International Conference on Machine Learning and Data Mining. Springer, pp. 387-398, 2011.
[9] Garcia-Constantino, M. F., Coenen, F., Noble, P., Radford, A., Setzkorn, C.: A Semi-Automated Approach to Building Text Summarisation Classifiers. Eighth International Conference on Machine Learning and Data Mining. Springer, pp. 495-509, 2012.
[10] Hand, D. J., Till, R. J.: A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems. Machine Learning. Vol. 45, pp. 171-186, 2001.
[11] Hersh, W., Buckley, C., Leone, T. J., Hickam, D.: OHSUMED: an interactive retrieval evaluation and new large test collection for research. Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval. Springer-Verlag, pp. 192-201, 1994.
[12] Hiramatsu, A., Oiso, H., Tamura, S., Komoda, N.: Support system for analyzing open-ended questionnaires data by culling typical opinions. 2004 IEEE International Conference on Systems, Man and Cybernetics. Vol. 2, pp. 1377-1382, 2004.
[13] Hirasawa, S.: Analyses of Student Questionnaires for Faculty Developments. A Short Course at Tamkang University Taipei, Taiwan, R.O.C., March 7-9, 2006, 2006.
[14] Hirasawa, S., Chu, W. W.: Knowledge acquisition from documents with both fixed and free formats. 2003 IEEE International Conference on Systems, Man and Cybernetics. Vol. 5, pp. 4694-4699, 2003.
[15] Hiroko, I., Masao, U., Hitoshi, I.: Criterion for judging request intention in response texts of open-ended questionnaires. Proceedings of the second international workshop on Paraphrasing. Association for Computational Linguistics, pp. 49-56, 2003.
[16] Jing, L. P., Huang, H. K., Shi, H. B.: Improved feature selection approach TFIDF in text mining. Proceedings of the First International Conference on Machine Learning and Cybernetics, pp. 944-946, 2002.
[17] John, G. H., Langley, P.: Estimating Continuous Distributions in Bayesian Classifiers. Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pp. 338-345. Morgan Kaufmann Publishers Inc., 1995.
[18] Joshi, A. K.: Natural language processing. Science. Vol. 253, pp. 1242, 1991.
[19] McCallum, A.: Information extraction: Distilling structured data from unstructured text. ACM Queue. Vol. 3, pp. 48-57, 2005.
[20] Morinaga, S., Yamanishi, K., Tateishi, K., Fukushima, T.: Mining product reputations on the web. Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 341-349, 2002.
[21] Nagamachi, M.: Kansei engineering: a new ergonomic consumer-oriented technology for product development. International Journal of industrial ergonomics. Vol. 15, pp. 3-11, 1995.
[22] Platt, J. C.: Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines, 1998.
[23] Quinlan, J. R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., 1993.
[24] Radford, A., Noble, P. J., Coyne, K. P., Gaskell, R. M., Jones, P. H., Bryan, J. G. E., Setzkorn, C., Tierney, Á., Dawson, S.: Antibacterial prescribing patterns in small animal veterinary practice identified via SAVSNET: the small animal veterinary surveillance network. Veterinary Record. Vol. 169, pp. 310-318, 2011.
[25] Rosell, M., Velupillai, S.: Revealing relations between open and closed answers in questionnaires through text clustering evaluation. Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC08), pp. 1716-1722, 2008.
[26] Svátek, V.: Ontologies, Questionnaires and (Mining) Tabular Data. In the 3rd European Semantic Web Conference (ESWC 2006), 2006.
[27] Uchida, Y., Yoshikawa, T., Furuhashi, T., Hirao, E., Iguchi, H.: Extraction of important keywords in free text of questionnaire data and visualization of relationship among sentences. IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2009), pp. 1604-1608, 2009.
[28] Willett, P.: The Porter stemming algorithm: then and now. Program: electronic library and information systems, Vol. 40, pp. 219-223, 2006.
[29] Yamanishi, K., Li, H.: Mining open answers in questionnaire data. IEEE Intelligent Systems, pp. 58-63, 2002.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BPS3-0025-0129