Order estimation of Japanese paragraphs by supervised machine learning and various textual features

Murata, M.; Ito, S.; Tokuhisa, M.; Ma, Q.

doi:10.1515/jaiscr-2015-0033

Abstract

In this paper, we propose a method for estimating the order of paragraphs using supervised machine learning, specifically a support vector machine (SVM). Estimating paragraph order is useful for sentence generation and sentence correction. In experiments on estimating the order of the first two paragraphs of an article, the proposed method achieved a high accuracy of 0.84. In experiments on arbitrary pairs of paragraphs from an article, it also achieved higher accuracy than the baseline method. Feature analysis showed that adnominals, conjunctions, and dates were effective for estimating the order of the first two paragraphs, whereas the ratio of new words and the similarity between the preceding paragraphs and the paragraph being placed were effective for estimating the order of all pairs of paragraphs.
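The abstract reports that an SVM was trained on textual features and names several of the most useful ones, but this record does not define them. The following Python sketch shows one plausible way to turn a candidate paragraph order into a feature vector; every definition in it is an assumption, not the authors' method. It assumes paragraphs are already split into word tokens (Japanese text would first need a morphological analyzer such as ChaSen [11]), treats a word as "new" if it appears in no preceding paragraph, measures similarity as bag-of-words cosine similarity, and approximates the adnominal and conjunction features with a small, hypothetical list of backward-looking cue words (`BACKWARD_CUES`). Date features are omitted, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical cue list (an assumption, not from the paper): Japanese conjunctions
# (しかし "however", また "also") and adnominals (この "this", その "that")
# that tend to refer back to earlier text.
BACKWARD_CUES = {"しかし", "また", "この", "その"}


def new_word_ratio(paragraph, preceding):
    """Share of tokens in `paragraph` that occur in none of the `preceding` paragraphs."""
    if not paragraph:
        return 0.0
    seen = {w for p in preceding for w in p}
    return sum(1 for w in paragraph if w not in seen) / len(paragraph)


def cosine_similarity(a, b):
    """Bag-of-words cosine similarity between two token lists (0.0 if either is empty)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def opens_with_cue(paragraph):
    """1.0 if the paragraph's first token is a backward-looking cue word, else 0.0."""
    return 1.0 if paragraph and paragraph[0] in BACKWARD_CUES else 0.0


def pair_features(first, second, preceding=()):
    """Hypothetical feature vector for the hypothesis "`first` comes directly before `second`"."""
    context = [list(p) for p in preceding]
    context_words = [w for p in context for w in p]
    return [
        opens_with_cue(first),
        opens_with_cue(second),
        new_word_ratio(first, context),
        new_word_ratio(second, context + [first]),
        cosine_similarity(first, context_words),
        cosine_similarity(second, context_words + list(first)),
    ]


if __name__ == "__main__":
    p1 = ["鳥取", "大学", "は", "研究", "を", "行う"]
    p2 = ["この", "研究", "は", "段落", "の", "順序", "を", "推定", "する"]
    # One vector per candidate order; a binary classifier would be trained
    # to tell the correct order from the swapped one.
    print(pair_features(p1, p2))
    print(pair_features(p2, p1))
```

Building one vector for each candidate order gives a pairwise formulation: a binary classifier, such as the SVM used in the paper (TinySVM [8] is listed in the references), would learn to separate the correct order from the swapped one. Whether the authors encoded pairs this way is not stated in this record.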

Keywords

supervised machine learning, estimation, paragraph, support vector machine, SVM, feature analysis

Publisher

University of Social Sciences

Journal

Journal of Artificial Intelligence and Soft Computing Research

Year

2015

Volume

Vol. 5, No. 4

Pages

247–255

Physical description

Bibliography: 16 items; figures

Authors

author

Murata M.

Department of Information and Electronics, Tottori University, 4-101 Koyama-Minami, Tottori 680-8552, Japan

author

Ito S.

Department of Information and Electronics, Tottori University, 4-101 Koyama-Minami, Tottori 680-8552, Japan

author

Tokuhisa M.

Department of Information and Electronics, Tottori University, 4-101 Koyama-Minami, Tottori 680-8552, Japan

author

Ma Q.

Department of Applied Mathematics and Informatics, Ryukoku University, Seta, Otsu, Shiga 520-2194, Japan

References

[1] Rakesh Agrawal and Ramakrishnan Srikant. Mining sequential patterns. In Proceedings of ICDE'95, pages 3–14, 1995.
[2] Danushka Bollegala, Naoaki Okazaki, and Mitsuru Ishizuka. A bottom-up approach to sentence ordering for multi-document summarization. In Proceedings of the 44th Annual Meeting of the Association for Computational Linguistics, pages 385–392, 2006.
[3] Jaime Carbonell and Jade Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 335–336, 1998.
[4] Nello Cristianini and John Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, 2000.
[5] Fosca Giannotti, Mirco Nanni, and Dino Pedreschi. Efficient mining of temporally annotated sequences. In Proceedings of the 2006 SIAM International Conference on Data Mining, pages 348–359, 2006.
[6] Yuya Hayashi, Masaki Murata, Liangliang Fan, and Masato Tokuhisa. Japanese sentence order estimation using supervised machine learning with rich linguistic clues. In Proceedings of the 14th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2013), pages 1–12, 2013.
[7] Nikiforos Karamanis and Hisar Maruli Manurung. Stochastic text structuring using the principle of continuity. In Proceedings of the Second International Natural Language Generation Conference (INLG'02), pages 81–88, 2002.
[8] Taku Kudoh. TinySVM: Support Vector Machines. http://cl.aist-nara.ac.jp/~taku-ku/software/TinySVM/index.html, 2000.
[9] Mirella Lapata. Probabilistic text structuring: Experiments with sentence ordering. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pages 542–552, 2003.
[10] William C. Mann and Sandra A. Thompson. Rhetorical structure theory: Toward a functional theory of text organization. Text, 8(3):243–281, 1988.
[11] Yuji Matsumoto, Akira Kitauchi, Tatsuo Yamashita, Yoshitaka Hirano, Hiroshi Matsuda, and Masayuki Asahara. Japanese morphological analysis system ChaSen version 2.0 manual, 2nd edition, 1999.
[12] Kathleen R. McKeown, Judith L. Klavans, Vasileios Hatzivassiloglou, Regina Barzilay, and Eleazar Eskin. Towards multidocument summarization by reformulation: Progress and prospects. In Proceedings of AAAI/IAAI, pages 453–460, 1999.
[13] Masaki Murata and Hitoshi Isahara. Automatic detection of mis-spelled Japanese expressions using a new method for automatic extraction of negative examples based on positive examples. IEICE Transactions on Information and Systems, E85-D(9):1416–1424, 2002.
[14] Masaki Murata, Satoshi Ito, Masato Tokuhisa, and Qing Ma. Order estimation of Japanese paragraphs by supervised machine learning. In Proceedings of SCIS-ISIS 2014, pages 1096–1101, 2014.
[15] Naoaki Okazaki, Yutaka Matsuo, and Mitsuru Ishizuka. Improving chronological sentence ordering by precedence relation. In Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004), pages 750–756, 2004.
[16] Kiyotaka Uchimoto, Masaki Murata, Qing Ma, Satoshi Sekine, and Hitoshi Isahara. Word order acquisition from corpora. In Proceedings of COLING 2000, pages 871–877, 2000.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-d5286793-0704-4626-b30a-6ec7569639fa