Survey of scientific document summarization methods

Kurian, Sheena K.; Mathew, Sheena

doi:10.7494/csci.2020.21.2.3356

Abstract

The number of research papers published every year is growing at an exponential rate, which has led to intensive research in scientific document summarization. The different methods commonly used in automatic text summarization research are discussed in this paper, along with their pros and cons. Commonly used evaluation techniques and datasets in this field are also discussed. ROUGE and Pyramid scores are tabulated for easy comparison of the results of various summarization methods.

Keywords

document summarization; abstractive summarization; extractive summarization

Publisher

Wydawnictwa AGH

Journal

Computer Science

Year

2020

Volume

Vol. 21 (2)

Pages

141–177

Physical description

Bibliography: 86 items, tables.

Authors

author

Kurian Sheena K.

sheenakuriank@gmail.com

Cochin University of Science and Technology, School of Engineering

author

Mathew Sheena

sheenamathew@cusat.ac.in

Cochin University of Science and Technology, School of Engineering

Bibliography

[1] Abu-Jbara A., Radev D.: Reference Scope Identification in Citing Sentences. In: Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 80–90, Association for Computational Linguistics, 2012.
[2] AbuRa’ed A.G.T., Chiruzzo L., Saggion H., Accuosto P., Bravo À.: LaSTUS/TALN @ CLSciSumm-17: Cross-document Sentence Matching and Scientific Text Summarization Systems. In: BIRNDL@SIGIR, 2017.
[3] Azam N., Ahmad A.: Text summarization using rough sets. In: 2016 International Conference on Computing, Electronic and Electrical Engineering (ICE Cube), pp. 90–94, 2016.
[4] Baralis E., Cagliero L., Mahoto N.A., Fiori A.: GraphSum: Discovering correlations among multiple terms for graph-based summarization, Information Sciences, vol. 249, pp. 96–109, 2013.
[5] Barzilay R., Elhadad M.: Using Lexical Chains for Text Summarization. In: Intelligent Scalable Text Summarization, 1997.
[6] Chen J., Zhuge H.: Summarization of scientific documents by detecting common facts in citations, Future Generation Computer Systems, vol. 32(C), pp. 246–252, 2014.
[7] Chopra S., Auli M., Rush A.M.: Abstractive Sentence Summarization with Attentive Recurrent Neural Networks. In: HLT-NAACL, 2016.
[8] Cohan A., Goharian N.: Scientific Article Summarization Using Citation-Context and Article’s Discourse Structure, EMNLP, pp. 390–400, 2015.
[9] Cohan A., Goharian N.: Revisiting Summarization Evaluation for Scientific Articles. In: ArXiv, vol. abs/1604.00400, 2016.
[10] Collins E., Augenstein I., Riedel S.: A Supervised Approach to Extractive Summarisation of Scientific Papers. In: CoNLL, pp. 195–205, 2017.
[11] Conroy J.M., Davis S.: Vector Space Models for Scientific Document Summarization. In: VS@HLT-NAACL, pp. 186–191, 2015.
[12] Conroy J.M., O’Leary D.P.: Text summarization via hidden Markov models. In: SIGIR ’01, pp. 406–407, 2001.
[13] Deerwester S.C., Dumais S.T., Landauer T.K., Furnas G.W., Harshman R.A.: Indexing by Latent Semantic Analysis, JASIS, vol. 41, pp. 391–407, 1990.
[14] Edmundson H.: New Methods in Automatic Extracting, Journal of the ACM, vol. 16(2), pp. 264–285, 1969.
[15] Erkan G., Radev D.R.: LexRank: Graph-based Lexical Centrality as Salience in Text Summarization, Journal of Artificial Intelligence Research, vol. 22, pp. 457–479, 2004.
[16] Fattah M.A., Ren F.: GA, MR, FFNN, PNN and GMM based models for automatic text summarization, Computer Speech & Language, vol. 23(1), pp. 126–144, 2009.
[17] Ganesan K., Zhai C., Han J.: Opinosis: A Graph-Based Approach to Abstractive Summarization of Highly Redundant Opinions. In: Proceedings of the 23rd International Conference on Computational Linguistics, COLING ’10, pp. 340–348, Association for Computational Linguistics, 2010.
[18] Genest P.-E., Lapalme G.: Text Generation for Abstractive Summarization. In: Proceedings of the Third Text Analysis Conference, National Institute of Standards and Technology, 2010.
[19] Gong Y., Liu X.: Generic text summarization using relevance measure and latent semantic analysis. In: SIGIR ’01, pp. 19–25, 2001.
[20] Goyal P., Behera L., Mcginnity T.M.: A Context-Based Word Indexing Model for Document Summarization, IEEE Transactions on Knowledge and Data Engineering, vol. 25, pp. 1693–1705, 2013.
[21] Greenbacker C.F.: Towards a Framework for Abstractive Summarization of Multimodal Documents. In: Proceedings of the ACL 2011 Student Session, HLT-SS’11, pp. 75–80, Association for Computational Linguistics, 2011.
[22] He Z., Chen C., Bu J., Wang C., Zhang L., Cai D., He X.: Unsupervised document summarization from data reconstruction perspective, Neurocomputing, vol. 157, pp. 356–366, 2015.
[23] Hirao T., Nishino M., Yoshida Y., Suzuki J., Yasuda N., Nagata M.: Summarizing a Document by Trimming the Discourse Tree, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23(11), pp. 2081–2092, 2015.
[24] Hovy E.: Text Summarization. In: R. Mitkov (ed.), The Oxford Handbook of Computational Linguistics, chap. 32, pp. 583–598, Oxford University Press, 2003.
[25] Jafari M., Wang J., Qin Y., Gheisari M., Shahabi A.S., Tao X.: Automatic text summarization using fuzzy inference. In: 2016 22nd International Conference on Automation and Computing (ICAC), pp. 256–260, 2016.
[26] Jaidka K., Chandrasekaran M.K., Rustagi S., Kan M.Y.: Insights from CL-SciSumm 2016: the faceted scientific document summarization shared task, International Journal on Digital Libraries, vol. 19(2-3), pp. 163–171, 2018.
[27] Jaidka K., Yasunaga M., Chandrasekaran M.K., Radev D.R., Kan M.Y.: The CL-SciSumm Shared Task 2018: Results and Key Insights. In: BIRNDL@SIGIR, 2018.
[28] Jha R., Abu-Jbara A., Radev D.: A System for Summarizing Scientific Topics Starting from Keywords. In: Proceedings of 51st Annual Meeting of the ACL, pp. 572–577, 2013.
[29] Kaikhah K.: Automatic text summarization with neural networks. In: 2004 2nd International IEEE Conference on Intelligent Systems, vol. 1, pp. 40–44, 2004.
[30] Kallimani J.S., Srinivasa K.G., Reddy B.E.: Statistical and Analytical Study of Guided Abstractive Text Summarization, Current Science, vol. 110(1), 2016.
[31] Khan A., Salim N.: A Review on Abstractive Summarization Methods, Journal of Theoretical and Applied Information Technology, vol. 59(1), pp. 64–72, 2014.
[32] Kupiec J., Pedersen J.O., Chen F.: A Trainable Document Summarizer. In: SIGIR ’95, pp. 68–73, 1995.
[33] Lauscher A., Glavas G., Eckert K.: University of Mannheim @ CLSciSumm-17: Citation-Based Summarization of Scientific Articles Using Semantic Textual Similarity. In: BIRNDL@SIGIR, 2017.
[34] Le H.T., Le T.M.: An approach to abstractive text summarization. In: 2013 International Conference on Soft Computing and Pattern Recognition (SoCPaR), pp. 371–376, 2013.
[35] Lin C.Y.: Training a selection function for extraction. In: Proceedings of the Eighth International Conference on Information and Knowledge Management, CIKM’99, pp. 55–62, Association for Computing Machinery, 1999.
[36] Lin C.Y.: ROUGE: A Package for Automatic Evaluation of Summaries. In: Text Summarization Branches Out, pp. 74–81, Association for Computational Linguistics, 2004.
[37] Liu F., Flanigan J., Thomson S., Sadeh N.M., Smith N.A.: Toward Abstractive Summarization Using Semantic Representations. In: HLT-NAACL, 2015.
[38] Lloret E., Palomar M.: Analyzing the Use of Word Graphs for Abstractive Text Summarization. In: Advances in Information Mining and Management, 2011.
[39] Lloret E., Romá-Ferri M.T., Palomar M.: COMPENDIUM: a text summarization system for generating abstracts of research papers, Data & Knowledge Engineering, vol. 88, pp. 164–175, 2013.
[40] Louis A., Joshi A.K., Nenkova A.: Discourse indicators for content selection in summarization. In: SIGDIAL Conference, pp. 147–156, 2010.
[41] Louis A., Nenkova A.: Automatically Evaluating Content Selection in Summarization without Human Models. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, vol. 1, pp. 306–314, Association for Computational Linguistics, 2009.
[42] Luhn H.: The Automatic Creation of Literature Abstracts, IBM Journal of Research and Development, vol. 2(2), pp. 159–165, 1958.
[43] Mani I., Bloedorn E., Gates B.: Using cohesion and coherence models for text summarization. In: AAAI 1998, pp. 69–76, 1998.
[44] Mani I., House D., Klein G., Hirschman L., Firmin T., Sundheim B.: The TIPSTER SUMMAC Text Summarization Evaluation. In: Proceedings of the Ninth Conference on European Chapter of the Association for Computational Linguistics, pp. 77–85, Association for Computational Linguistics, 1999.
[45] Mann W., Thompson S.: Rhetorical Structure Theory: Toward a functional theory of text organization, Text, vol. 8(3), pp. 243–281, 1988.
[46] Marcu D.: Improving summarization through rhetorical parsing tuning. In: Sixth Workshop on Very Large Corpora, pp. 206–215, 1998.
[47] Marcu D.: The Theory and Practice of Discourse Parsing and Summarization, MIT Press, 2000.
[48] Marcu D.C.: The Rhetorical Parsing, Summarization, and Generation of Natural Language Texts, Ph.D. thesis, 1998.
[49] Mei Q., Zhai C.: Generating Impact-Based Summaries for Scientific Literature. In: ACL, pp. 816–824, 2008.
[50] Méndez-Cruz C.F., Gama-Castro S., Mejía-Almonte C., Castillo-Villalba M.P., Muñiz-Rascado L., Collado-Vides J.: First steps in automatic summarization of transcription factor properties for RegulonDB: classification of sentences about structural domains and regulated processes. In: Database, 2017.
[51] Mitra M., Singhal A., Buckley C.: Automatic Text Summarization by Paragraph Extraction. In: Intelligent Scalable Text Summarization, 1997.
[52] Mitrovic S., Müller H.: Summarizing Citation Contexts of Scientific Publications. In: Experimental IR Meets Multilinguality, Multimodality, and Interaction – 6th International Conference of the CLEF Association, CLEF, pp. 154–165, 2015.
[53] Mohammad S., Dorr B., Egan M., Hassan A., Muthukrishan P., Qazvinian V., Radev D., Zajic D.: Using Citations to Generate Surveys of Scientific Paradigms. In: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL ’09, pp. 584–592, Association for Computational Linguistics, 2009.
[54] Molina A., Torres-Moreno J.M., SanJuan E., da Cunha I., Sierra Martínez G.E.: Discursive Sentence Compression. In: Proceedings of the 14th International Conference on Computational Linguistics and Intelligent Text Processing, vol. 2, CICLing’13, pp. 394–407, Springer-Verlag, 2013.
[55] Moratanch N., Chitrakala S.: A survey on abstractive text summarization. In: 2016 International Conference on Circuit, Power and Computing Technologies (ICCPCT), pp. 1–7, 2016.
[56] Nenkova A., Passonneau R., McKeown K.: The Pyramid Method: Incorporating human content selection variation in summarization evaluation, ACM Transactions on Speech Language Processing, vol. 4(2), 2007.
[57] Paice C.: Constructing literature abstracts by computer: Techniques and prospects, Information Processing and Management, vol. 26(1), pp. 171–186, 1990.
[58] Pal A.R., Maiti P.K., Saha D.: An Approach to Automatic Text Summarization Using Simplified Lesk Algorithm and Wordnet, International Journal of Control Theory and Computer Modeling, vol. 3, pp. 15–23, 2013.
[59] Parveen D., Strube M.: Integrating Importance, Non-Redundancy and Coherence in Graph-Based Extractive Summarization. In: IJCAI, 2015.
[60] Patil S.R., Mahajan S.: Optimized Summarization of Research Papers as an Aid for Research Scholars Using Data Mining Techniques. In: 2012 International Conference on Radar, Communication and Computing (ICRCC), pp. 243–249, 2012.
[61] Pourvali M., Abadeh M.S.: A new graph based text segmentation using Wikipedia for automatic text summarization, International Journal of Advanced Computer Science and Applications, vol. 3(1), pp. 35–39, 2012.
[62] Qazvinian V., Hassanabadi L.S., Halavati R.: Summarizing text with a genetic algorithm-based sentence extraction, International Journal of Knowledge Management Studies, vol. 2(4), 2008.
[63] Qazvinian V., Radev D.R.: Scientific Paper Summarization Using Citation Summary Networks. In: Proceedings of the 22nd International Conference on Computational Linguistics, vol. 1, COLING’08, pp. 689–696, Association for Computational Linguistics, 2008.
[64] Qazvinian V., Radev D.R.: Identifying Non-Explicit Citing Sentences for Citation-Based Summarization. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL ’10, pp. 555–564, Association for Computational Linguistics, 2010.
[65] Qazvinian V., Radev D.R., Mohammad S.M., Dorr B., Zajic D., Whidby M., Moon T.: Generating Extractive Summaries of Scientific Paradigms, Journal of Artificial Intelligence Research, vol. 46(1), pp. 165–201, 2013.
[66] Rashidghalam H., Taherkhani M., Mahmoudi F.: Text summarization using concept graph and BabelNet knowledge base. In: 2016 Artificial Intelligence and Robotics (IRANOPEN), pp. 115–119, 2016.
[67] Rasim A., Ramiz A.: Evolutionary Algorithm for Extractive Text Summarization, Intelligent Information Management, vol. 1(2), pp. 128–138, 2009.
[68] Rossiello G., Basile P., Semeraro G.: Centroid-based Text Summarization through Compositionality of Word Embeddings. In: Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres, pp. 12–21, Association for Computational Linguistics, 2017.
[69] Rush A.M., Chopra S., Weston J.: A Neural Attention Model for Abstractive Sentence Summarization. In: EMNLP, 2015.
[70] Saggion H., AbuRa’ed A., Ronzano F.: Trainable Citation-enhanced Summarization of Scientific Articles. In: Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL), pp. 175–186, 2016.
[71] Saied H.A., Dugué N., Lamirel J.C.: Automatic summarization of scientific publications using a feature selection approach, International Journal on Digital Libraries, vol. 19, pp. 203–215, 2017.
[72] See A., Liu P.J., Manning C.D.: Get To The Point: Summarization with Pointer-Generator Networks. In: ACL, 2017.
[73] Shi L., Tong H., Tang J., Lin C.: VEGAS: Visual influEnce GrAph Summarization on Citation Networks, IEEE Transactions on Knowledge and Data Engineering, vol. 27, pp. 3417–3431, 2015.
[74] Shi T., Keneshloo Y., Ramakrishnan N., Reddy C.K.: Neural Abstractive Text Summarization with Sequence-to-Sequence Models. In: ArXiv, vol. abs/1812.02303, 2018.
[75] Steinberger J., Jezek K.: Evaluation Measures for Text Summarization. In: Computing and Informatics, vol. 28, pp. 251–275, 2009.
[76] Su Y., Sun S., Xuan Y., Shi L.: Influence Visualization of Scientific Paper through Flow-Based Citation Network Summarization. In: Proceedings of the 2015 IEEE International Conference on Data Mining Workshop (ICDMW), ICDMW ’15, pp. 1652–1655, IEEE Computer Society, 2015.
[77] Suanmali L., Binwahlan M.S., Salim N.: Sentence Features Fusion for Text Summarization Using Fuzzy Logic. In: 2009 Ninth International Conference on Hybrid Intelligent Systems, vol. 1, pp. 142–146, 2009.
[78] Tan J., Wan X., Xiao J.: Abstractive Document Summarization with a Graph-Based Attentional Neural Model. In: ACL, pp. 1171–1181, 2017.
[79] Teufel S., Moens M.: Summarizing Scientific Articles: Experiments with Relevance and Rhetorical Status, Computational Linguistics, vol. 28(4), pp. 409–445, 2002.
[80] Teufel S., Siddharthan A., Tidhar D.: Automatic classification of citation function. In: EMNLP ’06: Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pp. 103–110, Association for Computational Linguistics, 2006.
[81] Vázquez E.V., García-Hernández R.A., Ledeneva Y.: Sentence features relevance for extractive text summarization using genetic algorithms, Journal of Intelligent and Fuzzy Systems, vol. 35, pp. 353–365, 2018.
[82] Wang X., Yoshida Y., Hirao T., Sudoh K., Nagata M.: Summarization Based on Task-Oriented Discourse Parsing, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23(8), pp. 1358–1367, 2015.
[83] Yadav J., Meena Y.K.: Use of fuzzy logic and wordnet for improving performance of extractive automatic text summarization. In: 2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 2071–2077, 2016.
[84] Yan S., Wan X.: SRRRank: leveraging semantic roles for extractive multi-document summarization, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22(12), pp. 2048–2058, 2014.
[85] Zahir al S., Fatima Q., Cenek M.: New Graph-Based Text Summarization Method. In: 2015 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM), pp. 396–401, 2015.
[86] Zhang Z., Ge S.S., He H.: Mutual-reinforcement document summarization using embedded graph based sentence clustering for storytelling, Information Processing and Management, vol. 48(4), pp. 767–778, 2012.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-32dd5b61-ff17-4f7f-8d20-3c43fa89974f