Automatyczne tworzenie podsumowań tekstów metodami algebraicznymi

Gramacki, J.; Gramacki, A.

Article - details

Article title

Automatyczne tworzenie podsumowań tekstów metodami algebraicznymi

Authors

Gramacki J., Gramacki A.

Title variants

Automatic text summarization using algebraic approach

Abstracts

The large number of documents returned (for example, by various Internet search engines) means that we are often forced to browse them in a time-consuming manner in order to verify the relevance of the results. When the documents are long, the time needed to review them increases considerably. It could be shortened substantially if sensible summaries (abstracts) could be generated automatically. In the article we discuss selected algebraic methods for automatically extracting from a text its most important keywords and its most important sentences.

Text summarization is a real practical problem because of the explosion in the volume of textual information available today. To address it, text summarization systems are created that extract brief information from a given text. By looking only at the summary, the end user can decide whether the document is of interest. Summaries can take two fundamental forms. First, extractive summarization collects important sentences from the input text to constitute the summary. Second, abstractive summarization tries to capture the main concepts of the text and then generates new sentences that summarize it. At present, however, the latter approach still seems to need extensive work before it becomes really useful. A summary can be extracted from a single document or from multiple documents. In the paper the authors build summaries of one document only. Extending the approach to multi-document summaries is straightforward when a set of semantically uniform texts is summarized. Summaries may also be categorized as generic or query-based. In the first case, the generated summaries contain the main topics of a document. In the second case, the summaries contain the sentences that are related to the given queries. The paper builds generic summaries. Summarization systems use different approaches to determine important sentences. Here a semantically oriented approach is used, based on a method known as Latent Semantic Analysis (LSA). LSA is an algebraic method that extracts the meaning of words and the similarity of sentences from information about how the words are used in context. It uses Singular Value Decomposition (SVD) to find semantically similar words and sentences. Using the results of SVD, the authors try to select the best sentences (those that constitute the best summary of the text). The paper is organized as follows. Section 2 formulates the problem. Section 3 shows how a document may be represented in a useful algebraic format; the so-called Term-Sentence Matrix (TSM) is used. The authors also point out some preliminary tasks that must be performed for the further analysis to succeed. Subsection 3.2 briefly presents the idea of LSA as based on SVD. The final Section 4 gives two examples of text summaries built for both Polish and English texts. The two methods used differ slightly from each other. The extracted key words and key sentences appear to be adequate content-related summaries of the input texts.
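
The LSA procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the raw term-frequency weighting, the power-iteration stand-in for a library SVD routine, the selection rule (one sentence per leading right singular vector, in the spirit of Gong and Liu [4]) and all function names are assumptions made for illustration. The preliminary text-processing tasks mentioned in the abstract are omitted.

```python
import math
import re


def term_sentence_matrix(sentences):
    """Build a term-sentence matrix: rows are terms, columns are sentences."""
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    terms = sorted({t for tokens in tokenized for t in tokens})
    row = {t: i for i, t in enumerate(terms)}
    matrix = [[0.0] * len(sentences) for _ in terms]
    for j, tokens in enumerate(tokenized):
        for t in tokens:
            matrix[row[t]][j] += 1.0  # raw term frequency (assumed weighting)
    return terms, matrix


def top_singular_pairs(a, k, iters=500):
    """Approximate the k largest singular values and right singular vectors.

    Power iteration on G = A^T A with deflation; an assumed stand-in for a
    library SVD routine, adequate only for small illustrative inputs.
    """
    m, n = len(a), len(a[0])
    g = [[sum(a[r][i] * a[r][j] for r in range(m)) for j in range(n)]
         for i in range(n)]
    pairs = []
    for _pair in range(min(k, n)):
        start = [1.0 / (i + 1) for i in range(n)]  # fixed, non-uniform start
        scale = math.sqrt(sum(x * x for x in start))
        v = [x / scale for x in start]
        for _step in range(iters):
            w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # the remaining spectrum is numerically zero
            v = [x / norm for x in w]
        gv = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * gv[i] for i in range(n))  # eigenvalue of A^T A
        pairs.append((math.sqrt(max(lam, 0.0)), v))
        for i in range(n):  # deflate so the next pass finds the next vector
            for j in range(n):
                g[i][j] -= lam * v[i] * v[j]
    return pairs


def select_sentences(a, k):
    """Pick one not-yet-chosen sentence per leading right singular vector."""
    chosen = []
    for _sigma, v in top_singular_pairs(a, k):
        ranked = sorted(range(len(v)), key=lambda j: -abs(v[j]))
        for j in ranked:
            if j not in chosen:
                chosen.append(j)
                break
    return chosen


def select_keywords(terms, a, n_words):
    """Rank terms by their weight in the leading left singular direction."""
    _sigma, v = top_singular_pairs(a, 1)[0]
    # A v equals sigma * u, so |A v| ranks terms exactly as |u| would.
    scores = [abs(sum(x * y for x, y in zip(row, v))) for row in a]
    ranked = sorted(range(len(terms)), key=lambda i: -scores[i])
    return [terms[i] for i in ranked[:n_words]]


if __name__ == "__main__":
    text = [  # hypothetical input, not taken from the paper
        "Search engines return many long documents.",
        "Reading long documents takes a lot of time.",
        "Automatic summaries shorten the reading time.",
        "Summaries are built from the most important sentences.",
    ]
    terms, tsm = term_sentence_matrix(text)
    print("key sentences:", [text[j] for j in select_sentences(tsm, 2)])
    print("key words:", select_keywords(terms, tsm, 3))
```

For texts of realistic size, a library SVD routine would normally replace `top_singular_pairs`; the selection functions would stay the same.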

Keywords

automatic summarization; latent semantics of documents; SVD transformation

generic text summarization; sentence extraction; latent semantic analysis; singular value decomposition

Publisher

Wydawnictwo PAK

Journal

Pomiary Automatyka Kontrola

Year

2011

Volume

Vol. 57, no. 7

Pages

751-755

Physical description

Bibliography: 12 items, figures, tables, formulas

Contributors

author

Gramacki J.

author

Gramacki A.

Uniwersytet Zielonogórski, Instytut Informatyki i Elektroniki, ul. Licealna 9, 65-417 Zielona Góra, J.Gramacki@iie.uz.zgora.pl

Bibliography

[1] Berry M. W., Dumais S. T., O’Brien G. W.: Using Linear Algebra for Intelligent Information Retrieval, SIAM Review 37 (1995), pp. 573-595.
[2] Das D., Martins A. F. T.: A Survey on Automatic Text Summarization, Literature Survey for the Language and Statistics II course at CMU, November, 2007.
[3] Furnas G. W., Deerwester S., Dumais S. T., et al.: Information Retrieval Using a Singular Value Decomposition Model of Latent Semantic Structure, SIGIR’88 Proceedings of the 11th annual international ACM SIGIR conference on Research and development in information retrieval, 1988.
[4] Gong Y., Liu X.: Generic Text Summarization Using Relevance Measure and Latent Semantic Analysis, SIGIR’01 Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, 2001.
[5] Gramacki A., Gramacki J.: Usługi biznesowe w SQL Server 2008. Omówienie oraz przykład zastosowania w przemyśle, Informatyka - sztuka czy rzemiosło. KNWS’2010: materiały 7. konferencji naukowej. Świnoujście, 2010, pp. 101-104.
[6] Kelly D. A.: Open for Business, Oracle Magazine, January/February 2011.
[7] Manning C. D., Raghavan P., Schütze H.: Introduction to Information Retrieval, Cambridge University Press, 2008.
[8] McCargar V.: Statistical Approaches to Automatic Text Summarization, Bulletin of the American Society for Information Science and Technology Volume 30, Issue 4, pages 21-25.
[9] Steinberger J., Jezek K.: Text Summarization: An Old Challenge and New Approaches. Foundations of Computational Intelligence (6) 2009: 127-149.
[10] Zha H.: Generic summarization and keyphrase extraction using mutual reinforcement principle and sentence clustering, SIGIR’02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, 2002.
[11] http://tartarus.org/~martin/PorterStemmer/index.html
[12] http://www.cs.put.poznan.pl/dweiss/xml/projects/lametyzator/index.xml

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BSW4-0103-0016