Agregacja danych tekstowych na przykładzie systemu informacji prasowej

Dubel, B.; Kasprowski, P.

Article - details

Article title

Agregacja danych tekstowych na przykładzie systemu informacji prasowej

Authors

Dubel B., Kasprowski P.

Title variants

Aggregation of textual data on example of press information system

Abstracts

The volume of textual information available on the Internet is becoming an increasingly serious problem, because automatic analysis of such data is difficult. Typical examples of large text databases are web services that present current press information. Because there are many such services, the same or very similar information is repeated across them. This is why so-called "aggregators", which collect and preprocess information from different services, are becoming increasingly popular. This paper presents one such aggregator: a system that gathers information from many services in one place, parses, analyzes and classifies it, and then generates various kinds of statistics from it.

Keywords

textual data; text aggregation; text analysis; text parsing

Publisher

Wydawnictwo Politechniki Śląskiej

Journal

Studia Informatica

Year

2011

Volume

Vol. 32, no. 2B

Pages

301-316

Physical description

Bibliography: 7 items.

Contributors

author

Dubel B.

author

Kasprowski P.

Politechnika Śląska, Instytut Informatyki, ul. Akademicka 16, 44-100 Gliwice, Poland, p.kasprowski@polsl.pl

Bibliography

1. RSS (Really Simple Syndication), http://www.wikipedia.pl/wiki/RSS.
2. XPath language specification, http://www.w3.org/TR/xpath/.
3. World Wide Web Consortium, http://www.w3.org.
4. Salton G.: Developments in Automatic Text Retrieval. Science, Vol. 253, pp. 974-979.
5. Weiss S.M., White B., Apte C.: Lightweight Document Clustering. IBM T.J. Watson Research Center, 2000.
6. Kłopotek M. A.: Inteligentne wyszukiwarki internetowe. Akademicka Oficyna Wydawnicza EXIT, Warsaw 2001.
7. Deerwester S., Dumais S.T., Furnas G.W., Landauer T.K., Harshman R.: Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science, Vol. 41, 1990.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BSL6-0015-0045