Juicer - a data mining approach to information extraction from the WWW

Masłowska, I.; Weiss, D.

Abstract

We present a novel approach to automatic text mining on the World Wide Web. Because the rapid growth of the WWW creates a need for new, more powerful information extraction tools, we designed and implemented a system that adapts techniques originally introduced in the field of data mining. We believe that similar systems, which are usually based on machine learning or natural language processing methods, can prove ineffective when dealing with very large numbers of hypertext documents of differing structure and subject. Moreover, such systems tend to treat HTML documents as plain text, ignoring the additional information contained in their markup tags.

Keywords

WWW, World Wide Web, data mining

Publisher

Wydawnictwo Politechniki Poznańskiej

Journal

Foundations of Computing and Decision Sciences

Year

2000

Volume

Vol. 25, No. 2

Pages

67-87

Physical description

Bibliography: 13 items

Authors

author

Masłowska I.

author

Weiss D.

Institute of Computing Science, Poznań University of Technology, Piotrowo 3A, 60-965 Poznań, Poland

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BPP1-0017-0089