Search results - BazTech

Results found: 3

Searched for:
in keywords: web crawling

Web pages content analysis using browser-based volunteer computing

Turek W., Nawarecki E., Dobrowolski G., Krupa T., Majewski P.

Computer Science

2013

Vol. 14 (2)

215-230

Existing solutions to the problem of finding valuable information on the Web suffer from several limitations, such as simplified query languages, out-of-date information, or arbitrary sorting of results. This paper describes a different approach to the problem, based on distributed processing of Web page content. To provide sufficient performance, it uses browser-based volunteer computing, which requires implementing the text processing algorithms in JavaScript. The paper presents the architecture of the Web page content analysis system, describes the details of its implementation and of the text processing algorithms, and provides test results.

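Detektory zasobów informacji w crawlingu polskiego Internetu na przykładzie przemysłu tłoczniczego [Information resource detectors in crawling the Polish Internet, using the stamping industry as an example]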

Opaliński A., Turek W., Głowacki M., Hojny M.

Czasopismo Techniczne. Mechanika

2011

R. 108, z. 4-M/2

401-408

The article presents the concept of a tool that supports searching for information gathered in Polish Internet resources. The tool relies on a system that collects and indexes data and on dedicated search grammars, which make it possible to find valuable information on the Web more effectively. The advantage of the presented concept over results obtained with the Google search engine is shown for an example from the stamping industry. The article also presents the possibilities for adapting the system to other branches of industry and the evolution of its basic version.

The paper presents the idea of an information extraction and search support system based on Polish Web resources. The system consists of web crawling, data indexing, and dedicated grammar syntax modules, which together improve the quality of the results. As a usage example, a stamping industry use case is presented and compared with Google search results. Possible usage domains, improvements, and directions of evolution are outlined in the conclusion.

“CRAWL.PL” Measuring Statistical and Structural Properties of the Polish Web : Technical Report

Castillo C., Starosta B., Sydow M.

Studia Informatica : systems and information technology

2007

Vol. 1(8)

43-73

This document summarizes the results of an experiment conducted at the Polish-Japanese Institute of Information Technology, Warsaw, Poland, during autumn 2005 and winter 2006. The goal of the project was to collect and analyze a large portion of Polish Web documents in order to characterize the structure and other properties of the “.pl” domain. To the best of the authors' knowledge, it was the first publicly reported research experiment of this kind on the Polish Web. The following sections include information about the downloaded Web pages and Web sites and their characteristics. We also present various statistics concerning hosts and domains, as well as the link structure. Among the results of the experiment are the first data sets representing graphs of the Polish Web, which will be publicly available to other researchers.