Web pages content analysis using browser-based volunteer computing

Turek, W.; Nawarecki, E.; Dobrowolski, G.; Krupa, T.; Majewski, P.

Abstract

Existing solutions to the problem of finding valuable information on the Web suffer from several limitations, such as simplified query languages, out-of-date information, and arbitrary sorting of results. This paper describes a different approach to the problem, based on distributed processing of Web page content. To provide sufficient performance, it uses browser-based volunteer computing, which requires implementing the text processing algorithms in JavaScript. The paper presents the architecture of a Web page content analysis system, describes details of its implementation and of the text processing algorithms, and provides test results.

Keywords

volunteer computing, text processing, web crawling

Publisher

Wydawnictwa AGH

Journal

Computer Science

Year

2013

Volume

Vol. 14 (2)

Pages

215–230

Physical description

Bibliography: 16 items; figures; tables.

Authors

author

Turek W.

wojciech.turek@agh.edu.pl

AGH University of Science and Technology, Krakow, Poland

author

Nawarecki E.

nawar@agh.edu.pl

AGH University of Science and Technology, Krakow, Poland

author

Dobrowolski G.

grzela@agh.edu.pl

AGH University of Science and Technology, Krakow, Poland

author

Krupa T.

tkrupa@fidointelligence.com

Fido Intelligence, Gdansk, Poland

author

Majewski P.

pmajewski@fidointelligence.com

Fido Intelligence, Gdansk, Poland

References

[1] Kunder M.: WorldWideWebSize.com, 20.09.2012
[2] Alpert J., Hajaj N.: We knew the web was big..., http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html, 25.07.2008
[3] Net Applications.com: Search Engine Market Share, http://marketshare.hitslink.com/search-engine-market-share.aspx, 20.09.2012
[4] Krupa T., Majewski P., Kowalczyk B., Turek W.: On-Demand Web Search Using Browser-Based Volunteer Computing. Proc. of Sixth International Conference on Complex, Intelligent and Software Intensive Systems, pp. 184–190, Palermo, Italy, 2012
[5] Brin S., Page L.: The Anatomy of a Large-Scale Hypertextual Web Search Engine. Seventh International World-Wide Web Conference, Brisbane, Australia, 1998
[6] Miller R.C., Bharat K.: SPHINX: A Framework for Creating Personal, Site-Specific Web Crawlers. Proc. of WWW7, Brisbane, Australia, 1998
[7] Shoberg J.: Building Search Applications with Lucene and Nutch. ISBN: 978-1590596876, APress, 2006
[8] Sigurðsson K.: Incremental crawling with Heritrix. Proc. of the 5th International Web Archiving Workshop, 2005
[9] Sarmenta L.F.G., Hirano S.: Bayanihan: Building and Studying Volunteer Computing Systems Using Java. Future Generation Computer Systems, Special Issue on Metacomputing, vol. 15, no. 5/6, Elsevier, 1999
[10] Anderson D.P.: BOINC: A System for Public-Resource Computing and Storage. 5th IEEE/ACM International Workshop on Grid Computing, Pittsburgh, USA, 2004
[11] Korpela E., Werthimer D., Anderson D., Cobb J., Lebofsky M.: SETI@home: massively distributed computing for SETI. Computing in Science & Engineering, 3(1): 78–83, 2001
[12] Cappello F., Djilali S., Fedak G., Herault T., Magniette F., Neri V., Lodygensky O.: Computing on large-scale distributed systems: XtremWeb architecture, programming models, security, tests and convergence with grid. Future Generation Computer Systems, 21(3): 417–437, 2005
[13] Buyya R., Ma T., Safavi-Naini R., Steketee C., Susilo R.: Building computational grids with Apple's Xgrid middleware. Proc. of Australasian workshops on Grid computing and e-research, pp. 47–54, 2006
[14] Venkat J.: Grid computing in the enterprise with the UD MetaProcessor. Proc. of Second International Conference on Peer-to-Peer Computing, 2002
[15] Gears: Gears project, http://webcomputing.iit.bme.hu/, 4.12.2011
[16] Simonarson S.: Browser Based Distributed Computing. TJHSST Senior Research Project, Computer Systems Lab, 2010

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-ba7cd269-7b84-4ca5-b6e9-24fc77323d2c