Article title

“CRAWL.PL” Measuring Statistical and Structural Properties of the Polish Web : Technical Report

Content
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
This document summarizes the results of an experiment conducted at the Polish-Japanese Institute of Information Technology, Warsaw, Poland, during autumn 2005 and winter 2006. The goal of the project was to collect and analyze a large portion of Polish Web documents in order to characterize the structure and other properties of the “.pl” domain. To the best of the authors' knowledge, it was the first publicly reported research experiment of this kind on the Polish Web. The following sections include information about downloaded Web pages, Web sites, and their characteristics. We also present various statistics concerning hosts and domains, as well as the link structure. Among the results of the experiment are the first data sets representing graphs of the Polish Web, which will be publicly available to other researchers.
Keywords
Year
Volume
Pages
43--73
Physical description
Bibliography: 17 items, figures, tables, charts.
Authors
author
  • University of Rome „La Sapienza”, currently at Yahoo! Research Barcelona, Italy
author
  • Polish-Japanese Institute of Information Technology, ul. Koszykowa 86, 02-008 Warsaw, Poland
author
  • Polish-Japanese Institute of Information Technology, ul. Koszykowa 86, 02-008 Warsaw, Poland
Bibliography
  • 1. http://www.netsprint.pl/serwis/.
  • 2. http://www.users.pjwstk.edu.pl/~msyd/PolishWebDatasets.html.
  • 3. Baeza-Yates R., Castillo C., (Nov 2000), Characterizing the chilean web, Chilean Computer Science Congress, Santiago, Chile.
  • 4. Baeza-Yates R., Lalanne F., (2004), Characterization of the Korean Web, Technical report.
  • 5. Baeza-Yates R., Castillo C., Efthimiadis E., (2006), Characterization of national web domains, ACM TOIT.
  • 6. Baeza-Yates R., Castillo C., Efthimiadis E., (2004), Comparing the characteristics of the chilean and the greek web, Technical report.
  • 7. Boldi P., Codenotti B., Santini M., Vigna S., (2002), Structural properties of the african web, In: Proceedings of the 11th International WWW Conference (11), Honolulu, Hawaii, USA.
  • 8. Broder A., Kumar R., Maghoul F., Raghavan P., Rajagopalan S., Stata R., Tomkins A., Wiener J., (2000), Graph structure in the web, In: Proceedings of the 9th WWW Conference.
  • 9. Gyongyi Z., Garcia-Molina H., (2005), Web spam taxonomy, In: First International Workshop on Adversarial Information Retrieval on the Web.
  • 10. Kamvar S., Haveliwala T., Manning C., Golub G., (2003), Exploiting the block structure of the web for computing pagerank, In: Stanford University Technical Report.
  • 11. Kleinberg J., Kumar R., Raghavan P., Rajagopalan S., Tomkins A., (1999), The web as a graph: measurements, models and methods, In: Proceedings of the 5th Annual International Computing and Combinatorics Conference.
  • 12. Kumar R., Raghavan P., Rajagopalan S., Sivakumar D., Tomkins A., Upfal E., (2000), The Web as a graph, In: Proc. 19th ACM SIGACT-SIGMOD-SIGART Symp. Principles of Database Systems, PODS, pages 1-10. ACM Press, 15-17.
  • 13. Huberman B., Adamic L., (2000), Power law distribution of the world wide web, Technical comment. Science, 287.
  • 14. Baeza-Yates R., Castillo C., (2006), Relationship between links and trade, In: Proceedings of the 15th World Wide Web Conference (posters), Edinburgh, Scotland, May 2006, pages 927-928.
  • 15. Fetterly D., Manasse M., Najork M., (2004), On the evolution of clusters of near-duplicate web pages, Journal of Web Engineering, 2(4):228-246.
  • 16. Leskovec J., Kleinberg J., Faloutsos C., (2005), Graphs over time: densification laws, shrinking diameters and possible explanations, In: KDD’05: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pp. 177-187, New York, NY, USA.
  • 17. Saint-Jean F., Baeza-Yates R., Castillo C., (2003), Web dynamics, structure, page quality, In: Proceedings of the 12th International WWW Conference, Workshop on Algorithms and Models for the Web Graph.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-d51b6851-28c9-4ae1-b0df-ec933b768502