Synthetic Web

Kłopotek, M.

Artykuł - szczegóły

Tytuł artykułu

Synthetic Web

Autorzy

Kłopotek M.

Wybrane pełne teksty z tego czasopisma

http://ipipan.waw.pl/instytut/wydawnictwo/prace-ipi-pan

Identyfikatory

Warianty tytułu

Syntetyczna sieć WWW

Języki publikacji

Abstrakty

We got used to the idea that Web is a collection of interlinked documents containing knowledge from almost all areas of human activity. Recent research points however that the very structure of WWW links may by its own be a rich source of diverse knowledge. Understanding of rules for Web structure and growth may have immense impact in diverse research areas, starting with theory of structure and dynamics of massive graphs, utilitarian search for effective search algorithms on WWW, to the psychological research on formation of social communities and detection of such communities on the Internet. Currently the mainstream research concentrates apparently on construction of simple models helping to explain various basic statistical phenomena observed on the Web. This paper briefly explains major models of the Web and recalls basic contradictions between properties of synthetic Web and the real Web. It also offers a way to resolve one such contradiction concerning the high correlation between PageRank and in-degrees in synthetic Web models, while such a correlation is not present in real Web.

Przyzwyczailiśmy się do traktowania sieci WWW jako zbioru dokumentów zawierających wiedzę prawie ze wszystkich dziedzin naszego życia . Ostatnie badania zdają się wskazywać, że równie głębokim źródłem wiedzy może być sama struktura sieci WWW. Zrozumienie struktury i ewolucji sieci dokumentów na WWW może mieć kolosalne znaczenie dla wielu dziedzin-począwszy od badań naqd strukturą i dynamiką wielkich grafów poprzez utilitarne badania nad efektywnością programów przeczesujących sieć WWW aż po psychologiczne badania nad powstawaniem grup społecznych czy też wykrywaniem takich grup. W chwili obecnej główny nurt badań nad strukturą i rozwojem WWW skupia się na konstruowaniu prostych modeli pozwalających wyjaśnić podstawowe statystyczne własności sieci WWW. W niniejszym artykule przedstawiono ważniejsze koncepcje modelu sieci WWW i wskazano na sprzeczności między miarami rzeczywistej sieci WWW a jej syntetycznymi modelami. Zaproponowano sposób obejścia jednej z takich sprzeczności, jaką jest wysoka korelacja między PageRankiem a stopniem wejściowym stron WWW w sieciach syntetycznych, oraz brakiem takiej korelacji w rzeczywistej sieci WWW.

Słowa kluczowe

graph structure Page rank in-degree power law Web search Web measurement Web models

struktura grafu PageRank stopień wejściowy prawo potęgowe przeszukiwanie WWW pomiary WWW modele sieci WWW

Wydawca

Instytut Podstaw Informatyki PAN

Czasopismo

Prace Instytutu Podstaw Informatyki Polskiej Akademii Nauk

Rocznik

2003

Tom

Nr 954

Strony

1--14

Opis fizyczny

Twórcy

autor

Kłopotek M.

Instytut Podstaw Informatyki Polskiej Akademii Nauk, 01-237 Warszawa, ul. Ordona 21, klopotek@ipipan.waw.pl

Bibliografia

1. L. Adamic and B. Huberman. Power Law distribution of the World Wide Web, Technical Comment on [4], Science, 287, 2000, 2115a.
2. W.Aiello, F. Chung. L.Lu. A random graph model for massive graphs. Proc. ACM Symp. On Theory of computing, pp. 171 -180, 2000.
3. A. Arasu, J. Cho, H. Garcia-Molina. Paepcke, S. Raghavan. Searching the Web ACM Transactions on Internet Technology, 1(1), 2001.2-43.
4. R. Baeza-Yates, F. Saint-Jean and C. Castillo Web Structure, Age, and Page Quality. 2nd International Workshop on Web Dynamics. Honolulu, Hawaii, 7th May 2002,http://www.dcs.bbk.ac.uk/webDyn2 onlin eProceedings.html
5. A. Barabasi and R. Albert. Emergence of Scaling in Random Networks. Science, 286(509), 1999.
6. A. Barabasi. R. Albert, and H. Jeong Scale-free characteristics of random networks : The topology of the World Wide Web. Physica A, 281, 2000, 69-77.
7. A. Barabasi. R.Albert and H. Jeong , Mean-eld theory for scale-free random graphs. Physica A, 272, 1999. 173-187
8. M. Bianchini, M. Gori. F. Scarse: Pager rank. A circuital analysis. 2002.
9. B. Bollobas. Random Graphs, Academic Press, 1990.
10. B. Bollobas. O. Riordan. J. Spencer, and G. Tusnady. The degree sequence of a scale-free random graph process. Randerr: Structures and Algorithms, 18(3), 2001. 279-290.
11. B.Bollobas and 0. Riordan. The diameter of a scale-free random graph, preprint, 2001.
12. S. Brin and L. Page. The anatomy of a large- scale hypertexual Web search engine. In Proceedings of the 7th WWW conference, 1998.
13. A. Broder, R. Kumar, F. Maghoul, P. Raghavan. S. Rajagopalan, R. Stata, Andrew Tomkins, J. Weiner. Graph Structure in the Web. In Proceedings of the 9th WWW Conference, 2000.
14. S. Chien. C. Dwork, R. Kumar, D. Sivakumar: Towards exploiting link evolution.
15. Cohn D., Chang H. (2000): Learning to probabilistically identify authoritative documents. In Proceedings of the 17 th International Conference on Machine Learning, 2000.
16. Cohn, D. and Hofmann, T. (2001). The missing link - a probabilistic model of document content and hypertext connectivity, in T. K. Leen. T. G. Dietterich and V. Tresp (eds), Advances in Neural Information Processing Systems. Vol. 10. http:. citeseer.ni.nec.com cohn01 missing.html
17. S. Dill. R. Kumar, K. McCurley, S. Rajagopalan, D. Sivakumar, and A. Tomkins. Self-Similarity in the Web. In Proceedings of the 27th International Conference on Very Large Databases (VLDB). 2001.
18. Ch. Ding. X.He, P.Husbands, H. Zha, H. Simon: PageRank, HITS and a Unified Framework for Link Analysis Proc. ACM SIGIR 2002
19. D. Eppstein and J. Wang: A Steady State Model for Graph Power Law, 2nd International Workshop on Web Dynamics, Honolulu, Hawaii, 7th May 2002, http://www.dcs.bbk.ac.uk/webDvn2/onlincProceedings.html
20. D. Gibson, J.M. Kleinberg and P. Raghavan. Inferring Web communities from link topology, in Proceedings of the ACM Symposium on Hypertext and Hypermedia, 1998.
21. Google Inc. http://www.google.com
22. Hofmann T. (1999):. Probabilistic latent semantic analysis. In Proceedings of the 15th Conference on Uncertainty in AI, pages 289- 296.
23. J.M. Kleinberg. Authoritative sources in a hyperlinked environment. J. ACM, 48:604(632, 1999.
24. J. Kleinberg, S. Ravi Kumar, P. Raghavan, S. Rajagopalan and A. Tomkins. The Web as a graph: measurements, models and methods. In Proceedings of the 5th Annual International Computing and Combinatorics Conference, 1999.
25. M.A.Kłopotek: Intelligent information retrieval on the Web. in: Szczepaniak, Piotr S.; Segovia, Javier; Kacprzyk, Janusz; Zadeh, Lotfi A. (Eds.): (2003) Intelligent Exploration of the Web Springer-Verlag ISBN 3-7908-1529-2, pp. 57-73
26. R. Kumar, P. Raghavan. S. Rajagopalan, and A. Tomkins. Trawling the Web for Emerging Cyber-Communities. In Proceedings of the 8th WWW Conference, 1999,403-4167
27. R. Kumar, P. Raghavan, S. Rajagopalan. D. Sivakumar, A. Tomkins, and E. Upfal. Stochastic Models for the Web. In Proceedings of the 41st Annual Symposium on the Foundations of Computer Science, 2000.
28. L. Laura, S. Leonardi, G. Caldarelli, P. De Los Rios: A Multi-Layer Model for the Web Graph. 2nd International Workshop on Web Dynamics, Honolulu, Hawaii, 7th May 2002, http://www.dcs.bbk.ac.uk/webDvn2/proceeding s/laura multi laverweb graph.pdf
29. R. Motwani and P. Raghavan. Randomized Algorithms, Cambridge University Press, 1995.
30. M.E.J.Newman, Models of the small world. J. Stat. Phys. 101. 819-841 (2000).
31. A.Y. Ng, A. X. Zheng, M. I. Jordan : Stable Algorithms for Link Analysis. In Proc. 24th Annual Intl. ACM SIGIR Conference. ACM, (2001).
32. L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank Citation Ranking: Bringing order to the Web, Technical Report, Computer Science Department, Stanford University, 1998.
33. G. Pandurangan, P. Raghavan, E. Upfal: Using PageRank to Characterize Web Structure Computer Science Department, Brown University, Box 1910, Providence, RI 02912-1910, USA,Http://citeseer.ni.nec.com/rd'80428809%2C494 734%2C I %2C0.25%2CDownload/http%3 AqS qqSqwww.cs.brown.eduaSqpeopieqSqgopalqS qprankfinal.ps
34. C.H. Papadimitriou. Lecture Notes, UC Berkeley. Available at http://www.cs.berkeley.edu/ christos/games/powerlaw.ps
35. D. J. Watts and S. H. Strogatz, Nature 393, 440 (1998).
36. WT10g collection draft paper. http://www.ted.cmis.csiro.au/TRECWebAvtlOg info.ps.gz

Typ dokumentu

Bibliografia

Identyfikator YADDA

bwmeta1.element.baztech-article-BUJ1-0015-0011