Identifiers
Title variants
Publication languages
Abstracts
Crawling systems available on the market are usually closed solutions dedicated to a particular kind of task. There is, however, a significant group of users who require an all-in-one studio, not only for executing and running Internet robots but also for graphically defining, redefining, and recomposing crawlers according to dynamically changing requirements and use cases. The Cassiopeia framework addresses this need. A crucial factor in its efficiency and scalability is the concurrency model applied. One promising model is the staged event-driven architecture (SEDA), which offers useful benefits such as splitting an application into separate stages connected by event queues; this is of particular interest given Cassiopeia's assumptions regarding crawler (re)composition. The goal of this paper is to present the idea and proof-of-concept (PoC) implementation of the Cassiopeia framework, with special attention paid to its crucial architectural element: the design, implementation, and application of the staged event-driven architecture.
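The idea of stages connected by event queues, as mentioned in the abstract, can be illustrated with a minimal sketch. It is not the Cassiopeia implementation: the `Stage` class, the `fetch`, `extract_links`, and `run_pipeline` functions, and the in-memory `PAGES` table (which stands in for real HTTP requests) are all hypothetical names introduced here for illustration, and each stage is assumed to run a single worker thread.

```python
import queue
import re
import threading

STOP = object()  # sentinel event that tells a stage to shut down

# Hypothetical in-memory "web" so the sketch runs without network access.
PAGES = {
    "http://example.org/a": "<a href='http://example.org/b'>b</a>",
    "http://example.org/b": "no links here",
}


class Stage:
    """One stage: an input event queue, a handler, and one worker thread."""

    def __init__(self, name, handler, downstream=None):
        self.handler = handler
        self.downstream = downstream
        self.inbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, name=name)

    def start(self):
        self._thread.start()

    def join(self):
        self._thread.join()

    def _run(self):
        while True:
            event = self.inbox.get()
            if event is STOP:
                # Propagate shutdown so later stages drain and stop in order.
                if self.downstream is not None:
                    self.downstream.inbox.put(STOP)
                return
            for out in self.handler(event):
                if self.downstream is not None:
                    self.downstream.inbox.put(out)


def fetch(url):
    # Assumption: a lookup in PAGES replaces a real download.
    yield url, PAGES.get(url, "")


def extract_links(event):
    url, html = event
    yield url, re.findall(r"href='([^']*)'", html)


def run_pipeline(urls):
    results = {}

    def store(event):
        url, links = event
        results[url] = links
        return ()  # final stage emits no further events

    sink = Stage("store", store)
    parser = Stage("parse", extract_links, downstream=sink)
    fetcher = Stage("fetch", fetch, downstream=parser)
    stages = [fetcher, parser, sink]
    for stage in stages:
        stage.start()
    for url in urls:
        fetcher.inbox.put(url)
    fetcher.inbox.put(STOP)
    for stage in stages:
        stage.join()
    return results
```

Because the stages only communicate through their queues, a stage can in principle be replaced or inserted by rewiring `downstream`, which is the property that makes this style of architecture relevant to recomposing crawlers.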
Publisher
Journal
Year
Volume
Pages
645–665
Physical description
Bibliography: 19 items, figures, charts, tables.
Authors
author
- AGH University of Science and Technology, Krakow, Poland
author
- AGH University of Science and Technology, Krakow, Poland
author
- AGH University of Science and Technology, Krakow, Poland
Bibliography
- [1] Arasu A., Cho J., Garcia-Molina H., Paepcke A., Raghavan S.: Searching the Web. ACM Trans. Internet Technol., vol. 1(1), pp. 2–43, 2001. ISSN 1533-5399. http://dx.doi.org/10.1145/383034.383035.
- [2] Bharat K., Broder A.: A technique for measuring the relative size and overlap of public Web search engines. Comput. Netw. ISDN Syst., vol. 30(1-7), pp. 379–388, 1998. ISSN 0169-7552. http://dx.doi.org/10.1016/S0169-7552(98)00127-5.
- [3] Boldi P., Codenotti B., Santini M., Vigna S.: UbiCrawler: a scalable fully distributed web crawler. Softw. Pract. Exper., vol. 34(8), pp. 711–726, 2004. ISSN 0038-0644. http://dx.doi.org/10.1002/spe.587.
- [4] Broder A., Kumar R., Maghoul F., Raghavan P., Rajagopalan S., Stata R., Tomkins A., Wiener J.: Graph structure in the Web. Computer Networks, vol. 33(1–6), pp. 309–320, 2000. ISSN 1389-1286. http://dx.doi.org/10.1016/S1389-1286(00)00083-9.
- [5] Cho J., Garcia-Molina H.: Parallel crawlers. In: Proceedings of the 11th international conference on World Wide Web, pp. 124–135. ACM Press, 2002.
- [6] Donald K., Sampaleanu C., Johnson R., Hoeller J.: Spring framework reference documentation.
- [7] Gulli A., Signorini A.: The indexable web is more than 11.5 billion pages. In: Special interest tracks and posters of the 14th international conference on World Wide Web, WWW ’05, pp. 902–903. ACM, New York, NY, USA, 2005. ISBN 1-59593-051-5. http://dx.doi.org/10.1145/1062745.1062789.
- [8] Karger D., Lehman E., Leighton T., Panigrahy R., Levine M., Lewin D.: Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web. In: Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, STOC ’97, pp. 654–663. ACM, New York, NY, USA, 1997. ISBN 0-89791-888-6. http://dx.doi.org/10.1145/258533.258660.
- [9] Karger D., Sherman A., Berkheimer A., Bogstad B., Dhanidina R., Iwamoto K., Kim B., Matkins L., Yerushalmi Y.: Web caching with consistent hashing. Computer Networks, vol. 31(11–16), pp. 1203–1213, 1999. ISSN 1389-1286. http://dx.doi.org/10.1016/S1389-1286(99)00055-9.
- [10] Kluczny M.: SEDA as an architecture for efficient, distributed and concurrent systems for web crawling purposes. Master’s thesis, Department of Computer Science, University of Science and Technology, 2012.
- [11] Menczer F., Pant G., Srinivasan P., Ruiz M. E.: Evaluating topic-driven web crawlers. In: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 241–249. ACM, 2001.
- [12] Michelson B. M.: Event-driven architecture overview. Patricia Seybold Group, vol. 2, 2006.
- [13] Pai V. S., Druschel P., Zwaenepoel W.: Flash: an efficient and portable web server. In: Proceedings of the annual conference on USENIX Annual Technical Conference, ATEC ’99, pp. 15–15. USENIX Association, Berkeley, CA, USA, 1999. http://dl.acm.org/citation.cfm?id=1268708.1268723.
- [14] Pariag D., Brecht T., Harji A., Buhr P., Shukla A., Cheriton D. R.: Comparing the performance of web server architectures. ACM SIGOPS Operating Systems Review, vol. 41(3), pp. 231–243, 2007.
- [15] Shkapenyuk V., Suel T.: Design and implementation of a high-performance distributed Web crawler. In: Data Engineering, 2002. Proceedings. 18th International Conference on, pp. 357–368. 2002. ISSN 1063-6382. http://dx.doi.org/10.1109/ICDE.2002.994750.
- [16] Singh A., Srivatsa M., Liu L., Miller T.: Apoidea: A Decentralized Peer-to-Peer Architecture for Crawling the World Wide Web. In: J. Callan, F. Crestani, M. Sanderson, eds., Distributed Multimedia Information Retrieval, Lecture Notes in Computer Science, vol. 2924, pp. 126–142. Springer Berlin Heidelberg, 2004. ISBN 978-3-540-20875-4. http://dx.doi.org/10.1007/978-3-540-24610-7_10.
- [17] Welsh M.: A Retrospective on SEDA. 2010.
- [18] Welsh M., Culler D., Brewer E.: SEDA: an architecture for well-conditioned, scalable internet services. SIGOPS Oper. Syst. Rev., vol. 35(5), pp. 230–243, 2001. ISSN 0163-5980. http://dx.doi.org/10.1145/502059.502057.
- [19] Wlodarczyk L.: Kassiopeia – distributed and pluginnable crawling system. Master’s thesis, Department of Computer Science, University of Science and Technology, 2011.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-5f04cbf3-fb36-4520-a4e1-f6feac64c640