Search results - BazTech

1

A k-Nearest Neighbors Method for Classifying User Sessions in E-Commerce Scenario

Suchacka G., Skolimowska-Kulig M., Potempa A.

Journal of Telecommunications and Information Technology | 2015 | no. 3 | pp. 64-69

EN

This paper addresses the problem of classifying user sessions in an online store into two classes: buying sessions (during which a purchase confirmation occurs) and browsing sessions. Because interactions connected with a purchase confirmation are typically completed at the end of a user session, some information describing an active session can be observed earlier and used to assess the probability of a purchase. The authors formulate the prediction of buying sessions in a Web store as a supervised classification problem with two target classes, depending on whether a purchase transaction is finalized in the session, and a feature vector of variables describing the user session. The presented approach uses k-Nearest Neighbors (k-NN) classification. A k-NN classifier was built from historical data obtained from the log files of an online bookstore, and its efficiency was verified for different neighborhood sizes. An 11-NN classifier was the most effective in terms of both buying-session predictions and overall predictions, achieving a sensitivity of 87.5% and an accuracy of 99.85%.
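The classification step described in this abstract can be illustrated with a minimal k-NN sketch. The two features (pages viewed, items in cart), the toy sessions, and the function names are assumptions made for illustration; the abstract does not list the actual feature vector, and this is not the authors' implementation.

```python
from collections import Counter
import math


def knn_predict(train, query, k):
    """Return the majority label among the k training sessions closest to query.

    train is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def sensitivity_and_accuracy(y_true, y_pred, positive="buy"):
    """Sensitivity = TP / (TP + FN) for the positive class; accuracy = correct / all."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return sensitivity, correct / len(y_true)


# Hypothetical toy data: (pages viewed, items in cart) -> session class.
train = [
    ((3, 0), "browse"), ((5, 0), "browse"), ((4, 1), "browse"),
    ((12, 2), "buy"), ((15, 3), "buy"), ((10, 2), "buy"),
]
```

Sensitivity is reported alongside accuracy because buying sessions are typically a small minority, so a classifier that labels every session as browsing could still score a high accuracy.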

2

Towards Finding Scholarly Articles in Internet Using Hadoop MapReduce with Oozie Workflow

Jurkiewicz J., Nowiński A.

Challenges of Modern Technology | 2013 | Vol. 4, no. 4 | pp. 3-6

EN

This article focuses on new methods for the automatic processing and analysis of scientific papers. It covers the very first part of this task: the discovery and harvesting of scientific publications from the Internet. The article concentrates on discovering and analyzing HTML documents in order to identify publication resources. Using data from the Common Crawl project makes it possible to operate on a large subset of web pages without performing an expensive crawl of the WWW. We present methods for automatically identifying pages that describe scholarly documents on the WWW using HTML meta headers. The presented set of rules, applied to the data, achieves reasonable quality. A system based on these tools is also presented. It allows easy operation and transfer of the output to the COntent ANalysis SYStem (CoAnSys), a processing and analysis system developed at ICM. To achieve this goal, a set of MapReduce tasks running on Hadoop and Oozie was used. The quality and efficiency of the described rules are discussed. Finally, future challenges for our system are presented.
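A rule of the kind mentioned in this abstract, identifying scholarly pages from HTML meta headers, might look like the sketch below. The abstract does not list the rules, so the prefixes checked here (`citation_` and `dc.`, which some publishers use for bibliographic metadata) and the names `MetaCollector` and `looks_scholarly` are assumptions for illustration only.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect the lowercased name attribute of every <meta> tag in a page."""

    def __init__(self):
        super().__init__()
        self.meta_names = set()

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            name = dict(attrs).get("name")
            if name:
                self.meta_names.add(name.lower())


# Assumed rule set: prefixes commonly used for bibliographic meta tags.
SCHOLARLY_PREFIXES = ("citation_", "dc.")


def looks_scholarly(html):
    """Return True if any meta tag name starts with one of the assumed prefixes."""
    parser = MetaCollector()
    parser.feed(html)
    return any(name.startswith(SCHOLARLY_PREFIXES) for name in parser.meta_names)
```

In a MapReduce setting, a check like this would plausibly run in the map phase over each crawled page, emitting only the pages that match.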

3

Wyszukiwanie informacji z uwzględnieniem danych dotyczących lokalizacji [Information retrieval using location data]

Kotulla A.

Studia Informatica | 2011 | Vol. 32, no. 2B | pp. 73-83

PL

Znacząca część zapytań realizowanych przez wyszukiwarki internetowe dotyczy wyszukiwania lokalnego. W artykule omówiona została problematyka wyszukiwania informacji uwzględniających lokalizację. Zaproponowano system dla zasobów internetowych w języku polskim, umożliwiający pozyskiwanie informacji uwzględniających lokalizację.

EN

A significant share of the queries handled by web search engines concerns local search. This paper discusses the problem of retrieving information that takes location into account. A system for web resources in the Polish language is proposed that makes it possible to obtain such location-aware information.

4

Asymptotic trust algorithm: extension for reputation systems in online auctions

Leszczyński K., Zakrzewicz M.

Control and Cybernetics | 2011 | Vol. 40, no. 3 | pp. 651-666

EN

Online auctions have become a big business and the number of auction site users is growing rapidly. These virtual marketplaces give traders many opportunities to find a contracting party. However, the lack of physical contact between users decreases the degree of trust. Auction portals require an efficient mechanism for building trust between participants, yet most of them provide only simple participation counts as a reputation rating. Moreover, a single opinion has virtually no effect on a big online store that already has many reputation points, so buyers are very hesitant to give negative feedback for fear of retaliation. Consequently, almost no negative feedback is provided. In this paper we introduce a new trust system called the Asymptotic Trust Algorithm (ATA), which prevents many fraud attempts and is still simple and easy to understand for most users. Our new method can be applied in addition to the participation count systems currently used by Allegro, eBay and most other online auction sites, because it requires no information other than positive, negative or neutral feedback on transactions. Most importantly, ATA encourages users to submit unbiased comments, regardless of the number of previous transactions.

5

Analiza zasobów internetowych na podstawie struktury połączeń [Analysis of Internet resources based on link structure]

Kotulla A.

Studia Informatica | 2010 | Vol. 31, no. 2B | pp. 303-312

PL

Opracowanie omawia możliwości analizy zasobów sieci World Wide Web na podstawie struktury połączeń. Przedstawione są dwa najważniejsze podejścia, wyszukiwanie zasobów w całej sieci oraz wyszukiwanie informacji w zależnej od zapytania części sieci. Wskazano nowe obszary zastosowań dla metod analizy struktury połączeń.

EN

This study discusses the possibilities of analyzing the resources of the World Wide Web on the basis of its link structure. The two most important approaches are presented: searching for resources in the entire network and searching for information in a query-dependent part of the network. New application areas for link structure analysis are indicated.

6

Mining indirect association rules for web recommendation

Kazienko P.

International Journal of Applied Mathematics and Computer Science | 2009 | Vol. 19, no. 1 | pp. 165-186

EN

Classical association rules, here called 'direct', reflect relationships between items that relatively often co-occur in common transactions. In the web domain, items correspond to pages and transactions to user sessions. The main idea of the new approach is to discover indirect associations between pages that rarely occur together but for which there are other, 'third' pages, called transitive, with which both appear relatively frequently. Two types of indirect association rules are described in the paper: partial indirect associations and complete ones. The former respect single transitive pages, while the latter cover all existing transitive pages. The presented IDARM* Algorithm extracts complete indirect association rules together with their important measure, confidence, using pre-calculated direct rules. Direct and indirect rules are joined into one set of complex association rules, which may be used for the recommendation of web pages. The experiments performed revealed the usefulness of indirect rules for extending a typical recommendation list. Indirect rules also deliver new knowledge not available from direct ones. The relation between ranking lists created on the basis of direct association rules and the hyperlinks existing on web pages is also examined.
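The distinction between direct and partial indirect associations can be made concrete with a small sketch. The abstract does not give the confidence formulas, so the definition of partial indirect confidence used here (the product of the two direct confidences through a single transitive page) is an assumption for illustration and may differ from the measure used by IDARM*; the sessions and function names are hypothetical.

```python
def confidence(sessions, a, b):
    """Direct confidence of a -> b: share of sessions containing a that also contain b."""
    with_a = [s for s in sessions if a in s]
    if not with_a:
        return 0.0
    return sum(b in s for s in with_a) / len(with_a)


def partial_indirect_confidence(sessions, a, b, transitive):
    """Assumed measure: confidence(a -> t) * confidence(t -> b) for one transitive page t."""
    return confidence(sessions, a, transitive) * confidence(sessions, transitive, b)


# Hypothetical sessions: pages A and B never co-occur, but both co-occur with T.
sessions = [{"A", "T"}, {"A", "T"}, {"T", "B"}, {"T", "B"}, {"A"}]
```

In this toy data the direct rule A -> B has zero confidence, while the indirect rule through T does not, which is the kind of additional knowledge the abstract attributes to indirect rules.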

7

Przegląd metod ekstrakcji wiedzy w serwisach WWW - Web Structure Mining [A review of knowledge extraction methods for websites - Web Structure Mining]

Szełemej Ł.

Metody Informatyki Stosowanej | 2008 | no. 2 (Vol. 15) | pp. 109-116

EN

This article presents the Web Structure Mining method, which is based on data extracted from the Web. For its analysis the method uses the hyperlinks connecting different webpages. The Internet is treated as a special graph whose vertices are webpages and whose edges are the links between them. This data is used to create easy-to-use structures of the Internet in which query processing is easier, faster and more efficient.
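The graph view described in this abstract, with pages as vertices and hyperlinks as directed edges, can be sketched as an adjacency structure. The function names and page identifiers are hypothetical, and counting incoming links is only one simple example of a structure derived from the graph; the abstract does not say which structures the article builds.

```python
def build_link_graph(edges):
    """Build a directed graph {page: set of pages it links to} from (source, target) pairs."""
    graph = {}
    for source, target in edges:
        graph.setdefault(source, set()).add(target)
        graph.setdefault(target, set())  # keep pages that have no outgoing links
    return graph


def in_degrees(graph):
    """Count incoming links per page, a simple link-based popularity signal."""
    counts = {page: 0 for page in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts


# Hypothetical crawl result: a -> b, c -> b, b -> a.
graph = build_link_graph([("a", "b"), ("c", "b"), ("b", "a")])
```

Precomputing such per-page statistics is one way a link graph can make later queries cheaper to answer.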

8

Effective Prediction of Web User Behaviour with User-Level Models

Dembczyński K., Kotłowski W.

Fundamenta Informaticae | 2008 | Vol. 89, no. 2-3 | pp. 189-206

EN

The paper concerns the problem of predicting the behaviour of web users on the basis of real historical data, which is an important issue in web mining. The research reported here was conducted while the authors participated in the international ECML/PKDD 2007 Discovery Challenge competition, Track 1. The results presented here were the winning solution to the contest. We describe the contest tasks and the real industrial datasets, recording the behaviour of a sample of Polish Web users, on which our experiments were performed. We present the whole experimental process, from data preprocessing, through exploratory analysis of the data, to the experimental comparison and discussion of the various prediction models we examined. As we explain, our solution has low time and space complexity, scales well to large datasets and, at the same time, produces high-quality results.

9

Mailing Lists Archives Analyzer

Rzecki K., Riegel M.

Studia Informatica : systems and information technology | 2006 | Vol. 1(7) | pp. 117-125

EN

This article describes the possibility of exploring data hidden in the headers of e-mails taken from mailing list archives. The scientific part of the article presents a way of transforming information contained in Internet resources, explains the idea of a mailing list archive, and points out what knowledge can be obtained from it. The technical part presents an implemented, working system that analyzes the headers of e-mail messages stored in mailing list archives. Some example results of this experiment are also given.
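Header analysis of the kind described in this abstract can be sketched with Python's standard `email` package. Counting messages per sender is an assumed example of knowledge that could be extracted; the abstract does not say which headers or statistics the authors' system uses, and `count_senders` and the sample messages are hypothetical.

```python
from collections import Counter
from email import message_from_string
from email.utils import parseaddr


def count_senders(raw_messages):
    """Count messages per sender address, read from each message's From header."""
    counts = Counter()
    for raw in raw_messages:
        message = message_from_string(raw)
        _, address = parseaddr(message.get("From", ""))
        if address:
            counts[address.lower()] += 1
    return counts


# Hypothetical archive content: three raw messages with minimal headers.
archive = [
    "From: Alice <alice@example.org>\nSubject: Hi\n\nBody",
    "From: alice@example.org\nSubject: Re: Hi\n\nBody",
    "From: Bob <bob@example.org>\nSubject: Re: Hi\n\nBody",
]
```

Other headers such as `Date` or `In-Reply-To` could be read the same way, for example to study activity over time or the shape of discussion threads.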

10

Rola web miningu w procesie personalizacji dialogu z klientem [The role of web mining in personalizing dialogue with the customer]

Zdonek I.

Zeszyty Naukowe. Organizacja i Zarządzanie / Politechnika Śląska | 2004 | issue 20, part 1 | pp. 277-283

PL

Artykuł poświęcony jest jednej z technik pozyskiwania wiedzy o kliencie z internetowych stron przedsiębiorstwa. Nowy trend w marketingu ukierunkowany przede wszystkim na zaspokajanie indywidualnych potrzeb klientów wymusza poszukiwania nowych źródeł wiedzy o tych potrzebach. Kwestia ta rodzi nie tylko spore problemy techniczne, ale także problemy natury moralnej związane z przyzwoleniem środowisk konsumenckich na pozyskiwanie i wykorzystywanie tego typu informacji.

EN

This article is devoted to one of the techniques for acquiring knowledge about customers from a company's website. A new marketing trend, focused primarily on satisfying customers' individual needs, forces companies to look for new sources of knowledge about those needs. This raises not only considerable technical problems but also ethical ones, connected with whether consumer groups consent to the collection and use of this kind of information.

11

Knowledge discovery in the Internet

Gawrysiak P., Okoniewski M.

Archiwum Informatyki Teoretycznej i Stosowanej | 2000 | Vol. 12, issue 3 | pp. 203-233

EN

With the rapid expansion of the World Wide Web, the need for efficient data retrieval strategies is becoming stronger and will keep growing. Unfortunately, classical information retrieval techniques, developed for well-organized collections of textual data, do not seem able to cope with the diversity and amount of information available throughout the Internet. This paper presents some of the newest approaches to information retrieval in large, unstructured hypertext spaces such as the WWW, approaches that focus more on the latent information embedded in hyperlinks and document structure than on actual understanding of the textual content of Web pages. These techniques, which mark the new trends and prospects for Internet technology, have recently been given the name "Web mining", as they are in fact examples of unsupervised machine learning similar to data mining and text mining. Here we discuss methods belonging to the following three groups: link topology analysis, statistical text analysis, and query languages and systems design.

PL

Wraz z gwałtownym zwiększaniem się zasobów WWW wzrasta również potrzeba opracowania efektywnych strategii wyszukiwania danych. Klasyczne metody dostosowane do dobrze zorganizowanych struktur danych tekstowych wydają się być niewystarczające w przypadku danych zawartych w Internecie. Niniejszy artykuł prezentuje najnowsze podejścia do wyszukiwania informacji dostępnej w dużych hipertekstowych strukturach danych jak WWW i skupia się na informacji dostępnej w połączeniach pomiędzy stronami WWW oraz rozumieniu zawartości tekstowej stron WWW. Prezentowane metody uzyskały ostatnio angielską nazwę "Web mining" i są przykładem samodzielnego pozyskiwania wiedzy przez maszyny. Dyskutowane metody należą do trzech grup: analizy topologii połączeń, analizy statystycznej tekstu i języków zapytań oraz projektowania systemów.