This paper presents extensions of the PLSA and PHIT algorithms for clustering text documents. The main idea of the extension is to use a TAN-type Bayesian network instead of the naive network used in the original algorithms.
The paper proposes a new interpretation of the concept of cyclic Bayesian Networks, based on stationary Markov processes over feature vector state transitions.
Bayesian networks have many practical applications due to their capability to represent a joint probability distribution over many variables in a compact way. Efficient reasoning methods exist for Bayesian networks, and many algorithms for learning Bayesian networks from empirical data have been developed. A well-known problem with Bayesian networks is the practical limit on the number of variables for which a Bayesian network can be learned in reasonable time. A remarkable exception here is the Chow/Liu algorithm for learning tree-like Bayesian networks. However, its quadratic time and space complexity in the number of variables may also prove prohibitive for high-dimensional data. The paper presents a novel algorithm that overcomes this limitation for the tree-like class of Bayesian networks. The new algorithm's space consumption grows linearly with the number of variables n, while its execution time is proportional to n·ln(n); both are thus better than those of the Chow/Liu algorithm. This opens new perspectives for constructing Bayesian networks from data containing tens of thousands of variables or more, e.g. in automatic text categorization.
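To make the quadratic baseline concrete, here is a minimal sketch of the classic Chow/Liu procedure, not the paper's n·ln(n) algorithm: it computes the empirical mutual information for every pair of discrete variables and keeps a maximum-weight spanning tree (Kruskal with union-find). The function names and the row-of-tuples data format are illustrative assumptions. The all-pairs loop is exactly the O(n²) time and space cost the abstract refers to.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(data, i, j):
    """Empirical mutual information (in nats) between columns i and j of data."""
    n = len(data)
    pi = Counter(row[i] for row in data)
    pj = Counter(row[j] for row in data)
    pij = Counter((row[i], row[j]) for row in data)
    # sum over observed pairs of p(a,b) * log(p(a,b) / (p(a) p(b))), written with counts
    return sum((c / n) * math.log(c * n / (pi[a] * pj[b])) for (a, b), c in pij.items())


def chow_liu_tree(data):
    """Edges of a maximum-weight spanning tree over pairwise mutual information."""
    n_vars = len(data[0])
    # All n_vars * (n_vars - 1) / 2 pairwise weights: the quadratic part.
    weighted = sorted(
        ((mutual_information(data, i, j), i, j)
         for i, j in combinations(range(n_vars), 2)),
        reverse=True,
    )
    parent = list(range(n_vars))  # union-find forest for Kruskal's algorithm

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = []
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not create a cycle
            parent[ri] = rj
            edges.append((i, j))
            if len(edges) == n_vars - 1:
                break
    return edges


if __name__ == "__main__":
    # Column 1 copies column 0; column 2 is independent of both.
    rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
    print(chow_liu_tree(rows))
```

The resulting undirected tree is turned into a Bayesian network by choosing any node as root and directing edges away from it; ties between equal weights are broken arbitrarily here.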
Bayesian networks have many practical applications due to their capability to represent a joint probability distribution over many variables in a compact way. Although many algorithms exist for learning Bayesian networks from data, they are not satisfactory, because the learned networks are usually not directly suitable for reasoning: they need to be transformed, statically or dynamically, into some other form (tree, polytree, hypertree), and this transformation is not trivial [25]. So far, only a restricted class of very simple Bayesian networks, namely trees and polytrees, is directly applicable to reasoning. This paper defines and explores a new class of networks: Structured Bayesian Networks. Two methods of reasoning are outlined for this type of network, possible methods of learning such networks from data are indicated, and their similarity to hierarchical networks is pointed out.
Bayesian networks have many practical applications due to their capability to represent a joint probability distribution over many variables in a compact way. Although many algorithms exist for learning Bayesian networks from data, they are not satisfactory, because the learned networks are usually not suitable for reasoning. So far, only a restricted class of very simple Bayesian networks, namely trees and polytrees, is directly applicable to reasoning. This paper defines and explores a new class of networks: Structured Bayesian Networks. Two methods of reasoning are outlined for this type of network, possible methods of learning such networks from data are indicated, and their similarity to hierarchical networks is pointed out.
The purpose of this article is to introduce a new analytical framework for measuring the performance of recommender systems. The standard approach is to assess the quality of a system by means of accuracy-related statistics. However, the specific environments in which recommender systems are deployed require close attention to the speed and memory requirements of the algorithms. Unfortunately, it is hardly possible to accurately assess the complexity of various algorithms with formal tools, because such analyses are usually based on the assumption of a dense representation of the underlying data structures. In real life, though, the algorithms operate on sparse data and are implemented with collections dedicated to such data. Therefore, we propose to measure the complexity of recommender systems with artificial datasets that possess real-life properties. We use a recently developed bipartite graph generator to evaluate how the behavior of state-of-the-art recommender systems is determined and diversified by the topological properties of the generated datasets.
The paper presents a new algorithm for enumerating the nodes of a network. Unlike previous algorithms, it is local both in information access (neighbourhood only) and in information stored (proportional to the number of neighbours). This property is achieved at the expense of restricting the type of connectivity the network is assumed to exhibit.
The paper presents a new algorithm for enumerating the nodes of a network. Unlike previous algorithms, it is local both in terms of information access (only information coming from the neighbours of the currently processed node is taken into account) and in terms of information storage (the amount of stored information is proportional to the number of neighbours of a given node). Locality was achieved by restricting consideration to the family of triangulated graphs, which play a fundamental role in the theory of Bayesian networks. Bayesian networks are generalized by valuation-based systems, also called graphical expert systems, i.e. graph structures used to represent non-deterministic dependencies between variables (which correspond to the nodes of the graph).
The purpose of this article is to introduce a new bipartite graph generation algorithm. Bipartite graphs consist of two types of nodes and edges join only nodes of different types. This data structure appears in various applications (e.g. recommender systems or text clustering). Both real-life datasets and formal tools enable us to evaluate only a limited set of properties of the algorithms that are used in such situations. Therefore, artificial datasets are needed to enhance development and testing of the algorithms. Our generator can be used to produce a wide range of synthetic datasets.
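As a minimal illustration of the data structure only, the sketch below draws a uniformly random bipartite graph: every edge joins a node of type "U" to a node of type "V", and no edge is repeated. This is not the authors' generator, which is meant to reproduce richer real-life properties; the function names, the ("U", i)/("V", j) node labels, and the uniform edge distribution are all assumptions made for illustration.

```python
import random
from collections import Counter


def uniform_bipartite(n_left, n_right, n_edges, seed=None):
    """Sample n_edges distinct edges uniformly; each joins a 'U' node to a 'V' node."""
    rng = random.Random(seed)
    # Encode each possible (u, v) pair as k = u * n_right + v. random.sample draws
    # without replacement and raises ValueError if n_edges > n_left * n_right.
    picks = rng.sample(range(n_left * n_right), n_edges)
    return [(("U", k // n_right), ("V", k % n_right)) for k in picks]


def degrees(edges):
    """Degree of every node that has at least one edge, keyed by (type, index)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


if __name__ == "__main__":
    edges = uniform_bipartite(n_left=5, n_right=7, n_edges=20, seed=1)
    print(sorted(degrees(edges).items()))
```

A benchmarking generator would additionally let the user shape properties such as the degree distributions of both node types; in this uniform version, degrees are simply binomially distributed.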
This paper is concerned with seeking new applications for the Dempster-Shafer Theory (DST) that are by their nature better suited to the axiomatic framework of this theory. In particular, wafer processing on an integrated circuit production line, chemical product quality evaluation, etc. are considered. Some extensions to the basic DST formalism are envisaged.
This paper presents some results of a literature review and of a dedicated study of methods for knowledge discovery. A comparative study is carried out of methods applied to discovery in large knowledge bases and, first of all, in databases. The basic elements of some important methods are discussed, together with examples of their applications. Attention is drawn to the need to develop a systemic neural network for knowledge discovery in large knowledge bases of systems, processes, phenomena, etc. Some results of new trends in data mining and in the development of the idea of self-organising neural networks are also shown.
In this paper, we investigate the impact of semantic information on the quality of hierarchical, fuzzy-based clustering of a collection of text documents. We show that relevant tagging of a part of the documents can improve the quality of the overall clustering, for both tagged and untagged documents.
Different methods for computing PageRank vectors are analysed. In particular, we note the opposite behavior of the power method and the Monte Carlo method. Further, a method for reducing the number of iterations of the power method is suggested.
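For reference, the sketch below shows textbook forms of the two methods the abstract compares; it does not reproduce the paper's analysis or its iteration-reduction method. Assumptions: the graph is a dict mapping each node to its list of out-neighbours, the damping factor is d = 0.85, dangling nodes spread their rank uniformly, and the Monte Carlo variant estimates PageRank from the end points of random walks that continue with probability d at each step.

```python
import random


def pagerank_power(links, d=0.85, tol=1e-10, max_iter=1000):
    """Power iteration; returns (rank dict, number of iterations used)."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for it in range(1, max_iter + 1):
        # Rank held by dangling nodes is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not links[v])
        new = {v: (1 - d) / n + d * dangling / n for v in nodes}
        for v in nodes:
            out = links[v]
            for w in out:
                new[w] += d * rank[v] / len(out)
        delta = sum(abs(new[v] - rank[v]) for v in nodes)  # L1 change
        rank = new
        if delta < tol:
            return rank, it
    return rank, max_iter


def pagerank_monte_carlo(links, d=0.85, walks_per_node=1000, seed=0):
    """Estimate PageRank as the end-point distribution of terminating random walks."""
    rng = random.Random(seed)
    nodes = list(links)
    counts = {v: 0 for v in nodes}
    total = 0
    for start in nodes:  # equal number of walks from every node = uniform start
        for _ in range(walks_per_node):
            v = start
            while rng.random() < d:  # continue with probability d, stop otherwise
                out = links[v]
                v = rng.choice(out) if out else rng.choice(nodes)
            counts[v] += 1
            total += 1
    return {v: c / total for v, c in counts.items()}


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    exact, iterations = pagerank_power(graph)
    estimate = pagerank_monte_carlo(graph, walks_per_node=2000)
    print(iterations, exact, estimate)
```

The power method converges geometrically at a rate governed by d, so its iteration count is the natural quantity to reduce. The Monte Carlo estimate instead improves with the number of walks, with statistical error shrinking roughly as one over the square root of that number.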
The paper presents a proposal of a set of measures for comparison of maps of document collections as well as preliminary results concerning evaluation of their usefulness and expressive power.