Search results - BazTech

1

On graph mining with deep learning: introducing Model R for link weight prediction

Hou Yuchen, Holder Lawrence B.

Journal of Artificial Intelligence and Soft Computing Research

|

2019

|

Vol. 9, No. 1

21--40

EN

Deep learning has been successful in various domains including image recognition, speech recognition and natural language processing. However, research on its application to graph mining is still at an early stage. Here we present Model R, a neural network model created to provide a deep learning approach to the link weight prediction problem. This model uses a node embedding technique that extracts node embeddings (knowledge of nodes) from the known links’ weights (relations between nodes) and uses this knowledge to predict the unknown links’ weights. We demonstrate the power of Model R through experiments and compare it with the stochastic block model and its derivatives. Model R shows that deep learning can be successfully applied to link weight prediction, and it outperforms the stochastic block model and its derivatives by up to 73% in terms of prediction accuracy. We analyze the node embeddings to confirm that closeness in embedding space correlates with stronger relationships as measured by the link weight. We anticipate this new approach will provide effective solutions to more graph mining tasks.

2

An efficient approach for view selection for data warehouse using tree mining and evolutionary computation

Thakare A., Deshpande P.

Computer Science

|

2018

|

Vol. 19 (4)

431--455

EN

The selection of a proper set of views to materialize plays an important role in database performance. There are many methods of view selection that use different techniques and frameworks to select an efficient set of views for materialization. In this paper, we present a new efficient scalable method for view selection under the given storage constraints using a tree mining approach and evolutionary optimization. The tree mining algorithm is designed to determine the exact frequency of (sub)queries in the historical SQL dataset. The Query Cost model achieves the objective of maximizing the performance benefits from the final view set that is derived from the frequent view set given by the tree mining algorithm. The performance benefit of a query is defined as a function of query frequency, query creation cost, and query maintenance cost. The experimental results show that the proposed method is successful in recommending a solution that is fairly close to an optimal solution.

3

B FGMAC: breadth-first frequent subgraph mining with ARC consistency

Douar B., Latiri C., Liquiere M., Slimani Y.

Journal of Artificial Intelligence and Soft Computing Research

|

2011

|

Vol. 1, No. 4

269--281

EN

The paper presents a new projection operator for graphs named AC-projection, which, unlike the graph isomorphism operator typically used in graph mining, exhibits favorable theoretical complexity properties. We study the size of the search space as well as some practical properties of the projection operator. We also introduce a novel breadth-first algorithm for mining frequent AC-reduced subgraphs. We then show experimentally that a significant performance gain (polynomial-complexity projection) can be achieved with little or no loss in the quality of the discovered patterns.
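To make the idea of a polynomial-time projection concrete, here is a minimal sketch of an arc-consistency-style projection test. It assumes, which the abstract does not state, that the projection keeps for each pattern node the target nodes carrying the same label and having a supporting neighbour for every pattern edge. The function name `ac_project` and the dictionary-based graph representation are hypothetical and are not taken from the paper.

```python
def ac_project(p_labels, p_adj, t_labels, t_adj):
    """Arc-consistency filtering of a labelled pattern against a target.

    Assumed representation (hypothetical): labels are {node: label} and
    adjacency is {node: set of neighbours} for undirected graphs.
    Returns {pattern node: set of target nodes}, or None if some pattern
    node is left without any compatible target node.
    """
    # Start with every target node that carries the same label.
    domains = {
        v: {u for u, lab in t_labels.items() if lab == p_labels[v]}
        for v in p_labels
    }
    changed = True
    while changed:
        changed = False
        for v in p_labels:
            for w in p_adj[v]:
                # Keep u only if some neighbour of u can still play w.
                supported = {u for u in domains[v] if t_adj[u] & domains[w]}
                if supported != domains[v]:
                    domains[v] = supported
                    changed = True
    return domains if all(domains.values()) else None


# Pattern: a single edge a-b. Target: a-b plus an isolated extra 'a' node.
pattern = ({0: "a", 1: "b"}, {0: {1}, 1: {0}})
target = ({10: "a", 11: "b", 12: "a"}, {10: {11}, 11: {10}, 12: set()})
result = ac_project(*pattern, *target)
```

Each pass only removes candidates, so the loop ends after a polynomial number of steps. The test is weaker than subgraph isomorphism: under this sketch, a triangle whose nodes share one label projects onto a single edge with the same labels, although the edge contains no triangle.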

4

Inferring graph grammars by detecting overlap in frequent subgraphs

Kukluk J. P., Holder L. B., Cook D. J.

International Journal of Applied Mathematics and Computer Science

|

2008

|

Vol. 18, No. 2

241--250

EN

In this paper we study the inference of node and edge replacement graph grammars. We search for frequent subgraphs and then check for an overlap among the instances of the subgraphs in the input graph. If the subgraphs overlap by one node, we propose a node replacement graph grammar production. If the subgraphs overlap by two nodes or two nodes and an edge, we propose an edge replacement graph grammar production. We can also infer a hierarchy of productions by compressing portions of a graph described by a production and then inferring new productions on the compressed graph. We validate the approach in experiments where we generate graphs from known grammars and measure how well the approach infers the original grammar from the generated graph. We show graph grammars found in biological molecules and biological networks, and analyze the learning curves of the algorithm.
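The overlap rule stated in the abstract can be sketched as a small decision function. This is only an illustration of that rule: the function name `classify_overlap` and the representation of an instance as a pair of node and edge sets are assumptions, and the sketch does not cover how the paper finds the frequent subgraphs or builds the productions.

```python
def classify_overlap(instance_a, instance_b):
    """Suggest a production type from how two subgraph instances overlap.

    Each instance is a hypothetical (nodes, edges) pair: a set of node
    ids and a set of frozenset({u, v}) undirected edges.
    """
    nodes_a, edges_a = instance_a
    nodes_b, edges_b = instance_b
    shared_nodes = nodes_a & nodes_b
    shared_edges = edges_a & edges_b
    if len(shared_nodes) == 1 and not shared_edges:
        return "node replacement"
    # Two shared nodes, optionally with the edge between them.
    if len(shared_nodes) == 2 and len(shared_edges) <= 1:
        return "edge replacement"
    return None  # no overlap, or an overlap the rule does not cover


a = ({1, 2, 3}, {frozenset({1, 2}), frozenset({2, 3})})
b = ({3, 4, 5}, {frozenset({3, 4}), frozenset({4, 5})})
c = ({2, 3, 6}, {frozenset({2, 3}), frozenset({3, 6})})
d = ({7, 8}, {frozenset({7, 8})})
```

Here `a` and `b` share only node 3, `a` and `c` share nodes 2 and 3 together with the edge between them, and `a` and `d` do not overlap.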

5

A Restarted Strategy for Efficient Subsumption Testing

Kuželka O., Železný F.

Fundamenta Informaticae

|

2008

|

Vol. 89, No. 1

95--109

EN

We study runtime distributions of subsumption testing. On graph data randomly sampled from two different generative models, we observe a gradual growth of the tails of the distributions as a function of the problem instance location in the phase transition space. To avoid the heavy tails, we design a randomized restarted subsumption testing algorithm, RESUMER2. The algorithm is complete in that it correctly decides both subsumption and non-subsumption in finite time. A basic restarted strategy is augmented by allowing certain communication between odd and even restarts without losing the exponential runtime distribution decay guarantee resulting from mutual independence of restart pairs. We empirically test RESUMER2 against the state-of-the-art subsumption algorithm Django on generated graph data as well as on the predictive toxicology challenge (PTC) data set. RESUMER2 performs comparably with Django for relatively small examples (tens to hundreds of literals), while for larger examples RESUMER2 becomes vastly superior.
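The general idea of a complete restarted search can be sketched as follows. This is not RESUMER2: it is a plain randomized backtracking search for a mapping between directed graphs, restarted with a doubling step cutoff, and it omits the communication between odd and even restarts that the abstract mentions. The names `find_mapping`, `restarted_search`, `cutoff` and `growth` are hypothetical.

```python
import random


class CutoffReached(Exception):
    """Raised when one randomized run exceeds its step budget."""


def find_mapping(pattern_edges, target_edges, rng, cutoff):
    """One randomized backtracking run over directed edge lists.

    Returns a mapping of pattern nodes to target nodes that sends every
    pattern edge onto a target edge, or None if the exhaustive run
    finishes without one. Raises CutoffReached if the budget runs out.
    """
    variables = sorted({v for e in pattern_edges for v in e})
    targets = sorted({v for e in target_edges for v in e})
    target_set = set(target_edges)
    rng.shuffle(variables)  # random variable order for this run
    steps = 0

    def extend(i, mapping):
        nonlocal steps
        if i == len(variables):
            return dict(mapping)
        var = variables[i]
        candidates = targets[:]
        rng.shuffle(candidates)  # random value order for this run
        for cand in candidates:
            steps += 1
            if steps > cutoff:
                raise CutoffReached
            mapping[var] = cand
            consistent = all(
                (mapping[a], mapping[b]) in target_set
                for a, b in pattern_edges
                if a in mapping and b in mapping
            )
            if consistent:
                found = extend(i + 1, mapping)
                if found is not None:
                    return found
            del mapping[var]
        return None

    return extend(0, {})


def restarted_search(pattern_edges, target_edges, seed=0, cutoff=50, growth=2):
    """Restart the randomized run with a growing cutoff until it finishes."""
    rng = random.Random(seed)
    while True:
        try:
            return find_mapping(pattern_edges, target_edges, rng, cutoff)
        except CutoffReached:
            cutoff *= growth  # growing budgets keep the procedure complete


path = [(0, 1), (1, 2)]
chain = [("a", "b"), ("b", "c")]
mapping = restarted_search(path, chain, cutoff=1)
```

Because a run that finishes within its budget has explored the whole search space, a `None` result is a genuine negative answer, and the growing cutoff guarantees that some run eventually finishes.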

6

Constructing a Decision Tree for Graph-Structured Data and its Applications

Geamsakul W., Yoshida T., Ohara K., Motoda H., Yokoi H., Takabayashi K.

Fundamenta Informaticae

|

2005

|

Vol. 66, No. 1,2

131--160

EN

A machine learning technique called Graph-Based Induction (GBI) efficiently extracts typical patterns from graph-structured data by stepwise pair expansion (pairwise chunking). It is very efficient because of its greedy search. Meanwhile, a decision tree is an effective means of data classification from which rules that are easy to understand can be obtained. However, a decision tree cannot be constructed for data that is not explicitly expressed with attribute-value pairs. This paper proposes a method called Decision Tree Graph-Based Induction (DT-GBI), which constructs a classifier (decision tree) for graph-structured data while simultaneously constructing attributes for classification using GBI. Substructures (patterns) are extracted at each node of a decision tree by stepwise pair expansion in GBI to be used as attributes for testing. Since attributes (features) are constructed while a classifier is being constructed, DT-GBI can be conceived as a method for feature construction. The predictive accuracy of a decision tree is affected by which attributes (patterns) are used and how they are constructed. A beam search is employed to extract sufficiently discriminative patterns within the greedy search framework. Pessimistic pruning is incorporated to avoid overfitting to the training data. Experiments using a DNA dataset were conducted to see the effect of the beam width and the number of chunkings at each node of a decision tree. The results indicate that DT-GBI, which uses very little prior domain knowledge, can construct a decision tree that is comparable to other classifiers constructed using domain knowledge. DT-GBI was also applied to analyze a real-world hepatitis dataset as a part of evidence-based medicine. Four classification tasks on the hepatitis data were conducted using only the time-series data of blood inspection and urinalysis. The preliminary results of the experiments, including the constructed decision trees, their predictive accuracies, and the extracted patterns, are reported in this paper. Some of the patterns match domain experts' experience and the overall results are encouraging.
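A single step of pairwise chunking can be sketched as below. The sketch assumes that the pair to chunk is chosen purely by frequency of its label pair and that occurrences are merged greedily without overlap; the abstract does not give the selection criterion, and the beam search, pruning and tree construction of DT-GBI are not shown. The function name `chunk_once` and the graph representation are hypothetical.

```python
from collections import Counter


def chunk_once(labels, edges):
    """Merge occurrences of the most frequent connected label pair.

    Assumed representation (hypothetical): labels is {node: label} and
    edges is a set of frozenset({u, v}) undirected edges.
    Returns (new_labels, new_edges, chosen_pair).
    """
    if not edges:
        return dict(labels), set(edges), None

    def pair_of(u, v):
        return tuple(sorted((labels[u], labels[v])))

    counts = Counter(pair_of(*e) for e in edges)
    # Most frequent pair; ties are broken alphabetically for determinism.
    best = min(counts, key=lambda p: (-counts[p], p))

    rep = {}  # merged node -> representative node of its chunk
    for u, v in sorted(tuple(sorted(e)) for e in edges):
        if pair_of(u, v) == best and u not in rep and v not in rep:
            rep[u] = rep[v] = u

    new_labels = {}
    for node, lab in labels.items():
        if node in rep:
            new_labels[rep[node]] = "(%s+%s)" % best
        else:
            new_labels[node] = lab

    new_edges = set()
    for e in edges:
        u, v = tuple(e)
        merged = frozenset((rep.get(u, u), rep.get(v, v)))
        if len(merged) == 2:  # drop the edge absorbed inside a chunk
            new_edges.add(merged)
    return new_labels, new_edges, best


labels = {1: "N", 2: "H", 3: "N", 4: "H", 5: "C"}
edges = {frozenset({1, 2}), frozenset({3, 4}), frozenset({2, 5})}
new_labels, new_edges, chosen = chunk_once(labels, edges)
```

In this example the pair of labels H and N occurs twice, so both occurrences are replaced by a single chunk node each, and the edge to the C node is reattached to the chunk that absorbed node 2. Repeating the step on the result gives the stepwise expansion described in the abstract.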

7

A General Framework for Mining Frequent Subgraphs from Labeled Graphs

Inokuchi A., Washio T., Motoda H.

Fundamenta Informaticae

|

2005

|

Vol. 66, No. 1,2

53--82

EN

The derivation of frequent subgraphs from a dataset of labeled graphs has high computational complexity because the hard problems of isomorphism and subgraph isomorphism have to be solved as part of this derivation. To deal with this computational complexity, all previous approaches have focused on one particular kind of graph. In this paper, we propose an approach to conduct a complete search for various classes of frequent subgraphs in a massive dataset of labeled graphs within a practical time. The power of our approach comes from the algebraic representation of graphs, its associated operations and well-organized bias constraints to limit the search space efficiently. The performance has been evaluated using real world datasets, and the high scalability and flexibility of our approach have been confirmed with respect to the amount of data and the computation time.