A Dichotomic Search Algorithm for Mining and Learning in Domain-Specific Logics

Ferré, S.; King, R. D.

Artykuł - szczegóły

Tytuł artykułu

A Dichotomic Search Algorithm for Mining and Learning in Domain-Specific Logics

Autorzy

Ferré S. , King R. D.

Wybrane pełne teksty z tego czasopisma

https://fi.episciences.org/

Identyfikatory

Warianty tytułu

Języki publikacji

Abstrakty

Many application domains make use of specific data structures such as sequences and graphs to represent knowledge. These data structures are ill-fitted to the standard representations used in machine learning and data-mining algorithms: propositional representations are not expressive enough, and first order ones are not efficient enough. In order to efficiently represent and reason on these data structures, and the complex patterns that are related to them, we use domain-specific logics. We show these logics can be built by the composition of logical components that model elementary data structures. The standard strategies of top-down and bottom-up search are ill-suited to some of these logics, and lack flexibility. We therefore introduce a dichotomic search strategy, that is analogous to a dichotomic search in an ordered array. We prove this provides more flexibility in the search, while retaining completeness and non-redundancy. We present a novel algorithm for learning using domain specific logics and dichotomic search, and analyse its complexity. We also describe two applications which illustrates the search for motifs in sequences; where these motifs have arbitrary length and length-constrained gaps. In the first application sequences represent the trains of the East-West challenge; in the second application they represent the secondary structure of Yeast proteins for the discrimination of their biological functions.

Słowa kluczowe

data structures domain specific logics search algorithm dichotomy formal concept analysis concept learning pattern mining bioinformatics

Wydawca

IOS Press

Czasopismo

Fundamenta Informaticae

Rocznik

2005

Tom

Vol. 66, nr 1,2

Strony

1--32

Opis fizyczny

Bibliogr. 52 poz., wykr.,

Twórcy

autor

Ferré S.

ferrre@irisa.fr

Irisa, Université de Rennes 1 Campus de Beaulieu, 35042 Rennes cedex, France

autor

King R. D.

rdk@aber.ac.uk

Department of Computer Science University of Wales, Aberystwyth Aberystwyth, SY23 3DB, Wales, UK

Bibliografia

[1] Abe, K., Kawasoe, S., Asai, T., Arimura, H., Arikawa, S.: Optimized Substructure Discovery for Semistructured Data, Principles and Practice of Knowledge Discovery in Databases, LNCS 2431, 2002.
[2] Albert-Lorincz, H., Boulicaut, J.-F.: Mining Frequent Sequential Patterns under Regular Expressions: A Highly Adaptive Strategy for Pushing Contraints, SIAM International Conference on Data Mining (C. K. Daniel Barbará, Ed.), SIAM, 2003.
[3] Alt, M., Martin, F.: Generation of Efficient Interprocedural Analyzers with PAG, Static Analysis Symp., LNCS 983, 1995.
[4] Apostolico, A.: String Editing and Longest Common Subsequences, in: Handbook of Formal Languages (G. Rozenberg,A. Salomaa, Eds.), vol. 2 LinearModeling: Background and Application, chapter 8, Springer-Verlag, Berlin, 1997, 361–398.
[5] Badea, L.: A refinement Operator for Theories, Inductive Logic Programming, LNCS 2157, 2001.
[6] Badea, L., Stanciu,M.: Refinement Operators Can Be (Weakly) Perfect, Int. Conf. Inductive Logic Programming (S. Džeroski, P. A. Flach, Eds.), LNCS 1634, Springer, 1999.
[7] Baker, B. S., Giancarlo, R.: Longest Common Subsequence from Fragments via Sparse Dynamic Programming, Algorithms - ESA’98, 6th Annual European Symposium, LNCS 1461, Springer, 1998.
[8] Brachman, R. J.: On the Epistemological Status of Semantic Nets, in: Associative Networks: Representation of Knowledge and Use of Knowledge by Examples (N. V. Findler, Ed.), Academic Press, New York, 1979.
[9] Brāzma, A., Jonassen, I., Eidhammer, I., Gilbert, D.: Approaches to the automatic discovery of patterns in biosequences, Technical Report 113, Department of Informatics, University of Bergen, Norway, 1995.
[10] Brejová, B., DiMarco, C., Vinaˇr, T., Hidalgo, S. R., Holguin, G., Patten, C.: Finding Patterns in Biological Sequences, Technical Report CS-2000-22, University ofWaterloo, 2000.
[11] Clare, A., King, R. D.: Predicting gene function in Saccharomyces cerevisiae, Bioinformatics, 19(Suppl. 2), 2003, ii42–ii49.
[12] Cohen, W. W., Hirsh, H.: Learning the CLASSIC Description Logic: Theoretical and Experimental Results, Principles of Knowledge Representation and Reasoning: Proc. of the 4th Int. Conf., Morgan Kaufmann, 1994.
[13] Dehaspe, L., Raedt, L. D.: Mining association rules in multiple relations, Int. Workshop Inductive Logic Programming (N. Lavraˇc, S. Džeroski, Eds.), LNCS 1297, Springer, 1997.
[14] Dehaspe, L., Toivonen, H., King, R. D.: Finding frequent substructures in chemical compounds, Int. Conf. Knowledge Discovery and Data Mining (R. Agrawal, P. Stolorz, G. Piatetsky-Shapiro, Eds.), AAAI Press., 1998.
[15] Domingos, P.: The Role of Occam’s Razor in Knowledge Discovery, Data Mining and Knowledge Discovery, 3(4), 1999, 409–425.
[16] Fensel, D., Wiese, M.: Refinement of rule sets with JoJo, Eu. Conf. Machine Learning (P. Brazdil, Ed.), LNCS 667, Springer, 1993.
[17] Ferré, S., Ridoux, O.: A Logical Generalization of Formal Concept Analysis, Int. Conf. Conceptual Structures (G. Mineau, B. Ganter, Eds.), LNCS 1867, Springer, 2000.
[18] Ferré, S., Ridoux, O.: A Framework for Developing Embeddable Customized Logics, Int. Work. Logic-based Program Synthesis and Transformation (A. Pettorossi, Ed.), LNCS 2372, Springer, 2002.
[19] Ferré, S., Ridoux, O.: An Introduction to Logical Information Systems, Information Processing & Management, 40(3), 2004, 383–419.
[20] Finn, V. K.: On Machine-Oriented Formalization of Plausible Reasoning in the Style of F. Backon–J.S.Mill, Semiotika Informatika, 20, 1983, 35–101, In Russian.
[21] Frühwirth, T., Hanschke, P.: Principles and Practice of Constraint Programming, chapter 19 – Terminological Reasoning with Constraint Handling Rules, MIT Press, 1995.
[22] Fürnkranz, J.: Separate-and-ConquerRule Learning, Technical Report OEFAI-TR-96-25,Austrian Research Institute for Artificial Intelligence, Schottengasse 3, A-1010Wien, Austria, 1996.
[23] Ganter, B., Kuznetsov, S.: Formalizing Hypotheses with Concepts, Int. Conf. Conceptual Structures (G. Mineau, B. Ganter, Eds.), LNCS 1867, Springer, 2000.
[24] Ganter, B., Wille, R.: Formal Concept Analysis — Mathematical Foundations, Springer, 1999.
[25] Geamsakul, W., Matsuda, T., Yoshida, T., Motoda, H., Washio, T.: Classifier Construction by Graph-Based Induction for Graph-Structured Data, Pacific-Asia Conf. Advances in Knowledge Discovery and Data-Mining, LNCS 2637, Springer, 2003.
[26] Goffeau et al, A.: The Yeast Genome Directory, Nature, 387, 1997, 1–105.
[27] Inokuchi, A., Washio, T., Motoda, H.: Complete Mining of Frequent Patterns from Graphs: Mining Graph Data, Machine Learning, 50(3), 2003, 321–354.
[28] Jensen, D., Cohen, P.: Multiple Comparisons in Induction Algorithms, Machine Learning, 38(3), 2000, 309–338.
[29] Jonassen, I.: Efficient discovery of conserved patterns using a pattern graph, Technical Report 118, Department of Informatics, University of Bergen, Norway, 1996.
[30] Kononenko, I., Kovačič, M.: Learning as optimization: Stochastic generation of multiple knowledge, Int. Workshop Machine Learning (D. Sleeman, P. Edwards, Eds.), Morgan Kaufmann, 1992.
[31] Lanckriet, G. R., Deng, M., Cristianini, N., Jordan, M. I., Noble, W. S.: Kernel-based data fusion and its application to protein function prediction in Yeast, Pacific Symp. Biocomputing, 2004.
[32] Lee, S. D., Raedt, L. D.: Constraint Based Mining of First-Order Sequences in SeqLog, Proc. of the Workshop on Multi-Relational Data Mining (S. Dˇzeroski, L. D. Raedt, S. Wrobel, Eds.), University of Alberta, Edmonton, Canada, July 2002.
[33] Lesh, N., Zaki, M. J., Ogihara, M.: Scalable Feature Mining for Sequential Data, IEEE Intelligent Systems, 15(2), 2000, 48–56.
[34] Michie, D., Muggleton, S., Page, D., Srinivasan, A.: To the International Computing Community: a new East-West Challenge, Oxford University Computing Laboratory, Oxford, UK, 1994.
[35] Mitchell, T.: Machine Learning, McGraw-Hill, 1997.
[36] Muggleton, S.: Inverse Entailment and Progol, New Generation Computation, 13, 1995, 245–286.
[37] Muggleton, S., Feng, C.: Efficient Induction in Logic Programs, in: Inductive Logic Programming (S. Muggleton, Ed.), Academic Press, 1992, 281–298.
[38] Muggleton, S., Raedt, L. D.: Inductive Logic Programming: Theory and Methods, Journal of Logic Programming, 19,20, 1994, 629–679.
[39] Ouali, M., King, R. D.: Cascaded multiple classifiers for secondary structure prediction, Prot. Sci, 9, 2000, 1162–1176.
[40] Page, D., Srinivasan, A.: ILP: A Short Look Back and a Longer Look Forward, Journal of Machine Learning Research, 4, 2003, 415–430.
[41] Plotkin, G.: Automatic Methods of Inductive Inference, Ph.D. Thesis, Edinburgh University, august 1971.
[42] Quinlan, J. R.: C4.5: programs for Machine Learning, Machine Learning, Morgan Kaufmann, 1993.
[43] Rigoutsos, I., Floratos, A.: Combinatorial pattern discovery in biological sequences: the TEIRESIAS algorithm, Bioinformatics, 14(1), 1998, 55–67.
[44] Rückert, U., Kramer, S.: Generalized Version Space Trees, Int. Workshop Knowledge Discovery in Inductive Databases (J.-F. Boulicaut, S. Džeroski, Eds.), On-line proceedings at, http://www.cing-project.org/ecmlpkdd2003/, 2003.
[45] Srinivasan, A.: Aleph: A Learning Engine for Proposing Hypotheses, Url: http://web.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/.
[46] Srinivasan, A., King, R. D.: Feature construction with Inductive Logic Programming: A study of quantitative predictions of biological activity aided by structural attributes, Data Mining and Knowledge Discovery, 3(1), 1999, 37–57.
[47] Srinivasan, A., Muggleton, S., King, R.: Comparing the use of background knowledge by Inductive Logic Programming systems, Int. Workshop on Inductive Logic Programming (L. De Raedt, Ed.), Department of Computer Science, Katholieke Universiteit Leuven, 1995.
[48] Suppl. in Proteins: Structure, Function and Genetics: Critical Assessment of Techniques for Protein Structure Prediction (CASP), vol. 45, 2001.
[49] Widmer, G.: Combining knowledge-based and instance-based learning to exploit qualitative knowledge, Informatica, 17, 1993, 371–385, Special issue on Multistrategy Learning.
[50] Wille, R.: Ordered Sets, chapter Restructuring lattice theory: an approach based on hierarchies of concepts, Reidel, 1982, 445–470.
[51] Wolpert, D. H., Macready,W. G.: No Free Lunch Theorems for Search, Technical Report SFI-TR-95-02-010, Santa Fe, NM, 1995.
[52] Zucker, J.-D., Ganascia, J.-G.: Representation Changes for Efficient Learning in Structural Domains, Int. Conf. Machine Learning (L. Saitta, Ed.), Morgan Kaufmann, 1996.

Typ dokumentu

Bibliografia

Identyfikator YADDA

bwmeta1.element.baztech-article-BUS2-0007-0019