Article title

Learning causal theories with non-reversible MCMC methods

Publication languages
EN
Abstracts
EN
Causal laws are defined in terms of concepts and the causal relations between them. Following Kemp et al. (2010), we investigate the performance of a hierarchical Bayesian model in which causal systems are represented by directed acyclic graphs (DAGs) whose nodes are divided into distinct categories. This paper presents two non-reversible search-and-score algorithms (Q1 and Q2) and their application to the causal learning system. The algorithms search through pairs of class-assignment vectors and graph structures and choose the pair that maximizes the probability of the given observations. The model discovers latent classes in relational data, infers the number of these classes, and predicts relations between objects belonging to them. We evaluate its performance on prediction tasks from a behavioural experiment on human cognition. Within the discussed approach, we also solve a simplified prediction problem in which the object classification is known in advance. Finally, we describe an experimental procedure allowing in-depth analysis of the efficiency and scalability of both search-and-score algorithms.
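As an illustration of the search-and-score idea summarized in the abstract, the Python sketch below performs a naive greedy search over pairs of class-assignment vectors and class-level DAG adjacency matrices, keeping the pair with the highest score. It is a minimal sketch under assumed names: log_score, is_dag and the move set are hypothetical stand-ins, and the paper's Q1 and Q2 algorithms use non-reversible MCMC proposal kernels rather than this simple hill-climbing loop.

import random

def is_dag(G):
    """Check acyclicity of a directed adjacency matrix via Kahn's algorithm."""
    n = len(G)
    indeg = [sum(G[i][j] for i in range(n)) for j in range(n)]
    queue = [v for v in range(n) if indeg[v] == 0]
    seen = 0
    while queue:
        v = queue.pop()
        seen += 1
        for w in range(n):
            if G[v][w]:
                indeg[w] -= 1
                if indeg[w] == 0:
                    queue.append(w)
    return seen == n

def search_and_score(n_objects, n_classes, log_score, n_iters=10000, seed=0):
    """Greedy search over pairs (class-assignment vector z, class-level DAG G).

    log_score(z, G) is a user-supplied function returning the log probability
    of the observed relational data under the candidate pair (z, G).
    """
    rng = random.Random(seed)
    z = [rng.randrange(n_classes) for _ in range(n_objects)]  # initial assignment
    G = [[0] * n_classes for _ in range(n_classes)]           # empty DAG on classes
    current = log_score(z, G)
    for _ in range(n_iters):
        if rng.random() < 0.5:
            # Move: reassign one object to a (possibly) different class.
            i = rng.randrange(n_objects)
            old, z[i] = z[i], rng.randrange(n_classes)
            new = log_score(z, G)
            if new >= current:
                current = new
            else:
                z[i] = old  # reject: score decreased
        else:
            # Move: toggle one directed edge between two distinct classes.
            a, b = rng.sample(range(n_classes), 2)
            G[a][b] ^= 1
            if is_dag(G):
                new = log_score(z, G)
            else:
                new = float("-inf")  # cyclic graphs are never accepted
            if new >= current:
                current = new
            else:
                G[a][b] ^= 1  # reject: cycle created or score decreased
    return current, z, G

Any function that scores how well a candidate pair explains the observed relational data can be plugged in as log_score; what distinguishes Q1 and Q2 from such a loop is how candidate pairs are proposed and accepted.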
Pages
323–361
Physical description
Bibliography: 48 items, figures, tables.
Authors
  • NASK National Research Institute, Kolska 12, 01-045 Warszawa, Poland
Bibliography
  • Akaike, H. (1974) A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6): 716–723.
  • Angelino, E., Johnson, M. J. and Adams, R. P. (2016) Patterns of scalable Bayesian inference. Foundations and Trends in Machine Learning, 9: 119–247.
  • Arabas, P. (2019) Energy aware data centers and networks: a survey. Journal of Telecommunications and Information Technology, 4: 26–36.
  • Arabas, P. (2021) Modeling and simulation of hierarchical task allocation system for energy-aware HPC clouds. Simulation Modelling Practice and Theory, 107: 102221.
  • Bierkens, J. (2016) Non-reversible Metropolis-Hastings. Statistics and Computing, 26: 1213–1228.
  • Carey, S. (1985) Conceptual Change in Childhood. MIT Press.
  • Chickering, D. (2002) Optimal structure identification with greedy search. Journal of Machine Learning Research, 3: 507–554. doi: 10.1162/153244303321897717.
  • Contaldi, C., Vafaee, F. and Nelson, P. C. (2019) Bayesian network hybrid learning using an elite-guided genetic algorithm. Artificial Intelligence Review, 52(1): 245–272.
  • Cooper, G. F. and Herskovits, E. (1992) A Bayesian method for the induction of probabilistic networks from data. Mach. Learn., 9(4): 309–347. ISSN 0885-6125. doi: 10.1023/A:1022649401552.
  • Corander, J., Ekdahl, M. and Koski, T. (2008) Parallel interacting MCMC for learning of topologies of graphical models. Data Mining and Knowledge Discovery, 17: 431–456.
  • Corander, J., Gyllenberg, M. and Koski, T. (2006) Bayesian model learning based on a parallel MCMC strategy. Statistics and Computing, 16: 355–362.
  • Dai, J., Ren, J. and Du, W. (2020) Decomposition-based Bayesian network structure learning algorithm using local topology information. Knowledge-Based Systems, 105602.
  • Dolan, E. and Moré, J. (2001) Benchmarking optimization software with performance profiles. Mathematical Programming, 91. doi: 10.1007/s101070100263.
  • Flores, M. J., Nicholson, A. E., Brunskill, A., Korb, K. B. and Mascaro, S. (2011) Incorporating expert knowledge when learning Bayesian network structure: A medical case study. Artificial Intelligence in Medicine, 53(3): 181–204. doi: 10.1016/j.artmed.2011.08.004.
  • Friedman, N. and Koller, D. (2001) Being Bayesian about network structure: A Bayesian approach to structure discovery in Bayesian networks. Mach. Learn., 50. doi: 10.1023/A:1020249912095.
  • Friedman, N., Nachman, I. and Pe’er, D. (1999) Learning Bayesian network structure from massive datasets: The sparse candidate algorithm. In: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers, 206–215. doi: 10.13140/2.1.1125.2169.
  • Gao, F. and Huang, D. (2020) A node sorting method for K2 algorithm in Bayesian network structure learning. In: 2020 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), 106–110. doi: 10.1109/ICAICA50127.2020.9182465.
  • Goudie, R. J. B. and Mukherjee, S. (2016) A Gibbs sampler for learning DAGs. Journal of Machine Learning Research, 17(2): 1–39.
  • Griffiths, T. L., Chater, N., Kemp, C., Perfors, A., Tenenbaum, J. B. and Goodman, N. D. (2010) Probabilistic models of cognition: exploring representation and inductive biases. Trends in Cognitive Sciences, 14: 357–364.
  • Hansen, N., Auger, A., Mersmann, O., Tušar, T. and Brockhoff, D. (2016) COCO: A platform for comparing continuous optimizers in a blackbox setting. Optimization Methods and Software, 36: 114–144.
  • Hastings, W. K. (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57: 97–109.
  • Jones, M. and Love, B. C. (2011) Bayesian fundamentalism or enlightenment? On the explanatory status and theoretical contributions of Bayesian models of cognition. Behavioral and Brain Sciences, 34: 169–231.
  • Kemp, C., Griffiths, T. L. and Tenenbaum, J. B. (2004) Discovering latent classes in relational data. Technical report, MIT CSAIL.
  • Kemp, C., Tenenbaum, J. B., Niyogi, S. and Griffiths, T. (2010) A probabilistic model of theory formation. Cognition, 114: 165–196.
  • Koller, D., Friedman, N., Getoor, L. and Taskar, B. (2007) Graphical models in a nutshell. In: L. Getoor and B. Taskar, eds., Introduction to Statistical Relational Learning. The MIT Press, 13–55.
  • Koski, T. and Noble, J. M. (2012) A review of Bayesian networks and structure learning. Mathematica Applicanda, 40: 51–103.
  • Koski, T. J. T. and Noble, J. M., eds. (2009) Bayesian Networks: An Introduction. Wiley.
  • Lee, C. and van Beek, P. (2017) Metaheuristics for score-and-search Bayesian network structure learning. In: Canadian Conference on Artificial Intelligence, 129–141. Springer.
  • Madigan, D. and York, J. (1993) Bayesian graphical models for discrete data. International Statistical Review, 63: 215–232.
  • Madigan, D., Andersson, S., Perlman, M. and Volinsky, C. (2000) Bayesian model averaging and model selection for Markov equivalence classes of acyclic digraphs. Communications in Statistics: Theory and Methods, 25. doi: 10.1080/03610929608831853.
  • Madsen, A. L., Jensen, F., Salmerón, A., Langseth, H. and Nielsen, T. D. (2017) A parallel algorithm for Bayesian network structure learning from large data sets. Knowledge-Based Systems, 117: 46–55.
  • Mansinghka, V., Kemp, C., Tenenbaum, J. B. and Griffiths, T. L. (2006) Structured priors for structure learning. In: Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI). AUAI Press, 324–331.
  • McClelland, J., Botvinick, M. M., Noelle, D. C., Plaut, D. C., Rogers, T. T., Seidenberg, M. S. and Smith, L. B. (2010) Letting structure emerge: connectionist and dynamical systems approaches to cognition. Trends in Cognitive Sciences, 14: 348–356.
  • Moore, A. and Wong, W.-K. (2004) Optimal reinsertion: A new search operator for accelerated and more accurate Bayesian network structure learning. In: Proceedings of the Twentieth International Conference on Machine Learning (ICML’03). AAAI Press, 552–559.
  • Moré, J. and Wild, S. (2009) Benchmarking derivative-free optimization algorithms. SIAM Journal on Optimization, 20: 172–191. doi: 10.1137/080724083.
  • Murphy, G. and Medin, D. (1985) The role of theories in conceptual coherence. Psychological Review, 92(3): 289–316. ISSN 0033-295X. doi: 10.1037/0033-295X.92.3.289.
  • Peters, G. (2008) Markov chain Monte Carlo: stochastic simulation for Bayesian inference. Statistics in Medicine, 27(16): 3213–3214. doi: 10.1002/sim.3240.
  • van Ravenzwaaij, D., Cassey, P. and Brown, S. D. (2018) A simple introduction to Markov chain Monte Carlo. Psychonomic Bulletin & Review, 25: 143–154.
  • Rios, F., Noble, J. and Koski, T. (2015) A prior distribution over directed acyclic graphs for sparse Bayesian networks. arXiv:1504.06701.
  • Rissanen, J. (1978) Modeling by shortest data description. Automatica, 14(5): 465–471. ISSN 0005-1098. doi: 10.1016/0005-1098(78)90005-5.
  • Robinson, R. (1973) Counting labeled acyclic digraphs. In: F. Harary, ed., New Directions in the Theory of Graphs, 239–273. Academic Press, New York, NY.
  • Scanagatta, M., Salmerón, A. and Stella, F. (2019) A survey on Bayesian network structure learning from data. Progress in Artificial Intelligence, 8: 425–439.
  • Schwarz, G. (1978) Estimating the dimension of a model. Ann. Statist., 6(2): 461–464. doi: 10.1214/aos/1176344136.
  • Silander, T., Leppä-Aho, J., Jääsaari, E. and Roos, T. (2018) Quotient normalized maximum likelihood criterion for learning Bayesian network structures. In: International Conference on Artificial Intelligence and Statistics (AISTATS), 948–957. PMLR.
  • Szynkiewicz, P. (2018) Comparative study of PSO and CMA-ES algorithms on black-box optimization benchmarks. Journal of Telecommunications and Information Technology, 4: 5–17.
  • Tenenbaum, J. B., Kemp, C., Griffiths, T. L. and Goodman, N. D. (2011) Statistics, structure, and abstraction. Science, 331: 1279–1285.
  • van der Vaart, A. W. (1998) Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press. doi: 10.1017/CBO9780511802256.
  • Wellman, H. M. and Gelman, S. A. (1992) Cognitive development: Foundational theories of core domains. Annual Review of Psychology, 43(1): 337–375. doi: 10.1146/annurev.ps.43.020192.002005. PMID: 1539946.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-e1f29d13-8b76-4494-ae90-d4f037bbd30b