To test whether a continuous variable differs between the categories of a factor variable, or between combinations of those categories, while accounting for other continuous covariates, one may use an analysis of covariance. When the analysis of covariance rejects the hypothesis that no categories differ, several well-established post-hoc methods are available, such as Tukey’s honestly significant difference test and Scheffé’s, Dunn’s, or Nemenyi’s tests. However, these methods are statistically rigid and usually require that statistical assumptions be met. In this work, we address the issue with a random forest-based algorithm that is practically assumption-free: it classifies individual observations into the factor’s categories, taking the dependent continuous variable and the covariates as inputs. The higher the proportion of trees that classify the observations into two different categories, the more likely it is that the categories differ statistically. To adjust the method’s type I error rate, we vary the complexity of the random forest’s trees by pruning, which changes the proportion of highly complex trees. Besides simulations that demonstrate the relationship between the tree pruning level, tree complexity, and the type I error rate, we analyze the asymptotic time complexity of the proposed random forest-based method and compare it with that of the established techniques.
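The idea can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration. The simulated data and the helper names `simulate` and `pairwise_separation` are hypothetical. The pairwise score is one possible reading of "proportion of trees": the share of per-tree votes that pick the correct category of a pair on held-out observations. scikit-learn's cost-complexity pruning parameter `ccp_alpha` stands in for the pruning procedure used in the paper.

```python
from itertools import combinations

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def simulate(n_per_group=200, shifts=(0.0, 0.0, 3.0), seed=0):
    """Hypothetical data: y depends on a covariate x plus a per-category shift."""
    rng = np.random.default_rng(seed)
    group = np.repeat(np.arange(len(shifts)), n_per_group)
    x = rng.normal(size=group.size)
    y = 0.5 * x + np.asarray(shifts)[group] + rng.normal(size=group.size)
    # Inputs to the forest: the dependent variable and the covariate.
    return np.column_stack([y, x]), group


def pairwise_separation(features, group, ccp_alpha=0.0, n_trees=100, seed=0):
    """Return an assumed pairwise score per category pair and the mean tree size."""
    X_tr, X_te, g_tr, g_te = train_test_split(
        features, group, test_size=0.3, stratify=group, random_state=seed
    )
    forest = RandomForestClassifier(
        n_estimators=n_trees, ccp_alpha=ccp_alpha, random_state=seed
    ).fit(X_tr, g_tr)

    # votes[t, i] is the category that tree t assigns to held-out observation i.
    # Sub-estimators predict encoded class indices, so map them back to labels.
    votes = np.stack(
        [forest.classes_[tree.predict(X_te).astype(int)] for tree in forest.estimators_]
    )

    scores = {}
    for a, b in combinations(forest.classes_, 2):
        in_pair = np.isin(g_te, [a, b])          # observations truly in a or b
        v = votes[:, in_pair]
        truth = np.broadcast_to(g_te[in_pair], v.shape)
        relevant = np.isin(v, [a, b])            # keep only votes for a or b
        # Near 0.5: trees cannot tell a from b; near 1: they separate them well.
        scores[(int(a), int(b))] = float((v[relevant] == truth[relevant]).mean())

    mean_nodes = float(np.mean([t.tree_.node_count for t in forest.estimators_]))
    return scores, mean_nodes


if __name__ == "__main__":
    features, group = simulate()
    for alpha in (0.0, 0.01):
        scores, mean_nodes = pairwise_separation(features, group, ccp_alpha=alpha)
        print(f"ccp_alpha={alpha}: mean nodes per tree={mean_nodes:.1f}, scores={scores}")
```

In this simulated setup, categories 0 and 1 share the same distribution while category 2 is shifted, so the score for the pair (0, 1) should stay near chance while pairs involving category 2 score clearly higher. Raising `ccp_alpha` yields smaller trees, which is the lever the abstract describes for tuning the type I error rate. Turning a score into an actual test decision would require a threshold calibrated as in the paper, which this sketch does not attempt.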