Stochastic Trees
This section will delve into a specific family of non-parametric models that has gained considerable traction in the causal inference world: Bayesian Additive Regression Trees (BART) and some of its notable derivatives. These models offer a unique blend of flexibility and interpretability, making them particularly well-suited for causal inference tasks.
Before we dive into the specifics of these models, it’s crucial to understand the fundamental assumptions that underpin their use in causal inference. Like many causal inference methods, BART and its derivatives rely on two key assumptions: the Stable Unit Treatment Value Assumption (SUTVA) and strong ignorability.
Stable Unit Treatment Value Assumption (SUTVA): This assumption comprises two parts (formalized below):
No interference: The treatment applied to one unit doesn’t influence the outcomes of other units. In a business context, a marketing campaign targeted at one customer wouldn’t sway the purchasing decisions of others.
No hidden variations of treatments: There’s only one version of each treatment level. If we’re studying the effect of a new training program, every employee assigned to the program receives the same version of it.
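In the standard potential-outcomes notation (introduced here for convenience), write $Y_i(z)$ for the potential outcome of unit $i$ under treatment level $z$ and $Z_i$ for the treatment unit $i$ actually receives. SUTVA then says that a unit’s outcome depends only on its own assigned treatment, with no hidden versions of that treatment:

$$
Y_i(z_1, \ldots, z_n) = Y_i(z_i) \quad \text{(no interference)},
\qquad
Y_i = Y_i(Z_i) \quad \text{(no hidden versions)}.
$$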
Strong Ignorability: This assumption consists of two components, again formalized below:
Unconfoundedness: Given the observed covariates, treatment assignment is independent of potential outcomes. In essence, if we account for all relevant variables, whether a unit receives treatment or not is unrelated to their outcomes under either condition.
Positivity (or overlap): Every unit has a non-zero probability of receiving each treatment level. In a business setting, this means every customer has some chance of being exposed to a new marketing campaign, regardless of their characteristics.
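In the same notation, with $X_i$ denoting the observed covariates, strong ignorability is the conjunction of these two conditions:

$$
\{Y_i(0),\, Y_i(1)\} \;\perp\!\!\perp\; Z_i \mid X_i
\qquad \text{and} \qquad
0 < \Pr(Z_i = 1 \mid X_i) < 1.
$$

The first line is unconfoundedness; the second is positivity, which must hold for every covariate value $X_i$ in the population of interest.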
These assumptions are the bedrock upon which we build causal interpretations. When these conditions hold, we can attribute the observed differences in outcomes between treated and untreated units to the treatment itself, rather than to confounding factors.
We’ll explore three key models in this family: BART, Bayesian Causal Forests (BCF), and LongBet. Each successive model builds on its predecessor, offering improvements in causal effect estimation, handling of confounding, and applicability to different data structures.
In our exploration, we’ll be leveraging the stochtree R package (Herren et al. 2024), which implements these models using the “warm-start” technique introduced by Krantsevich, He, and Hahn (2023). Warm-starting is a computational innovation that improves both the speed of model fitting and the quality of the resulting posterior samples, particularly for large datasets.
The warm-start technique works by using a fast approximation method (XBART) to generate initial tree structures, which are then used as starting points for the full Bayesian MCMC algorithm. This approach combines the speed of approximate methods with the statistical rigor of full Bayesian inference, resulting in models that are both computationally efficient and statistically robust.
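To make this concrete, here is a minimal sketch of a warm-started BART fit. The data are simulated purely for illustration, and the argument names (`num_gfr`, `num_burnin`, `num_mcmc`) and the `y_hat_train` field reflect stochtree’s documented interface at the time of writing, so check the version you have installed:

```r
# A minimal sketch of a warm-started BART fit with stochtree
library(stochtree)

set.seed(2024)

# Simulate a simple nonlinear regression problem
n <- 1000
p <- 5
X <- matrix(runif(n * p), ncol = p)
y <- sin(4 * pi * X[, 1]) + 2 * X[, 2]^2 + rnorm(n, sd = 0.5)

# num_gfr:    fast "grow-from-root" (XBART-style) sweeps that build good
#             initial tree structures
# num_burnin: warm-starting typically removes the need for a long burn-in
# num_mcmc:   full Bayesian MCMC draws, initialized at the warm-start trees
bart_fit <- bart(
  X_train = X,
  y_train = y,
  num_gfr = 10,
  num_burnin = 0,
  num_mcmc = 100
)

# Posterior mean of the in-sample fit; draws are stored column-wise in the
# y_hat_train matrix (field name assumed from the package documentation)
y_hat <- rowMeans(bart_fit$y_hat_train)
```

The key design choice is the split between `num_gfr` and `num_mcmc`: the former buys speed by sampling whole trees with a fast recursive algorithm, while the latter retains the full posterior exploration that gives BART its uncertainty quantification.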
By using warm-start, we can fit these sophisticated models to larger datasets and explore more complex causal relationships than was previously feasible. This makes these models particularly valuable for business data science applications, where we often deal with large, complex datasets and need to uncover nuanced causal relationships.
Let’s begin our journey into the world of stochastic trees and their applications in causal inference, keeping in mind the critical assumptions that allow us to draw causal conclusions from these powerful tools.