Bayesian trees

Priors for trees

Priors for simple trees

We can define a tree as a set of nodes: \(T\).

For each internal node we define a splitting variable \(k\) and a splitting threshold \(r\).

Our prior is \(P(T, k, r)\).

We can factorise this as:

\(P(T, k, r)=P(T)P(k, r|T)\)

\(P(T, k, r)=P(T)P(k|T)P(r|T, k)\)

So we need to specify:

  • \(P(T)\) - The tree structure, which governs the number of nodes.

  • \(P(k|T)\) - Which variables we split on, given the tree structure.

  • \(P(r|T, k)\) - The cutoffs, given the tree structure and the variables we split on.
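
As a concrete illustration of these three components, here is a minimal Python sketch that draws a tree from such a prior. The specific choices are assumptions for the example: a depth-dependent split probability \(\alpha (1+d)^{-\beta }\) in the style of Chipman, George and McCulloch's Bayesian CART prior, a uniform choice of splitting variable, and a uniform threshold on \([0, 1]\).

    import random

    def sample_tree(depth=0, alpha=0.95, beta=2.0, n_vars=5):
        # P(T): split this node with probability alpha * (1 + depth) ** -beta,
        # so deeper nodes are less likely to split and the tree stays finite.
        if random.random() < alpha * (1 + depth) ** -beta:
            return {
                "k": random.randrange(n_vars),  # P(k|T): uniform over variables
                "r": random.random(),           # P(r|T,k): uniform threshold on [0, 1]
                "left": sample_tree(depth + 1, alpha, beta, n_vars),
                "right": sample_tree(depth + 1, alpha, beta, n_vars),
            }
        return {"leaf": True}  # the node does not split: it becomes a leaf

Calling sample_tree() returns a nested dictionary of internal nodes, each carrying its \(k\) and \(r\), terminated by leaf markers.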

Priors for mixed trees

If each leaf instead carries a parametric model with parameters \(\theta \), our prior becomes:

\(P(T, k, r, \theta )=P(T)P(k|T)P(r|T, k)P(\theta | T, k, r)\)

We then additionally need to specify \(P(\theta | T, k, r)\).
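
Continuing the sketch above, and assuming the leaf prior factorises into an independent \(N(0, 1)\) draw per leaf (an assumption of this example; any leaf prior could be substituted), \(\theta \) can be attached after the structure is drawn:

    import random

    def attach_leaf_params(node):
        # P(theta | T, k, r): assumed here to factorise into an independent
        # standard-normal draw for each leaf's parameter.
        if node.get("leaf"):
            node["theta"] = random.gauss(0.0, 1.0)
        else:
            attach_leaf_params(node["left"])
            attach_leaf_params(node["right"])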

The pinball prior

The pinball prior gives us a way to generate a tree with a fixed number of leaves, according to our prior.

To start, we associate the root node with a count of all the leaves.

Each time we split a node, its remaining leaf count is divided between the two directions. A node with only one leaf left is not split further.
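
A minimal sketch of this generative process, assuming the leaf count is allocated uniformly between the two children (other allocation rules give different variants of the pinball prior), with the same uniform variable and threshold choices as above:

    import random

    def pinball_tree(n_leaves, n_vars=5):
        # A node with only one leaf left is not split further.
        if n_leaves == 1:
            return {"leaf": True}
        # Divide the remaining leaf count between the two directions;
        # uniform allocation is an assumption of this sketch.
        n_left = random.randint(1, n_leaves - 1)
        return {
            "k": random.randrange(n_vars),  # splitting variable
            "r": random.random(),           # splitting threshold
            "left": pinball_tree(n_left, n_vars),
            "right": pinball_tree(n_leaves - n_left, n_vars),
        }

By construction, pinball_tree(n) always returns a tree with exactly n leaves.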

Estimating other priors

Bayesian CART

Our prior

Call the structural parameters of the tree \(\Theta =(T, k, r)\), and the leaf parameters \(\theta \).

Collectively our prior is defined by \(P(\Theta )\) and \(P(\theta |\Theta )\).

Bayes’ theorem

We want to know the posterior given our data \(X\).

\(P(\Theta | X)=\dfrac{P(X|\Theta )P(\Theta )}{P(X)}\)

\(P(\Theta | X)\propto P(X|\Theta )P(\Theta )\)

Expanded posterior

We now expand \(P(X|\Theta )\) by integrating out the leaf parameters \(\theta \):

\(P(X|\Theta )=\int P(X|\theta , \Theta )P(\theta |\Theta )d\theta \)

This means our posterior is:

\(P(\Theta | X)\propto P(\Theta )\int P(X|\theta , \Theta )P(\theta |\Theta )d\theta \)
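
As a concrete example, suppose each leaf \(\ell \) models its observations as Gaussian with unknown mean \(\theta _\ell \) and known variance \(\sigma ^2\), with a conjugate \(N(0, \tau ^2)\) prior on each mean (an assumed model, chosen so the integral is tractable). The integral then factorises over the leaves:

\(P(X|\Theta )=\prod _\ell \int \prod _{i\in \ell }N(x_i|\theta _\ell , \sigma ^2)N(\theta _\ell |0, \tau ^2)d\theta _\ell \)

Each factor is a Gaussian integral with a closed form, so \(P(\Theta |X)\) can be evaluated up to its normalising constant for any candidate tree.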

Estimation

This posterior has no closed form over the space of trees, so it is explored with MCMC: we propose a modified tree and accept or reject it with a Metropolis-Hastings step.
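
A minimal Metropolis-Hastings skeleton for this, with the model-specific pieces left as assumed callables: log_prior for \(\log P(\Theta )\), log_marginal for the log of the integral above, and propose, which in Bayesian CART is typically a random grow, prune, change or swap move and must return the candidate tree together with the log proposal-density ratio.

    import math
    import random

    def mcmc_trees(X, init_tree, n_steps, log_prior, log_marginal, propose):
        # Metropolis-Hastings over tree space: propose a modified tree and
        # accept with probability min(1, posterior ratio * proposal ratio).
        tree = init_tree
        log_post = log_prior(tree) + log_marginal(X, tree)
        samples = []
        for _ in range(n_steps):
            # log_q_ratio = log q(tree | candidate) - log q(candidate | tree)
            candidate, log_q_ratio = propose(tree)
            cand_log_post = log_prior(candidate) + log_marginal(X, candidate)
            if math.log(random.random()) < cand_log_post - log_post + log_q_ratio:
                tree, log_post = candidate, cand_log_post
            samples.append(tree)
        return samples

The returned list of sampled trees can then be used for posterior summaries, for example by averaging predictions across the sampled trees.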