In a classical regression tree, we follow a decision process as before, but the outcome is a real number.
Within each leaf, all inputs are assigned that same number.
With a regression problem we cannot split nodes the same way as we did for classification.
Instead we split by minimising the residual sum of squares.
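A minimal sketch of this split rule, assuming a single numeric split feature and leaves that predict the leaf mean; the helper names `rss` and `best_split` are illustrative, not from the original:

```python
import numpy as np

def rss(y):
    """Residual sum of squares around the leaf mean."""
    return float(np.sum((y - y.mean()) ** 2)) if len(y) else 0.0

def best_split(x, y):
    """Scan candidate thresholds on one feature and return the one
    minimising the combined RSS of the two child leaves."""
    best_t, best_score = None, np.inf
    for t in np.unique(x)[:-1]:          # every observed value except the largest
        left, right = y[x <= t], y[x > t]
        score = rss(left) + rss(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score
```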
In classical trees all items in a leaf are assigned the same value. In this model, each leaf instead carries the parameters \(\theta\) of a parametric model.
This makes the resulting trees smoother.
Within each leaf we have some model \( y_i = f(\mathbf x_i, \theta ) + \epsilon_i \).
The approach generalises classic regression trees: there the leaf estimate was \(\bar y\); here it is a fitted regression.
At each node we fit an OLS regression. If the \(R^2\) of that fit is below some threshold, we split the node, choosing the split which maximises the minimum of the two children's \(R^2\) values.
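A sketch of that node-splitting rule, assuming OLS fits via numpy and a single candidate split feature; `r_squared`, `split_if_poor_fit`, `r2_min`, and `min_leaf` are assumed names introduced here for illustration:

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on X (with an intercept column)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    ss_res = np.sum((y - A @ beta) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def split_if_poor_fit(x_split, X, y, r2_min=0.9, min_leaf=10):
    """If the node's OLS fit has R^2 below r2_min, return the threshold on
    x_split that maximises the smaller of the two children's R^2 values;
    otherwise return None and keep the node as a leaf."""
    if r_squared(X, y) >= r2_min:
        return None
    best_t, best_worst = None, -np.inf
    for t in np.unique(x_split)[:-1]:
        mask = x_split <= t
        if mask.sum() < min_leaf or (~mask).sum() < min_leaf:
            continue                     # skip splits that leave a tiny child
        worst = min(r_squared(X[mask], y[mask]),
                    r_squared(X[~mask], y[~mask]))
        if worst > best_worst:
            best_t, best_worst = t, worst
    return best_t
```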
Previously our decision tree classifier was binary.
We can instead adapt the mixed tree model, fitting a probit model at each leaf.
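One way this could look, sketched with statsmodels' `Probit`; the helpers `fit_leaf_probit` and `leaf_predict` are hypothetical and handle a single leaf's observations:

```python
import statsmodels.api as sm

def fit_leaf_probit(X, y):
    """Fit a probit model on the observations routed to one leaf;
    y is binary (0/1) and an intercept column is added explicitly."""
    return sm.Probit(y, sm.add_constant(X)).fit(disp=0)

def leaf_predict(model, X_new):
    """Predicted class-1 probabilities for new points routed to this leaf."""
    return model.predict(sm.add_constant(X_new, has_constant='add'))
```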