Many thanks to Brian D. Ripley and Bert Gunter for replying to my question
regarding regression trees.
My original post:
To what extent are regression tree results (with a single continuous
dependent variable) robust to collinearity or correlation between
independent variables? I'm interested in this issue when both continuous
and factor variables are included in the independent variables and also when
only continuous independent variables are used.
The summary:
Both responses indicated that regression trees are not robust to
collinearity among the independent variables. The tree structure in
particular can change, even when the predictions do not.
Brian Ripley wrote:
Not at all. Tree-based methods are not robust to many things, which is why
methods such as bagging and boosting have arisen. They were designed to
find fairly complex but clear-cut relationships.
Bert Gunter wrote:
More than "not at all", I'd say: Decidedly ambiguous. The tree topologies
(though not necessarily the predictions) can change radically with minor
alterations to the data.
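To illustrate Bert's point, here is a minimal sketch in Python. It is my own
illustration, not code from either reply, and it does not use any tree
package. It fits only the root split of a regression tree (a "stump") by
least squares. The helper names (sse, best_split, simulate), the simulated
data, and the noise levels are all assumptions made for the example. Two
predictors are noisy copies of the same hidden variable, so they are strongly
correlated. The sketch counts how often each one is chosen for the root split
across bootstrap resamples. Resampling is a larger perturbation than the
"minor alterations" Bert mentions, but it is the easiest one to simulate.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(X, y):
    """Return (variable index, threshold, SSE) for the best single split.

    X is a list of rows. Candidate thresholds are the midpoints between
    consecutive distinct values of each variable. Ties go to the first
    candidate found. Returns None if no variable can be split.
    """
    best = None
    for j in range(len(X[0])):
        vals = sorted(set(row[j] for row in X))
        for lo, hi in zip(vals, vals[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            total = sse(left) + sse(right)
            if best is None or total < best[2]:
                best = (j, t, total)
    return best


def simulate(n=60, n_boot=100, seed=1):
    """Count which predictor wins the root split in each bootstrap resample."""
    rng = random.Random(seed)
    # Hidden variable z drives the response; x1 and x2 are noisy copies of z,
    # so they are highly correlated with each other (assumed setup).
    z = [rng.uniform(0, 1) for _ in range(n)]
    X = [[zi + rng.gauss(0, 0.05), zi + rng.gauss(0, 0.05)] for zi in z]
    y = [zi + rng.gauss(0, 0.1) for zi in z]

    counts = [0, 0]
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        j, t, total = best_split([X[i] for i in idx], [y[i] for i in idx])
        counts[j] += 1
    return counts


if __name__ == "__main__":
    counts = simulate()
    print("root split on x1:", counts[0], " on x2:", counts[1])
```

If both counts come out well above zero, the root variable is switching
between the two correlated predictors from one resample to the next, which is
the kind of change in topology described above. The exact counts depend on
the seed and the noise levels, so treat the output as illustrative only.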
Additional question:
Given this lack of robustness, how would one examine and address
collinearity among a mix of continuous and factor independent variables?