# Decision Trees

Sanjiv R. Das

## Prediction Trees

• A natural outcome of recursive partitioning of the data.
• CART stands for Classification And Regression Trees.
• Prediction trees are of two types: (a) classification trees, where the leaves of the tree are discrete outcome categories; and (b) regression trees, where the leaves are continuous outcomes.
• We may think of the former as a generalized form of limited dependent variable models, and the latter as a generalized form of regression analysis.

## Recursive Partitioning

• Bifurcate the data into two groups such that the binary split yields more information than the unsplit data.
• Consider the raw frequency $p$ of people who made donations, i.e., a number between 0 and 1. The information in the predicted likelihood $p$ is inversely related to the sum of squared errors (SSE) between $p$ and the observed values $x_i \in \{0,1\}$:
$$SSE_1 = \sum_{i=1}^n (x_i - p)^2$$
• Second bifurcation, splitting on income at a threshold $K$: $$SSE_2 = \sum_{i:\, Income < K} (x_i - p_L)^2 + \sum_{i:\, Income \geq K} (x_i - p_R)^2$$ where $p_L$ and $p_R$ are the frequencies of donations in the left and right branches, respectively.

• By choosing $K$ correctly, the recursive partitioning algorithm maximizes the gain, i.e., $\delta = (SSE_1 - SSE_2)$. We stop branching further when, at a given tree level, $\delta$ falls below a pre-specified threshold.
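The split search above can be sketched as follows. This is a minimal illustration with hypothetical toy data; the function names (`sse`, `best_split`) are my own, not from any particular library:

```python
import numpy as np

def sse(x):
    """Sum of squared errors of binary outcomes around their mean p."""
    p = x.mean()
    return np.sum((x - p) ** 2)

def best_split(income, x):
    """Scan candidate thresholds K and return the one maximizing
    the gain delta = SSE_1 - SSE_2 from the binary split."""
    sse1 = sse(x)
    best_K, best_delta = None, -np.inf
    for K in np.unique(income)[1:]:      # candidate split points
        left, right = x[income < K], x[income >= K]
        delta = sse1 - (sse(left) + sse(right))
        if delta > best_delta:
            best_K, best_delta = K, delta
    return best_K, best_delta

# Toy data (hypothetical): donors tend to have higher income
income = np.array([20, 25, 30, 60, 70, 80])
x = np.array([0, 0, 0, 1, 1, 1])
K, delta = best_split(income, x)
```

Here the best threshold separates the data perfectly, so the gain equals the full pre-split $SSE_1$.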

## C4.5 Classifier

Recursive partitioning proceeds as in the previous case, but instead of minimizing the sum of squared errors between the sample data $x$ and the fitted value $p$ at each level, the goal here is to maximize the information gain, i.e., the reduction in entropy from the split. The entropy ($H$) of the data $x$, using natural logarithms, is defined as

$$H = -\sum_x\; f(x) \cdot \ln f(x)$$

where $f(x)$ is the probability of $x$. This is intuitive because after the optimal split, as we recurse down the tree, the distribution of $x$ becomes narrower, lowering entropy. For discrete $x$ this is the Shannon entropy; the continuous analogue, with the sum replaced by an integral over a density, is known as "differential entropy."
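The entropy criterion can be sketched in the same style. A minimal example, assuming binary labels and a hand-picked toy split (data and names are illustrative, not from the text):

```python
import numpy as np

def entropy(x):
    """Shannon entropy (natural log) of binary labels x."""
    p = x.mean()
    if p in (0.0, 1.0):          # a pure node has zero entropy
        return 0.0
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

# Toy labels (hypothetical) and a split after the first two observations
x = np.array([0, 0, 1, 1, 1, 1])
left, right = x[:2], x[2:]

H_before = entropy(x)
n = len(x)
# Weighted average entropy of the two child nodes
H_after = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
gain = H_before - H_after    # information gain from the split
```

Because this split separates the classes perfectly, both children are pure, the post-split entropy is zero, and the information gain equals the parent's entropy.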

## Gini coefficient

The Gini coefficient measures the quality of the split. It is defined as

$$Gini = 1 - \sum_{c=1}^C P_c^2$$

where $c$ indexes $C$ categories and $P_c$ is the proportion of the split in category $c$, or simply, the probability of splitting to $c$.

Because each split here is binary, only two probabilities are needed, one for the left branch and one for the right. For example, consider the node in the third row of the tree, second box from the left (blue), which holds 21 observations split 2 versus 19:

$$Gini = 1 - (2/21)^2 - (19/21)^2 = 0.172$$

The smaller the Gini, the better the split. Notice that at the top node the Gini is 0.5.
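The two Gini values above can be checked directly. A short sketch (the helper `gini` is my own name; the class counts come from the example in the text, and the top node is assumed to be evenly balanced, which is what yields a Gini of 0.5):

```python
def gini(counts):
    """Gini impurity 1 - sum(P_c^2) for a node with the given class counts."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

g_node = gini([2, 19])   # the node from the text: 2 vs 19 out of 21
g_root = gini([10, 10])  # an evenly balanced top node
```

`g_node` reproduces the 0.172 computed above, and `g_root` confirms that a 50/50 node has the maximum binary Gini of 0.5.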