Logistic Regression

Sanjiv R. Das

Limited Dependent Variables

The Logistic Function

$$ y = \frac{1}{1+e^{-f(x_1,x_2,...,x_n)}} \in (0,1) $$

where

$$ f(x_1,x_2,...,x_n) = a_0 + a_1 x_1 + ... + a_n x_n \in (-\infty,+\infty) $$
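The two formulas above can be sketched in a few lines of Python. The function names and the coefficient values below are hypothetical, chosen only for illustration.

```python
import math

def linear_index(x, a0, a):
    """f(x_1,...,x_n) = a_0 + a_1 x_1 + ... + a_n x_n, any real number."""
    return a0 + sum(ai * xi for ai, xi in zip(a, x))

def logistic(z):
    """Map a real-valued index z into the unit interval."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the equivalent form e^z / (1 + e^z),
    # which avoids overflow in exp(-z) when z is very negative.
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical coefficients a_0, (a_1, a_2) and one observation (x_1, x_2).
a0, a = -1.0, [0.5, 2.0]
x = [2.0, 0.0]

f = linear_index(x, a0, a)   # -1 + 0.5*2 + 2*0 = 0
y = logistic(f)              # an index of 0 maps to probability 0.5
```

An index of zero sits exactly on the 0.5 boundary; large positive or negative indexes push $y$ toward 1 or 0.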

Odds Ratio

What are odds ratios? In these notes, the odds ratio (OR) is the ratio of the probability of success to the probability of failure (strictly speaking, this quantity is the odds of success). If the probability of success is $p$, then

$$ OR = \frac{p}{1-p}; \quad \quad p = \frac{OR}{1+OR} $$
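A minimal sketch of the two conversions, using hypothetical helper names and an illustrative probability of 0.8:

```python
def odds_from_prob(p):
    """OR = p / (1 - p), defined for 0 <= p < 1."""
    return p / (1.0 - p)

def prob_from_odds(odds):
    """p = OR / (1 + OR), the inverse of odds_from_prob."""
    return odds / (1.0 + odds)

p = 0.8                        # illustrative probability of success
odds = odds_from_prob(p)       # 0.8 / 0.2, i.e. odds of about 4 to 1
p_back = prob_from_odds(odds)  # recovers 0.8 up to floating-point error
```

A probability of 0.5 corresponds to odds of exactly 1, that is, success and failure are equally likely.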

Odds Ratio Coefficients
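
A short derivation, added here as a sketch rather than taken from the original slides, of why exponentiated coefficients are commonly read as multipliers of the odds. It assumes $p$ is the logistic output $y$ and keeps the OR notation used in these notes.

$$ OR = \frac{p}{1-p} = \frac{1/(1+e^{-f})}{e^{-f}/(1+e^{-f})} = e^{f} = e^{a_0 + a_1 x_1 + ... + a_n x_n} $$

so that $\ln OR = a_0 + a_1 x_1 + ... + a_n x_n$. Raising $x_i$ by one unit, holding the other variables fixed, gives

$$ \frac{OR(x_i+1)}{OR(x_i)} = e^{a_i} $$

Under these assumptions, $e^{a_i} > 1$ raises the odds and $e^{a_i} < 1$ lowers them.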

Metrics

  1. Accuracy: the fraction of class values that are predicted correctly, (TP+TN)/(TP+TN+FP+FN).

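  2. ROC and AUC: The Receiver Operating Characteristic (ROC) curve is a plot of the True Positive Rate (TPR) against the False Positive Rate (FPR) for different levels of the cut-off posterior probability. Lowering the cut-off raises TPR but also raises FPR, a trade-off present in all classification systems. The AUC is the area under the ROC curve; it is 0.5 for a random classifier and 1 for a perfect one.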

  3. TPR = sensitivity or recall = TP/(TP+FN)

  4. FPR = (1 − specificity) = FP/(FP+TN)
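
The metrics in this list can be sketched in plain Python. The labels and scores below are made up for illustration, the function names are hypothetical, and the AUC uses a simple trapezoid rule over the ROC points.

```python
def confusion_counts(y_true, scores, cutoff):
    """Count TP, FP, TN, FN when predicting class 1 if score >= cutoff."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= cutoff else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def rates(tp, fp, tn, fn):
    """Return (accuracy, TPR, FPR); assumes both classes are present."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    tpr = tp / (tp + fn)   # sensitivity, recall
    fpr = fp / (fp + tn)   # 1 - specificity
    return accuracy, tpr, fpr

def roc_points(y_true, scores):
    """(FPR, TPR) pairs as the cut-off sweeps from high to low."""
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        _, tpr, fpr = rates(*confusion_counts(y_true, scores, cutoff))
        points.append((fpr, tpr))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Made-up true classes and posterior probabilities for six observations.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

accuracy, tpr, fpr = rates(*confusion_counts(y_true, scores, 0.5))
area = auc(roc_points(y_true, scores))
```

At a cut-off of 0.5 this toy data has TP = 3, FP = 1, TN = 2, FN = 0, and the area under its ROC curve is 8/9.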

AUC of the ROC curve

More Metrics

  1. Precision = $\frac{TP}{TP+FP}$

  2. Recall = $\frac{TP}{TP+FN}$

  3. F1 score = $\frac{2}{\frac{1}{Precision} + \frac{1}{Recall}}$

(F1 is the harmonic mean of precision and recall.)
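
A minimal sketch of these three metrics computed from confusion-matrix counts. The counts are hypothetical, and the function assumes at least one true positive so that no division by zero occurs.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean (F1); assumes tp > 0."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2.0 / (1.0 / precision + 1.0 / recall)
    return precision, recall, f1

# Hypothetical counts: 30 true positives, 10 false positives, 20 false negatives.
precision, recall, f1 = precision_recall_f1(tp=30, fp=10, fn=20)
```

With these counts precision is 0.75 and recall is 0.6; F1 lies between the two, closer to the smaller value, as a harmonic mean always does.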

Using R

Multinomial Logit

The probability of each class $(0,1,...,k)$ for $(k+1)$ classes, with class 0 as the baseline, is as follows. For $j = 1,...,k$:

$$ Pr[y=j] = \frac{e^{a_j^\top x}}{1+\sum_{i=1}^k e^{a_i^\top x}} $$

and

$$ Pr[y=0] = \frac{1}{1+\sum_{i=1}^k e^{a_i^\top x}} $$

Note that $\sum_{i=0}^k Pr[y=i] = 1$.
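
A minimal sketch of the class probabilities, assuming class 0 is the baseline with its score fixed at zero, so that the denominator is one plus the sum of the exponentiated scores of classes 1 to k. The function name and coefficient values are hypothetical.

```python
import math

def multinomial_logit_probs(x, coefs):
    """Probabilities for classes 0..k given coefficient vectors a_1..a_k.

    Class 0 is the baseline: its score is 0, so e^0 = 1 supplies the
    leading 1 in the denominator.
    """
    scores = [0.0] + [sum(ai * xi for ai, xi in zip(a, x)) for a in coefs]
    # Subtracting the largest score leaves the ratios unchanged
    # and guards against overflow in exp().
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical example with k = 2 (three classes) and two regressors.
x = [1.0, 2.0]
coefs = [[0.0, 0.0],            # a_1: score 0, same as the baseline
         [math.log(2.0), 0.0]]  # a_2: score ln 2, so e^score = 2

probs = multinomial_logit_probs(x, coefs)  # approximately [0.25, 0.25, 0.5]
```

The probabilities over all $k+1$ classes, including the baseline, sum to one.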