Residual Analysis

The standard assumption of the regression model is that the random error terms $ \epsilon_1,\ldots,\epsilon_n$ are independent and identically distributed (iid) normal random variables with mean 0 and common variance $ \sigma^2$. Since $ Y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, the statistical model for simple linear regression is given by

$\displaystyle Y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$    for $ i=1,\ldots,n$.

The data set consists of

  1. values $ x_i$ of the explanatory variable;
  2. values $ Y_i$ of the dependent variable.

For each value $ x_i$ of the explanatory variable, the prediction equation provides a fitted value $ \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$. We can then plot the standardized residuals $ \frac{Y_i - \hat{y}_i}{\hat{\sigma}}$ against the fitted values $ \hat{y}_i$.
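
The computation of fitted values and standardized residuals can be sketched in Python with NumPy. This is a minimal illustration, not a prescribed implementation: the function name `standardized_residuals` and the data are hypothetical, and it assumes the usual estimate $ \hat{\sigma}^2 = \mathrm{SSE}/(n-2)$, which the text above does not spell out.

```python
import numpy as np

def standardized_residuals(x, y):
    """Fit y = b0 + b1*x by least squares.

    Returns the fitted values and the residuals divided by sigma_hat.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    x_bar, y_bar = x.mean(), y.mean()

    # Least-squares estimates of the slope and intercept.
    s_xx = np.sum((x - x_bar) ** 2)
    s_xy = np.sum((x - x_bar) * (y - y_bar))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar

    fitted = b0 + b1 * x
    resid = y - fitted

    # Assumption: sigma_hat^2 = SSE / (n - 2), the usual estimate
    # of the error variance in simple linear regression.
    sigma_hat = np.sqrt(np.sum(resid ** 2) / (n - 2))
    return fitted, resid / sigma_hat

# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])

fitted, std_resid = standardized_residuals(x, y)
for f, r in zip(fitted, std_resid):
    print(f"fitted = {f:7.3f}   standardized residual = {r:7.3f}")
```

Plotting `std_resid` on the vertical axis against `fitted` on the horizontal axis gives the residual plot described above.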

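In model validation we look for a systematic pattern in this plot. If the model assumptions hold, the standardized residuals should scatter randomly around zero; the presence of a pattern suggests that the chosen regression is not a good model.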
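
As a hedged illustration of what such a pattern can look like, the sketch below fits a straight line to hypothetical data whose true mean is quadratic in $ x$. The data, the seed, and the use of `np.polyfit` are assumptions made for the example. The standardized residuals come out positive at both ends of the range and negative in the middle, a curved pattern that a correct model would not produce.

```python
import numpy as np

# Hypothetical data: the true mean is quadratic in x,
# so a straight line is deliberately the wrong model.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 21)
y = 1.0 + 2.0 * x ** 2 + rng.normal(scale=0.05, size=x.size)

# Straight-line least-squares fit; polyfit returns [slope, intercept].
b1, b0 = np.polyfit(x, y, 1)
fitted = b0 + b1 * x
resid = y - fitted

# Assumption: sigma_hat^2 = SSE / (n - 2).
sigma_hat = np.sqrt(np.sum(resid ** 2) / (x.size - 2))
std_resid = resid / sigma_hat

# Summarize the sign of the residuals in three regions of x.
print("mean standardized residual, left end :", std_resid[:3].mean())
print("mean standardized residual, middle   :", std_resid[9:12].mean())
print("mean standardized residual, right end:", std_resid[-3:].mean())
```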