e-Statistics

Regression Line

The coefficients $ \beta_0$ and $ \beta_1$ of the linear regression model

$\displaystyle Y_i = \beta_0 + \beta_1 x_i + \epsilon_i,
\quad i=1,\ldots,n,
$

are called the intercept and the slope parameters, respectively.

The data set consists of

  1. an explanatory variable, whose values are the $ x_i$'s;
  2. a dependent (response) variable, whose values are the $ Y_i$'s.
Then the point estimates $ \hat{\beta}_0$ and $ \hat{\beta}_1$ of the parameters $ \beta_0$ and $ \beta_1$ are obtained as follows.

$ \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{x}$ and $ \hat{\beta}_1 = \displaystyle\frac{S_{xy}}{S_{xx}}$.

Here the values $ \bar{x}$, $ \bar{Y}$, $ S_{xx}$, and $ S_{xy}$ are computed as in the following table.

| Variable | Mean | Sum of squares |
|---|---|---|
| Explanatory | $ \displaystyle \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ | $ \displaystyle S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$ |
| Response | $ \displaystyle \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$ | $ \displaystyle S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(Y_i - \bar{Y})$ |
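
The formulas for $ \bar{x}$, $ \bar{Y}$, $ S_{xx}$, $ S_{xy}$, $ \hat{\beta}_1$, and $ \hat{\beta}_0$ can be sketched in a few lines of Python. This is only an illustration: the function name `fit_line` and the five data points are hypothetical and do not come from the page.

```python
def fit_line(x, y):
    """Return (beta0_hat, beta1_hat) for the least-squares line through (x, y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")

    # Means of the explanatory and response variables
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # S_xx and S_xy as defined in the table
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if s_xx == 0:
        raise ValueError("all x values are equal; the slope is undefined")

    beta1_hat = s_xy / s_xx                # slope estimate
    beta0_hat = y_bar - beta1_hat * x_bar  # intercept estimate
    return beta0_hat, beta1_hat


# Hypothetical data, chosen only for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

beta0_hat, beta1_hat = fit_line(x, y)
# Here x_bar = 3, y_bar = 4, S_xx = 10, S_xy = 6, so the slope is about 0.6
# and the intercept about 2.2 (up to floating-point rounding).
print(beta0_hat, beta1_hat)
```

The check on `s_xx` guards against the case where every $ x_i$ is identical, in which $ S_{xx} = 0$ and the slope estimate is undefined.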

The sample correlation

$ \hat{\rho} = \displaystyle\frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}$

describes the strength of the linear relationship among the pairs $ (x_i, Y_i)$ of data. Here $ S_{yy} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$ is the sum of squares for the response values $ Y_i$. The value $ \hat{\rho}$ always lies between $ -1$ and $ 1$. It is close to $ 1$ when the pairs lie close to a straight line with positive slope, and close to $ -1$ when they lie close to a straight line with negative slope.
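
As a rough illustration of how $ \hat{\rho}$ behaves, the following Python sketch computes it directly from $ S_{xx}$, $ S_{yy}$, and $ S_{xy}$. The function name `sample_correlation` and all data values are hypothetical, chosen so that two of the data sets lie exactly on a line.

```python
import math


def sample_correlation(x, y):
    """Return rho_hat = S_xy / sqrt(S_xx * S_yy) for paired data (x, y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")

    x_bar = sum(x) / n
    y_bar = sum(y) / n

    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined when x or y is constant")

    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data: points exactly on a line with positive slope
rho_pos = sample_correlation([1, 2, 3, 4], [3, 5, 7, 9])
# Hypothetical data: points exactly on a line with negative slope
rho_neg = sample_correlation([1, 2, 3, 4], [9, 7, 5, 3])
# Hypothetical data: points scattered around an increasing trend
rho_mid = sample_correlation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])

print(rho_pos, rho_neg, rho_mid)
```

The first two values sit at the extremes $ 1$ and $ -1$ because the points fall exactly on a line; the third lies strictly between $ 0$ and $ 1$ because the points only roughly follow an increasing line.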

The fitted linear model

$\displaystyle \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x
$

is called the prediction equation (or regression line). A scatter plot of the data drawn together with the regression line shows how well the line fits the data.
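
A numerical counterpart to inspecting the scatter plot is to evaluate the prediction equation at each $ x_i$ and look at the residuals $ Y_i - \hat{y}_i$. The sketch below assumes hypothetical data and hypothetical coefficient values (picked so that they are the least-squares estimates for these five points); the name `predict` is likewise an assumption for illustration.

```python
def predict(beta0_hat, beta1_hat, x):
    """Evaluate the prediction equation y_hat = beta0_hat + beta1_hat * x."""
    return beta0_hat + beta1_hat * x


# Hypothetical data and coefficients (2.2 and 0.6 are the least-squares
# estimates for these five points).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
beta0_hat, beta1_hat = 2.2, 0.6

# Fitted values on the regression line and the vertical distances to the data
fitted = [predict(beta0_hat, beta1_hat, x) for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, fitted)]

for x, y, y_hat, r in zip(xs, ys, fitted, residuals):
    print(f"x={x}  y={y}  y_hat={y_hat:.2f}  residual={r:+.2f}")
```

Residuals that are small relative to the spread of the $ Y_i$'s indicate that the line follows the data closely. For a least-squares line with an intercept, the residuals sum to zero up to rounding error.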