e-Statistics

t-Test

Data are collected from two groups, say ``Group 1'' and ``Group 2,'' in order to investigate how the two groups differ in terms of their respective population means $ \mu_1$ and $ \mu_2$.

Data must be arranged either (a) in two columns, each containing the data for one group, or (b) in a single column, with a second categorical variable identifying ``Group 1'' and ``Group 2.'' The analysis begins with the calculation of the sample means $ \bar{X}$ and $ \bar{Y}$ and the sample standard deviations $ S_1$ and $ S_2$ of Group 1 and Group 2, whose sample sizes are $ n$ and $ m$, respectively.
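The two data layouts and the summary statistics can be sketched in Python as follows. This is only an illustration: the data values and the helper names `split_by_group` and `summarize` are hypothetical and are not part of the original page.

```python
from statistics import mean, stdev

def split_by_group(values, labels):
    """Convert layout (b), one value column plus a label column, into layout (a)."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return groups

def summarize(sample):
    """Return (sample size, sample mean, sample standard deviation)."""
    # statistics.stdev uses the (size - 1) divisor, i.e. the sample standard deviation.
    return len(sample), mean(sample), stdev(sample)

# Hypothetical data in layout (b): a single column with a group label per row.
values = [5.1, 4.8, 6.0, 5.5, 4.2, 3.9, 4.6, 4.1, 3.8]
labels = ["Group 1"] * 4 + ["Group 2"] * 5

groups = split_by_group(values, labels)   # now in layout (a)
n, x_bar, s1 = summarize(groups["Group 1"])
m, y_bar, s2 = summarize(groups["Group 2"])
print(n, x_bar, s1)
print(m, y_bar, s2)
```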

The hypothesis test is specified by choosing one of the alternative hypotheses

$ H_A: \mu_1 \neq \mu_2$, $ H_A: \mu_1 < \mu_2$, or $ H_A: \mu_1 > \mu_2$

In the general procedure, the variance of $ (\bar{X}-\bar{Y})$ is estimated by $ S_{\bar{X}-\bar{Y}}^2 = \frac{S_1^2}{n} + \frac{S_2^2}{m}$, where $ S_1^2$ and $ S_2^2$ are the sample variances of Group 1 and Group 2. Under the null hypothesis $ H_0: \mu_1 = \mu_2$, the test statistic $ T = \frac{\bar{X} - \bar{Y}}{S_{\bar{X}-\bar{Y}}}$ is likely to be observed near zero. A value of $ T$ that is extreme in either direction, extreme in the negative direction, or extreme in the positive direction is therefore unusual under $ H_0$ when tested against the respective alternative hypotheses `` $ \mu_1 \neq \mu_2$,'' `` $ \mu_1 < \mu_2$,'' or `` $ \mu_1 > \mu_2$.'' Such an extreme observation is expressed by a p-value smaller than the significance level $ \alpha$, and it suggests evidence to support the alternative hypothesis $ H_A$.
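The general procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the calculation behind this page: the function names `t_cdf` and `welch_t_test` are hypothetical, the degrees of freedom use the Welch-Satterthwaite approximation (the text does not specify them), and the t distribution is evaluated numerically with Simpson's rule so that only the standard library is needed.

```python
import math
from statistics import mean, variance

def t_cdf(t, df, steps=2000):
    """Approximate the Student t CDF by Simpson's rule on the density (steps must be even)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3          # integral of the density from 0 to |t|
    return 0.5 + area if t >= 0 else 0.5 - area

def welch_t_test(x, y, alternative="two-sided"):
    """Return (T, df, p-value) for H0: mu_1 = mu_2 without assuming equal variances."""
    n, m = len(x), len(y)
    v1, v2 = variance(x) / n, variance(y) / m      # S_1^2 / n and S_2^2 / m
    t = (mean(x) - mean(y)) / math.sqrt(v1 + v2)
    # Assumption: Welch-Satterthwaite degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n - 1) + v2 ** 2 / (m - 1))
    if alternative == "less":         # H_A: mu_1 < mu_2
        p = t_cdf(t, df)
    elif alternative == "greater":    # H_A: mu_1 > mu_2
        p = 1 - t_cdf(t, df)
    else:                             # H_A: mu_1 != mu_2
        p = 2 * (1 - t_cdf(abs(t), df))
    return t, df, p

# Hypothetical samples for illustration only.
group1 = [5.1, 4.8, 6.0, 5.5]
group2 = [4.2, 3.9, 4.6, 4.1, 3.8]
t_stat, df, p_value = welch_t_test(group1, group2, alternative="two-sided")
print(t_stat, df, p_value)
```

In practice a statistics library would normally supply the t distribution; the hand-written CDF is included only to keep the sketch self-contained.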

When it is reasonable to assume that the two population variances $ \sigma_1^2$ and $ \sigma_2^2$ of Group 1 and Group 2 are equal, the variance estimate is given by $ S_{\bar{X}-\bar{Y}}^2 = \left(\frac{1}{n} + \frac{1}{m}\right) S_p^2$ via the pooled sample variance $ S_{p}^2 = \frac{(n-1)S_1^2 + (m-1)S_2^2}{n+m-2}$. In the pooled t-test, $ d = T\sqrt{\frac{1}{n}+\frac{1}{m}} = \frac{\bar{X}-\bar{Y}}{S_p}$ is called Cohen's d, an estimate of the standardized mean difference.
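The pooled quantities can be computed directly from the formulas above. The following is a small sketch with hypothetical data; the function name `pooled_summary` is an assumption made for illustration.

```python
import math
from statistics import mean, variance

def pooled_summary(x, y):
    """Return (pooled variance S_p^2, pooled test statistic T, Cohen's d)."""
    n, m = len(x), len(y)
    # Pooled sample variance: weighted average of the two sample variances.
    sp2 = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)
    # Standard error of the mean difference under the equal-variance assumption.
    se = math.sqrt((1 / n + 1 / m) * sp2)
    t = (mean(x) - mean(y)) / se
    # Cohen's d as defined in the text; algebraically equal to (x_bar - y_bar) / S_p.
    d = t * math.sqrt(1 / n + 1 / m)
    return sp2, t, d

# Hypothetical samples for illustration only.
group1 = [5.0, 6.0, 7.0, 8.0]
group2 = [1.0, 2.0, 3.0, 4.0]
sp2, t_stat, d = pooled_summary(group1, group2)
print(sp2, t_stat, d)
```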

If the null hypothesis is rejected, it is advisable to construct a confidence interval for the population mean difference $ \mu_1 - \mu_2$.

$ \displaystyle
\left(\bar{X} - \bar{Y}
- t_{\alpha/2,df} S_{\bar{X}-\bar{Y}},\:
\bar{X} - \bar{Y}
+ t_{\alpha/2,df} S_{\bar{X}-\bar{Y}}
\right)$

Here the confidence level $ (1-\alpha)$ may be chosen as 90%, 95%, or 99%.
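The interval above can be sketched in Python as follows. The function names `t_cdf`, `t_quantile`, and `mean_diff_ci` are hypothetical, and so are the input numbers. The critical value $ t_{\alpha/2,df}$ is found here by bisection on a numerically integrated t distribution, which is an implementation choice made only to avoid external libraries. The degrees of freedom are passed in by the caller, since they depend on which variance estimate is used ($ n+m-2$ is the usual choice for the pooled test).

```python
import math

def t_cdf(t, df, steps=2000):
    """Approximate the Student t CDF by Simpson's rule on the density (steps must be even)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3          # integral of the density from 0 to |t|
    return 0.5 + area if t >= 0 else 0.5 - area

def t_quantile(p, df):
    """Find q with t_cdf(q, df) = p by bisection; intended for 0.5 < p < 1."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:      # expand the bracket until it contains the quantile
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mean_diff_ci(diff, se, df, level=0.95):
    """Confidence interval for mu_1 - mu_2 from the mean difference and its standard error."""
    alpha = 1 - level
    t_crit = t_quantile(1 - alpha / 2, df)   # t_{alpha/2, df}
    return diff - t_crit * se, diff + t_crit * se

# Hypothetical inputs: observed difference, its standard error, and degrees of freedom.
for level in (0.90, 0.95, 0.99):
    print(level, mean_diff_ci(diff=1.23, se=0.31, df=7, level=level))
```

A higher confidence level gives a larger critical value and therefore a wider interval.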