## Multiple Comparisons

When the null hypothesis $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ is rejected, the question arises as to which pairs of $\mu_i$'s actually differ. This is investigated by analyzing all pairwise differences, a procedure called *multiple comparisons*.

The data from the $k$ groups are arranged either (a) in a single variable, together with a second, categorical variable indicating the factor level of each observation, or (b) in multiple columns, each of which holds the observations for one factor level.
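To make the two layouts concrete, here is a minimal sketch that converts layout (b), one column per factor level, into layout (a), a single response variable plus a categorical label, and back. The function names `wide_to_long` and `long_to_wide` and the toy numbers are hypothetical, chosen only for illustration.

```python
def wide_to_long(columns):
    """Convert layout (b) to layout (a).

    columns: dict mapping a factor-level label to its list of observations.
    Returns (values, labels): one response variable and a parallel
    categorical variable giving the factor level of each observation.
    """
    values, labels = [], []
    for level, observations in columns.items():
        for x in observations:
            values.append(x)
            labels.append(level)
    return values, labels


def long_to_wide(values, labels):
    """Convert layout (a) back to layout (b), keeping first-seen level order."""
    columns = {}
    for x, level in zip(values, labels):
        columns.setdefault(level, []).append(x)
    return columns


# Hypothetical toy data: k = 3 groups of unequal size.
wide = {"A": [4.1, 3.9, 4.4], "B": [5.0, 5.2], "C": [3.1, 3.3, 2.9, 3.0]}
values, labels = wide_to_long(wide)
print(values)
print(labels)
```

Either layout carries the same information; layout (a) is convenient when group sizes differ, since no column needs padding.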

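Here we construct confidence intervals simultaneously for all pairwise differences $\mu_i - \mu_j$, $1 \le i < j \le k$. The point estimate of $\mu_i - \mu_j$ is $\bar{X}_i - \bar{X}_j$, and the estimate of its variance is $MS_E\left(\frac{1}{n_i} + \frac{1}{n_j}\right)$, where $\bar{X}_i$ and $n_i$ are the sample mean and sample size of group $i$, and $MS_E$ is the error mean square with $N - k$ degrees of freedom ($N = n_1 + \cdots + n_k$). Various methods have been proposed for finding a critical point $q$ so that we obtain the confidence intervals

$$
(\bar{X}_i - \bar{X}_j) \pm q \sqrt{MS_E\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}, \qquad 1 \le i < j \le k,
$$

with simultaneous confidence level $1 - \alpha$.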
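The interval formula can be sketched directly from raw data in layout (b). This is a minimal sketch, assuming the critical point `q` has already been obtained from one of the methods listed next; the function name `pairwise_intervals` is hypothetical.

```python
import math
from itertools import combinations


def pairwise_intervals(groups, q):
    """Intervals (Xbar_i - Xbar_j) +/- q * sqrt(MSE * (1/n_i + 1/n_j)).

    groups: list of k lists of observations (layout (b)).
    q: critical point supplied by the caller.
    Returns a dict mapping each pair (i, j), i < j, to (estimate, lower, upper).
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [sum(g) / len(g) for g in groups]
    n_total = sum(sizes)
    # Error mean square: pooled within-group sum of squares over N - k df.
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    mse = sse / (n_total - k)
    intervals = {}
    for i, j in combinations(range(k), 2):
        estimate = means[i] - means[j]
        half_width = q * math.sqrt(mse * (1 / sizes[i] + 1 / sizes[j]))
        intervals[(i, j)] = (estimate, estimate - half_width, estimate + half_width)
    return intervals


# Hypothetical toy data; q = 2.0 is a placeholder, not a real critical point.
iv = pairwise_intervals([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]], 2.0)
for pair, (est, lo, hi) in iv.items():
    print(pair, est, round(lo, 3), round(hi, 3))
```

All $k(k-1)/2$ intervals share the same $MS_E$ and the same $q$; only the group means and sizes change from pair to pair.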

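- Tukey's method. Tukey introduced the studentized range distribution. We can choose $q = q_{\alpha,k,N-k}/\sqrt{2}$, where $q_{\alpha,k,N-k}$ is the upper $\alpha$ point of the studentized range distribution with parameter $k$ and $N - k$ degrees of freedom. When the group sizes are unequal, this choice is known as the Tukey-Kramer method.
- Scheffé's method. As a special case of Scheffé's S method, we can obtain $q = \sqrt{(k-1)\,F_{\alpha,k-1,N-k}}$, where $F_{\alpha,k-1,N-k}$ is the upper $\alpha$ point of the $F$-distribution with $k - 1$ and $N - k$ degrees of freedom.
- Bonferroni's method. Boole's inequality implies that we can choose $q = t_{\alpha/(2m),N-k}$ with $m = k(k-1)/2$, the number of pairwise differences. Here $t_{\alpha,N-k}$ is the $100(1-\alpha)$-th percentile of Student's $t$-distribution with $N - k$ degrees of freedom.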
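The Scheffé and Bonferroni critical points can be computed with the standard library alone, because both the $t$ and $F$ tail probabilities reduce to the regularized incomplete beta function. The sketch below evaluates that function by the usual continued-fraction expansion (modified Lentz method) and inverts the tails by bisection. All function names here are hypothetical helpers written for this sketch, not a library API.

```python
import math

_TINY = 1e-300


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > _TINY else _TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > _TINY else _TINY
        h *= d * c
        # Odd step.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > _TINY else _TINY
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_upper_tail(t, df):
    """P(T > t) for Student's t with df degrees of freedom, t >= 0."""
    return 0.5 * betainc(df / 2.0, 0.5, df / (df + t * t))


def f_upper_tail(x, d1, d2):
    """P(F > x) for the F-distribution with (d1, d2) degrees of freedom."""
    return betainc(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * x))


def upper_quantile(tail, p, tol=1e-10):
    """Solve tail(x) = p for x >= 0 by bisection; tail must be decreasing."""
    lo, hi = 0.0, 1.0
    while tail(hi) > p:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tail(mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bonferroni_q(k, df, alpha=0.05):
    """q = t_{alpha/(2m), df} with m = k(k-1)/2 pairs."""
    m = k * (k - 1) // 2
    return upper_quantile(lambda t: t_upper_tail(t, df), alpha / (2 * m))


def scheffe_q(k, df, alpha=0.05):
    """q = sqrt((k-1) F_{alpha, k-1, df})."""
    f_crit = upper_quantile(lambda x: f_upper_tail(x, k - 1, df), alpha)
    return math.sqrt((k - 1) * f_crit)


print(bonferroni_q(4, 20), scheffe_q(4, 20))
```

For $k = 2$ the two methods coincide, since $\sqrt{F_{\alpha,1,\nu}} = t_{\alpha/2,\nu}$, which makes a convenient sanity check. Tukey's studentized range distribution has no such incomplete-beta form; SciPy 1.7 and later provide `scipy.stats.studentized_range`, and `scipy.stats.studentized_range.ppf(1 - alpha, k, df) / math.sqrt(2)` should give Tukey's $q$.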

The significance tests for pairwise differences are then performed as follows: if the confidence interval for $\mu_i - \mu_j$ does not contain zero, we reject "$H_0: \mu_i = \mu_j$". The larger the critical point $q$ is, the harder it is to reject "$H_0: \mu_i = \mu_j$" (that is, the more conservative the test is). Therefore, in practice we often choose the smallest of the available critical points in order to obtain the least conservative confidence intervals, and thus perform the least conservative test.
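The testing rule and the preference for the smallest critical point can be sketched from summary statistics alone. This is a minimal sketch: `reject_pairs` and `least_conservative` are hypothetical names, and the means, sizes, $MS_E$, and critical points below are illustrative placeholders, not values computed from any distribution.

```python
import math
from itertools import combinations


def reject_pairs(means, sizes, mse, q):
    """Return pairs (i, j) whose interval for mu_i - mu_j excludes zero.

    Excluding zero is equivalent to |Xbar_i - Xbar_j| > q * SE_ij.
    """
    rejected = []
    for i, j in combinations(range(len(means)), 2):
        diff = means[i] - means[j]
        se = math.sqrt(mse * (1 / sizes[i] + 1 / sizes[j]))
        lower, upper = diff - q * se, diff + q * se
        if lower > 0 or upper < 0:
            rejected.append((i, j))
    return rejected


def least_conservative(critical_points):
    """Pick the method whose critical point is smallest."""
    method = min(critical_points, key=critical_points.get)
    return method, critical_points[method]


# Placeholder inputs for illustration only.
means, sizes, mse = [10.0, 12.5, 13.0], [4, 4, 4], 1.0
candidates = {"tukey": 3.5, "scheffe": 3.6}
method, q = least_conservative(candidates)
print(method, reject_pairs(means, sizes, mse, q))
```

With these inputs the smaller critical point 3.5 rejects both $\mu_1 = \mu_2$ and $\mu_1 = \mu_3$, whereas 3.6 would reject only $\mu_1 = \mu_3$, illustrating why a larger $q$ gives a more conservative test.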

Remark on simultaneity. Whether we should conduct the analysis of variance (AOV) before multiple comparisons (MC) is a somewhat delicate issue, since doing so raises the question of the simultaneity of the AOV and the MC. However, because of the duality between the AOV and Scheffé's S method, a systematic approach popular among statisticians requires the AOV before proceeding with the MC. Also note that when we apply different multiple comparison procedures (for example, Scheffé's and Tukey-Kramer's methods), we naturally do not consider the simultaneity of these procedures, and understandably their conclusions may be inconsistent (for example, Scheffé's method may detect no significant difference while Tukey-Kramer's method indicates significant differences for some pairs).