e-Statistics

Test of Independence

In a study with two characteristics, say ``A'' and ``B,'' the researchers want to know whether these two characteristics are linked or independent. For such a study we have paired observations of categorical data of size $ n$, which are summarized in a contingency table.

The column (usually the first) containing the categorical variable must be specified in the box on the left; it consists of the categorical values of characteristic ``A.'' The remaining columns of count data are then specified one by one, each corresponding to a categorical value of the other characteristic ``B.'' Each time, the cell frequencies $ X_{A,B}$ for the specified value of $ B$ are placed under ``Observed'' in the table below.

The null hypothesis states that ``the two characteristics are independent.'' Let $ n_{A,\cdot}$ and $ n_{\cdot,B}$ denote the total counts for the respective values $ A$ and $ B$ (i.e., the row and column sums in the contingency table). Under the null hypothesis, the expected frequencies for the contingency table are given by $ E_{A,B} = (n_{A,\cdot}\times n_{\cdot,B}) / n$, where $ n$ denotes the total cell count. Then the chi-square statistic is

$ \chi^2 = \displaystyle\sum_{A} \sum_{B}
\frac{(X_{A,B} - E_{A,B})^2}{E_{A,B}}
= \displaystyle\sum_{A} \sum_{B}
\frac{(X_{A,B} - n_{A,\cdot} n_{\cdot,B}/n)^2}{n_{A,\cdot} n_{\cdot,B}/n}$
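The computation above can be sketched in a few lines of Python. The $ 2\times 3$ contingency table here is a made-up example, not data from the text; the code forms the row totals $ n_{A,\cdot}$, column totals $ n_{\cdot,B}$, the expected frequencies $ E_{A,B}$, and then the chi-square statistic.

```python
# Hypothetical 2x3 contingency table: rows = values of characteristic A,
# columns = values of characteristic B (counts are invented for illustration).
observed = [
    [20, 30, 10],
    [10, 15, 15],
]

n = sum(sum(row) for row in observed)              # total cell count n
row_totals = [sum(row) for row in observed]        # n_{A,.}
col_totals = [sum(col) for col in zip(*observed)]  # n_{.,B}

# Expected frequencies under independence: E_{A,B} = n_{A,.} * n_{.,B} / n
expected = [[ra * cb / n for cb in col_totals] for ra in row_totals]

# Chi-square statistic: sum over all cells of (X - E)^2 / E
chi2 = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(observed))
    for j in range(len(observed[0]))
)
print(round(chi2, 3))  # → 5.556 for this example table
```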

Let $ m_A$ and $ m_B$ denote the number of categorical values in the rows and the columns, respectively. Then the degrees of freedom are $ df = (m_A - 1)(m_B - 1)$, and we construct the critical region $ x > \chi^2_{\alpha,df}$ for the observed value $ \chi^2 = x$ of the chi-square statistic to determine whether the null hypothesis can be rejected. Equivalently, we reject the null hypothesis (that is, we find evidence of association between the two characteristics) if the p-value $ p^*$ is significant (that is, $ p^* < \alpha$).
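As a sketch of the complete decision procedure, the whole test can be run with SciPy (assumed to be installed), which returns the statistic, the p-value, and the degrees of freedom in one call. The table is the same made-up $ 2\times 3$ example, so $ df = (2-1)(3-1) = 2$; `correction=False` gives the plain chi-square statistic described above, without the Yates continuity correction.

```python
from scipy.stats import chi2_contingency

# Same hypothetical 2x3 contingency table as before (invented counts).
observed = [
    [20, 30, 10],
    [10, 15, 15],
]
alpha = 0.05  # significance level

stat, p_value, df, expected = chi2_contingency(observed, correction=False)
print(f"chi2 = {stat:.3f}, df = {df}, p-value = {p_value:.4f}")

# Decision rule: reject H0 when the p-value p* is below alpha.
if p_value < alpha:
    print("Reject H0: evidence that the two characteristics are associated.")
else:
    print("Fail to reject H0: no significant evidence of association.")
```

For this example the p-value is above 0.05, so the null hypothesis of independence is not rejected at the 5% level.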