e-Statistics

## Test of Independence

In a study involving two characteristics, researchers often want to know whether the two characteristics, say "A" and "B," are linked or independent. For such a study we have paired observations of categorical data of size $n$, which are summarized in a contingency table.
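
As a minimal sketch of how paired categorical observations can be summarized in a contingency table, the Python snippet below counts each (A, B) pair. The function name `contingency_table` and the sample data are hypothetical and only illustrate the idea; they are not part of the tool described here.

```python
from collections import Counter

def contingency_table(pairs):
    """Summarize paired observations (a, b) as a table of counts.

    Rows follow the sorted values of characteristic A,
    columns follow the sorted values of characteristic B.
    """
    pairs = list(pairs)
    counts = Counter(pairs)
    row_levels = sorted({a for a, _ in pairs})
    col_levels = sorted({b for _, b in pairs})
    # Counter returns 0 for (a, b) combinations that never occur
    table = [[counts[(a, b)] for b in col_levels] for a in row_levels]
    return row_levels, col_levels, table

# Made-up paired observations (A = group, B = response), for illustration only
sample_pairs = [
    ("smoker", "yes"), ("smoker", "no"), ("non-smoker", "no"),
    ("non-smoker", "no"), ("smoker", "yes"),
]
row_levels, col_levels, table = contingency_table(sample_pairs)
for level, row in zip(row_levels, table):
    print(level, row)
```

The total of all cells equals the number of paired observations, which is the sample size.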

The column of the categorical variable (usually the first) must be specified in the box on the left; it contains the categorical values of the characteristic "A." The remaining columns of count data are then specified one by one, each corresponding to a categorical value of the other characteristic "B." Each time, the cell frequencies $n_{ij}$ for the specified value of B are placed under "Observed" in the table below.

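The null hypothesis states that "the two characteristics are independent." Let $n_{i\cdot}$ and $n_{\cdot j}$ denote the total counts of the respective values $A_i$ and $B_j$ (i.e., the row and column sums in the contingency table). Under the null hypothesis, the expected frequencies for the contingency table are given by $E_{ij} = n_{i\cdot}\, n_{\cdot j} / n$, where $n$ denotes the total of all cell counts. With $n_{ij}$ denoting the observed frequency in cell $(i, j)$, the chi-square statistic is

$$\chi^2 = \sum_{i} \sum_{j} \frac{(n_{ij} - E_{ij})^2}{E_{ij}}.$$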
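
The calculation of the expected frequencies and the chi-square statistic can be sketched in a few lines of Python. This is an illustrative sketch, not the tool's own implementation: the function name `chi_square_statistic` and the 2 x 2 table of counts are hypothetical, and the code assumes every row and column total is positive so that no expected frequency is zero.

```python
def chi_square_statistic(observed):
    """Return (expected, chi_sq) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)  # total of all cell counts

    # Expected frequency under independence: row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Sum of (observed - expected)^2 / expected over all cells
    chi_sq = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )
    return expected, chi_sq

# Made-up 2 x 2 table of observed counts, for illustration only
observed = [[10, 20],
            [30, 40]]
expected, chi_sq = chi_square_statistic(observed)
print(expected)
print(chi_sq)
```

For this table the row totals are 30 and 70 and the column totals are 40 and 60, so the expected count in the first cell is 30 × 40 / 100 = 12.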

Let $r$ and $c$ denote the number of categorical values in the rows and the columns, respectively. The statistic then has $(r-1)(c-1)$ degrees of freedom, and we construct the critical region for the value of the chi-square statistic to determine whether the null hypothesis can be rejected. Equivalently, we can reject the null hypothesis (that is, find evidence of dependence or association between the two characteristics) if the p-value is significant (that is, $p < \alpha$ for the chosen significance level $\alpha$).
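
The decision rule can be sketched as follows, using only the Python standard library. The helper names `chi2_sf` and `independence_decision`, the default level of 0.05, and the example statistic are all assumptions made for illustration. The upper-tail probability is computed with the standard recurrence for the regularized incomplete gamma function at integer degrees of freedom; in practice a library routine such as `scipy.stats.chi2.sf` would normally be used instead.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    # Starting values: Q(1/2, h) = erfc(sqrt(h)) for odd df, Q(1, h) = exp(-h) for even df
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(h))
    else:
        a, q = 1.0, math.exp(-h)
    # Recurrence: Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q

def independence_decision(chi_sq, n_rows, n_cols, alpha=0.05):
    """Return (df, p_value, reject) for the chi-square test of independence."""
    df = (n_rows - 1) * (n_cols - 1)
    p_value = chi2_sf(chi_sq, df)
    return df, p_value, p_value < alpha

# Hypothetical statistic for a 3 x 3 table (made-up number, not from the text)
df, p_value, reject = independence_decision(chi_sq=12.5, n_rows=3, n_cols=3)
print(df, round(p_value, 4), reject)
```

Rejecting when the p-value falls below the significance level is equivalent to rejecting when the statistic exceeds the upper critical value of the chi-square distribution with the same degrees of freedom.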