The Chi-Square 'Goodness
of Fit' test is used to test whether a sample
is drawn from a population that conforms
to a specified distribution.
The hypotheses are:
H_{0}: the sample conforms to the specified distribution
H_{1}: the sample does not conform to the specified distribution
The test is illustrated by example. An
organization has three categories of employees,
'A', 'B' and 'C'. It collects the following
data:
| Category | # Employees | Days Sick |
|---|---|---|
| A | 100 | 10 |
| B | 60 | 12 |
| C | 40 | 14 |
| Total | 200 | 36 |
The organization wants to test the hypotheses:
H_{0}: the proportion of sickness is the same for each category of employees
H_{1}: the proportion of sickness differs between categories
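The first step is to form the table. The 'expected' columns show the results that
would be expected if the proportions were equal between categories, i.e. if the null
hypothesis were true. The overall proportion of sickness is 36/200 = 18%, so each
category's expected 'days sick' is 18% of its employees and its expected 'days well'
is the remaining 82%: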
| Category | # Employees | Days Well | Expected (Well) | Chi-Square Contribution (Well) | Days Sick | Expected (Sick) | Chi-Square Contribution (Sick) |
|---|---|---|---|---|---|---|---|
| A | 100 | 90 | 82.0 | 0.78 | 10 | 18.0 | 3.56 |
| B | 60 | 48 | 49.2 | 0.03 | 12 | 10.8 | 0.13 |
| C | 40 | 26 | 32.8 | 1.41 | 14 | 7.2 | 6.42 |
| Total | 200 | 164 | 164 | 2.22 | 36 | 36 | 10.11 |
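As a rough sketch of how the 'expected' columns can be derived, the following Python snippet applies the overall sickness proportion (36 of 200, i.e. 18%) to each category. The variable names are illustrative and do not come from the original example.

```python
# Observed data from the example: employees and days sick per category.
employees = {"A": 100, "B": 60, "C": 40}
days_sick = {"A": 10, "B": 12, "C": 14}

# Overall sickness proportion under the null hypothesis: 36 / 200 = 0.18.
total_employees = sum(employees.values())
total_sick = sum(days_sick.values())
sick_rate = total_sick / total_employees

# Expected counts if every category shared the overall proportion.
expected_sick = {c: n * sick_rate for c, n in employees.items()}
expected_well = {c: n - expected_sick[c] for c, n in employees.items()}

for c in employees:
    print(f"{c}: expected well {expected_well[c]:.1f}, "
          f"expected sick {expected_sick[c]:.1f}")
```

The printed values should agree with the two 'expected' columns of the table, up to rounding.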
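The chi-square statistic is calculated by summing the chi-square contributions
from each category:
\chi^{2} = \sum_{i} \frac{(A_{i} - E_{i})^{2}}{E_{i}}
Where:
A_{i} is the actual value for category 'i'
E_{i} is the expected value for category 'i'
For the example, the contributions sum to 2.22 + 10.11 = 12.33.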
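The summation can be checked with a few lines of Python. This is a minimal sketch in which each contribution is taken as (actual - expected)^2 / expected; the actual and expected values are copied from the table, and the function name `chi_square_statistic` is a hypothetical helper, not part of any library.

```python
# (actual, expected) pairs copied from the table.
cells = [
    (90, 82.0), (48, 49.2), (26, 32.8),  # days well for A, B, C
    (10, 18.0), (12, 10.8), (14, 7.2),   # days sick for A, B, C
]


def chi_square_statistic(pairs):
    """Sum (actual - expected)^2 / expected over all cells."""
    return sum((a - e) ** 2 / e for a, e in pairs)


statistic = chi_square_statistic(cells)
print(round(statistic, 2))  # 12.33
```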
There are two degrees of freedom (if two
of the 'days sick' data values are known
the third can be calculated from the totals).
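The critical value of chi-square can be obtained from tables, or the p-value
can be calculated using, e.g., Excel:
=CHIDIST(12.33,2) gives 0.0021
A p-value this small means the null hypothesis would be rejected at conventional
significance levels such as 5% or 1%.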
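The same figure can be reproduced outside Excel. The sketch below relies on the fact that, for exactly two degrees of freedom, the upper-tail probability of the chi-square distribution has the closed form exp(-x/2); it does not apply to other degrees of freedom, where a statistics library (for example `scipy.stats.chi2.sf`) would be the usual choice. The function names are illustrative.

```python
import math


def chi_square_p_value_2df(statistic):
    """Upper-tail probability for a chi-square variable with 2 degrees of freedom."""
    return math.exp(-statistic / 2)


def chi_square_critical_value_2df(alpha):
    """Value exceeded with probability alpha, again for 2 degrees of freedom only."""
    return -2 * math.log(alpha)


p_value = chi_square_p_value_2df(12.33)
critical_5pct = chi_square_critical_value_2df(0.05)

print(round(p_value, 4))        # 0.0021
print(round(critical_5pct, 2))  # 5.99
```

Here the statistic of 12.33 exceeds the 5% critical value of about 5.99, which is the same conclusion as comparing the p-value with 0.05.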
Refer also to Contingency Tables for another application
of the chi-square test.