The Chi-Square 'Goodness of Fit' test is used to test whether a sample is drawn from a population that conforms to a specified distribution.
The hypotheses are:
H0: the sample conforms to the specified distribution
H1: the sample does not conform to the specified distribution
The test is illustrated by example. An organization has three categories of employees, 'A', 'B' and 'C'. It collects the following data:
| Category | # Employees | Days Sick |
|---|---|---|
| A | 100 | 10 |
| B | 60 | 12 |
| C | 40 | 14 |
| Total | 200 | 36 |
The organization wants to test the hypothesis:
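H0: the proportion of sickness is the same for each category of employees
H1: the proportion of sickness differs between categories
The first step is to form the table. The 'expected' columns show the results that would be expected if the proportions were equal between categories, i.e. if the null hypothesis were true. Each expected value is the category's number of employees multiplied by the overall proportion from the totals row (164/200 for days well, 36/200 for days sick):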
| Category | # Employees | Days Well | Expected (Well) | Chi-Square Contribution (Well) | Days Sick | Expected (Sick) | Chi-Square Contribution (Sick) |
|---|---|---|---|---|---|---|---|
| A | 100 | 90 | 82.0 | 0.78 | 10 | 18.0 | 3.56 |
| B | 60 | 48 | 49.2 | 0.03 | 12 | 10.8 | 0.13 |
| C | 40 | 26 | 32.8 | 1.41 | 14 | 7.2 | 6.42 |
| Total | 200 | 164 | 164 | 2.22 | 36 | 36 | 10.11 |
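The expected columns can be reproduced with a few lines of Python. This is a minimal sketch that assumes, as the table does, that days well is the number of employees minus days sick; the variable names are illustrative and not part of the original example.

```python
# Observed data from the example: employees and sick days per category.
employees = {"A": 100, "B": 60, "C": 40}
days_sick = {"A": 10, "B": 12, "C": 14}

total_employees = sum(employees.values())  # 200
total_sick = sum(days_sick.values())       # 36

# Under H0 every category shares the same pooled sickness proportion.
pooled_rate = total_sick / total_employees  # 36 / 200 = 0.18

# Expected counts: category size times the pooled proportion.
expected_sick = {c: n * pooled_rate for c, n in employees.items()}
expected_well = {c: n * (1 - pooled_rate) for c, n in employees.items()}

for c in employees:
    print(f"{c}: expected well = {expected_well[c]:.1f}, "
          f"expected sick = {expected_sick[c]:.1f}")
```

The printed values should match the 'Expected' columns of the table to one decimal place.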
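The chi-square statistic is calculated by summing the chi-square contributions from each category:
$$\chi^2 = \sum_i \frac{(A_i - E_i)^2}{E_i}$$
Where:
A_i is the actual value for category 'i'
E_i is the expected value for category 'i'
For the example, the statistic is 2.22 + 10.11 = 12.33.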
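As a check on the arithmetic, the statistic can be computed directly as the sum of (actual - expected)² / expected over all six cells of the table (three 'days well' cells and three 'days sick' cells). The function name below is hypothetical and chosen only for illustration.

```python
def chi_square_statistic(actual, expected):
    """Sum of (actual - expected)^2 / expected over all cells."""
    return sum((a - e) ** 2 / e for a, e in zip(actual, expected))

# Cells in the order: days well for A, B, C, then days sick for A, B, C.
actual = [90, 48, 26, 10, 12, 14]
expected = [82.0, 49.2, 32.8, 18.0, 10.8, 7.2]

chi_square = chi_square_statistic(actual, expected)
print(round(chi_square, 2))  # 12.33
```

The result agrees with the sum of the two contribution totals in the table, 2.22 + 10.11.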
There are two degrees of freedom (if two of the 'days sick' data values are known the third can be calculated from the totals).
The critical value can be obtained from tables, or the p-value can be calculated using, e.g., Excel:
=CHIDIST(12.33,2) gives 0.0021
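The same p-value can be reproduced without a spreadsheet. The sketch below relies on the fact that, for exactly two degrees of freedom, the upper-tail probability of the chi-square distribution is exp(-x/2); it is not valid for other degrees of freedom. The function names are hypothetical, and the 5% significance level is an assumption made for illustration, since the example does not state one.

```python
import math

def chi_square_p_value_2df(statistic):
    """Upper-tail p-value; valid only for 2 degrees of freedom."""
    return math.exp(-statistic / 2)

def chi_square_critical_value_2df(alpha):
    """Critical value for significance level alpha; 2 degrees of freedom only."""
    return -2 * math.log(alpha)

p_value = chi_square_p_value_2df(12.33)
critical = chi_square_critical_value_2df(0.05)  # assumed 5% level

print(f"p-value = {p_value:.4f}")                  # p-value = 0.0021
print(f"critical value at 5% = {critical:.2f}")    # critical value at 5% = 5.99
```

Under the assumed 5% level, the statistic of 12.33 exceeds the critical value of about 5.99, so the null hypothesis of equal sickness proportions would be rejected.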
Refer also to Contingency Tables for another application of the chi-square test.