The Chi-Square 'Goodness of Fit' test is used to test whether a sample is drawn from a population that conforms to a specified distribution.
The hypotheses are:
H0: the sample conforms to the specified distribution
H1: the sample does not conform to the specified distribution
The test is illustrated by example. An organization has three categories of employees, 'A', 'B' and 'C'. It collects the following data (ignore the 'expected' and 'chi-square' contribution columns for the moment):
| Category | # Employees | Days Sick | Expected | Chi-Square Contribution |
|---|---|---|---|---|
| A | 100 | 10 | 18 | 3.56 |
| B | 60 | 12 | 10.8 | 0.13 |
| C | 40 | 14 | 7.2 | 6.42 |
| Total | 200 | 36 | 36 | 10.11 |
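If the sample conformed exactly to the distribution, the days sick would be shared out in proportion to the number of employees in each category, as shown in the expected column (for example, 36 x 100/200 = 18 for category A). The chi-square statistic is calculated by summing the chi-square contributions from each category:
$$\chi^2 = \sum_i \frac{(A_i - E_i)^2}{E_i}$$
Where:
$A_i$ is the actual value for category 'i'
$E_i$ is the expected value for category 'i'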
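The arithmetic behind the 'expected' and 'chi-square contribution' columns can be sketched in a few lines of Python. This is only an illustration using the figures from the table; the variable names are chosen here and are not part of the original text.

```python
# Observed data from the table: employees and days sick per category.
employees = {"A": 100, "B": 60, "C": 40}
days_sick = {"A": 10, "B": 12, "C": 14}

total_employees = sum(employees.values())  # 200
total_days = sum(days_sick.values())       # 36

# Expected days sick if they were shared out in proportion to headcount.
expected = {c: total_days * n / total_employees for c, n in employees.items()}

# Chi-square contribution of each category: (A_i - E_i)^2 / E_i.
contributions = {
    c: (days_sick[c] - expected[c]) ** 2 / expected[c] for c in employees
}

# The test statistic is the sum of the contributions.
chi_square = sum(contributions.values())

for c in employees:
    print(c, round(expected[c], 2), round(contributions[c], 2))
print("chi-square:", round(chi_square, 2))
```

The printed values should match the table to two decimal places, with a total of 10.11.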
There are two degrees of freedom (if two of the 'days sick' data values are known the third can be calculated from the totals).
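The critical value of chi-square can be obtained from tables, or the p-value can be calculated using, e.g., Excel:
=CHIDIST(10.11,2) gives 0.0064
This is well below the usual 0.05 level of significance, so H0 is rejected: the days sick are not distributed in proportion to the number of employees in each category.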
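The Excel result can be cross-checked without a statistics package. For two degrees of freedom the upper-tail probability of the chi-square distribution reduces to exp(-x/2), so a minimal sketch needs only the standard library. The function name below is made up for this illustration, and the shortcut holds only for two degrees of freedom.

```python
import math

def chi2_upper_tail_df2(x):
    """Upper-tail probability P(X > x) for chi-square with 2 degrees of freedom.

    Valid only for 2 degrees of freedom, where the tail is exp(-x / 2).
    """
    return math.exp(-x / 2.0)

p_value = chi2_upper_tail_df2(10.11)
print(round(p_value, 4))
```

For other degrees of freedom a general routine is needed, for example Excel's CHIDIST as used in the text.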
Contingency tables are an application of the chi-square test used to examine the relationship between two variables. For example, the organization decides to investigate whether there is a relationship between employees taking sick leave and taking their full entitlement of annual leave. The hypotheses are:
H0: there is no relationship between taking leave and propensity for sickness
H1: there is a relationship between taking leave and propensity for sickness
The data are as follows:
| | Sick | Not Sick | Total |
|---|---|---|---|
| Take Leave | 65 | 55 | 120 |
| Don't take leave | 50 | 30 | 80 |
| Total | 115 | 85 | 200 |
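The expected values for the individual cells are found from:
$$E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{grand total}}$$
The chi-square contributions for each cell are calculated from:
$$\frac{(A_{ij} - E_{ij})^2}{E_{ij}}$$
where $A_{ij}$ is the actual value and $E_{ij}$ the expected value for the cell in row 'i' and column 'j'.
The expected values, with the chi-square contributions in parentheses, are: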
| | Sick | Not Sick | Total |
|---|---|---|---|
| Take Leave | 69 (0.23) | 51 (0.31) | 120 |
| Don't take leave | 46 (0.35) | 34 (0.47) | 80 |
| Total | 115 | 85 | 200 |
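The expected values and contributions in the table can be reproduced from the observed counts alone. The sketch below assumes the usual rule that a cell's expected value is its row total times its column total divided by the grand total; the variable names are illustrative.

```python
# Observed counts: rows are "Take Leave" / "Don't take leave",
# columns are "Sick" / "Not Sick".
observed = [[65, 55],
            [50, 30]]

row_totals = [sum(row) for row in observed]        # [120, 80]
col_totals = [sum(col) for col in zip(*observed)]  # [115, 85]
grand_total = sum(row_totals)                      # 200

# Expected count per cell: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Contribution per cell: (actual - expected)^2 / expected.
contributions = [
    [(a - e) ** 2 / e for a, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

chi_square = sum(sum(row) for row in contributions)

print(expected)
print([[round(v, 2) for v in row] for row in contributions])
print("chi-square:", round(chi_square, 2))
```

Each cell differs from its expected value by exactly 4, so the contributions differ only through the size of the expected count in the denominator.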
The total chi-square value is 1.36. The number of degrees of freedom can be calculated from:
(rows - 1) x (columns - 1)
This gives one degree of freedom. The number of degrees of freedom may also be obtained by considering that, given any one cell and the totals, the values in the remaining cells can be calculated.
From Excel, =CHIDIST(1.36,1) gives a p-value of 0.24. Because this is greater than 0.05, the null hypothesis is not rejected at the 0.05 level of significance: the data do not show a significant relationship between taking leave and sickness.
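The degrees of freedom and the p-value for the 2 x 2 table can also be checked with the standard library. With one degree of freedom the chi-square variable is the square of a standard normal variable, so the upper-tail probability is erfc(sqrt(x/2)). The helper names are chosen for this sketch, and the tail formula applies only to one degree of freedom.

```python
import math

def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom of a contingency table: (rows - 1) x (columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)

def chi2_upper_tail_df1(x):
    """Upper-tail probability P(X > x) for chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

df = degrees_of_freedom(2, 2)
p_value = chi2_upper_tail_df1(1.36)

# H0 is rejected only if the p-value falls below the significance level.
significance_level = 0.05
reject_h0 = p_value < significance_level

print(df, round(p_value, 2), reject_h0)
```

Here the p-value is well above the significance level, so `reject_h0` is false, in line with the conclusion in the text.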