Central Limit Theorem
- take a large number of samples from a population that does not conform to a normal distribution
- calculate the mean of each of those samples
- find the shape of the distribution formed by these sample means
You will find that the distribution of the sample means resembles a normal distribution. The larger the number of items in each sample, the better the approximation.
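The steps above can be tried out with a small simulation. This is only a sketch under stated assumptions: the population is taken to be exponential (strongly skewed, so clearly not normal), the sample sizes 1, 5 and 50 are arbitrary, and skewness is used as a rough stand-in for "how far from normal" the distribution of sample means is (a normal distribution has skewness 0). The helper names are invented for this example.

```python
import random
import statistics


def sample_means(sample_size, num_samples, rng):
    """Means of num_samples samples drawn from an exponential population."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(num_samples)
    ]


def skewness(values):
    """Moment-based skewness; 0 for a symmetric distribution such as the normal."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)


rng = random.Random(0)  # fixed seed so the run is repeatable
skew_by_size = {}
for size in (1, 5, 50):
    means = sample_means(size, 5000, rng)
    skew_by_size[size] = skewness(means)
    print(f"sample size {size:>2}: skewness of sample means = {skew_by_size[size]:.2f}")
```

The skewness should shrink toward 0 as the sample size grows, which is the behaviour the theorem describes.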
The Central Limit Theorem is of considerable practical importance because many methods used in inferential statistics rely on the samples being taken from a population that conforms to a normal distribution. Many populations do not conform to a normal distribution, but this can be overcome by using the means of samples taken from the population. Control charts are a good example of this.
Confidence Interval
A random sample taken from a population is used to estimate the population mean. The sample mean is a point estimate, and is unlikely to equal the true population mean exactly.
The confidence interval defines a band around the sample mean within which the true population mean will lie, to some degree of confidence.
For example, a 95% confidence interval is constructed so that, over many repeated samples, about 95% of the intervals calculated in this way contain the true population mean. The method used to calculate the confidence interval will vary, but usually involves the normal distribution for large samples, or the t-distribution for small samples.
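One way to see what the 95% figure means is to build many intervals from repeated samples and count how often they contain the true mean. The sketch below assumes a large-sample interval based on the normal distribution (sample mean plus or minus z times s / sqrt(n)); the exponential population, its mean of 2.0, the sample size and the number of trials are all arbitrary choices made for illustration.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable
z = statistics.NormalDist().inv_cdf(0.975)  # two-sided 95% critical value

true_mean = 2.0      # mean of the assumed exponential population
sample_size = 100    # "large sample", so the normal distribution is used
num_trials = 2000

covered = 0
for _ in range(num_trials):
    sample = [rng.expovariate(1.0 / true_mean) for _ in range(sample_size)]
    mean = statistics.fmean(sample)
    half_width = z * statistics.stdev(sample) / math.sqrt(sample_size)
    if mean - half_width <= true_mean <= mean + half_width:
        covered += 1

coverage = covered / num_trials
print(f"fraction of intervals containing the true mean: {coverage:.3f}")
```

The printed fraction should come out close to 0.95, though not exactly, because the population is skewed and the number of trials is finite.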
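The 100(1-a)% confidence interval for the mean of a small sample (t distribution) is:
$$\bar{x} \pm t_{a/2,\,n-1} \, \frac{s}{\sqrt{n}}$$
where x-bar is the sample mean, 's' is the sample standard deviation, 'n' is the sample size, and t is the critical value of the t distribution with n-1 degrees of freedom that leaves a/2 in the upper tail.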
Degrees of Freedom
The number of independent data values that are used in estimating the value of a population parameter.
The number of degrees of freedom in the standard deviation formula is n-1:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$
If 'n' were used, instead of 'n-1', the value of 's' would be biased; the standard deviation calculated from small samples would underestimate the population standard deviation.
The number of degrees of freedom is 'n-1' because only 'n-1' of the data values 'xi' are independent; if any 'n-1' are known then the other can be calculated (using x-bar).
The mean x-bar is an estimate of the true population mean and was calculated using the same xi values that are being used in the standard deviation calculation. It can be shown that, because of this, errors between the estimate x-bar and the true population mean tend to bias the value of 's'.
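The effect of the divisor can be checked numerically. The sketch below assumes a normal population with standard deviation 1 and a sample size of 5 (both arbitrary), and compares the squared standard deviation (the variance) computed with 'n' and with 'n-1', since the bias is easiest to see on the squared scale.

```python
import random
import statistics

rng = random.Random(2)  # fixed seed so the run is repeatable
true_sd = 1.0           # assumed population standard deviation
n = 5                   # deliberately small sample
num_trials = 10000

var_n = []          # sum of squared deviations divided by n
var_n_minus_1 = []  # sum of squared deviations divided by n-1
for _ in range(num_trials):
    sample = [rng.gauss(0.0, true_sd) for _ in range(n)]
    var_n.append(statistics.pvariance(sample))
    var_n_minus_1.append(statistics.variance(sample))

avg_var_n = statistics.fmean(var_n)
avg_var_n_minus_1 = statistics.fmean(var_n_minus_1)
print(f"true variance:            {true_sd ** 2:.3f}")
print(f"average using n:          {avg_var_n:.3f}")
print(f"average using n-1:        {avg_var_n_minus_1:.3f}")
```

With the 'n' divisor the average should fall short of the true variance by roughly a factor of (n-1)/n, while the 'n-1' divisor should average close to the true value.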
Population
The entire collection of items under study. In inferential statistics the population under study might be the hypothetical future output of a process, given certain parameter values.
Prediction Interval
The confidence interval is used to estimate the interval within which the population mean falls. The prediction interval is used to predict the interval within which a single future observation will fall.
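The 100(1-a)% prediction interval for a small sample (t distribution) is:
$$\bar{x} \pm t_{a/2,\,n-1} \, s \sqrt{1 + \frac{1}{n}}$$
where x-bar is the sample mean, 's' is the sample standard deviation, 'n' is the sample size, and t is the critical value of the t distribution with n-1 degrees of freedom that leaves a/2 in the upper tail.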
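As a rough illustration of how the two intervals differ, the sketch below computes both for a small sample. It assumes the usual t-based forms, x-bar ± t·s/√n for the confidence interval and x-bar ± t·s·√(1 + 1/n) for the prediction interval. The ten data values are invented for this example, and the critical value 2.262 (95%, 9 degrees of freedom) is hard-coded from a standard t table rather than computed.

```python
import math
import statistics

# Hypothetical measurements, invented for illustration only.
sample = [9.8, 10.2, 10.1, 9.9, 10.4, 10.0, 9.7, 10.3, 10.1, 9.9]
n = len(sample)
mean = statistics.fmean(sample)
s = statistics.stdev(sample)  # sample standard deviation (n-1 divisor)

# Two-sided 95% critical value of the t distribution with n-1 = 9
# degrees of freedom, taken from a standard t table (an assumption of
# this sketch; it must be changed if n or the confidence level changes).
t_crit = 2.262

ci_half = t_crit * s / math.sqrt(n)          # half-width for the mean
pi_half = t_crit * s * math.sqrt(1 + 1 / n)  # half-width for one new value

print(f"mean = {mean:.3f}, s = {s:.3f}")
print(f"95% confidence interval: {mean - ci_half:.3f} to {mean + ci_half:.3f}")
print(f"95% prediction interval: {mean - pi_half:.3f} to {mean + pi_half:.3f}")
```

The prediction interval is always the wider of the two, because it has to allow for the scatter of a single observation as well as the uncertainty in the mean.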
Standard Error
The standard deviation of the mean of a sample. If you:
- take a large number of samples, of equal size, from a population
- calculate the mean of each sample
- calculate the standard deviation of the sample means
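you will have found the standard error. The standard error is related to the population (process) standard deviation by:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
where \sigma_{\bar{x}} is the standard error, \sigma is the population standard deviation, and 'n' is the sample size.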
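The procedure described above can be followed literally in a short simulation and compared with the population standard deviation divided by the square root of 'n'. The normal population (mean 10, standard deviation 2), the sample size of 25 and the number of samples are arbitrary assumptions chosen for this sketch.

```python
import math
import random
import statistics

rng = random.Random(3)  # fixed seed so the run is repeatable
sigma = 2.0             # assumed population standard deviation
n = 25                  # size of each sample
num_samples = 5000      # "a large number of samples"

# Take many samples of equal size and calculate the mean of each one.
means = [
    statistics.fmean(rng.gauss(10.0, sigma) for _ in range(n))
    for _ in range(num_samples)
]

# The standard deviation of the sample means is the standard error.
observed_se = statistics.stdev(means)
predicted_se = sigma / math.sqrt(n)

print(f"standard deviation of sample means: {observed_se:.3f}")
print(f"sigma / sqrt(n):                    {predicted_se:.3f}")
```

The two printed values should agree closely, and the agreement tightens as the number of samples increases.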