The Central Limit Theorem states that the distribution of sample means will be approximately normal for a large sample size, regardless of the distribution of the population from which the samples are taken (provided the population has a finite variance).
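Assume you are sampling from a population with mean $\mu$ and standard deviation $\sigma$, and let $\bar{X}$ represent the sample mean of $n$ independently drawn observations. Then the mean of the sampling distribution of $\bar{X}$ is equal to the population mean:
$$\mu_{\bar{X}} = \mu$$
The standard deviation of the sampling distribution of $\bar{X}$ (also called the standard error of the mean) is equal to:
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$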
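A quick simulation can check that the sample mean $\bar{X}$ has mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. The following is a minimal Python sketch (Python 3.8+, standard library only). It assumes, purely for illustration, a fair six-sided die as the population, a sample size of $n = 30$, 10,000 repeated samples, and a fixed random seed; none of these values come from the text.

```python
import math
import random
import statistics

def sample_means(n, num_samples, seed=1):
    """Roll a fair die n times, repeat num_samples times, and return each sample's mean."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.randint(1, 6) for _ in range(n)])
            for _ in range(num_samples)]

# Illustrative population: a fair die with faces 1..6 (flat, not normal).
faces = range(1, 7)
mu = statistics.fmean(faces)        # population mean, 3.5
sigma = statistics.pstdev(faces)    # population standard deviation, sqrt(35/12)

n = 30
means = sample_means(n, num_samples=10_000)
print(f"mean of sample means: {statistics.fmean(means):.3f}  (mu = {mu})")
print(f"sd of sample means:   {statistics.stdev(means):.3f}  "
      f"(sigma / sqrt(n) = {sigma / math.sqrt(n):.3f})")
```

Because the die population is flat rather than bell-shaped, this check does not rely on the population being normal.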
If the population is normally distributed, then $\bar{X}$ is also normally distributed.
What if the population is not normally distributed?
The central limit theorem addresses this question.
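To see the theorem at work on a population that is clearly not normal, the minimal Python sketch below (an illustrative assumption, not part of the original example) draws repeated samples from an exponential distribution with rate 1. That population has mean 1, standard deviation 1, and skewness 2, and the skewness of the mean of $n$ draws is $2/\sqrt{n}$, so the distribution of sample means should become more symmetric, and closer to normal, as $n$ grows. The sample sizes 1, 5, and 50, the 10,000 repetitions, and the seed are arbitrary choices.

```python
import random
import statistics

def simulate_means(n, num_samples, seed=42):
    """Return the means of num_samples samples of size n from Exponential(rate=1)."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(num_samples)]

def skewness(xs):
    """Moment-based sample skewness: mean((x - m)^3) / pstdev^3 (0 for a symmetric shape)."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return statistics.fmean([(x - m) ** 3 for x in xs]) / sd ** 3

# Exponential(rate=1) is strongly right-skewed; theory says the skewness
# of the sample mean is 2 / sqrt(n), shrinking toward 0 (a normal shape).
skew_by_n = {}
for n in (1, 5, 50):
    skew_by_n[n] = skewness(simulate_means(n, num_samples=10_000))
    print(f"n = {n:2d}: skewness of sample means = {skew_by_n[n]:.2f}")
```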
Properties of the Central Limit Theorem
- The standardized variable $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ approximately follows the standard normal distribution, with mean 0 and standard deviation 1.
- For a large sample size $n$ (a common rule of thumb is $n \ge 30$), the sampling distribution of $\bar{X}$ approximately follows a normal distribution with mean equal to the population mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.
- The mean of the sample means is equal to the mean of the population: if you average the means of all possible samples of size $n$, you get the population mean. This holds for any sample size; what depends on the sample size is the spread of the sample means, $\sigma/\sqrt{n}$, which shrinks as $n$ grows.
- If the population is normally distributed, then the sampling distribution of the sample mean is normal irrespective of the sample size.
Example: Central Limit Theorem for a sample mean
It is believed that college students spend on average 65.5 minutes daily texting on their cell phones, with a standard deviation of 145 minutes. Data on the amount of time spent texting were collected from a sample of 100 students.
Calculate the probability that the average time spent by this sample of students will exceed 90 minutes.
Using the Central Limit Theorem, the mean of the sampling distribution is 65.5 minutes, and the corresponding standard deviation (the standard error) is calculated by the formula:
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{145}{\sqrt{100}} = 14.5 \text{ minutes}$$
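Before calculating it exactly, we can estimate where the Z score will lie. One standard deviation above the mean is 65.5 + 14.5 = 80 minutes, and two standard deviations above the mean is 65.5 + 2(14.5) = 94.5 minutes. Since 90 minutes falls between these two values, the Z score will lie between 1 and 2.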
Now, I will calculate the Z score exactly using
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{90 - 65.5}{14.5} = \frac{24.5}{14.5} \approx 1.69$$
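From the empirical rule (68%-95%-99.7%), about 16% of the area lies more than 1 standard deviation above the mean (13.5% + 2.35% + 0.15%), and about 2.5% lies more than 2 standard deviations above it (2.35% + 0.15%). So we can expect the probability to be somewhere between 2.5% and 16%.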
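The empirical-rule percentages are rounded. As a minimal sketch (assuming Python 3.8+ for `statistics.NormalDist`), the exact upper-tail areas at one and two standard deviations can be computed directly; the probability for Z ≈ 1.69 should fall between them.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Upper-tail areas P(Z > k) at k = 1 and k = 2 standard deviations.
tail_1sd = 1 - std_normal.cdf(1)   # empirical rule: about 16%  (13.5 + 2.35 + 0.15)
tail_2sd = 1 - std_normal.cdf(2)   # empirical rule: about 2.5% (2.35 + 0.15)

print(f"P(Z > 1) = {tail_1sd:.4f}")
print(f"P(Z > 2) = {tail_2sd:.4f}")
```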
We can find the probability, or area under the curve, using a Z table, which lists the cumulative probability (the area to the left) for Z values ranging from -3.49 to +3.49.
To find the area for Z = 1.69, look up 1.6 in the left column and 0.09 in the top row of the Z table; the table gives 0.95449. This is the area to the left of Z = 1.69, so the area to the right is 1 minus this value: 1 - 0.95449 = 0.04551.
Alternatively, you can look up Z = -1.69 and read the same value, 0.04551, directly, because the normal curve is symmetric about 0.
So the probability that the average time spent texting by this sample of students exceeds 90 minutes is about 4.55%.
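The same calculation can be done without a printed Z table. The following minimal Python sketch (standard library only, Python 3.8+) uses `statistics.NormalDist().cdf`, the cumulative standard normal probability, in place of the table; the variable names are illustrative. It computes the standard error, the Z score, the upper-tail probability, and the mirrored left-tail value at $-z$.

```python
import math
from statistics import NormalDist

mu = 65.5         # population mean (minutes)
sigma = 145.0     # population standard deviation (minutes)
n = 100           # sample size
threshold = 90    # sample mean of interest (minutes)

se = sigma / math.sqrt(n)     # standard error of the mean: 145 / 10
z = (threshold - mu) / se     # Z score, about 1.69
std_normal = NormalDist()

p_upper = 1 - std_normal.cdf(z)     # area to the right of z
p_lower_mirror = std_normal.cdf(-z) # same area, read from the left tail by symmetry

# A Z table rounds z to two decimals, so also evaluate at z = 1.69 exactly.
p_table = 1 - std_normal.cdf(1.69)

print(f"SE = {se}, z = {z:.4f}")
print(f"P(mean > {threshold}) = {p_upper:.5f} (table value at z = 1.69: {p_table:.5f})")
```

Because the exact Z score is $24.5/14.5 \approx 1.6897$ rather than the rounded 1.69, `p_upper` differs very slightly from the table value, around the fifth decimal place.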
Calculating Probability in SAS
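To calculate the Z score and the probability, you can use the PROBNORM function in SAS, which returns the cumulative standard normal probability (the area to the left of Z), as below.

```sas
DATA NORMAL;
   MU = 65.5;                      /* mean of the sampling distribution */
   SIGMA = 14.5;                   /* standard error: 145 / SQRT(100) */
   Y = 90;                         /* sample mean of interest */
   Z = (Y - MU) / SIGMA;           /* Z score, about 1.69 */
   PROBABILITY = 1 - PROBNORM(Z);  /* area to the right of Z: P(sample mean > 90) */
RUN;

PROC PRINT DATA=NORMAL;
RUN;
```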