Confidence Interval for Population Mean

Written by Subhro
STATISTICS

A confidence interval is a range of values, constructed from sample data, that is likely to include the population parameter with a stated level of confidence.

The objective of a confidence interval is to indicate both the location of a population parameter and the precision with which it is estimated.

A confidence interval for the population mean may be stated as  30 \le \mu \le 50, which means the population mean lies between the values 30 and 50.

Since the interval estimate may or may not contain the true parameter value, we associate a confidence (probability) with finding the true parameter value in the interval.

We may say that there is 95% confidence that the interval contains the population mean, implying a 5% chance that the interval does not contain the population mean.

The confidence level attached to the interval estimate of a population parameter is usually written as (1-\alpha)100%, where (1-\alpha) is the probability that the interval estimate will contain the true population parameter.

When \alpha = 0.05, the confidence level is 95%, and 0.95 is the probability that the interval estimate will contain the population parameter.

The value of \alpha is called the significance level. It is the chance that the interval estimate does not contain the true population mean.

Confidence Interval for Population Mean when Standard Deviation is known.

The confidence interval for a population mean is determined by taking the sample mean (the point estimate) and adding and subtracting a margin of error.

 \overline{X} \pm E

If the population standard deviation is known, the margin of error is given by

 E = Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}

where  \alpha \text{ (significance level)} = 1 - CL\text{ (Confidence level)} and correspondingly,  CL = 1 - \alpha

So, if CL = 95%, then  \alpha = 1 - 0.95 = 0.05

 Z_{\alpha/2} is called the critical value, which can be found in the Z table.
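As a rough illustration of the margin-of-error formula, the data step below evaluates E for a few sample sizes. The values of SIGMA (1.2), ALPHA (0.05) and the sample sizes are assumptions chosen only for illustration, and the dataset and variable names are hypothetical.

```sas
/* Sketch: margin of error E = Z * sigma / sqrt(n) for several sample sizes. */
/* SIGMA, ALPHA and the list of N values are illustrative assumptions.       */
data margin;
 SIGMA=1.2;                 /* assumed known population standard deviation */
 ALPHA=0.05;                /* significance level, i.e. 95% confidence     */
 Z=probit(1-ALPHA/2);       /* critical value Z(alpha/2)                   */
 do N=25, 100, 400;
  E=Z*SIGMA/sqrt(N);        /* margin of error for this sample size        */
  output;
 end;
run;

proc print data=margin noobs;
 format Z E 7.4;
run;
```

Because n enters the formula through its square root, quadrupling the sample size halves the margin of error.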


A Z value (Z score) tells us how many standard deviations an observation is from the mean. A Z score of -2 tells us that the observation is 2 standard deviations to the left of the mean.

More specifically, a Z score lets us calculate the area under the standard normal curve associated with it. We can find this area using a Z table, also known as the Standard Normal Table.

The table shows the total area on the left side of any value of Z.
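The same left-side areas can be reproduced in SAS with the CDF function, which may be handy when a printed table is not available. This is only a sketch: the Z values in the list and the dataset name are arbitrary choices made for illustration.

```sas
/* Sketch: left-tail areas under the standard normal curve, */
/* as a Z table would list them. The Z values are arbitrary. */
data z_area;
 do Z=-2, -1.96, 0, 1.96, 2;
  AREA_LEFT=cdf('NORMAL', Z);   /* P(Z <= z) for the standard normal */
  output;
 end;
run;

proc print data=z_area noobs;
 format AREA_LEFT 6.4;
run;
```

The row for Z = 0 should show an area of 0.5, since half of the distribution lies to the left of the mean.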

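The first column and the top row together give the Z value (typically the first column holds Z to one decimal place and the top row supplies the second decimal place), and the numbers in the body of the table are the corresponding areas.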

Let’s find the Z value for a 95% confidence interval.

We know that  \alpha = 0.05 for a 95% confidence interval. The total area under the curve is 1. Since 95%, or 0.95, is the area in the middle and the leftover area is  \alpha, we divide  \alpha into two equal parts, which gives an area of 0.025 in the left tail and 0.025 in the right tail.

So, the area to the left of the upper critical value is 0.95 + 0.025 = 0.975. We can find the positive Z value by looking in the Z table for the area closest to 0.975, which corresponds to Z = 1.96.


This Z value tells us that 95% of the area lies within roughly 1.96 standard deviations of the mean.

Since the normal distribution is symmetrical, the corresponding value to the left of the curve will be -1.96.
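The table lookup described above can also be done with the PROBIT function, which returns the Z value for a given left-side area. The sketch below does this for three commonly used confidence levels; the list of levels and the dataset name are assumptions made for illustration.

```sas
/* Sketch: two-sided critical values for a few confidence levels. */
/* The confidence levels listed here are illustrative choices.    */
data z_critical;
 do CL=0.90, 0.95, 0.99;
  ALPHA=1-CL;                    /* significance level               */
  Z_LOWER=probit(ALPHA/2);       /* area ALPHA/2 to the left         */
  Z_UPPER=probit(1-ALPHA/2);     /* area 1-ALPHA/2 to the left       */
  output;
 end;
run;

proc print data=z_critical noobs;
 format Z_LOWER Z_UPPER 7.4;
run;
```

For CL = 0.95 the two values should be about -1.96 and 1.96, matching the table lookup and the symmetry of the normal distribution.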

We can write the 95% confidence interval for the population mean, when the population standard deviation is known, as:

 \overline{X} \pm 1.96 \frac{\sigma}{\sqrt{n}}

Example:

A sample of 100 subjects was chosen to estimate the length of stay at a hospital. The sample mean was 4.5 days and the population standard deviation was 1.2 days.

  1. Calculate the 95% confidence interval for the population mean.
  2. What is the probability that the population mean is greater than 4.73 days?

Solution:

(1) Known Values are:

 n = 100, \overline{X} = 4.5 \text{ days}, \sigma = 1.2 \text{ days}

The interval estimate of the mean is calculated from the sample using

\overline{X} \pm E, where the margin of error is  E = Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}, so the formula can be written as:

 \overline{X} \pm Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}

So,  \sigma/ \sqrt{n} = 1.2 / \sqrt{100} = 0.12

The 95% confidence interval is given by:

 (4.5 - 1.96 \times 0.12, \; 4.5 + 1.96 \times 0.12) = (4.2648, 4.7352)

where  \pm 1.96 are the critical values obtained from the Z table for a 95% confidence interval, with  \alpha/2 = 0.025 in each tail (cumulative areas of 0.025 and 0.975).

To interpret this, we can say that we are 95% confident that the population mean is between 4.2648 and 4.7352.

 4.2648 \le \mu \le 4.7352

(2) Since the upper limit of the 95% confidence interval is 4.7352, which is approximately 4.73, and the interval leaves  \alpha/2 = 0.025 in each tail, we can say that the probability of the population mean being greater than 4.73 is approximately 0.025.

Calculating the Confidence Interval in a SAS data step

The Confidence Interval can be calculated in a SAS data step as below.

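data CI;
 N=100;               /* sample size */
 SAMPLE_MEAN=4.5;     /* sample mean (days) */
 STD_DEV=1.2;         /* known population standard deviation (days) */
 ALPHA=0.05;          /* significance level for a 95% interval */
 Z=probit(1-ALPHA/2);                  /* critical value, about 1.96 */
 LCLM=SAMPLE_MEAN-Z*STD_DEV/SQRT(N);   /* lower confidence limit */
 HCLM=SAMPLE_MEAN+Z*STD_DEV/SQRT(N);   /* upper confidence limit */
run;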

Confidence Interval for Population Mean when Standard Deviation is unknown.

When the population standard deviation is unknown, we cannot use the formula below.

 \overline{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}

William Gosset showed that if the population follows a normal distribution and the standard deviation is calculated from the sample, the statistic below follows a t-distribution with (n-1) degrees of freedom.

 t = \frac{\overline{X} - \mu}{S/\sqrt{n}}

S is the standard deviation estimated from the sample. The t-distribution is very similar to the standard normal distribution. It has a bell shape, and its mean, median, and mode all equal 0.

The major difference between the t-distribution and the standard normal distribution is that the t-distribution has heavier tails. However, as the degrees of freedom increase, the t-distribution converges to the standard normal distribution.
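One way to see this convergence is to compare the two-sided 95% critical value of the t-distribution with the corresponding Z value as the degrees of freedom grow. The sketch below uses the TINV and PROBIT functions; the list of degrees of freedom and the dataset name are arbitrary choices made for illustration.

```sas
/* Sketch: t critical values approach the Z critical value as DF grows. */
/* The DF values in the list are illustrative choices.                  */
data t_vs_z;
 ALPHA=0.05;
 Z=probit(1-ALPHA/2);          /* standard normal critical value        */
 do DF=5, 10, 30, 69, 120, 1000;
  T=tinv(1-ALPHA/2, DF);       /* t critical value for DF degrees       */
  DIFF=T-Z;                    /* extra width caused by heavier tails   */
  output;
 end;
run;

proc print data=t_vs_z noobs;
 format Z T DIFF 7.4;
run;
```

The DIFF column should stay positive and shrink toward zero as DF increases, which is why intervals based on the t-distribution are wider for small samples.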

The  (1-\alpha)100% confidence interval for the mean of a population that follows a normal distribution, when the standard deviation is unknown, is given by  \overline{X} \pm t_{\alpha/2,n-1} \times \frac{S}{\sqrt{n}}


Example :

An online grocery store is interested in estimating the basket size of its customer orders to optimize the size of the crates used for delivering grocery items. For a sample of 70 customers, the mean basket size was 24, and the standard deviation estimated from the sample was 3.8. Calculate the 95% confidence interval for the mean basket size of a customer order.

Solution:

 n=70, \overline{X} =24, S = 3.8, and the degrees of freedom are (n-1) = 69

The t value can be found using a t table or the TINV function in SAS.

Using the t table, look up the value at the intersection of the row for the degrees of freedom and the column for the chosen confidence level.


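Since 69 degrees of freedom is not listed in the table, we use the closest tabulated value, which is 60, and the corresponding t value is 2.000. (The exact value for 69 degrees of freedom is about 1.995, so the approximation is slightly conservative.)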

The confidence interval for the size of the basket is given by

 \overline{X} \pm t_{\alpha/2,n-1} \frac{S}{\sqrt{n}}

The lower confidence limit is given by  24 - 2 \times \frac{3.8}{\sqrt{70}} = 23.09

The upper confidence limit is given by  24 + 2 \times \frac{3.8}{\sqrt{70}} = 24.91

Thus, the 95% confidence interval for the size of the basket is (23.09, 24.91).

Calculating Confidence Interval in SAS

We can use PROC MEANS with the CLM option in SAS to find the lower and upper confidence limits.

I have simulated data for the above example using random numbers and calculated the lower and upper confidence limits below.

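data basket;
 call streaminit(12345);  /* arbitrary seed so the run is reproducible */
 do i=1 to 70;
  /* uniform draw on [20, 31), rounded to 2 decimals; illustrative data only */
  size=round(20+ floor(1+30-20)*rand("uniform"), .01);
  output;
 end;
 drop i;
run;
/* CLM requests the lower and upper confidence limits of the mean */
proc means data=basket alpha=0.05 clm mean std maxdec=3;
 var size;
run;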

If you don’t have the raw data, you can also use the data step below to calculate the confidence limits from the summary statistics.

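data CI;
 X=24;        /* sample mean */
 S=3.8;       /* sample standard deviation */
 N=70;        /* sample size */
 ALPHA=0.05;  /* significance level for a 95% interval */
 CRITICAL_VALUE=TINV(1-ALPHA/2, N-1);        /* t value with N-1 degrees of freedom */
 LCLM=X - (CRITICAL_VALUE * S/sqrt(N) );     /* lower confidence limit */
 HCLM=X + (CRITICAL_VALUE * S/sqrt(N) );     /* upper confidence limit */
run;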
