Box and Whisker Plot : Explained

A box plot (also known as a box and whisker plot) is a chart often used in descriptive data analysis to show the distribution and skewness of numerical data by displaying its quartiles (percentiles).

Box plots show the five-number summary of a set of data: the minimum score, first (lower) quartile, median, third (upper) quartile, and maximum score.

First Quartile (Q1/25th Percentile): The middle number between the smallest value and the median of the dataset.

Median (Q2/50th Percentile): Median is the middle value of the dataset.

Third quartile (Q3/75th Percentile): The middle value between the median and the dataset’s highest value.

Interquartile range (IQR): The distance between the 25th and the 75th percentile (Q3 - Q1).

Whiskers: The whiskers extend from each quartile to the minimum or maximum: the lower whisker runs from Q1 down to the minimum, and the upper whisker runs from Q3 up to the maximum. They represent the values outside the middle 50% (i.e. the lower 25% of values and the upper 25% of values).

Outliers: An outlier is an observation that is numerically distant from the rest of the data.

Minimum: The lowest value, excluding outliers, that is, the smallest observation that is not below Q1 - 1.5*IQR.

Maximum: The highest value, excluding outliers, that is, the largest observation that is not above Q3 + 1.5*IQR.

An Example of Box and Whisker Plot

Draw a box and whisker plot for the data set {3, 7, 8, 5, 12, 14, 21, 13, 18, 50}.

Step 1: Order the data in ascending order.

3, 5, 7, 8, 12, 13, 14, 18, 21, 50

Step 2: Find the median.

The data set has an even number of values (10), so the median is the mean of the middle two numbers, 12 and 13:

\frac{12+13}{2} = 12.5

The median is 12.5.

Step 3: Find the quartiles.

The first quartile is the median of the data points to the left of the median.

3, 5, 7, 8, 12

Q1 = 7

The third quartile is the median of the data points to the right of the median.

13, 14, 18, 21, 50

Q3 = 18

Step 4: Complete the five-number summary by finding the min and the max.

The min is the smallest data point, which is 3.
The max is the largest data point, which is 50.
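
The five-number summary can also be cross-checked in SAS. The following is a minimal sketch using the quartile statistic keywords of PROC MEANS; the data set name inp and the variable name var are simply the names used for the same ten numbers in the SAS section of this article. SAS supports several percentile definitions, so on other data its quartiles can differ slightly from a hand calculation. For the sorted values 3, 5, 7, 8, 12, 13, 14, 18, 21, 50, the default definition should report Q1 = 7, a median of 12.5, and Q3 = 18.

```sas
/* Same ten values as the worked example */
data inp;
   input var @@;
   datalines;
3 7 8 5 12 14 21 13 18 50
;
run;

/* Five-number summary plus the interquartile range (QRANGE = Q3 - Q1) */
proc means data=inp min q1 median q3 max qrange;
   var var;
run;
```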

Step 5: Scale and label an axis that fits the data set, then draw the box and whiskers from the five-number summary.


Interpretation of Box and Whisker Plot

Normal Distribution or Symmetric Distribution: If the median sits in the middle of the box and the whiskers are about the same length on both sides of the box, the distribution is symmetric. A normal distribution produces this pattern, but a box plot alone cannot confirm that the data are normal.

Positively Skewed: A distribution is positively skewed (right-skewed) when the median is closer to the lower quartile (Q1) and the whisker is longer on the upper end of the box.

Negatively Skewed: When the median is closer to the upper quartile (Q3) and the whisker is shorter on the upper end of the box, the distribution is negatively skewed (left-skewed).
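
A quick numerical check of skewness can complement the visual reading. The sketch below is one possible way to do it, assuming the ten example values are stored in a data set named inp with a variable named var (the names used in the SAS section of this article). It prints the mean, the median, and the sample skewness. For these values the mean (15.1) is larger than the median (12.5) because the value 50 pulls it upward, which is the usual sign of a positive skew.

```sas
/* Same ten values as the worked example */
data inp;
   input var @@;
   datalines;
3 7 8 5 12 14 21 13 18 50
;
run;

/* A mean above the median and a positive skewness value
   both point to a right-skewed distribution */
proc means data=inp n mean median skewness;
   var var;
run;
```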


Interquartile range (IQR): The box spans the middle 50 percent of the scores. Its length, the IQR, is computed by subtracting the lower quartile from the upper quartile (Q3 - Q1).

Outlier: If a data point is more than 1.5 times the interquartile range (IQR) above the upper quartile (Q3), it is declared an outlier.

value > Q3 + 1.5 * IQR

Similarly, if a value is more than 1.5 times the IQR below the lower quartile (Q1), it is considered an outlier.

value < Q1 - 1.5 * IQR
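
The rule above can be applied in code as well. The following sketch is one way to do it, not the only one: it stores Q1 and Q3 from PROC MEANS in a one-row data set and then flags each value that falls outside the two limits. The names stats, flagged, lower_fence, upper_fence, and outlier are made up for this illustration. For the sorted example values 3, 5, 7, 8, 12, 13, 14, 18, 21, 50, Q1 = 7 and Q3 = 18, so IQR = 11 and the limits are 7 - 16.5 = -9.5 and 18 + 16.5 = 34.5. Only the value 50 lies outside them.

```sas
/* Same ten values as the worked example */
data inp;
   input var @@;
   datalines;
3 7 8 5 12 14 21 13 18 50
;
run;

/* Store Q1 and Q3 in a one-row data set (hypothetical name: stats) */
proc means data=inp noprint;
   var var;
   output out=stats q1=q1 q3=q3;
run;

/* Attach the quartiles to every row, compute the limits,
   and flag values outside them (1 = outlier, 0 = not) */
data flagged;
   if _n_ = 1 then set stats(keep=q1 q3);
   set inp;
   iqr = q3 - q1;
   lower_fence = q1 - 1.5*iqr;
   upper_fence = q3 + 1.5*iqr;
   outlier = (var < lower_fence or var > upper_fence);
run;

proc print data=flagged noobs;
   var var lower_fence upper_fence outlier;
run;
```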

How to create a Box and Whisker plot in SAS?

You can create a box plot in SAS using the SGPLOT procedure. First, let us look at a very simple example that uses the same data set as the worked example above.

We use the VBOX or HBOX Statement in PROC SGPLOT and specify the analysis variable.

data inp;
input var @@;
datalines;
3 7 8 5 12 14 21 13 18 50
;
run;

To create a vertical box plot, use the VBOX statement as below.

proc sgplot data=inp;
   vbox var;
run;

To create a horizontal box plot, use the HBOX statement as below.

proc sgplot data=inp;
   hbox var;
run;
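
The VBOX and HBOX statements also accept options that change how whiskers and outliers are drawn. The sketch below shows two of them on the same example data; it illustrates the options rather than recommending a setup. DATALABEL labels each outlier marker (with the value of the analysis variable when no label variable is named), and EXTREME extends the whiskers to the true minimum and maximum instead of stopping at 1.5 times the IQR from the box.

```sas
/* Same ten values as the worked example */
data inp;
   input var @@;
   datalines;
3 7 8 5 12 14 21 13 18 50
;
run;

/* Default whiskers, with each outlier marker labeled by its value */
proc sgplot data=inp;
   vbox var / datalabel;
run;

/* EXTREME draws the whiskers out to the true minimum and maximum,
   so no observations are plotted as separate outlier markers */
proc sgplot data=inp;
   vbox var / extreme;
run;
```

With the default settings the value 50 should appear as a separate marker above the upper whisker; with EXTREME the upper whisker should reach 50 instead.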

Using the CATEGORY= option

In PROC SGPLOT, you can specify a categorical variable in the CATEGORY= option to draw one box for each value of that variable.

proc sgplot data=sashelp.cars;
 title "Price by Car Type";
 hbox msrp / category=type;
run;

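The result from PROC SGPLOT is a horizontal box plot with one box for each car type. The marker inside each box (a diamond in the default style) indicates the group mean. The vertical line inside the box is the median (50th percentile).

The two vertical lines that form the left and right edges of the box are the 25th (Q1) and 75th (Q3) percentiles respectively. The whiskers are the horizontal lines that extend outward from those edges; by default they stop at the most extreme observations that lie within 1.5 times the IQR of the box, and points beyond them are drawn as outliers.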
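
To read the numbers behind each box, the same statistics can be tabulated per category. This is a minimal sketch using PROC MEANS with a CLASS statement on the same SASHELP.CARS data; the choice of statistics and the MAXDEC=0 rounding are only illustrative.

```sas
/* One row of summary statistics per car type,
   matching the boxes drawn by HBOX with CATEGORY=TYPE */
proc means data=sashelp.cars n mean min q1 median q3 max maxdec=0;
   class type;
   var msrp;
run;
```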


Summary

In this SAS tutorial, we learned what a boxplot is and how to create and interpret a boxplot in SAS.  Stay tuned for more intriguing topics, and post any questions in the comments below.


Subhro

Subhro provides valuable and informative content on SAS, offering a comprehensive understanding of SAS concepts. We have been creating SAS tutorials since 2019, and 9to5sas has become one of the leading free SAS resources available on the internet.


This Post Has 6 Comments

  1. WKBN Prame

    Are there different conventions on whisker length (i.e. IQRx1.5, IQRx3, up to 98th percentile). Does outlier identification differ under these different situations?

    1. Subhro

      The 1.5 x IQR rule is one of the methods used to identify outliers in a dataset. The quartiles split a dataset into four quarters, so even the 98th percentile of a dataset falls in the top quarter. The upper quartile (Q3) is the value that separates the third quarter from the fourth.
      So, any data point beyond Q3 + 1.5*IQR is an outlier on the high side.
