Descriptive Statistics in SAS: A Complete Guide to PROC MEANS, PROC SUMMARY and PROC UNIVARIATE

December 27, 2019

What are Descriptive Statistics?

Types of Descriptive Statistics

SAS Procedures for Descriptive Statistics

Advanced Options in PROC MEANS

Advanced Options in PROC UNIVARIATE

Practical Examples

Interpretation of Results

Best Practices

Conclusion

What are Descriptive Statistics?

Descriptive statistics describe the basic features of the data in a study. They provide simple summaries of the sample and its measures. Together with simple graphical analysis, they form the basis of virtually every quantitative analysis of data.

Descriptive statistics are typically distinguished from inferential statistics. With descriptive statistics, you are simply describing what is or what the data shows. With inferential statistics, you are trying to reach conclusions that extend beyond the immediate data alone.

Types of Descriptive Statistics

Mean

The mean in descriptive statistics is the arithmetic average of the data: the sum of all values divided by the number of values. For the values 21, 22, 23 and 24:

$\overline{X} = \frac{21+22+23+24}{4} = 22.5$

Limitations

The mean is sensitive to outliers, so it can misrepresent the typical value when the data contain extreme observations. Replacing 24 with 200 in the example above nearly triples the mean:

$\overline{X} = \frac{21+22+23+200}{4} = 66.5$
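
The effect of the outlier can be checked in SAS. The following is a minimal sketch that assumes two small illustrative data sets (the names `no_outlier` and `with_outlier` are hypothetical) and compares the mean with the median, which is introduced in the next section.

```sas
/* Hypothetical data sets mirroring the two calculations above */
DATA no_outlier;
    INPUT x @@;
    DATALINES;
21 22 23 24
;
RUN;

DATA with_outlier;
    INPUT x @@;
    DATALINES;
21 22 23 200
;
RUN;

/* The mean moves from 22.5 to 66.5; the median is 22.5 in both cases */
PROC MEANS DATA=no_outlier MEAN MEDIAN;
    VAR x;
RUN;

PROC MEANS DATA=with_outlier MEAN MEDIAN;
    VAR x;
RUN;
```

Reporting both statistics side by side is one simple way to notice when a few extreme values are driving the mean.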

Median

The median in descriptive statistics is the middle value of the ordered data, so that half of the observations lie at or below it and half at or above it. To find the median, arrange the data in ascending order. When $n$ is odd, the median is the value at position $\frac{n+1}{2}$.

When $n$ is even, the median is the average of the $\left(\frac{n}{2}\right)^{\text{th}}$ and $\left(\frac{n}{2}+1\right)^{\text{th}}$ observations in the ordered data.

Example:

245	326	180	226	305	195	220	295

Step 1: Arrange the data in ascending order

180	195	220	226	245	295	305	326

Step 2: Find the position of the median

Since $n=8$ (even number), the median is the average of the 4th and 5th values.

$\text{Median} = \frac{226 + 245}{2} = 235.5$
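
The same calculation can be reproduced in SAS. This sketch assumes a hypothetical data set named `median_demo` holding the eight values from the example. With the default quantile definition, PROC MEANS averages the two middle values when $n$ is even.

```sas
/* Hypothetical data set with the eight values from the example */
DATA median_demo;
    INPUT value @@;
    DATALINES;
245 326 180 226 305 195 220 295
;
RUN;

/* No sorting is needed: the procedure orders the values internally */
/* The reported median is 235.5, the average of 226 and 245 */
PROC MEANS DATA=median_demo N MEDIAN;
    VAR value;
RUN;
```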

Mode

The mode in descriptive statistics is the value that appears most frequently in a data set.

Example:

245	326	180	226	305	195	220	295	245	180

In the above data, the values 245 and 180 appear twice, while all other values appear only once. Therefore, this data set has two modes: 245 and 180.
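
One way to see every mode rather than a single value is to tabulate frequencies. The sketch below assumes a hypothetical data set `mode_demo` with the ten values above and uses PROC FREQ with `ORDER=FREQ`, which lists the most frequent values first.

```sas
/* Hypothetical data set with the ten values from the example */
DATA mode_demo;
    INPUT value @@;
    DATALINES;
245 326 180 226 305 195 220 295 245 180
;
RUN;

/* ORDER=FREQ sorts the table by descending frequency, */
/* so 180 and 245 (each with a count of 2) appear at the top */
PROC FREQ DATA=mode_demo ORDER=FREQ;
    TABLES value / NOCUM NOPERCENT;
RUN;
```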

Range

The range in descriptive statistics is the difference between the largest and smallest values in a data set.

$\text{Range} = \text{Largest value} - \text{Smallest value}$

Example:

For the data set: 180, 195, 220, 226, 245, 295, 305, 326

$\text{Range} = 326 - 180 = 146$
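
For a quick check without creating a data set, the SAS `RANGE` function can be called in a DATA step. This is a small sketch that writes the result to the log.

```sas
/* DATA _NULL_ runs the step without creating an output data set */
DATA _NULL_;
    r = RANGE(180, 195, 220, 226, 245, 295, 305, 326);
    PUT r=;   /* writes r=146 to the log */
RUN;
```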

Standard Deviation

Standard deviation in descriptive statistics measures the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.

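The formula for the population standard deviation is:

$\sigma = \sqrt{\frac{\sum(x_i - \bar{x})^2}{n}}$

Where:

$x_i$ = each individual value
$\bar{x}$ = mean of all values
$n$ = number of values

The sample standard deviation divides by $n - 1$ instead of $n$. SAS procedures such as PROC MEANS report the sample version by default.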
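
The formula above divides by $n$, whereas PROC MEANS divides by $n - 1$ unless told otherwise. The following sketch, which assumes a hypothetical data set `sd_demo`, uses the `VARDEF=` option to show both versions for the values 21, 22, 23 and 24.

```sas
/* Hypothetical data set: squared deviations from the mean sum to 5 */
DATA sd_demo;
    INPUT x @@;
    DATALINES;
21 22 23 24
;
RUN;

/* Default (VARDEF=DF): divisor n - 1, so STD = sqrt(5/3), about 1.291 */
PROC MEANS DATA=sd_demo N MEAN STD;
    VAR x;
RUN;

/* VARDEF=N: divisor n, so STD = sqrt(5/4), about 1.118 */
PROC MEANS DATA=sd_demo N MEAN STD VARDEF=N;
    VAR x;
RUN;
```

The gap between the two results shrinks as $n$ grows, so the choice matters most for small samples.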

Variance

Variance in descriptive statistics is the square of the standard deviation. It measures how far a set of numbers is spread out from their average value.

$\text{Variance} = \sigma^2$

Skewness

Skewness in descriptive statistics is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive, zero, negative, or undefined.

Positive skewness: The right tail is longer; the mass of the distribution is concentrated on the left.
Negative skewness: The left tail is longer; the mass of the distribution is concentrated on the right.
Zero skewness: The distribution is symmetric.

Kurtosis

Kurtosis in descriptive statistics is a measure of the “tailedness” of the probability distribution of a real-valued random variable. Kurtosis describes the shape of a probability distribution.

High kurtosis: More of the variance is due to infrequent extreme deviations.
Low kurtosis: Less of the variance is due to infrequent extreme deviations.
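
Both shape measures can be requested by keyword in PROC MEANS. This is a minimal sketch using the `sashelp.class` sample data set. Note that SAS reports excess kurtosis, so a normal distribution has a value near 0 rather than 3.

```sas
/* SKEWNESS and KURTOSIS are not printed by default and must be requested */
/* KURTOSIS here is excess kurtosis: about 0 for normally distributed data */
PROC MEANS DATA=sashelp.class N MEAN SKEWNESS KURTOSIS;
    VAR height weight;
RUN;
```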

SAS Procedures for Descriptive Statistics

SAS provides several procedures for calculating descriptive statistics:

PROC MEANS

PROC MEANS is used to calculate summary statistics for numeric variables. By default, it provides the following statistics:

N (number of observations)
Mean (average)
Standard deviation
Minimum value
Maximum value

Basic Syntax:

PROC MEANS DATA=dataset-name;
    VAR variable-name;
RUN;

Example:

PROC MEANS DATA=sashelp.class;
    VAR age height weight;
RUN;

Output:

Variable	N	Mean	Std Dev	Minimum	Maximum
Age	19	13.32	1.49	11.00	16.00
Height	19	62.34	5.13	51.30	72.00
Weight	19	100.03	22.77	50.50	150.00

PROC SUMMARY

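PROC SUMMARY computes the same statistics as PROC MEANS and accepts the same statements. The main difference is the default behavior: PROC MEANS prints its results, while PROC SUMMARY prints nothing unless you add the PRINT option, so it is typically used to write summary statistics to an output data set.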

Basic Syntax:

PROC SUMMARY DATA=dataset-name;
    VAR variable-name;
    OUTPUT OUT=output-dataset;
RUN;

Example:

PROC SUMMARY DATA=sashelp.class;
    VAR age height weight;
    OUTPUT OUT=summary_stats;
RUN;
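
To see what PROC SUMMARY produced, you can either ask it to print or inspect the output data set. The sketch below assumes the `summary_stats` data set created by the example above already exists in the WORK library.

```sas
/* PRINT makes PROC SUMMARY display results the way PROC MEANS does */
PROC SUMMARY DATA=sashelp.class PRINT;
    VAR age height weight;
RUN;

/* With no statistic keywords on OUTPUT, the data set holds one row each */
/* for N, MIN, MAX, MEAN and STD, identified by the _STAT_ variable */
PROC PRINT DATA=summary_stats;
RUN;
```

The output data set also contains the automatic variables `_TYPE_` and `_FREQ_`, which identify the grouping level and the number of observations behind each row.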

PROC UNIVARIATE

PROC UNIVARIATE provides detailed descriptive statistics for numeric variables, including:

Moments (mean, variance, skewness, kurtosis)
Basic statistical measures (mean, median, mode)
Tests for location
Quantiles
Extreme observations

Basic Syntax:

PROC UNIVARIATE DATA=dataset-name;
    VAR variable-name;
RUN;

Example:

PROC UNIVARIATE DATA=sashelp.class;
    VAR age height weight;
RUN;

Advanced Options in PROC MEANS

Using CLASS Statement

The CLASS statement is used to group the analysis by categorical variables.

PROC MEANS DATA=sashelp.class;
    CLASS sex;
    VAR age height weight;
RUN;

Using BY Statement

The BY statement also groups the analysis, but it requires the data to be sorted by the BY variables first and produces a separate table for each group.

PROC SORT DATA=sashelp.class OUT=class_sorted;
    BY sex;
RUN;

PROC MEANS DATA=class_sorted;
    BY sex;
    VAR age height weight;
RUN;

Requesting Specific Statistics

You can request specific statistics by listing their keywords on the PROC MEANS statement:

PROC MEANS DATA=sashelp.class N MEAN MEDIAN MODE RANGE STD;
    VAR age height weight;
RUN;

Creating Output Datasets

You can create output datasets with summary statistics:

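/* NOPRINT suppresses the printed table; results go only to the data set */
PROC MEANS DATA=sashelp.class NOPRINT;
    VAR age height weight;
    /* Names after each keyword map to the VAR variables in order */
    OUTPUT OUT=stats
           N=n   /* only one name is given, so only the N for age is kept */
           MEAN=mean_age mean_height mean_weight
           STD=std_age std_height std_weight;
RUN;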
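
Typing every output name by hand becomes tedious with many variables. As one possible alternative, the sketch below uses the `AUTONAME` option so that SAS builds the names itself, and combines it with a CLASS variable. The data set name `stats_by_sex` is hypothetical.

```sas
/* NWAY keeps only the rows for each level of sex (no overall row) */
PROC MEANS DATA=sashelp.class NOPRINT NWAY;
    CLASS sex;
    VAR age height weight;
    /* AUTONAME appends the statistic to each variable name, */
    /* giving names such as Age_Mean and Age_StdDev */
    OUTPUT OUT=stats_by_sex MEAN= STD= / AUTONAME;
RUN;

PROC PRINT DATA=stats_by_sex;
RUN;
```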

Advanced Options in PROC UNIVARIATE

Requesting Specific Plots

PROC UNIVARIATE DATA=sashelp.class PLOTS;
    VAR age height weight;
RUN;

Testing for Normality

PROC UNIVARIATE DATA=sashelp.class NORMAL;
    VAR age height weight;
RUN;
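
The NORMAL option adds a table of normality tests (Shapiro-Wilk, Kolmogorov-Smirnov, Cramér-von Mises and Anderson-Darling) to the default output. If only that table is of interest, a sketch like the following uses `ODS SELECT` to suppress everything else.

```sas
/* ODS SELECT limits the output of the next procedure to the named table */
ODS SELECT TestsForNormality;
PROC UNIVARIATE DATA=sashelp.class NORMAL;
    VAR height;
RUN;
```

A small p-value in this table is evidence against normality. With only 19 observations in `sashelp.class`, the tests have limited power, so a histogram or probability plot is a useful companion.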

Creating Histograms

PROC UNIVARIATE DATA=sashelp.class;
    VAR age height weight;
    /* HISTOGRAM is a statement, not an option on the PROC statement */
    HISTOGRAM age height weight;
RUN;

Practical Examples

Example 1: Basic Descriptive Statistics

/* Create sample data */
DATA test_data;
    INPUT ID Score @@;
    DATALINES;
    1 85 2 92 3 78 4 95 5 88
    6 76 7 91 8 84 9 89 10 93
    ;
RUN;

/* Calculate descriptive statistics */
PROC MEANS DATA=test_data N MEAN MEDIAN MODE STD RANGE;
    VAR Score;
RUN;

Example 2: Grouped Analysis

/* Create sample data with groups */
DATA grouped_data;
    INPUT Group $ Value @@;
    DATALINES;
    A 25 A 30 A 35 A 40 A 45
    B 15 B 20 B 25 B 30 B 35
    ;
RUN;

/* Calculate statistics by group */
PROC MEANS DATA=grouped_data;
    CLASS Group;
    VAR Value;
RUN;

Example 3: Comprehensive Analysis with PROC UNIVARIATE

PROC UNIVARIATE DATA=sashelp.class NORMAL PLOTS;
    VAR height;
    HISTOGRAM / NORMAL;
    INSET N = 'N' MEAN = 'Mean' STD = 'Std Dev' / POS = NW;
RUN;

Interpretation of Results

Understanding the Output

When you run PROC MEANS or PROC UNIVARIATE, the output includes statistics such as the following (PROC MEANS reports skewness and kurtosis only when you request them):

N: Number of observations
Mean: Average value
Std Dev: Standard deviation
Minimum: Smallest value
Maximum: Largest value
Skewness: Measure of asymmetry
Kurtosis: Measure of tailedness

What to Look For

Mean vs Median: If they’re very different, your data might be skewed
Standard Deviation: Large values indicate high variability
Skewness:
- Positive: Right-skewed (tail extends to right)
- Negative: Left-skewed (tail extends to left)
- Close to 0: Approximately symmetric
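Kurtosis (SAS reports excess kurtosis, which is about 0 for a normal distribution):
- High kurtosis: Heavy tails, more extreme values than a normal distribution
- Low kurtosis: Light tails, fewer extreme values than a normal distribution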

Best Practices

Always check for missing values before running descriptive statistics
Use appropriate procedures for your analysis needs
Consider the scale of your variables when interpreting results
Use visualizations alongside numerical statistics
Document your analysis with clear titles and labels
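
Several of these practices can be combined in a single step. The following is a minimal sketch, assuming a hypothetical data set `scores_missing` that contains two missing scores. It counts missing values with `NMISS` and documents the output with a title and a label.

```sas
/* Hypothetical data: a period (.) marks a missing numeric value */
DATA scores_missing;
    INPUT id score @@;
    DATALINES;
1 85 2 . 3 78 4 95 5 . 6 76
;
RUN;

TITLE 'Score summary with missing-value count';
/* N counts nonmissing values (4 here); NMISS counts missing ones (2 here) */
PROC MEANS DATA=scores_missing N NMISS MEAN STD MAXDEC=2;
    VAR score;
    LABEL score = 'Test score';
RUN;
TITLE;   /* clear the title so it does not carry over to later output */
```

Because the statistics are computed from nonmissing values only, comparing N with NMISS shows how much of the data each summary actually rests on.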

Conclusion

Descriptive statistics provide the foundation for understanding your data. SAS offers powerful procedures like PROC MEANS, PROC SUMMARY, and PROC UNIVARIATE to calculate these statistics efficiently. By understanding both the concepts and the SAS implementation, you can effectively summarize and describe your data, which is crucial for making informed decisions in any data analysis project.

Remember that descriptive statistics are just the first step in data analysis. They help you understand what your data looks like, but for making inferences about populations, you’ll need to move on to inferential statistics.
