Descriptive Statistics in SAS: A Complete Guide to PROC MEANS, PROC SUMMARY and PROC UNIVARIATE
December 27, 2019

Table Of Contents

  1. What are Descriptive Statistics?
  2. Types of Descriptive Statistics
  3. SAS Procedures for Descriptive Statistics
  4. Advanced Options in PROC MEANS
  5. Advanced Options in PROC UNIVARIATE
  6. Practical Examples
  7. Interpretation of Results
  8. Best Practices
  9. Conclusion

What are Descriptive Statistics?

Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample and the measures. Together with simple graphical analysis, they form the basis of virtually every quantitative analysis of data.

Descriptive statistics are typically distinguished from inferential statistics. With descriptive statistics, you are simply describing what is or what the data shows. With inferential statistics, you are trying to reach conclusions that extend beyond the immediate data alone.

Types of Descriptive Statistics

Mean

The mean in descriptive statistics is the average of the data. It is calculated by adding all the data and dividing by the number of data points.

$$\overline{X} = \frac{21+22+23+24}{4} = 22.5$$

Limitations

The mean is strongly affected by outliers, so it can misrepresent the typical value when the data contain extreme observations. For example, replacing 24 with 200 in the data above nearly triples the mean:

$$\overline{X} = \frac{21+22+23+200}{4} = 66.5$$
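The effect of the outlier can be reproduced in SAS. The following is a minimal sketch; the data set name mean_demo and its two variable names are hypothetical, and the values are the ones used in the two calculations above. It also requests the median (covered in the next section) to show that it does not move when 24 is replaced by 200.

```sas
/* Hypothetical demo data: the same four values, with and without the outlier */
DATA mean_demo;
    INPUT without_outlier with_outlier;
    DATALINES;
21 21
22 22
23 23
24 200
;
RUN;

/* MEAN shifts from 22.5 to 66.5; MEDIAN stays at 22.5 in both columns */
PROC MEANS DATA=mean_demo N MEAN MEDIAN MAXDEC=2;
    VAR without_outlier with_outlier;
RUN;
```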

Median

The median in descriptive statistics is the value that divides the data into two equal parts. To find the median, arrange the data in ascending order; when $n$ is odd, the median is the value at position $\frac{n+1}{2}$.

When $n$ is even, the median is the average of the $\left(\frac{n}{2}\right)^{th}$ and $\left(\frac{n+2}{2}\right)^{th}$ observations after arranging the data in increasing order.

Example:

245, 326, 180, 226, 305, 195, 220, 295

Step 1: Arrange the data in ascending order

180, 195, 220, 226, 245, 295, 305, 326

Step 2: Find the position of the median

Since $n = 8$ (an even number), the median is the average of the 4th and 5th values.

$$\text{Median} = \frac{226 + 245}{2} = 235.5$$
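The same result can be checked in SAS without sorting the data by hand. This is a small sketch, assuming a hypothetical data set named median_demo that holds the eight values from the example; PROC MEANS orders the values internally when the MEDIAN statistic is requested.

```sas
/* Hypothetical demo data: the eight unsorted values from the example */
DATA median_demo;
    INPUT value @@;
    DATALINES;
245 326 180 226 305 195 220 295
;
RUN;

/* With the default quantile definition, the median of an even-sized
   sample is the average of the two middle values: (226 + 245) / 2 */
PROC MEANS DATA=median_demo N MEDIAN;
    VAR value;
RUN;
```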

Mode

The mode in descriptive statistics is the value that appears most frequently in a data set.

Example:

245, 326, 180, 226, 305, 195, 220, 295, 245, 180

In the above data, the values 245 and 180 appear twice, while all other values appear only once. Therefore, this data set has two modes: 245 and 180.
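When a data set has more than one mode, the default PROC UNIVARIATE output shows only the smallest one, with a note saying so. The sketch below, using a hypothetical data set named mode_demo with the ten values above, adds the MODES option to request a table listing every mode.

```sas
/* Hypothetical demo data: 245 and 180 each occur twice */
DATA mode_demo;
    INPUT value @@;
    DATALINES;
245 326 180 226 305 195 220 295 245 180
;
RUN;

/* MODES adds a table of all modes to the standard output */
PROC UNIVARIATE DATA=mode_demo MODES;
    VAR value;
RUN;
```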

Range

The range in descriptive statistics is the difference between the largest and smallest values in a data set.

$$\text{Range} = \text{Largest value} - \text{Smallest value}$$

Example:

For the data set: 180, 195, 220, 226, 245, 295, 305, 326

$$\text{Range} = 326 - 180 = 146$$
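For a quick check of a handful of values, the calculation can also be done in a DATA step with the MIN, MAX, and RANGE functions. This is only an illustrative sketch; the variable names smallest, largest, and spread are made up for the example.

```sas
/* DATA _NULL_ runs the step without creating a data set */
DATA _NULL_;
    smallest = MIN(180, 195, 220, 226, 245, 295, 305, 326);
    largest  = MAX(180, 195, 220, 226, 245, 295, 305, 326);
    spread   = RANGE(180, 195, 220, 226, 245, 295, 305, 326);
    /* Writes smallest=180 largest=326 spread=146 to the SAS log */
    PUT smallest= largest= spread=;
RUN;
```

For a variable in a data set, requesting the RANGE statistic in PROC MEANS gives the same quantity.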

Standard Deviation

Standard deviation in descriptive statistics measures the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.

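The formula for the population standard deviation is:

$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$$

Where:

  • $x_i$ = each individual value
  • $\bar{x}$ = mean of all values
  • $n$ = number of values

Note that SAS procedures such as PROC MEANS and PROC UNIVARIATE compute the sample standard deviation by default, which divides by $n - 1$ instead of $n$.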
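The choice of divisor changes the result, which matters when comparing hand calculations with SAS output. The sketch below assumes a hypothetical data set sd_demo with the values 21, 22, 23, 24 and uses the VARDEF= option of PROC MEANS to switch between the default divisor (n - 1) and the divisor n used in the formula above.

```sas
/* Hypothetical demo data */
DATA sd_demo;
    INPUT value @@;
    DATALINES;
21 22 23 24
;
RUN;

/* Default (VARDEF=DF): divisor n - 1, so VAR = 5/3 and STD is about 1.2910 */
TITLE 'Sample formula (divisor n - 1)';
PROC MEANS DATA=sd_demo N MEAN STD VAR MAXDEC=4;
    VAR value;
RUN;

/* VARDEF=N: divisor n, so VAR = 5/4 and STD is about 1.1180 */
TITLE 'Population formula (divisor n)';
PROC MEANS DATA=sd_demo VARDEF=N N MEAN STD VAR MAXDEC=4;
    VAR value;
RUN;

/* Clear the title so it does not carry over to later output */
TITLE;
```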

Variance

Variance in descriptive statistics is the square of the standard deviation. It measures how far a set of numbers is spread out from their average value.

$$\text{Variance} = \sigma^2$$
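As a worked check, take the four values 21, 22, 23, 24 from the mean example (mean 22.5) and use the divisor $n$ from the standard deviation formula:

```latex
\sigma^2 = \frac{(21-22.5)^2 + (22-22.5)^2 + (23-22.5)^2 + (24-22.5)^2}{4}
         = \frac{2.25 + 0.25 + 0.25 + 2.25}{4}
         = \frac{5}{4} = 1.25,
\qquad
\sigma = \sqrt{1.25} \approx 1.118
```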

Skewness

Skewness in descriptive statistics is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive, zero, negative, or undefined.

  • Positive skewness: The right tail is longer; the mass of the distribution is concentrated on the left.
  • Negative skewness: The left tail is longer; the mass of the distribution is concentrated on the right.
  • Zero skewness: The distribution is symmetric.

Kurtosis

Kurtosis in descriptive statistics is a measure of the “tailedness” of the probability distribution of a real-valued random variable. Kurtosis describes the shape of a probability distribution.

  • High kurtosis: More of the variance is due to infrequent extreme deviations.
  • Low kurtosis: Less of the variance is due to infrequent extreme deviations.
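Both shape measures can be requested directly in PROC MEANS with the SKEWNESS and KURTOSIS keywords. The following is a brief sketch on the sashelp.class data used elsewhere in this guide. One point to keep in mind when reading the output: SAS reports excess kurtosis, so a normal distribution has a value near 0 rather than 3.

```sas
/* Shape statistics alongside N and the mean for three numeric variables */
PROC MEANS DATA=sashelp.class N MEAN SKEWNESS KURTOSIS MAXDEC=3;
    VAR age height weight;
RUN;
```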

SAS Procedures for Descriptive Statistics

SAS provides several procedures for calculating descriptive statistics:

PROC MEANS

PROC MEANS is used to calculate summary statistics for numeric variables. By default, it provides the following statistics:

  • N (number of observations)
  • Mean (average)
  • Standard deviation
  • Minimum value
  • Maximum value

Basic Syntax:

PROC MEANS DATA=dataset-name;
VAR variable-name;
RUN;

Example:

PROC MEANS DATA=sashelp.class;
VAR age height weight;
RUN;

Output:

Variable    N      Mean    Std Dev    Minimum    Maximum
Age        19     13.32       1.49      11.00      16.00
Height     19     62.34       5.13      51.30      72.00
Weight     19    100.03      22.77      50.50     150.00

PROC SUMMARY

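PROC SUMMARY computes the same statistics as PROC MEANS and accepts the same statements. The main difference is the default behavior: PROC MEANS prints its results, while PROC SUMMARY prints nothing unless you add the PRINT option, so it is typically used with an OUTPUT statement to write the statistics to a data set.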

Basic Syntax:

PROC SUMMARY DATA=dataset-name;
VAR variable-name;
OUTPUT OUT=output-dataset;
RUN;

Example:

PROC SUMMARY DATA=sashelp.class;
VAR age height weight;
OUTPUT OUT=summary_stats;
RUN;
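Because PROC SUMMARY is silent by default, the example above produces no visible report. The sketch below shows two ways to see the results: adding the PRINT option, or printing the summary_stats data set created above. When the OUTPUT statement names no statistics, the data set holds the default statistics, one row per statistic identified by the _STAT_ variable.

```sas
/* Option 1: ask PROC SUMMARY to display its results */
PROC SUMMARY DATA=sashelp.class PRINT;
    VAR age height weight;
RUN;

/* Option 2: write the statistics to a data set, then print it */
PROC SUMMARY DATA=sashelp.class;
    VAR age height weight;
    OUTPUT OUT=summary_stats;
RUN;

PROC PRINT DATA=summary_stats NOOBS;
RUN;
```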

PROC UNIVARIATE

PROC UNIVARIATE provides detailed descriptive statistics for numeric variables, including:

  • Moments (mean, variance, skewness, kurtosis)
  • Basic statistical measures (mean, median, mode)
  • Tests for location
  • Quantiles
  • Extreme observations

Basic Syntax:

PROC UNIVARIATE DATA=dataset-name;
VAR variable-name;
RUN;

Example:

PROC UNIVARIATE DATA=sashelp.class;
VAR age height weight;
RUN;
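PROC UNIVARIATE prints every table listed above for each variable, which gets long quickly. One way to trim the output, sketched here for a single variable, is an ODS SELECT statement that names only the tables you want. The ODS table names used are Moments, BasicMeasures, Quantiles, and ExtremeObs.

```sas
/* Keep four of the default tables; the selection applies to the next step only */
ODS SELECT Moments BasicMeasures Quantiles ExtremeObs;
PROC UNIVARIATE DATA=sashelp.class;
    VAR height;
RUN;
```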

Advanced Options in PROC MEANS

Using CLASS Statement

The CLASS statement is used to group the analysis by categorical variables.

PROC MEANS DATA=sashelp.class;
CLASS sex;
VAR age height weight;
RUN;
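The CLASS statement is often combined with an OUTPUT statement to save one row of statistics per group. By default the output data set also contains an overall row (where _TYPE_ is 0); the NWAY option keeps only the rows for the full combination of class variables. A minimal sketch, with the hypothetical output names by_sex, mean_height, and mean_weight:

```sas
/* One row per value of sex; NWAY drops the overall (_TYPE_=0) row */
PROC MEANS DATA=sashelp.class NWAY NOPRINT;
    CLASS sex;
    VAR height weight;
    OUTPUT OUT=by_sex MEAN=mean_height mean_weight;
RUN;

PROC PRINT DATA=by_sex NOOBS;
RUN;
```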

Using BY Statement

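The BY statement also groups the analysis, but it requires the input data to be sorted (or indexed) by the BY variables first, and it produces a separate table for each group. The CLASS statement needs no sorting.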

PROC SORT DATA=sashelp.class OUT=class_sorted;
BY sex;
RUN;
PROC MEANS DATA=class_sorted;
BY sex;
VAR age height weight;
RUN;

Requesting Specific Statistics

You can request specific statistics using the appropriate keywords:

PROC MEANS DATA=sashelp.class N MEAN MEDIAN MODE RANGE STD;
VAR age height weight;
RUN;

Creating Output Datasets

You can create output datasets with summary statistics:

PROC MEANS DATA=sashelp.class NOPRINT;
VAR age height weight;
/* Output names map to the VAR variables in order, so N=n stores only the count for age */
OUTPUT OUT=stats
N=n
MEAN=mean_age mean_height mean_weight
STD=std_age std_height std_weight;
RUN;
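Typing a name for every statistic and variable becomes tedious as the list grows. As an alternative sketch, the AUTONAME option of the OUTPUT statement builds the names automatically by appending the statistic to the variable name (for example, a mean of age is stored under a name of the form age_Mean). The data set name stats_auto is hypothetical.

```sas
/* Leave the name lists empty and let AUTONAME generate them */
PROC MEANS DATA=sashelp.class NOPRINT;
    VAR age height weight;
    OUTPUT OUT=stats_auto MEAN= STD= MEDIAN= / AUTONAME;
RUN;

PROC PRINT DATA=stats_auto NOOBS;
RUN;
```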

Advanced Options in PROC UNIVARIATE

Requesting Specific Plots

PROC UNIVARIATE DATA=sashelp.class PLOTS;
VAR age height weight;
RUN;

Testing for Normality

PROC UNIVARIATE DATA=sashelp.class NORMAL;
VAR age height weight;
RUN;
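The NORMAL option adds a table with the Shapiro-Wilk, Kolmogorov-Smirnov, Cramér-von Mises, and Anderson-Darling tests to the regular output. If only that table is of interest, it can be isolated with ODS SELECT, as in this short sketch for one variable:

```sas
/* Show only the normality tests; small p-values suggest departure from normality */
ODS SELECT TestsForNormality;
PROC UNIVARIATE DATA=sashelp.class NORMAL;
    VAR height;
RUN;
```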

Creating Histograms

/* HISTOGRAM is a statement, not a PROC UNIVARIATE option */
PROC UNIVARIATE DATA=sashelp.class;
VAR age height weight;
HISTOGRAM age height weight;
RUN;

Practical Examples

Example 1: Basic Descriptive Statistics

/* Create sample data */
DATA test_data;
INPUT ID Score @@;
DATALINES;
1 85 2 92 3 78 4 95 5 88
6 76 7 91 8 84 9 89 10 93
;
RUN;
/* Calculate descriptive statistics */
PROC MEANS DATA=test_data N MEAN MEDIAN MODE STD RANGE;
VAR Score;
RUN;

Example 2: Grouped Analysis

/* Create sample data with groups */
DATA grouped_data;
INPUT Group $ Value @@;
DATALINES;
A 25 A 30 A 35 A 40 A 45
B 15 B 20 B 25 B 30 B 35
;
RUN;
/* Calculate statistics by group */
PROC MEANS DATA=grouped_data;
CLASS Group;
VAR Value;
RUN;

Example 3: Comprehensive Analysis with PROC UNIVARIATE

PROC UNIVARIATE DATA=sashelp.class NORMAL PLOTS;
VAR height;
HISTOGRAM / NORMAL;
INSET N = 'N' MEAN = 'Mean' STD = 'Std Dev' / POS = NW;
RUN;

Interpretation of Results

Understanding the Output

When you run PROC MEANS or PROC UNIVARIATE, you’ll see various statistics:

  • N: Number of observations
  • Mean: Average value
  • Std Dev: Standard deviation
  • Minimum: Smallest value
  • Maximum: Largest value
  • Skewness: Measure of asymmetry
  • Kurtosis: Measure of tailedness

What to Look For

  1. Mean vs Median: If they’re very different, your data might be skewed
  2. Standard Deviation: Large values indicate high variability
  3. Skewness:
    • Positive: Right-skewed (tail extends to right)
    • Negative: Left-skewed (tail extends to left)
    • Close to 0: Approximately symmetric
  4. Kurtosis (SAS reports excess kurtosis, so a normal distribution is close to 0):
    • High kurtosis: Heavy tails, more extreme values
    • Low kurtosis: Light tails, fewer extreme values

Best Practices

  1. Always check for missing values before running descriptive statistics
  2. Use appropriate procedures for your analysis needs
  3. Consider the scale of your variables when interpreting results
  4. Use visualizations alongside numerical statistics
  5. Document your analysis with clear titles and labels
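For the first practice, PROC MEANS can count missing values itself through the NMISS statistic. The sketch below uses a hypothetical data set missing_demo in which two of five scores are missing; statistics such as the mean are computed from the nonmissing values only, so N and NMISS should be read together.

```sas
/* Hypothetical demo data: a period marks a missing numeric value */
DATA missing_demo;
    INPUT ID Score @@;
    DATALINES;
1 85 2 . 3 78 4 95 5 .
;
RUN;

/* N = 3 nonmissing, NMISS = 2 missing; the mean uses only 85, 78 and 95 */
PROC MEANS DATA=missing_demo N NMISS MEAN;
    VAR Score;
RUN;
```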

Conclusion

Descriptive statistics provide the foundation for understanding your data. SAS offers powerful procedures like PROC MEANS, PROC SUMMARY, and PROC UNIVARIATE to calculate these statistics efficiently. By understanding both the concepts and the SAS implementation, you can effectively summarize and describe your data, which is crucial for making informed decisions in any data analysis project.

Remember that descriptive statistics are just the first step in data analysis. They help you understand what your data looks like, but for making inferences about populations, you’ll need to move on to inferential statistics.


© 2025 9to5sas