PROC TABULATE in SAS is used to display descriptive statistics in table format. PROC TABULATE has many statements that define how this procedure will summarize the data.
The PROC TABULATE procedure is a substitute for PROC MEANS and PROC FREQ.
The PROC TABULATE procedure must have a CLASS statement to specify any categorical variables to summarize. Any numerical variables must be included in the VAR statement of PROC TABULATE.
Proc Report and Proc Tabulate are both used to create summary tables. Although, Proc Report can create detail tables with COMPUTE Blocks which cannot be done using PROC Tabulate.
TABULATE produces three dimensions of reports: PAGE, ROW and COLUMN dimensions, using a TABLE statement along with a CLASS and VAR statement, which is used for calculating a summary function.
In PROC REPORT, everything is done with the COLUMN and DEFINE statements, along with COMPUTE blocks.
With PROC REPORT, you can have custom lines between groups, for example, “TOTAL SALES of SUV cars in USA Region”, whereas TABULATE Procedure does not have this feature.
For more details on which procedure to use in which scenarios, you can read the article on PROC TABULATE versus PROC REPORT.
The three statements are necessary for every Proc Tabulate step.
|CLASS||The CLASS statement is used to form groups within rows and columns.|
|VAR||The VAR statements include the numeric variables which need to be summarized.|
|TABLE||The TABLE statements contain the table definition.|
The TABLE statement is the most important statement for the PROC TABULATE step. It has dimensions and definitions within those dimensions.
The tables generated by TABULATE can have up to three dimensions to their definition: page, row, and column.
These dimensions always appear in page, row, and column order.
|PAGE||Pages define how the individual pages are formed.|
|ROW||It is the most used dimension and defines the table’s rows within each page.|
|COLUMN||Columns are always present and define the columns within rows and pages.|
There has to be at least one column dimension, and you cannot have a page dimension without having both row and column dimensions.
The general syntax of the TABLE statement is
table page, row, column;
To build the individual page with row and column dimensions, you must use a combination of options or elements. The three types of options are:
|SINGULAR||It is used when a single element is needed.|
|CONCATENATED||Using this option, you can join or concatenate multiple elements using a space.|
|NESTED||One element is nested within another to form a hierarchy using an asterisk.|
A singular element consists of a single variable. A basic example of a singular table is below.
title "Single Column Table Example"; proc tabulate data=sashelp.class; class sex; var weight; table sex,weight; run;
The analysis variable, Weight, is specified in the VAR statement and a single column.
Since there is no statistic specified, the default statistic of SUM is displayed.
Using the concatenated statement, you can combine multiple elements within columns and/or rows. A concatenated definition is formed when two or more space-separated elements are included in the same dimension.
<span class="token keyword">title</span> <span class="token string">"Concatenated Elements Example";</span>proc tabulate <span class="token keyword">data=</span>sashelp<span class="token punctuation">.</span>cars <span class="token format"><span class="token keyword">format=dollar8.</span>;</span> <span class="token keyword">class</span> Origin Type<span class="token punctuation">;</span> <span class="token keyword">var</span> MSRP INVOICE<span class="token punctuation">;</span> <span class="token keyword">table</span> Origin Type<span class="token punctuation">,</span> MSRP INVOICE<span class="token punctuation">;</span> <span class="token step keyword">run;</span>
The above example has two classification variables(Origin and Type). The label associated with each analysis variable is used in the column header by default.
The analysis and classification variables can be used in page, row, or column dimensions.
Nested definitions allow us to create tables within tables. The nested elements can be classification variables, analysis variables, statistics, options, and modifiers.
These are designated as nested elements through the use of the asterisk.
title "Nested Elements Example"; proc tabulate data=sashelp.cars; class ORIGIN TYPE; var MSRP; table ORIGIN,TYPE*MSRP*(N max min); run;
In the above example. the row dimension(ORIGIN) is singular, and the column dimension (TYPE) has the analysis variable MSRP nested within a classification variable TYPE.
Three space-separated statistics are concatenated into a group with parenthesis and then the group is nested under the variable TYPE.
Combination of Elements
TABLE statement can contain a combination of nested and concatenated elements.
These will include not only variables and statistics but options as well.
proc tabulate data=sashelp.cars; class Origin Type; var MSRP; table Type='Type of cars' all='Total' origin all='Total', msrp='Price in dollars'*(n max*f=dollar8. min*f=dollar8. mean) /box='Type of cars across regions'; keylabel n='Number of cars' max="Maximum Price" min='Minimum Price' mean='Average Price'; run;
The table definition has two concatenated elements(Type and Origin) in the row dimension.
The ALL keyword summarizes across the associated elements: Type and Origin.
Dollar Format(f=dollar8.) is used for Max and Min Statistics.
The BOX= option adds the text in the upper left corner of the table.
The KEYLABEL statement allows you to assign a text label to statistics and the keyword ALL.
Calculating Percentages Using PROC TABULATE
The PCTN and PCTSUM options request the calculation of percentages based on the denominator specified using angle brackets.
PCTN calculates the percentages based on counts (N), while PCTSUM calculates the percentages based on the total of an analysis variable.
Percentage based on counts
<span class="token step keyword">proc tabulate</span> <span class="token proc-args"><span class="token arg keyword">data=sashelp.cars;</span></span> <span class="token keyword">class</span> origin drivetrain<span class="token punctuation">;</span> <span class="token keyword">table</span> origin<span class="token punctuation">,</span>drivetrain <span class="token operator">*(</span>n pctn<span class="token operator"><</span>drivetrain<span class="token operator">>='%')</span> all<span class="token punctuation">;</span>run<span class="token punctuation">;</span>
Within each value of ORIGIN, the percentage of observations for each value of DRIVETRAIN is calculated.
Since PCTN is nested within ORIGIN, the denominator(DRIVETRAIN) is the total count for that ORIGIN value.
From the above table, we can interpret that 21.52% (out of 158 ) of All-Wheel drive is from Asia.
Percentage Generation Statistics
Instead of providing denominators in angular brackets, you can also use several percentage generation statistics. For these statistics, the denominator is predetermined.
The denominator can be based on a report, page or column.
|Percentage applies to:||Per cent Frequency||Per cent Total|
proc tabulate data=sashelp.cars(where=(upcase(type) in ('SUV','SEDAN','SPORTS'))); class origin drivetrain type; tables origin*(type all)all, drivetrain*(n colpctn rowpctn pctn<type all>="Type PCTN" pctn<drivetrain>="DriveTrain PCTN") all(n colpctn rowpctn pctn<type all>); run;
Percentage based on the total analysis variable
Proc Format; inValue status 'Dead' = 1 'Alive' = 0; Run; data heart; set sashelp.heart; Status_num= input( status, status. ); run; proc tabulate data=heart; class BP_Status sex; var status_num; table sex * (sum='Deaths' pctsum<bp_status>='% of row' pctsum<sex>='% of column' pctsum) all,bp_status*status_num; run;
<bp_status>sums the values of Deaths for all occurrences of BP_STATUS within the same value of SEX. Thus, for SEX=”FEMALE”, the denominator is 529+ 248+82 = 859 and the row percentage is 529/859*100 =61.58.
<BP_STATUS>sums the frequency counts for all occurrences of SEX within the same value of BP_STATUS. Thus, for BP_STATUS=”HIGH”, the denominator is 529+602 =1131 and the Percentage is 529/1131*100 = 46.77%
The third use of PCTN has no denominator definition; therefore, all class variables (BP_STATUS and SEX)are included in the denominator definition.
Thus, for all cells, the denominator is 529 + 602 + 248 + 379 + 82 + 82 = 1922 and the percentage is 529/1922*100 = 27.52%
You can also read Tips for Generating Percentages Using the SAS TABULATE Procedure for more advanced usage of PROC Tabulate.