How to transpose a SAS dataset using PROC TRANSPOSE
PROC TRANSPOSE provides the ability to go from a long dataset (where there are multiple rows for a given subject) to a wide dataset (where there are multiple columns for a subject).
Most SAS procedures, such as PROC MEANS and PROC FREQ, prefer normalised data, which tends to be tall and narrow.
Since we often have no control over the form of the data we receive, we need to be able to convert it from the normalised (long) form to the non-normalised (wide) form and back again.
This process is known as transposing the data, and it is commonly performed with PROC TRANSPOSE: variables become observations, and observations become variables.
The general syntax of the Transpose Procedure
PROC TRANSPOSE DATA=Dataset-name OUT=New-dataset-name;
    BY variable(s);
    COPY variable(s);
    ID variable;
    VAR variable(s);
RUN;
- The BY statement defines the groups within which the data are transposed: the VAR variables are transposed separately for each combination of BY variable values. The BY variables themselves aren't transposed but are used to determine the row structure of the transposed dataset.
  - The data need to be sorted by the BY variables before running PROC TRANSPOSE unless you specify the NOTSORTED option on the BY statement (in which case the observations must still be grouped, just not in sorted order).
  - For long-to-wide transposes, the BY variables should uniquely identify each row of the transposed (wide) dataset.
  - For wide-to-long transposes, the BY variables determine the row structure of the long data; that is, they determine how the rows are repeated.
- The ID statement can be used to help identify rows. The new columns are named after the values of the variable specified in the ID statement, so the ID statement also gives names to the transposed columns.
  - The ID statement also ties a value in a specific row to a specified new column.
  - In the case of long-to-wide transposes, the structure of the columns is determined by the ID variable. There will be one column for each unique value of the ID variable (or, if multiple ID variables are present, one column for each unique combination of values).
  - For wide-to-long transposes, you typically do not need an ID variable. However, if you do supply an ID variable, it will determine the column structure.
  - The combination of variables on the BY and ID statements must uniquely identify each row of the input dataset.
- The variables in the VAR statement are transposed. If the VAR statement is not included, PROC TRANSPOSE will transpose all numeric variables that are not included in a BY statement or an ID statement. Character variables are transposed only if they are listed in a VAR statement.
  - Usually, one variable is specified for a long-to-wide transpose, whereas multiple variables are specified for a wide-to-long transpose.
  - The output dataset contains one row for each variable in the VAR statement within each BY group.
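As a rough sketch of how the BY, ID and VAR statements work together, the steps below sort a small dataset and then transpose it from long to wide. The dataset `visits` and its variables (`patient`, `visit`, `weight`) are hypothetical and exist only for illustration.

```sas
/* Hypothetical long dataset: one row per patient per visit */
data visits;
    input patient visit $ weight;
    datalines;
2 v1 81.5
1 v2 69.8
1 v1 70.2
2 v2 80.9
;
run;

/* BY-group processing expects the data to be sorted by the BY variable */
proc sort data=visits out=visits_sorted;
    by patient;
run;

/* One output row per patient; one column per distinct value of VISIT */
proc transpose data=visits_sorted out=visits_wide(drop=_name_);
    by patient;   /* defines the rows of the output          */
    id visit;     /* its values (v1, v2) name the new columns */
    var weight;   /* the variable whose values are transposed */
run;
```

Here BY and ID together identify every input row (each patient has at most one row per visit), which is the condition described in the list above.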
Transposing Long to Wide Datasets
PROC TRANSPOSE provides the ability to go from a long dataset to a wide dataset. Below is an example of a long dataset (SASHELP.ORSALES).
proc transpose data=sashelp.orsales out=sales;
    var quantity profit total_retail_price;
run;
Transposing Wide to Long Datasets
The syntax for transposing wide to long datasets is the same, but the objective is to reduce the number of columns and create a data structure where multiple rows are used to hold the different attributes of a subject.
proc transpose data=sashelp.library out=column1;
    id libref;
    var _all_;
run;
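A more conventional wide-to-long reshape can be sketched with a BY statement and several VAR variables. The example below assumes the sample dataset SASHELP.CLASS is available (it normally ships with SAS); the output names `class_sorted`, `class_long`, `measure` and `value` are chosen here purely for illustration.

```sas
/* SASHELP.CLASS is wide: one row per student, with HEIGHT and WEIGHT as columns */
proc sort data=sashelp.class out=class_sorted;
    by name;
run;

/* Long result: one row per student per measurement */
proc transpose data=class_sorted
               out=class_long(rename=(col1=value))  /* COL1 holds the transposed values   */
               name=measure;                        /* replaces the default _NAME_ column */
    by name;
    var height weight;
run;
```

Each student should end up with two rows in `class_long`, one for Height and one for Weight, with the measurement name in `measure` and the number in `value`.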
Options available in PROC TRANSPOSE
NAME= specifies the name of the output variable that holds the names of the transposed variables. By default this variable is called _NAME_. When no ID statement is used, the transposed columns are named COL1 all the way through COLn.
DELIMITER= specifies a delimiter to use when constructing names for transposed variables in the output data set. The delimiter is inserted between the variable values when more than one variable is given in the ID statement.
You can use the PREFIX= or SUFFIX= option to specify a prefix or suffix for each new variable name.
data exa;
    input subject test $ score;
    datalines;
1 post 92
1 pre 90
2 post 88
2 pre 77
3 post 50
3 pre 51
4 post 77
4 pre 72
5 post 69
5 pre 60
;
run;
proc transpose data=exa out=exa1 prefix=score;
    by subject;
    id test;
    var score;
run;
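For comparison, here is a minimal sketch of the same transpose using SUFFIX= instead of PREFIX=. It reuses the `exa` dataset defined above; the output name `exa2` and the suffix text `_score` are arbitrary choices for illustration.

```sas
/* Same reshape as above, but the text is appended to each ID value */
proc transpose data=exa out=exa2(drop=_name_) suffix=_score;
    by subject;
    id test;      /* values "post" and "pre" become post_score and pre_score */
    var score;
run;
```

With PREFIX=score the new columns are named scorepost and scorepre; with SUFFIX=_score they become post_score and pre_score.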
Transposing multiple variables – Double Transpose
Double Transpose helps us to transpose multiple variables and reshape long data to a wide format.
Below is the original format of the data we want to convert to a wide format.
data subj;
    input subject Month $ potassium sodium;
    datalines;
210 JAN 5.0 14.0
210 FEB 3.0 11.0
210 MAR 2.0 12.0
211 JAN 1.0 11.0
211 FEB 5.0 10.0
211 MAR 3.0 19.0
212 JUN 3.0 12.0
;
run;
We want an output with one row per subject and a separate column for each combination of month and lab test.
The first PROC TRANSPOSE step creates one row for each of the variables Potassium and Sodium within every subject and month. All the values are stored in a single variable, COL1, and _NAME_ records which variable each value came from.
proc transpose data=subj out=labtran;
    by subject Month notsorted;
    var sodium potassium;
run;
The second PROC TRANSPOSE step turns those rows back into columns. The data now has one row per subject and one column for each combination of month and lab variable (Potassium or Sodium).
proc transpose data=labtran out=sparsed(drop=_name_);
    by subject;
    var col1;
    id Month _name_;
run;

proc print data=sparsed;
run;
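Because the second step lists two ID variables, the new column names are built by joining their values (for example JANsodium). As a sketch, the DELIMITER= option described earlier can make these names easier to read. This variant reuses the `labtran` dataset from the first step; the output name `sparsed_delim` is an arbitrary choice for illustration.

```sas
/* Same second transpose, with an underscore between the two ID values */
proc transpose data=labtran out=sparsed_delim(drop=_name_) delimiter=_;
    by subject;
    var col1;
    id Month _name_;   /* column names become JAN_sodium, JAN_potassium, ... */
run;

proc print data=sparsed_delim;
run;
```

Subjects with no record for a given month, such as subject 212 for JAN, FEB and MAR, get missing values in the corresponding columns.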