You can change the position of a variable in a SAS dataset with any of the following statements.
ATTRIB, ARRAY, FORMAT, INFORMAT, LENGTH, and RETAIN.
[su_note note_color=”#FEF09B” radius=”4″]
Note that only the variables whose positions are relevant need to be listed in the above statements. Variables not listed in these statements will retain their same positional order following the listed variables.
In order for any of these statements to have the desired effect, they must be placed before a SET, MERGE, or UPDATE statement in the DATA step.
For the examples in this article, we have used a subset of the SASHELP.RETAIL and SASHELP.BMIMEN dataset with few observations selected.
1. ATTRIB Statement
data attrib_method; attrib day month year date sales length=3; set sashelp.retail(obs=5); run;
2. Reorder Variables Using the LENGTH Statement
The Length statement is used to specify the number of bytes for storing character and numeric variables. The default length of numeric variables is 8 and the length of a character variable is set at the first occurrence of the variable. In the example dataset, all variables are numeric, so we can set the length of these variables to 3.
For more information, see our guide on Length And Precision Of SAS Variables.
data length_method; length day month year date sales 3.; set sashelp.retail; run;
3. Retain Statement
For more details, see our article on Retain Statement.
data retain_method; retain day month year date sales; set sashelp.retail; run;
However, unless the sole purpose of a data step is to re-order variables, the use of RETAIN for this purpose can lead to unwanted results when the value of any ‘retained’ variable is carried over from one data step iteration to another.
SAS arranges columns in a dataset according to the order in which the variables are declared or defined, so any variables declared prior to the SET statement will be placed first in the resulting output dataset.
In the below example, we have created a Category variable to hold the weight Category of a person based on BMI. By default, the category variable will be placed first in the output dataset.
In the right, table we created a Frequency table using the PROC FREQ procedure which shows the count of persons in each of these categories. Notice that, Category is missing if BMI = 18.5.
PROC SURVEYSELECT DATA=sashelp.bmimen OUT=BMI METHOD=SRS SAMPSIZE=10 SEED=1234567; RUN; data bmimen1 (keep=age bmi Category); format age bmi category; Length category $20.; set bmi; age=int(age); if (bmi < 18.5) then Category='underweight'; else if (bmi < 18.5) AND (bmi < 25) then Category='healthy weight '; else if (bmi >=25) AND (bmi < 30) then Category='overweight'; else if (bmi >=30) then category="obesity"; run;
We can see the counts under each category and the number of missing values using the Proc Freq procedure.
Now, we will use the retain statement to place the category variable at the end of the output dataset and reorder variables as age, BMI, category.
data bmimen1 (keep=age bmi Category); retain age bmi category; Length category $20.; set bmi; age=int(age); if (bmi < 18.5) then Category='underweight'; else if (bmi > 18.5) AND (bmi > 25) then Category='healthy weight '; else if (bmi >= 25) AND (bmi < 30) then Category='overweight'; else if (bmi >= 30) then category="obesity"; run;
As you can see using the retain statement, the count of the Underweight person has become 2 and there are no missing values. So, what went wrong?
In the original logic, due to the non-exhaustive IF/ELSE conditions, individuals with a BMI of 18.5 are not explicitly assigned a value for “Category”.
The inclusion of the RETAIN statement, however, means that BMI of 18.5 and 25 will be assigned to a category and they are assigned the value of “Category” that was retained from the previous data step iteration. which is “Underweight” where instead we might have expected a missing value.
This has lead to erroneous results.
Of course, the above error would be corrected by updating the second IF condition as –
else if (bmi >= 18.5) AND (bmi < 25) then, but this is only a simple case; with several levels of IF conditions and variable assignments in a single data step it’s likely possible to have misleading results, especially if you rely on missing values for OUTPUT statements or WHERE conditions later in the code.
4. Using format
The FORMAT statement is another alternative to reorder variables in a SAS dataset. Place the format statement before the SET statement and followed by a variable list (no formats need be specified), and SAS will arrange columns in the dataset.
[ You might also like: 6 methods to reorder Variables in SAS Data Set].
[su_note note_color=”#FEF09B” radius=”4″]
Note: The FORMAT statement must be placed before the SET statement. Placing FORMAT after the SET statement will reset the format of any variables specified in the ordering list.
data bmimen1 (keep=age bmi Category); format age bmi category; Length category $20.; set bmi; age=int(age); if (bmi < 18.5) then Category='underweight'; else if (bmi > 18.5) AND (bmi < 25) then Category='healthy weight '; else if (bmi >= 25) AND (bmi < 30) then Category='overweight'; else if (bmi >= 30) then category="obesity"; run;
5. Reorder variables using an array
The reason for separating character and numeric variables is because arrays do not allow the mixing of variable types.
data array_method; ARRAY num (*) 3 age bmi; array ch (*) $ 20 category; set bmimen1; run;
6. Proc SQL
PROC SQL in the context of variable reordering only REQUIRES a reordered list of variables but allows the specification of additional attributes, such as type, length, format, etc. PROC SQL also unique in that it allows you to rename or “alias” variables.
For more details, see our Comprehensive Guide To PROC SQL.
When using the SQL procedure, the order in which variables are listed in the SELECT statement will establish the variable order for the resulting output dataset.
proc sql outobs=5; create table bmi as select day, month, year, date, sales from sashelp.retail; run;
Though the RETAIN statement can be used to order columns in a SAS dataset, its intended function of retaining values across data step iterations can make it a poor choice for this purpose.
Instead, you can either use the FORMAT statement for data steps or the SELECT statement for PROC SQL as it has the same effect without compromising on the data. I would personally use and prefer the FORMAT statement to reorder the variable list.
So, this was our side to reorder variables in a SAS dataset. We really hope that you must have found it useful. Moreover, if you have any other suggestion regarding other plagiarism tools then suggest us below the comment section. We would really take those lists in our further blog post.
Thanks for reading!