Data set Options in SAS5 min read

Like Tweet Pin it Share Share Email

In SAS the data set is the most important part of analyses, and it is crucial that you know all sorts of things about the data.

SAS has a multitude of data set options that will help you to accomplish your goals while working with the datasets.

Data set options are used to modify how a data set is either read or written. There are over three dozen of these options, and while you will generally only make use of a hand full of them, you should have a good idea of their scope.

To use these option(s) place them in parentheses immediately following the name of the data set to which they are to be applied.

While data set options can be used virtually anytime the data set is named, some of the options are situation-dependent, which means that you will have to understand what an option does before applying it.

For instance, options that control how a data set is to be read would not be used on a DATA statement.

In the following example, the KEEP data set option is applied to the data set being used by PROC SORT.

proc sort data=sashelp.cars(keep= make model price)
by price; 

Regardless of how many variables are in SASHELP.CARS the SORT procedure will only have to deal with the three variables provided in the KEEP= option.

For a SORT procedure, this can substantially speed up the processing.


It is possible to create an empty (zero observation) data set. However, we may want to control whether the new table will replace an existing table of the same name or not.


REPLACE=NO prevents the replacement of a permanent data set. This data set option overrides the system option of the same name. REPLACE=YES is the default.

REPEMPTY option determines whether or not an empty data set can overwrite an existing data set.

These two options are usually used together. Normally, we want to be able to replace permanent data sets unless the new version is empty.

Using the DATASTMTCHK option

Traditionally the issue of overwriting a data set that has observations with an empty one has been especially problematic when the semicolon has been left off of the DATA statement.

In the following DATA step, because of the missing semicolon in the SET statement is masked and 3 empty datasets are created.


data advrpt.VeryImportant
set sashelp.class; 

The missing semicolon causes SAS to see the SET statement as a part of the DATA statements. The result is that there is no incoming data set, consequently, the created data sets will have no variables or observations.

The DATASTMTCHK system option protects us from this very problem by not allowing datasets to be created with names, such as SET and MERGE. Setting DATASTMTCHCK to NONE removes this protection.

Password Protection

Data sets can be both encrypted and password protected. Password and encryption data set options include:

ALTERPassword to alter the data set
ENCRYPTEncrypt the data set
PWSpecify the password
PWREQPassword request window
READPassword to read the data set.
WRITEPassword to write to the dataset.
data Login(encrypt=yes pwreq=yes 
read=readpwd write=writepwd); 
DB='DEApp'; UID='MaryJ'; pwd='12z3'; output; 
DB='p127'; UID='Mary'; pwd='z123'; output; 
proc print data=login;

While these password protections can be useful within SAS, the protected files are still vulnerable to deletion or manipulation using tools outside of SAS.

Including the ENCRYPT option adds another layer of protection from tools other than SAS.

KEEP, DROP, and RENAME Options

When using the KEEP, DROP, or RENAME in a DATA step, you can choose between using data set options or DATA step statements.

The following examples highlight the differences between the KEEP statement and the KEEP= data set option.

data class(keep=name age sex);
set sashelp.class(keep=name age sex where=(age>'12'));
  • The KEEP statement below n is only applied to the new data set (WORK.CLASS) and in no way affects the Program Data Vector or what variables will be read from the incoming data set (SASHELP.CLASS).
  • The KEEP statement variable list is applied to the new outgoing data set.
  • The IF statement is executed after the entire observation is read and loaded into the PDV.

The KEEP statement is exactly the same as specifying the KEEP= option on the dataset in the DATA statement.

The KEEP= option on the SET statement is applied before the PDV is built and only the listed variables will be read from the incoming data set and will be included in the PDV. This can significantly improve performance when dealing with large datasets.

  • The KEEP= data set option only impacts which variables will be written to the new data set.
  • On the SET statement, the KEEP= data set option is applied to the incoming data set.
  • The WHERE= filter is specified as a data set option and is applied before observations are read.
  • The RENAME option allows you to change the name of a variable either as it is read or as it is written.

Using FIRSTOBS and OBS Data Set Options

The data set options FIRSTOBS and OBS can be used separately or in conjunction with one another to limit which observations are read and/or written.

  • FIRSTOBS: specifies the number of the first observation to be read
  • OBS: specifies the last observation that is to be read.

When these 2 options are used together, they work independently of each other. When FIRSTOBS is not used OBS corresponds to the number of observations that will be read. OBS= Option counts from observation 1 regardless of the value of FIRSTOBS.

proc print data=sashelp.class(firstobs=4 obs=6);
Using FIRSTOBS and OBS options

Only the first 6 observations are available to be printed (OBS=6); However, the first to be printed is the 4th observation (FIRSTOBS=4). As a result, only three observations are actually printed.

Comments (0)

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.