Data set Options in SAS

In SAS, the data set is an essential part of analyses, and you must know all sorts of things about the data.

SAS has many data set options to help you accomplish your goals while working with the datasets.

Data set options modify how a data set is read or written. There are over three dozen of these options, and while you will generally only make use of a hand full of them, you should have a good idea of their scope.

To use these option (s), place them in parentheses immediately following the name of the data set to which they are to be applied.

While you can use data set options virtually anytime the data set is named, some options are situation-dependent. Therefore, you must understand what an option does before applying it.

For instance, options that control how a data set is to be read would not be used on a DATA statement.

In the following example, the KEEP data set option is applied to the data set being used by PROC SORT.

proc sort make model price)
by price; 

Regardless of how many variables are in SASHELP.CARS, the SORT procedure will only have to deal with the three variables provided in the KEEP= option.

For a SORT procedure, this can substantially speed up the processing.

REPLACE and REPEMPTY Dataset Options

It is possible to create an empty (zero observation) data set. However, we may want to control whether the new table will replace an existing table of the same name or not.

REPLACE=NO – It prevents the replacement of a permanent data set. This data set option overrides the system option of the same name. REPLACE=YES is the default.

REPEMPTY option determines whether or not an empty data set can overwrite an existing data set.

These two options are usually used together. Typically, we want to be able to replace permanent data sets unless the new version is empty.

Using the DATASTMTCHK Dataset Option

Traditionally, overwriting a data set with observations with an empty one has been especially problematic when the semicolon is left off the DATA statement.

The missing semicolon in the SET statement is masked in the following DATA step, and three empty datasets are created.


data advrpt.VeryImportant
set sashelp.class; 

The missing semicolon causes SAS to see the SET statement part of the DATA statements. The result is that there is no incoming data set; consequently, the created data sets will have no variables or observations.

The DATASTMTCHK system option protects us from this problem by not allowing datasets to be created with names like SET and MERGE. Setting DATASTMTCHCK to NONE removes this protection.

Password Protection DataSet Options

Data sets can be both encrypted and password protected. Password and encryption data set options to include:

ALTER Password to alter the data set
ENCRYPT Encrypt the data set
PW Specify the password
PWREQ Password request window
READ Password to read the data set.
WRITE Password to write to the dataset.
Password Protection dataset options
data Login(encrypt=yes pwreq=yes 
read=readpwd write=writepwd); 
DB='DEApp'; UID='MaryJ'; pwd='12z3'; output; 
DB='p127'; UID='Mary'; pwd='z123'; output; 

While these password protections can be helpful within SAS, the protected files are still vulnerable to deletion or manipulation using tools outside of SAS.

Including the ENCRYPT option adds another layer of protection from tools other than SAS.

ReadHow to Password protect SAS datasets?

KEEP, DROP, and RENAME Dataset Options

When using the KEEP, DROP, or RENAME in a DATA step, you can choose between data set options or DATA step statements.

The following examples highlight the differences between the KEEP statement and the KEEP= data set option.

data class(keep=name age sex);
set sashelp.class(keep=name age sex where=(age>'12'));
  • The KEEP statement below n is only applied to the new data set (WORK.CLASS) and in no way affects the Program Data Vector or what variables will be read from the incoming data set (SASHELP.CLASS).
  • The KEEP statement variable list is applied to the new outgoing data set.
  • The IF statement is executed after the entire observation is read and loaded into the PDV.

The KEEP statement is the same as specifying the KEEP= option on the dataset in the DATA statement.

The KEEP= option on the SET statement is applied before the PDV is built, and only the listed variables will be read from the incoming data set and included in the PDV. It can significantly improve performance when dealing with large datasets.

  • The KEEP= data set option only impacts which variables will be written to the new data set.
  • On the SET statement, the KEEP= data set option is applied to the incoming data set.
  • The WHERE= filter is specified as a data set option and is used before observations are read.
  • The RENAME option allows you to change the name of a variable either as it is read or written.

Should you Dataset options or Statements

During the DATA step, the DROP, KEEP, and RENAME statements or the DROP=, KEEP=, and RENAME= data set options to tell SAS which variables to process or output.

You can get the results you want by using one or more of these statements and data set options alone or together.

Order of Application

If your program needs you to use more than one data set option or a mix of data set options and statements, it’s helpful to know that SAS drops, keeps, and renames variables in the following order:

  • First, SET, MERGE, and UPDATE statements evaluate the options on input data sets from left to right. Before the RENAME= option is used, the DROP= and KEEP= options are used.
  • The RENAME statement comes after the DROP and KEEP statements.
  • Lastly, the DATA statement looks at the options for output data sets from left to right. Before the RENAME= option is used, the DROP= and KEEP= options are used.

Using FIRSTOBS and OBS Data Set Options

You can use FIRSTOBS and OBS separately or in conjunction to limit which observations are read and written.

  • FIRSTOBS: specifies the number of the first observation to be read
  • OBS: specifies the last observation that is to be read.

When these two options are used together, they work independently. When FIRSTOBS is not used, OBS corresponds to the number of observations that will be read. OBS= option counts from observation one regardless of the value of FIRSTOBS.

proc print data=sashelp.class(firstobs=4 obs=6);
Using FIRSTOBS and OBS options

Only the first six observations are available to be printed because of the OBS=6; However, the first to be printed in the 4th observation (FIRSTOBS=4). As a result, only three observations are printed.

Every week we'll send you SAS tips and in-depth tutorials


Subhro Kar is an Analyst with over five years of experience. As a programmer specializing in SAS (Statistical Analysis System), Subhro also offers tutorials and guides on how to approach the coding language. His website, 9to5sas, offers students and new programmers useful easy-to-grasp resources to help them understand the fundamentals of SAS. Through this website, he shares his passion for programming while giving back to up-and-coming programmers in the field. Subhro’s mission is to offer quality tips, tricks, and lessons that give SAS beginners the skills they need to succeed.