In SAS the data set is the most important part of analyses, and it is crucial that you know all sorts of things about the data.
SAS has a multitude of data set options that will help you to accomplish your goals while working with the datasets.
Data set options are used to modify how a data set is either read or written. There are over three dozen of these options, and while you will generally only make use of a hand full of them, you should have a good idea of their scope.
To use these option(s) place them in parentheses immediately following the name of the data set to which they are to be applied.
While data set options can be used virtually anytime the data set is named, some of the options are situation-dependent, which means that you will have to understand what an option does before applying it.
For instance, options that control how a data set is to be read would not be used on a DATA statement.
In the following example, the
KEEP data set option is applied to the data set being used by
proc sort data=sashelp.cars(keep= make model price) out=carslist; by price; run;
Regardless of how many variables are in SASHELP.CARS the SORT procedure will only have to deal with the three variables provided in the KEEP= option.
For a SORT procedure, this can substantially speed up the processing.
REPLACE and REPEMPTY
It is possible to create an empty (zero observation) data set. However, we may want to control whether the new table will replace an existing table of the same name or not.
REPLACE=NO prevents the replacement of a permanent data set. This data set option overrides the system option of the same name. REPLACE=YES is the default.
REPEMPTY option determines whether or not an empty data set can overwrite an existing data set.
These two options are usually used together. Normally, we want to be able to replace permanent data sets unless the new version is empty.
Using the DATASTMTCHK option
Traditionally the issue of overwriting a data set that has observations with an empty one has been especially problematic when the semicolon has been left off of the DATA statement.
In the following DATA step, because the missing semicolon in the SET statement is masked and 3 empty datasets are created.
MYLIB.VERYIMPORTANT, WORK.SET and SASHELP.CLASS
options DATASTMTCHK=NONE; data advrpt.VeryImportant set sashelp.class; run;
The missing semicolon causes SAS to see the SET statement as a part of the DATA statements. The result is that there is no incoming data set, consequently, the created data sets will have no variables or observations.
DATASTMTCHK system option protects us from this very problem by not allowing datasets to be created with names, such as
MERGE. Setting DATASTMTCHCK to NONE removes this protection.
Data sets can be both encrypted and password protected. Password and encryption data set options include:
||Password to alter the data set|
||Encrypt the data set|
||Specify the password|
||Password request window|
||Password to read the data set.|
||Password to write to the dataset.|
data Login(encrypt=yes pwreq=yes read=readpwd write=writepwd); DB='DEApp'; UID='MaryJ'; pwd='12z3'; output; DB='p127'; UID='Mary'; pwd='z123'; output; run;
While these password protections can be useful within SAS, the protected files are still vulnerable to deletion or manipulation using tools outside of SAS.
Including the ENCRYPT option adds another layer of protection from tools other than SAS.
KEEP, DROP, and RENAME Options
When using the
RENAME in a DATA step, you can choose between using data set options or DATA step statements.
The following examples highlight the differences between the KEEP statement and the KEEP= data set option.
data class(keep=name age sex); set sashelp.class(keep=name age sex where=(age>'12')); run;
- The KEEP statement below n is only applied to the new data set (WORK.CLASS) and in no way affects the Program Data Vector or what variables will be read from the incoming data set (SASHELP.CLASS).
- The KEEP statement variable list is applied to the new outgoing data set.
- The IF statement is executed after the entire observation is read and loaded into the PDV.
The KEEP statement is exactly the same as specifying the KEEP= option on the dataset in the DATA statement.
The KEEP= option on the SET statement is applied before the PDV is built and only the listed variables will be read from the incoming data set and will be included in the PDV. This can significantly improve performance when dealing with large datasets.
- The KEEP= data set option only impacts which variables will be written to the new data set.
- On the SET statement, the KEEP= data set option is applied to the incoming data set.
- The WHERE= filter is specified as a data set option and is applied before observations are read.
- The RENAME option allows you to change the name of a variable either as it is read or as it is written.
Using FIRSTOBS and OBS Data Set Options
The data set options FIRSTOBS and OBS can be used separately or in conjunction with one another to limit which observations are read and/or written.
FIRSTOBS: specifies the number of the first observation to be read
OBS: specifies the last observation that is to be read.
When these 2 options are used together, they work independently of each other. When FIRSTOBS is not used OBS corresponds to the number of observations that will be read. OBS= Option counts from observation 1 regardless of the value of FIRSTOBS.
proc print data=sashelp.class(firstobs=4 obs=6); run;
Only the first 6 observations are available to be printed (OBS=6); However, the first to be printed is the 4th observation (FIRSTOBS=4). As a result, only three observations are actually printed.