What is SAS programming?
SAS programming means using the SAS language for data management, data manipulation, and data analysis. A SAS program is a sequence of SAS statements that tell the SAS system how to read, view, manipulate, or analyze data and how to produce output.
What is the prerequisite for learning SAS Programming?
Components of SAS
The key components of SAS are the following:
SAS programs are widely used for accessing, analyzing, managing, or presenting data. A SAS program is a combination of DATA steps and PROC (procedure) steps.
The primary method for creating a SAS data set is the DATA step. The DATA step begins with the DATA statement and contains other programming statements. You can use these statements to manipulate an existing SAS data set or to create a new data set from raw data files.
PROC steps are predefined procedures in SAS that are used to analyze and process data in a SAS data set. PROC steps are also used for data munging, which is the process of transforming data from one "raw" form into another format. Typical tasks include the following:
- Create new SAS data sets
- List, sort, and provide summaries of data.
- Generate statistical results.
- Generate plots and charts.
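As a minimal sketch of how a DATA step and PROC steps work together, the program below derives a new variable and then sorts, lists, and summarizes the result. It assumes the sample data set SASHELP.CLASS is available; the data set name class_bmi and the variable bmi are hypothetical names chosen for illustration.

```sas
/* DATA step: create a new data set from an existing one */
data work.class_bmi;
    set sashelp.class;                       /* sample data shipped with SAS */
    bmi = (weight / (height ** 2)) * 703;    /* derived variable (pounds, inches) */
run;

/* PROC steps: sort, list, and summarize the new data set */
proc sort data=work.class_bmi;
    by descending bmi;
run;

proc print data=work.class_bmi (obs=5);      /* list the first five rows */
run;

proc means data=work.class_bmi mean min max; /* summary statistics */
    var bmi;
run;
```

Each step ends with a RUN statement, and the steps execute in the order in which they appear.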
Libraries are collections of files that are accessible by the SAS system. A library can be a physical or logical collection of files. SAS data sets are stored in SAS libraries. There are two types of SAS libraries.
- Temporary Libraries
- Permanent Libraries
Temporary libraries in SAS are volatile, which means that any SAS files or data sets stored in a temporary library are available only for the current SAS session and are removed once the session is closed. By default, SAS creates files in a temporary library known as WORK.
Permanent SAS library refers to the files which are stored on an external storage medium of your computer. These files or datasets are not deleted when the SAS session terminates. You can work with the files in a permanent SAS library by specifying the libref as the first part of a two-level SAS filename.
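The difference between the two library types can be sketched as follows. The libref mylib and the folder path are placeholders, not real locations; substitute a folder that exists and is writable in your environment.

```sas
/* Assign the libref MYLIB to a folder (hypothetical path) */
libname mylib '/home/your_user/sasdata';

/* Two-level name: the data set is written to the permanent library */
data mylib.class_copy;
    set sashelp.class;
run;

/* One-level name: the data set goes to the temporary WORK library by default */
data class_copy;
    set sashelp.class;
run;
```

After the session ends, mylib.class_copy remains on disk, while work.class_copy is deleted.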
Predefined SAS Libraries
By default, SAS has several libraries that are listed below.
SASHELP is a read-only permanent library that contains sample data and other files that control how SAS works.
SASUSER is a permanent library that contains SAS files in the Profile catalog and stores your personal settings. You might not have write access to the Sasuser directory if you are using SAS Studio or SAS University Edition. To verify whether you have write access, run the following code.
proc options option=rsasuser; run;
If the result from PROC OPTIONS is NORSASUSER, the Sasuser folder is writable. If the result is RSASUSER, the Sasuser folder is read-only.
WORK is a temporary library for files that do not need to be saved from session to session. You can also define additional libraries. When you define a library, you indicate the location of your SAS files to SAS. After you define a library, you can manage SAS files within it.
The SAS System creates and uses a variety of structured files called SAS files. These files are stored in the SAS data libraries. Learn more about SAS Libraries in the article Working with SAS libraries.
Types of SAS Files
SAS files stored in SAS data libraries are referred to as members of a library. Each member has a member type. The SAS System differentiates between SAS files and external Windows files in a folder by using unique file extensions.
Some of the common extensions and member types are listed in the table below.
For the complete list of SAS file extensions, you can check out the SAS website.
A data set is one of the types of SAS files. Data sets must have a name. Valid data set names can be 1 to 32 characters long and must begin with a letter or an underscore.
A SAS data set has two parts:
- Descriptor portion
- Data portion
A SAS data set can also point to indexes, which enable SAS to locate rows in the data set more efficiently.
Once the data is in the form of a SAS data set, there is no need to specify the attributes of the data set or the variables in your program statements. SAS can obtain the information directly from the data set.
The descriptor portion of a SAS data set contains information about the data set, including the following:
- Name of the data set
- Date and time that the data set was created
- Number of observations
- Number of variables
The data portion holds the data values in a table format.
Observations (also called rows) in a SAS data set are collections of data values that usually relate to a single object.
Variables (also called columns) in the data set are collections of values that describe a particular characteristic. Each variable in a data set has a series of attributes.
The descriptor portion also contains the attributes of each variable in the SAS data set: its name, type, length, format, informat, and label.
Here is a listing of the attribute information in the descriptor portion of the SAS data set SASHELP.AIR:
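A listing of this kind can be produced with PROC CONTENTS, which prints the descriptor portion of a data set. This is a minimal sketch that assumes SASHELP.AIR is installed with your SAS software.

```sas
/* Print the descriptor portion: data set name, creation date,
   number of observations and variables, and variable attributes */
proc contents data=sashelp.air;
run;
```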
A variable name conforms to the SAS naming conventions and it follows the same rules as SAS data set names.
Rules for Variable Names
- They can be 1 to 32 characters long.
- They must begin with a letter (A-Z, either uppercase or lowercase) or an underscore (_).
- They can continue with any combination of numbers, letters, or underscores.
VALIDVARNAME= System Option
The VALIDVARNAME system option is set to V7 (letters of the Latin alphabet, numerals, or underscores) by default.
If you would like to use characters other than the valid ones, you must specify VALIDVARNAME=ANY and write the variable name as a name literal.
If the name includes a percent sign (%) or an ampersand (&), you must use single quotation marks in the name literal so that the macro facility does not try to resolve it. The same name-literal syntax is used for variable names that contain spaces.
'% of profit'n=percent;
VALIDVARNAME specifies the rules for valid SAS variable names that can be created and processed during a SAS session.
V7 specifies that variable names must follow these rules:
- SAS variable names can be up to 32 characters long.
- The first character must be a letter of the Latin alphabet (A-Z, either uppercase or lowercase) or an underscore (_). Subsequent characters can be letters of the Latin alphabet, numerals, or underscores.
- Trailing blanks are ignored. The variable name alignment is left-justified.
- A variable name cannot contain blanks or special characters except for an underscore.
- A variable name can contain mixed-case letters. SAS stores and writes the variable name in the same case that is used in the first reference to the variable. However, when SAS processes a variable name, SAS internally converts it to uppercase. Therefore, you cannot use the same variable name with a different combination of uppercase and lowercase letters to represent different variables. For example, cat, Cat, and CAT all represent the same variable.
- You cannot assign the names of special SAS automatic variables (such as _N_ and _ERROR_) or variable list names (such as _NUMERIC_, _CHARACTER_, and _ALL_) to variables.
UPCASE specifies that the variable name follows the same rules as V7, except that the variable name is uppercase, as in earlier versions of SAS.
ANY specifies that SAS variable names must follow these rules:
- The name can begin with or contain any characters, including blanks, national characters, special characters, and multi-byte characters.
- The name can be up to 32 bytes long.
- The name cannot contain any null bytes.
- Leading blanks are preserved, but trailing blanks are ignored.
- The name must contain at least one character. A name with all blanks is not permitted.
- A variable name can contain mixed-case letters. SAS stores and writes the variable name in the same case that is used in the first reference to the variable. However, when SAS processes a variable name, SAS internally converts it to uppercase. Therefore, you cannot use the same variable name with a different combination of uppercase and lowercase letters to represent different variables. For example, cat, Cat, and CAT all represent the same variable.
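The effect of VALIDVARNAME=ANY can be sketched with a short program. The data set name work.profit and the variable names are hypothetical and chosen only for illustration.

```sas
/* Allow variable names that break the V7 rules */
options validvarname=any;

data work.profit;
    percent = 12.5;
    '% of profit'n = percent;   /* single quotes keep the macro facility away from % */
    'Total Sales'n = 1000;      /* name literal containing a space */
run;

proc print data=work.profit;
run;

/* Restore the V7 rules for the rest of the session */
options validvarname=v7;
```

Resetting the option at the end is a design choice: it keeps later code in the same session from accidentally creating nonstandard names.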
SAS variables have one of two types:
- Character variables
- Numeric variables
Character variables can contain any value, whereas numeric variables can contain only numeric values.
A missing value is represented as a blank in a character variable and as a period in a numeric variable.
Character variables can be up to 32,767 bytes long. Numeric variables have a default length of 8 bytes.
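A small sketch can make the two types and their missing values concrete. The data set and variable names here are hypothetical.

```sas
data work.types_demo;
    length name $ 20;                    /* character variable, 20 bytes */
    name = 'Alice'; age = 34;  output;   /* age is numeric, default length 8 */
    name = ' ';     age = .;   output;   /* missing character and numeric values */
run;

/* Show the type and length of each variable */
proc contents data=work.types_demo;
run;

proc print data=work.types_demo;
run;
```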
Formats are variable attributes that control how data values are written or displayed.
SAS has character, numeric, and date and time formats. You can also create and store your own formats. To write values out using a particular form, you have to select the appropriate format.
For example, to display the value 1234 as $1,234.00 in a report, you can use the DOLLAR9.2 format.
In a format name, w is the maximum width of the value to be written, including any dollar sign, commas, and decimal point, and d is the number of decimal places.
For example, to display the value 5678 as 5,678.00 in the output, you can use the COMMA8.2 format, which specifies a width of 8 including 2 decimal places.
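The two examples above can be tried with a DATA step that writes formatted values to the SAS log. This sketch assumes DOLLAR9.2 for the first value (a width of 9 covers the dollar sign, comma, decimal point, and six digits); the variable names are hypothetical.

```sas
data _null_;              /* _NULL_ runs the step without creating a data set */
    price  = 1234;
    amount = 5678;
    put price  dollar9.2; /* writes $1,234.00 */
    put amount comma8.2;  /* writes 5,678.00 */
run;
```

The stored values stay 1234 and 5678; a format changes only how they are displayed.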
Informats read data values in certain forms into standard SAS values. They determine how data values are read into a SAS data set.
You must use informats to read numeric values that contain letters or other special characters.
For example, the numeric value $12,345.00 contains two special characters, a dollar sign ($) and a comma (,).
In this case, you can use an informat to read the value while removing the dollar sign and comma, and then store the resulting value as a standard numeric value.
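As a minimal sketch, the INPUT function can apply an informat to a text value. The COMMA10. informat used here is one possible choice; it strips dollar signs and commas while reading. The variable name amount is hypothetical.

```sas
data _null_;
    /* Read the 10-character text value as a standard numeric value */
    amount = input('$12,345.00', comma10.);
    put amount=;          /* writes amount=12345 to the log */
run;
```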
A label is a variable attribute that consists of descriptive text up to 256 characters long.
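A short sketch of assigning and displaying labels, assuming the sample data set SASHELP.CLASS; the output data set name and the label text are hypothetical.

```sas
data work.class_labeled;
    set sashelp.class;
    label height = 'Height (inches)'
          weight = 'Weight (pounds)';
run;

/* The LABEL option tells PROC PRINT to use labels as column headings */
proc print data=work.class_labeled (obs=3) label;
run;
```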
An index is a separate file that you can create for a SAS data file in order to provide direct access to a specific observation.
The index file has the same name as its data file and a member type of INDEX. Indexes can provide faster access to specific observations, particularly when you have a large data set.
The purpose of SAS indexes is to optimize WHERE expressions and to facilitate BY-group processing.
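One way to create an index is the INDEX CREATE statement in PROC DATASETS. The sketch below assumes the sample data set SASHELP.CARS; the copy work.cars is a hypothetical name, made because SASHELP is read-only.

```sas
/* Work on a copy, since SASHELP cannot be modified */
data work.cars;
    set sashelp.cars;
run;

/* Create a simple index on the variable MAKE */
proc datasets library=work nolist;
    modify cars;
    index create make;
quit;

/* A WHERE expression on the indexed variable is a candidate for index use */
proc print data=work.cars;
    where make = 'Audi';
run;
```

SAS decides at run time whether using the index is cheaper than reading the data set sequentially, so on a small table like this one the index may not be used.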
Extended attributes are user-defined metadata that you define on a data set or on a variable. For example, you can store a description of a variable, the formula used to produce the variable value, or a URL that points to information about your data set.
To store a name-value pair with a data set:
XATTR ADD DS attribute_name = attribute_value;
To store a name-value pair with a variable:
XATTR ADD VAR variable_name(attribute_name=attribute_value);
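The XATTR statements run inside PROC DATASETS after a MODIFY statement. The following is a sketch under stated assumptions: it needs a SAS release that supports extended attributes (SAS 9.4 or later), and the attribute names source and unit, their values, and the data set work.cars are hypothetical.

```sas
/* Work on a copy, since SASHELP cannot be modified */
data work.cars;
    set sashelp.cars;
run;

proc datasets library=work nolist;
    modify cars;
    xattr add ds  source="Copied from SASHELP.CARS";  /* data set attribute */
    xattr add var msrp (unit="USD");                  /* variable attribute */
quit;

/* PROC CONTENTS reports extended attributes along with the descriptor portion */
proc contents data=work.cars;
run;
```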
SAS output is the result of executing SAS programs. Most SAS procedures and some DATA step applications produce output.
There are three types of SAS output:
- SAS Log file
- SAS Procedure output file
- SAS console log file
SAS Log file
The SAS log file is generated when a SAS program is executed. It contains information about the processing of SAS statements. Notes are written to the SAS log, along with any applicable error or warning messages, as each program step executes.
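Besides the notes that SAS writes automatically, a program can write its own messages to the log. This is a minimal sketch using the PUTLOG statement and the sample data set SASHELP.CLASS; the message text is made up for illustration.

```sas
data _null_;
    set sashelp.class end=last;
    /* Write a custom message to the SAS log for selected rows */
    if age < 12 then putlog 'NOTE: young student ' name= age=;
    /* _N_ counts DATA step iterations; report it on the last row */
    if last then putlog 'NOTE: read ' _n_ 'observations.';
run;
```

After the step finishes, SAS also adds its own notes about the number of observations read and the time used.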
SAS Procedure Output File
SAS sends the output to the procedure output file whenever a SAS program executes a PROC step. SAS procedure output is handled by the Output Delivery System (ODS).
SAS Console Log File
The console log is used when an error, warning or note has to be written to the SAS log but the log is not available.