This post is a comprehensive guide to SAS programming. It covers the basics of the SAS language, including data management, data manipulation, and various data analysis tasks.
Additionally, we will introduce the key components of SAS, such as SAS programs, SAS libraries, SAS files, SAS datasets, and SAS outputs. Lastly, the post discusses the properties and types of SAS files and datasets, along with their attributes and purposes.
What is SAS Programming?
SAS programming is the use of the SAS language for data management, data manipulation, and data analysis. A SAS program consists of lines of SAS code that tell the SAS system how to read, view, manipulate, or analyze data and how to produce output.
What is the prerequisite for learning SAS Programming?
Components of SAS
The key components of SAS are SAS programs, SAS libraries, SAS files, SAS data sets, and SAS output.
SAS programs are used to access, manage, analyze, and present data. A SAS program is a sequence of DATA steps and PROC (procedure) steps.
The primary method for creating a SAS dataset is using the data step. The data step begins with the DATA statement and contains other programming statements. You can use programming statements to manipulate the existing SAS dataset or create a new dataset from raw data files.
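As a minimal sketch of both uses, the first DATA step below reads raw data supplied inline with DATALINES, and the second creates a new data set from an existing one. The data set names (work.scores, work.passed), the variables, and the values are made up for illustration.

```sas
/* Create a data set from raw data supplied inline (hypothetical values) */
data work.scores;
    input name $ score;
    datalines;
Ann 82
Raj 67
Lee 91
;
run;

/* Create a new data set by manipulating an existing one */
data work.passed;
    set work.scores;
    where score >= 70;          /* keep only rows with a score of 70 or more */
    score_pct = score / 100;    /* derive a new variable */
run;
```

Both steps start with the DATA statement and end with RUN; everything in between is the set of programming statements that builds each row.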
PROC steps run predefined SAS procedures that analyze and process data in a SAS data set. They are also used for data munging, that is, transforming raw data into another form. PROC steps can:
- Create new SAS data sets
- List, sort, and provide summaries of data.
- Generate statistical results.
- Generate plots and charts.
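The tasks listed above might look like the following sketch. It assumes the SASHELP.CLASS sample table is available (it normally ships with SAS); the output data set name work.class_sorted is an arbitrary choice.

```sas
/* Create a new, sorted data set from the SASHELP.CLASS sample table */
proc sort data=sashelp.class out=work.class_sorted;
    by age;
run;

/* List the observations */
proc print data=work.class_sorted;
run;

/* Summary statistics for two numeric variables */
proc means data=work.class_sorted n mean min max;
    var height weight;
run;

/* A simple scatter plot */
proc sgplot data=work.class_sorted;
    scatter x=height y=weight;
run;
```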
Libraries are collections of files that are accessible by the SAS system. A library can be a physical or logical collection of files. SAS data sets are stored in SAS libraries. There are two types of SAS libraries:
- Temporary Libraries
- Permanent Libraries
Temporary libraries in SAS are volatile, meaning any SAS files or datasets stored in the temporary library will be available only for the current SAS session and removed once the SAS session is closed. By default, SAS creates files in a temporary library known as WORK.
A permanent SAS library refers to files stored on disk or another persistent storage medium. These files or data sets are not deleted when the SAS session ends. You work with the files in a permanent SAS library by specifying the libref as the first part of a two-level SAS file name (libref.filename).
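The difference between the two kinds of library can be sketched as follows. The libref mylib and the folder path are assumptions; replace the path with an existing folder that you can write to.

```sas
/* Assumption: /home/myuser/sasdata is an existing, writable folder */
libname mylib "/home/myuser/sasdata";

/* Two-level name (libref.dataset): stored permanently in that folder */
data mylib.class_copy;
    set sashelp.class;
run;

/* One-level name: by default this goes to the temporary WORK library */
data class_temp;
    set sashelp.class;
run;

/* Release the libref; the files in the folder remain */
libname mylib clear;
```

After the session ends, class_temp is gone, while class_copy is still in the folder and can be reached again by reassigning a libref to it.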
Predefined SAS Libraries
By default, SAS has several libraries that are listed below.
SASHELP is a Read-only permanent library that contains sample data and other files that control how SAS works.
SASUSER is a permanent library that contains SAS files in the Profile catalog and stores your settings. You might not have write access to the SASUSER library when using SAS Studio or SAS University Edition. To check whether you have write access, run the code below.
proc options option=rsasuser; run;
If PROC OPTIONS reports NORSASUSER, the SASUSER library is writable. If it reports RSASUSER, the SASUSER library is read-only.
WORK is a temporary library for files that do not need to be saved from session to session. You can also define additional libraries. When you define a library, you indicate the location of your SAS files to SAS. After you define a library, you can manage SAS files within it.
The SAS System creates and uses various structured files called SAS files. These files are stored in the SAS data libraries. Learn more about SAS Libraries in the article Working with SAS libraries.
Types of SAS Files
SAS files stored in SAS data libraries are referred to as library members. Each member has a member type. The SAS System uses unique file extensions to differentiate between SAS files and external Windows files in a folder.
The extensions and member types are listed in the table below.
|SAS File Extension||Member Type||Description|
|.log||.log||SAS log file|
|.sas7bdat||DATA||SAS data file|
|.sas7bndx||INDEX||Data file index; not treated by the SAS System as a separate file|
|.sas7bpgm||PROGRAM||Stored program (DATA step)|
|.sas7bvew||VIEW||SAS data view|
|.sas7bacs||ACCESS||Access descriptor file|
|.sas7bods||SASODS||Output Delivery System file|
|.sas7bdmd||DMDB||Data mining database|
|.sas7bitm||ITEMSTOR||Item store file|
|.sas7bput||PUTILITY||Permanent utility file|
A data set is one of the types of SAS files. Data sets must have a name. Valid data set names can be 1 to 32 characters long and must begin with a letter or an underscore.
A SAS data set has two parts:
- Descriptor portion
- Data portion
A SAS data set can also point to indexes, which enable SAS to locate rows in the data set more efficiently.
Once the data is in the form of a SAS data set, there is no need to specify the attributes of the data set or the variables in your program statements. SAS can obtain the information directly from the data set.
The descriptor portion of a SAS data set contains information about the data set, including the following:
- Name of the data set
- Date and time that the data set was created
- Number of observations
- Number of variables
The data portion is the collection of data values, arranged as a table.
Observations (Rows) in SAS
Observations (also called rows) in a SAS data set are collections of data values that usually relate to a single object.
Each column is a collection of values describing a particular characteristic, and each variable in a data set has a set of attributes.
SAS Variables (Columns)
Variables (also called columns) in a data set are collections of values that describe a particular characteristic.
The descriptor portion also contains the attributes of each variable in the SAS data set: name, type, length, format, informat, and label.
Here is a listing of the attribute information in the descriptor portion of the SAS data set SASHELP.AIR
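A listing of this kind can be produced with PROC CONTENTS. This short sketch assumes the SASHELP.AIR sample table is available in your installation.

```sas
/* Print the descriptor portion: data set name, creation date,      */
/* number of observations and variables, and each variable's        */
/* name, type, length, format, informat, and label                   */
proc contents data=sashelp.air;
run;
```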
A variable name conforms to the SAS naming conventions and follows the same rules as SAS data set names.
Rules for Variable Names
- They can be 1 to 32 characters long.
- They must begin with a letter (A-Z, either uppercase or lowercase) or an underscore (_).
- They can continue with any combination of numbers, letters, or underscores.
VALIDVARNAME= System Option
The VALIDVARNAME system option is set to V7 (letters of the Latin alphabet, numerals, or underscores) by default.
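If you would like to use characters other than the valid ones, you must set VALIDVARNAME=ANY and write the variable name as a name literal, that is, a quoted string followed by the letter n.
If the name includes a percent sign (%) or an ampersand (&), use single quotation marks in the name literal so that the SAS macro facility does not try to interpret the name. Names that contain spaces must also be written as name literals.
'% of profit'n=percent;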
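Put together, a name literal might be used as in the sketch below. The data set name work.profit and the value 12.5 are made up for illustration, and the final OPTIONS statement assumes you want to switch back to the V7 rules afterwards.

```sas
/* Allow variable names that break the V7 rules */
options validvarname=any;

data work.profit;
    percent = 12.5;
    '% of profit'n = percent;   /* name literal; single quotes because of % */
run;

proc print data=work.profit;
run;

/* Switch back to the V7 naming rules */
options validvarname=v7;
```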
VALIDVARNAME specifies the rules for valid SAS variable names that can be created and processed during a SAS session.
V7 specifies that variable names must follow these rules:
- SAS variable names can be up to 32 characters long.
- The first character must be a letter of the Latin alphabet (A – Z, either uppercase or lowercase) or an underscore (_). Subsequent characters can be letters of the Latin alphabet, numerals, or underscores.
- Trailing blanks are ignored. The variable name alignment is left-justified.
- A variable name cannot contain blanks or special characters except for an underscore.
- A variable name can contain mixed-case letters. SAS stores and writes the variable name in the same case used in the first reference to the variable. However, when SAS processes a variable name, SAS internally converts it to uppercase. Therefore, you cannot use the same variable name with a different combination of uppercase and lowercase letters to represent different variables. For example, cat, Cat, and CAT all represent the same variable.
- You cannot assign the names of special SAS automatic variables (such as _N_ and _ERROR_) or variable list names (such as _NUMERIC_, _CHARACTER_, and _ALL_) to variables.
UPCASE specifies that the variable name follows the same rules as V7, except that the variable name is uppercase, as in earlier versions of SAS.
ANY specifies that SAS variable names must follow these rules:
- The name can begin with or contain any characters, including blanks, national, special, and multi-byte characters.
- The name can be up to 32 bytes long.
- The name cannot contain any null bytes.
- Leading blanks are preserved, but trailing blanks are ignored.
- The name must contain at least one character. A name with all blanks is not permitted.
- A variable name can contain mixed-case letters. SAS stores and writes the variable name in the same case that is used in the first reference to the variable. However, when SAS processes a variable name, SAS internally converts it to uppercase. Therefore, you cannot use the same variable name with a different combination of uppercase and lowercase letters to represent different variables. For example, cat, Cat, and CAT all represent the same variable.
SAS Variable Types
- Character Variables
- Numeric Variables
Character variables can contain any value while numeric variables can contain only numeric values.
Missing values are represented as a blank in Character variables, whereas a period represents a missing value for numeric values.
Length of SAS Variable
Character variables can be up to 32,767 bytes long, and numeric variables have a default length of 8 bytes.
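The sketch below illustrates variable types, missing values, and lengths together. The data set work.people and its values are hypothetical. With list input, a single period in the raw data is read as a missing value for both numeric and character variables.

```sas
data work.people;
    length city $ 20;            /* character variable, 20 bytes long */
    input name $ age city $;     /* name: character (default 8 bytes), age: numeric */
    datalines;
Ann 34 Boston
Raj . Chicago
Lee 51 .
;
run;

/* Check the type and length of each variable */
proc contents data=work.people;
run;

/* Missing age prints as a period, missing city as a blank */
proc print data=work.people;
run;
```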
Format of SAS Variable
A format is a variable attribute that controls how data values are written or displayed.
SAS has character, numeric, and date and time formats. You can also create and store your own formats. To write values in a particular form, select the appropriate format.
For example, to display the value 1234 as $1,234.00 in a report, you can use the DOLLAR9.2 format.
In a format such as DOLLARw.d, w is the maximum width of the written value (including the dollar sign, comma, and decimal point) and d is the number of decimal places.
For example, to display the value 5678 as 5,678.00 in the output, you can use the COMMA8.2 format, which specifies a width of 8, including two decimal places.
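To try these formats without creating a data set, you could write the formatted values to the log with a PUT statement. This is only a sketch; the variable names amount and sales are arbitrary.

```sas
data _null_;
    amount = 1234;
    sales  = 5678;
    put amount dollar9.2;   /* width 9, 2 decimals: $1,234.00 */
    put sales comma8.2;     /* width 8, 2 decimals: 5,678.00  */
run;
```

The stored values stay 1234 and 5678; a format changes only how they are written.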
Informat in SAS
Informats read data values in certain forms into standard SAS values. They determine how data values are read into a SAS data set.
You must use informats to read numeric values that contain letters or other special characters.
For example, the numeric value $12,345.00 contains two special characters, a dollar sign ($) and a comma (,).
In this case, you can use an informat to read the value while removing the dollar sign and comma and then store the resulting value as a standard numeric value.
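One way to sketch this is with the INPUT function and the COMMAw. informat, which removes embedded dollar signs and commas while reading. The variable names raw and value are illustrative.

```sas
data _null_;
    raw = '$12,345.00';              /* text as it appears in the raw data */
    value = input(raw, comma10.);    /* read 10 characters, dropping $ and , */
    put value=;                      /* writes the stored numeric value to the log */
run;
```

The same informat can be used in an INPUT statement when reading a raw data file.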
SAS Variable Labels
A label is a variable attribute consisting of descriptive text up to 256 characters long.
An index is a separate file that you can create for a SAS data file to provide direct access to a specific observation.
The index file has the same name as its data file and a member type of INDEX. Indexes can provide faster access to specific observations, particularly when you have a large data set.
The purpose of SAS indexes is to optimise WHERE expressions and facilitate BY-group processing.
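As a minimal sketch, a simple index can be created with the INDEX= data set option when the data set is built. The name work.class_idx is arbitrary, and on a table as small as SASHELP.CLASS SAS may well decide that reading every row is cheaper than using the index.

```sas
/* Copy the sample table and build a simple index on the NAME variable */
data work.class_idx (index=(name));
    set sashelp.class;
run;

/* A WHERE expression on the indexed variable is a candidate for index use */
proc print data=work.class_idx;
    where name = 'Alice';
run;
```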
Extended attributes are user-defined metadata that you define on a data set or on a variable. For example, you can store a description of a variable or the formula used to produce the variable value, or save a URL that points to information about your data set.
The XATTR ADD statement, used in PROC DATASETS after a MODIFY statement, adds extended attributes to variables or data sets.
To store a name-value pair with a data set:
XATTR ADD DS attribute_name = attribute_value;
To store a name-value pair with a variable:
XATTR ADD VAR variable_name(attribute_name=attribute_value);
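In context, the two statements might be used as in the following sketch. It assumes SAS 9.4 or later; the attribute names (source, unit) and their values are invented for illustration.

```sas
/* Work on a copy so the read-only SASHELP library is not touched */
data work.class;
    set sashelp.class;
run;

proc datasets lib=work nolist;
    modify class;
    xattr add ds source="Copied from SASHELP.CLASS";   /* data set attribute */
    xattr add var height (unit="inches");              /* variable attribute */
run;
quit;

/* PROC CONTENTS reports the extended attributes along with the descriptor */
proc contents data=work.class;
run;
```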
SAS output is the result of executing SAS programs. Most SAS procedures and some DATA step applications produce output.
There are three types of SAS output:
- SAS Log file
- SAS Procedure output file
- SAS console log file
SAS Log file
The SAS log file is generated when a SAS program is executed. It contains information about the processing of SAS statements. As each program step executes, notes and any applicable error or warning messages are written to the SAS log.
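Besides the messages that SAS writes on its own, a program can write its own lines to the log. The sketch below uses the PUTLOG statement in a DATA step; the message text is arbitrary and SASHELP.CLASS is assumed to be available.

```sas
data _null_;
    set sashelp.class end=last;
    /* Write one log line for each student older than 15 */
    if age > 15 then putlog "NOTE: " name= age=;
    /* After the last row, report how many observations were read */
    if last then putlog "NOTE: finished reading " _n_ "observations.";
run;
```

After the step runs, the log also shows the standard notes from SAS itself, such as the number of observations read from the input data set.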
SAS Procedure Output File
SAS sends the output to the procedure output file whenever a SAS program executes a PROC step. SAS procedure output is handled by the Output Delivery System (ODS).
SAS Console Log File
The console log is used when an error, warning or note has to be written to the SAS log, but the log is not available.
In conclusion, SAS programming is a powerful tool for data management, manipulation, and analysis. This article covered the basics of SAS, including the key components of SAS, such as SAS programs, SAS libraries, SAS files, SAS datasets, and SAS outputs.
We also discussed the prerequisites for learning SAS programming and the different types of SAS files. With this knowledge, you can get started on your journey to becoming a proficient SAS programmer.