SAS Tutorials – Basics of SAS Programming

What is SAS Programming?

SAS programming means using the SAS language for data management, data manipulation, and data analysis. A SAS program is made up of statements that tell the SAS system how to read, view, manipulate, or analyze data and how to produce output.

What is the prerequisite for learning SAS Programming?

The main prerequisites for learning SAS programming are basic statistics and SQL skills, which will help you understand SAS more effectively.

Components of SAS

The key components of SAS are described below.

SAS Programs

SAS programs are used to access, analyze, manage, and present data. A SAS program is made up of DATA steps and PROC (procedure) steps.
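Here is a minimal sketch of a SAS program with one DATA step and one PROC step. It assumes the SASHELP.CLASS sample table that ships with SAS is available; the output name WORK.TEENS is hypothetical.

```sas
/* DATA step: create a new data set from an existing one */
data work.teens;
   set sashelp.class;   /* read each observation of the sample table */
   where age >= 13;     /* keep only students aged 13 or older */
run;

/* PROC step: print the new data set */
proc print data=work.teens;
run;
```

The RUN statement marks the end of each step, and SAS processes the steps in the order they appear.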

Data Steps

The DATA step is the primary method for creating a SAS data set. It begins with a DATA statement and contains other programming statements. You can use these statements to manipulate an existing SAS data set or to create a new data set from raw data files.
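The sketch below shows both uses of the DATA step. In-stream data (DATALINES) stands in for a raw data file here; with an external file you would use an INFILE statement instead. The data set names and values are hypothetical.

```sas
/* Create a new data set from raw data typed into the program */
data work.scores;
   input name $ score;   /* $ marks NAME as a character variable */
   datalines;
Ann 85
Bob 92
Cara 78
;
run;

/* Manipulate an existing data set: add a computed variable */
data work.scores_pass;
   set work.scores;
   length result $ 4;
   if score >= 80 then result = 'Pass';
   else result = 'Fail';
run;
```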

Proc Steps

PROC steps call predefined SAS procedures that analyze and process data in a SAS data set. PROC steps are also used for data munging, the process of transforming data from one "raw" form into another format. With PROC steps you can:

  • Create new SAS data sets.
  • List, sort, and summarize data.
  • Generate statistical results.
  • Generate plots and charts.
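A sketch of one PROC step for each task in the list, again assuming the SASHELP.CLASS sample table; the output name WORK.CLASS_SORTED is hypothetical.

```sas
/* Create a new, sorted data set */
proc sort data=sashelp.class out=work.class_sorted;
   by sex age;
run;

/* List the first five observations */
proc print data=work.class_sorted(obs=5);
run;

/* Summary statistics for each value of SEX (the data is sorted by SEX) */
proc means data=work.class_sorted n mean min max;
   by sex;
   var height weight;
run;

/* Scatter plot of weight against height, colored by sex */
proc sgplot data=sashelp.class;
   scatter x=height y=weight / group=sex;
run;
```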

SAS Libraries

Libraries are collections of files that are accessible by the SAS system. A library can be a physical or logical collection of files. SAS data sets are stored in SAS libraries. There are two types of SAS libraries:

  • Temporary Libraries
  • Permanent Libraries

Temporary Libraries

Temporary libraries in SAS are volatile: any SAS files or data sets stored in a temporary library are available only for the current SAS session and are deleted when the session ends. By default, SAS stores files in a temporary library named WORK.
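A short sketch of the default, assuming the USER= system option has not been set (if it has, one-level names go to that library instead). The data set name TEMP_DEMO is hypothetical.

```sas
/* A one-level name is stored in the temporary WORK library */
data temp_demo;
   x = 1;
run;

/* Both of these steps print the same data set */
proc print data=temp_demo;
run;

proc print data=work.temp_demo;
run;
```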

Permanent Libraries

A permanent SAS library stores files on a permanent storage medium, such as your computer's hard disk. These files or data sets are not deleted when the SAS session ends. To work with a file in a permanent SAS library, use a two-level name in which the libref is the first part (libref.filename).
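A minimal sketch of defining and using a permanent library with the LIBNAME statement. The libref MYLIB and the folder path are hypothetical, and the folder is assumed to exist (on Windows the path would look like 'C:\sasdata').

```sas
/* Assign the libref MYLIB to a folder (hypothetical path) */
libname mylib '/home/user/sasdata';

/* Two-level name libref.dataset: the file is saved in that folder */
data mylib.class_copy;
   set sashelp.class;
run;
```

In a later session, running the same LIBNAME statement again makes MYLIB.CLASS_COPY available without recreating it.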

Predefined SAS Libraries

By default, SAS provides several predefined libraries, which are listed below.

Sashelp

SASHELP is a read-only permanent library that contains sample data and other files that control how SAS works.

Sasuser

SASUSER is a permanent library that contains SAS files in the Profile catalog and that stores your personal settings. You might not have write access to the Sasuser directory if you are using SAS Studio or SAS University Edition. To check whether you have write access, run the following code:

proc options option=rsasuser;
run;

If PROC OPTIONS reports NORSASUSER, the Sasuser library is writable. If it reports RSASUSER, the Sasuser library is read-only.

Work

WORK is a temporary library for files that do not need to be saved from session to session. You can also define additional libraries. When you define a library, you indicate the location of your SAS files to SAS. After you define a library, you can manage SAS files within it.

SAS Files

The SAS System creates and uses a variety of structured files called SAS files. These files are stored in the SAS data libraries. Learn more about SAS Libraries in the article Working with SAS libraries.

Types of SAS Files

SAS files stored in SAS data libraries are referred to as members of a library. Each member has a member type. The SAS System uses unique file extensions to differentiate between SAS files and external Windows files in a folder.

The following table lists the file extensions and member types.

File Extension  Member Type  Description
.sas            None         SAS program
.lst            None         Procedure output
.log            None         SAS log file
.sas7bdat       DATA         SAS data file
.sas7bndx       INDEX        Data file index; not treated by the SAS System as a separate file
.sas7bcat       CATALOG      SAS catalog
.sas7bpgm       PROGRAM      Stored program (DATA step)
.sas7bvew       VIEW         SAS data view
.sas7bacs       ACCESS       Access descriptor file
.sas7baud       AUDIT        Audit file
.sas7bfdb       FDB          Consolidation database
.sas7bmdb       MDDB         Multidimensional database
.sas7bods       SASODS       Output delivery system file
.sas7bdmd       DMDB         Data mining database
.sas7bitm       ITEMSTOR     Item store file
.sas7butl       UTILITY      Utility file
.sas7bput       PUTILITY     Permanent utility file
.sas7bbak       BACKUP       Backup file

SAS Datasets

A data set is one of the types of SAS files. Data sets must have a name. Valid data set names can be 1 to 32 characters long and must begin with a letter or an underscore.

A SAS data set has two parts:

  • Descriptor portion
  • Data portion

Descriptor Portion

In addition to its descriptor and data portions, a SAS data set can point to indexes, which enable SAS to locate rows in the data set more efficiently.

Once the data is in the form of a SAS data set, there is no need to specify the attributes of the data set or the variables in your program statements. SAS can obtain the information directly from the data set.

The descriptor portion of a SAS data set contains information about the data set, including the following:

  • Name of the data set
  • Date and time that the data set was created
  • Number of observations
  • Number of variables
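One way to look at these descriptor items is to query the DICTIONARY.TABLES view with PROC SQL. This sketch assumes the SASHELP.CLASS sample table; library and member names are stored in uppercase in the dictionary views.

```sas
/* Name, creation date, number of observations and variables */
proc sql;
   select memname, crdate, nobs, nvar
      from dictionary.tables
      where libname = 'SASHELP' and memname = 'CLASS';
quit;
```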

Data Portion

The data portion is the collection of data values, arranged in a table of rows and columns.

Observation (Rows) in SAS

Observations (also called rows) in a SAS data set are collections of data values that usually relate to a single object.

Together, the rows and columns form a table, and each variable (column) in a data set has a series of attributes.

SAS Variables (Columns)

Variables (also called columns) in the data set are collections of values that describe a particular characteristic.

Variable Attributes

The descriptor portion also stores the attributes of each variable in the SAS data set: its name, type, length, format, informat, and label.

Here is a listing of the attribute information in the descriptor portion of the SAS data set SASHELP.AIR:

(Figure: SAS variable attributes)
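A listing of this kind can be produced with PROC CONTENTS. The second step is a sketch that saves the same attribute information to a data set; the name WORK.AIR_ATTRS is hypothetical.

```sas
/* Print the descriptor portion, variables in creation order */
proc contents data=sashelp.air varnum;
run;

/* Save the variable attributes to a data set instead of printing them */
proc contents data=sashelp.air
              out=work.air_attrs(keep=name type length format label)
              noprint;
run;
```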

Variable Names

Variable names must conform to the SAS naming conventions and follow the same rules as SAS data set names.

Rules for Variable Names

  • They can be 1 to 32 characters long.
  • They must begin with a letter (A-Z, either uppercase or lowercase) or an underscore (_).
  • They can continue with any combination of numbers, letters, or underscores.

VALIDVARNAME= System Option

The VALIDVARNAME= system option is set to V7 by default, which allows only letters of the Latin alphabet, numerals, and underscores in variable names.

To use other characters in variable names, specify VALIDVARNAME=ANY.

A name that contains spaces or special characters must be written as a name literal, that is, the name in quotation marks followed by the letter n. If the name includes a percent sign (%) or an ampersand (&), use single quotation marks so that the macro processor does not try to resolve it.

Example: '% of profit'n = percent;
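A sketch of the example above in a complete DATA step. The data set name and the values are hypothetical; the last statement switches back to the V7 rules afterwards.

```sas
options validvarname=any;   /* allow blanks and special characters in names */

data work.profit;
   percent = 12.5;
   '% of profit'n = percent;   /* single quotes keep % away from the macro processor */
   'net income'n = 1000;       /* a name that contains a blank */
run;

proc print data=work.profit;
run;

options validvarname=v7;    /* switch back to the V7 naming rules */
```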

VALIDVARNAME specifies the rules for valid SAS variable names that can be created and processed during a SAS session.

Syntax:

VALIDVARNAME=V7 | UPCASE | ANY

V7 specifies that variable names must follow these rules:

  • SAS variable names can be up to 32 characters long.
  • The first character must be a letter of the Latin alphabet (A – Z, either uppercase or lowercase) or an underscore (_). Subsequent characters can be letters of the Latin alphabet, numerals, or underscores.
  • Trailing blanks are ignored. The variable name alignment is left-justified.
  • A variable name cannot contain blanks or special characters except for an underscore.
  • A variable name can contain mixed-case letters. SAS stores and writes the variable name in the same case used in the first reference to the variable. However, when SAS processes a variable name, SAS internally converts it to uppercase. Therefore, you cannot use the same variable name with a different combination of uppercase and lowercase letters to represent different variables. For example, cat, Cat, and CAT all represent the same variable.
  • You cannot assign the names of special SAS automatic variables (such as _N_ and _ERROR_) or variable list names (such as _NUMERIC_, _CHARACTER_, and _ALL_) to variables.

UPCASE specifies that the variable name follows the same rules as V7, except that the variable name is uppercase, as in earlier versions of SAS.

ANY specifies that SAS variable names must follow these rules:

  • The name can begin with or contain any characters, including blanks, national, special, and multi-byte characters.
  • The name can be up to 32 bytes long.
  • The name cannot contain any null bytes.
  • Leading blanks are preserved, but trailing blanks are ignored.
  • The name must contain at least one character. A name with all blanks is not permitted.
  • A variable name can contain mixed-case letters. SAS stores and writes the variable name in the same case used in the first reference to the variable. However, when SAS processes a variable name, SAS internally converts it to uppercase. Therefore, you cannot use the same variable name with a different combination of uppercase and lowercase letters to represent different variables. For example, cat, Cat, and CAT all represent the same variable.
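The case rule shared by V7 and ANY can be checked with a small DATA step. In this sketch (data set name hypothetical), all three assignments write to the same variable, which keeps the spelling of its first reference.

```sas
data work.case_demo;
   cat = 1;   /* first reference: SAS stores the name as "cat" */
   Cat = 2;   /* same variable, so the value is replaced */
   CAT = 3;   /* same variable again */
run;

/* Prints a single column named cat, with the value 3 */
proc print data=work.case_demo;
run;
```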

SAS Variable Types

Character variables can contain any value, whereas numeric variables can contain only numeric values.

A missing value is represented by a blank in a character variable and by a period in a numeric variable.
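A minimal sketch of both kinds of missing value (the data set name is hypothetical). The MISSING function accepts either a character or a numeric argument and returns 1 when the value is missing.

```sas
data work.missing_demo;
   length name $ 10;
   name  = ' ';   /* missing character value: a blank */
   score = .;     /* missing numeric value: a period */
   name_missing  = missing(name);    /* 1 */
   score_missing = missing(score);   /* 1 */
run;

proc print data=work.missing_demo;
run;
```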

Length of SAS Variable

Character variables can be up to 32,767 bytes long. Numeric variables have a default length of 8 bytes.
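Lengths can be set explicitly with a LENGTH statement. In this sketch (names and values hypothetical), VLENGTH returns the stored length of CITY. Without the LENGTH statement, SAS would set CITY's length from the first value assigned to it.

```sas
data work.length_demo;
   length city $ 40;   /* character variable, 40 bytes */
   length amount 8;    /* numeric variable, 8 bytes (the default) */
   city = 'San Francisco';
   amount = 1234.56;
   city_len = vlength(city);   /* 40: the declared length, not the text length */
run;
```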

Format of SAS Variable

Formats are variable attributes that control how data values are written or displayed.

SAS provides character, numeric, and date and time formats, and you can also create and store your own formats. To write values in a particular form, select the appropriate format.

For example, to display the value 1234 as $1,234.00 in a report, you can use the DOLLAR9.2 format.

In a format such as DOLLARw.d, w specifies the maximum width of the value to be written and d specifies the number of decimal places.

Similarly, to display the value 5678 as 5,678.00 in the output, you can use the COMMA8.2 format, which specifies a total width of 8 with two decimal places.
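A sketch of the two formats above, plus a user-defined format created with PROC FORMAT. The format name YESNO and the data set names are hypothetical.

```sas
/* Write formatted values to the log */
data _null_;
   a = 1234;
   b = 5678;
   put a dollar9.2;   /* $1,234.00 */
   put b comma8.2;    /* 5,678.00 */
run;

/* Store a format as a permanent attribute of a variable */
data work.sales;
   amount = 1234;
   format amount dollar9.2;
run;

/* Create a user-defined format and apply it */
proc format;
   value yesno 1 = 'Yes'
               0 = 'No';
run;

data work.survey;
   answer = 1;
   format answer yesno.;   /* displayed as Yes */
run;
```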

Informat in SAS

An informat reads data values that are in a particular form and converts them into standard SAS values. Informats therefore determine how data values are read into a SAS data set.

You must use informats to read numeric values that contain letters or other special characters.

For example, the numeric value $12,345.00 contains two special characters, a dollar sign ($) and a comma (,).

In this case, you can use an informat such as COMMAw.d to read the value while removing the dollar sign and comma, and then store the result as a standard numeric value.
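A sketch of the COMMA informat applied both to a character variable (with the INPUT function) and to raw data (with an INPUT statement). Data set names and the second value are hypothetical.

```sas
/* Convert a character string to a number */
data work.money;
   raw = '$12,345.00';              /* the value as it appears in the raw data */
   amount = input(raw, comma10.);   /* $ and comma removed: 12345 */
run;

/* Read the same kind of value directly from raw data */
data work.money_raw;
   input amount :comma10.;   /* colon modifier: read up to the next blank */
   datalines;
$12,345.00
$987.50
;
run;
```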

Read: Ultimate Guide to SAS Formats and Informats

SAS Variable Labels

A label is a variable attribute that holds descriptive text up to 256 characters long.
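A minimal sketch of assigning labels with a LABEL statement, assuming SASHELP.CLASS (where height is in inches and weight in pounds); the output name is hypothetical.

```sas
data work.class_labeled;
   set sashelp.class;
   label height = 'Height (inches)'
         weight = 'Weight (pounds)';
run;

/* The LABEL option makes PROC PRINT use labels as column headings */
proc print data=work.class_labeled label;
run;
```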

SAS Indexes

An index is a separate file that you can create for a SAS data file to provide direct access to specific observations.

The index file has the same name as its data file and a member type of INDEX. Indexes can provide faster access to specific observations, particularly when you have a large data set.

SAS uses indexes to optimize WHERE expressions and to facilitate BY-group processing.
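A sketch of two ways to create simple indexes: the INDEX= data set option and the INDEX CREATE statement of PROC DATASETS. The data set name is hypothetical. SAS decides at run time whether using an index is worthwhile; for a table as small as SASHELP.CLASS it may read the data sequentially instead, and MSGLEVEL=I reports that decision in the log.

```sas
options msglevel=i;   /* write notes about index usage to the log */

/* Create an index on NAME while creating the data set */
data work.class_idx(index=(name));
   set sashelp.class;
run;

/* Add an index on AGE to the existing data set */
proc datasets library=work nolist;
   modify class_idx;
   index create age;
run;
quit;

/* A WHERE expression on an indexed variable can use the index */
proc print data=work.class_idx;
   where name = 'Alfred';
run;
```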

Extended attributes

Extended attributes are user-defined metadata that you attach to a data set or to a variable. For example, you can store a description of a variable, the formula used to produce the variable's values, or a URL that points to information about your data set.

The XATTR ADD statement of PROC DATASETS, used after a MODIFY statement, adds extended attributes to variables or data sets.

To store name-value pairs with a data set:

XATTR ADD DS attribute_name = attribute_value;

To store name-value pairs with a variable:

XATTR ADD VAR variable_name(attribute_name=attribute_value);
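A sketch of both statements in a complete PROC DATASETS step, assuming SAS 9.4 (where XATTR is available). The data set WORK.CLASS_X and the attribute names SOURCE and FORMULA are hypothetical.

```sas
/* A data set with a computed variable (BMI from pounds and inches) */
data work.class_x;
   set sashelp.class;
   bmi = 703 * weight / height**2;
run;

proc datasets library=work nolist;
   modify class_x;
   xattr add ds source='SASHELP.CLASS';                       /* data set attribute */
   xattr add var bmi (formula='703 * weight / height**2');    /* variable attribute */
run;
quit;

/* The PROC CONTENTS output should also list the extended attributes */
proc contents data=work.class_x;
run;
```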

SAS Outputs

SAS output is the result of executing SAS programs. Most SAS procedures and some DATA step applications produce output.

There are three types of SAS output:

  • SAS Log file
  • SAS Procedure output file
  • SAS console log file

SAS Log file

The SAS log file is generated when a SAS program is executed. It contains information about the processing of SAS statements. As each program step executes, SAS writes notes to the log, along with any applicable error or warning messages.
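A sketch of writing your own messages to the log and of saving the log to a file with PROC PRINTTO. The file path is hypothetical.

```sas
/* Redirect the log to a file (hypothetical path); NEW replaces old contents */
proc printto log='/tmp/myjob.log' new;
run;

%put NOTE: Written to the log by the macro processor.;

data _null_;
   set sashelp.class(obs=1);
   putlog 'NOTE: The first student is ' name;
run;

/* Send the log back to its default destination */
proc printto;
run;
```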

SAS Procedure Output File

SAS sends the output to the procedure output file whenever a SAS program executes a PROC step. SAS procedure output is handled by the Output Delivery System (ODS).
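A sketch of routing procedure output through ODS: the PDF destination writes the report to a file, and ODS OUTPUT saves the PROC MEANS summary table as a data set. The file path and the data set name are hypothetical.

```sas
ods pdf file='/tmp/class_summary.pdf';   /* hypothetical path */
ods output Summary=work.class_stats;     /* save the PROC MEANS table as data */

proc means data=sashelp.class n mean;
   var height weight;
run;

ods pdf close;   /* close the destination so the file is written */
```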

SAS Console Log File

SAS writes to the console log when an error, warning, or note must be written to the SAS log but the log is not available.

Subhro Kar is an Analyst with over five years of experience. As a programmer specializing in SAS (Statistical Analysis System), Subhro also offers tutorials and guides on how to approach the coding language. His website, 9to5sas, offers students and new programmers useful easy-to-grasp resources to help them understand the fundamentals of SAS. Through this website, he shares his passion for programming while giving back to up-and-coming programmers in the field. Subhro’s mission is to offer quality tips, tricks, and lessons that give SAS beginners the skills they need to succeed.