A comprehensive guide to PROC SQL in SAS (15 + Examples)10 min read

Like Tweet Pin it Share Share Email

PROC SQL is a SAS Procedure that combines the functionality of DATA and PROC steps into a single step.

PROC SQL can perform sorting of data, creating summaries of data, subsetting, joining (merge), concatenation of datasets, create new or calculated variables, printing the results or create a new table or view all in a single step.

PROC SQL in SAS can be used to retrieve, update, and report on information from SAS data sets or other database products.

The SELECT Statement and its Clauses

The purpose of a SELECT the statement is to retrieve data from underlying tables. Although it supports multiple clauses, the SELECT statement has only one clause which is required to be specified – FROM clause. All the remaining clauses are optional and can be used when needed.

To prevent syntax errors from occurring when using the SELECT statement, the clauses must be specified in the correct order as below.

PROC SQL options;
 SELECT column(s)
  FROM table-name | view-name
   WHERE expression
    GROUP BY column(s)
     HAVING expression
      ORDER BY column(s);


proc sql;
 select Origin, Make, Model, DriveTrain, MPG_CITY, count(Model) as ModelCount 
  from sashelp.cars where Drivetrain='All' group by Make having MPG_CITY > 20 
  order by origin, make;
Proc SQL in SAS

Ordering outputs by column

The PROC SQL has the ability to impose order in a table using the ORDER BY clause. It orders the query results according to the values in one or more selected columns specified after the FROM clause.

Rows of data can be ordered in ascending (default) or descending (DESC) for each column specified.

proc sql;
 select * from sashelp.class order by sex, 3, weight desc;
A comprehensive guide to PROC SQL in SAS (15 + Examples) 1

In the above example, the output is first arranged by sex, then within sex by height(3 specifies the column ordinal position) and within height in descending order by weight.

Grouping data using summary functions

The GROUP BY clause is used to aggregate and order groups of data using a designated column of the same value.

proc sql;
 select min(low) as Lowest, max(high) as highest, stock from sashelp.stocks 
  group by stock;
A comprehensive guide to PROC SQL in SAS (15 + Examples) 3

Grouping data and sorting

Without the ORDER BY clause, the SQL procedure automatically sorts the results from a GROUP BY query in the same order as specified in the GROUP BY clause.

When both the ORDER BY and GROUP BY are specified for the same column, no additional processing occurs to satisfy the request.

The Group BY clause first sorts the result in the grouping column and then aggregate the rows of the query by the same grouping column.

If the columns specified in the ORDER BY and GROUP BY are not the same, additional processing may be required which involves remerging summary statistics with the original data.

proc sql;
 select min(low) as Lowest, max(high) as highest, stock from sashelp.stocks 
  group by stock order by lowest;
A comprehensive guide to PROC SQL in SAS (15 + Examples) 5

Subsetting groups with the having clause

Using the combination of GROUP BY and HAVING clause, aggregated data can be filtered out from one group at a time instead of filtering one row at a time.

Suppose that you would like to find out only those product grouping that has an average price greater than 100 from the price data table, then using a HAVING clause restricts the number of groups selected and is always performed after the GROUP BY clause.

proc sql;
 select productname, avg(price) from sashelp.pricedata group by productname 
  having avg > 100;
A comprehensive guide to PROC SQL in SAS (15 + Examples) 7

Finding duplicate and Unique values

To select only one of several identical values in a table, SAS supports and processes the DISTINCT keyword and the UNIQUE constraint.

proc sql;
 select distinct make from sashelp.cars;

To find the count of unique values, you can use the distinct and count function together as below.

proc sql;
 select count(make), count(distinct make) from sashelp.cars;

The main difference between UNIQUE and DISTINCT in PROC SQL is that Unique ensures that all the values in a column are different while Distinct removes all the duplicate records when retrieving the records from a table. However, Unique can also be used to retrieve unique records from a table.

proc sql;
 create table subjects(subjid int unique , subject_name varchar(255), 
  subject_age int);

Since the Unique constraint is defined for Subjid, we cannot insert the same value for subjid.

proc sql;
 insert into subjects values (8, "Subject1", 37);

proc sql;
 insert into subjects values (8, "Subject1", 37);
ERROR: Add/Update failed for data set WORK.SUBJECTS because data value(s) do not comply with integrity constraint UN0001.

Counting the number of missing Values

The NMISS function of PROC SQL in SAS is used to find the count of missing values in a column. The advantage of this function is, it works both for the character as well as numeric data types.

select count(*) as count, nmiss(agechddiag) as MissingCount from sashelp.heart;

Character Operations and Functions

Character operations are used with character data. You can use most of the data step character functions within PROC SQL also.

proc sql outobs=5;
 select Make || "," || Model as MakeAndModel from sashelp.cars;
A comprehensive guide to PROC SQL in SAS (15 + Examples) 9

Finding the length of string using Proc SQL

The LENGTH function is used to find the length of the character string in Proc SQL. It returns the number equal to the number of characters in the argument.

proc sql outobs=5;
 select model, length(Model) as Modellength from sashelp.cars;
A comprehensive guide to PROC SQL in SAS (15 + Examples) 11

Finding the occurrence of a pattern using Index

To find the occurrence of a pattern, the INDEX function is used. The character string passed in the argument is searched from left to right for the first occurrence of the specified value.

If the string is found, the column position of the first character is returned otherwise a value of 0 is returned.

In the below example, we can find cars with a manual transmission by writing the following code. Also, note that the search is case sensitive.

proc sql outobs=5;
 select make, model from sashelp.cars where index(model, 'manual') > 0;
A comprehensive guide to PROC SQL in SAS (15 + Examples) 13

Extracting information from a string

Sometimes we may need to extract specific information from a string. We can use the SUBSTR function with a character column by specifying a starting position and the number of characters to extract.

In the following example, we can extract the first and last names from the name column by writing the following code.

proc sql outobs=5;
 select name,team, 
 substr(name,index(name,',')+1,length(name)) as FirstName,
 substr(name,1,index(name,',')-1) as LastName 
  from sashelp.baseball;
A comprehensive guide to PROC SQL in SAS (15 + Examples) 15

For extracting the first name, the INDEX function is used with SUBSTR to find the position of “,” in the Name column. The value returned is then passed as the second argument to the SUBSTR function specifying the starting position of the character to be extracted.

Similarly, for extracting las name, the INDEX function is used in the last argument of SUBSTR function which will return the ending position of “,”.

Phonetic matching – Sounds like operator (=*)

With sounds like operator, we can find names that sound alike or have spelling variations. This is performed using the soundex algorithm.

proc sql;
 select name,team,substrn(name,index(name,',')+1,length(name)) as 
 FirstName  from sashelp.baseball
  where calculated firstname =* ' Andre'
A comprehensive guide to PROC SQL in SAS (15 + Examples) 17

Finding the first non-missing value

The COALESCE function is used to find the first non-missing value in a column or list. It scans the argument from left to right and returns the first non-missing or null value. If all the values are missing, the result is also missing.

data custDetails;
 input custid Mobile Landline;
001 9800231356 9800231356
002 8800230356 .
003 . 9733231250
proc sql;
 select custid, coalesce(mobile, landline) format=best32. as Phone from 
A comprehensive guide to PROC SQL in SAS (15 + Examples) 19

In the above example, we are instructing SAS to show the Landline Number of the customer, if the Mobile number column is missing otherwise show the Mobile number.

Using the COALESCE function you can process these types of data with a single function call instead of writing multiple IF-THEN/ELSE statements.

Producing a ROW number

A unique feature of PROC SQL in SAS is the ability to obtain the row count using the MONOTONIC() function which is similar to the _n_ automatic variable in a data step.

proc sql outobs=5;
 select monotonic() as Row_Numbers, * from sashelp.class;
A comprehensive guide to PROC SQL in SAS (15 + Examples) 21

A row number can also be produced with the documented SQL procedure NUMBER. Unlike the MONOTONIC(), the NUMBER option does not create a new column in the table.

proc sql outobs=5 number;
 select * from sashelp.class;
A comprehensive guide to PROC SQL in SAS (15 + Examples) 23

Replicate a data set without data

Using PROC SQL in SAS, it is very easy to create a new empty data set while keeping all the structure of the original data set.

proc sql;
 create table class1 like sashelp.class;

Comments (0)

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.