PROC SQL is a powerful tool, but it can be challenging to debug queries. This blog post will discuss some Proc SQL options to debug queries.
1. INOBS= and OUTOBS= Options
INOBS – The INOBS= option limits the number of rows that PROC SQL processes. This option can be handy when developing a new query and testing on a limited number of records.
proc sql inobs=5;
select * from sashelp.heart;
The SAS log displays a warning when you use the INOBS option.
WARNING: Only 5 records were read from SASHELP.HEART due to INOBS= option.
OUTOBS – The OUTOBS= option restricts the number of rows PROC SQL displays or writes to a table. For example, if you specify OUTOBS=10 and insert values into a table using a query, then PROC SQL inserts 10 rows into the resulting table. OUTOBS= is similar to the SAS data set option OBS=.
proc sql outobs=6;
select * from sashelp.heart;
The SAS log displays a warning when you use the OUTOBS option.
WARNING: Statement terminated early due to OUTOBS=6 option.
INOBS and OUTOBS might not appear to differ in a simple query. However, sometimes it is essential to choose the correct option. For example, the average of a column with INOBS=10 returns only 10 values.
2. Limiting Iterations with the LOOPS= Option
In PROC SQL, the LOOPS= option specifies how many iterations will be performed inside the inner loop. The limit can be set to prevent excessive computer usage. For example, joining three large tables without meeting the join-matching conditions could result in a sizeable internal table that would be inefficient to process.
proc sql loops=5;
select p.prodnum, p.prodname, m.manunum, m.manuname, m.manucity from
sql2.products p, sql2.manufacturers m where p.manunum=m.manunum;
quit;
The SAS log displays a warning when you use the Loops= option.
WARNING: PROC SQL statement interrupted by LOOPS=5 option.
3. Checking Syntax with the NOEXEC Option and the VALIDATE Statement
NOEXEC – By using the NOEXEC option or the VALIDATE statement, you can verify the syntax of a PROC SQL step without executing it.
The NOEXEC option can only be used once in a PROC SQL statement. This option will ensure that the syntax of all queries in that PROC SQL step is checked for accuracy without executing them.
proc sql noexec;
select p.prodnum, p.prodname, m.manunum, m.manuname, m.manucity from
sql2.products p, sql2.manufacturers m where p.manunum=m.manunum;
quit;
The SAS log displays the following message if the query is valid and all referenced columns and tables exist.
NOTE: Statement not executed due to NOEXEC option.
SAS displays the standard error messages in the log if there are any errors in the query.
When you invoke the NOEXEC option, SAS checks the syntax of all queries in that PROC SQL step for accuracy but does not execute them.
VALIDATE – SELECT statements are checked for accuracy by the VALIDATE statement before being executed. A message is written to the SAS log if the syntax is correct. An error message is displayed if the syntax is incorrect.
proc sql;
validate
select prodnum, p.prodname, m.manunum, m.manuname, m.manucity from
sql2.products p, sql2.manufacturers m where p.manunum=m.manunum;
quit;
The SAS log displays the following message if the query is valid.
NOTE: PROC SQL statement has valid syntax.
SAS displays standard error messages in the log if there are errors in the query.
VALIDATE affects only the SELECT statement immediately following it, whereas the NOEXEC option affects the entire SELECT statement. If you are working with a PROC SQL query containing multiple SELECT statements, specify the VALIDATE keyword before each SELECT statement you want to check.
4. SELECT * with the FEEDBACK Option
The FEEDBACK option expands a SELECT * statement into all the selected variables, including any macro variables. This can be very useful to show the variable order when joining multiple tables. The SAS log displays the results of the expanded list of columns you have selected.
proc sql feedback;
select * from sashelp.class;
Expanded SELECT * Statement
5. Reset option
The RESET statement adds, drops, or changes one or more PROC SQL options without restarting the procedure. This example shows the Print option is turned on and then off.
proc sql noprint outobs=5;
select * from sashelp.class;
reset print;
select * from sashelp.class;
quit;
6. Timing PROC SQL with the STIMER Option
There are multiple ways to accomplish certain operations. For example, a join often operates like a subquery. The query that runs fastest is generally more efficient based on factors such as readability and maintenance. A SAS system option, STIMER shows you an entire procedure’s cumulative time.
PROC SQL STIMER shows how fast individual statements run in a PROC SQL step. You can optimize your query this way.
The example below compares the execution times of two queries. Both queries list the product names and manufacturer numbers from the products table with a manufacturer number of 500. The first query does this with a join; the second with a subquery. Comparing the run times of two queries shows the STIMER results from the SAS log.
proc sql stimer;
select prodnum, p.prodname, m.manunum, m.manuname, m.manucity from
sql2.products p, sql2.manufacturers m where p.manunum=m.manunum and m.manunum=500;
Select * from sql2.products where manunum in
(Select manunum from sql2.manufacturers where manunum=500);
Quit;
Log results:
A join query takes 0.01 seconds, while a subquery takes 0.05 seconds. A join runs faster than an equivalent subquery, regardless of many factors.
7. PROC SQL _METHOD Option
PROC SQL includes a powerful method option called METHOD. Since its implementation, many SAS ® SQL users have expressed very favourable comments for the value-added information it provides on the SAS Log. Furthermore, the METHOD option is well worth exploring due to the benefits of understanding the processes involved during specific PROC SQL operations, query evaluation, algorithm selection and use by the optimizer, or testing and debugging operations.
MSGLEVEL=I
By specifying MSGLEVEL= in an Options statement, users can control how much information the SAS System logs. MSGLEVEL= supports two possible values: N (default) for standard notes, warnings, and error messages; and I for details about sort, merge, and index processing.
Join algorithms
- Nested Loop – The optimizer selects this algorithm when Sort-Merge, Index, and Hash are eliminated.
- Sort-Merge – Used when the Index and Hash algorithms have been eliminated from consideration by the optimizer.
- Index – The optimiser selects a user-defined index when performance will be enhanced.
- Hash – Used by the optimizer when the smaller of the two tables can fit into the available memory
_METHOD
The _METHOD option returns several codes to the log. You can use these options to understand the choices made by the SQL optimizer, the index used, which joins to use and any sorting performed.
The SQL optimiser performs query operations using one of four join algorithms. The four algorithms include:
Code | Description |
---|---|
SQXCRTA | Create table as Select |
SQZSLCT | Select statement or clause |
SQXISL | Step loop join (Cartesian). |
SQXIM | Merge join operation. |
SQXINDX | Index join operation |
SQXIHSH | Hash join operation. |
SQXSORT | Sort operation. |
SQXSRC | Source rows from table. |
SQXFIL | Rows filtration. |
SQXSUMG | Summary stats (aggregates) with GROUP BY clause. |
SQXSUMN | Summary stats with no GROUP BY clause. |
Note: You need to specify the MSGLEVEL=I options to activate this setting.
These options can be used to debug any issues, for example, detect if the correct join strategy is being used, are my indexes being used in the join.
As part of the two-way equijoin, the _METHOD option is specified to show the processing hierarchy.
options msglevel=i;
proc sql _method;
select p.prodnum, p.prodname, m.manunum, m.manuname, m.manucity from
sql2.products p, sql2.manufacturers m where p.manunum=m.manunum and
p.manunum=500;
quit;
The SQL optimizer selected a hash join algorithm based on the join query.
8. METHOD AND _TREE OPTIONS WITH MSGLEVEL=I
The _TREE option provides a graphical representation of how the query optimizer executed the query. This option shows the query plan based on several factors (estimated CPU and I/O). There may be more efficient plans. Nevertheless, TREE can help debug.
options msglevel=i;
proc sql _method _tree;
select p.prodnum, p.prodname, m.manunum, m.manuname, m.manucity from
sql2.products p, sql2.manufacturers m where p.manunum=m.manunum and
p.manunum=500;
quit;