This article explains the data summarization by rows, columns and how you can use Predicates in PROC SQL to perform comparisons between two conditions or expressions.
The SQL procedure is a wonderful tool for summarizing (or aggregating) data. It provides several useful summary (or aggregate) functions to help perform calculations, descriptive statistics, and other aggregating operations in a SELECT statement or HAVING clause.
The below example uses the COUNT function with the (*) argument to produce a total number of rows, whether data is missing or not.
The asterisk (*) is specified as the argument to the COUNT function to count all rows in the PURCHASES table.
proc sql; select count(*) as row_count from sashelp.shoes; quit;
Unlike the COUNT(*) function syntax that counts all rows, whether data is missing or not, the next example uses the COUNT function with the (column-name) argument to produce a total number of non-missing rows based on the products column.
proc sql; select count(product) as non_missing_row_count from sashelp.shoes; quit;
Summarizing Data Down Rows
The SQL procedure can be used to produce a single aggregate value by summarizing data down rows (or observations).
Suppose you want to know the average sales for all Sport Shoes in the SASHELP.SHOES table containing a variety of products.
The following query computes the average product cost and produces a single aggregate value using the AVG function.
proc sql; select avg(sales) as average_sales format=dollar10.2 from sashelp.shoes where upcase(product) in ("SPORT SHOE"); quit;
Summarizing data across columns
When a computation is needed on two or more columns in a row, the SQL procedure can be used to summarize data across columns.
Suppose you want to know the remaining inventory after-sales for each product.
proc sql outobs=5; select product,sales,inventory, (inventory-sales) as remaining_inventory format=dollar10. from sashelp.shoes; quit;
Predicates are used in PROC SQL to perform direct comparisons between two conditions or expressions. Six predicates will be looked at:
- IS NULL
- IS MISSING
Selecting a range of values
The IN predicate creates an OR condition between each value and returns a Boolean value of True if a column value is equal to one or more of the values in the expression list.
proc sql outobs=5; select * from sashelp.shoes where upcase(product) = 'SANDAL'; quit;
Testing for null or missing values
The IS NULL predicate is used to select one or more rows by evaluating whether a column amount is missing or null.
The IS MISSING predicate performs identically to the IS NULL predicate by selecting one or more rows containing a missing value (null).
data shoes; set sashelp.shoes; if _n_ in (4, 7, 8, 9) then call missing(subsidiary); run; proc sql; select * from shoes where subsidiary is null; quit;
The only difference is that specifying IS NULL is the ANSI standard way of expressing the predicate and IS MISSING is commonly used in SAS.
For more information on missing values in SAS, see our guide on Working With Missing Values In SAS.
Finding patterns in a string (pattern matching % and _)
Constructing specific search patterns in string expressions can be achieved with the LIKE Predicate.
The % (percent sign) acts as a wildcard character representing any number of characters, including any combination of upper or lower case characters.
To find patterns in the product name (PRODUCT) containing the characters ”Men’ in the first 3 positions followed by any number of characters is specified with the following WHERE clause.
proc sql outobs=5; select * from sashelp.shoes where product like 'Men%'; quit;
The wildcard character “%” preceding and following the search word to select all products whose name contains the word “DRESS” in its name.
proc sql outobs=5; select * from sashelp.shoes where upcase(product) like '%DRESS%'; quit;
The resulting output contains product types such as “Men’s Dress” and any other products containing the word “Dress”.
LIKE predicate is used to check a column for the existence of trailing blanks.
% (Percent ) – The wildcard character % followed by a blank space is specified as the search argument.
_ (Underscore) – The wildcard _ is used to search for a specific number of characters and using the LIKE predicate the underscore (_) provides a way to pattern match character-by-character.
Thus, a single underscore (_) in a specific position acts as a wildcard placement holder for that position only.
Two consecutive underscores (__) act as a wildcard placement holder for those two positions.
Three consecutive underscores act as a wildcard placement holder for those three positions. And so forth.
In the next example, the first position used to search product type contains the character “S”, and the next six positions (represented with six underscores) act as a placeholder for any value.
proc sql outobs=5; select * from sashelp.shoes where upcase(product) like 'S______'; quit;
The next example illustrates a pattern search of the product name (PRODUCT) where the first position is represented as a wildcard; the second position contains the lowercase character “o”, followed by any combination of uppercase or lowercase characters.
proc sql outobs=5; select * from sashelp.shoes where product like '_o%'; quit;
Testing for the existence of a value
The EXISTS predicate is used to test for the existence of a set of values.
proc sql; select custnum, custname, custcity from customers c where exists (select * from purchases p where c.custnum=p.custnum); quit;
The EXISTS condition is an operator whose right operand is a subquery. The result of an EXISTS condition is true if the subquery resolves to at least one row.
The result of a NOT EXISTS condition is true if the subquery evaluates to zero rows.
Download the entire code from this post here.
So, this was our side on some of the Practical use of PROC SQl. We really hope that you must have found it useful.
Moreover, if you have any other suggestions, mention them in the comment section below. We would really take those lists in our further blog post.
Thanks for reading!
If you liked this article, you might also want to read A Comprehensive Guide To PROC SQL In SAS (15 + Examples) and Everything You Need To Know About SQL SET Operators as well.
Do you any tips to add Let us know in the comments.