Use Where Statement In SAS To Your Advantage
The WHERE statement selects observations in SAS data sets only, whereas the subsetting IF statement selects observations from an existing SAS data set or from observations that are created with an INPUT statement.
You can use a WHERE statement in SAS to subset the data if you are reading data from a SAS dataset.
data class; set sashelp.class; where sex='F'; run;
You may use a WHERE or a subsetting IF statement in this example. There are advantages to using a WHERE statement in SAS instead of a subsetting IF statement.
You have a larger alternative of operators that can be used with a WHERE statement, and if the input data set is indexed, the WHERE statement in SAS is likely more efficient.
You may also use a WHERE statement in a SAS procedure to subset the data being processed.
Note: IF statements are not allowed inside SAS procedures.
Using Operators with WHERE statement in SAS
Below is the list of operators that you can use with the WHERE statement in SAS.
IS MISSING and IS NULL
The IS MISSING and IS NULL operators are used with WHERE, ON, and HAVING expressions and can handle character or numeric variables.
It results in true if the expression results are missing and false if it is not missing.
They also work with the NOT operator.
data class; set sashelp.class; if age le 13 then call missing(age); run; *** use missing; proc print data=class; where age is missing; run; *** using IS NULL; proc print data=class; where age is null; run; *** using NOT MISSING; proc print data=class; where age is not missing; run; *** using NOT NULL; proc print data=class; where age is not NULL; run;
The BETWEEN AND expression matches all the values greater than or equal to the first value and less than or equal to the second value. This works with characters in addition to numeric variables.
data class; set sashelp.class; where weight between 60 and 90; run;
The NOT operator can be used with the WHERE statement in many other ways:
Let’s say you want to have like below and the where statements below are self-explanatory,
where not (score in (34, 44, 84))
where not (Score between 50 and 90)
where NOT(Section EQ “A”)
Multiple WHERE statements in SAS cannot be used in a DATA step like the IF statements. When SAS encounters a second WHERE statement in a DATA step, the first is replaced by the second.
WHERE AND also known as
WHERE ALSO will let you use multiple where statements in SAS to add more restrictions.
data class; set sashelp.class; where sex='M'; where also age > 12 ; run;
IN and NOT IN
The IN operator is a comparison operator that searches for character and numeric values equal to one of the values from a given list of values.
The list of values must be in parentheses, with each character value in quotation marks and separated by either a comma or blank.
For example, suppose you want to filter for SUVs, sedans or Wagon cars. You could specify the values as:
where strip(upcase(type)) in ('SUV','SEDAN','WAGON')
Note the use of strip and Upcase function in the type variable. This ensures that any leading or trailing spaces are removed, and the values will be converted to Uppercase before comparison.
Also, read the article on The Ultimate Guide To SAS Character Functions
In addition to this, you can use the logical operator
NOT to exclude values that are on the list.
where strip(upcase(type)) not in ('TRUCK', 'HYBRID', 'SPORTS')
You can also use a shorthand notation to specify a range of numbers to search. The range is specified by using the syntax M: N as a value in the list to search, where M is the lower bound and N is the upper bound.
You can replace the below statement in the form of WHERE IN shorthand notation.
where age in(13,14,15,16) where age in(13:16)
You can also read our article on How To Specify List Of Variables In SAS and How To Use The IN Operator In SAS Macro?
The CONTAINS expression matches any character value containing the given string.
data class; set sashelp.class; where name ? 'Al'; run;
NOT CONTAINS (^?)
data class; set sashelp.class; where name ^? 'J'; run;
The “contains” operator (?) and the “not contains” operator (^?) match a substring that appears anywhere in the target character variable.
LIKE operator is frequently used for pattern matching, that is, evaluating whether a variable value equals, begins with a specified character, or sounds like a specified value or pattern.
The LIKE expression uses two wildcard operators. When using the LIKE operator, the underscore(_) wildcard takes the place of a single character, whereas the % sign might be substituted for a string of any length (including a null string).
Simple use of Like operator
data womens_product; set sashelp.shoes; where product like "Women's Dress"; where also region like 'Pacific'; run;
Double quotes (“) are used here to include the apostrophe in “Women’s Dress”. Single quotes can be used when the text string does not contain an apostrophe or other single quotes.
Selecting Values that Begins with a Character String
The percentage sign (%) wildcard searches for character values that begin with, end with or even contain certain character strings. Below are some of the examples using wildcards.
where upcase(name) like "A%"
The above expression returns all names that begin with “a” and are followed by any characters of any lengths
where name like "_a%"
_ indicates that any character can contain at the beginning of the string followed by “a” as the second character and ending with any character of any length.
where name like "__n%"
Two underscores at the beginning indicate that the search should filter for any two characters at the beginning followed by “n”.
where name like "__n_"
The above expression would return character values with two characters at the beginning followed by n and any one character at the end.
Sounds Like Operator (=*)
The SOUNDS-LIKE operator is based on the SOUNDEX algorithm for identifying words that sound alike. The SOUNDEX algorithm is English-biased, so it’s not useful for languages other than English.
data employees; set datasets.employees; where lname=*"PATTERSON"; run;
You can download the dataset employees.sas7bdat used in the program.
Also, read 4 Functions for fuzzy matching in SAS.
Using a WHERE statement and a WHERE=option in a DATA Step.
Using the WHERE statement in SAS and a WHERE= dataset option in a DATA step is not a good practice. SAS ignores the WHERE statement when both are used in the same dataset.
Additionally, a WARNING appears in the SAS log indicating that the WHERE statement cannot be applied.