Data manipulation and management are vast fields in which SAS stands out as an incredibly versatile and potent tool. Where Statements in SAS are the key to its functionality, allowing users to refine their data operations with surgical precision.
Do you fully utilize its potential, however? Our post “Advanced Techniques: Mastering the Where Statement in SAS” will ensure just that. Learn how to make the most of this essential SAS statement, discover expert tricks, and elevate your expertise.
In this article, we will help you understand the nuances of this potent statement, helping you unlock the keys to enhancing data management.
Basic Data Subsetting using Where statement in SAS
You can use the WHERE statement in SAS to filter data when reading from a SAS dataset.
data class; set sashelp.class; where sex='F'; run;
You may use a WHERE or a subsetting IF statement in this example. There are advantages to using a WHERE statement in SAS instead of a subsetting IF statement.
You have a larger alternative of operators that can be used with a WHERE statement, and if the input data set is indexed, the WHERE statement in SAS is likely more efficient.
You may also use a WHERE statement in a SAS procedure to subset the data being processed.
Note: IF statements are not allowed inside SAS procedures.
SAS where statement with multiple conditions
In SAS, the
WHERE statement can accommodate multiple conditions by using logical operators such as
Let’s create a dataset named ‘Employee_data’ in SAS with ‘Name’ and ‘Salary’ as the variables.
DATA Employee_data; INPUT Name $ Salary; DATALINES; John 50000 Jane 80000 Alex 75000 Emma 27000 Ryan 71000 Mark 85000 Emily 24000 ; RUN;
Let’s consider an example where we want to extract data for employees earning more than $50,000 but less than or equal to $80,000 from the ‘Employee_data’ dataset.
DATA selected_data; SET Employee_data; WHERE Salary > 50000 AND Salary <= 80000; RUN;
In this case, the
WHERE statement is combined with the
AND operator. Only those records from ‘Employee_data’ where ‘Salary’ is greater than $50,000 and less than or equal to $80,000 will be selected and saved to the ‘selected_data’ dataset.
OR operator can be used to select records based on any of the conditions being met. For example, if we want to select employees with salaries either less than $30,000 or more than $70,000, the
WHERE statement would be as follows:
DATA selected_data; SET Employee_data; WHERE Salary < 30000 OR Salary > 70000; RUN;
NOT operator can be used to exclude records that meet certain conditions. If we want to select all employees who are not high earners (with salaries greater than $70,000), we can use the
WHERE statement as follows:
DATA selected_data; SET Employee_data; WHERE NOT (Salary > 70000); RUN;
Using Operators with WHERE statement in SAS
Below is the list of operators that you can use with the WHERE statement in SAS.
IS MISSING and IS NULL
The IS MISSING and IS NULL operators can handle character or numeric variables in WHERE, ON, and HAVING expressions.
It results in true if the expression results are missing and false if it is not missing.
They also work with the NOT operator.
data class; set sashelp.class; if age le 13 then call missing(age); run; *** use missing; proc print data=class; where age is missing; run; *** using IS NULL; proc print data=class; where age is null; run; *** using NOT MISSING; proc print data=class; where age is not missing; run; *** using NOT NULL; proc print data=class; where age is not NULL; run;
The BETWEEN-AND operator in SAS is a useful operator for filtering data within a specified range. It can be applied to both numeric and date variables, making it highly versatile for various data analysis tasks. Typically, the BETWEEN-AND operator is used within a WHERE statement to filter data that falls between two values.
Consider a simple example where we have a dataset called ‘sashelp.class’. Suppose we want to filter out names with weight between 60 and 90. The SAS code with
BETWEEN-AND would look like this:
data class; set sashelp.class; where weight between 60 and 90; run;
The NOT operator can be used with the WHERE statement in many other ways:
Let’s say you want to display the following values. The following WHERE statements are self-explanatory:
where not (score in (34, 44, 84))
where not (Score between 50 and 90)
where NOT(Section EQ “A”)
Multiple WHERE statements cannot be used in a DATA step as IF statements. If SAS encounters a second WHERE statement, it will replace the first.
The WHERE AND also known as WHERE ALSO will let you use multiple where statements in SAS to add more restrictions.
data class; set sashelp.class; where sex='M'; where also age > 12 ; run;
IN and NOT IN Operator in SAS
The SAS IN operator is a comparison operator that searches for character and numeric values equal to one of the values from a given list of values.
The SAS where in list of values must be in parentheses, with each character value in quotation marks and separated by either a comma or blank.
For example, suppose you want to filter for SUVs, sedans or Wagon cars. You could specify the values as:
where strip(upcase(type)) in ('SUV','SEDAN','WAGON')
Note the use of the
Upcase function in the type variable. This ensures that any leading or trailing spaces are removed, and the values will be converted to Uppercase before comparison.
Also, read the article on The Ultimate Guide To SAS Character Functions
In addition to this, you can use the logical operator
NOT to exclude values that are on the list.
where strip(upcase(type)) not in ('TRUCK', 'HYBRID', 'SPORTS')
You can also use a shorthand notation to specify a range of numbers to search. The range is specified by using the syntax M: N as a value in the list to search, where M is the lower bound and N is the upper bound.
You can replace the below statement in the form of WHERE IN shorthand notation.
where age in(13,14,15,16) where age in(13:16)
CONTAINS (?) Operator in SAS
The CONTAINS operator returns a true value if the character string on the left side of the operator contains the string on the right side.
data class; set sashelp.class; where name ? 'Al'; run;
NOT CONTAINS (^?)
^= (or NOT CONTAINS) returns a true value if the character string on the left side of the operator does not contain the string on the right side.
data class; set sashelp.class; where name ^? 'J'; run;
The “contains” operator (?) and the “not contains” operator (^?) match a substring that appears anywhere in the target character variable.
LIKE Operator in SAS
LIKE operator is frequently used for pattern matching, that is, evaluating whether a variable value equals, begins with a specified character, or sounds like a specified value or pattern.
The LIKE expression uses two wildcard operators. When using the LIKE operator, the underscore(_) wildcard takes the place of a single character, whereas the % sign might be substituted for a string of any length (including a null string).
A Simple use of the Like operator
data womens_product; set sashelp.shoes; where product like "Women's Dress"; where also region like 'Pacific'; run;
Double quotes (“) are used here to include the apostrophe in “Women’s Dress”. Single quotes can be used when the text string does not contain an apostrophe or other single quotes.
Selecting Values that Begin with a Character String
The percentage sign (%) wildcard searches for character values that begin with, end with or even contain certain character strings. Below are some of the examples using wildcards.
where upcase(name) like "A%"
The above expression returns all names that begin with “a” and are followed by any characters of any lengths
where name like "_a%"
_ indicates any character at start followed by “a” and ending with any char of any length.
where name like "__n%"
At the beginning of a search, two underscores indicate that any two characters followed by “n” should be filtered.
where name like "__n_"
The above expression would return character values with two characters at the beginning followed by n and any one character at the end.
Sounds Like Operator (=*)
The SOUNDS-LIKE operator is based on the SOUNDEX algorithm for identifying words that sound alike. The SOUNDEX algorithm is English-biased, so it’s not useful for languages other than English.
data employees; set datasets.employees; where lname=*"PATTERSON"; run;
You can download the dataset employees.sas7bdat used in the program.
Also, read 4 Functions for fuzzy matching in SAS.
Using a WHERE statement and a WHERE=option in a DATA Step.
Using the WHERE statement in SAS and a WHERE= dataset option in a DATA step is not a good practice. SAS ignores the WHERE statement when both are used in the same dataset.
Additionally, a WARNING appears in the SAS log indicating that the WHERE statement cannot be applied.
What is the difference between if and WHERE statements in SAS?
In SAS, both the
WHERE statements are used to filter data based on certain conditions. However, they operate at different points in the data processing stage and thus have distinct functionalities and uses.
- Execution Order: The
WHEREstatement operates at the time of data input before the Data Step begins. It reads only those observations from the dataset that meet the specified condition. In contrast, the
IFstatement operates after the Data Step has started, meaning it reads all observations into the program data vector before applying the condition. This makes the
WHEREstatement more efficient when working with large datasets as it reduces I/O processing time.
- Subsetting: While both statements can be used for subsetting datasets, the
WHEREstatement is generally used in both Data Steps and PROC Steps. The
IFstatement, on the other hand, can only be used in Data Steps.
- Conditions: The
WHEREstatement can only handle conditions involving existing variables in the dataset. The
IFstatement is more flexible and can work with conditions involving newly created variables within the same Data Step.
WHEREuses SQL-like syntax and supports operators like
IS MISSING, etc.
IFuses a different syntax and does not support these operators, but it can accommodate SAS functions within the condition.
For example, if you have a dataset named ‘Employee_data’ with ‘Name’ and ‘Salary’ as the variables, and you want to select employees who earn more than $50,000, you could use a
WHERE statement or an
IF statement as follows:
Using a WHERE Statement:
DATA selected_data; SET Employee_data; WHERE Salary > 50000; RUN;
Using an IF Statement:
DATA selected_data; SET Employee_data; IF Salary > 50000; RUN;
Frequently Asked Questions
What is the WHERE statement in SAS?
The u003ccodeu003eWHEREu003c/codeu003e statement in SAS is a data step statement used to select specific observations from a dataset based on certain conditions.
Can I use multiple conditions in a SAS WHERE statement?
Yes, the u003ccodeu003eWHEREu003c/codeu003e statement in SAS can handle multiple conditions using logical operators like u003ccodeu003eANDu003c/codeu003e, u003ccodeu003eORu003c/codeu003e, and u003ccodeu003eNOTu003c/codeu003e.
How can I select a range of values using a WHERE statement in SAS?
The u003ccodeu003eBETWEEN-ANDu003c/codeu003e operator can be used in a u003ccodeu003eWHEREu003c/codeu003e statement in SAS to select a range of values. For example, u003ccodeu003eWHERE Salary BETWEEN 40000 AND 60000;u003c/codeu003eTo clarify, the query will retrieve all records where the ‘Salary’ field is within the range of $40,000 to $60,000, inclusive.
What does the ^= operator do in a WHERE statement?
The u003ccodeu003e^=u003c/codeu003e operator is the u0022not containsu0022 operator in SAS. It is used to select observations where a certain character string is not present in a variable. For example, u003ccodeu003eWHERE Position ^= ‘Manager’;u003c/codeu003e would select all records where the ‘Position’ does not contain the word ‘Manager’.
Can I use the WHERE statement with date values in SAS?
Yes, you can use the u003ccodeu003eWHEREu003c/codeu003e statement with date values in SAS. For example, u003ccodeu003eWHERE HireDate BETWEEN ’01JAN2015’d AND ’31DEC2020’d;u003c/codeu003e would select all records where the ‘HireDate’ falls within the specified date range.
Can you use a WHERE statement in proc freq?
Yes, you can use a u003ccodeu003eWHEREu003c/codeu003e statement in u003ccodeu003ePROC FREQu003c/codeu003e in SAS to limit the analysis to a subset of data that meets the specified condition(s). The u003ccodeu003eWHEREu003c/codeu003e statement is used before the u003ccodeu003eTABLESu003c/codeu003e statement in the u003ccodeu003ePROC FREQu003c/codeu003e call.
In conclusion, mastering the Where Statement In SAS is pivotal for anyone seeking to harness the full potential of SAS programming for efficient data management. The ability to accurately apply and interpret
WHERE statements, whether used independently or in combination with logical operators, is an indispensable skill in today’s data-driven world. This article has provided detailed insights into the application of the
WHERE statement with multiple conditions, enabling you to implement complex logical operations with ease and precision.
Furthermore, we explored the difference between
WHERE statements, highlighting the unique strengths of the latter. The utility of the
WHERE statement in various PROC procedures like
PROC FREQ was also discussed, showing the breadth of its application. We trust that these advanced techniques have enriched your understanding and will empower your journey in SAS programming.