Where Statement in SAS

Mastering WHERE Statement in SAS

  • Post author:
  • Post category:Base SAS
  • Post comments:0 Comments
  • Reading time:24 mins read

Data manipulation and management are vast fields in which SAS stands out as an incredibly versatile and potent tool. Where Statements in SAS are the key to its functionality, allowing users to refine their data operations with surgical precision.

Do you fully utilize its potential, however? Our post “Advanced Techniques: Mastering the Where Statement in SAS” will ensure just that. Learn how to make the most of this essential SAS statement, discover expert tricks, and elevate your expertise.

In this article, we will help you understand the nuances of this potent statement, helping you unlock the keys to enhancing data management.

Basic Data Subsetting using Where statement in SAS

You can use the WHERE statement in SAS to filter data when reading from a SAS dataset.

Example:

data class; 
set sashelp.class; 
where sex='F'; 
run;
where statement

You may use a WHERE or a subsetting IF statement in this example. There are advantages to using a WHERE statement in SAS instead of a subsetting IF statement.

You have a larger alternative of operators that can be used with a WHERE statement, and if the input data set is indexed, the WHERE statement in SAS is likely more efficient.

You may also use a WHERE statement in a SAS procedure to subset the data being processed.

Note: IF statements are not allowed inside SAS procedures.

SAS where statement with multiple conditions

In SAS, the WHERE statement can accommodate multiple conditions by using logical operators such as AND, OR, and NOT.

Let’s create a dataset named ‘Employee_data’ in SAS with ‘Name’ and ‘Salary’ as the variables.

DATA Employee_data;
    INPUT Name $ Salary;
    DATALINES;
    John 50000
    Jane 80000
    Alex 75000
    Emma 27000
    Ryan 71000
    Mark 85000
    Emily 24000
    ;
RUN;

Let’s consider an example where we want to extract data for employees earning more than $50,000 but less than or equal to $80,000 from the ‘Employee_data’ dataset.

DATA selected_data;
   SET Employee_data;
   WHERE Salary > 50000 AND Salary <= 80000;
RUN;
Where statement with AND condition

In this case, the WHERE statement is combined with the AND operator. Only those records from ‘Employee_data’ where ‘Salary’ is greater than $50,000 and less than or equal to $80,000 will be selected and saved to the ‘selected_data’ dataset.

Similarly, the OR operator can be used to select records based on any of the conditions being met. For example, if we want to select employees with salaries either less than $30,000 or more than $70,000, the WHERE statement would be as follows:

DATA selected_data;
   SET Employee_data;
   WHERE Salary < 30000 OR Salary > 70000;
RUN;
Where statement with OR condition

The NOT operator can be used to exclude records that meet certain conditions. If we want to select all employees who are not high earners (with salaries greater than $70,000), we can use the WHERE statement as follows:

DATA selected_data;
   SET Employee_data;
   WHERE NOT (Salary > 70000);
RUN;
Where statement with NOT condition

Using Operators with WHERE statement in SAS

Below is the list of operators that you can use with the WHERE statement in SAS.

IS MISSING and IS NULL

The IS MISSING and IS NULL operators can handle character or numeric variables in WHERE, ON, and HAVING expressions.

It results in true if the expression results are missing and false if it is not missing.

They also work with the NOT operator.

data class;
    set sashelp.class;

    if age le 13 then
        call missing(age);
run;

*** use missing;

proc print data=class;
    where age is missing;
run;

*** using IS NULL;

proc print data=class;
    where age is null;
run;

*** using NOT MISSING;

proc print data=class;
    where age is not missing;
run;

*** using NOT NULL;

proc print data=class;
    where age is not NULL;
run;

BETWEEN AND

The BETWEEN-AND operator in SAS is a useful operator for filtering data within a specified range. It can be applied to both numeric and date variables, making it highly versatile for various data analysis tasks. Typically, the BETWEEN-AND operator is used within a WHERE statement to filter data that falls between two values.

Consider a simple example where we have a dataset called ‘sashelp.class’. Suppose we want to filter out names with weight between 60 and 90. The SAS code with BETWEEN-AND would look like this:

data class;
set sashelp.class;
where weight between 60 and 90;
run;

Between And

NOT Operator

The NOT operator can be used with the WHERE statement in many other ways:

Let’s say you want to display the following values. The following WHERE statements are self-explanatory:

  • where not (score in (34, 44, 84))
  • where not (Score between 50 and 90)
  • where NOT(Section EQ “A”)

SAME AND

Multiple WHERE statements cannot be used in a DATA step as IF statements. If SAS encounters a second WHERE statement, it will replace the first.

The WHERE AND also known as WHERE ALSO will let you use multiple where statements in SAS to add more restrictions.

data class;
set sashelp.class;
  where sex='M';
  where also age > 12 ;
run;

WHERE SAME AND

IN and NOT IN Operator in SAS

The SAS IN operator is a comparison operator that searches for character and numeric values equal to one of the values from a given list of values.

The SAS where in list of values must be in parentheses, with each character value in quotation marks and separated by either a comma or blank.

For example, suppose you want to filter for SUVs, sedans or Wagon cars. You could specify the values as:

where strip(upcase(type)) in ('SUV','SEDAN','WAGON')

Note the use of the strip and Upcase function in the type variable. This ensures that any leading or trailing spaces are removed, and the values will be converted to Uppercase before comparison.

Also, read the article on The Ultimate Guide To SAS Character Functions

In addition to this, you can use the logical operator NOT to exclude values that are on the list.

where strip(upcase(type)) not in ('TRUCK', 'HYBRID', 'SPORTS')

You can also use a shorthand notation to specify a range of numbers to search. The range is specified by using the syntax M: N as a value in the list to search, where M is the lower bound and N is the upper bound.

You can replace the below statement in the form of WHERE IN shorthand notation.

where age in(13,14,15,16)
where age in(13:16)

You can also read our article on How To Specify List Of Variables In SAS and How To Use The IN Operator In SAS Macro?

CONTAINS (?) Operator in SAS

The CONTAINS operator returns a true value if the character string on the left side of the operator contains the string on the right side.

data class;
set sashelp.class;
where name ? 'Al';
run;

Contains Operator

NOT CONTAINS (^?)

^= (or NOT CONTAINS) returns a true value if the character string on the left side of the operator does not contain the string on the right side.

data class;
set sashelp.class;
where name ^? 'J';
run;

Not Contains

The “contains” operator (?) and the “not contains” operator (^?) match a substring that appears anywhere in the target character variable.

LIKE Operator in SAS

LIKE operator is frequently used for pattern matching, that is, evaluating whether a variable value equals, begins with a specified character, or sounds like a specified value or pattern.

The LIKE expression uses two wildcard operators. When using the LIKE operator, the underscore(_) wildcard takes the place of a single character, whereas the % sign might be substituted for a string of any length (including a null string).

A Simple use of the Like operator

data womens_product;
 set sashelp.shoes;
  where product like "Women's Dress";
  where also region like 'Pacific';
run;
like operator

Double quotes (“) are used here to include the apostrophe in “Women’s Dress”. Single quotes can be used when the text string does not contain an apostrophe or other single quotes.

Selecting Values that Begin with a Character String

The percentage sign (%) wildcard searches for character values that begin with, end with or even contain certain character strings. Below are some of the examples using wildcards.

where upcase(name) like "A%"

The above expression returns all names that begin with “a” and are followed by any characters of any lengths

like operator
 where name like "_a%"

_ indicates any character at start followed by “a” and ending with any char of any length.

where wild cards
where name like "__n%"

At the beginning of a search, two underscores indicate that any two characters followed by “n” should be filtered.

Like Operator in SAS
where name like "__n_"

The above expression would return character values with two characters at the beginning followed by n and any one character at the end.

Like Operator in SAS

Sounds Like Operator (=*)

The SOUNDS-LIKE operator is based on the SOUNDEX algorithm for identifying words that sound alike. The SOUNDEX algorithm is English-biased, so it’s not useful for languages other than English.

data employees;
  set datasets.employees;
  where lname=*"PATTERSON";
run;

Sounds Like Operator

You can download the dataset employees.sas7bdat used in the program.

Also, read 4 Functions for fuzzy matching in SAS.

Using a WHERE statement and a WHERE=option in a DATA Step.

Using the WHERE statement in SAS and a WHERE= dataset option in a DATA step is not a good practice. SAS ignores the WHERE statement when both are used in the same dataset.

Additionally, a WARNING appears in the SAS log indicating that the WHERE statement cannot be applied.

What is the difference between if and WHERE statements in SAS?

In SAS, both the IF and WHERE statements are used to filter data based on certain conditions. However, they operate at different points in the data processing stage and thus have distinct functionalities and uses.

  1. Execution Order: The WHERE statement operates at the time of data input before the Data Step begins. It reads only those observations from the dataset that meet the specified condition. In contrast, the IF statement operates after the Data Step has started, meaning it reads all observations into the program data vector before applying the condition. This makes the WHERE statement more efficient when working with large datasets as it reduces I/O processing time.
  2. Subsetting: While both statements can be used for subsetting datasets, the WHERE statement is generally used in both Data Steps and PROC Steps. The IF statement, on the other hand, can only be used in Data Steps.
  3. Conditions: The WHERE statement can only handle conditions involving existing variables in the dataset. The IF statement is more flexible and can work with conditions involving newly created variables within the same Data Step.
  4. Syntax: WHERE uses SQL-like syntax and supports operators like BETWEEN-AND, CONTAINS, IS MISSING, etc. IF uses a different syntax and does not support these operators, but it can accommodate SAS functions within the condition.

For example, if you have a dataset named ‘Employee_data’ with ‘Name’ and ‘Salary’ as the variables, and you want to select employees who earn more than $50,000, you could use a WHERE statement or an IF statement as follows:

Using a WHERE Statement:

DATA selected_data;
   SET Employee_data;
   WHERE Salary > 50000;
RUN;

Using an IF Statement:

DATA selected_data;
   SET Employee_data;
   IF Salary > 50000;
RUN;

Frequently Asked Questions

  1. What is the WHERE statement in SAS?

    The u003ccodeu003eWHEREu003c/codeu003e statement in SAS is a data step statement used to select specific observations from a dataset based on certain conditions.

  2. Can I use multiple conditions in a SAS WHERE statement?

    Yes, the u003ccodeu003eWHEREu003c/codeu003e statement in SAS can handle multiple conditions using logical operators like u003ccodeu003eANDu003c/codeu003e, u003ccodeu003eORu003c/codeu003e, and u003ccodeu003eNOTu003c/codeu003e.

  3. How can I select a range of values using a WHERE statement in SAS?

    The u003ccodeu003eBETWEEN-ANDu003c/codeu003e operator can be used in a u003ccodeu003eWHEREu003c/codeu003e statement in SAS to select a range of values. For example, u003ccodeu003eWHERE Salary BETWEEN 40000 AND 60000;u003c/codeu003eTo clarify, the query will retrieve all records where the ‘Salary’ field is within the range of $40,000 to $60,000, inclusive.

  4. What does the ^= operator do in a WHERE statement?

    The u003ccodeu003e^=u003c/codeu003e operator is the u0022not containsu0022 operator in SAS. It is used to select observations where a certain character string is not present in a variable. For example, u003ccodeu003eWHERE Position ^= ‘Manager’;u003c/codeu003e would select all records where the ‘Position’ does not contain the word ‘Manager’.

  5. Can I use the WHERE statement with date values in SAS?

    Yes, you can use the u003ccodeu003eWHEREu003c/codeu003e statement with date values in SAS. For example, u003ccodeu003eWHERE HireDate BETWEEN ’01JAN2015’d AND ’31DEC2020’d;u003c/codeu003e would select all records where the ‘HireDate’ falls within the specified date range.

  6. Can you use a WHERE statement in proc freq?

    Yes, you can use a u003ccodeu003eWHEREu003c/codeu003e statement in u003ccodeu003ePROC FREQu003c/codeu003e in SAS to limit the analysis to a subset of data that meets the specified condition(s). The u003ccodeu003eWHEREu003c/codeu003e statement is used before the u003ccodeu003eTABLESu003c/codeu003e statement in the u003ccodeu003ePROC FREQu003c/codeu003e call.

Conclusion

In conclusion, mastering the Where Statement In SAS is pivotal for anyone seeking to harness the full potential of SAS programming for efficient data management. The ability to accurately apply and interpret WHERE statements, whether used independently or in combination with logical operators, is an indispensable skill in today’s data-driven world. This article has provided detailed insights into the application of the WHERE statement with multiple conditions, enabling you to implement complex logical operations with ease and precision.

Furthermore, we explored the difference between IF and WHERE statements, highlighting the unique strengths of the latter. The utility of the WHERE statement in various PROC procedures like PROC FREQ was also discussed, showing the breadth of its application. We trust that these advanced techniques have enriched your understanding and will empower your journey in SAS programming.

Every week we'll send you SAS tips and in-depth tutorials

JOIN OUR COMMUNITY OF SAS Programmers!

Subhro

Subhro provides valuable and informative content on SAS, offering a comprehensive understanding of SAS concepts. We have been creating SAS tutorials since 2019, and 9to5sas has become one of the leading free SAS resources available on the internet.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.