Ever found yourself stuck in a maze of conditional data analysis? Don’t worry; I’ve got your back. Today, let’s dive into the magical world of the Proc SQL Case When Statement in SAS. Trust me, it’s a game-changer!
The CASE statement will allow you to implement if-else in SQL, so you can use it to run the query conditionally.
The CASE
the statement will test the conditions mentioned in the WHEN
clause and the return the values mentioned in the THEN
clause.
When no conditions are satisfied, it will return the value mentioned in the ELSE
clause.
If there is no ELSE
part and no conditions are true, it returns NULL.
What’s the Big Deal with CASE WHEN?
Imagine you’re a chef, and CASE WHEN is your Swiss Army knife. It’s that versatile! You can slice and dice your data based on conditions, just like you’d chop veggies for a salad. 🥗
Here’s how the syntax looks:
Syntax:
Proc SQL CASE WHEN condition1 THEN result1 WHEN condition2 THEN result2 WHEN conditionN THEN resultN ELSE result END;
Consider a Payroll dataset. Want to create a new column called “new_code” based on the JobCode? Easy-peasy!
It is a simple Payroll dataset as below.
You can use multiple WHENs…THEN clauses and leave out the ELSE clause, which is optional.
The CASE expression in SQL evaluates the conditions listed in the WHEN clause. The code in the THEN clause is executed if the condition is true, and the code in the ELSE part is executed if the condition is false.
The most frequently used form of this statement is to write multiple “when” and “then” clauses under a single “case” expression.
Below is a case when sas example. Let’s create a new column, “new_code,” by putting conditions on the JobCode column as below.
proc sql;
SELECT *,
CASE
WHEN jobcode eqt 'FA' then 'FA'
WHEN jobcode eqt 'TA' THEN 'TA'
ELSE JOBcode
END AS new_code
FROM sasuser2.payrollchanges;
Note: The ELSE clause in the above code is optional. If you don’t mention it, then all the places where conditions in both the WHEN clauses are false will be filled with NULL
.
Here, we have used the Truncated string operator “EQT”. PROC SQL supports truncated string comparison operators. In a truncated string comparison, the comparison is performed after making the strings the same length by truncating the longer string to be the same length as the shorter string.
Another way to write CASE statement in Proc SQL
Another way is when the condition being evaluated is only in one column. SQL offers you a simplified way of writing CASE..WHEN
expression as below.
proc sql;
SELECT *,
CASE Gender
WHEN 'M' then 'Male'
WHEN 'F' THEN 'Female'
END AS Gender
FROM sasuser2.payrollchanges;
Simply write “CASE,” followed by the name of the column to which you want to apply the condition.
When you use the shorthand method, the conditions must all be equality tests. That is, they cannot use comparison operators or other operators.
SAS Case When with Multiple Conditions
You can add as many conditions as you like! It’s like adding spices to your dish. The more, the merrier!
Salary Categories, Anyone?
Let’s say you have a dataset with a ‘Salary’ column. You can categorize employees as ‘High’, ‘Medium’, or ‘Low’ based on their salary. How cool is that?
DATA Employee_data;
INPUT Name $ Salary;
DATALINES;
John 55000
Jane 45000
Alex 65000
Emma 32000
Ryan 28000
;
RUN;
PROC SQL;
CREATE TABLE Employee_Categories AS
SELECT Name, Salary,
CASE
WHEN Salary > 50000 THEN 'High'
WHEN Salary BETWEEN 30000 AND 50000 THEN 'Medium'
ELSE 'Low'
END AS Salary_Category
FROM Employee_data;
QUIT;
PROC PRINT;
In the above code, the Case When
statement is used to create a new ‘Salary_Category’ column based on different conditions. The logic of the code reads as:
- If an employee’s salary is greater than 50,000, they are categorized as ‘High’.
- If their salary lies between 30,000 and 50,000, they are classified as ‘Medium’.
- For all other cases, i.e., the salary is less than 30,000, they are marked as ‘Low’.
This flexible way of categorizing data is made possible using the SAS Case When multiple conditions.
The Power of OR
Using the CASE WHEN
statement combined with the OR
operator is a common scenario for tackling more complex conditions. This allows you to specify multiple conditions so that if any of them is met, specific action will be taken.
Below is a hypothetical example of how you might use CASE WHEN
with an OR
condition in a SAS Proc SQL
statement:
PROC SQL;
CREATE TABLE Employee_Bonus AS
SELECT Name, Salary,
CASE
WHEN (Salary > 70000 OR Salary < 30000) THEN 'Eligible for Bonus'
ELSE 'Not Eligible for Bonus'
END AS Bonus_Status
FROM Employee_data;
QUIT;
In this example, we’re examining an ‘Employee_data’ table and creating a new table ‘Employee_Bonus’. The new table includes a column called ‘Bonus_Status’. If an employee’s salary is more than 70,000 or less than 30,000, their status will be ‘Eligible for Bonus’. Otherwise, their status will be ‘Not Eligible for Bonus’.
Thus, you can see that combining CASE WHEN
with the OR
operator enables you to create a diverse array of conditions to better analyze and categorize your data.
CASE WHEN in SELECT statement
As you noticed in the above example, you can conditionally test the values in a single column and create a new column based on that. But the WHEN clause gives you the flexibility to add conditions on multiple columns as well.
For example, suppose that for all the orders where the JobCode is TA3 and the gender is female, you want to add 1000 in the salary column. At the same time, for all other combinations of JobCode and gender, you want to keep the Salary column unchanged.
Ideally, you need conditions on two columns — JobCode = "TA3"
, Gender = "F"
. Moreover, both conditions must be true to get the new values for salary. Hence, you need to join these conditions using a logical AND.
proc sql;
SELECT *,
CASE WHEN Jobcode='TA3'
THEN salary+1000
ELSE salary
END AS new_salary
FROM sasuser2.payrollchanges;
quit;
As you can see two conditions in the WHEN clause are joined with the keyword AND. Only when both the conditions are True, the code in the THEN clause is executed i.e. Salary+1000, as you can see in the above picture.
Here, ELSE new_Salary
keeps the Salary column unchanged for all other combinations of JobCode and Gender.
This scenario requires you to split the records into categories using if-else
logic, which is why CASE...WHEN
is the appropriate tool.
Categorize the data using CASE WHEN
Data can be split into multiple categories based on other numerical and non-numerical columns.
As opposed to the GROUP BY clause, which groups the rows into different groups, CASE-WHEN creates a new column and mentions a group name to which each row of the dataset belongs.
For example, you have the DateofHire for each employee in the payroll data. Suppose you want to categorize it as “less than 20 years,” “20+,” or “30+” years of experience based on the column DateofHire. You can use the CASE-WHEN statement below.
proc sql;
SELECT *,intck('year', dateofhire, today()) as years_of_Exp,
case
when calculated years_of_exp < 20 then 'Less than 20 Years'
when calculated years_of_exp between 20 and 30 then '20+'
else '30+'
end as years_of_Exp
FROM sasuser2.payrollchanges;
quit;
CASE WHEN with Aggregate Functions
This is one of the more complicated use cases for CASE statements, and using aggregate functions with it can be tricky.
Therefore, let’s try to understand it with an example.
Suppose you want to get the total number of employees for each JobCode which shows the number of male and female employees.
Ideally, you need to count the male and female employees that belong to a JobCode. But with a CASE statement, you don’t need to type each combination. You can simply use an aggregate function COUNT
as below.
proc sql outobs=5;
SELECT jobcode,
count(case when GENDER EQT 'M' then empid end) as Male,
count(case when GENDER EQT 'F' then empid end) as Female
FROM sasuser2.payrollmaster
group by jobcode
order by jobcode;
quit;
You should mention the CASE..WHEN..THEN
within the aggregate function, as above. Here, in the column Delivered_Orders, you are counting all the orders WHEN
Status is Delivered to get the final output as below.
CASE Statement in ORDER BY Clause
In SQL, the ORDER BY
clause arranges output datasets ascending or descending. You can sort data differently by using CASE within, giving you more flexibility.
For example, suppose you want to arrange all the ‘Male’ Employees in ascending order of Hiredate if they belong to the TA job code and ascending order of Birthdate if they belong to any other job code.
proc sql;
select empid,dateofhire,jobcode,dateofbirth,gender
FROM sasuser2.payrollmaster
where (jobcode LIKE 'TA%' OR jobcode LIKE 'FA%')
and gender EQ 'M'
order by case
when jobcode eqt 'TA' then dateofhire
else dateofbirth
end;
quit;
For example, suppose you want to arrange all the ‘Female’ Employees in ascending order of Hire date and all male employees in ascending order of Birthdate.
proc sql;
select empid,dateofbirth,dateofhire,gender
FROM sasuser2.payrollmaster
where jobcode IN ('TA1','FA1')
order by
(CASE when GENDER eqt 'F' then dateofhire end) desc, /* -- F -> dateofhire descending */
(CASE WHEN gender eqt 'M' then dateofbirth end) ASC; /* -- M -> dateofbirth ascending */
quit;
Use Distinct in Case When
When specifying criteria or conditions in SQL queries, you can ignore duplicates. To remove duplicates, you must use the DISTINCT keyword. For calculating the number of unique cases, it is often used in conjunction with the COUNT function.
Suppose you need to calculate the number of employees having distinct “locations” when the salary is greater than $25,000 for each division. See the snapshot of data below –
proc sql;
select division,count(distinct location) as unique_location,
count(case when salary gt 25000 then empid else '' end) as unique_criteria
from test
group by division order by division;
quit;
The above program computes the number of employees with distinct values in variable ‘location’ when the values of variable “salary” are greater than $25000.
The keyword DISTINCT is used to remove or ignore duplicate records.
Pivoting data using case when
The data you are dealing with is sometimes long (number of rows > number of columns) and you need it in a wide format (number of columns > number of rows).
In such cases, a CASE statement comes in handy.
For example, let’s find out how many employees are there for each division for USA, UK, Canada & Germany.
proc sql;
SELECT division,
COUNT(CASE WHEN country = 'USA' THEN empid
END) AS USA_Employees,
COUNT(CASE WHEN country = 'UNITED KINGDOM' THEN empid
END) AS UK_Employees,
COUNT(CASE WHEN country = 'CANADA' THEN EmpID
END) AS CANADA_Employees,
COUNT(CASE WHEN country = 'GERMANY' THEN EmpID
END) AS GERMANY_Employees
FROM sasuser2.empdata
GROUP BY division;
quit;
Using CASE..WHEN..THEN we created separate columns for each Country to get the expected output as below.
Frequently Asked Questions
1. How to use the if condition in proc SQL in SAS?
In SAS, you can use conditional statements within the Proc SQL procedure to perform different actions based on the outcome of a logical test. While the CASE WHEN statement is more frequently used in Proc SQL, you can achieve a similar effect of an IF-THEN-ELSE statement using the calculated column method.
PROC SQL;
CREATE TABLE Employee_Bonus AS
SELECT Name, Salary,
(Salary > 50000) AS High_Earner
FROM Employee_data;
QUIT;
2. How do I use a Case When statement with multiple conditions in SAS?
You can incorporate multiple conditions within a Case When statement in SAS. It acts similarly to a chain of IF-THEN-ELSE statements.
PROC SQL;
CREATE TABLE Employee_Categories AS
SELECT Name, Salary,
CASE
WHEN Salary > 50000 THEN 'High'
WHEN Salary BETWEEN 30000 AND 50000 THEN 'Medium'
ELSE 'Low'
END AS Salary_Category
FROM Employee_data;
QUIT;
3. Can I use “OR” and "AND"
operators in Case When
statements?
Logical operators such as OR and AND can be used within Case When statements to create more complex conditions.
PROC SQL;
CREATE TABLE Employee_Bonus AS
SELECT Name, Salary,
CASE
WHEN (Salary > 70000 OR Salary < 30000) THEN 'Eligible for Bonus'
ELSE 'Not Eligible for Bonus'
END AS Bonus_Status
FROM Employee_data;
QUIT;
4. How does SAS evaluate multiple conditions in a Case When statement?
SAS evaluates the conditions in a Case When the statement is sequentially from top to bottom. As soon as it encounters a true condition, it executes the corresponding SQL statement and exits the Case When block, ignoring the rest of the conditions.
5. Can a Case When statement be used in the WHERE clause in SAS?
No, a Case When statement cannot be used in the WHERE clause in SAS. The WHERE clause only accepts conditional expressions. However, you can use the outcome of a Case When statement in a WHERE clause by first creating a computed column in a subquery or a view.
Wrapping up
In conclusion, the Proc SQL Case When Statement is a powerful tool for data analysis in SAS. By allowing you to perform conditional data analysis, it enables you to gain deeper insights into your data and make informed decisions.
Whether you need to recode variables, create new ones, or manipulate data, this guide provides a solid foundation.
With the help of the Proc SQL Case When Statement, your data analysis journey will be both efficient and insightful.