Proc SQL Case When Statement: A Guide to Efficient Data Analysis
Discover how to effectively analyze your data using the Proc SQL Case When Statement. Learn tips and tricks for maximizing its potential for efficient data analysis and gain a better understanding of conditional data analysis
The CASE statement will allow you to implement if-else in SQL, so you can use it to run the query conditionally.
CASE statment will test the conditions mentioned in the
WHEN clause and the return the values mentioned in the
When no conditions are satisfied, it will return the value mentioned in the
If there is no
ELSE part and no conditions are true, it returns NULL.
Proc SQL CASE WHEN condition1 THEN result1 WHEN condition2 THEN result2 WHEN conditionN THEN resultN ELSE result END;
It is a simple Payroll dataset as below.
You can use multiple WHEN…THEN clauses and leave out the ELSE clause, which is optional.
The CASE expression in SQL evaluates the conditions listed in the WHEN clause. The code in the THEN clause is executed if the condition is true, and the code in the ELSE part is executed if the condition is false.
The most frequently used form of this statement is to write multiple “when” and “then” clauses under a single “case” expression.
For example, let’s create a new column, “new_code,” by putting conditions on the JobCode column as below.
proc sql; SELECT *, CASE WHEN jobcode eqt 'FA' then 'FA' WHEN jobcode eqt 'TA' THEN 'TA' ELSE JOBcode END AS new_code FROM sasuser2.payrollchanges;
Note: The ELSE clause in the above code is optional. If you don’t mention it, then all the places where conditions in both the WHEN clauses are false will be filled with
Here, we have used the Truncated string operator “EQT”. PROC SQL supports truncated string comparison operators. In a truncated string comparison, the comparison is performed after making the strings the same length by truncating the longer string to be the same length as the shorter string.
Another way to write CASE statement in Proc SQL
Another way is when the condition being tested is only on one column. SQL offers you a simplified way of writing
CASE..WHEN expression as below.
proc sql; SELECT *, CASE Gender WHEN 'M' then 'Male' WHEN 'F' THEN 'Female' END AS Gender FROM sasuser2.payrollchanges;
Simply write “CASE,” followed by the name of the column to which you want to apply the condition.
When you use the shorthand method, the conditions must all be equality tests. That is, they cannot use comparison operators or other types of operators.
CASE WHEN in SELECT statement
As you noticed in the above example, you can conditionally test the values in a single column and create a new column based on that. But, the WHEN clause gives you the flexibility to add conditions on multiple columns as well.
For example, suppose that for all the orders where the JobCode is TA3 and the gender is female, you want to add 1000 in the salary column. At the same time, for all other combinations of JobCode and gender, you want to keep the Salary column unchanged.
Ideally, you need conditions on two columns —
JobCode = "TA3",
Gender = "F". Moreover, both conditions must be true to get the new values for salary. Hence, you need to join these conditions using a logical AND.
proc sql; SELECT *, CASE WHEN Jobcode='TA3' THEN salary+1000 ELSE salary END AS new_salary FROM sasuser2.payrollchanges; quit;
As you can see two conditions in the WHEN clause are joined with the keyword AND. Only when both the conditions are True, the code in the THEN clause is executed i.e. Salary+1000, as you can see in the above picture.
ELSE new_Salary keeps the Salary column unchanged for all other combinations of JobCode and Gender.
This scenario requires you to split the records into categories using
if-else logic, which is why
CASE...WHEN is the appropriate tool.
Categorize the data using CASE WHEN
Data can be split into multiple categories based on other numerical as well as non-numerical columns.
As opposed to the GROUP BY clause, which groups the rows into different groups, CASE-WHEN simply creates a new column and mentions a group name to which each row of the dataset belongs.
For example, you have the DateofHire for each employee in the payroll data. Suppose you want to categorize it as “less than 20 years,” “20+,” or “30+” years of experience based on the column DateofHire. You can use the CASE-WHEN statement below.
proc sql; SELECT *,intck('year', dateofhire, today()) as years_of_Exp, case when calculated years_of_exp < 20 then 'Less than 20 Years' when calculated years_of_exp between 20 and 30 then '20+' else '30+' end as years_of_Exp FROM sasuser2.payrollchanges; quit;
CASE WHEN with Aggregate Functions
This is one of the more complicated use cases for CASE statements, and using aggregate functions with it can be tricky.
Therefore, let’s try to understand it with an example.
Suppose, you want to get the total number of employees for each JobCode which shows the number of male and female employees.
Ideally, you need to count the male and female employees that belong to a JobCode . But with a CASE statement, you don’t need to type each and every combination. You can simply use an aggregate function
COUNT as below.
proc sql outobs=5; SELECT jobcode, count(case when GENDER EQT 'M' then empid end) as Male, count(case when GENDER EQT 'F' then empid end) as Female FROM sasuser2.payrollmaster group by jobcode order by jobcode; quit;
You should mention the
CASE..WHEN..THEN within the aggregate function, as above. Here, in the column Delivered_Orders, you are counting all the orders
WHEN Status is Delivered to get the final output as below.
CASE Statement in ORDER BY Clause
In SQL, the
ORDER BY clause arranges output datasets ascending or descending. You can sort data differently by using CASE within
ORDER BY, giving you more flexibility.
For example, suppose you want to arrange all the ‘Male’ Employees in ascending order of Hiredate if they belong to the TA job code and ascending order of Birthdate if they belong to any other job code.
proc sql; select empid,dateofhire,jobcode,dateofbirth,gender FROM sasuser2.payrollmaster where (jobcode LIKE 'TA%' OR jobcode LIKE 'FA%') and gender EQ 'M' order by case when jobcode eqt 'TA' then dateofhire else dateofbirth end; quit;
For example, suppose you want to arrange all the ‘Female’ Employees in ascending order of Hire date and all male employees in ascending order of Birthdate.
proc sql; select empid,dateofbirth,dateofhire,gender FROM sasuser2.payrollmaster where jobcode IN ('TA1','FA1') order by (CASE when GENDER eqt 'F' then dateofhire end) desc, /* -- F -> dateofhire descending */ (CASE WHEN gender eqt 'M' then dateofbirth end) ASC; /* -- M -> dateofbirth ascending */ quit;
Use Distinct in Case When
When specifying criteria or conditions in SQL queries, you can ignore duplicates. To remove duplicates, you must use the DISTINCT keyword. For calculating the number of unique cases, it is often used in conjunction with the COUNT function.
Suppose you need to calculate the number of employees having distinct “locations” when the salary is greater than $25,000 for each division. See the snapshot of data below –
proc sql; select division,count(distinct location) as unique_location, count(case when salary gt 25000 then empid else '' end) as unique_criteria from test group by division order by division; quit;
The above program computes the number of employees with distinct values in variable ‘location’ when the values of variable “salary” are greater than $25000.
The keyword DISTINCT is used to remove or ignore duplicate records.
Pivoting data using case when
The data you are dealing with is sometimes long (number of rows > number of columns) and you need it in a wide format (number of columns > number of rows).
In such cases, a CASE statement comes in handy.
For example, let’s find out how many employees are there for each divison for USA,UK, Canada & Germany.
proc sql; SELECT division, COUNT(CASE WHEN country = 'USA' THEN empid END) AS USA_Employees, COUNT(CASE WHEN country = 'UNITED KINGDOM' THEN empid END) AS UK_Employees, COUNT(CASE WHEN country = 'CANADA' THEN EmpID END) AS CANADA_Employees, COUNT(CASE WHEN country = 'GERMANY' THEN EmpID END) AS GERMANY_Employees FROM sasuser2.empdata GROUP BY division; quit;
Using CASE..WHEN..THEN we created separate columns for each Country to get the expected output as below.
In conclusion, the Proc SQL Case When Statement is a powerful tool for data analysis in SAS. By allowing you to perform conditional data analysis, it enables you to gain deeper insights into your data and make informed decisions.
Whether you are a beginner or an experienced SAS programmer, understanding and utilizing the Proc SQL Case When Statement will help you work more efficiently and effectively with your data.
Whether you are using it to recode variables, create new variables, or perform complex data manipulations, this guide has provided you with a solid foundation for getting started.
With the help of the Proc SQL Case When Statement, your data analysis journey will be both efficient and insightful.