How To Create New Columns In PROC SQL Without A Data Step?

How to Create New Columns in PROC SQL Without a Data Step?

  • Post author:
  • Post category:PROC SQL
  • Post comments:0 Comments
  • Reading time:13 mins read

Do you want to learn how to use the proc SQL calculated keyword in SAS? This lets you create new variables on the fly using simple expressions. You can use it to perform calculations or transformations on your data without having to create them beforehand. This can save you time and memory, especially when working with large datasets.

In this article, we will show you how to use the proc SQL calculated feature in SAS with step-by-step instructions and examples.

What is the proc SQL calculated Keyword in SAS?

The proc SQL calculated keyword is a way to create new variables using arithmetic or logical expressions. You can use any valid SAS operators, functions, or constants in your expressions. You can also reference other calculated variables that are defined earlier in the same select clause.

To use the proc SQL calculated keyword, you need to follow these steps:

1. Write a proc SQL statement and select the variables you want to use in your calculation.

2. Use the keyword calculated followed by an equal sign and an expression to create a new variable. You can use the as keyword to give the new variable a label for display purposes.

3. Use the new variable in your where, group by, order by, or having clauses as needed. You can also use it in your output or print statements.

For example, let’s say you want to calculate the body mass index (BMI) of some patients based on their weight and height. BMI is a measure of body fat based on weight and height. It can help you assess your health risks related to obesity.

To calculate BMI, you need to divide weight by height squared. You can use the proc SQL calculated feature to create a new variable called bmi and assign it the value of weight divided by height squared.

Here is how you would write the proc SQL statement:

/* Use sashelp.class dataset */
proc sql;
    create table bmi_table as
    select Name, Sex, Age, Height, Weight,
           (Weight / (Height * Height)) * 703 as bmi_calculated /* Multiply by 703 to convert units to inches and pounds */
    from sashelp.class;
quit;
proc SQL calculated
proc SQL calculated

The AS keyword is optional, but it is recommended to use it to assign a name to the new column. Otherwise, SAS will assign a default name such as _COL1.

When to Use the Calculated keyword?

The calculated feature is useful when you want to create new columns that are based on existing ones, and you don’t want to create a separate data step for that. This can save you some coding time and reduce the number of intermediate data sets.

In the WHERE Clause for Aggregated Columns

Standard SQL does not allow using column aliases in a WHERE clause since the column value may not have been determined yet during evaluation. However, SAS allows using calculated values in the WHERE clause to filter based on aggregated columns, which deviates from the standard SQL behaviour.

Suppose you want to select only the students that have a BMI of more than 29.5. You can use the calculated keyword like this:

proc sql;
    create table filtered_bmi_table as
    select Name, Sex, Age, Height, Weight,
           (Weight / (Height * Height)) * 703 as bmi_calculated /* Multiply by 703 to convert units to inches and pounds */
    from sashelp.class
    where calculated bmi_calculated > 18.5;
quit;
Using PROC SQL Calculated in the Where Clause for Aggregated Columns
Using PROC SQL Calculated in the Where Clause for Aggregated Columns

Using the calculated Keyword with GROUP BY and HAVING Clauses in SAS

You might want to calculate the total sales for each country and then filter out the cities that have total sales less than a certain threshold. In this case, you can use the PROC SQL calculated keyword as below.

proc sql;
    create table filtered_sales as
    select Country, sum(Actual) as total_sales
    from sashelp.prdsale
    group by Country
    having calculated total_sales > 1000;
quit;

proc print data=filtered_sales;
run;
Using PROC SQL Calculated in the havingClause for Aggregated Columns
Using PROC SQL Calculated in the having Clause for Aggregated Columns

For Calculations Dependent on Other Calculated Columns

SAS allows you to use calculated to create a new column that depends on another calculated column within the same query.

For example, suppose you want to create another column called PROFIT that is equal to REVENUE minus COST. You cannot use the calculated keyword like this:

data sales;
   input product $ price quantity cost;
   datalines;
   A 10 5 8
   B 15 3 12
   C 20 4 16
   D 25 2 20
   ;
run;

proc sql;
	select product, price, quantity, (price*quantity) as revenue, (calculated revenue-cost) 
		as profit from sales;
quit;
Calculations Dependent on Other Calculated Columns
Calculations Dependent on Other Calculated Columns

Using PROC SQL Calculated in Subqueries

SAS allows the use of calculated in subqueries for filtering.

proc sql;
    select *
    from (
        select Name, Height, Weight, (Weight / (Height * Height)) * 703 as BMI
        from sashelp.class
    ) as subquery
    where calculated BMI > 18.5; 
quit;
PROC SQL Calculated in Subquerie
PROC SQL Calculated in Subqueries

Scenarios Where the calculated Keyword Cannot Be Used in PROC SQL in SAS

  1. In ORDER BY Clause Without SELECT:
/* This won't work */
proc sql;
    select Name
    from sashelp.class
    order by calculated Height*Weight;  /* Invalid use */
quit;
ERROR: It appears that the CALCULATED variable Height was referenced before it was defined.
ERROR: The following columns were not found as CALCULATED references in the immediate query: Height.
  1. In Joins You can’t use calculated to refer to a calculated column in a join condition.

The calculated keyword is designed to refer to calculated columns within the same query, not across different tables or subqueries in a join operation.

/* Create two simple datasets */
data table1;
    input Name $ Age;
    datalines;
    Alice 25
    Bob 30
    Carol 35
    ;
run;

data table2;
    input Name $ Age;
    datalines;
    Dave 30
    Eve 35
    Frank 40
    ;
run;

/* Attempt to use calculated in a join condition */
proc sql;
    create table joined_table as
    select A.Name, B.Age
    from table1 as A, table2 as B
    where A.Age + 5 as NewAge = calculated NewAge;  /* This won't work */
quit;

Important Considerations When Using the calculated keyword in PROC SQL in SAS

Before using the calculated keyword in PROC SQL, it is important to be aware of certain limitations and considerations.

  1. Order of Evaluation of Expressions

The calculated feature does not change the order in which SAS evaluates expressions. Expressions are evaluated from left to right, following the standard order of precedence for operators and functions.

Example:

proc sql;
select (Price * Quantity) as Revenue
from sales_data;
quit;

In this example, SAS evaluates the multiplication (Price * Quantity) first and then assigns the result to the new column Revenue.

Note: To change the order of evaluation, you can use parentheses to group your expressions.

  1. Data Type and Length of New Column

The calculated feature does not affect the data type or length of the new column. SAS automatically determines these based on the expression or function that defines the new column.

Example:

proc sql;
select (Price * Quantity) as Revenue format=8.2
from sales_data;
quit;

Here, the data type and length of Revenue are determined by those of Price and Quantity. If you want to specify a different format, you can use the format keyword as shown.

  1. Handling of Missing Values

The calculated feature does not change how SAS handles missing values in the new column. Missing values are handled based on the expression or function that defines the new column.

Example:

proc sql;
select coalesce(Price, 0) * coalesce(Quantity, 0) as Revenue
from sales_data;
quit;

In this example, if either Price or Quantity is missing, SAS would assign a missing value to Revenue. To handle missing values differently, you can use functions like COALESCE or NMISS.

Note: To change the handling of missing values, you can use conditional expressions or functions like COALESCE or NMISS.

These are some important considerations to keep in mind when using the calculated feature in PROC SQL in SAS.

Conclusion

The calculated feature is a handy way to create new columns in a SELECT statement without having to create them in a separate data step. However, you should be aware of its limitations and considerations before using it.

In this blog post, I showed you how to use the calculated feature in PROC SQL, when to use it, and what are some important notes to consider. I hope you found this post useful and informative. If you have any questions or comments, please feel free to share them below.

Every week we'll send you SAS tips and in-depth tutorials

JOIN OUR COMMUNITY OF SAS Programmers!

Subhro

Subhro provides valuable and informative content on SAS, offering a comprehensive understanding of SAS concepts. We have been creating SAS tutorials since 2019, and 9to5sas has become one of the leading free SAS resources available on the internet.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.