SQL Set operators are conceptually derived from the mathematical set principle. The three fundamental set operators are
All three, with variations (in particular, OUTER UNION) and options, could be carried out in PROC SQL for combining data from a number of tables in the SQL procedure.
JOINS VS SQL SET OPERATORS
Before we delve into the main points of PROC SQL set operators, let’s set up the basic distinctions between joins and set operators.
DATA first; A = 1; RUN; DATA second; B = 2; RUN;
So each of these tables has one row and one column. You can use a PROC SQL step to combine the two through a simple cross join:
SELECT * FROM first, second ;
Now take a look at the UNION, which is the simplest type of probably the most extensively used of the set operators. The code to combine our two tables is:
SELECT * FROM first UNION SELECT * FROM second ;
UNION is inserted between two SELECTs (each of which has, as it should, a subordinate FROM clause).
A set operator works on the outcomes of two SELECTs. This is not like a join, which is carried out inside the FROM clause of a single SELECT.
See the two numeric values, this time arranged vertically rather than horizontally. This
demonstrates the fundamental difference between joins and set operators.
Joins align rows and accrete columns whereas set operators align columns and accrete rows.
This is something of an oversimplification in fact. SQL just isn’t a matrix language and provides comparatively little symmetry between rows and columns. So the distinction is drawn here between joins and set operators is barely a basis for the main points to observe.
Rules for SET operators
Set operators adhere to fundamental guidelines of operation.
- If a SELECT statement consists of multiple set operator, set operators will likely be applied in the order specified.
- By default, duplicate rows are eliminated from the results.
- To allow duplicates, the ALL option should be specified with a set operator.
- Arguments are evaluated from left to right.
- Set operators can be used in:
- Derived tables
- View definitions
- INSERT with SELECT clause
SQL Operators and Precedence
Set operators adhere to an order of precedence. The following precedence rules apply:
- When multiple set operator is specified, each is applied in the order specified:
- Top to bottom
- to right
- The default order of precedence for processing set operators follows:
- UNION and/or EXCEPT
- When parentheses are specified, the default order of precedence could be altered.
Data Type Compatibility
The alignment of columns in these examples has laboured smoothly because the aligned columns have matched with respect to the data type (numeric or character). Since column alignment is an important aspect of virtually all the set operators, it’s worth exploring this a bit more.
DATA num; id = 3; worth = 0; RUN; DATA char; id = 4; worth = 'abc'; RUN;
VALUE is numeric in data set NUM but a character in data set CHAR. So once you
try a DATA step concatenation with
DATA both; SET num char; RUN;
SAS will give this log message:
The new data set (BOTH) is created but contains no observations.
If you run the parallel SQL code:
CREATE TABLE both AS SELECT * FROM num OUTER UNION CORRESPONDING SELECT * FROM char ;
ERROR: Column 2 from the first contributor of OUTER UNION is not the same
type as to its counterpart from the second.
Unlike the DATA step, PROC SQL doesn’t even create an empty table on this scenario.
There is only one set operator which is resistant to data type mismatches as a result of it does no column alignment; that’s the OUTER UNION operator without the CORRESPONDING option.
Accessing Rows from the Intersection of Two Queries (INTERSECT)
The INTERSECT operator creates query results that consist of all of the distinctive rows from the intersection of the two queries. Put another approach, the intersection of two queries (A and B) is represented by C, which signifies that the rows which are produced occur in both A and B
proc sql; select * from sashelp.prdsale where precise < 20 intersect select * from sashelp.prdsale where prodtype = "OFFICE"; quit;
- It is assumed that the tables in each query are structurally an identical as a result of the wildcard character is specified within the SELECT assertion.
- The INTERSECT operator produces rows which are widespread to each query.
Accessing Rows from the Combination of Two Queries (UNION)
The UNION operator preserves all the distinctive rows from the combination of queries.
The result is identical as if an OR operator is used to combine the results of each question. Put another approach, the union of two queries (A and B) represents rows in A or in B or in both A and B.
The UNION operator automatically eliminates duplicate rows from the results, until the ALL keyword, is specified as a part of the UNION operator. The column names assigned to the results are derived from the names in the first query.
In order for the union of two or more queries to achieve success, each query should specify the same number of columns of the same or compatible types.
Type compatibility signifies that column attributes are defined the same way. Because column names and attributes are derived from the primary table, data types should be of the same type. The data types of the result columns are derived from the source table(s).
proc sql; select * from sashelp.prdsale where actual < 5 intersect select * from sashelp.prdsale where prodtype = "OFFICE"; quit;
Concatenating Rows from Two Queries (OUTER UNION)
The OUTER UNION operator concatenates the results of two queries. As with a DATA step or PROC APPEND concatenation, the results include rows which are combined vertically.
Put another way, the outer union of two queries (A and B) represents all rows in both A and B with no overlap.
The OUTER UNION operator automatically concatenates rows from two queries with no overlap, until the CORRESPONDING (CORR) keyword is specified as a part of the operator.
proc sql; create table combine as SELECT name,age,height FROM sashelp.class where age <12 OUTER UNION SELECT name,age,weight FROM sashelp.class where age < 12; quit;
The column names which are assigned to the results are derived from the names in the first query. In the following example, the CORR keyword enables tables columns with the same name and attributes to be overlaid.
proc sql; create table combine as SELECT name,age,height FROM sashelp.class where age <12 OUTER UNION corr SELECT name,age,weight FROM sashelp.class where age < 12; quit;
Comparing Rows from Two Queries (EXCEPT)
The EXCEPT operator compares rows from two queries to find out the modifications made to the primary table that aren’t present in the second table.
The following results show new and altered rows in the first table that is in the second table however don’t show rows which were deleted from the second table.
When working with two tables that consist of similar data, you need to use the EXCEPT operator to find out new and modified rows.
data subjects_original; input name $ city $; datalines; AAA Pune BBB Delhi CCC Mumbai EEE Kolkata ; run; data subjects_new; input name $ city $; datalines; AAA Patna BBB Delhi CCC Mumbai DDD Bangalore ; run; proc sql; create table checkdata as select * from subjects_original except select * from subjects_new; quit;
The EXCEPT operator is used to determine rows in the first table (or a question), however, shouldn’t be used to determine rows in the second table (or query).
It additionally uniquely identifies rows which have modified from the primary table to the second table. Columns are compared within the order that they appear in the SELECT statement.