Using the COMPARE function in SAS to compare strings
The COMPARE function in SAS lets you compare two character values. With optional modifiers, you can ignore case and truncate the longer value to the length of the shorter value before making the comparison.
To demonstrate the COMPARE function, suppose you need to check codes that begin with C450.
One complication is that some of the codes might have the C in lowercase.
You want to match C450 on its own, as well as C450 followed by a period and more digits (for example, C450.100).
Although this is a fairly simple task with traditional DATA step programming, the COMPARE function lets you make the comparison in a single statement.
Take a look at the following program:
data test1;
   input code $10.;
   datalines;
V450
c450
c450.100
C900
;
run;

data test;
   set test1;
   /* i ignores case; : truncates the longer value to the shorter length */
   compareValue = compare(code, 'C450', 'i:');
   if compareValue eq 0 then Match = 'Yes';
   else Match = 'No';
run;
- The first two arguments of the COMPARE function are the two character values you want to compare.
- The optional third argument is a string of modifiers.
- The i modifier tells COMPARE to ignore case.
- The colon (:) modifier truncates the longer string to the length of the shorter string before making the comparison.
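To see what each modifier contributes on its own, here is a minimal sketch that compares c450.100 with C450 four ways. The data set name modifier_demo and the variable names are hypothetical. It keeps the ABS of each nonzero result (the position of the first difference), so the values do not depend on whether your system uses an ASCII or EBCDIC collating sequence.

```sas
/* Hypothetical demo: the effect of each COMPARE modifier on its own */
data modifier_demo;
   length code $ 10;
   code = 'c450.100';
   /* No modifiers: c and C differ at position 1 */
   noMod     = abs(compare(code, 'C450'));
   /* i only: case is ignored, but '.' differs from the blank at position 5 */
   iOnly     = abs(compare(code, 'C450', 'i'));
   /* : only: code is truncated to 4 characters, but c and C still differ */
   colonOnly = abs(compare(code, 'C450', ':'));
   /* i and : together: the values match */
   both      = compare(code, 'C450', 'i:');
run;
```

Only the combination of both modifiers produces 0 (noMod is 1, iOnly is 5, colonOnly is 1), which is why the program above uses 'i:'. The order of the modifiers inside the third argument does not matter.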
COMPARE returns 0 if the two values match (after the modifiers are applied) and a nonzero value if they differ.
The absolute value of the result is the position of the first character at which the two strings differ. Look at compareValue for observations 1 and 4: the value 1 for observation 1 (V450) means the strings differ in the first character, and the value 2 for observation 4 (C900) means they differ in the second character.
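To check these values yourself, the following sketch builds the same four codes with an iterative DO loop instead of DATALINES and prints the COMPARE result for each one. The data set name results is hypothetical. compareValue should be 1, 0, 0 and 2 for V450, c450, c450.100 and C900.

```sas
/* Hypothetical sketch: the four sample codes and COMPARE's result for each */
data results;
   length code $ 10;
   do code = 'V450', 'c450', 'c450.100', 'C900';
      compareValue = compare(code, 'C450', 'i:');
      output;                    /* one observation per code */
   end;
run;

proc print data=results;
run;
```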
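The sign of the result tells you which of the two values comes first in the collating sequence: a negative value means the first argument sorts before the second, and a positive value means it sorts after.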
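A quick way to see the sign convention is to swap the arguments. In this minimal sketch (sign_demo is a hypothetical name), both calls find the first difference at position 3; only the sign changes, because C comes before D in the collating sequence.

```sas
/* Hypothetical sketch: swapping the arguments flips only the sign */
data sign_demo;
   first  = compare('ABC', 'ABD');   /* -3: 'ABC' sorts before 'ABD' */
   second = compare('ABD', 'ABC');   /*  3: 'ABD' sorts after 'ABC'  */
run;
```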
In practice, you usually only need to know whether the function returns zero.
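Because only the zero test matters in most programs, the comparison can drive a subsetting IF directly. This is a sketch, assuming you only want to keep the matching codes; the data set name c450_only is hypothetical. It should keep two observations, c450 and c450.100.

```sas
/* Hypothetical sketch: keep only the codes that match C450 */
data c450_only;
   input code $10.;
   /* Subsetting IF: keep the observation only when COMPARE returns 0 */
   if compare(code, 'C450', 'i:') = 0;
   datalines;
V450
c450
c450.100
C900
;
run;
```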
Be careful when you use the colon modifier: when SAS determines the length of the shorter string, it counts trailing blanks.
Here is an example:
data test2;
   length String1 String2 $ 6;   /* String1 holds 'ABC' plus 3 trailing blanks */
   String1 = 'ABC';
   String2 = 'ABCXYZ';
   Compare1 = compare(String1, String2, ':');
   Compare2 = compare(trim(String1), String2, ':');
run;
- String1 is ABC followed by three trailing blanks. When you use the colon modifier to compare this value with String2, SAS sees both strings as having a length of 6, so no truncation takes place.
- For Compare2, TRIM removes the trailing blanks from String1, so SAS truncates String2 to a length of 3 (the length of the trimmed String1) before making the comparison, and the result is 0.
- When you use the colon modifier with a value that may contain trailing blanks, it is good practice to remove them with the TRIM function.
If you are curious why the value of Compare1 is -4, here is the reason: the two strings first differ in the fourth character, a blank in String1 and an X in String2. Because a blank comes before an X in the collating sequence, the value is negative.
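The same pitfall can affect the C450 check if the pattern is stored in a variable longer than its content. In this sketch, the variable pattern and its length of 8 are hypothetical choices for illustration: because pattern carries four trailing blanks, code is truncated to 8 characters rather than 4, and the match fails at position 5 until TRIM is applied.

```sas
/* Hypothetical sketch: a pattern stored in a longer variable keeps trailing blanks */
data pattern_demo;
   length code $ 10 pattern $ 8;
   pattern = 'C450';            /* stored as 'C450' plus 4 trailing blanks */
   code    = 'c450.100';
   padded  = abs(compare(code, pattern, 'i:'));     /* 5: '.' vs blank */
   trimmed = compare(code, trim(pattern), 'i:');    /* 0: match */
run;
```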