Most SAS programmers rely on the COMPRESS function to clean up troublesome string data. Its third ‘modifier’ argument, added in Version 9, is less well known.
If you’ve been programming in SAS® for any length of time, you’ve most likely used the COMPRESS function to clean up input data or extract a useful piece of a string or variable name.
Unless you read the documentation for every release or browse a lot of conference papers, you may not be aware of the extra capabilities it now has.
With Version 9, SAS added a third “modifier” argument to the COMPRESS function, and it can do some impressive things.
The COMPRESS Function in SAS with the Third Argument
Most SAS programmers first learn the COMPRESS function when they need to remove extraneous spaces or other troublesome characters from strings.
We look at a string, find the characters we don’t need, and remove them:
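A minimal sketch of this basic two-argument usage, assuming a hypothetical phone-number string (the variable names and the value are illustrative only):

```sas
data _null_;
    /* hypothetical input value, for illustration only */
    phone = '(919) 555-0142';
    /* the second argument lists every character to remove */
    digits = compress(phone, '() -');
    put digits=;   /* digits=9195550142 */
run;
```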
This works until the next batch of data arrives with new extraneous characters and our code complains or crashes once more. So we add more characters to the second argument:
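As a sketch of that patching step, assume a hypothetical value that arrives with brackets and a period instead of parentheses and a hyphen; the removal list simply grows:

```sas
data _null_;
    /* hypothetical value with delimiters the first list did not anticipate */
    phone = '[919] 555.0142';
    /* the removal list now also names the brackets and the period */
    digits = compress(phone, '()[]. -');
    put digits=;   /* digits=9195550142 */
run;
```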
Now things run again. However, new data might introduce further problems, and the cycle repeats.
Used correctly, the third argument of the COMPRESS function can reduce this repeated modification. Its modifiers cover entire classes of characters, so we can write generalised compression statements.
Used alone or in combination, the modifiers in the third argument generalise the conventional COMPRESS behaviour: each one removes an entire class of characters from the string.
For instance, the code we had above could be generalised as:
The P modifier removes all punctuation (note that the second argument is omitted).
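A sketch of the generalised call, again using a hypothetical phone-number value. Note that P covers punctuation only, so a blank survives unless it is listed in the second argument or removed with another modifier:

```sas
data _null_;
    /* hypothetical input value, for illustration only */
    phone = '(919) 555-0142';
    /* second argument omitted: only the punctuation class is removed */
    no_punct = compress(phone, , 'P');
    /* a blank in the second argument is removed along with punctuation */
    digits = compress(phone, ' ', 'P');
    put no_punct= digits=;   /* no_punct=919 5550142 digits=9195550142 */
run;
```

Because the modifier names a whole class, new punctuation in later data no longer requires a code change.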
You can combine as many modifiers as you want. You can leave the second argument blank if the modifiers alone are adequate, or you can also list specific characters in the second argument.
For example, using the A and P modifiers together with “0” in the second argument removes all alphabetic characters, all punctuation, and the digit “0”.
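A small sketch of that combination, assuming a hypothetical code string made up for illustration:

```sas
data _null_;
    /* hypothetical input value, for illustration only */
    code = 'A10-B20,C03';
    /* A removes letters, P removes punctuation, '0' removes the digit 0 */
    cleaned = compress(code, '0', 'AP');
    put cleaned=;   /* cleaned=123 */
run;
```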
Below is a list of some of the modifiers you can use to extend the COMPRESS function. When the second argument is omitted, blanks are kept unless a modifier such as S removes them, so some results contain a blank.
|Modifier||Description||Example||Result|
|A||Removes alphabetic characters||compress('A_B vC,D|m&@',,'A')||_ ,|&@|
|D||Removes digits||compress('A_B B5vC,D|m&@',,'D')||A_B BvC,D|m&@|
|F||Removes the underscore character and English letters||compress('A_2B vC,D|m&@',,'F')||2 ,|&@|
|H||Removes horizontal tabs (the gap after A in the example is a tab)||compress('A BvC,D|m&@',,'H')||ABvC,D|m&@|
|I||Ignores the case of the characters to be kept or removed||compress('ABCcDcd','cd','I')||AB|
|K||Keeps the characters in the list instead of removing them||compress('ABCcDcd','cd','K')||ccd|
|L||Removes lowercase letters||compress('ABCcDcd',,'L')||ABCD|
|N||Removes digits, the underscore character, and English letters||compress('A_2B vC,D|m&@',,'N')|| ,|&@ (with a leading blank)|
|P||Removes punctuation marks||compress('A_2B vC,D|m&@',,'P')||A2B vCDm|
|S||Removes space characters (blank, horizontal tab, vertical tab, carriage return, line feed, form feed, and NBSP ('A0'x, or 160 decimal))||compress('A_2B v C,D|m&@',,'S')||A_2BvC,D|m&@|
|T||Trims trailing blanks from the first and second arguments||compress('abcd ',,'T')||abcd|
|U||Removes uppercase letters||compress('ABvC,D|m&&@@',,'U')||v,|m&&@@|
|X||Removes hexadecimal characters (0-9, A-F, a-f)||compress('ABvC,D|m&&@@',,'X')||v,|m&&@@|
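The modifiers in the table can be combined, and adding K flips the meaning from "remove this class" to "keep only this class". A short sketch, assuming a hypothetical order string invented for illustration:

```sas
data _null_;
    /* hypothetical input value, for illustration only */
    raw = 'Order #A-1027, qty 3';
    /* K + D: keep only the digits */
    digits_only = compress(raw, , 'kd');
    /* K + A: keep only the alphabetic characters */
    letters_only = compress(raw, , 'ka');
    put digits_only= letters_only=;   /* digits_only=10273 letters_only=OrderAqty */
run;
```

Modifiers are not case-sensitive, so 'kd' and 'KD' behave the same way.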
How to use the compress function in a SAS macro?
You can call the COMPRESS function in macro code through <a href="https://www.9to5sas.com/sas-macros/">%sysfunc</a>. For example, to keep only the digits in a macro variable's value, you can use the code below.
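/* macro variable values are plain text, so no quotes are needed */
%let fname = JAN2020_012020;
/* k = keep, d = digits: keep only the digits in &fname */
%let onlydigits = %sysfunc(compress(&fname,,kd));
%put &onlydigits;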
Using the %CMPRES, %QCMPRES Autocall Macros
SAS provides two autocall macros, %CMPRES and %QCMPRES, that compress multiple blanks into one and remove leading and trailing blanks.
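A minimal sketch of the blank handling, assuming the standard autocall library is available in the session (the default in most installations) and using a hypothetical padded value:

```sas
/* hypothetical value with leading, trailing, and repeated blanks; */
/* %str keeps the blanks from being stripped by %let               */
%let padded = %str(   SAS    macro     language   );

/* brackets make the blanks visible in the log */
%put [&padded];
%put [%cmpres(&padded)];   /* [SAS macro language] */
```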
If the argument might contain a special character or mnemonic operator listed below, use %QCMPRES.
& % ' " ( ) + - * / < > = ¬ ^ ~ ; , # blank AND OR NOT EQ NE LE LT GE GT IN
%CMPRES returns an unquoted result, even if the argument is quoted. %QCMPRES returns a result in which the special characters and mnemonic operators listed above are masked, so the macro processor interprets them as text instead of as elements of the macro language:
%let a=15;
%let b=5;
%let sum=%nrstr(%eval(&a + &b));
%put QCMPRES: %qcmpres(&sum);
%put CMPRES: %cmpres(&sum);

The log shows:
QCMPRES: %eval(&a + &b)
CMPRES: 20