Scan function sas

How to use the SAS SCAN Function?

  • Post author:
  • Post category:Base SAS
  • Post comments:0 Comments
  • Reading time:25 mins read

This article thoroughly explores the significant features of the SCAN function in SAS, covering its syntax, application, an array of modifiers, and important factors to keep in mind. It intricately delves into concrete examples such as identifying the nth or penultimate word within a string and parsing elongated character strings with the assistance of DO LOOPS.

By providing numerous code examples, this guide articulates the diverse scenarios of the SCAN function’s effective application. This guide aims to enhance your proficiency in utilizing the SCAN function.

The SCAN function within SAS is a potent tool, purposefully engineered for breaking a string down into distinct words. This function allows users to set their delimiters, which are then used to separate words or characters within a string. Some common examples of delimiters include spaces, commas, or predefined character sets.

The operation of the SCAN function can be succinctly explained as a left-to-right dissection of a string, enabling the retrieval of a specific word within the string. Bear in mind, that the SCAN function is only one amongst an array of SAS functions designed to streamline the control and alteration of string data, distinguishing its properties to prevent confusion in subsequent content.

The SCAN and SCANQ functions are tools primarily used to break down text into individual words. To clarify, SCAN employs a certain default delimiter to separate words differently from SCANQ, which boasts added functionalities to enhance its use. A case to point would be the differing delimiters used in the two functions.

Principally, the variance between the default delimiters in SCAN and SCANQ does hold significance. How so, you may ask. The answer lies in the fact that this discrepancy directly impacts the respective functionality and results of these tools.

For your understanding, let’s shed some light on SCANQ’s “added functionalities”. These added features truly elevate its use, setting it apart from SCAN. But remember, understanding where, when, and how to use the right tool hinges on comprehending the significance of these very distinctions.

Therefore, let’s strive to simplify complex ideas and communicate most straightforwardly and contextually possible.

SAS SCAN Function

In the context of SAS programming, the SCAN function plays a vital role in identifying and extracting specific words from a character string.

This extraction is achieved through the use of specified delimiters which break the string into words. Here, a character string is a set of characters that forms the data, while specified delimiters are the unique symbols or spaces that separate each word in the string.

To better illustrate, let’s consider an example where we have a sentence, and the goal is to extract a certain word.

The SCAN function will use the delimiters to break the sentence into words and extract the required word.

Keep reading as we delve into the various aspects of the SCAN function, its modifiers, usage examples, and important points to remember.

An important feature of the SCAN function in SAS is that the default length of the returned variables is set to 200 characters. This is done to ensure adequate space for most string components. If you’re working with shorter or longer string components, the length can be adjusted according to your specific needs.

For further details on how the length and precision impact on SAS variables, take a look here.

Syntax of the SCAN Function:

SCAN(character-value, n-word <,'delimiter-list'>,<modifiers>)
  • The n-word is the nth “word” in the string.
  • An ‘n’ value greater than the number of words returns a value with no characters.
  • For negative ‘n’ values, the character value is scanned from right to left, and a value of zero is invalid.

List of Modifiers for the SCAN Function

Modifications have the potential to alter the workings of the SCAN function, and any modifier applied will influence the result.

Have a look at the following table which showcases the modifiers you can use with the SCAN function. This table includes a set number of unique modifiers, each defined for your convenience.

After discussing these modifiers, we will explore their practical applications in subsequent sections.

Modifiers Description
a adds alphabetic characters to the list of characters.
b scans backward from right to left instead of left to right, regardless of the sign of the count argument.
c adds control characters to the list of characters.
d adds digits to the list of characters.
f adds an underscore and English letters to the list of characters.
g adds graphic characters to the list of characters. Graphic characters are characters that, when printed, produce an image on paper.
h adds a horizontal tab to the list of characters.
i ignores the case of the characters.
k causes all characters not in the list of characters to be treated as delimiters.
l adds lowercase letters to the list of characters.
m Multiple consecutive delimiters can be specified with the m modifier, and delimiters at the start or end of the string argument refer to zero-length words.
n adds digits, an underscore, and English letters to the list of characters.
o processes the charlist and modifier arguments only once, rather than every time the SCAN function is called.
p adds punctuation marks to the list of characters.
q q modifier is used when you want to ignore delimiters inside substrings enclosed in quotation marks.
r removes leading and trailing blanks from the word that SCAN returns.
s adds space characters to the list of characters (blank, horizontal tab, vertical tab, carriage return, line feed, and form feed).
t trims trailing blanks from the string and charlist arguments.
u adds uppercase letters to the list of characters.
w adds printable (writable) characters to the list of characters.
x adds hexadecimal characters to the list of characters.
List of Modifiers for the SCAN Function

In the context of the SCAN function, a ‘modifier’ refers to a parameter that can change how the function behaves.

To use these modifiers, it is crucial to place them as the fourth element in the function. On the other hand, the third element in the function is reserved for the delimiter, which determines the character that separates your data.

But remember, SAS won’t mistake your modifier for the delimiter! So, if the delimiter is ‘,’ and your modifier is ‘K’, your function would look like this: SCAN(text, count, ‘,’, ‘K’). So, keep this in mind next time you’re using the SCAN function in SAS. It will not treat your modifier as the delimiter, regardless of how you arrange it.

Exploring the Scan Function: How to Identify the nth Word – a Left to Right Approach.

Commencing this examination, we cast our focus onto the SCAN function and the pivotal process of identifying the nth word within a string, counting from left to right.

The concept of the ‘nth word’ is integral to understanding the SCAN function’s modus operandi: based on the value of ‘n’ entered, the function transverses the string from left to right, returning the ‘n’ numbered word as the output.

Let’s think of a string as a row of standing dominoes; with the SCAN function, we are counting them from left to right. For example, in the string “The cat sat on the mat”, if we input ‘3’ as ‘n’ in the SCAN function, the output would be “sat” – the third word from the left.

The beauty of the SCAN function lies in its dynamic nature – changing the input for ‘n’ will yield different words, offering flexible control over the function’s output. Exploring this further, employing ‘2’ in place of ‘3’ in our previous example would output “cat” instead of “sat”.

data _null_;
text = "Kenny Green flies brown kites";
third_word = scan(text,3);
put third_word=;
run;

Output:

fifth_word=flies

Before the code, let’s understand the task at hand. The ‘scan’ function is used here to extract the third word from the string “Kenny Green flies brown kites”.

Each line of code performs a specific function, generously commented on for explanation. On running this code, the output will print the third word in the sentence.

To conclude, the SCAN function operates as a word-by-word scanner, offering a streamlined way to identify and return the nth word of a string when counting from left to right. Remember, the power of this function rests largely on the value of ‘n’, which guides the scanning process.

Using the Scan function to find the Second Last Word – Right to Left Scan

In the SAS SCAN function, we can utilize a negative count to select words from right to left within a string. Specifically, a count of -2 will select the second to last word in our string.

To elaborate, let’s consider our TEXT variable. To extract the word “brown”, we use a count of -2. A simple example of this is demonstrated below:

This code, when executed, will select and display the word “brown” from our TEXT variable. The output, therefore, will be the specific word ‘brown’, being the second to last word in our string. This practical application aims to connect the theory of the SAS SCAN function with its output.

data _null_;
text = "Kenny Green flies brown kites";
second_last_word = scan(text,-2);
put second_last_word=;
run;

Output:

second_last_word=brown

The SCAN function in SAS can be used in various ways, particularly when it comes to reading character strings. One method you can utilize is reading directly from right to left. This becomes particularly useful if you’re attempting to extract the last word from a string.

To achieve this, the SCAN function can be directed to read from the end with the use of negative values. To illustrate, here is a simple code example: SCAN(text,-1). This approach simplifies the extraction of significant data points from strings. Remember to specify the direction of the scan clearly in your code for maximum clarity.

Alternatively, you can use the “b” modifier available with the SCAN function rather than using a negative count. By specifying the “b” argument with the SCAN function, you can tell SAS to read from right to left instead of the default left to right.

data _null_;
    text="Kenny Green flies brown kites";
    second_last_word=scan(text,2," ","b");
    put second_last_word=;
run;

Points to remember while using the SCAN Function:

  • If the length is not defined previously, it defaults to 200 bytes for the created variable.
  • missing value is returned if fewer than n words are in the string.
  • delimiters at the beginning or end of the string argument are ignored
  • Two or more delimiters that are contiguous are handled as a single delimiter.
  • When the SCAN function is used, Any character or set of characters can serve as delimiters.
  • If n is negative, SCAN selects the word in the character string starting from the string’s end.

Handling Different Word Delimiters in the SCAN Function

The default word delimiter for the SCAN function is the space, but the SCAN function still works even with commas as the delimiter.

data _null_;
text = "Kenny,Green,flies,brown,kites";
third_word = scan(text,3);
put third_word=;
run;

Output:

third_word=flies

The third argument of the SCAN function is designed for specifying a unique delimiter. This comes into play when your data set includes a separator between words that aren’t included in the default list of delimiters.

To illustrate this, if your data contains a unique character such as ‘#’, it can be set as the custom delimiter utilizing the third argument. Remember, it’s crucial to surround your custom delimiter with quotes in the SCAN function for it to function correctly.

In the section dealing with different word delimiters in the SCAN function of character strings, we consider the case of using a plus sign (+). When adopting the plus sign (+) as your delimiter, it’s important to enclose it in quotations, serving as the third argument to the SCAN function.

In practical terms, it looks something like this: scan(character_string, 1, “+”); where the plus sign (+) is passed as the third argument. Making this adjustment ensures an accurate interpretation of your string by the SCAN function.

One of the key functions in SAS (Statistical Analysis System) is SCAN which uses special delimiters. The SCAN function separates a string into several parts based on specified delimiters. Below, we have a range of special characters often utilized as delimiters

blank ! $ % & ( ) * + , – . / ; < ^ :

You can use the third argument with the SCAN function to specify a custom delimiter when your data contains a delimiter between words that aren’t listed in the default list.

For example, if the words in your character string have a plus sign (+), you must enclose the plus sign in quotations as the third argument to the scan function.

The syntax below demonstrates how to select the fifth word from a plus sign delimited character string:

data _null_;
text = "Kenny+Green+flies+brown+kites";
fifth_word = scan(text,5,"+");
put fifth_word=;
run;

Output:

fifth_word=kites

How to handle multiple Delimeters in the SCAN Function?

SAS will use not just one but all delimiters in the default list by default. Therefore, the result might not be expected when your data contains multiple delimiters.

So, you may also want to force SAS to use only one of the default delimiters in some cases. In the below dataset, the names variable contains a list of first, last, and middle names.

Multiple Delimeters in the SCAN Function
Multiple Delimeters in the SCAN Function

The structure is as follows: <last name><blank><middlename><comma><firstname>. You would like to create a first name from this data.

Because commas and spaces are default delimiters, if we run the SCAN function without specifying a delimiter in firstname1, SAS will use “Slyke” as the first word.

To correct this, we can tell SAS only to use the comma as a delimiter so that “Van Slyke” will become the last name and Andy will be the given name:

data one;
    input names $25.;
    firstname1=scan(names,2);
    firstname=scan(names,2,",");
    datalines;
    Van Slyke, Andy 
    Thomas,Andres
    Robidoux, Billy Jo
    Mr. Bruce,Brenly
    Bob,Horner
    ;
run;

Output:

Multiple Delimeters in the SCAN Function
Multiple Delimeters in the SCAN Function

We receive the desired result in our output data with “Andy” in the firstname variable now that the blanks are no longer regarded as delimiters and just the commas are.

Using SCAN with DO LOOPS to Parse Long Character Strings

When combined with a simple DO LOOP and SAS, the SCAN function makes it easy to parse each word from a character string into separate variables.

In the below example dataset, you would like to parse out each word from the letters variable into five separate variables.

SCAN with DO LOOPS to Parse Long Character Strings
SCAN with DO LOOPS to Parse Long Character Strings
data letter;
    input letter $50.;
    call symputx('count', count(letter, ",") + 1);
    datalines;
A,B,C,D,E,F,G
;
run;

A macro variable is created within the data step to hold the count of letters. This macro variable is later used to define the array elements dynamically.

For more details, see our guide on Creating macro variables from the SAS dataset and Essential guide on using Arrays in SAS

The code below uses a DO LOOP to scan the letters variable and then create the variables letter1 to letter7.

data one;
    set letter;
    array letters[&count] $15 letter1-letter&count;
    do i=1 to dim(letters);
        letters[i]=scan(letter, i, ", ");
    end;
    drop i;
run;

Output:

SCAN with DO LOOPS to Parse Long Character Strings
SCAN with DO LOOPS to Parse Long Character Strings

As you can see in the output data shown partially below, we now have five new MODEL variables, with one word per variable:

Separating the comma-separated list horizontally using the SCAN Function

Sometimes you may need to create separate rows from one value in a column. For example, consider a single variable holding comma-separated email ids.

In this scenario, you can use the SCAN function, where each email ID can be separated as a single row.

data email_list;
	infile datalines truncover;
	input emails $100.;
	datalines;
[email protected],[email protected],[email protected],[email protected],[email protected]
;
run;
Comma sepearted email iDS
Comma Separate email IDs
data email_list2(keep=new_emails);
	set email_list;

	do i=1 by 1 while (scan(emails, i, ',') ^=' ');
		new_emails=scan(emails, i, ',');
		output;
	End;
run;

Output:

Separating the comma-separated list horizontally using SCAN Function
Separating the comma-separated list horizontally using the SCAN Function

SCANQ Function

SCANQ is also used to extract a specified word from a character expression like the SCAN function.

Syntax:

SCANQ(character-value, n-word <,'delimiter-list'>)

Differences between SCAN and SCANQ

Below are some of the differences between SCAN and SCANQ

  1. Default Delimeter set – If you don’t specify a delimiter, SCANQ will utilize white space characters as default delimiters (blank, horizontal and vertical tab, carriage return, line feed, and form feed).
  2. A value of 0 for the word count does not result in an error message. when you use the SCANQ
  3. SCANQ also ignores delimiters enclosed in quotation marks.

Example:

data _null_;
    text="Kenny,'Green,flies',brown,kites";
    third_word1=scan(text, 3);
    third_word2=scan(text, 3,",","q");
    third_word3=scanq(text, 3,",");
    put third_word1=third_word2=third_word3=;
run;

Output:

third_word1=flies' third_word2=brown third_word3=brown

If the delimiter is a comma, you have to use the delimiter argument in the SCANQ function.

The q argument can be used with the SCAN function to ignore the delimiter inside the quotation mark.

Difference between SCAN and SUBSTR Function

SCAN is used to extract words from a list of words that are separated by delimiters. SUBSTR is used to extract a part of the word by specifying a starting location and the part’s length.

When to use the SCAN function?

Use the scan function when you know the order of the words in the character value.
When you know that the starting position of the word varies.
Some delimiter separates the words.

So, this was our guide on the SCAN function. We hope that you have found it helpful. Do you have any tips to add? Let us know in the comments.

Thanks for reading!

Every week we'll send you SAS tips and in-depth tutorials

JOIN OUR COMMUNITY OF SAS Programmers!

Subhro

Subhro provides valuable and informative content on SAS, offering a comprehensive understanding of SAS concepts. We have been creating SAS tutorials since 2019, and 9to5sas has become one of the leading free SAS resources available on the internet.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.