Working with missing values is one of the most common tasks for a SAS programmer. SAS provides many techniques and tools for handling them, and knowing these will help you work with missing values more efficiently.
The two fundamental missing value types are numeric and character. A missing character value is represented by a blank (' '), while a missing numeric value is represented by a dot (.).
Note: For numeric variables, the dot (.) should not be quoted in an assignment statement. A quoted dot ('.') is a character string, so the assignment either creates a new character variable or, if the variable is already numeric, triggers a character-to-numeric conversion.
/* Character value: a quoted dot */ age = '.'; /* Numeric missing value */ age = .;
In comparisons and sorting, numeric missing values behave like minus infinity: they are smaller than any non-missing value.
If you perform any arithmetic operations on a missing value, it will result in a missing value. However, missing values will be ignored during calculations performed using numeric functions such as SUM, MEAN, etc.
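The difference between operators and functions can be seen in a minimal sketch like the one below. The variable names x, y, total_op, total_fn and average are illustrative assumptions, not part of the examples in this article.

```sas
data _null_;
    x = 10;
    y = .;                    /* ordinary numeric missing value */
    total_op = x + y;         /* operator: the result is missing */
    total_fn = sum(x, y);     /* SUM ignores the missing argument: 10 */
    average  = mean(x, y);    /* MEAN uses only non-missing arguments: 10 */
    put total_op= total_fn= average=;
run;
```

When an operator produces a missing result this way, SAS also writes a note to the log saying that missing values were generated.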
Special Missing Values in SAS
A period, or dot, is commonly used to represent a missing value for a numerical variable. Apart from the dot, SAS can store 27 special missing values in numerical variables.
They are the dot-underscore (._) and the dot-letters (.A through .Z). Note that these special values are case insensitive: .A is the same as .a, .B is the same as .b, and so on.
If you do not begin a special numeric missing value with a period, SAS treats it as a variable name.
Therefore, to use a special numeric missing value in a SAS expression or assignment statement, you must begin the value with a period, followed by the letter or underscore. For example, write x = .A; rather than x = A;.
When printing a special missing value, SAS prints only the letter or underscore, without the period. When the raw data contain bare characters in numeric fields that you want SAS to interpret as special missing values, use the MISSING statement to specify those characters.
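As a hedged illustration of the MISSING statement and of special missing values in expressions, the sketch below assumes a hypothetical survey file in which R means "refused" and X means "not applicable". The dataset and variable names are made up for this example.

```sas
data survey;
    /* Assumption: bare R and X in the numeric field are special missing codes */
    missing R X;
    input id answer;
    datalines;
1 5
2 R
3 X
4 .
;
run;

data survey2;
    set survey;
    length reason $ 14;
    /* In code, the special missing value must start with a period */
    if answer = .R then reason = 'Refused';
    else if answer = .X then reason = 'Not applicable';
run;

proc print data=survey2;
run;
```

In the printed output, the answer column shows R and X without the leading period, and a dot for the ordinary missing value.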
Order of Missing Values
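Among the numeric missing values, the dot-underscore (._) sorts first, followed by the ordinary missing value dot (.), and then the special missing values .A through .Z. All of them sort before any non-missing number.
|Symbol|Description|
|._|underscore (smallest)|
|.|period|
|.A-.Z|special missing values A (smallest) through Z (largest)|
|-n, 0, +n|non-missing numbers, from negative to positive (largest)|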
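To check the ordering on your own system, a small sketch such as the following can be used. The dataset name order_demo and its single variable are assumptions made for this example.

```sas
data order_demo;
    /* Deliberately unsorted mix of missing and non-missing values */
    value = 5;   output;
    value = .Z;  output;
    value = .;   output;
    value = -3;  output;
    value = .A;  output;
    value = ._;  output;
run;

proc sort data=order_demo;
    by value;
run;

proc print data=order_demo;
run;
```

After the sort, the printed rows appear from smallest to largest, with all missing values listed ahead of -3 and 5.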
Detecting Missing Values
There are several techniques and functions available in SAS to detect missing values. The example dataset below contains ordinary numeric, special numeric, and character missing values.
data test;
infile datalines truncover;
input a b c d $;
datalines;
1 2 3 A
. 4 .
3 4 .a c
. . .
4 6 8 E
.s .v .z F
5 1 .f
;
run;
To check for the ordinary numeric missing value, you can use the IF statement below. Note that this comparison matches only the dot (.), not the special missing values.
if a=. then put "Missing";
To check for all 28 numeric missing values (._, ., and .A through .Z), including the special missing values, use the following code. It works because .Z is the largest missing value.
if a <= .Z then put "Missing";
To check for character missing values, you can use the IF statement below on a character variable such as d.
if d = ' ' then put "Missing";
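The three checks can be combined in one DATA step. The sketch below is self-contained and builds its own small dataset. The names sample, check, a_status and d_is_blank are illustrative assumptions.

```sas
data sample;
    length d $ 1;
    a = 1;   d = 'A'; output;
    a = .;   d = ' '; output;
    a = .A;  d = 'c'; output;
run;

data check;
    set sample;
    length a_status $ 16;
    if a = . then a_status = 'ordinary missing';          /* the dot only */
    else if a <= .Z then a_status = 'special missing';    /* ._ or .A-.Z */
    else a_status = 'present';
    d_is_blank = (d = ' ');    /* 1 when the character value is missing */
run;

proc print data=check;
run;
```

The order of the tests matters here: the ordinary dot is checked first, so the second branch only catches the special missing values.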
MISSING= System Option
With the MISSING= system option, you can specify the character to print for missing numeric values. Only a single character can be specified, and it replaces the default dot in printed output. Single or double quotation marks around the character are optional. The MISSING= system option does not apply to special missing values such as .A and .Z.
In the example below, M is printed instead of the default dot (.).
options missing='M'; data test3; set test; run; proc print data=test3; run;
Even after you change the printed character, the stored values are unchanged, so you still use . (dot) and ' ' in your code to filter or perform any operations on missing values.
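The following sketch shows both points together: the option changes only what is printed, and filters are still written with the dot. The dataset demo and the variable x are assumptions for this example.

```sas
options missing='M';

data demo;
    x = 1;   output;
    x = .;   output;
    x = .A;  output;
run;

/* The ordinary missing value prints as M; .A still prints as A */
proc print data=demo;
run;

/* Filtering still uses the dot, not the display character */
proc print data=demo;
    where x = .;
run;

/* Restore the default display character */
options missing='.';
```

Resetting the option at the end avoids surprising output in later steps of the same session.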
Functions that handle MISSING values
MISSING Function in SAS
The MISSING function accepts either a character or a numeric argument and returns 1 if the argument contains a missing value; otherwise, it returns 0. It detects ordinary numeric, character, and special missing values.
data test2; set test; missing=missing(a); run;
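NMISS Function in SAS
The NMISS function returns the number of missing values in a list of numeric arguments, counting special missing values as well.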
data test2; set test; nmiss=nmiss(a, b, c); run;
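CMISS Function in SAS
The CMISS function counts the missing values in a list of arguments that can be numeric, character, or a mix of both.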
data test2; set test; nmiss=nmiss(a,b,c); cmiss=cmiss(a,b,c); run;
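The practical difference between the two functions shows up when a character variable is involved. The sketch below uses made-up variables (x, y, z, name) to illustrate it.

```sas
data _null_;
    length name $ 8;
    x = 1;
    y = .;
    z = .A;
    name = ' ';
    num_missing = nmiss(x, y, z);          /* 2: y and z are missing */
    all_missing = cmiss(x, y, z, name);    /* 3: y, z and name are missing */
    put num_missing= all_missing=;
run;
```

NMISS expects numeric arguments, so passing a character variable to it forces a character-to-numeric conversion. CMISS avoids that, which is why it is the usual choice for mixed lists.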
N Function in SAS
The N function returns the number of non-missing values in a list of numeric arguments.
data test2; set test; nmiss=nmiss(a,b,c); cmiss=cmiss(a,b,c); non_missing = n(a,b,c); run;
COALESCE Function in SAS
The COALESCE function returns the first non-missing value from a list of numeric arguments.
data test2; set test; nmiss=nmiss(a,b,c); cmiss=cmiss(a,b,c); non_missing = n(a,b,c); first_non_miss = coalesce(a,b,c); run;
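A common use of COALESCE is to supply a fallback value. The sketch below also uses COALESCEC, the character counterpart of COALESCE, which is not covered in this article. All variable names are hypothetical.

```sas
data _null_;
    length home_phone work_phone contact $ 12;
    bonus = .;
    commission = 250;
    /* First non-missing numeric value, with 0 as the last resort */
    payout = coalesce(bonus, commission, 0);                  /* 250 */

    home_phone = ' ';
    work_phone = '555-0100';
    /* COALESCEC does the same for character values */
    contact = coalescec(home_phone, work_phone, 'unknown');   /* 555-0100 */
    put payout= contact=;
run;
```

Putting a constant at the end of the argument list guarantees that the result is never missing.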
CALL MISSING Routine in SAS
With the CALL MISSING routine, you can explicitly set one or more variables to missing. It accepts numeric and character variables in the same call.
data test3; set test; if _n_ = 4 then call missing(a,b,c,d); run;
Missing values in SAS procedures
PROC FREQ: By default, PROC FREQ excludes missing values from the frequency table and calculates percentages from the non-missing values only. To treat missing values as a valid level, so that they are included in the table and in the percentage calculations, use the MISSING option on the TABLES statement.
proc freq data= test; tables a / MISSING; run;
PROC MEANS: PROC MEANS computes statistics on non-missing values only. Use the NMISS option to report the number of missing values for each analysis variable.
Proc Means Data = test N NMISS; Var a -- c ; Run;
Use the MISSING option in PROC MEANS to see the number of observations having a missing value for the classification variable.
data class; set sashelp.class; if age < 14 then call missing(age); run; proc means data=class n nmiss missing; class age; var height weight; run;
Deleting Missing Values
Once you have found the missing values, you may want to remove them as a part of your data cleaning tasks.
How to delete rows with missing numeric values?
data test3; set test; if a = . then delete; run;
If you want to remove all rows that have a missing value in any numeric variable, you can use the NMISS function as shown below.
data test3; set test; if nmiss(of _numeric_) > 0 then delete; run;
How to delete rows with missing character values?
data test3; set test; if d= '' then delete; run;
To delete rows that have a missing value in any character variable, you can use the code below.
if cmiss(of _character_) > 0 then delete;
To delete rows that have a missing value in any variable, character or numeric, you can use the code below.
if cmiss(of _all_) > 0 then delete;
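The one-line statements above belong inside a DATA step. A complete, self-contained sketch might look like the following. The datasets raw and complete_only and their variables are assumptions for this example.

```sas
data raw;
    length d $ 1;
    a = 1;  b = 2;   d = 'A'; output;   /* complete row */
    a = .;  b = 4;   d = 'B'; output;   /* ordinary numeric missing */
    a = 3;  b = .A;  d = 'C'; output;   /* special numeric missing */
    a = 5;  b = 6;   d = ' '; output;   /* character missing */
run;

data complete_only;
    set raw;
    /* Drop the row if any variable, numeric or character, is missing */
    if cmiss(of _all_) > 0 then delete;
run;

proc print data=complete_only;
run;
```

Only the first row survives. An equivalent way to write the filter is the subsetting IF statement if cmiss(of _all_) = 0; which keeps complete rows instead of deleting incomplete ones.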