Working with missing values in SAS is one of the most common tasks for a SAS programmer. There are many techniques and tools associated with using missing values. knowing these techniques will help you to work more efficiently using values.
The two basic types of missing values are numeric missing values and character missing values. Character missing values are represented using a blank (‘ ‘) while numeric missing values with a dot ( . ).
For numeric variables, the dot (.) should not be quoted in an assignment statement for the variable as this will result in creating a new character variable or character to numeric conversion.
age = '.'
age = .
Numeric missing values are essentially minus infinity and are therefore smaller than any non-missing value.
If you perform any arithmetic operations on a missing value it will result in a missing value. Missing values will be ignored during calculations performed using Numeric Functions such as SUM, MEAN, etc.
Special Missing values
In addition to the dot (. ) and blank (” “), there are 28 other types of numeric missing values. These are called as special missing values and are assigned by preceding each of the 26 letters of the alphabet (a through z) as well as the underscore with a period.
Note that these special missing values are case insensitive. That is, .A=.a .B=.b .C=.c etc.
To use a special numeric missing value, you must begin with a period followed by the letter or underscore symbol.
While printing the special missing value, SAS only prints the letter. When data values contain characters in numeric fields that you want SAS to interpret as special missing values, use the MISSING statement to specify those characters.
Order of Missing Values
|A-Z||special missing values A (smallest) through Z (largest)|
Detecting Missing Values
There are several techniques and functions available in SAS to detect missing values. See the example input datasets below which has numeric, special and character missing values.
data test; input a b c d $; infile datalines truncover; datalines; 1 2 3 A . 4 . 3 4 .a c . . . 4 6 8 E .s .v .z F 5 1 .f .t 6 8 H; run;
data test2; set test; if a=. then b="Numeric Missing"; If a <=.z then c="Special Missing Value"; run;
To check for Numeric Missing values you can use the If statement as below.
if a=. then put "Missing";
To check for all 28 numeric missing values (. , ._ , .A through .Z) including numeric missing values, use the following code.
If a <=.z then PUT "Missing";
To check for character Missing values you can use the If statement as below.
if a= ' ' then put "Missing"
MISSING= System Option
MISSING= you can specify the character to print for missing numeric values. You can specify only one character that you want to replace with the default missing values in SAS. Single or double quotation marks are optional. The MISSING= system option does not apply to special missing values such as
In the below example, we have specified M instead of the default dot(.).
options missing='M; data test3; set test; run
Even though if you replace the default missing value, you can still use .(dot) and ‘ ‘ to filter or perform any operations on missing values.
Functions that handle MISSING values
MISSING: It accepts either a character or numeric variable as the argument and returns 1 if the argument contains a missing value else it returns zero. The Missing function can detect numeric, character and even special missing values.
data test2; set test; missing=missing(a); run;
NMISS: It returns the number of missing values in the specified list of numeric variables. Character values will be converted automatically if the argument value is missing.
CMISS: It counts the number of arguments that are missing in its argument list and works for both character and numeric variables without requiring character values to be converted to numeric.
data test2; set test; nmiss=nmiss(a,b,c); cmiss=cmiss(a,b,c); run;
N: This function returns the number of non-missing values in a list of numeric variables.
data test2; set test; nmiss=nmiss(a,b,c); cmiss=cmiss(a,b,c); non_missing = n(a,b,c); run;
COALESCE: This function is used to select the first non-missing value in a list of variables.
data test2; set test; nmiss=nmiss(a,b,c); cmiss=cmiss(a,b,c); non_missing = n(a,b,c); first_non_miss = coalesce(a,b,c); run;
CALL MISSING: With this function, you can explicitly initialize or set a variable value to be missing.
data test3; set test; if _n_ = 4 then call missing(a,b,c,d); run;
Missing Values in SAS procedures
PROC FREQ: In PROC FREQ, percentages are calculated excluding the missing values. If you need to include the total observations that are both missing and non-missing, use the “
/MISSING” option on the tables statement.
proc freq data= test; tables a / MISSING; run;
PROC MEANS : The PROC MEANS procedure also generates statistical data on non-missing values only. Use the
NMISS option to calculate the number of missing values.
Proc Means Data = test N NMISS; Var a -- c ; Run;
To see the number of observations having a missing value for the classification variable, use the
MISSING option in PROC MEANS.
data class; set sashelp.class; if age < 14 then call missing(age); run; Proc Means data = class N NMISS MISSING; Class Age ; Var age -- weight; Run;
Deleting Missing Values
Once you have found out the missing values u may want to remove them as a part of data cleaning tasks.
How to delete numeric values?
data test3; set test; if a = . then delete; run;
If you want to remove ALL Rows with ANY missing values, then you can use the NMISS functions as below.
data test3; set test; if nmiss(of _numeric_) > 0 then delete; run;
How to delete character values?
data test3; set test; if d= '' then delete; run;
To delete all character values you can use the below codes.
if cmiss(of _character_) > 0 then delete;
To delete all character and numeric values you can use the below codes.
if cmiss(of _all_) > 0 then delete;