Modifying SAS Dataset using SAS Modify Statement

Modifying SAS Dataset using SAS Modify Statement

The SAS Modify Statement allows the modification of an existing SAS dataset via a DATA Step without creating a new or replacement dataset. It is faster and more straightforward than alternative SQL approaches when appropriately used.

When you submit a data step with a merge, update, or set statement, and if the output data set already exists, SAS creates a second copy of the output data set. Once execution is complete, SAS deletes the original copy of the data set. As a result, the original data set is replaced by a new dataset with the same name. 

The new data set can contain different variables than the original data set. In addition, the attributes of the variables in the new data set can be different from those in the original data set.

When a data step is used in conjunction with a SAS modify statement, the input and output data sets must be identical. SAS does not create a second copy of the data but updates the data set in place. You can add new variables to the program data vector (PDV), but they are not written to the dataset. 

Therefore, the set of variables in the data set does not change when the data is modified.

When you use the SAS MODIFY statement, there is an implied REPLACE statement at the bottom of the DATA step instead of an OUTPUT statement. 

Using the MODIFY statement, you can update the following: 

  • Every observation in a data set 
  • Observations using a transaction data set and a BY statement
  • Observations located using an index

SAS Modify Statement to Modify All Observations in a SAS Data Set

When all of the observations in a SAS data set need to be changed in the same way, you can use the MODIFY statement and specify the change with an assignment statement.

This example replaces the value of the variable JOBCODE for all observations in payroll2 dataset.

data payroll2;
   modify payroll2;
   jobcode="SCP";
run;

NOTE: The data set “WORK.PAYROLL2” has been updated. There were 12 observations rewritten, 0 observations added and 0 observations deleted.

The following output shows the data before the MODIFY statement.

Modifying SAS Dataset using SAS Modify Statement

The following output shows the data after the MODIFY statement. You can see that the values in the jobcode have been updated.

SAS Modify Statement to Modify All Observations in a SAS Data Set
SAS Modify Statement to Modify All Observations in a SAS Data Set

SAS Modify Statement to Modify Observations Using a Transaction Data Set

While you can use a MODIFY statement to update all observations in a data set, there are situations when only selected observations must be updated. In this situation, you can update a master SAS dataset with values from a transaction data set by using the MODIFY statement in conjunction with a BY statement.

data payroll_update;
	input IdNum $ Jobcode $;
	datalines;
1639 SCP
1221 SCP
1350 PT1
;
run;

data payroll2;
	modify payroll2 payroll_update;
	by IdNum;
run;
SAS Modify Statement to Modify Observations Using a Transaction Data Set

The BY statement matches observations from the transaction data set to those from the master data set.

When the MODIFY statement reads an observation from the transaction data set, it uses dynamic WHERE processing (SAS generates a WHERE statement internally) to find the matching observation in the master data set.

The matching observation in the master data set can be updated, deleted, or appended. The observation is automatically replaced by default.

Note: It is important to note that neither data set requires sorting because the MODIFY statement employs WHERE processing to find matched observations. However, in the case of large files, it is preferable to have both the master and transaction data sets sorted or indexed.

What happens when a match is not found in the master dataset?

data payroll_update;
	input IdNum $ Jobcode $;
	datalines;
1639 SCP
1221 SCP
1350 PT1
1499 PT1
;
run;

data payroll2;
	modify payroll2 payroll_update;
	by IdNum;
run;
ERROR: The TRANSACTION data set observation does not exist on the MASTER data set.
ERROR: No matching observation was found in MASTER data set.
idnum=1499 gender= jobcode=PT1 salary=. birth=. hired=. FIRST.idnum=1 LAST.idnum=1 _ERROR_=1 _IORC_=1230013 _N_=4
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1 observations read from the data set WORK.PAYROLL2.
NOTE: The data set WORK.PAYROLL2 has been updated. There were 3 observations rewritten, 0 observations added and 0 observations
deleted.
NOTE: There were 4 observations read from the data set WORK.PAYROLL_UPDATE.
data payroll_update;
	input IdNum $ Jobcode $;
	datalines;
1639 SCP
1221 SCP
1350 PT1
1499 PT1
;
run;

data payroll2;
	modify payroll2 payroll_update;
	by IdNum;
	if _iorc_ = 0 then replace;
run;
idnum=1499 gender= jobcode=PT1 salary=. birth=. hired=. FIRST.idnum=1 LAST.idnum=1 _ERROR_=1 _IORC_=1230013 _N_=4
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1 observations read from the data set WORK.PAYROLL2.
NOTE: The data set WORK.PAYROLL2 has been updated. There were 3 observations rewritten, 0 observations added and 0 observations deleted.
NOTE: There were 4 observations read from the data set WORK.PAYROLL_UPDATE.

Handling Duplicate Values

When you use the SAS MODIFY statement and BY statements to update a data set, WHERE processing begins at the top of the master data set, finds the first match, and updates it. It would help if you also understood the implications of having duplicate values in the master or transaction data sets.

When duplicate values of the BY variable exist in the master dataset,

If the master data set has duplicate values for the BY variable, only the first observation in the group of duplicate values is updated.

SAS Modify Statement

When duplicate values of the BY variable exist in the transaction dataset

Assume the transaction dataset contains duplicate BY variable values. In that case, the transactions overwrite, with only the most recent transaction in the group being updated in the master dataset.

Modifying SAS Dataset using SAS Modify Statement

Handling Missing Values

If the transaction data set contains missing values, SAS does not replace them with missing values in the master data set unless they are special missing values.

Note: A special missing value is a form of missing numeric value that enables you to specify several types of missing data using the alphabetic characters A-Z or an underscore. The MISSING statement in the DATA step is used to specify special missing values. Additional information is available in SAS Missing Values: Everything You Need to Know.

The UPDATEMODE= option in the MODIFY statement allows you to specify how missing values in the transaction dataset are handled.

MODIFY master-data-set transaction-data-set
UPDATEMODE=MISSINGCHECK | NOMISSINGCHECK;

The following provides an explanation of the syntax:

Master-data-set: It is the SAS data set’s name that you wish to modify.

Transaction-data-set: The SAS data set name contains the modified values.

MISSINGCHECK: It prevents missing values in the transaction data set that would be used to replace values in the master dataset unless the missing values are special. The default value is MISSINGCHECK.

NOMISSINGCHECK: It replaces missing values in the transaction data set with values from the master data set. Special missing values in the transaction data set continue to override the values in the master data set.

If you want the resulting value in the master data set to be a regular missing value, use a single underscore (_) in the transaction data set to represent missing values. In the master data set, the resulting value is a period (.) for missing numeric values and a blank for missing character values.

data payroll_update;
	input IdNum $ gender $ Jobcode $ Salary Birth hired;
	informat birth hired date7.;
	format birth hired date7.;
	datalines;
1639 M TA3 . 13DEC55 17NOV93
1221 M TA2 . 12SEP66 04JUN87
1350 F FA3 33000 31AUG55 29JUL91
;
run;
proc sort data=payroll_update;
	by idnum;
run;
data payroll2;
	modify payroll2 payroll_update updatemode=nomissingcheck;
	by idnum;
run;

Modifying Observations Located by Observation Number

The POINT= option in the Modify statement allows you to name a variable from another data source whose value is the number of the observation in the master data set that you want to modify.

The SAS MODIFY Statement retrieves observations from the data set that you are modifying using the values of the POINT= variable.

It is good programming practice to validate the value of the POINT= variable and to check the status of the automatic variable ERROR.

When using POINT=, be cautious since failure to terminate the DATA step might lead the DATA step to enter a continuous loop. Use a STOP statement to check for an invalid POINT= variable value.

This example reads the data set payroll_update, determines which observation number in paylist2 to update based on the value of OBS, and performs the update. This example explicitly specifies the update activity by using an assignment statement to replace the value of jobcode with the value of newjobcode.

data payroll_update;
	input obs newJobcode $;
	datalines;
1 SCP
2 SCP
3 PT1
;
run;

data paylist2;
	set payroll_update;
	modify paylist2 point=obs nobs=max_obs;

	if _error_=1 then
		do;
			put 'ERROR occurred for TOOL_OBS=' obs / 'during DATA step iteration' _n_ / 
				'OBS value might be out of range.';
			_error_=0;
			stop;
		end;
	jobcode=newJobcode;
run;

Below is the Note in the SAS log, which confirms that 3 observations in the master dataset has been updated.

NOTE: The data set WORK.PAYLIST2 has been updated. There were 3 observations rewritten, 0 observations added and 0 observations deleted.

Modifying Observations Located by Observation Number
Modifying Observations Located by Observation Number

Modifying Observations Located by an Index

You have learned to use a BY statement to access values that you want to update in a master data set by matching. When you have an indexed master data set, you can use the index to access the observations that you want to update directly. To do this, you use the following statements:

  • Using the SAS MODIFY statement with the KEY= option to name an index to identify the observations for updating.
  • A SET or INPUT statement that reads a transaction data set containing variables with similar names and values specified in the index.

Updating with an index is different from updating using a BY statement. When you use the MODIFY statement with the KEY= option to name an index, the following must occur:

  • You must explicitly specify the update that you want to happen.
  • Each observation in the transaction data set must have a matching observation in the master data set. 
proc datasets library=work;
	modify payroll2;
	index create idnum / unique;
	run;

data payroll_update;
	input IdNum $ newJobcode $;
	datalines;
1639 SCP
1221 SCP
1350 PT1
;
run;

data payroll2;
	set payroll_update;
	modify payroll2 key=idnum;
	jobcode=newJobcode;
run;
results of Updating the Jobcode Field By Using an Index
Results of Updating the Jobcode Fields By Using an Index

Controlling the Update Process

When the DATA step contains a SAS MODIFY statement, SAS writes the current observation to its original place in the SAS dataset. The REPLACE statement at the bottom of the DATA step implicitly performs this action.

Use the OUTPUT, REPLACE or REMOVE statement to override this behaviour.

OUTPUT adds the current observation to the end of the dataset.
REPLACE writes the current observation to the exact location in the data set.
REMOVE removes the current observation from the data set.

Example:

If the SAS data set transaction Donors has a variable named type that has values of AB, and representing blood group types, you can submit the following program to do one of the following:

  • delete rows where the value of type is B
  • update rows where the value of type is A
  • Append rows where the value of type is O.
data donors_update;
	input id units_update;
	datalines;
1129 48	
1129 50	
1129 57	
2304 16
2486 63	
;
run;

proc datasets library=work;
	modify donors;
	index create id;
	run;

data donors;
	set donors_update;
	modify donors key=id;
	units=units_update+5;

	if type='B' then
		remove;
	else if type='A' then
		replace;
	else if type='O' then
		output;
RUN;

NOTE: The data set WORK.DONORS has been updated. There were 3 observations rewritten, 1 observation added and 1 observation deleted.

Modifying SAS Dataset using SAS Modify Statement
Donors Dataset before Update
Modifying SAS Dataset using SAS Modify Statement
Donors Dataset after Update

Using IORC with %SYSRC

When you use the SAS MODIFY statement with a BY statement or KEY= option to update a data set, error checking is essential. Error checking enables you to perform updates or not, depending on the outcome of the I/O condition.

The automatic variable IORC (Input Output Return Code) is created when you use the MODIFY statement with the BY statement or KEY= option. The value of IORC is a numeric return code that indicates the status of the most recently executed I/O operation.

SYSRC autocall macro and the IORC automatic variable to control I/O condition. This technique helps prevent unexpected results that could go undetected.

When you use %SYSRC, you can check the value of IORC by specifying one of the mnemonics listed below.

Mnemonic Meaning
_DSENMR The observation in the transaction data set does not exist in the master data set (used only with the MODIFY and BY statements).
_DSEMTR Multiple transaction data set observations do not exist in the master data set (used only with the MODIFY and BY statements).
_DSENOM No matching observation (used with the KEY= option).
_SOK The observation was located.

Example

Suppose you use the MODIFY statement with the KEY= option to update a SAS data set. The observation is updated when IORC has the value SOK in the program below. However, when _IORC has the value DSENOM, no matching observation is found, so the observation is appended to the data set by the OUTPUT statement, and _ERROR is reset to 0 in the do group.

data donors_update;
	input id units_update;
	datalines;
1129 52	
2904 24	
;
run;

proc datasets library=work;
	modify donors;
	index create id;
	run;

data donors;
	set donors_update;
	modify donors key=id;

	if _iorc_=%sysrc(_sok) then
		do;
			units=units_update;
			replace;
		end;
	else if _iorc_=%sysrc(_dsenom) then
		do;
			units=units_update;
			output;
			_error_=0;
		end;
run;

NOTE: The data set WORK.DONORS has been updated. There were 2 observations rewritten, 1 observation added and 0 observation deleted.

Controlling the Update Process

The difference in UPDATE and MODIFY statements in SAS

When we MODIFY a SAS data set, we replace observations or parts of them in an existing data set. Modifying a SAS data set is similar to updating a SAS data set, but the following differences exist:

  • Modifying cannot create a new data set, while updating can. Though it is not a limitation, it results in less disk space being used.
  • Unlike updating, modifying does not require that the master data set or the transaction data set be sorted.
  • Modify statement does not give an error if duplicate BY values exists in the master or transaction dataset, Unlike Merge and UPDATE.
  • Modify statement preserves the integrity constraint, but update statement would not keep. So using an update statement, you lose the integrity constraint, and the base data set is overwritten by the other data set.

You can refer to this  SAS website, which summarizes the differences between MERGE, MODIFY and UPDATE

Summary:

Update and Modify have a lot of things in common but have distinct differences. We hope this post will help you get the hang of the Modify Statement.

The following rules always apply when duplicate values exist:

  • If there are multiple observations with the same value for the BY variable in the master data set, MODIFY changes only the values for the first occurrence. For UPDATE statement is that processing does not continue if duplicate values of the BY variable are detected in the master data set.
  • Suppose there are multiple observations with the same value for the BY variable in the transaction data set. In that case, MODIFY performs operations consecutively, in a manner simi1ar to that of the UPDATE statement.

Download the SAS code used in this post from here.

Thanks for reading!

Do you have any tips to add? Let us know in the comments?

Please subscribe to our mailing list for weekly updates. You can also find us on Instagram and Facebook.

Every week we'll send you SAS tips and in-depth tutorials

JOIN OUR COMMUNITY OF SAS Programmers!

Subhro

Subhro provides valuable and informative content on SAS, offering a comprehensive understanding of SAS concepts. We have been creating SAS tutorials since 2019, and 9to5sas has become one of the leading free SAS resources available on the internet.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.