Skip to content
9to5sas
  • Index
  • Glossary
Facebook Twitter Instagram Email RSS

  • Start Here
  • Base SAS
  • Advanced SASExpand
    • SAS Macros
    • PROC SQL
  • SAS/STATSExpand
    • SAS Analytics
    • Statistics
  • SAS Programs
9to5sas

9to5sas » Base SAS » Proc Sort Options in SAS

Proc Sort Options in SAS

BySubhro Posted onJanuary 26, 2020January 13, 2023 Last Updated:January 13, 2023
0 Comments

Sorting the data is always a resource-intensive operation. Therefore, using PROC SORT efficiently can save you both time and computing resources.
There are a number of options associated with PROC SORT that can be used not only to control the performance and capabilities of the procedure but also to the resulting data set.

Proc Sort Options in SAS

Sorting the data is always a resource-intensive operation. Therefore, using the PROC SORT procedure efficiently can save you both time and computing resources.

There are a number of options associated with PROC SORT that can be used to control the performance and capabilities of the procedure as well as the resulting data set.

Page Contents

  • The NODUPREC Option in Proc Sort
  • NODUPKEY and DUPOUT options
  • NOUNIQUEKEY and UNIQUEOUT options
  • OVERWRITE
  • PRESORTED
  • Maintaining the Relative Order of Observations in Each BY Group
    • EQUALS Option
    • NOEQUALS Option
  • Handling of observations with duplicate keys
  • SORTSEQ

The NODUPREC Option in Proc Sort

The NODUPREC option (as well as its aliases NODUPLICATES and NODUP) is used far too often.

You might think that using this option will remove all duplicate observations, and although this is what it nominally is supposed to do, it does not necessarily cause PROC SORT to remove all duplicate observations.

In fact, it only removes duplicate observations that are adjacent after sorting.

When the sorting process results in a data set in which duplicate observations are not next to each other (they do not come one after the other sequentially), they will not be detected and the duplicate observation(s) will not be removed.

data SortExample;
input Subject visit labdt:mmddyy8. sodium chloride;
format labdt mmddyy8.;
datalines;
200 1 07/06/2006 140 103
200 2 07/13/2006 144 106
200 1 07/06/2006 140 103
200 4 07/13/2006 140 103
200 4 07/13/2006 140 103
200 5 07/07/2006 142 104
;
run;
Proc Sort
Input dataset
proc sort data=SortExample out=sorted noduprecs;
by subject;
run;
Proc Sort

The row Observation number 4 is removed, and a NOTE is added to the log. Observation numbers 1 and 3 are also duplicates but are ignored by the option.

NOTE: There were 6 observations read from the data set WORK.SORTEXAMPLE.
NOTE: 1 duplicate observations were deleted.
NOTE: The data set WORK.SORTED has 5 observations and 5 variables.

When key fields in the BY statement are sufficient to form a primary key that can uniquely identify the observations within a BY group, it will make the NODUPREC option work as we would hope that it would.

In the last example, we can get rid of the duplicate record by adding the BY statement to VISIT, LABDT, and SUBJECT.

proc sort data=SortExample out=sorted noduprecs;
by subject visit labdt;
run;

You may also specify the _all_ option, which will remove the duplicate rows.

NOTE: There were 6 observations read from the data set WORK.SORTEXAMPLE.
NOTE: 2 duplicate observations were deleted.
NOTE: The data set WORK.SORTED has 4 observations and 5 variables.

NODUPKEY and DUPOUT options

The NODUPKEY option checks and eliminates observations with duplicate BY values keeping only the first occurrence in the BY group.

proc sort data=SortExample nodupkey;
by visit;
run;
Proc Sort

Note that, all the duplicate records within the same BY group (Visit) have been removed.

When the NODUPREC or the NODUPKEY options are used, the LOG will note when observations are removed.

However, which observations were removed will not be written in the LOG.

If you want to be able to see these observations, the <strong>DUPOUT</strong>= option can be used to save the duplicate observations into a separate data set

proc sort data=SortExample out=sorted nodupkey dupout=dup1;
by visit;
run;

NOUNIQUEKEY and UNIQUEOUT options

The NOUNIQUEKEY option checks and eliminates observations from the output data set that has a unique sort key.

A sort key is unique when the observation containing the key is the only observation within a BY group.

proc sort data=SortExample nouniquekey out=dupsreq uniqueout=uniqreq;
by visit;
run;
Proc Sort
Duplicate Observation from the dupreq dataset

Duplicate records were deleted from the original dataset. You can save the duplicate records in another dataset by using the OUT option.

If option OUT is not used, the original dataset is overwritten and will contain only the duplicate records.

The <strong>UNIQUEOUT</strong>= option can be used with the NOUNIQUEKEY option. UNIQUEOUT= SAS-data-set specifies the output data set for observations that will contain unique records.

Proc Sort
Unique records from UNIQUEOUT option

OVERWRITE

The OVERWRITE option will enable you to delete the input data set before the replacement output data set of the same name is populated with observations.

Using this option can reduce disk space requirements.

PRESORTED

The PRESORTED checks within the input data set to determine whether the sequence of observations is in order before sorting is done.

Use the PRESORTED option when you know or strongly suspect that a data set is already in order according to the key variables that are specified in the BY statement.

Using this option can eliminate the unnecessary overhead of the cost of sorting the data set.

Maintaining the Relative Order of Observations in Each BY Group

EQUALS option specifies the order of the observations in the output data set and it maintains the relative order of the observations from within the input data set to the output data set for observations with identical BY variable values.

NOEQUALS does not necessarily preserve this order in the output data set.

The SORTEQUAL and NOSORTEQUALS are the two system options that control whether the first observation in a group is selected or not.

SORTEQUAL specifies that observations with identical BY variable values are to retain the same relative positions in the output data set as in the input data set.

NOSORTEQUALS says that no resources should be used to control the order of observations in the output data set that have the same value for a BY variable.

EQUALS Option

The EQUALS option sorts the observations by the value of the first variable and maintains the relative order.

To understand this, let’s look at the example below.

data One;
   input ID 1 Number 3-5;
   datalines;
5 421
5 336
1 209
1 564
3 711
3 343
4 212
4 616
;
title "original";
proc print;
Proc Sort Options in SAS
proc sort data=one out=one_sorted equals;
   by ID;
run;
proc print data=one_sorted;
   title 'Sorting with EQUALS Option';
run;
Proc Sort Options in SAS

Using the EQUALS option, which is also the default, the Sort procedure maintains the order of the By variables which is also the first observation in the BY group.

proc sort data=example nodupkey noequals;
 by id;
run;
Proc Sort
Using the NOEQUALS option

NOEQUALS Option

 By default, the order of data in the same by-groups is preserved. The NOEQUALS option sorts the observations by the value of the first variable, but it doesn’t keep the order of the variables’ relative positions.

Proc Sort Options in SAS

Note that sorting with the EQUALS option versus sorting with the NOEQUALS option causes a different sort order for the observations where ID=3.

Handling of observations with duplicate keys

PROC SORT with the NODUPKEY option removes records with identical keys. However, this has been enhanced, and now you have the ability to choose which of the observations is kept, and you can send the duplicates that aren’t kept to a dataset.

The following examples use a dataset containing customer ID numbers and the quantity purchased in a month.

data orders;
input custid qty month $6.;
datalines;
10270 5 JAN
12230 4 JAN
19323 7 JAN
19323 3 FEB
22779 2 FEB
22779 2 FEB
28819 4 FEB
29252 1 FEB
30457 9 MAR
30457 1 APR
30457 3 MAY
30457 2 MAY
31918 1 MAY
31918 1 MAY
30457 2 JUN
;
run;
Proc Sort
Input dataset

Let’s suppose, you need to find customers who have an order quantity of more than one for a given month.

You can proc sort step as below which will sort the data based on custid and month and the unique orders are kept in the dataset. The duplicate observation is sent to a different dataset using the dupout option.

proc sort data=orders out=sort1 nodupkey dupout=dups;
by custid month;
run;
Proc Sort
The contents of dups dataset

Note, the output in the dataset specified in DUPOUT option gives us the 1st records for each BY group. What if we want to have all the duplicate values in the output dataset?

Using, the NOUNIQUE key and UNIQUE out option you can have all the duplicate values as below which is also the output we want. i.e all customers who have more than one order quantity in a month.

proc sort data=orders nouniquekeys out=alldups uniqueout=uniques;
 by custid month;
run;
Proc Sort Options in SAS
The contents of alldups dataset

This combination can be used instead of the First and last dataset variables and in a few lines of code. Below are all the unique records that are stored in the unique dataset.

Proc Sort

SORTSEQ

PROC SORT uses the ‘collating sequence’ to determine the sorted order of values.

There are two commonly used collating sequences, EBCDIC (for mainframe systems) and ASCII (for most other machines running Operating Systems like Windows and UNIX).

You have long been able to select one or the other of these two different collating sequences by specifying the EBCDIC or ASCII options on the PROC SORT statement.

The SORTSEQ option allows you to further refine the way the selected collating sequence is used.

This includes subsets or locals within a national collating sequence. Even without changing the base collating sequence, the SORTSEQ option can be beneficial.

You can get more information on using Linguistic Collation options of Proc sort in the below article.

Linguistic Collation: Everyone Can Get What They Expect

Read: Default Sorting Order of Characters in SAS

Every week we'll send you SAS tips and in-depth tutorials

JOIN OUR COMMUNITY OF SAS Programmers!

Check your inbox or spam folder to confirm your subscription.

Subhro

Subhro Kar is an Analyst with over five years of experience. As a programmer specializing in SAS (Statistical Analysis System), Subhro also offers tutorials and guides on how to approach the coding language. His website, 9to5sas, offers students and new programmers useful easy-to-grasp resources to help them understand the fundamentals of SAS. Through this website, he shares his passion for programming while giving back to up-and-coming programmers in the field. Subhro’s mission is to offer quality tips, tricks, and lessons that give SAS beginners the skills they need to succeed.

Facebook Twitter Linkedin

Post navigation

Previous Previous
A comprehensive guide to PROC SQL in SAS (15 + Examples)
NextContinue
How to transpose a SAS dataset using the Proc Transpose procedure

SAS Tips in your inbox

Subscribe to 9to5sas for timely SAS tips and news delivered each month.
Learn about the latest articles, and code samples to keep your SAS skills fresh!

Your subscription is successful!

Recent Posts

  • Concatenate strings in SAS: The CAT Functions Demystified
  • 5 Techniques for Quickly Removing Leading Zeros in SAS
  • Troubleshoot Your Proc SQL Code Like a Pro with These 7 Automatic Macro Variables
  • 7 PROC SQL Options You Should Use to Debug Queries
  • How To Use The SAS Proc SQL Order By Statement?
9to5sas
  • Privacy Policy
  • Disclaimer
  • About
  • Contact
Facebook Twitter Instagram RSS Email Pinterest

Copyright © 2023 9TO5SAS, All rights reserved.

Scroll to top
  • 9to5sas Blueprint
  • About
  • About
  • Acceptable use policy
  • calculator
  • Confirm email
  • Contact
  • Contact
  • Cookie Policy
  • DISCLAIMER
  • Getting Started with SAS
  • Glossary
  • Index
  • Post #13801
  • Privacy Policy
  • Privacy policy
  • SAS Programs
  • Styles
  • Subscription confirmed
  • Terms and conditions
  • Thank You