SAS Index

Using Index in SAS to speed up programs

An index in SAS is used to sort your data without physically sorting it logically. If you are sorting and then re-sorting data to accomplish merges, you may find indexes useful.

Creating an Index in SAS

Indexes are created using PROC DATASETS or PROC SQL; they can be created in a DATA step.

Indexing is storing the order of the data like physically sorting If there is an index for a dataset, SAS will be able to access it and allows us to use the data set with the appropriate BY statement without sorting the data.

SAS Indexes are stored in a separate file, and the size of this file can be substantially large, especially as the number of indexes, observations, and variables used to form the indexes are more. However, Indexes can substantially speed up processes.

Creating an index using proc datasets

PROC DATASETS LIBRARY=libref;
MODIFY SAS-data-set;
INDEX CREATE varlist / UNIQUE NOMISS
UPDATECENTILES = ALWAYS | NEVER | integer;
  • libref is the SAS data library that is to be modified
  • INDEX CREATE statement is the list of variables for which you want to create an index.
  • The UNIQUE option specifies that key variable values must be unique within the SAS data set.
  • The NOMISS option specifies that no index entries will be built for observations with missing key variable values.
  • The UPDATECENTILES option allows you to override when SAS updates the index’s centiles.

The UNIQUE, NOMISS, and UPDATECENTILES options are optional.

There are two types of Indexes- Simple and Composite.

A simple index consists of one variable, whereas a composite index contains multiple variables.

data subjects;
	length subject_name $3;

	do subject_id=100 to 100000;

		do j=1 to 3;
			substr(subject_name, j)=byte(int(65+26*ranuni(0)));
		end;
		output;
	end;
run;

Creating a single index

proc datasets library=work;
modify subjects; 
index create subject_id / unique; 
run;

Creating Multiple Index

proc datasets library=work;
modify subjects;
index create idname=(subject_id subject_name) / nomiss;
run;

Creating an index using PROC SQL

CREATE  INDEX index-name ON data-set-name(varlist)

Note that the UNIQUE, NOMISS, and UPDATECENTILES options are not available while creating an index using PROC SQL.

/*Single Index*/
proc sql;
	create unique index subject_id on work.subjects;
quit;

/*Multiple Index*/
proc sql;
	create index idname on work.subject(subject_id subject_name);
quit;

Creating an Index using DATA STEP

DATA data-set-name(INDEX=(varlist / <UNIQUE><NOMISS><UPDATECENTILES = | ALWYAYS | NEVER integer>
));
  • Data-set-name is the name of the new SAS data set name. Varlist is the name of the key variable.
  • The UNIQUE, NOMISS, and UPDATECENTILES options can be used while creating an index in the data step.
/*Single Index*/
data SubjectIndexed(index=(subject_id / unique));
set work.subject;
/*SAS Statements */
run;

/*Multiple Index*/
data SubjectIndexed(index=(idname=(subject_id subject_name) / nomiss));
set work.subjects;

/* SAS Statements…*/ 
run;

Using the index in SAS

Once any of the above methods create an Index, you will want to use them. You can use the index in any of the four places described below.

WHERE statement in a DATA or PROC step

BY statement in a DATA or PROC step.

KEY option on a SET or <a href=”https://www.9to5sas.com/sas-modify-statement/” target=”_blank” rel=”noreferrer noopener”>MODIFY statement</a>

Using Index in a WHERE statement

The WHERE statement can be used in DATA and PROC steps as in the below example.

data subject1;
 set subjects;
 where subject_id eq 5678;
run;

Using the BY Statement

Using an index in a BY statement is simple. When the BY variable is an index, the index is automatically used, and the data does not need to be sorted.

data class(index=(age));
 set sashelp.class;
run;

data class2; 
set class; 
by age; 
if first.age; 
run;

Using the KEY= Option

KEY= option can be used to look up a value when an index exists on only the data set that contains the values to be looked up. The KEY= option on the SET statement identifies an index to be used.

Please refer to the article below, which has an example of using the KEY= option for performing table lookup.

More on Indexes

SAS doesn’t need to use an existing index just because Index has been created, even when using a WHERE or BY statement.

SAS first analyzes if using an index will be more efficient than reading the entire data set sequentially.

There are a lot of factors that are considered before using an index. Some of these are dataset size, index availability, and centile information.

Hence, If SAS predicts that it will be more efficient to use a specific index to return observations than to read the entire data set, then it will use that index. Otherwise, it will read the entire data set sequentially to fetch the observations.

Mostly SAS makes a good decision to use an index or not. However, its internal algorithms can sometimes decide not to use Index, considering the resources consumed when reading a large subset of data via an index are greater than reading the entire SAS data set.

Override the SAS System decision about whether to use an index.

The IDXNAME= and IDXWHERE= dataset options override SAS’ default index usage.

<strong>IDXNAME</strong> – Using the IDXNAME dataset option will prevent SAS from determining if the specified index is the best or if a sequential search might be more resource-efficient.

<strong>IDXWHERE</strong>= This option enables you to override the SAS System decision about whether to use an index.

You can use any one of the above but not both.

data subject1;
   set subjects (idxname=idname);
   where idname < 100;
run;

Preventing SAS from using an Index

To prevent SAS from using an index, you can use the IDXWHERE=NO option

data subject1;
	set subjects (idxwhere=NO);
	where idname < 100;
run;

SAS automatically uses an index when you specify the KEY option on a SET or a MODIFY statement.

How to find if SAS is using the index

Using the below system option, SAS will print the Index used for any operation.

options msglevel=i;

Message in the LOG

INFO: Index idname selected for WHERE clause optimization.

Check for the presence of the Index in the SAS

The index of datasets information is available in DICTIONARY.INDEX and SASHELP.VINDEX, from which you can view to get all the useful information about indexes. Alternatively, you can also use the proc content procedure to check if an index exists for a dataset.

PROC SQL outobs=5;
  SELECT *
  FROM dictionary.indexes;
  QUIT;

DATA TEST;
SET SASHELP.VINDEX;
run;

Output:

SAS INDEX

Dropping an Index in SAS

To drop an index, specify the index’s name after the DROP statement as below.

proc sql;
	drop index idanme from students;
quit;

Important Points on Index in SAS

An index cannot be created on a SAS view.

An index cannot be created on a SAS view.

Since resources are required to create indexes, these resources should be considered. It is good to keep the number of indexes to a minimum to reduce disk storage and update costs.

Avoid creating an index on variables with low cardinality or a small number of distinct values. For example, Gender will have two values Male and Female.

Use indexes for queries that retrieve a relatively small subset of rows—less than 15%.

Do not create more than one index based on the same column as the primary key.

Every week we'll send you SAS tips and in-depth tutorials

JOIN OUR COMMUNITY OF SAS Programmers!

Subhro

Subhro provides valuable and informative content on SAS, offering a comprehensive understanding of SAS concepts. We have been creating SAS tutorials since 2019, and 9to5sas has become one of the leading free SAS resources available on the internet.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.