Importing data into SAS is one of the basic concepts you need to know in order to work with data manipulation or analytics.
SAS can read data from almost any source. Common data sources can be raw text files, Microsoft Office Excel spreadsheets, Access databases, and some of the popular database systems such as DB2 and Oracle.
Methods to import data into SAS
There are 3 popular methods for importing data into SAS.
- Using the
To access a file from any programming language, a link is needed between the program and the desired file. Therefore, the INFILE statement is used in SAS to generally link to raw files.
INFILE also work with other SAS statements such as FILENAME, DATALINES, PUT and INPUT to provide extensive data input and output in the DATA step.
INFILE file-specification ;
file-specification: identifies the source of the input data records, which is an external file or in-stream data.
Options: There are a large number of options that you can use on the infile statement.
Using the INFILE/FILE statement
Because the INFILE statement identifies the file to read, it must execute before the INPUT statement that reads the input data records.
We have a text file with data on vegetable seeds. Each line of the file contains the following pieces of information (separated by spaces):
vegetable Name, vegetable code, days to germination and number of seeds.
In SAS, each piece of information is called a variable (column in other systems). A few sample lines from the file are shown below:
Cucumber 50104-A 55 30 195 Cucumber 51789-A 56 30 225 Carrot 50179-A 68 1500 395
In this example, each line of data produces what SAS calls an observation (also referred to as a row in other systems).
data veg; infile "c:\books\learning\veggies.txt"; input Name $ Code $ Days Number Price; run;
title "List of Vegetables"; proc print data=veg; run;
The INPUT statement shown here is one of the methods that SAS has for reading raw data.
The default data delimiter for SAS is the blank. SAS can also read data separated by any other delimiter (for example, commas, tabs) with a minor change to the INFILE statement.
When you use the list input method for reading data, you only need to list the names you want to give each data value. SAS calls these as variable names.
The INFILE statement tells SAS where to find the data and the INPUT statement contains the variable names you want to associate with each data value.
The dollar sign($) following variable names tells SAS that values for Vegetable Name and code are character values.
Finally, the DATA step ends with a RUN statement.
Importing data into SAS using Proc Import
To read excel data in SAS you can use the Proc import procedure.
PROC IMPORT DATAFILE=filename OUT=sas-dataset DBMS= data-source-identifier REPLACE;
SHEET= sheet-name GETNAMES=Yes/No DATAROW=n;
DATAFILE=filename: specifies the complete path and filename or fileref for the input file. A fileref is a SAS name that is associated with the physical location of the output file.
OUT= SAS data-set: This tells SAS to create a dataset with the name of your choice.
DBMS=data-source-identifier: It specifies the type of data to import. To import a DBMS table, specify DBMS = using a supported database identifier.
REPLACE: This is also an optional argument. If you specify the REPLACE option, it overwrites an existing SAS data set. If you do not specify REPLACE, the IMPORT procedure does not overwrite an existing file and a NOTE is written in the log with the below message:
NOTE: Import canceled. Output dataset dataset-name already exists. Specify REPLACE option to overwrite it.
SHEET: When you have to import data from a specified sheet in excel, you can use this option to specify the sheet name. By default, SAS will import sheet1 from excel.
GETNAMES: By default, SAS imports the first row of an excel sheet as the Variable name for the SAS variable. Similarly, if you specify No option, it will tell SAS, not to use the first row of data as variable names and instead, SAS creates variable names as VAR1, VAR2 and so on.
DATAROW= It is is used to specify the starting row from where SAS would import the data. If this option is not specified, SAS will import data starting from the 1st row of excel.
An important point to note here is:
- When GETNAMES=YES, DATAROW must be greater than or equal to 2.
- When GETNAMES=NO, DATAROW must be greater than or equal to 1
RANGE= For specifying the range of rows and columns of an excel sheet, use the range option with the sheet range as arguments.