FIRST and LAST variables in SAS

FIRST and LAST variables in SAS

  • Post author:
  • Post category:Base SAS
  • Post comments:0 Comments
  • Reading time:10 mins read

FIRST and LAST processing are temporary variables created automatically by a DATA step when a BY statement is used.

The values of these Boolean variables will either be 1 for true or 0 for false.

  • FIRST.variable: This variable gets a value of 1 the moment a new group begins within your sorted dataset (based on your BY variables). For all other records within that group, its value remains 0.
  • LAST.variable: This variable acquires a value of 1 for the final record of each group within your sorted dataset. Otherwise, it stays at 0.

Note: Input Data set must be sorted or indexed before applying FIRST. and LAST. Variables.

A Simple Example

Suppose you have a dataset with sales figures organized by ‘Region’ and ‘Month’. Here’s how to use FIRST. and LAST. variables to calculate the first sale and total sales per region:

data transactions;
input customer_id $ date mmddyy10. purchase_amount;
format date date9.; 
datalines;
A001 01/20/2024 25.50
B005 02/15/2024 12.99
A001 02/28/2024 45.00
C012 01/12/2024 30.75
B005 01/24/2024 8.50
;
run;

proc sort data=transactions;
  by customer_id; 
run;

data first_last_purchases;
  set transactions;
  by customer_id;

  if first.customer_id then first_purchase_date = date;
  if last.customer_id then last_purchase_date = date;

  /* Output if we want to see the results */
  if first.customer_id or last.customer_id; 
  format first_purchase_date last_purchase_date date9.;
 run; 
proc print;

FIRST and LAST variables in SAS

Explanation:

  1. Sorting (Required): We sort by customer_id to group the data for FIRST. and LAST. to work correctly.
  2. Detecting First/Last:
    • if first.customer_id then ... : When a new customer ID appears, we store the date in first_purchase_date.
    • if last.customer_id then ... : When it’s the last record for a customer ID, we store the date in last_purchase_date .
  3. Output (Optional): The last if statement outputs rows only when it’s either the first or last record of each customer, making it easier to see the results.

Calculating Differences Between Records: Let’s say you have stock prices over time. You could use FIRST. and LAST. variables to calculate the change in price from the beginning to the end of each day, week, or month:

Example 1 : Calculating difference between records

data stock_data;
  infile datalines truncover;
  input ticker_symbol $4. date :yymmdd10. price best32.; 
  format date date9.; /* Apply the date format for output */
datalines;
AAPL 2024-02-22 150.50
TSLA 2024-02-21 205.75
AAPL 2024-02-21 148.30
MSFT 2024-02-22 250.22
GOOG 2024-02-22 1200.15
AMZN 2024-02-21 97.25
GOOG 2024-02-21 1195.00
AMZN 2024-02-22 98.50 
FB   2024-02-22 175.32
;
run;
proc sort data=stock_data;
by ticker_symbol date;
quit;
data price_change;
  set stock_data;
  by ticker_symbol;
  
  retain first_price last_price;
  if first.ticker_symbol then first_price = price;
  if last.ticker_symbol then do;
    last_price = price;
    price_change = last_price - first_price;
    output;
  end;
run;
Calculating difference between records

Explanation:

  • The RETAIN statement prevents first_price and last_price from being reset to missing at the beginning of each iteration of the data step.
  • if first.ticker_symbol ...: If it’s the first record of a ticker group, store the ‘price’ in first_price.
  • if last.ticker_symbol ...: If it’s the last record of a ticker group:
    • Store the ‘price’ in last_price.
    • Calculate price_change (last price – first price).
    • output;: Write a record to the ‘price_change’ dataset.

Example 2: Calculating Running Totals by Group

data sales;
  input month date9. sales;
  format month monyy7.;
  datalines;
01Jan2024 100
01Jan2024 150
01Feb2024 200
01Feb2024 250
01Mar2024 300
;
run;
proc sort data=sales;
  by month;
run;
proc print;
data sales_with_total;
  set sales;
  by month;
  retain total_monthly 0 total_running 0;
  if first.month then total_monthly = 0; /* Initialize total for each month */
  total_monthly + sales; /* Calculate monthly total */
  total_running + sales; /* Update running total */
  if last.month then do;
    output; /* Output total for each month */
    total_monthly = .; /* Reset total for next month */
  end;
run;
proc print;
Calculating Running Totals by Group

Explanation:

  • retain ...;: Prevents total_monthly and total_running from resetting to missing at the start of each data step iteration.
  • if first.month ...: If it’s the first record of a month, reset total_monthly to 0.
  • total_monthly + sales;: Accumulates sales for the current month.
  • total_running + sales;: Accumulates the overall running total.
  • if last.month ...: When it’s the last record of a month:
    • output;: Writes a record to the ‘sales_with_total’ dataset (containing the calculated totals).
    • total_monthly = .;: Resets total_monthly for the next month.

Example 3: Finding Maximum and Minimum Values

data grades;
  input student_id grade;
  datalines;
1 90
1 85
2 80
2 95
3 75
3 85
;
run;

data max_min_grade;
  set grades;
  by student_id;
  retain max_grade min_grade;
  if first.student_id then do;
    max_grade = grade;
    min_grade = grade;
  end;
  else do;
    if grade > max_grade then max_grade = grade;
    if grade < min_grade then min_grade = grade;
  end;
  if last.student_id;
run;
Finding Maximum and Minimum Values

Example 4: Calculating Group wise Average

proc sort data=sales;
by region;
quit;
data region_average;
  set sales;
  by region;
  retain total count;
  if first.region then do;
    total = 0;
    count = 0;
  end;
  total + sales;
  count + 1;
  if last.region then do;
    average = total / count;
    output;
  end;
run;
proc print;
Calculating Group wise Average

Example 5 : Identifying Duplicate records

You can use first and last variables to identify duplicate records without using the proc sort procedure.

data customer;
  input customer_id $;
  datalines;
A
A
B
C
C
C
;
run;

data duplicate_records;
  set customer;
  by customer_id;
  if first.customer_id then count = 1;
  else count + 1;
  if last.customer_id and count > 1 then output;
run;
Identifying Duplicate records

Key Points to Keep in Mind

  • Always remember to sort your data using a PROC SORT and a BY statement for FIRST. and LAST. variables to function correctly.
  • FIRST. and LAST. variables are temporary; they exist within the DATA step and aren’t added to the final output dataset unless you explicitly create new variables to store their values.

Every week we'll send you SAS tips and in-depth tutorials

JOIN OUR COMMUNITY OF SAS Programmers!

Subhro

Subhro provides valuable and informative content on SAS, offering a comprehensive understanding of SAS concepts. We have been creating SAS tutorials since 2019, and 9to5sas has become one of the leading free SAS resources available on the internet.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.