The Complete Guide to SAS Arrays

Are you looking to become a more efficient Data Step programmer? Do you often need to perform the same manipulation on multiple variables? If so, arrays are a great tool for simplifying your SAS code. With arrays, you can process multiple variables inside a DO LOOP and carry out a variety of data transformations in just a few lines of code.

This article covers the different types of arrays and walks through a variety of examples in which arrays are used for data manipulation tasks.

In particular, this article will cover:

  1. Overview of Arrays
  2. One-Dimensional Arrays
    – Using arrays to perform a repetitive calculation
    – Creating new variables with arrays
    – Manipulating character variables with arrays
    – Defining array bounds
  3. Implicit Arrays and DO OVER
  4. Multi-dimensional arrays

Software

Before we continue, make sure you have SAS Studio or SAS 9.4 installed. Don’t have the software? Download SAS Studio now. It’s free!

Data Sets

A variety of data sets from the SASHELP library are used throughout this article. The data sets used include:

  1. SASHELP.APPLIANC – Sales Time Series for 24 Appliances by Cycle
  2. SASHELP.PRICEDATA – Simulated monthly sales data
  3. SASHELP.CARS – Data about 2004 cars

Array Overview

In order to take advantage of SAS arrays, you first need a basic understanding of DO LOOPs. For a complete guide on SAS DO LOOPs, see The Complete Guide to Do-loop, Do-while and Do-Until.

First, let’s walk through the different components of a SAS array. The most commonly used array type is the explicit SAS array, which can be broken down into 6 main components:

array array-name {X} $ length array-elements (initial-values);

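Each array statement in this article contains at least these 3 elements (SAS can create default variable names if the element list is omitted, but listing the variables explicitly is clearer):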

  1. Array-name: The name of the array
  2. X: the number of elements in the array
  3. Array-elements: the list of variables to be grouped within the array

Optionally, the array statement can also include:

  1. $: A dollar sign ($) to denote character variables in the array
  2. length: A length value to declare a common length for elements in the array
  3. initial-value(s): An initial value to assign to element(s) in the array
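
The components listed above can be combined in several ways. The following is a minimal sketch rather than code from a real project: the dataset name ARRAY_FORMS and the variables Q1-Q3, CODE1-CODE2 and SCORE1-SCORE4 are made up purely for illustration.

```sas
data array_forms;
 /* Numeric array over three new variables, with initial values in parentheses */
 array q{3} q1-q3 (10 20 30);

 /* Character array: $ marks character elements, 8 is their common length */
 array code{2} $ 8 code1-code2 ('alpha' 'beta');

 /* {*} lets SAS count the elements from the variable list */
 array score{*} score1-score4;

 /* DIM returns the number of elements in an array */
 n_scores = dim(score);

 /* Write the values to the log for inspection */
 put q1= q2= q3= code1= code2= n_scores=;
run;
```

Running this step writes the initial values and the element count of the SCORE array to the SAS log, which is a quick way to confirm how each array statement was interpreted.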

In the next section, we will walk through a simple array example to help you better understand the structure of SAS arrays.


One-Dimensional Arrays

The simplest form of SAS array is the one-dimensional array. In a one-dimensional array, a set of SAS variables is grouped under a single array name. Once the variables are grouped, you can easily perform the same calculation on all of them with just a few lines of code.

Let’s look at an example where we perform the same task both with and without SAS arrays to compare and contrast the two methods.

Example 1A – Performing a Repetitive Calculation on Multiple Variables, Without an Array

You often need to perform the same calculation on multiple similar variables. This type of task is well suited to arrays because they can greatly reduce the amount of code you need to write.

In the SASHELP dataset APPLIANC, the number of units sold for 24 appliances by cycle is stored in 24 variables, UNITS_1 to UNITS_24. Let’s say, for example, that due to a computer glitch you need to add 3 units sold to the first 10 appliances (i.e. the UNITS_1 to UNITS_10 variables).

To demonstrate how arrays can simplify your code, let’s first look at how this calculation can be done without using arrays. Adding 3 units to each UNITS_# variable is a simple arithmetic operation, but the code becomes long and repetitive because each of the 10 variables being modified needs its own assignment statement.

Below is the basic Data Step code to complete this task. Two PROC PRINT steps are also added to allow for an easy comparison of the first 10 observations of the original and modified datasets:

data applianc;
 set sashelp.applianc;

 units_1 = units_1 + 3;
 units_2 = units_2 + 3;
 units_3 = units_3 + 3;
 units_4 = units_4 + 3; 
 units_5 = units_5 + 3;
 units_6 = units_6 + 3;
 units_7 = units_7 + 3;
 units_8 = units_8 + 3;
 units_9 = units_9 + 3;
 units_10 = units_10 + 3;
run;

proc print data = sashelp.applianc (obs=10);
 var units_1-units_10;
 title "First 10 records of unmodified SASHELP.APPLIANC dataset";
run;

proc print data = applianc (obs=10);
 var units_1-units_10;
 title "First 10 records of modified APPLIANC dataset";
run;

As you can see in the resulting PROC PRINT outputs shown below, we have successfully added 3 units to each of the UNITS_1 to UNITS_10 variables:

Picture
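
Besides comparing the two printed listings by eye, the change can also be checked programmatically. The sketch below assumes the APPLIANC dataset from Example 1A has already been created in the WORK library, and uses PROC COMPARE to report how UNITS_1 to UNITS_10 differ from the original values.

```sas
/* Compare the modified WORK.APPLIANC against the original SASHELP data. */
/* The VAR statement limits the comparison to the 10 modified variables. */
proc compare base=sashelp.applianc compare=applianc;
 var units_1-units_10;
run;
```

If the Data Step worked as intended, the comparison report should show a difference of 3 for every non-missing value of these variables.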

Example 1B – Performing a Repetitive Calculation on Multiple Variables, With an Array

To simplify this task with a SAS array, we need to define a single array that groups together all the UNITS_# variables we wish to modify. This array will be defined as follows:

  1. Array name: units_sold
  2. Number of elements: {*} – an asterisk can be used in place of an explicit number, which tells SAS to count the array elements for you
  3. Array elements: units_1-units_10

After defining the array, a DO LOOP needs to be set up to loop through each of the 10 elements and then increase the number of units sold by 3 for each appliance. 

The complete syntax is as follows:

data applianc_array;
 set sashelp.applianc;

 array units_sold{*} units_1-units_10;

 do i = 1 to 10;
  units_sold{i} = units_sold{i}+3;
 end;
run;

proc print data = sashelp.applianc (obs=10);
 var units_1-units_10;
 title "First 10 records of unmodified SASHELP.APPLIANC dataset";
run;

proc print data = applianc_array (obs=10);
 var units_1-units_10;
 title "First 10 records of modified APPLIANC_ARRAY dataset";
run;

When compared to the original SASHELP.APPLIANC dataset, you can now see that the values of the UNITS_1 to UNITS_10 variables have each been incremented by 3. The output of the PROC PRINT steps comparing the first 10 records of the original and modified datasets is shown below:

Picture
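
Example 1B hard-codes the upper bound of the loop as 10. A common variation, sketched here under the assumption that you want the loop to follow the array definition automatically, is to use the DIM function for the upper bound and to drop the loop counter from the output. The dataset name APPLIANC_ARRAY_DIM is made up for this illustration.

```sas
data applianc_array_dim;
 set sashelp.applianc;

 array units_sold{*} units_1-units_10;

 /* DIM returns the number of elements, so the bound follows the array */
 do i = 1 to dim(units_sold);
  units_sold{i} = units_sold{i} + 3;
 end;

 /* Keep the loop counter out of the output dataset */
 drop i;
run;
```

With this form, changing the variable list in the ARRAY statement (for example to units_1-units_24) is enough; the DO LOOP does not need to be edited.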

Example 2 – Creating New Variables with an Array

The PRICEDATA dataset in the SASHELP library contains simulated prices for 17 different products. The prices are stored in USD in the variables PRICE1-PRICE17. To convert these prices to Canadian Dollars, we need to multiply them by approximately 1.26 (based on the exchange rate at the time of writing).

Let’s first look at how we would do this with traditional Data Step code, without using arrays. In the following code, each new PRICE_CAD# variable is created by multiplying the original PRICE# variable by 1.26 to convert the value from United States Dollars (USD) to Canadian Dollars (CAD). Finally, a PROC PRINT is used to print the first 3 PRICE variables and their newly created Canadian Dollar equivalents so we can compare the values:

data pricedata_cad;
 set sashelp.pricedata;

 price_cad1 = price1*1.26;
 price_cad2 = price2*1.26;
 price_cad3 = price3*1.26;
 price_cad4 = price4*1.26;
 price_cad5 = price5*1.26;
 price_cad6 = price6*1.26;
 price_cad7 = price7*1.26;
 price_cad8 = price8*1.26;
 price_cad9 = price9*1.26;
 price_cad10 = price10*1.26;
 price_cad11 = price11*1.26;
 price_cad12 = price12*1.26;
 price_cad13 = price13*1.26;
 price_cad14 = price14*1.26;
 price_cad15 = price15*1.26;
 price_cad16 = price16*1.26;
 price_cad17 = price17*1.26;

run;

proc print data=pricedata_cad;
 var price1-price3 price_cad1-price_cad3;
run;

As you can see in the partial output shown below, the creation of the new variables in CAD was successful:

Picture

Since we are working with 17 variables, this task requires a lot of repetitive and tedious code: we are essentially repeating the exact same calculation 17 times on similar variables. This style of programming is also error-prone, because every extra line is another place for a typo. A task such as this is a great candidate for a SAS array, as it will greatly reduce the lines of code required.

In the array version of this Data Step, we need to define two arrays because we want variables for both the US dollar price and the Canadian dollar price: one array groups the newly created PRICE_CAD# variables and the other groups the original PRICE# variables.

The first array is defined as follows:

  1. Name: PRICE_CAD
  2. Number of elements: 17
  3. List of variables: PRICE_CAD1-PRICE_CAD17

Similarly, the second array is defined as follows:

  1. Name: PRICE_USD
  2. Number of elements: 17
  3. List of variables: PRICE1-PRICE17

After using the above information to construct our arrays, we define a simple DO LOOP to iterate through each of the 17 variables and perform the USD to CAD conversion. The converted values are stored in the PRICE_CAD# variables, as defined by our price_cad{} array and the original USD values are retrieved from the PRICE# variables as defined by our price_usd{} array.

As before, a PROC PRINT is also used to display the first 3 PRICE variables and their Canadian Dollar equivalents:

data pricedata_cad_array;
 set sashelp.pricedata;

 array price_cad{17} price_cad1-price_cad17;
 array price_usd{17} price1-price17;

 do i = 1 to 17;
  price_cad{i} = price_usd{i}*1.26;
 end;
run;

proc print data=pricedata_cad_array;
 var price1-price3 price_cad1-price_cad3;
run;

Before examining the output, the first two iterations of the loop are broken down below to help you gain a better understanding of what’s happening at each iteration.

Picture
Picture
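
The same breakdown can be reproduced in the SAS log. The following is a minimal sketch, assuming you only want to trace the first observation and the first two iterations; it uses DATA _NULL_ so that no output dataset is created.

```sas
data _null_;
 /* Read only the first observation to keep the log short */
 set sashelp.pricedata (obs=1);

 array price_cad{17} price_cad1-price_cad17;
 array price_usd{17} price1-price17;

 /* Trace only the first two iterations of the loop */
 do i = 1 to 2;
  price_cad{i} = price_usd{i}*1.26;
  /* Named output writes each element's variable name and value */
  put i= price_usd{i}= price_cad{i}=;
 end;
run;
```

At i = 1 the array references resolve to PRICE1 and PRICE_CAD1, and at i = 2 they resolve to PRICE2 and PRICE_CAD2, which is exactly the substitution SAS performs on each pass through the loop.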

As you can see in the partial output shown below, the result is the same as the previous example without using arrays, but now with less SAS code.

Picture
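
Because exchange rates change over time, it can help to keep the rate in one place rather than inside the loop. The sketch below is one possible variation, not part of the original example: the macro variable name USD_TO_CAD and the dataset name PRICEDATA_CAD_RATE are made up for illustration, and 1.26 is simply the rate assumed earlier in this article.

```sas
/* Assumed exchange rate; update this single value when the rate changes */
%let usd_to_cad = 1.26;

data pricedata_cad_rate;
 set sashelp.pricedata;

 array price_cad{17} price_cad1-price_cad17;
 array price_usd{17} price1-price17;

 /* DIM keeps the loop bound in step with the array definition */
 do i = 1 to dim(price_usd);
  price_cad{i} = price_usd{i} * &usd_to_cad;
 end;

 /* Keep the loop counter out of the output dataset */
 drop i;
run;
```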
