15.1 - One-to-One Reading

Printer-friendly versionPrinter-friendly version

One-to-one reading combines two or more SAS data sets, one "to the right" of the other into a single "fat" data set. That is, one-to-one reading combines observations from two or more data sets into a single observation in a new data set. For example, suppose the data set patients contains three variables: patient ID number (ID), gender (Sex), and age of the patient (Age):

and the data set scale contains three variables: ID (number), Height (in inches) and Weight (in pounds):

Then, when we one-to-one read the two data sets, we get what I like to call a "fat" data set, called say one2oneread:

in which the second data set gets placed to the "right" of the first data set. Note that the observations are combined based on their relative position in the data set. The first observation of patient is combined with the first observation of scale to create the first observation in one2oneread; the second observation of patient is combined with the second observation of scale to create the second observation in one2oneread; and so on. The DATA step stops after it reads the last observation from the smallest data set. Therefore, the number of observations in the new data set always equals the numbers of observations in the smallest data set you name for one-to-one reading.

Example 15.1. The following program uses one-to-one reading to combine the patients data set with the scale data set:

Of course, the first two DATA steps just read in the respective patients and scale data sets. The meat of the one-to-one read takes place in the third (and last) DATA step, in which we see two SET statements. The first SET statement tells SAS first to read the contents of the patients data set into the program data vector, and then the second SET statement tells SAS to read the contents of the scale data set into the program data vector.

Launch and run the SAS program, and review the output to convince yourself that the data sets are combined as described. You should note, in particular, that SAS does indeed stop reading after reaching the last observation in the scale data set. Hence, the combined data set, one2oneread, contains six observations, the number of observations in the smallest of the two data sets. Note, too, that the position of the variables in the one2oneread data set directly corresponds to the order in which the SET statements appear in the DATA step. Because the scale data set appears to the right of the patients data set in the SET statement, the variables from the scale data set appear to the right of the variables from the patients data set in the combined one2oneread data set.

Example 15.2. The following program uses one-to-one reading to combine the patients data set with the scale data set in the reverse order from that of the previous program:

Note, here, that the SET statement for the scale data set appears first in the DATA step, followed by the SET statement for the patients data set. Launch and run the SAS program, and review the output to convince yourself that the variables from the patients data set appear to the right of the variables from the scale data set in the combined one2oneread2 data set.

Hmmm... but what about the fact that the ID variable appears in both the patients and scale data sets, but it appears only once in the combined one2oneread2 data set? How does SAS handle the situation? Well... in general, if the data sets contain variables that have the same names, the values that are read in from the last data set overwrite the values that were read in from earlier data sets. Let's take a look at a contrived example to illustrate this point.

Example 15.3. The following program uses one-to-one reading to combine the one data set with the two data set to create a new data set called onetwo:

As you review the first two DATA steps, in which SAS reads in the respective one and two data sets, note that the two data sets share two variables, namely ID and VarB. The third (and last) DATA step tells SAS to combine the two data sets using the one-to-one reading method. Let's walk our way through how SAS processes the DATA step. At the end of the compile phase, SAS will have created a program data vector containing the variables from the one and two data sets in the order in which they appear in the DATA step:

During the first iteration of the DATA step, the first SET statement reads one observation from data set one:

Then, the second SET statement reads one observation from data set two. The values for ID and VarB in data set two overwrites the values for ID and VarB in data set one:

Being at the end of the first iteration of the DATA step, SAS writes the contents of the program data vector as the first observation in the onetwo SAS data set. Upon returning to the top of the DATA step, the program data vector looks like this at the beginning of the second iteration of the DATA step:

Recall that SAS retains the values of variables that were read from a SAS data set with the SET statement. Now, the first SET statement reads the second observation from the one data set:

And, the second SET statement reads the second observation from data set two. Again, the values for ID and VarB in data set two overwrites the values for ID and VarB in data set one:

Being at the end of the second iteration of the DATA step, SAS writes the contents of the program data vector as the second observation in the onetwo SAS data set. Because there are no more observations in the two data set, processing stops. That is, the DATA step does not read the third observation from the one data set.

Now, launch and run the SAS program, and review the output to convince yourself that the one and two data sets are combined as described.

One more comment. Note that although each of our one-to-one reading examples involved combining just two data sets, you can specify any number of SET statements when one-to-one reading ... and therefore the sky's the limit.