Lesson 17: Using the OUTPUT and RETAIN statements
When processing any DATA step, SAS follows two default procedures:
- When SAS reads the DATA statement at the beginning of each iteration of the DATA step, SAS sets to missing, in the program data vector, the variables whose values are assigned by either an INPUT statement or an assignment statement within the DATA step. (SAS does not reset variables to missing if they were created by a sum statement, or if their values came from a SAS data set via a SET or MERGE statement.)
- At the end of each iteration of the DATA step, SAS writes the values of the variables in the program data vector, as one observation, to the SAS data set being created.
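To see both defaults at work, consider the following minimal sketch. The data set name (scores), the variable names (id, score, bonus), and the data values are all made up for illustration:

```sas
/* Hypothetical data: three scores read with an INPUT statement */
data scores;
   input id score;
   /* bonus is created by an assignment statement, so SAS resets it
      to missing at the beginning of every iteration */
   if score > 90 then bonus = 5;
   /* there is no OUTPUT statement, so SAS writes one observation
      at the end of each iteration by default */
   datalines;
1 95
2 80
3 92
;
run;

proc print data=scores;
run;
```

Because bonus is reset to missing at the top of each iteration, the observation for id 2 ends up with a missing value for bonus rather than carrying over the 5 assigned for id 1. And, because of the default output at the end of each iteration, the scores data set contains three observations, one for each data line.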
In this lesson, we'll learn how to modify these default processes by using the OUTPUT and RETAIN statements:
- The OUTPUT statement allows you to control when and to which data set you want an observation written.
- The RETAIN statement causes a variable created in the DATA step to keep its value from one iteration of the DATA step to the next, rather than being set to missing at the beginning of each iteration.
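As a first taste of the two statements, here is a minimal sketch under assumed names. The data sets (high and low), the variables (id, score, maxscore), and the cutoff of 90 are all hypothetical choices made only for illustration:

```sas
/* Create two data sets in a single DATA step */
data high low;
   input id score;
   /* RETAIN: maxscore keeps its value from one iteration to the next */
   retain maxscore;
   /* the MAX function ignores missing values, so this also works on
      the first iteration, when maxscore is still missing */
   maxscore = max(maxscore, score);
   /* OUTPUT: we decide when, and to which data set, each observation is written */
   if score >= 90 then output high;
   else output low;
   datalines;
1 95
2 80
3 92
;
run;
```

Here, the observations for id 1 and id 3 are written to the high data set, and the observation for id 2 is written to the low data set. Because maxscore is retained, it holds the largest score read so far, which is 95 in all three observations.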
Learning objectives & outcomes
Upon completing this lesson, you should be able to do the following:
- use a RETAIN statement to tell SAS to retain the value of a variable from one iteration of the DATA step to the next
- know which kinds of variables SAS automatically retains
- use a RETAIN statement to compare values across observations
- understand how the RETAIN statement works and therefore be able to program successfully with it
- use the "FIRST." and "LAST." variables in conjunction with an OUTPUT statement in order to collapse multiple observations in a data set into a single observation
- use a SUM statement to accumulate totals across a set of observations
- use a "LAST." variable in conjunction with BY-group processing, a RETAIN statement, and an OUTPUT statement in order to transpose a data set
- use an OUTPUT statement to tell SAS to output the current observation when the OUTPUT statement is processed
- use an OUTPUT statement to write observations to multiple data sets
- use an OUTPUT statement to control output of observations to data sets based on certain conditions
- understand that if you plan to use any OUTPUT statements in a DATA step, you must use OUTPUT statements to program all of the output for that step
- understand that assignment statements must precede OUTPUT statements if the assigned values are to appear in the observation that is written
- use the TODAY() function to determine today's date
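Several of the objectives above can be previewed together in one small sketch. Again, the data set names (visits and totals), the variable names (subj, cost, total, asof), and the data values are assumptions made purely for illustration:

```sas
/* Hypothetical input: several observations per subject, already sorted by subj */
data visits;
   input subj cost;
   datalines;
1 10
1 20
2 5
2 15
2 30
;
run;

/* Collapse the multiple observations for each subject into a single observation */
data totals (keep = subj total asof);
   set visits;
   by subj;                        /* BY-group processing creates first.subj and last.subj */
   if first.subj then total = 0;   /* start over at the beginning of each BY group */
   total + cost;                   /* sum statement: total is retained automatically */
   asof = today();                 /* assigned before OUTPUT, so it is written too */
   format asof date9.;
   if last.subj then output;       /* write only the last observation of each BY group */
run;
```

The totals data set contains one observation per subject, with total equal to 30 for subj 1 and 50 for subj 2. Note that the assignment of asof comes before the OUTPUT statement. Had it come after, asof would be missing in every observation written.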
Our "to do" list for this lesson
In order to complete the lesson you should:
- Read the lesson pages that follow.
- Type up your answers to the homework problems in a Word file named homework17_yourPSUloginid. By now you should be used to the format. If your PSU user id is xyz123, then name your file homework17_xyz123. Upload the file to the Lesson #17 Homework Dropbox.
- Post any questions or comments you have concerning the lesson's material to the Lesson #17 General Discussion Board.
- Take the Lesson #17 Mastery Quiz. Remember two things: i) You have 20 minutes to complete the quiz, and ii) as soon as you hit the "submit" button, your answers are submitted and graded, and the quiz becomes closed to you.