20.4 - Modifying List Input


List input can be made even more versatile by using what is called modified list input. Modified list input entails using either the ampersand (&) modifier or the colon (:) modifier:

  • The ampersand (&) modifier allows you to read character values that contain embedded blanks.
  • The colon (:) modifier allows you to read nonstandard data values and character values that are longer than eight characters, but which have no embedded blanks.

Let's take a look at an example in which modified list input would be useful.

Example 20.13. The following program attempts to use list input to read the populations of the ten most populous cities in the United States into a temporary SAS data set called citypops, but the program fails:


In reviewing the data, the first thing you might notice is that this particular input data file contains a header row:

City  Yr2000Popn

that reports the content of each record. You may often find yourself in a situation in which someone has handed you such a data file, that is, one containing headings in addition to the columns of data. In general, that's a good thing, since then you know for sure what each record contains. It creates a problem though for reading in the data unless you tell SAS to disregard the heading information. That's just what the FIRSTOBS = 2 option in the INFILE statement tells SAS to do. It tells SAS to begin reading data at line 2 instead of the default line 1.

Moving past the header row, you should note the important features of the data. The longest city name is 12 characters. Some of the city names (New York, for example) contain embedded blanks. There are two blank spaces between the city names and their populations. Finally, because the population values contain commas, they are nonstandard values that require an informat during input. Given this list of features, it shouldn't be surprising that the standard list input style used in the INPUT statement fails.

Launch and run the SAS program, and review the output to convince yourself that SAS encounters a serious problem when attempting to read the data into the citypops data set.
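
The program listing itself is not reproduced in this text. As a rough sketch only, a failing attempt of the kind described might look like the following. It assumes in-stream data supplied with DATALINES rather than the external data file the example refers to, and it uses just five of the ten cities. The population figures are 2000 Census counts included for illustration and should be checked against the original data file.

```sas
data citypops;
   /* Assumption: in-stream data stands in for the original external file. */
   /* FIRSTOBS=2 skips the header row and starts reading at line 2.        */
   infile datalines firstobs=2;
   /* Standard list input: city is character (default length 8),           */
   /* pop2000 is a standard numeric value.                                 */
   input city $ pop2000;
   datalines;
City  Yr2000Popn
New York  8,008,278
Los Angeles  3,694,820
Chicago  2,896,016
Philadelphia  1,517,550
San Antonio  1,144,646
;
run;

proc print data=citypops;
   title 'Attempt to read city populations with standard list input';
run;
```

In a sketch like this, standard list input stops reading city at the first blank, so a name such as New York is split, and the next field is then not a valid standard numeric value. Names longer than eight characters are cut off at the default length, and values containing commas cannot be read without an informat.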

The Ampersand (&) Modifier

Because the ampersand (&) modifier allows us to use list input to read character values containing single embedded blanks, it is the tool that we will want to use to read in the city names.

Example 20.14. The following program uses list input modified with an ampersand (&) to read in the city and the population values of the ten most populous cities in the United States in the year 2000:


Comparing this program to the previous program, you should note four differences:

  1. The LENGTH statement tells SAS, in the compile phase, to define the city variable as a character variable, and to expect the city names to be as long as 12 characters.
  2. The ampersand (&) that follows the city variable in the INPUT statement tells SAS that the city values may contain one or more single embedded blanks. Because the ampersand modifier is used, SAS will read the city value until two or more consecutive blanks are encountered. That is a very important point ... when you use ampersand modified list input, the values that you are reading in must be separated by two or more consecutive blanks. You cannot use any other delimiter to indicate the end of each field.
  3. The commas have been removed from the population values so that SAS can read in the population values using unmodified (standard) list input for the pop2000 variable.
  4. A FORMAT statement has been added to the PRINT procedure just so that the pop2000 values are displayed with commas.

Launch and run the SAS program, and review the output to convince yourself that the values for both the city and pop2000 variables are read in properly.
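
A minimal sketch of a program along the lines of the four points above, not the original listing. It again assumes in-stream DATALINES data and a five-city subset in place of the original file:

```sas
data citypops;
   /* LENGTH defines city as character with room for 12 characters. */
   length city $ 12;
   /* Assumption: in-stream data stands in for the external file.   */
   infile datalines firstobs=2;
   /* The ampersand lets city contain single embedded blanks; the   */
   /* value ends at two or more consecutive blanks.                 */
   input city & pop2000;
   datalines;
City  Yr2000Popn
New York  8008278
Los Angeles  3694820
Chicago  2896016
Philadelphia  1517550
San Antonio  1144646
;
run;

proc print data=citypops;
   title 'City populations read with ampersand-modified list input';
   /* The format only affects how pop2000 is displayed. */
   format pop2000 comma10.;
run;
```

Note that the data lines keep two blanks between each city name and its population, which is what marks the end of the city field.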

Example 20.15. Rather than using a LENGTH statement to define the type and length of the city variable, we can place a $w. character informat right in the INPUT statement. The only difference between the following program and the previous one is that the LENGTH statement has been removed, and the $12. character informat has been inserted into the INPUT statement immediately following the city variable's ampersand (&) modifier:


Launch and run the SAS program, and review the output to convince yourself that the values for both the city and pop2000 variables are again read in properly.
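
The change described in this example can be sketched as follows, under the same assumptions as before (in-stream DATALINES data and a five-city subset, neither taken from the original listing):

```sas
data citypops;
   /* Assumption: in-stream data stands in for the external file. */
   infile datalines firstobs=2;
   /* No LENGTH statement: the $12. informat after the ampersand  */
   /* defines city as a character variable of length 12.          */
   input city & $12. pop2000;
   datalines;
City  Yr2000Popn
New York  8008278
Los Angeles  3694820
Chicago  2896016
Philadelphia  1517550
San Antonio  1144646
;
run;

proc print data=citypops;
   title 'City populations read with an informat after the ampersand';
   format pop2000 comma10.;
run;
```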

The Colon (:) Modifier

The colon (:) modifier allows us to use list input to read nonstandard data values and character values that are longer than eight characters, but which contain no embedded blanks. The colon (:) indicates that values are read until a blank (or other delimiter) is encountered, and then an informat is applied. If an informat for reading character values is specified, the w value specifies the variable's length, overriding the default length of 8.
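
As a small illustration of the character case, here is a sketch that reads names longer than eight characters with no embedded blanks. The three-city subset and the single-blank layout are assumptions chosen for this illustration, not data from the lesson's file:

```sas
data longnames;
   /* The colon reads each value up to the next blank and then    */
   /* applies the informat; $12. sets the length of city to 12.   */
   input city : $12. pop2000 : comma.;
   datalines;
Philadelphia 1,517,550
Chicago 2,896,016
Houston 1,953,631
;
run;

proc print data=longnames;
   title 'Long character values read with the colon modifier';
   format pop2000 comma10.;
run;
```

Without the colon and informat, a value such as Philadelphia would be cut off at the default length of 8.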

Example 20.16. The following program uses the colon (:) modifier to tell SAS to expect commas when reading in the values for the pop2000 variable:


Comparing this program to the previous program, you should note just two differences:

  1. The commas have been added back into the population values so that we can see how to use the colon (:) modifier to read in nonstandard data values while still using list input.
  2. The colon (:) and the comma. informat that follow the pop2000 variable in the INPUT statement tell SAS to expect population values to contain nonstandard characters (commas, in this particular instance). As illustrated here, we need not specify a w value when using the COMMAw.d informat. That's because list input just reads each value until a blank is detected. (This differs from using a numeric informat with formatted input, in which we must specify a w value in order to tell SAS how many columns to read.)

Launch and run the SAS program, and review the output to convince yourself that the values for both the city and pop2000 variables are again read in properly.
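
A sketch combining both modifiers, as described in the two points above. As with the earlier sketches, the in-stream DATALINES data and the five-city subset are assumptions standing in for the original file:

```sas
data citypops;
   /* Assumption: in-stream data stands in for the external file. */
   infile datalines firstobs=2;
   /* city:    ampersand plus $12. allows single embedded blanks  */
   /*          and sets the length to 12.                         */
   /* pop2000: colon plus comma. reads up to the next blank and   */
   /*          then strips the commas from the value.             */
   input city & $12. pop2000 : comma.;
   datalines;
City  Yr2000Popn
New York  8,008,278
Los Angeles  3,694,820
Chicago  2,896,016
Philadelphia  1,517,550
San Antonio  1,144,646
;
run;

proc print data=citypops;
   title 'City populations read with both modifiers';
   format pop2000 comma10.;
run;
```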

Comparing Formatted Input and Modified List Input

It is important to keep in mind that informats work differently in modified list input than they do in formatted input. So, let's emphasize the point! With formatted input, the informat determines both the length of character variables and the number of columns that are read. The same number of columns are read from each record. For example, the following INPUT statement using formatted input:

input @1 City $12. @15 Pop2000 comma10.;

uses the $12. character informat to tell SAS to set the length of the city variable to 12 as well as to read columns 1 to 12 when reading in these data values:
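
The data values referred to here are not reproduced in this text. As a hedged sketch, data suitable for this formatted input statement would be aligned in fixed columns, with city names starting in column 1 and populations starting in column 15. The five rows below are an assumed subset laid out that way:

```sas
data citypops;
   /* Formatted input: $12. reads columns 1-12 into City and sets */
   /* its length to 12; comma10. reads columns 15-24.             */
   input @1 City $12. @15 Pop2000 comma10.;
   datalines;
New York      8,008,278
Los Angeles   3,694,820
Chicago       2,896,016
Philadelphia  1,517,550
San Antonio   1,144,646
;
run;

proc print data=citypops;
   title 'City populations read with formatted input';
   format Pop2000 comma10.;
run;
```

Because the same columns are read from every record, this approach depends on the values being lined up consistently.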

The informat in modified list input, on the other hand, determines only the length of the modified variable, not the number of columns that are read. Here:

 input city & $12. pop2000 : comma.;

the city value is read until two consecutive blanks are encountered, and the pop2000 value is read until the next blank, when reading in these data values suitable for list input:
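
Again, the data values are not reproduced in this text. A sketch with an assumed five-city subset shows the kind of layout meant: the values are not aligned in columns, and each city name is followed by two blanks. The PROC CONTENTS step is an addition for this illustration, included so that the length assigned to city can be checked.

```sas
data citypops;
   /* Modified list input: the informats set the variable's       */
   /* length and how the value is interpreted, not how many       */
   /* columns are read.                                           */
   input city & $12. pop2000 : comma.;
   datalines;
New York  8,008,278
Los Angeles  3,694,820
Chicago  2,896,016
Philadelphia  1,517,550
San Antonio  1,144,646
;
run;

/* Lists the variables in citypops with their types and lengths. */
proc contents data=citypops;
run;
```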