SAS Online Tutorial
VIII. Manipulating SAS data
Often your data is not in exactly the form you need for your analysis. You may need to create
new variables by combining existing variables, you may want to analyze
only part of your data, or your data may use missing value codes
that differ from the default SAS missing value codes. These kinds of manipulation
can be performed within the SAS system in the DATA
step.
Algebraic and logical transformations
Using the minidat
SAS dataset created in the previous section, suppose you wish to analyze
a variable that is the difference between pulse2 and pulse1. Before creating it, the missing value codes need attention. In
SAS, the default missing value codes are a blank for character variables
and a period for numeric variables. In the description of the minidat.dat
data, the missing value codes are 9 for smoke and activity and 0 for pulse1, pulse2 and weight. SAS does not recognize these values
as missing because they are not the default period, so it would treat them as valid data.
Type or copy/paste the following DATA step to assign the missing values and create the new variable. We assign the missing values first so that the newly created variable
will be missing when either of the variables used in creating it is missing. The frequency
and means procedures are used to confirm that we have assigned missing values and
created the new variable correctly.
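/* Recode the placeholder codes to SAS missing values, then compute the difference */
data minifix;
set minidat;
if smoke=9 then smoke=.;
if activity=9 then activity=.;
if pulse1=0 then pulse1=.;
if pulse2=0 then pulse2=.;
if weight=0 then weight=.;
pulsdif=pulse2-pulse1;
run;
/* Check the recoded categorical variables */
proc freq data=minifix;
tables smoke activity;
run;
/* Check the recoded numeric variables and the new variable */
proc means data=minifix;
var pulse1 pulse2 weight pulsdif;
run;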
The DATA statement begins the DATA step and tells SAS to call the new
SAS dataset minifix (in the WORK library). The SET statement says to use the temporary SAS dataset
minidat to create the new SAS dataset minifix. (Note: Instead of the SET
statement, we could have used the INFILE and INPUT statements to re-read
the ASCII data file minidat.dat.)
The five IF statements assign a period (the missing value code) to the
variables smoke and activity whenever their value is 9, and to the variables pulse1, pulse2 and weight whenever their value is 0. The next statement creates the new variable pulsdif
as the difference between pulse2 and pulse1. We then run the
frequency and means procedures to check our results. The output should
be as follows:

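The reason for recoding before computing the difference can be seen in a small, self-contained sketch. The dataset demo and its three rows are hypothetical values typed in for illustration only; they are not taken from minidat.dat. Because SAS arithmetic operators return a missing value when either operand is missing, only the first row should get a non-missing pulsdif (24); SAS also writes a note to the log about missing values being generated.

```sas
/* Hypothetical three-row dataset, entered with DATALINES for illustration only */
data demo;
  input pulse1 pulse2;
  /* Recode the 0 placeholder to the SAS numeric missing value */
  if pulse1=0 then pulse1=.;
  if pulse2=0 then pulse2=.;
  /* Subtraction yields missing when either operand is missing */
  pulsdif=pulse2-pulse1;
  datalines;
64 88
0 70
58 0
;
run;

/* List the rows to inspect pulsdif */
proc print data=demo;
run;
```

Had the subtraction been done before the recoding, the second and third rows would have produced 70 and -58, which would then be averaged in as if they were real measurements.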
Analysis of subsets of the data (Use of the WHERE statement)
Often when doing an analysis,
we wish to analyze only a portion of our complete data. For example, we may
wish to produce means for only the males in our minifix SAS dataset,
created above, and then repeat the analysis for the females.
A WHERE statement added to a procedure step can be used to do
this. Type or copy/paste the following PROC MEANS and WHERE statements
into the Editor window:
proc means data=minifix; where sex=1;
var pulse1 pulse2 pulsdif;
proc means data=minifix; where sex=2;
var pulse1 pulse2 pulsdif;
run;
The WHERE statement included after the first PROC MEANS statement limits
that procedure to cases where sex is equal to 1 (males). The second
WHERE statement limits the second procedure to cases where sex is equal
to 2 (females). The results of these analyses are shown here:
PROC MEANS output for the males:

PROC MEANS output for the females:
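As an alternative that this tutorial does not use, the same per-group summaries can usually be obtained in a single step with the CLASS statement of PROC MEANS. The sketch below assumes that minifix already exists in the WORK library and that sex is coded 1 and 2 as described above.

```sas
/* Sketch: one PROC MEANS step that summarizes each level of sex separately.
   Assumes minifix exists in WORK with sex coded 1 (males) and 2 (females). */
proc means data=minifix;
  class sex;
  var pulse1 pulse2 pulsdif;
run;
```

By default, observations with a missing value of the CLASS variable are left out of the output. The WHERE approach remains the better fit when only one subset is of interest or when the condition is more complex than a single grouping variable.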