Statistical Consulting Center - UMass Amherst
 


SAS Online Tutorial

VIII. Manipulating SAS data

Often your data are not in exactly the form you need for your analysis. You may need to create new variables by combining existing variables, you may want to analyze only part of your data, or the data may use missing value codes that are not the default SAS missing value codes. These kinds of manipulation can be performed within SAS in the DATA step.


Algebraic and logical transformations

Using the minidat SAS dataset created in the previous section, suppose you wish to analyze a variable that is the difference between pulse2 and pulse1. In SAS, the default missing value codes are a blank for character variables and a period for numeric variables. The description of the minidat.dat data, however, specifies missing value codes of 9 for smoke and activity and 0 for pulse1, pulse2 and weight. SAS does not recognize these values as missing because they are not the default period. Type or copy/paste the following DATA step to assign the missing values and create the new variable. We assign the missing values first so that the newly created variable will be missing whenever either of the variables used to create it is missing. The frequency and means procedures are then used to confirm that we have assigned missing values and created the new variable correctly.

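data minifix;
  set minidat;
  /* Recode the documented missing value codes to the SAS numeric missing value (.) */
  if smoke=9 then smoke=.;
  if activity=9 then activity=.;
  if pulse1=0 then pulse1=.;
  if pulse2=0 then pulse2=.;
  if weight=0 then weight=.;
  /* Computed after the recodes, so it is missing when either pulse is missing */
  pulsdif=pulse2-pulse1;
run;

/* Check the recoded categorical variables */
proc freq data=minifix;
  tables smoke activity;
run;

/* Check the recoded numeric variables and the new variable */
proc means data=minifix;
  var pulse1 pulse2 weight pulsdif;
run;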


The DATA statement begins the DATA step and tells SAS to name the new SAS dataset minifix (in the WORK library). The SET statement tells SAS to use the temporary SAS dataset minidat as the input for minifix. (Note: Instead of the SET statement, we could have used INFILE and INPUT statements to re-read the ASCII data file minidat.dat.) The five IF statements assign a period (the missing value code) to the variables smoke and activity whenever their value is 9, and to the variables pulse1, pulse2 and weight whenever their value is 0. The next statement creates the new variable pulsdif as the difference between pulse2 and pulse1. We then run the frequency and means procedures to check our results. The output should be as follows:

[Output: PROC FREQ tables for smoke and activity; PROC MEANS summary for pulse1, pulse2, weight and pulsdif]
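The same recoding can be written more compactly with arrays, which helps when many variables share a missing value code. The following is only a minimal sketch and not part of the original tutorial: the datasets demo and demofix and the three records are invented for illustration, standing in for minidat and minifix.

```sas
/* Hypothetical toy data standing in for minidat (values invented for illustration) */
data demo;
  input pulse1 pulse2 smoke activity weight;
  datalines;
64 88 2 2 140
0 70 9 1 145
58 0 1 9 0
;
run;

data demofix;
  set demo;
  /* Group the variables by the missing value code they use */
  array nines{*} smoke activity;
  array zeros{*} pulse1 pulse2 weight;
  do i = 1 to dim(nines);
    if nines{i} = 9 then nines{i} = .;
  end;
  do i = 1 to dim(zeros);
    if zeros{i} = 0 then zeros{i} = .;
  end;
  drop i;
  /* Computed after the recodes, so it is missing when either pulse is missing */
  pulsdif = pulse2 - pulse1;
run;

proc print data=demofix;
run;
```

In this toy data only the first record has both pulse values recorded, so pulsdif is 24 there and missing in the other two records. SAS also writes a note to the log reporting that missing values were generated by an operation on missing values.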



Analysis of subsets of the data (Use of the WHERE statement)

Often we wish to analyze only a portion of the complete data. For example, we may wish to produce means for only the males in the minifix SAS dataset created in the previous section, and then repeat the analysis for the females. A WHERE statement added to a procedure step can be used to do this. Type or copy/paste the following PROC MEANS and WHERE statements into the Editor window:


proc means data=minifix;
  where sex=1;  /* males only */
  var pulse1 pulse2 pulsdif;
run;

proc means data=minifix;
  where sex=2;  /* females only */
  var pulse1 pulse2 pulsdif;
run;


The WHERE statement included after the first PROC MEANS statement limits that procedure to cases where sex is equal to 1 (males). The second WHERE statement limits the second procedure to cases where sex is equal to 2 (females). The results of these analyses are shown here:

[Output: PROC MEANS for males (sex=1)]

[Output: PROC MEANS for females (sex=2)]
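Two related ways of getting subset statistics are sketched below, as alternatives rather than as the tutorial's own method. The dataset demo and its four records are invented for illustration, standing in for minifix. The first step uses the WHERE= data set option, which selects the same cases as a WHERE statement. The second uses a CLASS statement, which reports statistics separately for each value of sex in a single PROC MEANS step.

```sas
/* Hypothetical toy data standing in for minifix (values invented for illustration) */
data demo;
  input sex pulse1 pulse2;
  pulsdif = pulse2 - pulse1;
  datalines;
1 64 88
1 58 70
2 62 76
2 66 .
;
run;

/* WHERE= data set option: selects the same cases as "where sex=1;" */
proc means data=demo(where=(sex=1));
  var pulse1 pulse2 pulsdif;
run;

/* CLASS statement: one step, statistics reported separately for each value of sex */
proc means data=demo;
  class sex;
  var pulse1 pulse2 pulsdif;
run;
```

The WHERE approach suits an analysis of one subset at a time. The CLASS approach avoids repeating the procedure when every group is wanted.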





© 2004 University of Massachusetts Amherst. Site Policies.