Introduction to SPSS on OITUNIX
I. General Information
SPSS (Statistical Package for
Social Sciences) is a general purpose statistical program which can be
used to analyze a wide variety of research data. It contains many statistical
procedures, ranging from simple descriptive statistics (e.g. means, standard
deviations, frequencies) to specialized statistical techniques.
This handout is for SPSS release 6.1.4 for the UNIX operating system
running on OITUNIX. You need some familiarity with Unix, and with an
editor available on OITUNIX.
The UNIX operating system is sensitive to upper/lower case. Therefore,
all UNIX commands should be typed as shown in the handout. SPSS is not
case sensitive, so all SPSS commands can be typed in upper or lower case
(except for file names, which must reference the file exactly as the
name appears on the disk).
Documentation
There are five manuals
that document the SPSS base system:
- SPSS 6.1 Syntax Reference
Guide (An alphabetical reference to all SPSS commands in the base
system)
- SPSS Professional Statistics
6.1 (Documents the following analyses: cluster, K-means cluster,
discriminant, factor, multidimensional scaling, proximity and reliability)
- SPSS Advanced Statistics
6.1 (Documents logistic regression, log linear analyses, multivariate
analysis of variance, constrained nonlinear regression, probit analysis,
Cox regression and Kaplan-Meier and actuarial survival analyses)
- SPSS 6.1 Base System
User's Guide, Part 1, UNIX Version (Documents the motif graphical
user interface and running SPSS through the Manager)
- SPSS 6.1 Base System
User's Guide, Part 2 (Documents in detail the procedures in statistics
and graphs)
Online Help
There is an online help facility
for SPSS. Type
spss -m [Enter]
to start SPSS, then type
HELP [Enter]
You will get a screen
of the topics on which online help is available, and instructions
on using online help. You can also type a question mark, followed
by the name of a topic (e.g. ?CROSSTABS) to get the
help screen for that topic. The online help does not eliminate the
need for the manual, but will provide a brief reminder of the syntax
of each command. To exit from online help, press return until you
get the SPSS> prompt, then type
FINISH [Enter]
Implementation
The following commands, which
are documented in the SPSS manuals, are not available on this system:
- GET BMDP
- GET SCSS
- SAVE SCSS
- TABLES
- LISREL
- GET TRANSLATE
- SAVE TRANSLATE
Terminal set-up
vt100 emulation
By default, UNIX assumes
you are using a vt100 type terminal. This terminal type works for
most applications, including SPSS. You do not need to set your terminal
type unless you get the error message:
TERM not found in
SPSSTERMCAP file
If you get this error
message, type:
setenv TERM vt100
[Enter]
to set your terminal
type to vt100.
Xwindows
SPSS can be used with a
point-and-click interface on OITUNIX. This requires an Xwindows terminal
emulator. Two such emulators that have been tested on OITUNIX are
X-Win32 and vnc. Information on getting and using these is available
under XWindows Software.
II. Components of an SPSS job
A. Input Files
The typical SPSS job requires
that two files be prepared in advance.
1. The data file
The data file contains
the results of the research which are to be analyzed. It should
be coded in a form which the computer can read. See Handout: H81
for a description of how to prepare your data for statistical analysis,
and how to enter it into the computer.
2. SPSS instructions
(I-file)
The instructions which
tell SPSS what to do with the data can be prepared in advance (using
an editor, such as emacs or vi) and stored in a file, or they can
be typed and edited in the SPSS Manager. Sections IV and VI of this
introduction will help you learn a basic set of these instructions.
Section VII will show you how to work with the Manager.
B. Running SPSS
SPSS can be run in batch mode, as described in Section V, or
interactively through the Manager, as described in Section VII.
C. Output File
The output file which SPSS
creates contains a listing of your instructions, with interpretations
where appropriate, error messages if there were any mistakes in the
instruction file, and the results of the analysis if there were no
syntax errors.
If you are using batch mode, you may use an editor or the more
command to look at the output file. Make a note of any error messages,
so you can make the corrections in the instruction file. If there
are error messages, you must go back and edit the instruction file
to correct the errors, and then re-run the job.
When there are no more errors, you can get a printed copy of your
results using the lpr command:
lpr ofile [Enter]
where ofile is the name of your output file.
The output will be in the
OIT I/O area (LGRC room A106) in about half an hour.
III. The Data File
A. A sample data set
A small dataset has been
prepared for use in this tutorial. It is listed in the MINITAB STUDENT
HANDBOOK (Ryan/Joiner/Ryan, Duxbury Press, 1976, p.285). The data
represent the results of a class 'experiment'. Each student measured
his own pulse rate. A randomly selected (by coin flip) part of the
class ran in place for one minute. Then, each student measured his
pulse again. The results, along with some other information on each
student were recorded in the following manner:
| ITEM | COLUMNS | CODING |
| first pulse | 1-3 | beats per minute |
| second pulse | 6-8 | beats per minute |
| group | 11 | 1=ran in place; 2=did not run |
| smoking | 14 | 1=yes; 2=no |
| gender | 17 | 1=male; 2=female |
| height | 20-24 | height in inches |
| weight | 27-29 | weight in pounds |
| activity level | 32 | 1=slight; 2=moderate; 3=lot |
The data is in a file
called minidat.dat under the username evagold, and the file has
been made public. To get a copy of this file in your account, use
the cp command:
cp /oitstaff/evagold/minidat.dat minidat.dat [Enter]
or, the shorter version:
cp ~evagold/minidat.dat minidat.dat [Enter]
where the tilde (~)
does the work of finding the path.
You can use the ls command
to check that the file is now in your directory.
B. Cases and Variables
SPSS (as most statistical
programs) thinks of the data in terms of 'cases' and 'variables'.
- A 'case' is the unit
of analysis: e.g. a subject, an experimental animal, etc. In the
context of a matched pairs study, the pair, rather than the individual
may be defined as the case. In the sample data set, minidat.dat,
there are 91 students, and each student is a 'case'.
- 'Variables' are the
measurements that are recorded about each case. The sample data
set has eight variables.
The data is recorded
with the cases as the rows of the data file, and the variables as
the columns. If there are so many variables that the rows become
unwieldy, you may use two or more adjacent rows to record all the
information for one case. This, however, does not change the basic
concept of 'case' and 'variable'.
IV. SPSS Instructions
The commands in this section
should be entered into a newly created file, which will tell SPSS what
you want it to do with the data file. This file will be referred to
as your instruction file (or Ifile). You can create
this file with an editor (such as emacs or vi) for batch runs, or type
it into the SPSS Manager input window (see Section VII).
A. Data definition
In order for SPSS to be
able to do anything with your data, you must tell it where the data
is, and how and where the variables are coded. Assuming that you have
copied the practice dataset
to your account (still with the name minidat.dat), the following
instructions will tell SPSS how to read this dataset. In the examples
below, SPSS commands and keywords are capitalized, and names that
will vary by dataset are in lower case. However, SPSS does not care
whether your instructions are in upper or lower case.
DATA LIST FILE=minidat.dat FIXED/puls1 1-3 puls2 6-8 run 11
  smoke 14 sex 17 ht 20-24 wt 27-29 activity 32.
MISSING VALUES puls1,puls2(0)/smoke,sex(9)/.
General Syntax Rules
Every SPSS instruction begins
with a command keyword, which must be spelled correctly and must begin
in column 1. The command tells SPSS the task which must be done.
After the command, there is usually a 'specification field', which
is used to give the details of that task. The specification is the
part that you fill in, according to the nature of your data. If
you cannot fit the entire specification on one line, just continue
on the next line, leaving at least one blank space at the beginning
of the continuation line. Do not break a line in the middle of a
variable name or a label. You must end the specification with a
period. The lines of the instruction file cannot be wider than 80
characters. (This is not true of the data file.)
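As a minimal sketch of these rules, a FREQUENCIES command (described in Section IV.B) could be split across two lines as shown below. The command keyword starts in column 1, the continuation line begins with blanks, no variable name is broken, and a period ends the specification. The two-space indent is an arbitrary choice made for this illustration.

```spss
FREQUENCIES VARIABLES=run smoke
  sex activity.
```

The same layout applies to any command whose specification does not fit on one 80-character line.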
DATA LIST
The DATA LIST
command gives names to each of the items (variables) on the data
file, and tells SPSS the columns in which they are found. The FILE=
keyword tells SPSS the name of the data file. If you do not specify
a path, the data file is assumed to be in your current directory.
If the data file is in another directory, you must specify the path
as part of the file name, and the entire name must be enclosed in
apostrophes. For example, file='sub1/minidat.dat' would
access the file minidat.dat from the subdirectory, sub1,
of the current directory. Filenames that have UPPERCASE letters
must also be enclosed in apostrophes. In some situations, you will
also need a FILE HANDLE command. See Appendix
for more information on referring to files and when you need a FILE HANDLE.
The keyword FIXED tells SPSS that the information
for each case (subject) has been entered into the same columns.
This is the most common and versatile method of coding data. An
alternative to FIXED is LIST.
The data file is in LIST format if the variables
are not necessarily in the same columns for each case, but are
simply separated from each other by one or more blanks. In LIST
formatted data, all the variables for a case must be recorded
on one physical line.
For each recorded data item, the user must choose a name, which
may be any combination of numbers and letters, as long as the
first character is a letter, and the name is not more than 8 characters
long. In FIXED formatted data, the name of each
variable is followed by its column location.
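For comparison, a LIST-format data definition might look like the sketch below. It assumes a hypothetical file, listdat.dat, in the current directory that holds the same eight variables in the same order, separated by blanks, with one case per line and no fields left blank. Because position no longer matters, no column locations are given.

```spss
DATA LIST FILE=listdat.dat LIST/puls1 puls2 run smoke sex
  ht wt activity.
```

The file name listdat.dat is an assumption made for illustration only; it is not one of the files supplied with this handout.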
MISSING VALUES
This command tells SPSS
what was coded for missing data for each variable. SPSS leaves out
of all computations any items which are coded with the value specified
as missing for that variable. If a variable does not appear on a
MISSING VALUES command, all of its values will be
used in computations. If there are no missing values in the data
file, you may leave out this command entirely. In FIXED formatted
data, if the columns that are assigned to a variable are left entirely
blank, SPSS will set the value of that variable to missing, even
without a MISSING VALUES statement. Leaving a variable
blank is not permitted in LIST formatted data, since
a blank is merely a separator between variables in this type of
data file.
B. Procedures
The data definition commands above
are sufficient to enable SPSS to read the data, but they do not
request that any calculations be done. For that, you need to ask
for one or more procedures.
Usually, before doing any fancy analysis, it is a good idea to get
some simple descriptive statistics. This will reveal if there are
any serious problems with the way the data are being read, and also
give you a general idea of what the data are like. The following
two commands will generate descriptive statistics for the sample
dataset.
FREQUENCIES VARIABLES=run smoke sex activity.
DESCRIPTIVES VARIABLES=puls1 puls2 ht wt.
FREQUENCIES
This procedure simply
tallies all the different values of the variables that are listed
in the specification. It is suitable for variables that have only
a few possible values.
DESCRIPTIVES
This procedure computes
means, standard deviations, and a number of other descriptive statistics
which are appropriate for 'continuous' variables.
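Both procedures accept subcommands that adjust what is printed. The sketch below is only an illustration: it assumes the STATISTICS keywords shown are available in this release, so check the Syntax Reference Guide for the full list before relying on them.

```spss
FREQUENCIES VARIABLES=activity
  /STATISTICS=MODE MEDIAN.
DESCRIPTIVES VARIABLES=puls1 puls2
  /STATISTICS=MEAN STDDEV MIN MAX.
```

Restricting the statistics in this way keeps the output short while you are still checking that the data are being read correctly.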
V. Running SPSS in Batch Mode
When the data file and the
instruction file have been saved, you are ready to run SPSS. To do this,
type:
spss -m <ifile >ofile [Enter]
where
ifile is the name of your SPSS instructions
file, and ofile is the name you choose for the output file.
SPSS will write the results
of the run to 'ofile'. If there are errors
in the instructions, the error messages will appear on the screen,
and will also be written to 'ofile'. Use the editor to examine 'ofile'.
If there are error messages, you must go back and edit the instructions
file to make the corrections, then re-run the job. Keep doing this
until you get the output you want.
If you have a very large
SPSS job, which takes a long time to complete, you may want to run
it as a detached job. See Appendix for instructions on how to do this.
Exercise 1
Enter the commands (described
in Section IV) to define the practice dataset and generate descriptive
statistics. Then run the job, and examine your output.
VI. SPSS Instructions -- continued
A. Adding labels to the output
Notice that in the FREQUENCIES
output you just got, the values of the variables are numeric, with
no indication of what the numbers represent. E.g., you now know how
many subjects were of sex 1 and how many were of sex 2, but this is
not very helpful to anyone who does not know whether males were coded
1 and females 2, or vice versa. You can get the output to be labeled
'male' and 'female' by adding a VALUE LABELS command
to the instruction file.
VALUE LABELS run 1 'yes' 2 'no'/sex 1 'male' 2 'female'/.
Similarly, while the
names of some variables suggest what information they contain (smoke,
sex), others may be uninformative. The variables may be labeled
on the output using the VAR LABELS command.
VAR LABELS puls1 'initial pulse rate' puls2 'second pulse'
  run 'experimental group'.
Labeling commands should
go in the instruction file somewhere after the DATA LIST,
but before the first procedure.
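As a sketch of how the remaining coded variables could be labeled, the commands below use the codes from the table in Section III. The wording of the labels is illustrative; any text that fits your output will do.

```spss
VALUE LABELS smoke 1 'yes' 2 'no'
  /activity 1 'slight' 2 'moderate' 3 'lot'.
VAR LABELS ht 'height in inches' wt 'weight in pounds'.
```

These commands would sit between the DATA LIST and the first procedure, alongside the labeling commands shown above.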
B. Exercise 2
Go back to the instruction file created in Exercise 1, and add labels
to the variables and values as appropriate. The commands above will
start you off. Be sure to insert the labeling commands before the
procedures. Then re-run the job and compare the two outputs.
C. Transformations - RECODE & COMPUTE
Sometimes it is necessary
to change the data in some way, or to use it to calculate some new
data. Such changes are accomplished using 'transformation' statements.
For example, in the practice dataset, we might decide that for
some analyses, we do not want to distinguish the first two levels
of physical activity. Also, we could take the difference of the
two pulse rates and use that as the variable to be analyzed. The
following commands will accomplish these transformations:
RECODE activity (1,2=1) (3=2) INTO activ2.
COMPUTE pulsdiff=puls2-puls1.
RECODE
The RECODE command creates
a new variable, activ2, which combines the first two levels of activity
into code 1, and moves the third level into code 2. All other values
(if any) become missing in activ2. The original variable is still
available. If there is no need to keep the original variable, you
can leave out the INTO clause. In this case, however, any values
of activity which are not recoded retain their original value. There
are a number of special keywords you can use on the RECODE
statement. The following example illustrates the use of these keywords:
RECODE ht (MISSING=9) (40 THRU 60=1) (60 THRU 66=2)
  (66 THRU 70=3) (70 THRU HI=4) (ELSE=9).
This changes ht
from an interval to a categorical variable, with 9 as the new
missing value. (Of course, it is now necessary to have another
MISSING VALUES statement to establish 9 as the
new missing value for ht.)
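The follow-up command mentioned above could be as short as the sketch below, placed after the RECODE and before the first procedure that uses ht. The FREQUENCIES line is an optional check added for illustration. Note that the ranges in the RECODE example share endpoints (60, 66 and 70); as far as we know SPSS applies the first specification that matches a value, so a height of exactly 60 would be coded 1, but you may want to confirm this in the Syntax Reference Guide.

```spss
MISSING VALUES ht (9).
FREQUENCIES VARIABLES=ht.
```

The frequency table should then show four height categories, with code 9 reported as missing.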
COMPUTE
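The COMPUTE command in the example above creates a new variable, pulsdiff, whose value for each case is the second pulse rate minus the first. As a further sketch, the commands below compute two more variables from the same data. The names pulsavg and pctchg are hypothetical, chosen for this illustration to fit the 8-character limit on variable names.

```spss
COMPUTE pulsavg=(puls1+puls2)/2.
COMPUTE pctchg=100*(puls2-puls1)/puls1.
```

If either pulse rate is missing for a case, the arithmetic cannot be carried out and the computed variable is set to missing for that case.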
General Rules about
Transformations
Transformations can be
done any time after the original variables have been defined with
the DATA LIST, and before the procedure(s) which
will use the transformed data. However it is more efficient to have
them all before the first procedure, rather than interspersing them
among procedures. Transformations do not change the original data
file in any way. The transformed
data is available only for the duration of the run. (Example 2 shows
how to create a 'system file', which makes your transformations
permanent.) Transformations compute something on each individual
case, while procedures compute something based on all (or a selected
group of) cases.
D. More Transformations - IF
The IF
statement can be used to compute a variable based on some logical
condition. For example, the following series of commands could be
used to create a new variable called gp, which will have
value 1 for people who smoke and exercise only slightly or moderately,
3 for those that do not smoke and exercise a lot, and 2 for everyone else:
COMPUTE gp=2.
IF (smoke EQ 1 AND activity LE 2) gp=1.
IF (smoke EQ 2 AND activity EQ 3) gp=3.
In general, you can
construct logical conditions out of the six comparison operations
EQ,NE,LT,LE,GT,GE (which stand for 'equal', 'not equal',
'less than', 'less than or equal', 'greater than' and 'greater than
or equal'). These conditions can be further combined using AND,
OR, and NOT. For each case, if the result of the logical
expression is true, the computation on the right is done; otherwise,
it is not done. Thus in the above example, if an individual has
value 1 for smoke and value 3 for activity, neither of the IF
statements is satisfied. Therefore, the value of gp is
left as 2.
Whenever you use more
than one of the connectors (AND, OR, NOT), you should use
parentheses to make the logic clear. For example:
IF ((smoke EQ 1 AND activity EQ 3) OR (smoke EQ 2 AND
  activity LE 2)) gp=2.
E. More Procedures
Some other procedures are
needed to describe this data adequately. First, since the experimental
condition was determined by coin toss, we would hope that smoking
and the different physical activity levels are about equally distributed
in the two groups. After checking that assumption, we will plot the
second pulse rate against the first, controlling for experimental
condition, and get the average difference in pulse rates controlling
for experimental condition and physical activity.
CROSSTABS TABLES=smoke,activity BY run/CELLS=COUNT,ROW.
PLOT PLOT=puls2 WITH puls1 BY run.
MEANS TABLES=pulsdiff BY run BY activity.
CROSSTABS
The CROSSTABS procedure
tabulates how many cases fall into each possible combination of
the variables listed in the TABLES= clause. The
above crosstabs requests two tables: one for smoke by run, and one
for activity by run.
OPTIONS and STATISTICS
All optional output is
requested by subcommands. For example, the subcommand /CELLS=COUNT,ROW
requests row percents in addition to the count in each cell; i.e.
percent of smokers and percent of people in each activity level
that fall into each experimental group. Options and Statistics lists
as used in version 2.2 and earlier are recognized (in batch mode
only), but should not be used in new programs.
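As a sketch of how several subcommands combine, the command below requests column percents as well as row percents, and adds a chi-square test of whether smoking is distributed equally across the two experimental groups. It assumes the COLUMN and CHISQ keywords are available in this release; confirm them in the Syntax Reference Guide before relying on them.

```spss
CROSSTABS TABLES=smoke BY run
  /CELLS=COUNT,ROW,COLUMN
  /STATISTICS=CHISQ.
```

Each subcommand begins with a slash, and the whole command still ends with a single period.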
PLOT
The PLOT procedure produces
scattergrams of two variables, with (optionally) a third control
variable. If there is a control variable, the first letter of the
value label of each of its values is used to label the points on
the plot.
MEANS
This procedure is used
to get means and standard deviations of a variable for each of several
groups defined by a control variable.
F. Exercise 3
Modify your instructions file to include the transformations and procedures
listed in sections VI.C and VI.E. Example 1 shows what your Ifile
should look like. Run the job and make sure you understand the output.
G. Case Selection
Sometimes you need to do
some analysis of a subset of your cases. This is easily accomplished
using the SELECT IF command. This command selects those cases that
satisfy some logical expression, which is formed in the same way as
in the IF statement described in section VI.D. All other
cases are not used in the analysis. For example,
SELECT IF (run EQ 1 AND sex EQ 1).
will limit the analysis
to just the male subjects who ran in place.
The SELECT IF
command may be placed anywhere after the data definition commands,
and REMAINS IN EFFECT FOR THE REST OF THAT RUN. In other words,
if there is a SELECT IF command in the instruction
file, all procedures in that instruction file which are anywhere
after the SELECT IF will be limited to the selected
cases. (Procedures which precede the SELECT IF command
are not affected by it.) If you will later have to use all your
cases again, or use some other subset of the cases, you can do one
of two things:
- Remember that your
input data file is not changed in any permanent way by any transformations
or selections that are in your Ifile. Therefore, you can always
run a different set of transformations or selections simply by
changing your Ifile, and re-running the job.
- The TEMPORARY
command can be included in the Ifile before any
set of transformations and/or selections, and will limit the scope
of those transformations/selections to just the next procedure.
It applies to all transformations (e.g. RECODE, COMPUTE,
IF, as well as SELECT IF) which are in the Ifile
between the TEMPORARY command and the next procedure.
For example, the following
series of commands requests descriptive statistics on the variable
'pulsdiff' for the experimental group, then the same statistics
for the control group, and finally a comparison of the two groups.
(There is no case selection in effect for the MEANS procedure,
so it will compare 'pulsdiff' for the two groups.)
TEMPORARY.
SELECT IF (run EQ 1).
DESCRIPTIVES VARIABLES=pulsdiff.
TEMPORARY.
SELECT IF (run EQ 2).
DESCRIPTIVES VARIABLES=pulsdiff.
MEANS TABLES=pulsdiff BY run.
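TEMPORARY works the same way with transformations other than SELECT IF. In the sketch below, which is an illustration rather than part of the exercises, the RECODE affects only the first FREQUENCIES table. The second table should show the original three activity levels again.

```spss
TEMPORARY.
RECODE activity (1,2=1) (3=2).
FREQUENCIES VARIABLES=activity.
FREQUENCIES VARIABLES=activity.
```

Comparing the two tables is a quick way to confirm that a temporary transformation has not carried over to later procedures.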