Statistical Consulting Center - UMass Amherst
 


Introduction to SPSS on OITUNIX

I. General Information

    SPSS (Statistical Package for Social Sciences) is a general purpose statistical program which can be used to analyze a wide variety of research data. It contains many statistical procedures, ranging from simple descriptive statistics (e.g. means, standard deviations, frequencies) to specialized statistical techniques.

    This handout is for SPSS release 6.1.4 for the UNIX operating system running on OITUNIX. You need some familiarity with UNIX and with an editor available on OITUNIX.

    The UNIX operating system is case sensitive, so type all UNIX commands exactly as shown in this handout. SPSS is not case sensitive, so SPSS commands can be typed in upper or lower case (except for file names, which must match the name on disk exactly).

    Documentation

      There are five manuals that document the SPSS base system:

    1. SPSS 6.1 Syntax Reference Guide (An alphabetical reference to all SPSS commands in the base system)

    2. SPSS Professional Statistics 6.1 (Documents the following analyses: cluster, K-means cluster, discriminant, factor, multidimensional scaling, proximity and reliability)

    3. SPSS Advanced Statistics 6.1 (Documents logistic regression, log linear analyses, multivariate analysis of variance, constrained nonlinear regression, probit analysis, Cox regression and Kaplan-Meier and actuarial survival analyses)

    4. SPSS 6.1 Base System User's Guide, Part 1, UNIX Version (Documents the motif graphical user interface and running SPSS through the Manager)

    5. SPSS 6.1 Base System User's Guide, Part 2 (Documents in detail the procedures in statistics and graphs)

    Online Help

      There is an online help facility for SPSS. Type

      spss -m [Enter]

      to start SPSS, then type

      HELP [Enter]

      You will get a screen listing the topics on which online help is available, along with instructions on using online help. You can also type a question mark followed by the name of a topic (e.g. ?CROSSTABS) to get the help screen for that topic. The online help does not eliminate the need for the manual, but it provides a brief reminder of the syntax of each command. To exit from online help, press return until you get the SPSS> prompt, then type

      FINISH [Enter]

    Implementation

      The following commands which are in the SPSS manual are not available on this system:
      • GET BMDP
      • GET SCSS
      • SAVE SCSS
      • TABLES
      • LISREL
      • GET TRANSLATE
      • SAVE TRANSLATE

    Terminal set-up

      vt100 emulation

        By default, UNIX assumes you are using a vt100 type terminal. This terminal type works for most applications, including SPSS. You do not need to set your terminal type unless you get the error message:

        TERM not found in SPSSTERMCAP file

        If you get this error message, type:

        setenv TERM vt100 [Enter]

        to set your terminal type to vt100.

      Xwindows

        SPSS can be used with a point-and-click interface on OITUNIX. This requires an Xwindows terminal emulator. Two such emulators that have been tested on OITUNIX are X-Win32 and vnc. Information on getting and using these is available under XWindows Software.

    II. Components of an SPSS job

      A. Input Files

        The typical SPSS job requires that two files be prepared in advance.

        1. The data file

          The data file contains the results of the research which are to be analyzed. It should be coded in a form which the computer can read. See Handout: H81 for a description of how to prepare your data for statistical analysis, and how to enter it into the computer.

        2. SPSS instructions (I-file)

          The instructions which tell SPSS what to do with the data can be prepared in advance (using an editor, such as emacs or vi) and stored in a file, or they can be typed and edited in the SPSS Manager. Sections IV and VI of this introduction will help you learn a basic set of these instructions. Section VII will show you how to work with the Manager.

      B. Running SPSS

        1. In batch mode

          If you have saved a file with the SPSS instructions, you can run SPSS in batch mode. You tell SPSS to process the data by typing:

          nice spss -m <ifile >ofile [Enter]

            where
          • ifile is the name of the file containing the SPSS instructions,
          • ofile is the name of a file which SPSS will create, and where it will put the results of the analysis. If a file with that name already exists on your account, SPSS will replace it with the new output.

            For a list of SPSS parameters, see Appendix.

        2. Interactively

            To start SPSS with the Xwindows (Motif) interface, open an Xterm window and type

            spss +x [Enter]

            See the SPSS 6.1 Base System User's Guide, Part 1, UNIX, for how to work with the Xwindows interface.

            Without any parameters, spss starts interactively with the SPSS Manager interface. This interface is not recommended.

            spss [Enter]

      C. Output File

        The output file which SPSS creates contains a listing of your instructions, with interpretations where appropriate, error messages if there were any mistakes in the instruction file, and the results of the analysis if there were no syntax errors.

        If you are using batch mode, you may use an editor or the more command to look at the output file. Make a note of any error messages, so you can make the corrections in the instruction file. If there are error messages, you must go back and edit the instruction file to correct the errors, and then re-run the job.

        When there are no more errors, you can get a printed copy of your results using the lpr command:

        lpr ofile [Enter]

          where ofile is the name of your output file.

        The output will be in the OIT I/O area (LGRC room A106) in about half an hour.

    III. The Data File

      A. A sample data set

        A small dataset has been prepared for use in this tutorial. It is listed in the MINITAB STUDENT HANDBOOK (Ryan/Joiner/Ryan, Duxbury Press, 1976, p.285). The data represent the results of a class 'experiment'. Each student measured his own pulse rate. A randomly selected (by coin flip) part of the class ran in place for one minute. Then, each student measured his pulse again. The results, along with some other information on each student, were recorded in the following manner:

        ITEM            COLUMNS   CODING
        first pulse     1-3       beats per minute
        second pulse    6-8       beats per minute
        group           11        1=ran in place; 2=did not run
        smoking         14        1=yes; 2=no
        gender          17        1=male; 2=female
        height          20-24     height in inches
        weight          27-29     weight in pounds
        activity level  32        1=slight; 2=moderate; 3=lot

        The data is in a file called minidat.dat under the username evagold, and the file has been made public. To get a copy of this file on your account, use the cp command:

        cp /oitstaff/evagold/minidat.dat minidat.dat [Enter]

        or, the shorter version:

        cp ~evagold/minidat.dat minidat.dat [Enter]

          where the tilde (~) does the work of finding the path.

        You can use the ls command to check that the file is now on your directory.

      B. Cases and Variables

        SPSS (as most statistical programs) thinks of the data in terms of 'cases' and 'variables'.

        • A 'case' is the unit of analysis: e.g. a subject, an experimental animal, etc. In the context of a matched pairs study, the pair, rather than the individual, may be defined as the case. In the sample data set, minidat.dat, there are 91 students, and each student is a 'case'.

        • 'Variables' are the measurements that are recorded about each case. The sample data set has eight variables.

        The data is recorded with the cases as the rows of the data file, and the variables as the columns. If there are so many variables that the rows become unwieldy, you may use two or more adjacent rows to record all the information for one case. This, however, does not change the basic concept of 'case' and 'variable'.

    IV. SPSS Instructions

      The commands in this section should be entered into a newly created file, which will tell SPSS what you want it to do with the data file. This file will be referred to as your instruction file (or Ifile). You can create this file with an editor (such as emacs or vi) for batch runs, or type it into the SPSS Manager input window (see Section VII).

      A. Data definition

        In order for SPSS to be able to do anything with your data, you must tell it where the data is, and how and where the variables are coded. Assuming that you have copied the practice dataset to your account (still with the name minidat.dat), the following instructions will tell SPSS how to read this dataset. In the examples below, SPSS commands and keywords are capitalized, and names that will vary by dataset are in lower case. However, SPSS does not care whether your instructions are in upper or lower case.

        DATA LIST FILE=minidat.dat FIXED/puls1 1-3 puls2 6-8 run 11
                smoke 14 sex 17 ht 20-24 wt 27-29 activity 32.
        MISSING VALUES puls1,puls2(0)/smoke,sex(9)/.

        General Syntax Rules

          All SPSS commands begin with a COMMAND, which must be spelled correctly, and must begin in column 1. The command tells SPSS the task which must be done. After the command, there is usually a 'specification field', which is used to give the details of that task. The specification is the part that you fill in, according to the nature of your data. If you cannot fit the entire specification on one line, just continue on the next line, leaving at least one blank space at the beginning of the continuation line. Do not break a line in the middle of a variable name or a label. You must end the specification with a period. The lines of the instruction file cannot be wider than 80 characters. (This is not true of the data file.)

        DATA LIST

          The DATA LIST command gives names to each of the items (variables) on the data file, and tells SPSS the columns in which they are found. The FILE= keyword tells SPSS the name of the data file. If you do not specify a path, the data file is assumed to be in your current directory. If the data file is in another directory, you must specify the path as part of the file name, and the entire name must be enclosed in apostrophes. For example, file='sub1/minidat.dat' would access the file minidat.dat from the subdirectory, sub1, of the current directory. Filenames that have UPPERCASE letters must also be enclosed in apostrophes. In some situations, you will also need a FILE HANDLE command. See Appendix for more information on referring to files and when you need a FILE HANDLE.

          The keyword FIXED tells SPSS that the information for each case (subject) has been entered into the same columns. This is the most common and versatile method of coding data. An alternative to FIXED is LIST. The data file is in LIST format if the variables are not necessarily in the same columns for each case, but are simply separated from each other by one or more blanks. In LIST formatted data, all the variables for a case must be recorded on one physical line.

          For each recorded data item, the user must choose a name, which may be any combination of numbers and letters, as long as the first character is a letter, and the name is not more than 8 characters long. In FIXED formatted data, the name of each variable is followed by its column location.
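          To make the idea of FIXED format concrete, here is a minimal sketch, written in Python rather than SPSS, of what reading by column position amounts to. The function name read_fixed and the sample record are made up for illustration (the record is not taken from minidat.dat), and the sketch treats every field as numeric.

```python
# Column layout from the DATA LIST command above:
# variable name -> (first column, last column), counted from 1.
LAYOUT = {
    "puls1": (1, 3), "puls2": (6, 8), "run": (11, 11), "smoke": (14, 14),
    "sex": (17, 17), "ht": (20, 24), "wt": (27, 29), "activity": (32, 32),
}

def read_fixed(line, layout=LAYOUT):
    """Slice one record by column position; an all-blank field becomes None."""
    case = {}
    for name, (first, last) in layout.items():
        field = line[first - 1:last].strip()
        case[name] = float(field) if field else None
    return case

# A made-up record, built with format widths so every value lands in its columns.
sample = f"{64:>3}  {88:>3}  {1}  {2}  {1}  {66.0:>5.2f}  {140:>3}  {2}"
case = read_fixed(sample)
print(case)
```

          The same slicing applied to an entirely blank record gives None for every variable, which mirrors the way blank columns are treated as missing in FIXED formatted data.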

        MISSING VALUES

          This command tells SPSS what was coded for missing data for each variable. SPSS leaves out of all computations any items which are coded with the value specified as missing for that variable. If a variable does not appear on a MISSING VALUES command, all of its values will be used in computations. If there are no missing values in the data file, you may leave out this command entirely. In FIXED formatted data, if the columns that are assigned to a variable are left entirely blank, SPSS will set the value of that variable to missing, even without a MISSING VALUES statement. Leaving a variable blank is not permitted in LIST formatted data, since a blank is merely a separator between variables in this type of data file.
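          The effect of declaring a missing value can be sketched in a few lines of Python. This is an illustration of the idea only, not of how SPSS is implemented; the helper names and the pulse readings are hypothetical, and None stands in for a field that was left blank.

```python
def valid_values(values, missing_codes=()):
    """Drop blanks (None) and any value declared missing for this variable."""
    return [v for v in values if v is not None and v not in missing_codes]

def mean(values, missing_codes=()):
    """Mean of the valid values, or None if nothing is left."""
    kept = valid_values(values, missing_codes)
    return sum(kept) / len(kept) if kept else None

# Made-up pulse readings: 0 is the declared missing code, None is a blank field.
puls1 = [64, 0, 72, None, 80]

mean_with_rule = mean(puls1, missing_codes=(0,))  # the 0 and the blank are skipped
mean_without_rule = mean(puls1)                   # only the blank is skipped
print(mean_with_rule, mean_without_rule)
```

          Without the declaration, the 0 is treated as a real pulse rate and pulls the mean down, which is why the missing code must be declared before any procedure is run.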

      B. Procedures

        The commands above are sufficient to enable SPSS to read the data, but they do not request that any calculations be done. For that, you need to ask for one or more procedures.

        Usually, before doing any fancy analysis, it is a good idea to get some simple descriptive statistics. This will reveal if there are any serious problems with the way the data are being read, and also give you a general idea of what the data are like. The following two commands will generate descriptive statistics for the sample dataset.

        FREQUENCIES VARIABLES=run smoke sex activity.
        DESCRIPTIVES VARIABLES=puls1 puls2 ht wt.

        FREQUENCIES

          This procedure simply tallies all the different values of the variables that are listed in the specification. It is suitable for variables that have only a few possible values.

        DESCRIPTIVES

          This procedure computes means, standard deviations, and a number of other descriptive statistics which are appropriate for 'continuous' variables.

    V. Running SPSS in Batch Mode

      When the data file and the instruction file have been saved, you are ready to run SPSS. To do this, type:

      spss -m <ifile >ofile [Enter]

        where
        ifile is the name of your SPSS instructions file.

      SPSS will write the results of the run to 'ofile'. If there are errors in the instructions, the error messages will appear on the screen, and will also be written to 'ofile'. Use the editor to examine 'ofile'. If there are error messages, you must go back and edit the instructions file to make the corrections, then re-run the job. Keep doing this until you get the output you want.

      If you have a very large SPSS job, which takes a long time to complete, you may want to run it as a detached job. See Appendix for instructions on how to do this.

      Exercise 1

        Enter the commands (described in Section IV) to define the practice dataset and generate descriptive statistics. Then run the job, and examine your output.

    VI. SPSS Instructions -- continued

      A. Adding labels to the output

        Notice that in the FREQUENCIES output you just got, the values of the variables are numeric, with no indication of what the numbers represent. E.g., you now know how many subjects were of sex 1 and how many were of sex 2, but this is not very helpful to anyone who does not know whether males were coded 1 and females 2, or vice versa. You can get the output to be labeled 'male' and 'female' by adding a VALUE LABELS command to the instruction file.

        VALUE LABELS run 1 'yes' 2 'no'/sex 1 'male' 2 'female'/.

        Similarly, while the names of some variables suggest what information they contain (smoke, sex), others may be uninformative. The variables may be labeled on the output using the VAR LABELS command.

        
        VAR LABELS puls1 'initial pulse rate' puls2 'second pulse'
             run 'experimental group'.

        Labeling commands should go in the instruction file somewhere after the DATA LIST, but before the first procedure.

      B. Exercise 2

        Go back to the instruction file created in exercise 1, and add labels to the variables and values as appropriate. The commands above will start you off. Be sure to insert the labeling commands before the procedures. Then re-run the job and compare the two outputs.

      C. Transformations - RECODE & COMPUTE

        Sometimes it is necessary to change the data in some way, or to use it to calculate some new data. Such changes are accomplished using 'transformation' statements.

        For example, in the practice dataset, we might decide that for some analyses, we do not want to distinguish the first two levels of physical activity. Also, we could take the difference of the two pulse rates and use that as the variable to be analyzed. The following commands will accomplish these transformations:

        
        RECODE activity (1,2=1) (3=2) INTO activ2.
        COMPUTE pulsdiff=puls2-puls1.

        RECODE

          The RECODE command creates a new variable, activ2, which combines the first two levels of activity into code 1, and moves the third level into code 2. All other values (if any) become missing in activ2. The original variable is still available. If there is no need to keep the original variable, you can leave out the INTO clause. In this case, however, any values of activity which are not recoded retain their original value. There are a number of special keywords you can use on the RECODE statement. The following example illustrates the use of these keywords:

          
          RECODE ht (MISSING=9) (40 THRU 60=1) (60 THRU 66=2)
               (66 THRU 70=3) (70 THRU HI=4) (ELSE=9).

          This changes ht from an interval to a categorical variable, with 9 as the new missing value. (Of course, it is now necessary to have another MISSING VALUES statement to establish 9 as the new missing value for ht.)
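          The difference between recoding INTO a new variable and recoding in place can be sketched in Python. The function names are hypothetical and None stands in for a missing value. The ht version assumes that the value specifications are tried from left to right and that the first one that matches is used, so a height of exactly 60 falls in group 1 and exactly 66 in group 2; treat that ordering as an assumption of this sketch.

```python
def recode_into_activ2(activity):
    """RECODE activity (1,2=1) (3=2) INTO activ2: unlisted values become missing."""
    return {1: 1, 2: 1, 3: 2}.get(activity)

def recode_activity_in_place(activity):
    """The same recode without INTO: unlisted values keep their original value."""
    return {1: 1, 2: 1, 3: 2}.get(activity, activity)

def recode_ht(ht):
    """The ht recode: (MISSING=9), four THRU ranges, then (ELSE=9)."""
    if ht is None:
        return 9
    # Ranges are inclusive at both ends and checked in the order written.
    specs = [(40, 60, 1), (60, 66, 2), (66, 70, 3), (70, float("inf"), 4)]
    for low, high, code in specs:
        if low <= ht <= high:
            return code
    return 9  # ELSE: anything not covered above, e.g. a height below 40

for ht in (30, 60, 66, 70, 72.5, None):
    print(ht, recode_ht(ht))
```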

        COMPUTE

          The COMPUTE command creates a new variable, pulsdiff, which is the difference of the post and pre pulse rates. If either of those pulse rates was missing, the difference will be recognized as missing. The COMPUTE statement can combine the various arithmetic operations (+,-,*,/,**) along with parentheses into more complex statements, which are evaluated according to standard algebraic rules. A variety of 'built-in functions' is also available (e.g. SQRT, LG10, SIN). For example, the following statement could be used to create the variable hyp:

          COMPUTE hyp=SQRT (ht**2 + wt*wt).
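          As a rough Python illustration of the two COMPUTE statements in this section, the sketch below works on one case at a time and returns None (standing in for a missing value) whenever an input is missing. The function names are hypothetical.

```python
import math

def pulsdiff(puls1, puls2):
    """COMPUTE pulsdiff=puls2-puls1: missing if either pulse rate is missing."""
    if puls1 is None or puls2 is None:
        return None
    return puls2 - puls1

def hyp(ht, wt):
    """COMPUTE hyp=SQRT (ht**2 + wt*wt): missing if either input is missing."""
    if ht is None or wt is None:
        return None
    return math.sqrt(ht ** 2 + wt * wt)

print(pulsdiff(64, 88))    # second pulse minus first pulse
print(pulsdiff(None, 88))  # a missing input gives a missing result
```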

        General Rules about Transformations

          Transformations can be done any time after the original variables have been defined with the DATA LIST, and before the procedure(s) which will use the transformed data. However, it is more efficient to have them all before the first procedure, rather than interspersing them among procedures. Transformations do not change the original data file in any way. The transformed data is available only for the duration of the run. (Example 2 shows how to create a 'system file', which makes your transformations permanent.) Transformations compute something on each individual case, while procedures compute something based on all (or a selected group of) cases.

      D. More Transformations - IF

        The IF statement can be used to compute a variable based on some logical condition. For example, the following series of commands could be used to create a new variable called gp, which will have value 1 for people who smoke and exercise only slightly or moderately, 3 for those who do not smoke and exercise a lot, and 2 for everyone else:

        COMPUTE gp=2.
        IF (smoke EQ 1 AND activity LE 2) gp=1.
        IF (smoke EQ 2 AND activity EQ 3) gp=3.

        In general, you can construct logical conditions out of the six comparison operations EQ,NE,LT,LE,GT,GE (which stand for 'equal', 'not equal', 'less than', 'less than or equal', 'greater than' and 'greater than or equal'). These conditions can be further combined using AND, OR, and NOT. For each case, if the result of the logical expression is true, the computation on the right is done; otherwise, it is not done. Thus in the above example, if an individual has value 1 for smoke and value 3 for activity, neither of the IF statements is satisfied. Therefore, the value of gp is left as 2.
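        The COMPUTE and IF sequence for gp can be traced by hand with a small Python sketch. The function name assign_gp is hypothetical, and the sketch assumes that smoke and activity are both present (it does not model missing values).

```python
def assign_gp(smoke, activity):
    """Follow the three commands in order for a single case."""
    gp = 2                               # COMPUTE gp=2.
    if smoke == 1 and activity <= 2:     # IF (smoke EQ 1 AND activity LE 2) gp=1.
        gp = 1
    if smoke == 2 and activity == 3:     # IF (smoke EQ 2 AND activity EQ 3) gp=3.
        gp = 3
    return gp

# Every combination of smoke (1=yes, 2=no) and activity (1, 2, 3).
for smoke in (1, 2):
    for activity in (1, 2, 3):
        print(smoke, activity, assign_gp(smoke, activity))
```

        A smoker with activity level 3 satisfies neither condition, so gp keeps the value 2 that the COMPUTE statement gave it.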

        Whenever you use more than one of the connectors (AND, OR, NOT), you should use parentheses to make the logic clear. For example:

        IF ((smoke EQ 1 AND activity EQ 3) OR (smoke EQ 2 AND
             activity LE 2)) gp=2.

      E. More Procedures

        Some other procedures are needed to describe this data adequately. First, since the experimental condition was determined by coin toss, we would hope that smoking and the different physical activity levels are about equally distributed in the two groups. After checking that assumption, we will plot the second pulse rate against the first, controlling for experimental condition, and get the average difference in pulse rates controlling for experimental condition and physical activity.

        CROSSTABS TABLES=smoke,activity BY run/CELLS=COUNT,ROW.
        PLOT PLOT=puls2 WITH puls1 BY run.
        MEANS TABLES=pulsdiff BY run BY activity.

        CROSSTABS

          The CROSSTABS procedure tabulates how many cases fall into each possible combination of the variables listed in the TABLES= clause. The above crosstabs requests two tables: one for smoke by run, and one for activity by run.

        OPTIONS and STATISTICS

          All optional output is requested by subcommands. For example, the subcommand /CELLS=COUNT,ROW requests row percents in addition to the count in each cell; i.e. percent of smokers and percent of people in each activity level that fall into each experimental group. Options and Statistics lists as used in version 2.2 and earlier are recognized (in batch mode only), but should not be used in new programs.
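          What a cell count and a row percent mean can be shown with a short Python sketch. It tabulates smoke (rows) by run (columns) for a handful of made-up cases; the function name crosstab and the data are hypothetical, and the sketch ignores missing values.

```python
from collections import Counter

def crosstab(cases, row_var, col_var):
    """Return {(row value, column value): (count, row percent)}."""
    counts = Counter((c[row_var], c[col_var]) for c in cases)
    row_totals = Counter(c[row_var] for c in cases)
    return {
        cell: (n, 100.0 * n / row_totals[cell[0]])
        for cell, n in counts.items()
    }

# Made-up cases: two smokers and four non-smokers.
cases = [
    {"smoke": 1, "run": 1}, {"smoke": 1, "run": 2},
    {"smoke": 2, "run": 1}, {"smoke": 2, "run": 1},
    {"smoke": 2, "run": 1}, {"smoke": 2, "run": 2},
]
table = crosstab(cases, "smoke", "run")
for cell in sorted(table):
    print(cell, table[cell])
```

          Each row percent is the share of that row's cases falling in the given column, so the percents within one level of smoke add up to 100.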

        PLOT

          The PLOT procedure produces scattergrams of two variables, with (optionally) a third control variable. If there is a control variable, the first letter of the value label of each of its values is used to label the points on the plot.

        MEANS

          This procedure is used to get means and standard deviations of a variable for each of several groups defined by a control variable.
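          The grouping that MEANS performs can be sketched in Python with the standard statistics module. The function name group_means and the cases are made up for illustration; cases with a missing value (None) on the analysis variable are left out, and only one control variable is handled.

```python
from collections import defaultdict
from statistics import mean, stdev

def group_means(cases, var, by):
    """Return {group value: (n, mean, standard deviation)} for var within each group."""
    groups = defaultdict(list)
    for c in cases:
        if c[var] is not None:
            groups[c[by]].append(c[var])
    return {
        g: (len(v), mean(v), stdev(v) if len(v) > 1 else None)
        for g, v in groups.items()
    }

# Made-up pulse differences for runners (run=1) and non-runners (run=2).
cases = [
    {"run": 1, "pulsdiff": 20}, {"run": 1, "pulsdiff": 30},
    {"run": 2, "pulsdiff": 0}, {"run": 2, "pulsdiff": 2},
    {"run": 2, "pulsdiff": None},
]
result = group_means(cases, "pulsdiff", "run")
for group in sorted(result):
    print(group, result[group])
```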

      F. Exercise 3

        Modify your instructions file to include the transformations and procedures listed in sections VI.C and VI.E. Example 1 shows what your Ifile should look like. Run the job and make sure you understand the output.

      G. Case Selection

        Sometimes you need to analyze only a subset of your cases. This is easily accomplished using the SELECT IF command. This command selects those cases that satisfy some logical expression, which is formed in the same way as in the IF statement described in section VI.D. All other cases are not used in the analysis. For example,

        SELECT IF (run EQ 1 AND sex EQ 1).

        will limit the analysis to just the male subjects who ran in place.

        The SELECT IF command may be placed anywhere after the data definition commands, and REMAINS IN EFFECT FOR THE REST OF THAT RUN. In other words, if there is a SELECT IF command in the instruction file, all procedures in that instruction file which are anywhere after the SELECT IF will be limited to the selected cases. (Procedures which precede the SELECT IF command are not affected by it.) If you will later have to use all your cases again, or use some other subset of the cases, you can do one of two things:

        1. Remember that your input data file is not changed in any permanent way by any transformations or selections that are in your Ifile. Therefore, you can always run a different set of transformations or selections simply by changing your Ifile, and re-running the job.

        2. The TEMPORARY command can be included in the Ifile before any set of transformations and/or selections, and will limit the scope of those transformations/selections to just the next procedure. It applies to all transformations (e.g. RECODE, COMPUTE, IF, as well as SELECT IF) which are in the Ifile between the TEMPORARY command and the next procedure.

        For example, the following series of commands requests descriptive statistics on the variable 'pulsdiff' for the experimental group, then the same statistics for the control group, and finally a comparison of the two groups. (There is no case selection in effect for the MEANS procedure, so it will compare 'pulsdiff' for the two groups.)

        TEMPORARY.
        SELECT IF (run EQ 1).
        DESCRIPTIVES VARIABLES=pulsdiff.
        TEMPORARY.
        SELECT IF (run EQ 2).
        DESCRIPTIVES VARIABLES=pulsdiff.
        MEANS TABLES=pulsdiff BY run.
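        The scope of TEMPORARY can be modelled with a small Python class. This is only a conceptual sketch: the class Job and its methods are hypothetical, it models case selection but not transformations, and the "procedure" simply reports how many cases it sees.

```python
class Job:
    """Toy model of one SPSS run: a working set of cases plus an optional snapshot."""

    def __init__(self, cases):
        self.cases = list(cases)   # cases currently available to procedures
        self._saved = None         # snapshot taken when TEMPORARY is issued

    def temporary(self):
        """TEMPORARY: remember the current cases so they can be restored."""
        self._saved = list(self.cases)

    def select_if(self, condition):
        """SELECT IF: keep only the cases that satisfy the condition."""
        self.cases = [c for c in self.cases if condition(c)]

    def procedure(self):
        """Any procedure: use the current cases, then undo temporary changes."""
        n = len(self.cases)
        if self._saved is not None:
            self.cases, self._saved = self._saved, None
        return n

# Made-up cases: two runners and three non-runners.
cases = [{"run": 1}, {"run": 1}, {"run": 2}, {"run": 2}, {"run": 2}]

job = Job(cases)
job.temporary()
job.select_if(lambda c: c["run"] == 1)
n_ran = job.procedure()       # first DESCRIPTIVES: runners only
job.temporary()
job.select_if(lambda c: c["run"] == 2)
n_control = job.procedure()   # second DESCRIPTIVES: non-runners only
n_all = job.procedure()       # MEANS: no selection in effect
print(n_ran, n_control, n_all)
```

        If the two temporary() calls are left out, the first selection stays in effect and the second selection finds no cases at all, which is the behaviour described above for SELECT IF without TEMPORARY.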
        
        
        
        




© 2004 University of Massachusetts Amherst.