Social Sciences Research and Instructional Council
Teaching Resources Depository

SPSS for Windows, Version 7.5: A Brief Tutorial
(Hypertext Version)

Chapter Eight: Multivariate Analysis

© The Authors, 1998; Last modified 15 August 1998

Crosstabulation Revisited

Simple crosstabs (contingency tables), which examine the influence of one variable on another, should be only the first step in the analysis of social science data. It is fun to hypothesize that the more conservative a person’s political orientation, the more likely they are to oppose abortion, then run the crosstabs and conclude you were right. However, this one-step method of hypothesis testing is very limited.

Or suppose you hypothesize that men and women differ significantly in their belief that the ability to think for oneself (GSS96A.SAV variable name = THNKSELF) is an important value to instill in children. The crosstabs for THNKSELF and SEX show that while a slight majority (51%) of all respondents reported that this was the most important value among those listed (to be popular, to obey, to help others, to work hard), only 45% of the men surveyed agreed with this, compared with 57% of the women (see Figure 8-1).

Figure 8-1

This percentage point difference (epsilon) of 12 is "interesting," even if you don't yet know whether it is statistically significant. Can you conclude that gender is the causal factor here? While it may indeed be true that gender is explanatory, you won’t really know this until you have failed to account for this variation in any other way. To do this, run crosstabs of (i.e., "control for") other independent variables to see if something else might account for this variation among respondents.
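If it helps to see the arithmetic behind epsilon, here is a minimal sketch in Python (not part of SPSS; used here only for illustration). The cell counts are invented so that the column percentages match the 45% and 57% quoted above. They are not the actual GSS frequencies, and the table is collapsed to two rows to keep it short.

```python
# Hypothetical cell counts (NOT the GSS frequencies).
# Rows are values of the dependent variable, columns are categories of SEX.
counts = {
    "most important": {"male": 90, "female": 114},
    "other": {"male": 110, "female": 86},
}


def column_percents(table):
    """Express each cell as a percentage of its column total."""
    columns = list(next(iter(table.values())))
    totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        r: {c: 100.0 * row[c] / totals[c] for c in columns}
        for r, row in table.items()
    }


def epsilon(table, row, col_a, col_b):
    """Percentage point difference between two columns on one row."""
    pct = column_percents(table)
    return pct[row][col_a] - pct[row][col_b]


pct = column_percents(counts)
eps = epsilon(counts, "most important", "female", "male")
print(pct["most important"], eps)
```

Percentages are computed down the columns because SEX, the independent variable, is the column variable; epsilon is then read across a row.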

Recall that your original crosstabs procedure produces one contingency table, with as many rows as there are categories (or values) of the dependent variable, and as many columns as there are categories of the independent variable. So in Figure 8-1, we have a 5 by 2 table. When you start using control (sometimes called test) variables, you will get as many separate tables as there are categories of the control variable. For instance, if you want to control for levels of education, and simply use EDUC as the control variable, you end up with 20 separate tables. This is NOT a good idea. Try doing this to see what I mean. Notice how difficult it is to compare across this many tables. So before you do any further analysis, recode your variables into the smallest number of categories that are still logically useful.
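The logic of such a recode can be sketched in a few lines of Python. The cut points and the names EDUCR and THNKR are the ones used in the example that follows; the numeric codes assigned to the new categories, and the assumption that THNKSELF is coded 1 for "most important," are illustrative choices, so check them against your own data file. In SPSS itself you would do this through the Recode procedure.

```python
def recode_educ(years):
    """Collapse years of schooling (EDUC) into three EDUCR categories.

    The codes 1, 2, 3 are an assumption made for this sketch.
    """
    if years is None:
        return None  # keep missing values missing
    if years <= 11:
        return 1  # 0-11 years
    if years == 12:
        return 2  # 12 years
    return 3  # more than 12 years


def recode_thnkself(rank):
    """Collapse THNKSELF into THNKR: 1 = most important, 2 = other.

    Assumes THNKSELF is coded 1 for 'most important'.
    """
    if rank is None:
        return None
    return 1 if rank == 1 else 2


# Hypothetical raw values, just to show the effect of the recode.
raw_educ = [0, 8, 11, 12, 12, 13, 16, 20, None]
educr = [recode_educ(y) for y in raw_educ]
categories = sorted({v for v in educr if v is not None})
print(educr, categories)
```

After the recode the control variable has three valid categories, so a controlled crosstab yields three tables rather than one per year of schooling.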

In this next example EDUC was recoded as EDUCR into three categories (0-11 years, 12 years, more than 12 years). THNKSELF was recoded as THNKR into two categories (most important, other). After you have done these recodes, let’s see what happens when we do crosstabs again, this time controlling for education. To do the appropriate crosstabs, go to the Statistics, Summarize, Crosstabs menu and double-click. Enter THNKR into the Row box and SEX into the Column box. (Recall that this is how you generate one contingency table.) Now you are ready for the next step, the addition of a control variable. Choose EDUCR from your variables list and enter it into the empty box at the bottom of the Crosstabs screen. Figure 8-2 shows you what this will look like.

Figure 8-2

The SPSS output for this procedure is shown in Figure 8-3.

Figure 8-3

Note that there are now three tables, one for each value of EDUCR. If you want to produce more three-way tables, just move the variables from the variable list into that third box. [If you want to produce 4-way or more tables, click on the Next box, just to the right of "Layer 1 of 1." The box that had previously shown EDUCR would now be empty, and you could add in your fourth variable (perhaps RACE, recoded as White-Nonwhite). Your first table would show THNKR by SEX for whites with 0-11 years of education, then for 12 years, then 12+ years, then non-whites with 0-11 years, etc., for a total of six tables.]
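The number of tables is simply the product of the category counts of your control variables. A small Python sketch makes the count and the ordering explicit; the category labels are assumptions based on the recodes described above, and the loop order (race outside, education inside) follows the order given in the bracketed note.

```python
from itertools import product

educr = ["0-11 years", "12 years", "more than 12 years"]
race_recoded = ["White", "Nonwhite"]  # hypothetical two-category recode of RACE

# product() varies the last argument fastest, so education cycles within race.
layers = list(product(race_recoded, educr))

for race, educ in layers:
    print(f"THNKR by SEX for: {race}, {educ}")

print(len(layers))
```

Adding a third control variable with four categories would already give 24 tables, which is why control variables should have few categories.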

Figure 8-1 shows the original, or zero-order contingency table of the relationship between THNKSELF (unrecoded) and SEX.

Figure 8-3 shows the three partial tables that resulted from the recoded THNKR crosstabbed by SEX, controlling for EDUCR.
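The relationship between the zero-order table and the partial tables can be sketched in Python as follows. The respondent counts are invented for illustration and are not GSS data; the point is only that each respondent falls into exactly one partial table, so the partial tables add back up to the zero-order table.

```python
from collections import Counter, defaultdict

# Hypothetical cells: (THNKR, SEX, EDUCR, number of respondents).
cells = [
    ("most important", "male", "0-11", 2), ("other", "male", "0-11", 4),
    ("most important", "female", "0-11", 2), ("other", "female", "0-11", 4),
    ("most important", "male", "12", 3), ("other", "male", "12", 4),
    ("most important", "female", "12", 4), ("other", "female", "12", 3),
    ("most important", "male", "13+", 5), ("other", "male", "13+", 4),
    ("most important", "female", "13+", 7), ("other", "female", "13+", 3),
]
# Expand the cells into one record per respondent.
records = [(d, s, e) for d, s, e, n in cells for _ in range(n)]


def crosstab(rows):
    """Zero-order table: counts keyed by (dependent, independent)."""
    table = Counter()
    for dep, indep, _control in rows:
        table[(dep, indep)] += 1
    return table


def partial_tables(rows):
    """One table per value of the control variable."""
    tables = defaultdict(Counter)
    for dep, indep, control in rows:
        tables[control][(dep, indep)] += 1
    return dict(tables)


zero_order = crosstab(records)
partials = partial_tables(records)
for level, table in partials.items():
    print(level, dict(table))
```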

First note that there is a big difference among respondents at each of the three educational levels. Only a third (34%) of the respondents with less than a high school education thought that thinking for oneself was the most important value to instill in children. Compare this with the three out of five (61%) with 13 or more years of education who did think this was most important. Also note that as education increases, women are more likely than men to say that thinking for oneself is the most important value. It appears here that educational level explains more than gender does. Try other variables as a control to see what happens. As a general rule, here is how to interpret what you find from this elaboration analysis:

Try some of your own three-way (or higher) tables using some of the data sets we have provided you with. Recall that for this procedure, there should be few categories for each variable, particularly your control variables (so you might need to recode), and you are limited to variables measured at, or recoded to, nominal or ordinal levels.

Multiple Regression

Once you have discovered that several of your independent variables are related to your dependent variable, you might want to try multiple regression (multiple linear regression analysis). The three-or-more-way crosstabs shown previously are more an exploratory technique, whereas multiple regression is more explanatory. With multiple regression you can generate beta values (standardized partial regression coefficients), which give you an idea of the relative impact of each independent variable on the dependent variable.

You will also generate the R-squared value, a summary statistic giving the proportion of variance in the dependent variable explained by all the independent variables taken together. Remember the important assumptions for using regression: a linear relationship between each independent variable and the dependent variable; a normal distribution of your variables; and variables measured at interval or ratio levels.
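To make the beta and R-squared values less of a black box, here is a minimal Python sketch of ordinary least squares with an intercept, solved through the normal equations. It illustrates the arithmetic only and is not how SPSS computes its output. The data values and the variable names (educ, attend, score) are invented for the example. Each beta is the unstandardized coefficient multiplied by the ratio of the predictor's standard deviation to the dependent variable's standard deviation.

```python
from statistics import mean, stdev


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(xs, y):
    """Regress y on the predictor columns in xs.

    Returns (intercept, slopes, betas, r_squared).
    """
    n = len(y)
    design = [[1.0] + [col[i] for col in xs] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y[r] for r, row in enumerate(design))
           for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in design]
    y_bar = mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    # Standardized coefficient: slope * sd(predictor) / sd(dependent).
    betas = [b * stdev(col) / stdev(y) for b, col in zip(coef[1:], xs)]
    return coef[0], coef[1:], betas, r_squared


# Invented data: eight hypothetical respondents.
educ = [8, 10, 12, 12, 14, 16, 16, 18]
attend = [7, 2, 5, 0, 3, 1, 6, 4]
score = [4, 3, 3, 2, 2, 1, 2, 1]

intercept, slopes, betas, r_squared = ols([educ, attend], score)
print(intercept, slopes, betas, r_squared)
```

Because the betas are expressed in standard deviation units, they can be compared with one another even when the predictors are measured on different scales, which the unstandardized slopes cannot.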

Go to the Statistics, Regression, Linear menu. For your dependent variable, choose THNKSELF from the variable list. For the independent variables choose EDUC (unrecoded), ATTEND, BIBLE, and SEX (see Figure 8-4).

Figure 8-4

Note that EDUC doesn’t show up in the list of independent variables, but you could use the scroll bar to find it. Now choose the "Statistics" button at the bottom of the dialog box and a new dialog box will appear, shown here in Figure 8-5 with the default options.

Figure 8-5

Click on the "Continue" button to return to the Linear Regression dialog box (Figure 8-4), then click on the "Plots" button. Your screen should now look like Figure 8-6.

Figure 8-6

Click on "Continue" and look at your next choice, which is "Save." A dialog box like Figure 8-7 appears.

Figure 8-7

Click on "Continue" and then "Options" and your screen should look like Figure 8-8, which shows the default options (then click "Continue" to return to the Linear Regression dialog box).

Figure 8-8

Your last task is to choose your method of analysis. In Figure 8-4 you will see the "Method:" button right under the "Independent(s): " box. You have several choices here, and you can use the scroll button to see what they are. "Stepwise" is the one I chose for this example, and the one that you will probably use most often. For an in-depth discussion of all the possible choices for Multiple Regression, you will need to consult the SPSS manuals.
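To get a feel for what a stepwise-style method does, here is a deliberately simplified Python sketch of forward selection. It is not the SPSS Stepwise algorithm: SPSS decides entry and removal with F-test criteria, whereas this sketch simply adds whichever predictor raises R-squared the most and stops when the best gain falls below a threshold. The function names, the min_gain threshold, and the data are all assumptions made for illustration.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(columns, y):
    """R-squared of y regressed on an intercept plus the given columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y[r] for r, row in enumerate(design))
           for i in range(k)]
    coef = solve(xtx, xty)
    y_bar = sum(y) / n
    ss_res = sum((yi - sum(c * v for c, v in zip(coef, row))) ** 2
                 for yi, row in zip(y, design))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def forward_select(predictors, y, min_gain=0.01):
    """Greedy forward selection on R-squared gain (a simplification)."""
    chosen, current = [], 0.0
    remaining = dict(predictors)
    while remaining:
        base = [predictors[name] for name in chosen]
        gains = {name: r_squared(base + [col], y) - current
                 for name, col in remaining.items()}
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:
            break  # no remaining predictor adds enough
        chosen.append(best)
        current += gains[best]
        del remaining[best]
    return chosen, current


# Invented data: y is built from x1 and x2; x3 is unrelated to how y was built.
predictors = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [1, -1, 1, -1, 1, -1, 1, -1],
    "x3": [3, 1, 4, 1, 5, 9, 2, 6],
}
y = [3 * a + b for a, b in zip(predictors["x1"], predictors["x2"])]

chosen, final_r2 = forward_select(predictors, y)
print(chosen, final_r2)
```

The order in which predictors enter is informative in itself: the first one entered accounts for the most variance on its own, and each later one is judged only by what it adds beyond those already in the model.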

Figure 8-9 shows you the first screen of the results in the Output window when you finally click "OK" in the Linear Regression dialog box after having chosen stepwise regression using all the default options.

Figure 8-9

For further practice, try using some of the other data sets we have included with this manual.
