Crosstabulation Revisited
Simple crosstabs (contingency tables), which examine the influence of one variable on another, should be only the first step in the analysis of social science data. It is fun to hypothesize that the more conservative a person’s political orientation the more likely they are to oppose abortion, run the crosstabs, and then conclude you were right. However, this one-step method of hypothesis testing is very limited.
What if all the Republicans in your sample are religiously conservative and all the Democrats are atheists? Is it political party that best explains your findings, or is it religious orientation?
Or what if the political conservatives as a group are much older than the liberals? Would age then be the real causal factor?
Or is it some combination of all of these variables that explains the varying opinions of your respondents?
Or suppose you hypothesize that men and women differ significantly in their belief that the ability to think for oneself (GSS96A.SAV variable name = THNKSELF) is an important value to instill in children. The crosstabs for THNKSELF and SEX show that while a slight majority (51%) of all respondents reported that this was the most important value among those listed (to be popular, to obey, to help others, to work hard), only 45% of the men surveyed agreed with this compared with 57% of the women (see Figure 8-1).
This percentage point difference (epsilon) of 12 is "interesting," even if you don't yet know whether it is statistically significant. Can you conclude that gender is the causal factor here? While it may indeed be true that gender is explanatory, you won’t really know this until you have failed to account for this variation in any other way. To do this, run crosstabs of (i.e., "control for") other independent variables to see if something else might account for this variation among respondents.
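If you would like to check the epsilon arithmetic outside SPSS, here is a minimal Python sketch. The counts are hypothetical (100 men and 100 women, chosen only so that the column percentages come out to the 45% and 57% reported above), and the function name `column_percentages` is my own invention, not anything from SPSS.

```python
from collections import Counter

def column_percentages(rows):
    """rows: list of (dependent_value, independent_value) pairs.
    Returns {independent_value: {dependent_value: percent}}, where each
    percentage is computed within its column (independent-variable category)."""
    cell_counts = Counter(rows)
    column_totals = Counter(ind for _, ind in rows)
    table = {}
    for (dep, ind), n in cell_counts.items():
        table.setdefault(ind, {})[dep] = 100.0 * n / column_totals[ind]
    return table

# Hypothetical counts, NOT the real GSS frequencies.
sample = ([("most important", "male")] * 45 + [("other", "male")] * 55
          + [("most important", "female")] * 57 + [("other", "female")] * 43)

table = column_percentages(sample)

# Epsilon: the percentage point difference between the two columns
# on the same row of the table.
epsilon = table["female"]["most important"] - table["male"]["most important"]
print(round(epsilon, 1))  # prints 12.0
```

Note that the percentages are taken down the columns (within each category of the independent variable) and then compared across them, which is the same way you read the SPSS table.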
Recall that your original crosstabs procedure produces one contingency table, with as many rows as there are categories (or values) of the dependent variable, and as many columns as there are categories of the independent variable. So in Figure 8-1, we have a 5 by 2 table. When you start using control (sometimes called test) variables, you will get as many separate tables as there are categories of the control variable. For instance, if you want to control for levels of education and simply use EDUC as the control variable, you end up with 20 separate tables. This is NOT a good idea. Try it to see what I mean, and notice how difficult it is to compare across that many tables. So before you do any further analysis, recode your variables into the smallest number of categories that are still logically useful.
In this next example EDUC was recoded as EDUCR into three categories (0-11 years, 12 years, more than 12 years). THNKSELF was recoded as THNKR into two categories (most important, other). After you have done these recodes, let’s see what happens when we do crosstabs again, this time controlling for education. To do the appropriate crosstabs, go to the Statistics, Summarize, Crosstabs menu and double-click. Enter THNKR into the Row box and SEX into the Column box. (Recall that this is how you generate one contingency table.) Now you are ready for the next step, the addition of a control variable. Choose EDUCR from your variables list and enter it into the empty box at the bottom of the Crosstabs screen. Figure 8-2 shows you what this will look like.
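The two recodes described above can be written out as simple rules. This is only a sketch in Python, not SPSS syntax. The function names are hypothetical, and it assumes that THNKSELF is coded so that 1 means "most important" and that missing values arrive as `None`; check your own codebook before relying on either assumption.

```python
def recode_educ(years):
    """EDUC (years of schooling) -> EDUCR with three categories.
    Missing values (None) stay missing."""
    if years is None:
        return None
    if years <= 11:
        return "0-11 years"
    if years == 12:
        return "12 years"
    return "more than 12 years"

def recode_thnkself(rank):
    """THNKSELF -> THNKR with two categories.
    ASSUMPTION: code 1 means 'most important'; every other valid code is 'other'."""
    if rank is None:
        return None
    return "most important" if rank == 1 else "other"

# A few hypothetical cases to show the mapping.
for years in (8, 12, 16):
    print(years, "->", recode_educ(years))
for rank in (1, 3):
    print(rank, "->", recode_thnkself(rank))
```

Writing the cut points out this way makes it easy to see that every value of the original variable lands in exactly one new category.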
The SPSS output for this procedure is shown in Figure 8-3.
Note that there are now three tables, one for each value of EDUCR. If you want to produce more three-way tables, just move the variables from the variable list into that third box. [If you want to produce 4-way or more tables, click on the Next box, just to the right of "Layer 1 of 1." The box that had previously shown EDUCR would now be empty, and you could add in your fourth variable (perhaps RACE, recoded as White-Nonwhite). Your first table would show THNKR by SEX for whites with 0-11 years of education, then for 12 years, then 12+ years, then non-whites with 0-11 years, etc., for a total of six tables.]
Figure 8-1 shows the original, or zero-order contingency table of the relationship between THNKSELF (unrecoded) and SEX.
Figure 8-3 shows the three partial tables that resulted from the recoded THNKR crosstabbed by SEX, controlling for EDUCR.
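To see what "one table per category of the control variable" means mechanically, here is a small Python sketch that splits cases into partial tables and computes epsilon within each one. All of the counts are hypothetical and are not the values in Figure 8-3. The helper names (`partial_tables`, `column_percent`, `cases`) are mine.

```python
from collections import Counter, defaultdict

def partial_tables(records, row, col, layer):
    """Build one table of cell counts per category of the layer (control) variable.
    Returns {layer_value: Counter({(row_value, col_value): count})}."""
    tables = defaultdict(Counter)
    for rec in records:
        tables[rec[layer]][(rec[row], rec[col])] += 1
    return dict(tables)

def column_percent(table, row_value, col_value):
    """Percent of the cases in one column that fall in the given row."""
    col_total = sum(n for (_, c), n in table.items() if c == col_value)
    return 100.0 * table[(row_value, col_value)] / col_total

def cases(educr, sex, n_most, n_other):
    """Make hypothetical respondents for one EDUCR-by-SEX group."""
    base = {"EDUCR": educr, "SEX": sex}
    return ([dict(base, THNKR="most important")] * n_most
            + [dict(base, THNKR="other")] * n_other)

# Hypothetical data: ten men and ten women at each educational level.
records = (cases("0-11 years", "male", 3, 7) + cases("0-11 years", "female", 4, 6)
           + cases("12 years", "male", 4, 6) + cases("12 years", "female", 6, 4)
           + cases("more than 12 years", "male", 5, 5)
           + cases("more than 12 years", "female", 8, 2))

tables = partial_tables(records, row="THNKR", col="SEX", layer="EDUCR")

# Epsilon (women minus men) within each partial table.
epsilons = {}
for level, table in tables.items():
    men = column_percent(table, "most important", "male")
    women = column_percent(table, "most important", "female")
    epsilons[level] = women - men
    print(level, round(men), round(women), round(epsilons[level]))
```

Comparing the epsilon in each partial table with the epsilon in the zero-order table is the heart of the elaboration analysis discussed next.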
First note that there is a big difference among respondents across the three educational levels. Only a third (34%) of the respondents with less than a high school education thought that thinking for oneself was the most important value to instill in children. Compare this with the three out of five (61%) of those with 13 or more years of education who did think this was most important. Also note that as education increases, women are more likely than men to say that thinking for oneself is the most important value. It appears that educational level explains more than gender does. Try other variables as a control to see what happens. As a general rule, here is how to interpret what you find from this elaboration analysis:
- If the partial tables are similar to the zero-order table, you have replicated your original findings, which means that in spite of the introduction of a particular control variable, the original relationship persists. The only way to convince us that this is indeed a strong, or even causal, relationship is if you control for all the other logical independent variables you can think of, and still find essentially no differences between the zero-order table and its partials.
- If the relationships in all the partials are substantially weaker than the one found in the original AND IF your control variable is antecedent (occurs prior in time) to both the other variables, you have found a spurious relationship and explained away the original. In other words, the original relationship was due to the influence of that other variable, not the one you hypothesized.
- If the partials are weaker AND IF your control variable is intervening (it occurs in time between the independent and dependent variables), you have interpreted the relationship. If the time sequence between the independent and control variable is not determinable (or otherwise unclear), you don’t know whether you have explanation or interpretation, but you do know that the control variable is important.
- If one or more partials is stronger than the original relationship and one or more is weaker, you have discovered the conditions under which the original relationship is strongest. This is referred to as specification, or the interaction effect.
- If the zero-order table showed weak association between the variables, you might still find strong associations in the partials (which is a good argument for keeping on with your initial analysis of the data even if you didn’t "find" anything with bivariate analysis). The addition of your control variable showed it to have been acting as a suppressor in the original table.
- Last, if a zero-order table shows only a weak or moderate association, the partials might show the opposite relationship, due to the presence of a distorter variable.
Try some of your own three-way (or higher) tables using some of the data sets we have provided you with. Recall that for this procedure, there should be few categories for each variable, particularly your control variables (so you might need to recode), and you are limited to variables measured at, or recoded to, nominal or ordinal levels.
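The interpretation rules listed above can be summarized as a rough decision procedure. The Python sketch below is only an illustration: it uses a signed epsilon as the measure of association, and the tolerance of 5 percentage points for deciding that two tables are "similar" is an arbitrary assumption of mine, not a standard. In practice you would judge similarity from the tables themselves (and from significance tests), not from a fixed cutoff.

```python
def classify_elaboration(zero_order, partials, control_timing="unknown", tol=5.0):
    """Label the outcome of an elaboration analysis.

    zero_order     -- signed epsilon in the original (zero-order) table
    partials       -- signed epsilons, one per partial table
    control_timing -- "antecedent", "intervening", or "unknown"
    tol            -- ASSUMED cutoff (percentage points) for "similar"
    """
    # Distorter: every partial clearly runs in the opposite direction.
    if zero_order != 0 and all(p * zero_order < 0 and abs(p) > tol for p in partials):
        return "distorter"

    z = abs(zero_order)
    sizes = [abs(p) for p in partials]

    if all(abs(s - z) <= tol for s in sizes):
        return "replication"
    if all(s < z - tol for s in sizes):
        if control_timing == "antecedent":
            return "explanation (spurious relationship)"
        if control_timing == "intervening":
            return "interpretation"
        return "explanation or interpretation"
    if all(s > z + tol for s in sizes):
        return "suppressor"
    if any(s > z + tol for s in sizes) and any(s < z - tol for s in sizes):
        return "specification"
    return "no clear pattern"

# Hypothetical epsilons, for illustration only.
print(classify_elaboration(12, [11, 13, 12]))
print(classify_elaboration(12, [1, 2, 0], control_timing="antecedent"))
print(classify_elaboration(12, [25, 2]))
```

The order of the checks matters: a reversal of direction is tested first, because partials that point the opposite way should not be mistaken for a replication just because they are of similar size.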
Once you have discovered that several of your independent variables are related to your dependent variable, you might want to try multiple regression (multiple linear regression analysis). The three-or-more-way crosstabs shown previously are more an exploratory technique, whereas multiple regression is more explanatory. With multiple regression you can generate beta values (standardized partial regression coefficients), which give you an idea of the relative impact of each independent variable on the dependent variable.
You will also generate the R-squared value, which is a summary statistic of the impact of all the independent variables taken together. Remember the important assumptions for using regression: a linear relationship between each independent variable and the dependent variable; a normal distribution of your variables; and variables measured at interval or ratio levels.
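To make the beta and R-squared values less mysterious, here is a minimal Python sketch that computes them from correlations, using only the standard library. It is not how SPSS does it internally, and the data are made up. The idea is that when every variable is converted to z-scores, the betas solve the equations "correlations among predictors times betas = correlations of predictors with the dependent variable," and R-squared is the sum of each beta times its predictor's correlation with the dependent variable. All function names are hypothetical.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize a variable (assumes it is not constant)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def correlation(a, b):
    """Pearson correlation as the mean product of z-scores."""
    za, zb = zscores(a), zscores(b)
    return sum(x * y for x, y in zip(za, zb)) / len(a)

def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def standardized_regression(y, predictors):
    """Return ({name: beta}, r_squared) for a dict of predictor lists."""
    names = list(predictors)
    r_xx = [[correlation(predictors[a], predictors[b]) for b in names] for a in names]
    r_xy = [correlation(predictors[a], y) for a in names]
    betas = solve(r_xx, r_xy)
    r_squared = sum(b * r for b, r in zip(betas, r_xy))
    return dict(zip(names, betas)), r_squared

# Made-up data: six cases, two predictors.
educ = [8, 10, 12, 12, 14, 16]
attend = [7, 5, 6, 3, 2, 1]
outcome = [5, 4, 4, 3, 2, 1]

betas, r_squared = standardized_regression(outcome, {"EDUC": educ, "ATTEND": attend})
print(betas, r_squared)
```

Because the betas are on a common (standardized) scale, you can compare them directly to judge which independent variable has the larger relative impact, which is exactly how they are used in the SPSS output.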
Go to the Statistics, Regression, Linear menu. For your dependent variable, choose THNKSELF from the variable list. For the independent variables choose EDUC (unrecoded), ATTEND, BIBLE, and SEX (see Figure 8-4).
Note that EDUC doesn’t show up in the list of independent variables, but you could use the scroll bar to find it. Now choose the "Statistics" button at the bottom of the dialog box and a new dialog box will appear, shown here in Figure 8-5 with the default options.
Click on the "Continue" button to return to the Linear Regression dialog box (Figure 8-4), then click on the "Plots" button. Your screen should now look like Figure 8-6.
Click on "Continue" and look at your next choice, which is "Save." A dialog box like Figure 8-7 appears.
Click on "Continue" and then "Options" and your screen should look like Figure 8-8, which shows the default options (then click "Continue" to return to the Linear Regression dialog box).
Your last task is to choose your method of analysis. In Figure 8-4 you will see the "Method:" button right under the "Independent(s): " box. You have several choices here, and you can use the scroll button to see what they are. "Stepwise" is the one I chose for this example, and the one that you will probably use most often. For an in-depth discussion of all the possible choices for Multiple Regression, you will need to consult the SPSS manuals.
Figure 8-9 shows you the first screen of the results in the Output window when you finally click "OK" in the Linear Regression dialog box after having chosen stepwise regression using all the default options.
For further practice, try using some of the other data sets we have included with this manual.