Social Sciences Research and Instructional Council
Teaching Resources Depository

SPSS for Windows, Version 7.5: A Brief Tutorial
(Hypertext Version)

Chapter Five: Cross Tabulations

© The Authors, 1998; Last modified 15 August 1998
In this chapter, we'll look at how SPSS for Windows can be used to create contingency tables, sometimes called crosstabs. A contingency table helps us look at whether the value of one variable is "contingent" upon that of another. It is useful when each variable contains only a few categories. Usually, though not always, such variables will be nominal or ordinal. Some techniques for examining relationships among interval or ratio variables are presented in later chapters.

To illustrate this technique, we'll examine the relationship between how respondents voted in the 1996 presidential contest and how they voted in races that same year for the U.S. House of Representatives. Our hypothesis is that voters cast "straight ticket" ballots, either because of a "presidential coattail" effect or (more likely) because voting in both presidential and congressional elections is a function of voter party identification. Our analysis will be based on the cross-sectional sample from the 1996 American National Election Study [permission and disclaimer at the end of this chapter].

A subset of this survey is stored on your diskette as "NES96A.SAV." Open this file following the instructions in Chapter 1 under "Getting a Data File."

You’ll next need to weight the cases to correct for known over- and underrepresentation in the sample. (See Chapter 3 for more information on weighting of samples.) A variable called "WEIGHT" has been included in the file for this purpose. Click on "Data," and then on "Weight Cases." Click on the circle to the left of "Weight cases by." Click on the "WEIGHT" variable in the list of variables on the left, then click on the arrow to the right of the list to move this variable into the "Frequency Variable" box. Click on "OK."
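To see what weighting does to a tally, here is a minimal Python sketch. It is not SPSS code, and the records and weights are hypothetical values made up for illustration (they are not taken from "NES96A.SAV"). Each case contributes its weight, rather than 1, to the count for its category.

```python
from collections import defaultdict

# Hypothetical (category, weight) records; illustrative only.
cases = [
    ("Clinton", 1.2),
    ("Dole", 0.8),
    ("Clinton", 1.0),
    ("Perot", 0.5),
    ("Dole", 1.5),
]

def weighted_counts(records):
    """Sum the case weights within each category instead of counting cases."""
    totals = defaultdict(float)
    for category, weight in records:
        totals[category] += weight
    return dict(totals)

counts = weighted_counts(cases)
for category, total in sorted(counts.items()):
    print(f"{category}: {total:.1f}")
```

An unweighted tally would count two Clinton cases and two Dole cases; with weights applied, the totals shift toward the cases that carry more weight.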

To create a contingency table, click on "Statistics," "Summarize," and "Crosstabs." This will open up the dialog box shown in Figure 5-1.

Figure 5-1

You'll next need to choose the row (usually the dependent) and column (usually the independent) variables. In this case, "VOTEPRES" measures how respondents voted in the presidential race, and "VOTEHSE" measures how they voted in House elections. In the box on the left, drag the scroll bar or click and hold the down arrow until you find "VOTEPRES." Click on "VOTEPRES," then on the arrow button that points right toward the box labeled "Column(s):." Now find "VOTEHSE" in the box on the left. Click on it, and then click on the arrow pointing toward the box labeled "Row(s):."
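Conceptually, a contingency table is just a count of cases for every combination of a row value and a column value. The Python sketch below illustrates the idea with a handful of hypothetical (row, column) pairs. The data and the helper name `crosstab` are assumptions for illustration and do not reproduce anything SPSS does internally.

```python
from collections import Counter

# Hypothetical responses as (row_value, column_value) pairs; illustrative
# only, not drawn from NES96A.SAV.
responses = [
    ("Democrat", "Clinton"),
    ("Democrat", "Clinton"),
    ("Republican", "Clinton"),
    ("Republican", "Dole"),
    ("Republican", "Dole"),
    ("Democrat", "Dole"),
    ("Republican", "Perot"),
]

def crosstab(pairs):
    """Return {row: {column: count}}, filling empty combinations with 0."""
    cell_counts = Counter(pairs)
    rows = sorted({row for row, _ in pairs})
    cols = sorted({col for _, col in pairs})
    return {r: {c: cell_counts[(r, c)] for c in cols} for r in rows}

table = crosstab(responses)
for row_label, cells in table.items():
    print(row_label, cells)
```

Combinations that never occur (here, a Democratic House vote paired with Perot) still get a cell, with a count of zero.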

The next step is to indicate what information you would like to have in each cell of the table. SPSS for Windows automatically provides a cell "count," that is, the number of cases actually occurring (observed) in each cell. To obtain additional information, click on "Cells." This opens up a new dialog box (Figure 5-2).

Figure 5-2

Here you encounter a number of choices. For present purposes, we need to convert raw numbers into percentages. In a contingency table, one should always compute percentages so that each category of the independent variable totals 100%. In this example, we will (somewhat arbitrarily) treat presidential vote as the independent variable. Since we have placed "VOTEPRES" in the columns, click on "Column," then on "Continue." You are returned to the Crosstabs dialog box. Click on "OK." SPSS for Windows has now opened up an output window containing your table. Click on the maximize button. Even with the output window maximized, you will need to scroll around in order to see all of the results.
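The rule about percentaging within categories of the independent variable can be made concrete with a short Python sketch. The counts below are hypothetical, and the function assumes the independent variable is in the columns and that no column total is zero. Each cell is divided by its column total, so every column adds up to 100%.

```python
# Hypothetical observed counts, laid out as {row: {column: count}}.
observed = {
    "Democrat": {"Clinton": 30, "Dole": 5},
    "Republican": {"Clinton": 10, "Dole": 45},
}

def column_percentages(table):
    """Convert counts to percentages of each column total."""
    columns = list(next(iter(table.values())))
    col_totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        r: {c: 100.0 * cells[c] / col_totals[c] for c in columns}
        for r, cells in table.items()
    }

percents = column_percentages(observed)
for row_label, cells in percents.items():
    print(row_label, {c: round(p, 1) for c, p in cells.items()})
```

Had the independent variable been placed in the rows instead, the same logic would apply with row totals as the divisors.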

Figures 5-3 and 5-4 show portions of what you’ll find. Figure 5-3 displays the numbers and percentages of cases that have missing values for one or both variables, and the number and percentages of valid cases.

Figure 5-3

Figure 5-4 displays some of the actual contingency table.

Figure 5-4

Notice a few interesting things about the figures in the table. Most voters did indeed vote "straight tickets." Of those voting for Clinton, about three quarters (74.4%) also voted for Democrats in House elections, while almost nine tenths (87.7%) of Dole supporters stayed with the GOP candidates for the House. Perot supporters tended, by a margin of 55.6% to 41.7%, to favor Republicans over Democrats in House races.

We'd probably like to know the probability that the relationship found in the table occurred by chance, especially since a lot of respondents (688, or 40.1% of the total) had missing values for one or both variables, reducing the number of cases on which the table is based from 1714 (the total sample) to 1026. We'd also probably like some measure of the strength of the relationship within the sample between the two variables.
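The case counts quoted above can be checked with a few lines of arithmetic. This Python sketch uses only the figures given in the text (a total sample of 1714 and 688 cases with missing values); the variable names are ours.

```python
total_cases = 1714    # total sample, as reported in the text
missing_cases = 688   # cases missing on one or both variables

valid_cases = total_cases - missing_cases
missing_percent = 100.0 * missing_cases / total_cases
valid_percent = 100.0 * valid_cases / total_cases

print(f"Valid cases: {valid_cases}")
print(f"Missing: {missing_percent:.1f}%  Valid: {valid_percent:.1f}%")
```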

To get this information, click again on "Statistics," "Summarize," and "Crosstabs." Notice that all the information you provided last time has been retained. Now click on "Statistics." This opens up a new dialog box (Figure 5-5).

Figure 5-5

Click on "Chi-square" to obtain a measure of statistical significance, and on "phi and Cramer’s V." Phi and Cramer’s V are measures of the strength of association between two variables when one or both are at the nominal level of measurement. Phi is appropriate for tables with two rows and two columns, while Cramer’s V is appropriate in other instances, including this example. Now click on "Continue," and on "OK." The output window now reappears. The table that we obtained earlier is repeated, but is now followed by additional information, a portion of which is shown in Figure 5-6.

Figure 5-6

Several different versions of chi-square (Pearson's chi-square is probably the most familiar) all indicate that a relationship this strong would occur by chance less than one time in a thousand. The significance level of .000 shown for Pearson's chi-square, for example, is a probability rounded to three decimal places, so it means less than one in a thousand rather than exactly zero. Note, however, that some of the cells have an expected frequency of less than five. This warns us that the value for Pearson’s chi-square may not be reliable. We might want to redefine as missing those votes cast for minor party candidates and rerun the analysis. (See the section in Chapter 2 dealing with missing values.) The Cramer’s V of .437 indicates that the relationship is a fairly strong one.
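For readers who want to see where these numbers come from, the following Python sketch computes Pearson's chi-square, its degrees of freedom, the number of cells with an expected frequency below five, and Cramer's V, using the standard textbook formulas. The 3-by-2 table of counts is hypothetical, and the function names are ours. The sketch does not compute a significance level, and it is not a description of how SPSS itself performs the calculation.

```python
from math import sqrt

# Hypothetical observed counts: three rows by two columns.
observed = [
    [20, 10],
    [10, 20],
    [15, 15],
]

def pearson_chi_square(table):
    """Return (chi-square, degrees of freedom, n, cells with expected < 5)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    small_cells = 0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            if expected < 5:
                small_cells += 1
            chi2 += (obs - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, n, small_cells

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    return sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))

chi2, df, n, small_cells = pearson_chi_square(observed)
v = cramers_v(chi2, n, len(observed), len(observed[0]))
print(f"chi-square = {chi2:.3f}, df = {df}, cells with expected < 5: {small_cells}")
print(f"Cramer's V = {v:.3f}")
```

In a table with two rows and two columns, the same formula gives the absolute value of phi.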

Let's look at a somewhat different table. Click on "Statistics," "Summarize," and "Crosstabs." Now click on "VOTEPRES" in the "Column(s):" box. Notice that the arrow next to this box is now pointing left. Click on this arrow to remove "VOTEPRES" from the "Column(s):" box. In the same way, remove "VOTEHSE" from the "Row(s):" box. Now move to the box on the left and click on "POLVIEWS." Click on the arrow (pointing right again) next to the "Column(s):" box. In the same way, add "PARTYID" to the "Row(s):" box.

Since both of these variables are ordinal, we'll want to obtain different statistics to measure their relationship. Click on "Statistics." Click on "Chi-square" and on "phi and Cramer’s V" (to delete them from the list of requested statistics). Click on "Kendall’s tau-b" (to add it). (Tau-b is a measure of association that is appropriate when both variables are ordinal and have equal numbers of categories.) Click on "Continue," then on "OK." What do the results show?
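As a rough illustration of what tau-b measures, the Python sketch below compares every pair of cases and counts how many pairs are ordered the same way on both variables (concordant), in opposite ways (discordant), or tied. The coded values are hypothetical, the function name is ours, and the sketch ignores case weights, so it will not reproduce the SPSS output for the weighted file.

```python
from math import sqrt

def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length lists of ordinal codes."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    total_pairs = n * (n - 1) // 2
    denominator = sqrt((total_pairs - ties_x) * (total_pairs - ties_y))
    if denominator == 0:
        raise ValueError("tau-b is undefined when a variable is constant")
    return (concordant - discordant) / denominator

# Hypothetical ordinal codes (for example, 1 = low, 2 = middle, 3 = high).
ideology = [1, 2, 2, 3, 3, 1]
party = [1, 1, 2, 3, 2, 1]
print(f"tau-b = {kendall_tau_b(ideology, party):.3f}")
```

Values near +1 or -1 indicate a strong positive or negative ordinal relationship, and values near 0 indicate little or none.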

Chapter Five Exercises

Try testing some other hypotheses using any of the data sets on your diskette that contain nominal or ordinal data. Some suggestions using the 1996 American National Election Study subset:

  1. Consult the codebook in Appendix B describing this dataset. What background variables (such as region of country, age category, marital status, gender, ethnicity, education, or income) best predict a person's self-identified political views ("POLVIEWS")? Note: you can create an ethnicity variable by combining the variables "RACE" and "HISPANIC." You can also collapse the "EDUC" and "INCOME" variables into more workable numbers of categories. See Chapter 3 on transforming data.
  2. Is ideology a general characteristic, or is it issue-specific? That is, are people who are liberal (or conservative) on one issue (such as "SERVSPND," spending for government services) also liberal (or conservative) on other issues (such as "DEATHPEN," the death penalty)?
  3. What background variables best predict a person's party identification ("PARTYID")?
  4. What background variables best predict whether a person voted in the 1996 elections ("VOTE")?
  5. For those who did vote, is ideology (either measured in terms of specific issues, or the overall "POLVIEWS" variable) or "PARTYID" the better predictor of how a person voted in the 1996 presidential election?

Citation, permission, and disclaimer:

 Rosenstone, Steven J., Donald R. Kinder, Warren E. Miller, and the National Election Studies. NATIONAL ELECTION STUDIES, 1996: PRE- AND POST-ELECTION STUDY [dataset]. 3rd release. Ann Arbor, MI: University of Michigan, Center for Political Studies [producer and distributor], 1998.

These materials are based on work supported by the National Science Foundation under Grant Nos. SBR-9707741, SBR-9317631, SES-9209410, SES-9009379, SES-8808361, SES-8341310, SES-8207580, and SOC77-08885.

Any opinions, findings and conclusions or recommendations expressed in these materials are those of the author(s) and do not necessarily reflect those of the National Science Foundation.
