Social Sciences Research and Instructional Council
Teaching Resources Depository

SPSS for Windows, Version 7.5: A Brief Tutorial
(Hypertext Version)

Chapter Seven: Regression and Correlation

© The Authors, 1998; Last modified 15 August 1998
Regression and correlation analysis (also called "least squares" analysis) helps us examine relationships among interval or ratio variables. In this chapter, we'll explore techniques for doing bivariate regression and correlation. Chapter 8 will include a look at multiple regression and correlation.

To illustrate these techniques, we'll focus on roll call voting in the United States Senate. There is a dataset on your diskette called "SENATE.SAV" that contains information on members serving in the Senate at the start of the 105th Congress (1997) and on the states they represent. Open this file following the instructions in Chapter One under "Getting a Data File." (The codebook is in Appendix C.)

Each year, each member of the Senate is rated by a conservative interest group, the American Conservative Union (ACU). Ratings are based on the percent of the time the member votes in agreement with the ACU's position on selected roll calls. The results for 1996 are included in your file as "ACU96." Data are coded as missing for freshman Senators.
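A rating of this kind is simply a percentage. The Python sketch below computes one for a single member. The function name and vote data are hypothetical, and dropping roll calls the member missed from the denominator is our own assumption; the ACU's actual rule for absences may differ.

```python
# Hypothetical illustration of a percent-agreement rating like the ACU's.
# Assumption: roll calls the member missed (None) are left out of the
# denominator; the ACU's actual rule for absences may differ.

def agreement_rating(member_votes, group_positions):
    """Percent of recorded votes on which the member sided with the group."""
    agreed = 0
    counted = 0
    for vote, position in zip(member_votes, group_positions):
        if vote is None:  # member did not vote on this roll call
            continue
        counted += 1
        if vote == position:
            agreed += 1
    if counted == 0:
        return None  # nothing to rate, so the score is missing
    return 100.0 * agreed / counted

# Five hypothetical roll calls: the member agreed on 3 of the 4 votes cast.
positions = ["yea", "nay", "yea", "yea", "nay"]
votes = ["yea", "nay", None, "nay", "nay"]
print(agreement_rating(votes, positions))  # 75.0
```

Returning None when there is no voting record mirrors, loosely, why ACU96 is coded as missing for freshman Senators.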

We'll begin by testing three hypotheses:

  1. The wealthier a state, as measured by per capita income, the more conservative will be the voting record of a Senator from the state.
  2. The higher the percentage of a state's residents who are white, the more conservative will be the voting record of a Senator from the state.
  3. The higher Bill Clinton's percent of the two-party vote in a state, the less conservative will be the voting record of a Senator from the state.
To test these hypotheses, click on "Statistics", "Correlate", and "Bivariate". The dialog box shown in Figure 7-1 will appear on your screen.

Figure 7-1

Move the following variables to the box under "Variables": "ACU96", "INCOME", "WHITE", and "PCLINTON". Since our variables are measured at least at the interval level, we will accept the default of calculating Pearson correlation coefficients. If our data were ordinal, or badly skewed, we would compute Kendall's tau-b or Spearman's rho instead. Because we have population data rather than a sample, we won't concern ourselves with statistical significance; so, even though our hypotheses are directional, we won't bother changing the default of two-tailed significance tests.
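To make the difference between these coefficients concrete, here is a minimal Python sketch (the function names are our own, not SPSS's) that computes Pearson's r from deviations around the means, and Spearman's rho as Pearson's r applied to ranks, with tied values given their average rank. Kendall's tau-b is not shown. The toy data are hypothetical, chosen only to show how one extreme value pulls Pearson's r down while Spearman's rho, which looks only at order, is unaffected.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    cross = sum(a * b for a, b in zip(dx, dy))  # sum of cross-products
    ss_x = sum(a * a for a in dx)               # sum of squares for x
    ss_y = sum(b * b for b in dy)               # sum of squares for y
    return cross / math.sqrt(ss_x * ss_y)

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of each variable."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]  # perfectly monotonic, but far from a straight line
print(pearson_r(x, y))     # noticeably below 1
print(spearman_rho(x, y))  # 1.0: the two rankings agree exactly
```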

Now click on "Options". This will open up the dialog box shown in Figure 7-2.

Figure 7-2

For our purposes, we won't need to change any of the defaults, so click on "Continue", then on "OK".

Figure 7-3 shows a portion of the resulting correlation matrix.

Figure 7-3

Our first hypothesis is not at all supported. It turns out, in fact, that there is a modest negative correlation (-.270) between conservative roll call voting by Senators and the per capita income of their states. Our second hypothesis receives essentially no support. The correlation between Senators' conservatism scores and the percent of the states' residents who are white is in the hypothesized direction, but is a negligible .061. Only our third hypothesis is supported: there is a moderate and, as expected, negative correlation (-.485) between ACU ratings and the Clinton vote in the 1996 presidential race.
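A correlation matrix like Figure 7-3 is built one pair of variables at a time. Because ACU96 is missing for freshman Senators, the handling of missing data matters: if cases are excluded pairwise (which we believe is the default in the Options dialog of Figure 7-2, though you should confirm it on your screen), each coefficient uses every Senator with data on both of its variables, so correlations involving ACU96 rest on fewer cases than correlations among the state characteristics. The Python sketch below illustrates pairwise deletion; the function names and the six-case data set are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson's r; assumes neither variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pairwise_corr_matrix(data):
    """Correlation matrix with pairwise deletion of missing values (None).

    data maps each variable name to a list with one value per case.
    Returns {(var_a, var_b): (r, n)}, where n counts the cases on which
    both variables are present, so n can differ from cell to cell.
    """
    result = {}
    for a in data:
        for b in data:
            pairs = [(va, vb) for va, vb in zip(data[a], data[b])
                     if va is not None and vb is not None]
            xs = [p[0] for p in pairs]
            ys = [p[1] for p in pairs]
            result[(a, b)] = (pearson_r(xs, ys), len(pairs))
    return result

# Hypothetical toy data for six Senators; None marks a missing ACU score.
data = {
    "ACU96":    [90, 10, None, 60, None, 20],
    "PCLINTON": [40, 65, 55, 45, 50, 60],
    "INCOME":   [21, 25, 23, 20, 24, 22],
}
matrix = pairwise_corr_matrix(data)
print(matrix[("ACU96", "PCLINTON")][1])   # 4 cases
print(matrix[("INCOME", "PCLINTON")][1])  # 6 cases
```

Listwise deletion would instead drop a case from every cell as soon as any variable is missing, so all coefficients would share the same, smaller n.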

Let's look at this last relationship more closely. Click on "Graphs", "Scatter...", and "Define". This will open up the dialog box shown in Figure 7-4.

Figure 7-4

In the box on the left, click on "ACU96", then on the arrow button pointing toward the box labeled "Y Axis:". Now scroll down and click on "PCLINTON", then on the arrow button pointing toward the box labeled "X Axis:". Finally (for a reason that will become clear a bit later), click on "PARTY", then on the arrow pointing toward the box labeled "Label Cases by:", and then on "OK". This produces the result shown in Figure 7-5.

Figure 7-5

OK, but not very fancy (or easy to interpret). Place your mouse pointer somewhere on the graph and click the right mouse button. Now click on "SPSS Chart Object", then on "Open". Figure 7-6 shows a portion of the SPSS Chart Editor that now appears.

Figure 7-6

Go to the menu bar, click on "Chart", then on "Options", producing the dialog box shown in Figure 7-7.

Figure 7-7

In the left-hand box, change the setting next to "Case Labels:" from "Off" to "On". Under "Fit Line" in the center of the screen, click on the box next to "Total". This will cause the regression (least squares) line to be added to your graph. Now click on "Fit Options" near the middle of the screen. This opens up the dialog box shown in Figure 7-8.

Figure 7-8

In the lower right-hand portion of the screen, click on the box next to "Display R-square in legend". Now click on "Continue", then on "OK."

The chart (see Figure 7-9) now shows the regression line and the R-square value (.2350).

Figure 7-9
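With a single independent variable (and a constant), R-square is simply the square of Pearson's r, so the .2350 on the chart can be checked against the correlation of -.485 in Figure 7-3. In words, roughly 23.5 percent of the variation in ACU ratings is accounted for by the linear relationship with the Clinton vote; the small gap between .2352 and .2350 comes from rounding r to three decimal places.

```latex
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} = r^2
\qquad\text{so}\qquad
R^2 \approx (-0.485)^2 = 0.235225 \approx .235
```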

Also, because we earlier turned "Case Labels" to "On", the chart shows the party affiliation of each Senator plotted. Notice that almost all of those with conservatism scores above the values "predicted" by the regression line are Republicans, while almost all of those with scores below the line are Democrats. Note: In defining value labels for the "PARTY" variable when setting up the "SENATE.SAV" file, we deliberately used one-character labels ("R" and "D"). Had we used longer labels, the scatterplot would have appeared garbled.
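"Above the line" means a positive residual (observed score minus predicted score), and "below the line" means a negative one. The Python sketch below fits a least-squares line and tallies residual signs by group. The function names are our own, and the six cases are hypothetical toy values built to mimic the pattern described above; they are not the SENATE.SAV data.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b

def residual_sides_by_group(x, y, groups):
    """Count, per group, how many cases lie above, below, or on the fitted line."""
    a, b = fit_line(x, y)
    counts = {}
    for xi, yi, g in zip(x, y, groups):
        residual = yi - (a + b * xi)  # observed minus "predicted"
        if residual > 0:
            side = "above"
        elif residual < 0:
            side = "below"
        else:
            side = "on"
        counts.setdefault(g, {"above": 0, "below": 0, "on": 0})[side] += 1
    return counts

# Hypothetical toy data: x plays the role of PCLINTON, y of ACU96.
x = [40, 40, 50, 50, 60, 60]
y = [95, 20, 85, 10, 75, 0]
party = ["R", "D", "R", "D", "R", "D"]
print(residual_sides_by_group(x, y, party))
# {'R': {'above': 3, 'below': 0, 'on': 0}, 'D': {'above': 0, 'below': 3, 'on': 0}}
```

A pattern like this suggests that party explains variation the Clinton vote leaves unexplained, a question multiple regression can address.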

We can get more information about the regression line. Minimize the SPSS Chart Editor. Click on "Statistics", "Regression", and "Linear". This opens up the dialog box shown as Figure 7-10.

Figure 7-10

Move "ACU96" to the "Dependent" box and "PCLINTON" to the "Independent(s)" box. Click on "OK." Scroll through the resulting output until you reach the portion shown in Figure 7-11.

Figure 7-11

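From this information, we can construct the equation for the regression line. It takes the form ACU96 = a + b(PCLINTON), where a is the constant and b is the coefficient for PCLINTON shown in Figure 7-11. Because the correlation between the two variables is negative, b is negative as well.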
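For reference, the least-squares constant and slope that SPSS reports can be written in terms of the sample means and standard deviations, with x standing for PCLINTON and y for ACU96. The second form of the slope shows why its sign must match the sign of r (here -.485): the ratio of standard deviations is always positive.

```latex
\hat{y} = a + b\,x, \qquad
b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
  = r\,\frac{s_y}{s_x}, \qquad
a = \bar{y} - b\,\bar{x}
```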

Chapter Seven Exercises
  1. What state characteristics best explain Clinton's share of the two-party vote in 1996? (Be especially careful here about trying to measure statistical significance. Not only are we working with population data, but each state is counted twice, once for each of its Senators!)

  2. For the remaining exercises, open up the NES96A.SAV file we used in Chapter 5. This file contains mostly nominal and ordinal data, but there are a few variables that are at least interval. These include age in years ("YEARBORN") and years of education ("EDUC"). The file also includes four "feeling thermometers." To construct these scales, respondents were asked to describe (on a scale from 0 to 100, with 0 being coldest and 100 being warmest) their feelings toward Congress, the courts, the military, and the federal government.

  3. What is the relationship between age in years and years of education?
  4. Does age influence how respondents feel about institutions of government?
  5. Does amount of education influence how respondents feel about institutions of government? (Note: separating out the effects of age and education, or assessing their combined impact, requires multiple regression. See Chapter 8.)
  6. How do the four feeling thermometers correlate with each other? That is, do respondents who feel warmly toward one institution of government also tend to feel warmly toward the others?