Social Sciences Research and Instructional Council
Teaching Resources Depository

SPSS for Windows, Version 7.5: A Brief Tutorial
(Hypertext Version)

Chapter Three
Transforming Data

© The Authors, 1998; Last modified 15 August 1998
This section explains how to change or transform your variables in various ways. You can combine values of a variable into several categories. You can create new variables out of old variables. You can select particular cases and analyze only these cases. You can weight cases in such a way that some cases count more heavily than others. SPSS provides a wide variety of ways of transforming your data.

Recoding Variables

Recoding is a way of combining the values of a variable into a number of categories. For example, age might be expressed in actual years in your data set. However, you want to combine age into the following four categories: under 30, 30 to 49, 50 to 69, and 70 and over. Or perhaps you asked respondents how often they prayed--several times a day, once a day, several times a week, once a week, less than once a week, or never. Now you decide that you want to combine several times a day and once a day into one category (often), several times a week and once a week into another category (sometimes), and less than once a week and never into a third category (infrequently). Recoding is the process in SPSS that will do this.

Get into SPSS by clicking on the "Start" button. Point your mouse at "Programs" and then at "SPSS". Finally, click on "SPSS". Now let's get the data file we used in Chapter 1. Point your mouse at "File" and press the left clicker. A box will open which is the File menu. Point your mouse at "Open" and press the left clicker. This opens a larger box called the Open File box. Here we want to tell SPSS where to find the data file we want to open. In the upper part of the box you'll see "Look in:". Click the down arrow on the Look in line and then click A:. Click on the file name, GSS96A.SAV, to highlight it and then click on "Open." In a few seconds, your data matrix will appear. We're going to recode the variable called AGE which is, of course, the respondent's age. Click on "Transform" and then point your mouse at "Recode". Your screen will look like Figure 3-1.

Figure 3-1

Recoding Into Different Variables

You have a choice of recoding into the same variable or into different variables. If we recode into different variables we could combine ages into one set of categories and call this new variable AGE1 and then recode ages into a different set of categories called AGE2. Let's do that, so click on "Into Different Variables". Your screen will look like Figure 3-2.

Figure 3-2

Find "AGE" in the list of variables on the left and click on it to highlight it. Then click on the arrow just to the left of the big box in the middle of the window. This will move AGE into the list of variables to recode.

You want to give a name to this new variable so click in the "Name" box under Output Variable and type the name "AGE1" in this box. You can even type a variable label for this new variable in the Label box just below the Name box. Try typing "Age in Four Categories" as your label. Click on the "Change" button to tell SPSS to make these changes. Your screen will look like Figure 3-3.

Figure 3-3

Now we want to tell SPSS how to create these categories. Click on the "Old and New Values" button at the bottom of the window. Your screen will look like Figure 3-4.

Figure 3-4

You have several options. You can change a particular value into a new value by entering the value to be changed into the Old Value box and the new value into the New Value box and then clicking on Add. You can change just the system-missing value into another value, or you can change both the system-missing and the user-missing values into another value.

You can also change a range of values into a new value and that is what we are going to do. Click on the fourth bubble from the top labeled Range. Notice how this marks this choice by filling in the bubble. Then type "18" (the youngest age in the data set) in the box to the left of "through," click on the box to the right of through, and type "29" in that box. Then click on "Value" just below New Value and type "1" in that box. This means that we want to combine all ages from 18 through 29 into a single category and give it the value of 1. Then click on "Add".

Now do the same thing for the other categories. Click on the box under Range and type "30" in the box to the left of "through", click on the box to the right of through, and type "49" in that box. Click on "Value" just below New Value and type "2" in that box and click on Add. Do the same thing for the category 50 to 69 (give this a new value of "3") and the category 70 to 89 (the largest age in the data set). Give this last category a new value of "4". Your screen should look like Figure 3-5.

Figure 3-5

If you want to change one of your categories, highlight that category in the Old->New box and make the changes. Then click on "Change". Your new category should appear in the Old->New box. If you want to remove a category, highlight it and click on "Remove".

Now we want SPSS to carry out the recoding. Click on "Continue" at the bottom of the window. This will take you back to the Recode into Different Variables box.

Click on "OK" and SPSS will take a few seconds to carry out your commands. The data matrix should appear on the screen. When it says that the SPSS Processor is Ready at the bottom of the window you know that SPSS has finished with the recoding.
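If it helps to see the logic behind the dialog boxes, here is a minimal sketch of the same recode written in Python rather than SPSS. It is only an illustration under our own assumptions: the names SYSMIS, AGE1_RULES, and recode_into_new are made up for this example, and the ages are invented rather than taken from the GSS file.

```python
# Illustrative sketch only: mimics the Old->New rules entered in the dialog box.
# SYSMIS is a hypothetical stand-in for the SPSS system-missing value.
SYSMIS = None

# (low, high, new_value) triples; both ends of each range are inclusive.
AGE1_RULES = [(18, 29, 1), (30, 49, 2), (50, 69, 3), (70, 89, 4)]


def recode_into_new(value, rules):
    """Return the new code for value, or SYSMIS if no range covers it."""
    if value is None:
        return SYSMIS
    for low, high, new_value in rules:
        if low <= value <= high:
            return new_value
    return SYSMIS


ages = [18, 29, 30, 55, 70, 89, None]  # invented ages, not GSS data
age1 = [recode_into_new(a, AGE1_RULES) for a in ages]
print(age1)
```

The original list of ages is left untouched, just as AGE is left untouched when you recode into a different variable.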

Let's see what our recoding has accomplished. Click on "Statistics", then point your mouse at "Summarize", and then click on "Frequencies". Notice that AGE1 has appeared in the list of variables on the left. Click on it to highlight it and click on the arrow to move it to the Variables box. Then click on "OK". An output window will open. Click on "Age in Four Categories" in the left-hand pane or use the scroll box until you can see the entire table. Your screen will look like Figure 3-6.

Figure 3-6

Let's take a look at the data matrix. Click on "Window" in the menu bar. At the bottom of the box that is opened you will see a list of all the windows you have open. One of these windows will be called "GSS96A - SPSS Data Editor". Click on that line and the data matrix window will be moved to the foreground and you will see it on your screen. Use the scroll bar in the lower-right part of the window to scroll to the right until you see a column titled AGE1. (It will be the last column in the matrix.) This is the new variable you just created. Your screen should look like Figure 3-7.

Figure 3-7

Use the other scroll bar to scroll down and see the values in this variable.

Look back at Figure 3-6 and you will see that there are no value labels for the categories 1 through 4 for the new variable AGE1. We want to insert value labels which will appear on the output. Point your mouse at the variable name at the top of the column (AGE1) and double click. This will open the Define Variable box. Click on the Labels button at the bottom of the window.

The label you entered ("Age in Four Categories") will be in the Variable Label box. In the Value Labels box you will see two more boxes--Value and Value Label. Click in the Value box and type the value "1". Then click in the Value Label box and type the label for the first category, "under 30". Then click on "Add" and the new label will appear in another box just to the right of the Add button. Then click in the Value box and type the value "2" and type the label for the second category, "30 to 49" and click on "Add". Do this for values 3 and 4. If you make a mistake you can use the Change and Remove buttons which work the same way we just described. Your screen should look like Figure 3-8.

Figure 3-8

Click on "Continue" and then on "OK". Now click on "Statistics", point your mouse at "Summarize", and then click on "Frequencies" and rerun the frequencies distribution for AGE1. This time it should have the value labels you just entered on the output.

We said that recoding into different variables allowed you to recode a variable in more than one way. Let's recode AGE again, but this time let's recode age into three categories--18 through 34, 35 to 59, and 60 and over. Let's call this new variable AGE2. You should be able to recode AGE into AGE2 by retracing the steps you used to create AGE1. Remember to click on "Reset" in the "Recode into Different Variables" box to get rid of the recoding instructions for "AGE1." When you are done, do a frequency distribution for AGE2. Your screen should look like Figure 3-9.

Figure 3-9

There are two more important points to discuss. Figure 3-4 shows the "Recode into Different Variables: Old and New Values" box.

There are three options in the Old Value box that we haven't discussed. Two are different ways of entering ranges. You can enter the lowest value of the variable through some particular value and you can enter some particular value through the highest value of the variable. Make sure that you do not include your missing values in these ranges or your missing values will become part of that category. For example, if 99 is the missing value for age, then the recode 70 through highest would include the missing values with the oldest age category. This is probably not what you want to do. So be careful.

What happens if you don't recode a particular value? If it is a missing value, it retains its status as a missing value in the new variable. But what if it isn't a missing value? Any value (other than a missing value) that is not recoded is changed into a system-missing value. If you want to leave a value in its original form, then click on "All other values" in the Old Value box and click on "Copy Old Values" in the New Value box and then click on "Add".
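The two cautions above can be made concrete with a small sketch. It is written in Python, not SPSS, and the function names are our own. The missing-value code 99 is the hypothetical example used in the text, not necessarily the code used in the data file.

```python
# Illustrative sketch only. SYSMIS stands in for the system-missing value.
SYSMIS = None


def recode_70_through_highest(age):
    """Open-ended rule: everything from 70 upward becomes category 4."""
    if age is not None and age >= 70:
        return 4
    return SYSMIS


def recode_70_through_89(age):
    """Bounded rule: an explicit upper limit keeps a code like 99 out."""
    if age is not None and 70 <= age <= 89:
        return 4
    return SYSMIS


def recode_else_copy(value, rules):
    """Mimic 'All other values' + 'Copy Old Values': unmatched values pass through."""
    for low, high, new_value in rules:
        if value is not None and low <= value <= high:
            return new_value
    return value


print(recode_70_through_highest(99))            # 99 is swept into category 4
print(recode_70_through_89(99))                 # 99 is left out of category 4
print(recode_else_copy(45, [(70, 89, 4)]))      # 45 is copied unchanged
```

The bounded version is the safer habit whenever a missing-value code sits above or below the real values of the variable.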

Recoding into the Same Variable

Now we are going to recode and have the recoded variable replace the old variable. This means that we will not create a new variable. We will replace the old variable with the recoded variable.

Click on "Transform" and then point your mouse at "Recode". This time click on "Into Same Variables". Let's recode the variable called PRAY. Find "PRAY" on the list of variables on the left, click on it to highlight it, and then click on the arrow to the left of the Numeric Variables box. This will move the variable PRAY into the big box in the middle of the window. Click on the "Old and New Values" button. This will open the Recode into Same Variables: Old and New Values box. Your screen should look like Figure 3-10.

Figure 3-10

This box looks very much like the box you just used (see Figure 3-4). Combine the values 1 and 2 and give this category the value 1. Combine values 3 and 4 into another category and call this 2. Then combine values 5 and 6 into a third category and call this 3. We don't have to go through the instructions again, since it's the same as before. Moreover, since this is not a new variable, it will still be called "PRAY".

You will want to change the value labels. Find the variable "PRAY" in the data matrix by scrolling to the left. Point your mouse at the variable name ("PRAY") and double click. This will open the Define Variable box. Click on the "Labels" button and change the labels in whatever way you want. You will have to use the Change and Remove buttons to do this. Follow the instructions we just went through for recoding into different variables. When you finish, click on "Statistics", then point your mouse at "Summarize", then click on "Frequencies" and get a frequency distribution for PRAY. Your screen should look like Figure 3-11.

Figure 3-11

When you recode into the same variable, a value that is not recoded stays the same as it was in the original variable. If we had decided to keep "never" (value 6) as a separate category, we could have left it alone and it would have stayed a 6. Or we could have changed it to another value such as 4. This is an important difference between recoding into the same variable and recoding into different variables.
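Here is a minimal Python sketch of that difference, offered only as an illustration. The names PRAY_MAP and recode_in_place are our own, the sample values are invented, and the 9 in the sample simply stands for any value we chose not to recode.

```python
# Illustrative sketch only: recoding "into the same variable".
# Old value -> new value, following the PRAY recode described above.
PRAY_MAP = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3}

# Alternative mapping that leaves "never" (6) out, so it is not recoded.
PRAY_MAP_KEEP_NEVER = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3}


def recode_in_place(values, mapping):
    """Replace each value using mapping; values not listed stay as they were."""
    return [mapping.get(v, v) for v in values]


pray = [1, 2, 3, 4, 5, 6, 9]  # invented values; 9 is not mentioned in the mapping
print(recode_in_place(pray, PRAY_MAP))
print(recode_in_place(pray, PRAY_MAP_KEEP_NEVER))
```

In both runs the 9 comes through unchanged, and in the second run the 6 does too. Had we been recoding into a different variable, both would have needed an explicit rule.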

Recoding is a very useful procedure and one that you will probably use a lot. It's worth spending time practicing how to recode so you will be able to do it with ease when the time comes.

Creating New Variables Using COMPUTE

You can also create new variables out of old variables using Compute. There are seven variables in the data set we have been using that ask respondents if they think a woman ought to be able to obtain a legal abortion under various scenarios. These are the variables ABANY (woman wants abortion for any reason), ABDEFECT (possibility of serious birth defect in baby), ABHLTH (woman's health is seriously threatened), ABNOMORE (woman is married and doesn't want any more children), ABPOOR (woman is poor and can't afford more children), ABRAPE (pregnant as result of rape), and ABSINGLE (woman is not married).

Each variable is coded 1 if the respondent says yes (ought to be able to obtain a legal abortion) and 2 if the person says no. The missing values are 0 (not applicable, question wasn't asked), 8 (don't know), and 9 (no answer).

Let's add these seven variables and create a new variable and call it ABORTION. If a person said yes to all seven questions the new variable would equal 7 and if he or she said no to all seven questions the new variable would equal 14. But what about missing values? If any of the seven variables have a missing value, then the new variable would be assigned a system-missing value.
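The rule just described (add the seven answers, but give up if any one of them is missing) can be sketched in a few lines of Python. This is an illustration under our own assumptions, not SPSS itself: abortion_score is a made-up helper name, None stands in for system-missing, and the three respondents are invented.

```python
# Illustrative sketch only: the sum of seven items, missing if any item is missing.
MISSING_CODES = {0, 8, 9}  # not applicable, don't know, no answer
ITEMS = ["abany", "abdefect", "abhlth", "abnomore", "abpoor", "abrape", "absingle"]


def abortion_score(row):
    """Return the sum of the seven items, or None if any one of them is missing."""
    values = [row[name] for name in ITEMS]
    if any(v in MISSING_CODES for v in values):
        return None
    return sum(values)


all_yes = {name: 1 for name in ITEMS}   # invented respondent: yes to everything
all_no = {name: 2 for name in ITEMS}    # invented respondent: no to everything
one_dk = dict(all_yes, abrape=8)        # invented respondent: one "don't know"

print(abortion_score(all_yes), abortion_score(all_no), abortion_score(one_dk))
```

Scores between 7 and 14 arise from mixtures of yes and no answers, so the lower the score, the more scenarios the respondent approved.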

We will use Compute to do this. Click on "Transform" and then click on "Compute". Your screen should look like Figure 3-12.

Figure 3-12

Type the name of the new variable, "ABORTION", in the Target Variable box. Then enter the formula for this new variable in the Numeric Expression box. There are two ways you could do this. One method is to click on the first of the seven variables, "ABANY", in the list of variables on the left, then click on the arrow to the right of this list. This will move ABANY into the Numeric Expression box. Now click on the "plus" sign and the plus sign moves into the box.

Continue doing this until the box contains the following formula: abany + abdefect + abhlth + abnomore + abpoor + abrape + absingle. (Don't type the period after absingle.) If you make a mistake, just click in the Numeric Expression box and use the arrow keys and the delete and backspace keys to make corrections. Your screen should look like Figure 3-13.

Figure 3-13

Click on "OK" to indicate that you want SPSS to create this new variable. You can use the scroll bars to scroll to the far right of the matrix and view the variable you just created. (The second way to enter the formula, which you could have used instead, is to click in the Numeric Expression box and type the formula directly into the box.)

We can add variable and value labels to this variable by pointing your mouse at the variable name ("ABORTION") at the top of the column in the data matrix and double clicking. This will open the Define Variable box. Click on the "Labels" button at the bottom of the window. You can enter the variable and value labels following the instructions in the recoding section of this chapter.

Enter the variable label "Sum of Seven Abortion Variables". Enter the value label "High Approval" for the value seven and "Low Approval" for the value fourteen. Remember that seven means they approved of abortion in all seven scenarios and fourteen means they disapproved all seven times. Click on "Continue" and then on "OK" in the Define Variable box.

Now get a frequency distribution for this variable by clicking on "Statistics", then point your mouse at "Summarize", then click on "Frequencies". Click on "Reset" to get rid of what is already in the box. Notice that the variable ABORTION is now in the list of variables on the left. Highlight it and click on the arrow to the left of the Variables box. Then click on "OK". Your screen should look like Figure 3-14.

Figure 3-14

Let's create another variable. Two of the variables in the data set are the number of years of education of the respondent's father (PAEDUC) and of the respondent's mother (MAEDUC). If we divide PAEDUC by MAEDUC we will get the ratio of the father's education to the mother's education. Any value greater than one will mean that the father has more education than the mother and any value less than one means the mother has more education than the father. Any value close to one means that the father and mother have about the same education.

We have a small problem though. If the mother's education is zero, then we will be dividing by zero which is mathematically undefined. Let's recode any value of zero for MAEDUC so it becomes a one. This will avoid dividing by zero and still give us a useful ratio of father's to mother's education. Click on "Transform", then point your mouse at "Recode", and then click on "Into Same Variables". Click on "Reset" to get rid of the recoding instructions for "PRAY."

Move "MAEDUC" into the Variables box by highlighting it in the list of variables on the left and clicking on the arrow to the right of this list. Then type "0" into the Value box under Old Value and click in the Value box under New Value. Type "1" in this box and click on "Add". Your screen should look like Figure 3-15.

Figure 3-15

Click on "Continue" and then on "OK" in the Recode Into Same Variables box. Now we have changed each 0 for MAEDUC into a 1.

Let's create our new variable. Click on "Transform" and then on "Compute". Click on "Reset" to get rid of the formula for the ABORTION variable you just created. Call this new variable RATIO. So type "RATIO" in the Target Variable box. Now we want to write the formula in the Numeric Expression box. Click in the list of variables on the left and scroll down until you see PAEDUC. Click on it to highlight it and click on the arrow to the right of the list to move it into the Numeric Expression box.

SPSS uses the slash (/) to indicate division, so click on the / in the box in the center of the window. Click on the list of variables again and scroll up until you see "MAEDUC" and click on it to highlight it. Move it to the Numeric Expression box by clicking on the arrow. Your screen should look like Figure 3-16.

Figure 3-16

Click on "OK" and SPSS will create your new variable. Use the scroll bar to scroll to the right in the data matrix until you can see the new variable you called RATIO. Scroll up and down so you can see what the values of this variable look like.
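For readers who like to see the arithmetic spelled out, here is a small Python sketch of what the recode and the Compute step do together. It is an illustration only: education_ratio is our own name, None stands in for a missing value, and the sample numbers are invented.

```python
# Illustrative sketch only: father's education divided by mother's education.
def education_ratio(paeduc, maeduc):
    """Return paeduc / maeduc, treating a mother's education of 0 as 1."""
    if paeduc is None or maeduc is None:
        return None          # a missing input gives a missing result
    if maeduc == 0:
        maeduc = 1           # same effect as recoding 0 into 1 for MAEDUC
    return paeduc / maeduc


print(education_ratio(12, 16))    # mother has more education
print(education_ratio(12, 0))     # no division by zero
print(education_ratio(None, 12))  # missing stays missing
```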

Let's recode this new variable and then get a frequency distribution for it. Click on "Transform", then point your mouse at "Recode", and then click on "Into Different Variables".

Click on "Reset" to get rid of the recoding instructions for "AGE2." Find the variable "RATIO" in the list of variables on the left and click on it to highlight it. Then click on the arrow to the right of this list to move it into the box in the middle of the window. Type "RATIO1" in the Name box under Output Variable and type "Recoded Ratio" in the Label box. Then click on "Change".

Click on "Old and New Values" to open the Recode Into Different Variables: Old and New Values box. Click on the fifth bubble from the top under Old Value and then type "0.89" in the box to indicate that you want to recode the lowest value through 0.89. Click on the Value box under New Value and type "1" in that box, then click on "Add". Click on the fourth bubble from the top under Old Value and type "0.90" in the box to the left of through and "1.10" in the box to the right.

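Then type "2" in the Value box under New Value and click on "Add". Finally, click on the sixth bubble from the top under Old Value and type "1.11" in the box to the left of through. Type "3" in the Value box under New Value and click on "Add". (Be aware that these ranges leave two small gaps: a ratio that falls strictly between 0.89 and 0.90, or strictly between 1.10 and 1.11, is not covered by any of them and, as explained earlier, will become system-missing in the new variable.) Your screen should look like Figure 3-17.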

Figure 3-17

Click on "Continue" and then on "OK" in the Recode Into Different Variables box.

We want to add value labels to this recoded value. Find the variable RATIO1 in the data matrix and double click on the variable name, "RATIO1". This opens the Define Variable box. Click on "Labels". Type "1" in the Value box and "under 0.90" in the Value Label box and then click on "Add".

Do this twice more to add the label "0.90 through 1.10" to the value 2 and "over 1.10" to the value 3. Your screen should look like Figure 3-18.

Figure 3-18

Click on "Continue" and then on "OK" in the Define Variable box.

Now let's see what the frequency distribution for this recoded variable looks like. Click on "Statistics", then point your mouse at "Summarize", and then click on "Frequencies". Find the variable RATIO1 in the list of variables on the left and move it to the Variables box. Click on "OK" and your screen should look like Figure 3-19.

Figure 3-19

The first category (under 0.90) means that father's education was less than 90% of mother's education. The second category (0.90 through 1.10) means that father's and mother's education were about the same, while the third category (over 1.10) means that father's education was more than 110% of mother's education. You can see that about 42% of the respondents have fathers and mothers with similar education, while about 30% have fathers with substantially less education than the mother and another 28% have fathers with substantially more education than the mother.
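The three ranges entered for RATIO1 can be written out as a short Python sketch. This is an illustration only, with recode_ratio as our own made-up name and None standing in for system-missing. It follows the ranges exactly as they were typed into the dialog box, so it also shows what happens to a ratio that lands between two of them.

```python
# Illustrative sketch only: the three ranges exactly as entered in the dialog box.
def recode_ratio(ratio):
    """Return 1, 2, or 3 for the three ranges, or None if no range applies."""
    if ratio is None:
        return None
    if ratio <= 0.89:            # lowest through 0.89
        return 1
    if 0.90 <= ratio <= 1.10:    # 0.90 through 1.10
        return 2
    if ratio >= 1.11:            # 1.11 through highest
        return 3
    return None                  # falls in a gap between the ranges


print(recode_ratio(12 / 16))   # 0.75
print(recode_ratio(12 / 12))   # 1.0
print(recode_ratio(18 / 12))   # 1.5
print(recode_ratio(17 / 19))   # about 0.895, between 0.89 and 0.90
```

The last call returns None because 17/19 is larger than 0.89 but smaller than 0.90. One way to close such gaps, if you want every ratio classified, is to let the outer ranges run right up to the boundaries of the middle one.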

You have already seen that SPSS uses + for addition and / for division. It also uses - for subtraction, * for multiplication, and ** for exponentiation. There are other arithmetic operators and a large number of functions (e.g., square root) that can be used in Compute statements. These can be found in the SPSS Base 7.5 for Windows User's Guide (SPSS, Inc., 1997).

Creating New Variables Using IF

The "IF" command is another way to create new variables out of old variables. Let's say that, as in the preceding section, we want to compare the level of education of each respondent's father to that of his or her mother. Now, however, we're not interested in the precise ratio, but just want to know if the father had more education than the mother, the same amount, or less. We'll create a new variable that will have the value 1 when the father has more education than the mother, 2 when both have the same amount of education, and 3 when the mother has more education.

Click on "Transform" and then click on "Compute". Click on "Reset" to get rid of the instructions for creating "RATIO". Type the name of the new variable, "COMPEDUC", in the Target Variable box. Then click on the Numeric Expression box and enter "1". So far, this is just like what you did in the previous section. This time, however, click on "If". Your screen should look like Figure 3-20.

Figure 3-20

Click on: "Include if case satisfies condition:". Find "PAEDUC" in the list of variables on the left and click on it to highlight it. Then click on the arrow to the right of this list. This will move PAEDUC into the box to the right of the arrow. Now click on ">" (greater than). Find "MAEDUC" in the list of variables on the left, click on it, and click on the arrow to add MAEDUC to the formula. (Alternatively, you could click on the box to the right of the arrow and directly enter the formula, "PAEDUC > MAEDUC".) Now click on "Continue". Your screen should look like Figure 3-21.

Figure 3-21

Click on "OK". Now repeat the same procedures as above, but this time setting the value of "COMPEDUC" to "2" (instead of 1) and the formula to "PAEDUC = MAEDUC". When you are asked if you want to Change existing variable, click on "OK". Now repeat the procedures a third time, but change the value of "COMPEDUC" to "3" and the formula to "PAEDUC < MAEDUC".
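The three passes through the Compute and If boxes amount to three conditional assignments. Here is a minimal Python sketch of the same idea, offered as an illustration only: compeduc is our own function name, None stands in for a missing value, and we assume that a case with a missing value on either variable satisfies none of the three conditions and is therefore left missing.

```python
# Illustrative sketch only: one new value per condition, as in the three IF passes.
def compeduc(paeduc, maeduc):
    """Return 1 (father more), 2 (same), 3 (mother more), or None if unknown."""
    if paeduc is None or maeduc is None:
        return None      # assumption: no condition can be satisfied
    if paeduc > maeduc:
        return 1
    if paeduc == maeduc:
        return 2
    return 3             # the only case left is paeduc < maeduc


print(compeduc(16, 12), compeduc(12, 12), compeduc(8, 12), compeduc(None, 12))
```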

You can add variable and value labels to this variable. Find the variable COMPEDUC in the data matrix and click on it. (It will be at the far right of the data matrix.) Click on "Data" on the menu bar near the top of your screen, then on "Define Variable". Your screen should look like Figure 3-22.

Figure 3-22

Click on "Labels". This opens up a new window. (See Figure 3-23.)

Figure 3-23

On the box next to Variable Label, type: "Father's vs. mother's education". Now click on the box next to Value and type: "1". Click on the box next to Value label (or press the Tab key) and type: "Dad More". Now click on "Add". Repeat this procedure for values 2 and 3, labeling them "Same" and "Mom More" respectively. Click on "Continue", then on "OK".

To look at the results of what you've done, click on "Statistics" on the menu bar, then point your mouse at "Summarize", then click on "Frequencies". Your screen should look like Figure 3-24.

Figure 3-24

Find COMPEDUC in the list of variables on the left, click on it, and then click on the arrow to the right of the box. Click on "OK" and your screen should look like Figure 3-25.

Figure 3-25

Selecting Cases

SPSS can also select subsets of cases for further analysis. One of the variables in the data set is the respondent's religious preference (RELIG). The categories include Protestant (value 1), Catholic (2), Jewish (3), none (4), and other (5). The missing values are 8 (don't know) and 9 (no answer). We might want to select only those respondents who have a religious preference for analysis. We can do this by using the Select Cases option in SPSS. Click on "Data" and then on "Select Cases". This will open the Select Cases box. Your screen should look like Figure 3-26.

Figure 3-26

Notice that All Cases is currently selected. (The circle to the left of All Cases is filled in to indicate that it is selected.) We want to select a subset of these cases so click on the circle to the left of If condition is satisfied to select it. At the bottom of the window it says that unselected cases are filtered. This means that the cases you do not select can be used later if you click on "All Cases".

If you had selected "Deleted" these unselected cases could not be used later. You should be very careful about saving a file after you have deleted cases because those are gone forever in that file. (You could, of course, get another copy of the data file by clicking on "File" and on "Open", if you saved the altered file under a different file name.)

Click on "If" and this will open the Select Cases: If box. Scroll down the list of variables on the left until you come to "RELIG" and then click on it to highlight it. Click on the arrow to the right of this list to move RELIG into the box in the middle of the window. We want to select all cases that are not equal to 4 so click on the "~=" sign. This symbol means "not equal to." Now click on "4" and the expression in the box will read RELIG ~= 4 which means that the variable RELIG does not equal 4 (the code for no religious preference). Your screen should look like Figure 3-27.

Figure 3-27

Click on "Continue" and then on "OK" in the Select Cases box.

Let's see what our data file looks like now. Click on "Statistics", then point your mouse at "Summarize", and then click on "Frequencies". Move RELIG into the Variables box and click on "OK". Your screen should look like Figure 3-28.

Figure 3-28

There are no respondents without a religious preference (value 4) in this table because you selected only those cases with values not equal to four.

What if we wanted to analyze only Protestants and Catholics? Click on "Data" and then on "Select Cases". Now click on "Reset" to eliminate what you had entered previously. Click on "If condition is satisfied" and then on "If". Scroll down the list of variables and click on "RELIG" and then click on the arrow to the right of the list to move it into the box.

Click on "=" and then on "1" so the expression in the box reads relig = 1. SPSS uses the symbol & for AND and the symbol | for OR. We want all cases for which RELIG is 1 or 2. Now click on "|".

Click on "RELIG" in the list of variables again and on the arrow to move it into the box. Then click on "=" and then on "2" so the expression in the box reads relig = 1 | relig = 2 which means that RELIG will equal 1 or 2. Your screen should look like Figure 3-29.

Figure 3-29

Click on "Continue" and on "OK" in the Select Cases box.

Now let's see what our data file looks like. Click on "Statistics", then point your mouse at "Summarize", then click on "Frequencies". "RELIG" should already be in the Variables box, so all you have to do is click on "OK". Your screen should look like Figure 3-30.

Figure 3-30

You will only have Protestants (1) and Catholics (2) in your table because you selected only those cases with values one and two on RELIG.
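The two selections we just made can be pictured as a list of true/false flags, one per case. The Python sketch below is only an illustration of that idea: the variable and function names are our own, and the eight RELIG codes are invented rather than taken from the GSS file.

```python
# Illustrative sketch only: selecting cases with a true/false filter.
# Invented RELIG codes (1=Protestant, 2=Catholic, 3=Jewish, 4=none, 5=other).
relig = [1, 2, 3, 4, 5, 1, 2, 4]

# RELIG ~= 4 : flag every case whose religious preference is not "none".
filter_not_none = [r != 4 for r in relig]

# relig = 1 | relig = 2 : flag Protestants and Catholics only.
filter_prot_cath = [r == 1 or r == 2 for r in relig]


def selected(values, flags):
    """Return only the values whose filter flag is True."""
    return [v for v, keep in zip(values, flags) if keep]


print(selected(relig, filter_not_none))
print(selected(relig, filter_prot_cath))
print(len(relig))  # the full list is still there, as with filtered cases
```

Because the flags are kept separately from the data, nothing is thrown away. That is the idea behind filtering unselected cases rather than deleting them.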

After you have selected cases for analysis, you may want to continue your analysis with all the cases. To do this, click on "Data", then on "Select Cases", and then click on the circle to the left of All cases. Click on "OK" and SPSS will select all the cases in the data file. This will work only if you selected Filtered in the Select Cases box when you began using Select Cases. If you selected Deleted, then you will have to get another copy of the data file by clicking on "File" and then on "Open".

Weighting Cases

Sometimes you may want to weight some cases in your data more heavily than others. Each household represented in the General Social Survey (i.e., the data set you have been using in Chapter Three) had an equal probability of selection. If there was more than one person eligible in the household (18 years of age or older), then one of these individuals was randomly selected. If there was one eligible person in the household, then that person had a 1 out of 1 chance of being selected. If there were two eligible people, then each person had a 1 out of 2 chance. If there were three eligibles, then each person had a 1 out of 3 chance and so on.

In other words, the more eligible people in the household, the smaller the chance of selection for any one of them. We can correct for this by weighting each case by the number of eligible people in their household. There is a variable called ADULTS which is the number of people 18 years of age or older in the household and this is, of course, also the number of eligible people in the household.

The number of adults in the household varied from one to six. The following table shows what this distribution looks like.

Weighting Cases by Number of Eligible Adults in Household (ADULTS)

Number of Eligible Adults   Number of Cases   Weighted Number of Cases
1                           984               984
2                           1524              3048
3                           298               894
4                           76                304
5                           16                80
6                           6                 36
Total                       2904              5346

The weighted number of cases is just the number of eligible adults multiplied by the number of cases. This means that each case with two eligible adults has a weight twice that of each case with one eligible adult, each case with three eligible adults has a weight three times that of each case with one eligible, and so on.

The problem with this is that we started with 2,904 cases and ended up with 5,346 cases. This artificially inflates the size of the sample which we really don't want to do. There is an easy way to fix this. If we divide 5,346 (the weighted sum of cases) by 2,904 (the actual number of cases) we get 1.841. We can divide each weight by 1.841 to get an adjusted weight. This would produce the following weighted data.

Weighting Cases Using Adjusted Weights

Number of Eligible Adults   Adjusted Weight   Number of Cases   Weighted Number of Cases
1                           1/1.841=0.543     984               534.31
2                           2/1.841=1.086     1524              1655.06
3                           3/1.841=1.630     298               485.74
4                           4/1.841=2.173     76                165.15
5                           5/1.841=2.716     16                43.46
6                           6/1.841=3.259     6                 19.55
Total                                         2904              2903.27
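The arithmetic behind the two tables can be checked with a short Python sketch. It is an illustration only, using the case counts from the tables above and variable names of our own choosing.

```python
# Illustrative sketch only: reproducing the weighting arithmetic from the tables.
# Number of eligible adults -> number of cases, as listed in the table.
cases_by_adults = {1: 984, 2: 1524, 3: 298, 4: 76, 5: 16, 6: 6}

n_cases = sum(cases_by_adults.values())
weighted_total = sum(adults * n for adults, n in cases_by_adults.items())

# Dividing each raw weight by this factor brings the weighted total back to n_cases.
factor = weighted_total / n_cases
adjusted_weight = {adults: adults / factor for adults in cases_by_adults}
adjusted_total = sum(adjusted_weight[a] * n for a, n in cases_by_adults.items())

print(n_cases, weighted_total, round(factor, 3))
for adults, weight in adjusted_weight.items():
    print(adults, round(weight, 3), round(weight * cases_by_adults[adults], 2))
print(round(adjusted_total, 2))
```

Because the sketch keeps the factor unrounded, its weighted total comes out at the full 2,904. The table works with weights rounded to three decimals, which is why its total falls slightly short.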

Notice that when using the adjusted weights, the weighted number of cases equals the number of cases (except for a small amount of rounding error). Let's use Compute to create our new adjusted weight variable. We'll call this variable WADULTS for weighted adults. Click on "Transform" and then on "Compute". Click on "Reset" to get rid of what you entered previously. Type "WADULTS" in the Target Variable box. Find the variable ADULTS in the list of variables on the left and click on it to highlight it. Then click on the arrow to the right of this list to move it into the Numeric Expression box. Now click on "/" (for division) and then enter the value 1.841 by clicking on the "one", then the "decimal", then "eight", then "four", and finally on "one". The formula in the box should read ADULTS/1.841 and your screen should look like Figure 3-31.

Figure 3-31

Click on "OK" and SPSS will create the new variable called WADULTS.

Now we want to weight the data using the variable we just created. Click on "Data" and then on "Weight Cases". Click on the circle to the left of "Weight cases by". Notice that this fills in the circle to indicate that it has been selected. Scroll down the list of variables on the left and find the variable "WADULTS". Click on it to highlight it and then click on the arrow to the right of the list to move this variable into the Frequency Variable box. Your screen should look like Figure 3-32.

Figure 3-32

Click on "OK" and SPSS will weight the data appropriately.

Let's get a frequency distribution for the variable ADULTS using the weighted data. Click on "Statistics", then point your mouse at "Summarize", and then click on "Frequencies". Move the variable "ADULTS" into the Variables box and click on "OK". The weighted frequency distribution should look like Figure 3-33.

Figure 3-33

Notice that the frequencies are very close to the weighted number of cases produced by using the adjusted weights we computed above. (Any differences are due to rounding error.)
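Conceptually, a weighted frequency is the sum of the case weights in each category rather than a simple count of cases. The sketch below illustrates this idea in Python with a handful of made-up cases. `weighted_frequencies` is a hypothetical helper, not an SPSS function, and the way SPSS displays fractional weighted counts in its output may differ in detail.

```python
from collections import defaultdict

def weighted_frequencies(values, weights):
    """Sum the weights of the cases falling in each category."""
    totals = defaultdict(float)
    for value, weight in zip(values, weights):
        totals[value] += weight
    return dict(totals)

# Made-up example: the ADULTS value for five cases, each weighted by ADULTS / 1.841.
adults = [1, 2, 2, 3, 1]
wadults = [a / 1.841 for a in adults]

freq = weighted_frequencies(adults, wadults)
for value in sorted(freq):
    print(value, round(freq[value], 3))
```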

If you want to go back to the unweighted data, click on "Data" and then on "Weight Cases". Click on the circle to the left of "Do not weight cases" and then on "OK". Now you are using the unweighted cases again.

Summary

In this part of the book you have learned how to recode, create new variables using Compute and If, select particular cases for analysis, and weight cases. You can do more complicated things with these commands than we have shown you, but these are the basics. You can use the SPSS Base 7.5 for Windows User's Guide (SPSS, Inc., 1997) to learn what else you can do with these commands. In the rest of this book, we will focus on some of the statistical procedures that SPSS can do for you.

Chapter Three Exercises

Use the gss96a.sav data set on your data disk for all these exercises.

RECODE

  1. There are two variables that refer to the highest year of school completed by the respondent's mother and father (MAEDUC and PAEDUC). Do a frequency distribution for each of these variables. Now recode each of them (into same variable) into three categories: under 12 years of school, 12 years, and over 12 years. Create new value labels for the recoded categories. Do a frequency distribution again to make sure that you recoded correctly.
  2. INCOME91 is the total family income. Do a frequency distribution to see what the variable looks like before recoding. Recode (into a different variable) into eight categories: under $10,000, $10,000 to $19,999, $20,000 to $29,999, $30,000 to $39,999, $40,000 to $49,999, $50,000 to $59,999, $60,000 to $74,999, and $75,000 and over. Call this new variable INCOME1. Create new value labels for the recoded categories. Do another frequency distribution to make sure you recoded correctly. Now recode INCOME91 again (into a different variable). This time use only four categories: under $20,000, $20,000 to $39,999, $40,000 to $59,999, and $60,000 and over. Call this new variable INCOME2. Create new value labels for the recoded categories. Do another frequency distribution to make sure you recoded correctly.

COMPUTE

  1. In this chapter we created a new variable called ABORTION which was the sum of the seven abortion variables in the data set. Create a new variable called AB1 which is the sum of ABDEFECT, ABHLTH, and ABRAPE. Do a frequency distribution for this new variable to see what it looks like. How is this distribution different from the distribution for the ABORTION variable based on all seven variables?
  2. There are five variables that measure tolerance for letting someone speak in your community who may have very different views than your own (SPKATH, SPKCOM, SPKHOMO, SPKMIL, and SPKRAC). For each of these variables, 1 means that they would allow such a person to speak and 2 means that they would not allow it. Create a new variable (call it SPEAK) which is the sum of these five variables. This new variable would have a range from 5 (would allow a person to speak in each of the five scenarios) to 10 (would not allow a person to speak in any of the five scenarios). Do a frequency distribution for this new variable to see what it looks like.

IF

  1. There are two variables that describe the highest educational degree of the respondent's father and mother (PADEG and MADEG). Create a new variable (call it MAPAEDUC) that indicates if the father and mother have a college education. This variable should equal 1 if both parents have a college education, 2 if only the father has a college education, 3 if only the mother has a college education, and 4 if neither parent has a college education. Create new value labels for the recoded categories. Do a frequency distribution for this new variable to see what it looks like.
  2. One variable indicates how often the respondent prays (PRAY) and another variable indicates if the respondent approves or disapproves of the Supreme Court's decision regarding prayer in the public schools (PRAYER). Create a new variable (call it PRY) that is a combination of these two variables. This variable should equal 1 if the respondent prays a lot (once a day or several times a day) and approves of the Supreme Court's decision, 2 if the respondent prays a lot (once a day or several times a day) and disapproves of the Supreme Court's decision, 3 if the respondent doesn't pray a lot and approves of the Supreme Court's decision, and 4 if the respondent doesn't pray a lot and disapproves of the Supreme Court's decision. Do a frequency distribution for this new variable to see what it looks like.

SELECT IF

  1. Select all males (1 on the variable SEX) and do a frequency distribution for the variable FEAR (afraid to walk alone at night in the neighborhood). Then select all females (2 on the variable SEX) and do a frequency distribution on FEAR. Are males or females more fearful of walking alone at night?
  2. Select all whites (1 on the variable RACE) and do a frequency distribution for the variable PRES92. Did they vote for Clinton, Bush, or Perot in 1992? Then select all blacks (2 on the variable RACE) and do a frequency distribution on PRES92. Were whites or blacks more likely to vote for Clinton?