Chapter Three: Transforming Data
This chapter explains how to change, or transform, the
values associated with your variables, such as the values you entered in the data
entry process shown in Chapter 2. SPSS
for Windows can transform values in several ways. It can:
1. combine values of a variable into several categories,
2. create new variables out of old variables,
3. select particular cases and analyze only these cases,
4. weight cases so that some cases count more heavily than others.
SPSS provides many ways to transform data. This chapter
covers recode, compute, if, select cases, and weight.
Recoding Variables
Recoding is a way of combining the values of a variable into
fewer categories. Let me give a
hypothetical example. Let’s say you have
conducted a survey and one of your demographic questions was the age of the
respondent. Entering the actual age in
years would be the simplest way of working with the data. But let’s also say that you want to compare
people of different age groupings. In
other words, your data would be more useful if it was organized into collapsed
categories, like "Young", "Middle age", and
"Older". Using SPSS for
Windows you could reorganize the data so that you had these three
groupings. There are two things you need
to know before you recode the values.
First, you need to decide the number of categories you want to end up
with. Generally this will be determined
by the way you plan to use the information.
If you are going to analyze the data using a table where you crosstabulate
two variables (see Chapter 5), you probably want to limit the number of new
categories to three or four. The second
thing you need to know is which of the old values are going to be combined into
a new category. For example, you might
do something like this.
| The actual age of the respondent as originally recorded in the data file | The new, collapsed category |
| 18 years through 35 years | Young |
| 36 years through 55 years | Middle age |
| 56 years of age or over | Older |
Another example might be if respondents were asked how often
they prayed, and the original responses were several times a day, once a day,
several times a week, once a week, less than once a week, once a month, once a
year, or never. With recode we can
combine the people who said "several times a day" with the people who said
"once a day" and put all these respondents into a new category which
we could call "a lot".
Similarly, we could combine the people who said "several times a
week" with those who said "once a week" and call this category
"sometimes" and combine those who said "less than once a
week" and "once a month" and call this category
"infrequently". Those who said
"once a year" or "never" could be combined into a fourth
category called "hardly ever".
Recoding is the SPSS procedure that carries out both of these examples.
Start SPSS for Windows and open the GSS02A file, as you did in Chapter
1. Our task is going to be to recode the
variable called AGE, which is, of course, the respondent's age.
Click on Transform and then point your mouse at Recode. Your screen will look like Figure 3‑1.
Now we have two options:
Recode Into Different Variables and Recode Into Same
Variables. We strongly
suggest that beginning students use only the Recode Into Different Variables
option. If you make an error, your
original variable is still in the file and you can try again. If you make an error using Recode Into Same
Variables, you have changed the original variable. If you also saved the file after doing this,
and you did not have another copy of the file, you have just eliminated any
chance of correcting your error.
Recoding Into Different Variables
The recoding into a different variable starts with giving the new variable a variable name. For example, if we recode into different variables we could combine ages into one set of categories and call this new variable AGE1 and then recode ages into a different set of categories called AGE2. To do that, click on Into Different Variables. Your screen will look like Figure 3‑2.
Find AGE in the list
of variables on the left and click on it to highlight it, and then click on the
arrow just to the left of the big box in the middle of the window. This will
move AGE into the list of variables to recode.
You want to give a name to this new variable so click in the Name box under Output Variable and type the name AGE1 in this box. You can even type a variable label for this new variable in the Label box just below the Name box. Try typing Age in Four Categories as your label. Click on the Change button to tell SPSS to make these changes. Your screen will look like Figure 3‑3.
Now we have to tell SPSS how to create these categories
referred to as values. Click on the Old and New
Values button at the bottom of the window. The screen will look like Figure 3‑4.
There are several options. You can change a particular value
into a new value by entering the value to be changed into the OLD VALUE box and
the new value into the NEW VALUE box and then clicking on Add. You will usually
change one "real" value to another "real" value. For
example, you might change the ages 18 through 35 into the value 1. (The next
paragraph shows how to recode a range of values like this.) There
are also other options[1].
As you can tell from the previous example, you can also
change a range of values into a new value and that is what we are going to do.
Click on the fourth bubble from the top labeled RANGE.
Notice how this marks this choice by filling in the bubble. Then type 18 (the
youngest age in the data set) in the box to the left of through, click on the box to the right of through, and type 29
in that box. Then click on VALUE just
below NEW VALUE and type 1 in that box. This will have SPSS combine
all ages from 18 through 29 into a single category and give it the value of 1.
Then click on ADD.
Repeat this process for the other categories. Click on the box under RANGE and type 30 in the box to the left of through, click on the box to the right of through, and type 49 in that box. Click on Value just below New Value and type 2 in that box and click on Add. Do the same thing for the category 50 to 69 (give this a new value of 3) and the category 70 to 89 (the largest age in the data set). Give this last category a new value of 4. Your screen should look like Figure 3‑5.
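If you are comfortable reading a little programming code, the range recoding you just set up can be sketched in a few lines of Python. This is only an illustration of the logic, not the way SPSS itself carries out Recode. The function name recode_age1 and the sample ages are made up for this example, and None stands in for a system-missing value.

```python
# Hypothetical sketch of the AGE -> AGE1 recode; this is not SPSS code.
# Each tuple is (lowest old value, highest old value, new value).
AGE1_RANGES = [(18, 29, 1), (30, 49, 2), (50, 69, 3), (70, 89, 4)]

def recode_age1(age):
    """Return the AGE1 category for an age, or None if no range matches."""
    if age is None:
        return None
    for low, high, new_value in AGE1_RANGES:
        if low <= age <= high:
            return new_value
    return None  # a value outside every range gets no category

sample_ages = [18, 29, 30, 49, 50, 69, 70, 89]  # made-up sample values
age1 = [recode_age1(a) for a in sample_ages]
print(age1)  # [1, 1, 2, 2, 3, 3, 4, 4]
```

Notice that the original ages are left untouched and the categories go into a separate list, which mirrors recoding into a different variable.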
To change one of your categories, highlight that category in
the OLD‑> NEW box, make the changes in the OLD or NEW VALUE box, and
then click on Change. The new category
should appear in the OLD‑>NEW box. To remove a category, highlight it
and click on REMOVE.
Now we want SPSS to carry out the recoding. Click on CONTINUE at the bottom of the window. This
will take you back to the RECODE INTO DIFFERENT
VARIABLES box. Click on OK and
SPSS will take a few seconds to carry out your commands. The data matrix should
appear on the screen. When it says that the SPSS Processor is READY at the
bottom of the window you know that SPSS has finished with the recoding.
Click on ANALYZE, then point your mouse at DESCRIPTIVE STATISTICS, and then click on FREQUENCIES. Notice that AGE1[2] has appeared in the list of variables on the left. Click on it to highlight it and click on the arrow to move it to the Variables box. Then click on OK. An output window will open. Your screen will look like Figure 3‑6.
Let's take a look at the data matrix. Click on WINDOW in the menu bar. In the box that is opened you will see a list of all the windows you have opened. One of these windows will be called GSS02A ‑ SPSS Data Editor. Click on that line and the data matrix window will be moved to the foreground and you will see it on your screen. Use the scroll bar in the lower‑right part of the window to scroll to the right until you see a column titled AGE1. (It will be the last column in the matrix.) This is the new variable you just created. Your screen should look like Figure 3‑7.
Use the other scroll bar to scroll down and see the values
in this variable. Look back at Figure 3‑6 and
you will see that there are no value labels for categories 1 through 4 for the
new variable AGE1.
If you want the output to give you more information about
what each category means you need to insert value labels. To do this, point your mouse at the variable
name at the top of the column (AGE1) and double click. This will open the Variable View tab in the Data Editor. Now you’re going to enter labels for the
values in the recoded variable using what you learned in Chapter Two.
In the Values box you will see a small gray button in the right-hand side of the box. Point your mouse at this box and click. This will open the Value Labels box. You will see two more boxes‑‑Value and Value Label. Click in the Value box and type the value 1. Then click in the Value Label box and type the label for the first category, under 30. Then click on Add and the new label will appear in another box just to the right of the Add button. Then click in the Value box and type the value 2 and type the label for the second category, 30 to 49, and click on Add. Do this for values 3 and 4. If you make a mistake you can use the Change and Remove buttons which work the same way we just described. Your screen should look like Figure 3‑8.
Click on OK.
Now click on Analyze, point your mouse
at Descriptive Statistics, and then click
on Frequencies and rerun the frequencies
distribution for AGE1. This time it should have the value labels you just
entered on the output.
We said that recoding into different variables allowed you
to recode a variable in more than one way. Let's recode AGE again, but this
time let's recode age into three categories‑‑18 through 34, 35 to
59, and 60 and over. Let's call this new variable AGE2. Retracing the steps you
used to create AGE1, recode AGE into AGE2.
Be sure to click on Reset in the Recode into Different Variables box to get rid of the recoding instructions for AGE1. When you are done, do a frequency distribution for AGE2. Your screen should look like Figure 3‑9.
There are two more important points to discuss. Look back at
Figure 3‑4.
It shows the Recode into Different
Variables: Old and New Values box. There are three options in the Old
Value box that we haven't discussed. Two are different ways of entering ranges.
You can enter the lowest value of the variable through some particular value
and you can enter some particular value through the highest value of the
variable. Make sure that you do not include your missing values in these ranges
or your missing values will become part of that category. For example, if 99 is the missing value for age, then recoding 70 through
highest would include the missing values with the oldest age category. This is
probably not what you want to do. So be careful.
Here is another important point. What happens if you don't recode a particular
value? If it is a missing value, it retains its status as a missing value in
the new variable. But what if it isn't a missing value? Any value (other than a
missing value) that is not recoded is changed into a system‑missing
value. If you want to leave a value in its original form, then click on All other values in the Old Value box and click
on Copy Old Value in the New Value box
and then click on Add.
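Both cautions can be illustrated with a short Python sketch. It is a simplified model, not SPSS code: the missing code 99 comes from the example above, the function names and the copy_unmatched option are invented for this illustration, and None stands in for a system-missing value.

```python
# Hypothetical model of recoding into a different variable; not SPSS code.
MISSING_CODE = 99  # the example missing value for age used in the text

def recode_70_through_highest(age):
    """Careless version: 70 through highest -> 4, so 99 is swept in too."""
    return 4 if age >= 70 else None

def recode_70_through_89(age):
    """Safer version: an explicit upper limit keeps 99 out of category 4."""
    return 4 if 70 <= age <= 89 else None

def recode_with_copy(value, rules, copy_unmatched):
    """Apply {old: new} rules; unmatched values are copied or become None."""
    if value in rules:
        return rules[value]
    return value if copy_unmatched else None

print(recode_70_through_highest(MISSING_CODE))  # 4
print(recode_70_through_89(MISSING_CODE))       # None
print(recode_with_copy(5, {0: 1}, copy_unmatched=False))  # None
print(recode_with_copy(5, {0: 1}, copy_unmatched=True))   # 5
```

The last two lines correspond to leaving out, and then adding, the All other values and Copy Old Value instruction.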
Recoding Into the Same Variable
Now we are going to recode and have the recoded variable replace the old variable. This means that we will not create a new variable. We will replace the old variable with the recoded variable, but remember the warning given you earlier in this chapter. Click on Transform and then point your mouse at Recode. This time click on Recode Into Same Variables. Let's recode the variable called PRAY. Find PRAY on the list of variables on the left, click on it to highlight it, and then click on the arrow to the left of the NUMERIC VARIABLE box. This will move the variable PRAY into the big box in the middle of the window. Click on the Old and New Values button. This will open the Recode into Same Variables: Old and New Values box. Your screen should look like Figure 3‑10.
This looks very much like the box you just used (see Figure 3‑4). Combine the values 1 and 2 by clicking
on the fourth circle from the top under OLD VALUE and entering a 1 to the left
of through and a 2 to the right of through and then entering a 1 in the NEW
VALUE box and then clicking on ADD. Now combine values 3 and 4 into a category
called 2. Then combine values 5 and 6 into a third category called 3. Since
this is not a new variable, it will still be called PRAY.
You will want to change the value labels. Find the variable PRAY in the Data View by scrolling to the left. Point your mouse at the variable name (PRAY) and double click. This will open the Variable View tab in the Data Editor. Click on the small gray box in the Values box and change the labels to an appropriate name for the new variable. You will have to use the Change and Remove buttons to do this. Follow the instructions we just went through for recoding into different variables. When you finish, click on Analyze, then point your mouse at Descriptive Statistics, then click on Frequencies and move PRAY over to the VARIABLES box and click on OK. Your screen should look like Figure 3‑11.
When you recode into the same variable, a value that is not
recoded stays the same as it was in the original variable. If we had decided to
keep "never" (value 6) as a separate category, we could have left it
alone and it would have stayed a 6. Or we could have changed it to another
value such as 4. This is an important difference between recoding into the same
and different variables.
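The difference can be seen in a small Python sketch. Again, this is only a model of the behavior described above, not SPSS code. The sample responses are made up, and the value 9 is used simply as an example of a value that has no recoding rule.

```python
# Hypothetical sketch of recoding PRAY into the same variable; not SPSS code.
PRAY_RULES = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3}

def recode_in_place(values, rules):
    """Replace each value that has a rule; leave every other value as it was."""
    return [rules.get(v, v) for v in values]

pray = [1, 2, 3, 4, 5, 6, 9]  # made-up responses; 9 has no rule
pray = recode_in_place(pray, PRAY_RULES)  # the old values are overwritten
print(pray)  # [1, 1, 2, 2, 3, 3, 9]

# Keeping "never" (6) separate just means giving it no rule.
rules_keep_never = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3}
print(recode_in_place([4, 5, 6], rules_keep_never))  # [2, 3, 6]
```

Because the result is stored back under the same name, the original responses are gone, which is exactly why the earlier warning matters.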
Recoding is a very useful procedure and one that you will
probably use a lot. It's worth spending time practicing how to recode so you
will be able to do it with ease when the time comes.
Creating New Variables Using COMPUTE
You can also create new variables out of old variables using
compute. There are seven variables in the data set we have been using that ask
respondents if they think a woman ought to be able to obtain a legal abortion
under various scenarios. These are the variables ABANY (woman wants abortion
for any reason), ABDEFECT (possibility of serious birth defect in baby), ABHLTH
(woman's health is seriously threatened), ABNOMORE (woman is married and
doesn't want any more children), ABPOOR (woman is poor and can't afford more
children), ABRAPE (pregnant as result of rape), and ABSINGLE (woman is not married). Each variable is coded 1 if the respondent
says yes (ought to be able to obtain a legal abortion) and 2 if the person says
no. The missing values are 0 (not applicable, question wasn't asked), 8 (don't
know), and 9 (no answer).
Compute will allow us to combine these seven variables,
creating a new variable that we will call ABORTION. If a person said yes to all
seven questions the new variable would equal 7 and if he or she said no to all
seven questions the new variable would equal 14. But what about missing values?
If any of the seven variables have a missing value, then the new variable would
be assigned a system‑missing value.
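The arithmetic behind the new variable can be sketched in Python. This is an illustration under stated assumptions rather than SPSS code: the function name abortion_sum is made up, the missing codes 0, 8, and 9 are the ones listed above, and None stands in for a system-missing value.

```python
# Hypothetical sketch of ABORTION as the sum of seven items; not SPSS code.
MISSING_CODES = {0, 8, 9}  # missing values listed in the text

def abortion_sum(answers):
    """Sum seven answers coded 1 (yes) or 2 (no); None if any is missing."""
    if any(a is None or a in MISSING_CODES for a in answers):
        return None
    return sum(answers)

print(abortion_sum([1] * 7))                # 7
print(abortion_sum([2] * 7))                # 14
print(abortion_sum([1, 2, 1, 8, 1, 1, 2]))  # None
```

The third respondent answered "don't know" (8) to one question, so the whole sum is treated as missing.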
To use compute to do this, click on Transform and then click on Compute. Your screen should look like Figure 3‑12.
Type the name of the new variable, ABORTION, in the Target
Variable box. Then enter the formula for this new variable in the Numeric
Expression box. There are two ways to do this. One method is to click on the
first of the seven variables, ABANY, in
the list of variables on the left, then click on the arrow to the right of this
list. This will move ABANY into the Numeric Expression box. Now click on the plus sign and the plus sign moves into the
box.
Continue doing this until the box contains the following formula: ABANY + ABDEFECT + ABHLTH + ABNOMORE + ABPOOR + ABRAPE + ABSINGLE. (Don't type the period after ABSINGLE.) If you make a mistake, just click in the Numeric Expression box and use the arrow keys and the delete and backspace keys to make corrections. Your screen should look like Figure 3‑13.
Click on OK to
indicate that you want SPSS to create this new variable. You can use the scroll
bar to scroll to the far right of the matrix and view the variable you just
created. A second way to enter the formula in the Numeric Expression box is to
click in the box and type the formula directly into the box using the keyboard.
You can add variable and value labels to this variable by
pointing your mouse at the variable name (ABORTION) at the top of the column in
the data matrix and double clicking. This will open the Variable View tab in
the Data Editor. You can enter the
variable and value labels the way you were taught earlier in this chapter, and
in Chapter 2.
Enter the variable label Sum of Seven Abortion Variables.
Enter the value label High Approval for the value seven and Low
Approval for the value fourteen. (Remember that seven means they
approved of abortion in all seven scenarios and fourteen means they disapproved
all seven times.) Click
on OK.
You should check your new variable to see that it was calculated correctly. Go to Analyze, then Descriptive Statistics, and then Frequencies. Click on Reset to get rid of what is already in the box. Find the variable ABORTION, highlight it and click on the arrow to the left of the Variables box. Then click on OK. Your screen should look like Figure 3‑14. The lowest number should be 7 and the highest number should be 14. Do you remember why?
One of the problems with this approach is that the new variable (ABORTION) will be assigned a system-missing value if one or more of the original variables has a missing value. We can avoid this problem by summing the valid values of the original variables and dividing by the number of variables with valid values (in other words, by taking their mean). For example, if six of the seven original variables had valid values, then we would divide the sum by six. We can also tell SPSS to create this new variable only if at least four of the original variables have valid values. If fewer than four of the original variables have valid values, SPSS will assign it a system-missing value.
We can do this by clicking on Transform and then on Compute and entering the new variable name in the Target Variable box. Let’s call this variable ABORT. In the Function Group: box, scroll down and click on Statistical. This will list the statistical functions in the Functions and Special Variables: box. Double-click on Mean. Your screen should look like Figure 3-15.
Notice that Mean(?,?) has been inserted in the Numeric Expression: box. What you want to do is to replace the (?,?) with the list of the seven original variables. It should now read (ABANY, ABDEFECT,ABHLTH, ABNOMORE, ABPOOR, ABRAPE, ABSINGLE). All that is left is to tell SPSS that you want to create this new variable only if at least four of the original variables have valid values. Do this by entering .4 following Mean so the expression reads Mean.4(ABANY, ABDEFECT, ABHLTH, ABNOMORE, ABPOOR, ABRAPE, ABSINGLE). Your screen should look like Figure 3-16.
Click on OK and run a frequency distribution to see what your new variable looks like. Your screen should look like Figure 3-17.
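The rule "take the mean of the valid values, but only if at least four are valid" can be sketched in Python as follows. This is a model of the behavior described above, not SPSS code. The function name mean_min_valid and its min_valid parameter are invented for this example, and None stands in for a system-missing value.

```python
# Hypothetical sketch of a mean that requires at least four valid values.
MISSING_CODES = {0, 8, 9}  # missing values listed earlier in the text

def mean_min_valid(answers, min_valid=4):
    """Mean of the valid answers, or None if fewer than min_valid are valid."""
    valid = [a for a in answers if a is not None and a not in MISSING_CODES]
    if len(valid) < min_valid:
        return None
    return sum(valid) / len(valid)

print(mean_min_valid([1, 1, 1, 1, 1, 1, 1]))  # 1.0
print(mean_min_valid([1, 2, 1, 8, 1, 1, 2]))  # six valid answers: 8 / 6
print(mean_min_valid([1, 8, 9, 0, 2, 8, 9]))  # None
```

With this approach the new variable ranges from 1 (yes to every valid item) to 2 (no to every valid item) instead of from 7 to 14.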
Try creating another variable. Two of the variables in the
data set are the number of years of education of the respondent's father
(PAEDUC) and of the respondent's mother (MAEDUC). If we divide PAEDUC by MAEDUC
we will get the ratio of the father's education to the mother's education. Any value greater than one will mean that the father has more
education than the mother and any value less than one means the mother has more
education than the father. Any value close to one
means that the father and mother have about the same education.
We have a small problem though. If the mother's education is zero, then we will be dividing by zero, which is mathematically undefined. Let's recode any value of zero for MAEDUC so it becomes a one. This will avoid dividing by zero and still give us a useful ratio of father's to mother's education. Click on Transform, then point your mouse at Recode, and finally click on Into Same Variables. (You may need to click on Reset to get rid of the recoding instructions used earlier.) Move MAEDUC into the Variables box by highlighting it in the list of variables on the left and clicking on the arrow to the right of this list. Click on Old and New Values and then type 0 into the Value box under Old Value and click in the Value box under New Value. Type 1 in this box and click on Add. Your screen should look like Figure 3‑18.
Click on Continue
and then on OK in the Recode Variables
box. Now we have changed each 0 for MAEDUC into a 1.
To create our new variable, click on Transform and then on Compute. (If
necessary, click on Reset to get rid of
the formula for the ABORT variable you just created.) Call this new variable
RATIO. So type RATIO in the Target Variable box. Now we want to write the
formula in the Numeric Expression box. Click in the list of variables on the
left and scroll down until you see PAEDUC. Click on it to highlight it and
click on the arrow to the right of the list to move it into the Numeric
Expression box.
SPSS uses the slash (/) to indicate division, so click on the / in the box in the center of the window. Click on the list of variables again and scroll up until you see MAEDUC and click on it to highlight it. Move it to the Numeric Expression box by clicking on the arrow. Your screen should look like Figure 3‑19.
Click on OK and SPSS
will create your new variable. Use the scroll bar to scroll to the right in the
data matrix until you can see the new variable you called RATIO. Scroll up and
down so you can see what the values of this variable look like. You may want to do a frequencies distribution
as a check to make sure the new variable was created correctly.
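The calculation of RATIO, including the recode of zero to one for the mother's education, can be sketched in Python. This is only an illustration, not SPSS code. The function name education_ratio and the sample values are made up, and None stands in for a missing value.

```python
# Hypothetical sketch of RATIO = PAEDUC / MAEDUC; not SPSS code.
def education_ratio(paeduc, maeduc):
    """Father's years of education divided by mother's years of education."""
    if paeduc is None or maeduc is None:
        return None  # a missing input gives a missing result
    if maeduc == 0:
        maeduc = 1  # the recode described in the text avoids dividing by zero
    return paeduc / maeduc

print(education_ratio(12, 12))  # 1.0
print(education_ratio(8, 0))    # 8.0
print(education_ratio(16, 12))  # greater than one: father has more education
print(education_ratio(10, 14))  # less than one: mother has more education
```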
After looking at the frequencies distribution it is obvious
that it would be easier to understand if we grouped some of the scores
together, so create a new variable by recoding RATIO into a different variable. Click on Transform
and then point your mouse at Recode and
then click on Into Different Variables.
Find the variable RATIO in the list of variables on the left and click on it to
highlight it. (Again, you may have to click Reset
if there is old information still in the boxes.) Click on the arrow to the right of this list
to move it into the box in the middle of the window. Type RATIO1 in the Name
box under Output Variable and type Recoded
Ratio in the Label box. Then click on Change.
Click on OLD AND NEW VALUES to open the Recode Into Different Variables: Old and New Values box. Click on the fifth bubble from the top under Old Value and then type 0.89 in the box to indicate that you want to recode the lowest value through 0.89. Click on the Value box under New Value and type 1 in that box, and then click on Add. Click on the fourth bubble from the top under Old Value and type 0.90 in the box to the left of through and 1.10 in the box to the right. Then type 2 in the Value box under New Value and click on Add. Finally, click on the sixth bubble from the top under Old Value and type 1.11 in the box to the left of through. Type 3 in the Value box under New Value and click on Add. Your screen should look like Figure 3‑20. Click on Continue and then on OK in the Recode Into Different Variables box.
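The three ranges you just entered can be sketched in Python. This is a simplified model, not SPSS code; the function name recode_ratio1 is made up and None stands in for a system-missing value. The sketch assumes the ranges are applied exactly as typed. Under that assumption, a ratio that falls between two ranges (for example 17/19, which is about 0.895) matches none of them, so you may want to check your frequencies for unexpected missing values.

```python
# Hypothetical sketch of the RATIO -> RATIO1 recode; not SPSS code.
def recode_ratio1(ratio):
    """Return 1, 2, or 3 for a ratio, or None if no range matches."""
    if ratio is None:
        return None
    if ratio <= 0.89:          # lowest through 0.89
        return 1
    if 0.90 <= ratio <= 1.10:  # 0.90 through 1.10
        return 2
    if ratio >= 1.11:          # 1.11 through highest
        return 3
    return None  # the value falls in a gap between the typed ranges

print(recode_ratio1(0.75))     # 1
print(recode_ratio1(1.0))      # 2
print(recode_ratio1(1.5))      # 3
print(recode_ratio1(17 / 19))  # None
```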
Let’s add value labels to the new values. Find the variable
RATIO1 in the data matrix and double click on the variable name, RATIO1. This
will open the Variable View tab in the
Data Editor. Click on the small gray box
in the Values box and enter the labels. Type 1 in the Value box and under 0.90
in the Value Label box and then click on Add.
Do this twice more to add the label 0.90 through 1.10 to the value 2 and over 1.10 to the value 3. Your screen should look like Figure 3‑21. (This should seem familiar to you now.)
Click on OK in the Define Variable box. Run a frequencies distribution on the new variable to double-check your work. Your screen should look like Figure 3‑22.
The first category (under 0.90) means that Father's Education
was less than 90% of Mother's Education. The second category (0.90 through
1.10) means that Father's and Mother's Education were about the same, while the
third category (over 1.10) means that Father's Education was more than 110% of
Mother's Education. You can see that about 44% of the respondents have fathers
and mothers with similar education, while about 31% have fathers with
substantially less education than the mother and another 26% have fathers with
substantially more education than the mother.
You have already seen that SPSS uses + for addition and /
for division. It also uses ‑ for subtraction, * for multiplication, and
** for exponentiation. There are other arithmetic operators and a large number
of functions (e.g., square root) that can be used in compute statements.
Creating New Variables Using IF
The IF command is another way to
create new variables out of old variables.
Perhaps we want to compare the level of education of each respondent's
father to that of his or her mother. Now, however, we're not interested in the precise
ratio, but just want to know if the father had more education than the mother,
the same amount, or less. We'll create a new variable that will have the value
1 when the father has more education than the mother, 2 when both have the same
amount of education and 3 when the mother has more education.
Click on TRANSFORM and then click on COMPUTE. (You may need to click on RESET to get rid of the instructions for creating RATIO.) Type the name of the new variable, COMPEDUC, in the Target Variable box. Then click on the Numeric Expression box and enter 1. So far, this is just like what you did in the previous section. Your screen should look like Figure 3‑23. This time, however, click on IF.
Click on: INCLUDE if case satisfies the condition:. Find PAEDUC in the list of variables on the left and click on it to highlight it. Then click on the arrow to the right of this list. This will move PAEDUC into the box to the right of the arrow. Now click on > (greater than). Find MAEDUC in the list of variables on the left, click on it, and click on the arrow to add MAEDUC to the formula. (Alternatively, you could click on the box to the right of the arrow and directly enter the formula, PAEDUC > MAEDUC.) Your screen should look like Figure 3‑24. Now click on Continue.
Click on OK.
Now repeat the same procedures as above, but this time setting the value of
COMPEDUC to 2 (instead of 1) and the formula to PAEDUC = MAEDUC. When you
are asked if you want to Change existing variable, click on OK. Now repeat the procedures a third time,
but change the value of COMPEDUC to 3 and the formula to PAEDUC < MAEDUC.
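The three passes through Compute and If amount to a simple set of conditions, sketched here in Python. This is an illustration of the logic, not SPSS code. The function name compare_education is made up, and None stands in for a missing value (a case with a missing PAEDUC or MAEDUC satisfies none of the three conditions in this sketch).

```python
# Hypothetical sketch of the COMPEDUC logic; not SPSS code.
def compare_education(paeduc, maeduc):
    """1 = father has more education, 2 = same amount, 3 = mother has more."""
    if paeduc is None or maeduc is None:
        return None  # no condition is met, so no value is assigned
    if paeduc > maeduc:
        return 1
    if paeduc == maeduc:
        return 2
    return 3  # the only case left is paeduc < maeduc

print(compare_education(16, 12))  # 1
print(compare_education(12, 12))  # 2
print(compare_education(10, 14))  # 3
```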
You can add variable and value labels to this variable, just
as you did earlier in this chapter and in Chapter 2. To do this, point your mouse at the variable name
at the top of the column (COMPEDUC) and double click. This will open the Variable View tab in the Data Editor. In the Values box you will see a small gray
button in the right-hand side of the box.
Point your mouse at this box and click.
This will open the Value Labels box.
Click in the box next to Value and type: 1.
Click on the box next to Value label (or press the Tab key) and type: Dad More.
Now click on Add. Repeat this procedure
for values 2 and 3, labeling them Same and Mom More
respectively. Click on Continue, then on
OK. Now run frequencies on your new
variable to double-check your work.
Using Select Cases
SPSS can also select subsets of cases for further analysis.
One of the variables in the data set is the respondent's religious preference
(RELIG). The categories include Protestant (value 1), Catholic (2), Jewish (3),
none (4), as well as other categories. The missing values are 98 (don't know)
and 99 (no answer). We might want to select only those respondents who have a
religious preference for analysis. We can do this by using the Select Cases
option in SPSS.
Click on DATA and then on SELECT CASES. This will open the Select Cases box. Your screen should look like Figure 3‑25. Notice that All Cases is currently selected. (The circle to the left of All Cases is filled in to indicate that it is selected.) We want to select a subset of these cases so click on the circle to the left of If condition is satisfied to select it. At the bottom of the window it says that unselected cases are filtered. This means that the cases you do not select can be used later if you click on All Cases. If you had selected Deleted, these unselected cases could not be used later. You should be very careful about saving a file after you have deleted cases because they are gone forever in that file. (You could, of course, get another copy of the data file by clicking on File and on Open.)
Scroll down the list of variables on the left until you come to RELIG and then click on it to highlight it. Click on the arrow to the right of this list to move RELIG into the box in the middle of the window. We want to select all cases that are not equal to 4 so click on the ~= sign. This symbol means "not equal to." Now click on 4 and the expression in the box will read RELIG ~= 4 which means that the variable RELIG does not equal 4 (the code for no religious preference). Your screen should look like Figure 3‑26. Click on Continue and then on OK in the Select Cases box. Run a frequencies distribution for RELIG and check that only the cases you selected appear. Your screen should look like Figure 3‑27.
There are no respondents without a religious preference
(value 4) in this table because you selected only those cases with values not
equal to four.
What if we wanted to analyze only Protestants and Catholics? Click on DATA and then on SELECT CASES. Click on RESET to eliminate what you had entered previously. Scroll down the list of variables and click on RELIG and then click on the arrow to the right of the list to move it into the box. Click on = and then on 1 so the expression in the box reads "relig = 1". SPSS uses the symbol & for AND and the symbol | for OR. We want all cases for which RELIG is 1 or 2. Now click on |. Click on RELIG in the list of variables again and on the arrow to move it into the box. Then click on = and then on 2 so the expression in the box reads "relig = 1 | relig = 2" which means that RELIG will equal 1 or 2. Your screen should look like Figure 3‑28. Click on Continue and on OK in the Select Cases box.
Run a frequencies distribution on RELIG to see what it looks like. Your screen should look like Figure 3‑29. You will only have Protestants (1) and Catholics (2) in your table because you selected only those cases with values one and two on RELIG.
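The two selections can be sketched in Python as conditions applied to a list of values. This is only an illustration, not SPSS code, and the RELIG codes in the sample list are made up. Python writes "not equal to" as != and "or" as the word or, where SPSS uses ~= and |.

```python
# Hypothetical sketch of selecting cases on RELIG; not SPSS code.
relig = [1, 2, 3, 4, 1, 4, 2, 5]  # made-up RELIG codes for eight respondents

# RELIG ~= 4: everyone except those with no religious preference.
has_preference = [r for r in relig if r != 4]
print(has_preference)  # [1, 2, 3, 1, 2, 5]

# relig = 1 | relig = 2: Protestants and Catholics only.
protestant_or_catholic = [r for r in relig if r == 1 or r == 2]
print(protestant_or_catholic)  # [1, 2, 1, 2]

# The original list is untouched, much like filtering rather than deleting.
print(len(relig))  # 8
```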
After you have selected cases for analysis, you will
probably want to continue your analysis with all the cases. To do this, click
on DATA, then on SELECT CASES, and then click on the circle to the left of All Cases. Click on OK and SPSS will select all the cases in the data file. This is
very important. If you don't do this,
you will continue to work with just the cases you have selected. This will work
only if you selected Filtered in the Select Cases box when you began using
select cases. If you selected Deleted, then you will have to get another copy of
the data file by clicking on FILE and
then on OPEN.
Weighting Cases
Sometimes you may want to weight some cases in your data
more heavily than others. Each household represented in the General Social
Survey (i.e., the data set you have been using in Chapter Three) had an equal
probability of selection. If there was more than one person eligible in the
household (18 years of age or older), then one of these individuals was
randomly selected. If there was one eligible person in the household, then that
person had a 1 out of 1 chance of being selected. If there were two eligible
people, then each person had a 1 out of 2 chance. If there were three
eligibles, then each person had a 1 out of 3 chance and so on.
In other words, the more eligible people in the household,
the smaller the chance of selection for any one of them. We can correct for
this by weighting each case by the number of eligible people in their household.
There is a variable called ADULTS which is the number of people 18 years of age
or older in the household and this is, of course, also the number of eligible
people in the household.
The number of adults in the household varied from one to five.
The following table shows what this distribution looks like.
Weighting Cases by Number of Eligible Adults in Household (ADULTS)
| Number of Eligible Adults | Number of Cases | Weighted Number of Cases |
| 1 | 1047 | 1047 |
| 2 | 1364 | 2728 |
| 3 | 258 | 774 |
| 4 | 75 | 300 |
| 5 | 21 | 105 |
| Total | 2765 | 4954 |
The weighted number of cases is just the number of eligible adults multiplied by the number of cases. This means that each case with two eligible adults has a weight twice that of each case with one eligible adult, each case with three eligible adults has a weight three times that of each case with one eligible adult, and so on.
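The arithmetic in this table is easy to check. Here is a short Python sketch that does so; the case counts are taken from the table, while the variable names are ours and are only for illustration.

```python
# Number of cases for each value of ADULTS, taken from the table.
cases_by_adults = {1: 1047, 2: 1364, 3: 258, 4: 75, 5: 21}

# Each case is weighted by the number of eligible adults in its household,
# so the weighted count is simply adults * number of cases.
weighted = {adults: adults * n for adults, n in cases_by_adults.items()}

total_cases = sum(cases_by_adults.values())
total_weighted = sum(weighted.values())

for adults in sorted(cases_by_adults):
    print(adults, cases_by_adults[adults], weighted[adults])
print("Total", total_cases, total_weighted)
```

The last line printed is "Total 2765 4954", which matches the totals in the table.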
The problem with this is that we started with 2,765 cases
and ended up with 4,954 cases. This artificially inflates the size of the
sample which we really don't want to do. There is an easy way to fix this. If
we divide 4,954 (the weighted sum of cases) by 2,765 (the actual number of
cases) we get 1.792. We can divide each weight by 1.792 to get an adjusted
weight. This would produce the following weighted data.
Weighting Cases Using Adjusted Weights
| Number of Eligible Adults | Adjusted Weight | Number of Cases | Weighted Number of Cases |
| 1 | 1/1.792=0.558 | 1047 | 584.23 |
| 2 | 2/1.792=1.116 | 1364 | 1522.22 |
| 3 | 3/1.792=1.674 | 258 | 431.89 |
| 4 | 4/1.792=2.232 | 75 | 167.40 |
| 5 | 5/1.792=2.790 | 21 | 58.59 |
| Total | | 2765 | 2764.33 |
Notice that when using the adjusted weights, the weighted number of cases equals the number of cases (except for a small amount of rounding error).
Let's use compute to create our new adjusted weight variable. We'll call this variable WADULTS for weighted adults. Click on TRANSFORM and then on COMPUTE. Click on RESET to get rid of what you entered previously. Type WADULTS in the Target Variable box. Find the variable ADULTS in the list of variables on the left and click on it to highlight it. Then click on the arrow to the right of this list to move it into the Numeric Expression box. Now click on / (for division) and then enter the value 1.792 by clicking on the one, then the decimal, then seven, then nine, and finally on two. The formula in the box should read “ADULTS/1.792” and your screen should look like Figure 3‑30. Click on OK and SPSS will create the new variable called WADULTS.
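The adjustment itself can be sketched in Python as a check on the numbers. The case counts and the value 1.792 come from the text; the variable names are ours. Note one assumption: to reproduce the table, this sketch rounds each adjusted weight to three decimals before multiplying, which a computed variable such as ADULTS/1.792 would not do.

```python
# Number of cases for each value of ADULTS, taken from the text.
cases_by_adults = {1: 1047, 2: 1364, 3: 258, 4: 75, 5: 21}

total_cases = sum(cases_by_adults.values())
total_weighted = sum(adults * n for adults, n in cases_by_adults.items())

# Weighted sum of cases divided by the actual number of cases,
# rounded to three decimals as in the text.
factor = round(total_weighted / total_cases, 3)

# Adjusted weight for each household size, rounded as in the table.
adjusted_weights = {adults: round(adults / factor, 3) for adults in cases_by_adults}

# Weighted number of cases using the adjusted weights.
adjusted_counts = {
    adults: adjusted_weights[adults] * n for adults, n in cases_by_adults.items()
}
total_adjusted = sum(adjusted_counts.values())

print("factor:", factor)
for adults in sorted(cases_by_adults):
    print(adults, adjusted_weights[adults], round(adjusted_counts[adults], 2))
print("Total", round(total_adjusted, 2))
```

The total comes out very close to the 2,765 cases we started with, which is the point of the adjustment.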
Now we want to weight the data using this variable we just created. Click on Data and then on Weight Cases. Click on the circle to the left of Weight cases by. Notice that this fills the circle in to indicate that it has been selected. Scroll down the list of variables on the left and find the variable WADULTS. Click on it to highlight it and then click on the arrow to the right of the list to move this variable into the Frequency Variable box. Your screen should look like Figure 3‑31. Click on OK and SPSS will weight the data appropriately.
Get a frequency distribution for the variable ADULTS using the weighted data. Click on ANALYZE, then point your mouse at DESCRIPTIVE STATISTICS, and then click on FREQUENCIES. Move the variable ADULTS into the Variables box and click on OK. The weighted frequency distribution should look like Figure 3‑32.
Notice that the frequencies are equal to the weighted number
of cases produced by using the adjusted weights we computed above. (Any
differences would be due to rounding error.)
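What a weighted frequency distribution does can be sketched as follows: instead of counting each case as 1, it adds up each case's weight. This Python sketch is only an illustration under that assumption; it rebuilds one record per case from the counts in the text, and the function name weighted_frequencies is ours.

```python
from collections import defaultdict

# Number of cases for each value of ADULTS, taken from the text.
cases_by_adults = {1: 1047, 2: 1364, 3: 258, 4: 75, 5: 21}
FACTOR = 1.792  # adjustment factor from the text

# One record per case, each carrying its adjusted weight (ADULTS/1.792).
cases = [
    {"adults": adults, "wadults": adults / FACTOR}
    for adults, n in cases_by_adults.items()
    for _ in range(n)
]


def weighted_frequencies(rows, variable, weight):
    """Sum the weights of the cases falling in each category of a variable."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[variable]] += row[weight]
    return dict(totals)


freq = weighted_frequencies(cases, "adults", "wadults")
for adults in sorted(freq):
    print(adults, round(freq[adults], 2))
print("Total", round(sum(freq.values()), 2))
```

Because the weights here are not rounded to three decimals first, the results differ slightly from the table of adjusted weights; that difference is the rounding error mentioned above.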
If you want to go back to the unweighted data, click on Data and then on Weight Cases. Click on the circle to the left of Do not weight cases and then on OK. Now you are using the unweighted cases again.
In this part of the book you have learned how to recode,
create new variables using compute and if, select particular cases for
analysis, and weight cases. You can do more complicated things with these
commands than we have shown you, but these are the basics. You can use the SPSS Base 13.0 User’s Guide (SPSS, Inc.,
2004) to learn what else you can do with these commands. In the rest of this
book, we will focus on some of the statistical procedures that SPSS can do for
you.
Chapter Three Exercises
Use the GSS02A data set for all these exercises.
RECODE Exercises
1. There are two variables that refer to the highest year of school completed by the respondent's mother and father (MAEDUC and PAEDUC). Do a frequency distribution for each of these variables. Now recode each of them (into a different variable) into three categories: under 12 years of school, 12 years, and over 12 years. Create new value labels for the recoded categories. Do a frequency distribution again to make sure that you recoded correctly.
2. INCOME98 is the total family income. Do a frequency distribution to see what the variable looks like before recoding. Recode (into a different variable) into eight categories: under $10,000, $10,000 to $19,999, $20,000 to $29,999, $30,000 to $39,999, $40,000 to $49,999, $50,000 to $59,999, $60,000 to $74,999, and $75,000 and over. Be very careful that you recode the values, not the labels associated with the values. Call the new variable INCOME1. Create new value labels for the recoded categories. Do another frequency distribution to make sure you recoded correctly.
Now recode INCOME98 again (into a different variable). This time use only four categories: under $20,000, $20,000 to $39,999, $40,000 to $59,999, and $60,000 and over. Call the new variable INCOME2. Create new value labels for the recoded categories. Do another frequency distribution to make sure you recoded correctly.
COMPUTE Exercises
3. In this chapter we created a new variable called ABORTION which was the sum of the seven abortion variables in the data set. Create a new variable called AB1, which is the sum of ABDEFECT, ABHLTH, and ABRAPE. Do a frequency distribution for this new variable to see what it looks like. How is this distribution different from the distribution for the ABORTION variable based on all seven variables?
4. There are five variables that measure tolerance for letting someone speak in your community who may have very different views than your own (SPKATH, SPKCOM, SPKHOMO, SPKMIL, and SPKRAC). For each of these variables, 1 means that the respondent would allow such a person to speak and 2 means that the respondent would not allow it. Create a new variable (call it SPEAK) which is the sum of these five variables. This new variable would have a range from 5 (would allow a person to speak in each of the five scenarios) to 10 (would not allow a person to speak in any of the five scenarios). Do a frequency distribution for this new variable to see what it looks like.
IF Exercises
5. There are two variables that describe the highest educational degree of the respondent's father and mother (PADEG and MADEG). Create a new variable (call it MAPAEDUC) that indicates if the father and mother have a college education. This variable should equal 1 if both parents have a college education, 2 if only the father has a college education, 3 if only the mother has a college education, and 4 if neither parent has a college education. Create new value labels for the recoded categories. Do a frequency distribution for this new variable to see what it looks like.
6. One variable indicates how often the respondent prays (PRAY) and another variable indicates if the respondent approves or disapproves of the Supreme Court's decision regarding prayer in the public schools (PRAYER). Create a new variable (call it PRY) that is a combination of these two variables. This variable should equal 1 if the respondent prays a lot (once a day or several times a day) and approves of the Supreme Court's decision, 2 if the respondent prays a lot (once a day or several times a day) and disapproves of the Supreme Court's decision, 3 if the respondent doesn't pray a lot and approves of the Supreme Court's decision, and 4 if the respondent doesn't pray a lot and disapproves of the Supreme Court's decision. Do a frequency distribution for this new variable to see what it looks like.
SELECT IF Exercises
7. Select all males (1 on the variable SEX) and do a frequency distribution for the variable FEAR (afraid to walk alone at night in the neighborhood). Then select all females (2 on the variable SEX) and do a frequency distribution for FEAR. Are males or females more fearful of walking alone at night?
8. Select all whites (1 on the variable RACE) and do a frequency distribution for the variable PRES00. Did they vote for Bush, Gore, or Nader in 2000? Then select all blacks (2 on the variable RACE) and do a frequency distribution for PRES00. Were whites or blacks more likely to vote for Gore?
[1] For example, you can work with what SPSS calls “system-missing” values. All blanks will automatically be changed to system-missing values. You can change these system-missing values into another value, or you can change both the system-missing values and the missing values that you define into another value.
[2] If your list shows labels, you can change the display. Check footnote 1 on page 5.