Chapter Three: Transforming Data
This chapter explains how to change, or transform, the values associated
with your variables, like the values entered in the data entry process
shown in Chapter 2. SPSS for Windows can transform the values in several
ways. SPSS for Windows can:
1. combine values of a variable into several categories,
2. create new variables out of old variables,
3. select particular cases and analyze only these cases,
4. weight cases so that some cases count more heavily than others.
SPSS provides many ways to transform data. Covered in this chapter are
recode, compute, if, select cases, and weight.
Recoding Variables
Recoding is a way of combining the values of a variable into fewer
categories. Let me give a hypothetical example. Let’s say you have
conducted a survey and one of your demographic questions was the age of
the respondent. Entering the actual age in years would be the simplest
way of working with the data. But let’s also say that you want to compare
people of different age groupings. In other words, your data would be
more useful if it was organized into collapsed categories, like "Young",
"Middle age", and "Older". Using SPSS for Windows you could reorganize
the data so that you had these three groupings. There are two things you
need to know before you recode the values. First, you need to decide the
number of categories you want to end up with. Generally this will be
determined by the way you plan to use the information. If you are going
to analyze the data using a table where you cross-tabulate two variables
(see Chapter 5), you probably want to limit the number of new categories
to three or four. The second thing you need to know is which of the old
values are going to be combined into a new category. For example, you
might do something like this.
| The actual age of the respondent as originally recorded in the data file | The new, collapsed, category |
| 18 years through 35 years | Young |
| 36 years through 55 years | Middle age |
| 56 years of age or over | Older |
Another example might be if respondents were asked how often they prayed,
and the original responses were several times a day, once a day, several
times a week, once a week, less than once a week, once a month, once a
year, or never. With recode we can combine the people who said "several
times a day" with the people who said "once a day" and put all these
respondents into a new category which we could call "a lot". Similarly,
we could combine the people who said "several times a week" with those
who said "once a week" and call this category "sometimes" and combine
those who said "less than once a week" and "once a month" and call this
category "infrequently". Those who said "once a year" or "never" could be
combined into a fourth category called "hardly ever". Recoding is the
process in SPSS that carries out combinations like these.
Start SPSS for Windows the way you were taught in Chapter 1 and bring in
the GSS00A file, as you did in that chapter. Our task is going to be to
recode the variable called AGE, which is, of course, the respondent's
age.
Click on "Transform" and then point your mouse at
"Recode". Your screen will look like Figure 3-1.
Figure 3-1
Now we have two options: "Recode Into Different Variables" and "Recode
Into Same Variables". It is strongly suggested that the beginning student
use only the "Recode Into Different Variables" option. If you make an
error, your original variable is still in the file and you can try again.
If you make an error using "Recode Into Same Variables", you have changed
the original variable. If you also saved the file after doing this, and
you did not have another copy of the file, you have just eliminated any
chance of correcting your error.
Recoding Into Different Variables
Recoding into a different variable starts with giving the new variable a
variable name. For example, if we recode into different variables we
could combine ages into one set of categories and call this new variable
AGE1 and then recode ages into a different set of categories called AGE2.
To do that, click on "Into Different Variables". Your screen will look
like Figure 3-2.
Figure 3-2
Find "AGE" in the list of variables on the left
and click on it to highlight it, and then click on the arrow just to the
left of the big box in the middle of the window. This will move AGE into
the list of variables to recode.
You want to give a name to this new variable so
click in the "Name" box under Output Variable and type the name "AGE1"
in this box. You can even type a variable label for this new variable in
the Label box just below the Name box. Try typing "Age in Four Categories"
as your label. Click on the "Change" button to tell SPSS to make these
changes. Your screen will look like Figure 3-3.
Figure 3-3
Now we have to tell SPSS how to create these categories. Click on the
"Old and New Values" button at the bottom of the window. The screen will
look like Figure 3-4.
Figure 3-4
There are several options. You can change a particular value into a new
value by entering the value to be changed into the Old Value box and the
new value into the New Value box and then clicking on Add. You will
usually change one "real" value to another "real" value. For example,
change 18 thru 35 into value 1. There are also other options.
As you can tell from the previous example, you can
also change a range of values into a new value and that is what we are
going to do. Click on the fourth bubble from the top labeled "Range". Notice
how this marks this choice by filling in the bubble. Then type "18" (the
youngest age in the data set) in the box to the left of "through", click
on the box to the right of through, and type "29" in that box. Then click
on "Value" just below New Value and type "1" in that box. This will have
SPSS combine all ages from 18 through 29 into a single category and give
it the value of 1. Then click on "Add".
Now do the same thing for the other categories. Click
on the box under Range and type "30" in the box to the left of "through",
click on the box to the right of through, and type "49" in that box. Click
on "Value" just below New Value and type "2" in that box and click on Add.
Do the same thing for the category 50 to 69 (give this a new value of "3")
and the category 70 to 89 (the largest age in the data set). Give this
last category a new value of "4". Your screen should look like Figure 3-5.
Figure 3-5
To change one of your categories, highlight that
category in the Old --> New box and make the changes, then click on "Change".
The new category should appear in the Old --> New box. To remove a category,
highlight it and click on "Remove".
Now we want SPSS to carry out the recoding. Click
on "Continue" at the bottom of the window. This will take you back to the
“Recode into Different Variables” box. Click on "OK" and SPSS will take
a few seconds to carry out your commands. The data matrix should appear
on the screen. When it says that the SPSS Processor is Ready
at the bottom of the window you know that SPSS has finished with the recoding.
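The recode you just set up can also be pictured as a small lookup rule. The following is only a sketch in Python, not SPSS syntax: the function name recode_into_new, the rule list, and the sample ages are made up for illustration, and None stands in for a missing value.

```python
# Hypothetical sketch of the AGE -> AGE1 recode described above.
# Each rule is (low, high, new_value); both ends of a range are included,
# like the "Range ... through ..." option in the dialog box.
AGE1_RULES = [
    (18, 29, 1),
    (30, 49, 2),
    (50, 69, 3),
    (70, 89, 4),
]


def recode_into_new(value, rules):
    """Return the new code for value, or None if no rule covers it."""
    if value is None:
        return None
    for low, high, new_value in rules:
        if low <= value <= high:
            return new_value
    return None  # a value with no matching rule ends up missing


ages = [18, 29, 30, 49, 50, 69, 70, 89, None]  # made-up sample values
age1 = [recode_into_new(a, AGE1_RULES) for a in ages]

print(age1)  # [1, 1, 2, 2, 3, 3, 4, 4, None]
print(ages)  # the original values are still there, unchanged
```

Because the results go into a separate list, the original ages are untouched, which is the same safety net you get from recoding into a different variable.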
Click on "Analyze", then point your mouse at "Descriptive
Statistics", and then click on "Frequencies". Notice that AGE1 has appeared
in the list of variables on the left. Click on it to highlight it and click
on the arrow to move it to the Variables box. Then click on "OK". An output
window will open. Click on "Age in Four Categories" in the left-hand pane
or use the scroll box until you can see the entire table. Your screen will
look like Figure 3-6.
Figure 3-6
Let's take a look at the data matrix. Click on "Window"
in the menu bar. In the box that is opened you will see a list of all the
windows you have opened. One of these windows will be called "GSS00A -
SPSS Data Editor". Click on that line and the data matrix window will be
moved to the foreground and you will see it on your screen. Use the scroll
bar in the lower-right part of the window to scroll to the right until
you see a column titled AGE1. (It will be the last column in the matrix.)
This is the new variable you just created. Your screen should look like
Figure 3-7.
Figure 3-7
Use the other scroll bar to scroll down and see
the values in this variable. Look back at Figure 3-6 and you will see that
there are no value labels for categories 1 through 4 for the new variable
AGE1.
If you want the output to give you more information about what each
category means you need to insert value labels. To do this, point your
mouse at the variable name at the top of the column (AGE1) and double
click. This will open the "Variable View" tab in the Data Editor. Now
you’re going to enter labels for the values in the recoded variable using
what you learned in Chapter Two. In the Values box you will see a small
gray button in the right-hand side of the box. Point your mouse at this
button and click. This will open the Value Labels box. You will see two
more boxes: Value and Value Label. Click in the Value box and type the
value "1". Then click in the Value Label box and type the label for the
first category, "under 30". Then click on "Add" and the new label will
appear in another box just to the right of the Add button. Then click in
the Value box and type the value "2" and type the label for the second
category, "30 to 49", and click on "Add". Do this for values 3 and 4. If
you make a mistake you can use the Change and Remove buttons which work
the same way we just described. Your screen should look like Figure 3-8.
Figure 3-8
Click on "OK". Now click
on "Analyze", point your mouse at "Descriptive Statistics", and then click
on "Frequencies" and rerun the frequencies distribution for AGE1. This
time it should have the value labels you just entered on the output.
We said that recoding into different variables allowed
you to recode a variable in more than one way. Let's recode AGE again,
but this time let's recode age into three categories: 18 through 34, 35
to 59, and 60 and over. Let's call this new variable AGE2. Retracing the
steps you used to create AGE1, recode AGE into AGE2.
Be sure to click on "Reset" in the "Recode into Different
Variables" box to get rid of the recoding instructions for "AGE1". When
you are done, do a frequency distribution for AGE2. Your screen should
look like Figure 3-9.
Figure 3-9
There are two more important points to discuss.
Look back at Figure 3-4. It shows
the "Recode into Different Variables: Old and New Values" box. There are
three options in the Old Value box that we haven't discussed. Two are different
ways of entering ranges. You can enter the lowest value of the variable
through some particular value and you can enter some particular value through
the highest value of the variable. Make sure that you do not include your
missing values in these ranges or your missing values will become part
of that category. For example, if 99 is the
missing value for age, then recoding 70 through highest would include the
missing values with the oldest age category. This is probably not what
you want to do. So be careful.
Here is another important point. What
happens if you don't recode a particular value? If it is a missing value,
it retains its status as a missing value in the new variable. But what
if it isn't a missing value? Any value (other than a missing value) that
is not recoded is changed into a system-missing value. If you want to leave
a value in its original form, then click on "All other values" in the Old
Value box and click on "Copy Old Values" in the New Value box and then
click on "Add".
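Both points can be seen in a short sketch. This is plain Python rather than SPSS, the three function names are hypothetical, and 99 is used as the missing-value code only because the text uses it as an example. None stands in for a missing value in the new variable.

```python
MISSING_AGE = 99  # assumed missing-value code, borrowed from the example in the text


def recode_70_through_highest(age):
    """Open-ended range: everything from 70 up becomes category 4."""
    return 4 if age >= 70 else None


def recode_70_through_89(age):
    """Bounded range: stops at 89, so the code 99 is not swept in."""
    return 4 if 70 <= age <= 89 else None


def recode_with_copy(age):
    """Bounded range plus an 'all other values, copy old value' rule."""
    return 4 if 70 <= age <= 89 else age


ages = [45, 72, 89, MISSING_AGE]  # made-up sample values

open_ended = [recode_70_through_highest(a) for a in ages]
bounded = [recode_70_through_89(a) for a in ages]
copied = [recode_with_copy(a) for a in ages]

print(open_ended)  # [None, 4, 4, 4]    the 99 has joined the oldest group
print(bounded)     # [None, 4, 4, None] the 99 stays out of category 4
print(copied)      # [45, 4, 4, 99]     values with no rule keep their old code
```

The first result shows the trap: the missing code 99 is counted as a very old respondent. The last result shows what the "Copy Old Values" rule buys you.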
Recoding Into the Same Variable
Now we are going to recode and have the recoded variable
replace the old variable. This means that we will not create a new variable.
We will replace the old variable with the recoded variable, but remember
the warning given you earlier in this chapter. Click
on "Transform" and then point your mouse at "Recode". This time click on
"Recode Into Same Variables". Let's recode
the variable called PRAY. Find PRAY on the list of variables on the left,
click on it to highlight it, and then click on the arrow to the left of
the Numeric Variables box. This will move the variable PRAY into the big
box in the middle of the window. Click on the "Old and New Values" button.
This will open the Recode into Same Variables: Old and New Values box.
Your screen should look like Figure 3-10.
Figure 3-10
This box looks very much like the box you just used
(see Figure 3-4). Combine the values 1 and 2 and give this category the
value 1. Combine values 3 and 4 into another category and call this 2.
Then combine values 5 and 6 into a third category and call this 3. We don't
have to go through the instructions again, since it's the same as before.
Moreover, since this is not a new variable, it will still be called PRAY.
You will want to change the value labels. Find the
variable PRAY in the data matrix by scrolling to the left. Point
your mouse at the variable name (PRAY) and double click. This will open
the "Variable View" tab in the Data Editor. Click
on the small gray box in the Values box and change the labels in whatever
way you want. You will have to use the Change and Remove buttons to do
this. Follow the instructions we just went through for recoding into different
variables. When you finish, click on "Analyze", then point your mouse at
"Descriptive Statistics", then click on "Frequencies" and get a frequency
distribution for PRAY. Your screen should look like Figure 3-11.
Figure 3-11
When you recode into the same variable, a value
that is not recoded stays the same as it was in the original variable.
If we had decided to keep "never" (value 6) as a separate category, we
could have left it alone and it would have stayed a 6. Or we could have
changed it to another value such as 4. This is an important difference
between recoding into the same and different variables.
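A small sketch may make the difference concrete. It is written in Python, not SPSS, and the rule table and sample values are invented for illustration. It follows the variation just described, where "never" (value 6) is given no rule of its own.

```python
# Hypothetical rules for PRAY with no rule for 6 ("never").
PRAY_RULES = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3}

pray = [1, 2, 3, 4, 5, 6]  # made-up sample values, one per original code

# Recoding "into the same variable": the old values are overwritten,
# and any value without a rule simply keeps its old code.
pray = [PRAY_RULES.get(value, value) for value in pray]

print(pray)  # [1, 1, 2, 2, 3, 6]
```

Notice that after the assignment the original list is gone, which is exactly why a mistake here is harder to undo than a mistake made while recoding into a different variable.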
Recoding is a very useful procedure and one that
you will probably use a lot. It's worth spending time practicing how to
recode so you will be able to do it with ease when the time comes.
Creating New Variables Using COMPUTE
You can also create new variables out of old variables
using compute. There are seven variables in the data set we have been using
that ask respondents if they think a woman ought to be able to obtain a
legal abortion under various scenarios. These are the variables ABANY (woman
wants abortion for any reason), ABDEFECT (possibility of serious birth
defect in baby), ABHLTH (woman's health is seriously threatened), ABNOMORE
(woman is married and doesn't want any more children), ABPOOR (woman is
poor and can't afford more children), ABRAPE (pregnant as result of rape),
and ABSINGLE (woman is not married). Each
variable is coded 1 if the respondent says yes (ought to be able to obtain
a legal abortion) and 2 if the person says no. The missing values are 0
(not applicable, question wasn't asked), 8 (don't know), and 9 (no answer).
Compute will allow us to combine these seven variables,
creating a new variable that we will call "ABORTION". If a person said
yes to all seven questions the new variable would equal 7 and if he or
she said no to all seven questions the new variable would equal 14. But
what about missing values? If any of the seven variables has a missing
value, then the new variable would be assigned a system-missing value.
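The arithmetic behind the new variable can be sketched as follows. This is Python rather than SPSS, the function name abortion_score and the three sample respondents are hypothetical, and None stands in for a system-missing value.

```python
ITEMS = ["ABANY", "ABDEFECT", "ABHLTH", "ABNOMORE",
         "ABPOOR", "ABRAPE", "ABSINGLE"]
MISSING_CODES = {0, 8, 9}  # not applicable, don't know, no answer


def abortion_score(case):
    """Sum the seven items; return None if any item is a missing code."""
    values = [case[name] for name in ITEMS]
    if any(v in MISSING_CODES for v in values):
        return None
    return sum(values)


# Made-up respondents for illustration.
all_yes = {name: 1 for name in ITEMS}
all_no = {name: 2 for name in ITEMS}
one_dont_know = dict(all_yes, ABRAPE=8)

print(abortion_score(all_yes))        # 7
print(abortion_score(all_no))         # 14
print(abortion_score(one_dont_know))  # None
```

A single "don't know" is enough to make the whole sum missing, so the new variable will usually have more missing cases than any one of the seven original variables.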
To use compute to do this, click on "Transform" and
then click on "Compute". Your screen should look like Figure 3-12.
Figure 3-12
Type the name of the new variable, "ABORTION", in
the Target Variable box. Then enter the formula for this new variable in
the Numeric Expression box. There are two ways to do this. One method is
to click on the first of the seven variables, "ABANY", in the list of variables
on the left, then click on the arrow to the right of this list. This will
move ABANY into the Numeric Expression box. Now click on the "plus" sign
and the plus sign moves into the box.
Continue doing this until the box contains the following
formula: ABANY + ABDEFECT + ABHLTH + ABNOMORE + ABPOOR + ABRAPE + ABSINGLE.
(Don't type the period after ABSINGLE.) If you make a mistake, just click
in the Numeric Expression box and use the arrow keys and the delete and
backspace keys to make corrections. Your screen should look like Figure
3-13.
Figure 3-13
Click on "OK" to indicate that you want SPSS to
create this new variable. You can use the scroll bar to scroll to the far
right of the matrix and view the variable you just created. A second way
to enter the formula in the Numeric Expression box is to click in the box
and type the formula directly into the box using the keyboard.
You can add variable and value labels to this variable
by pointing your mouse at the variable name (ABORTION) at the top of the
column in the data matrix and double clicking. This will open the Variable
View tab in the Data Editor. Click
on the small gray box in the Values box and change the labels in whatever
way you want. You can enter the variable and value labels the way you were
taught earlier in this chapter, and in Chapter 2.
Enter the variable label "Sum of Seven Abortion Variables".
Enter the value label "High Approval" for the value seven and "Low Approval"
for the value fourteen. (Remember that seven means they approved of abortion
in all seven scenarios and fourteen means they disapproved all seven times.)
You should check your new variable to see that it was calculated
correctly. Go to "Analyze", then "Descriptive Statistics", and then
"Frequencies". Click on "Reset" to get rid of what is already in the box.
Find the variable ABORTION, highlight it and click on the arrow to the
left of the Variables box. Then click on "OK". Your screen should look
like Figure 3-14. The lowest legitimate number should be 7 and the
highest legitimate number should be 14. Do you remember why?
Figure 3-14
Try creating another variable. Two of the variables
in the data set are the number of years of education of the respondent's
father (PAEDUC) and of the respondent's mother (MAEDUC). If we divide PAEDUC
by MAEDUC we will get the ratio of the father's education to the mother's
education. Any value greater than one will mean
that the father has more education than the mother and any value less than
one means the mother has more education than the father. Any
value close to one means that the father and mother have about the same
education.
We have a small problem though. If the mother's
education is zero, then we will be dividing by zero, which is mathematically
undefined. Let's recode any value of zero for MAEDUC so it becomes a one.
This will avoid dividing by zero and still give us a useful ratio of father's
to mother's education. Click on "Transform", then point your mouse at "Recode",
and finally click on "Into Same Variables". (You may need to click on "Reset"
to get rid of the recoding instructions used earlier.) Move
MAEDUC into the Variables box by highlighting it in the list of variables
on the left and clicking on the arrow to the right of this list. Click
on “Old and New Values” and then type "0" into the Value box under Old
Value and click in the Value box under New Value. Type "1" in this box
and click on "Add". Your screen should look like Figure 3-15.
Figure 3-15
Click on "Continue" and then on "OK" in the Recode
Variables box. Now we have changed each 0 for MAEDUC into a 1.
To create our new variable, click
on "Transform" and then on "Compute". (If
necessary, click on "Reset" to get rid of the formula for the ABORTION
variable you just created.) Call this new variable RATIO. So type RATIO
in the Target Variable box. Now we want to write the formula in the Numeric
Expression box. Click in the list of variables on the left and scroll down
until you see PAEDUC. Click on it to highlight it and click on the arrow
to the right of the list to move it into the Numeric Expression box.
SPSS uses the slash "/" to indicate division, so click
on the / in the box in the center of the window. Click on the list of variables
again and scroll up until you see MAEDUC and click on it to highlight it.
Move it to the Numeric Expression box by clicking on the arrow. Your screen
should look like Figure 3-16.
Figure 3-16
Click on "OK" and SPSS will create your new variable.
Use the scroll bar to scroll to the right in the data matrix until you
can see the new variable you called RATIO. Scroll up and down so you can
see what the values of this variable look like. You
may want to do a frequencies distribution as a check to make sure the new
variable was created correctly.
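If you want to check a few values of RATIO by hand, the calculation amounts to the sketch below. It is Python, not SPSS, the function name father_mother_ratio is made up, and None stands in for a missing value.

```python
def father_mother_ratio(paeduc, maeduc):
    """Father's years of education divided by mother's.

    A mother's value of 0 is treated as 1, mirroring the recode of
    MAEDUC described above, so the division is always defined.
    """
    if paeduc is None or maeduc is None:
        return None
    if maeduc == 0:
        maeduc = 1
    return paeduc / maeduc


print(father_mother_ratio(12, 12))    # 1.0  same education
print(father_mother_ratio(8, 16))     # 0.5  mother has more
print(father_mother_ratio(8, 0))      # 8.0  zero was treated as one
print(father_mother_ratio(None, 12))  # None
```

One difference from the steps in the text is worth keeping in mind: the sketch substitutes 1 for 0 only while dividing, whereas recoding MAEDUC into the same variable changes the stored values themselves.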
After looking at the frequencies distribution it is obvious that it would
be easier to understand if we grouped some of the scores together, so
create a new variable by recoding RATIO into a different variable. Click
on "Transform" and then point your mouse at "Recode" and then click on
"Into Different Variables". Find the variable RATIO in the list of
variables on the left and click on it to highlight it. (Again, you may
have to click "Reset" if there is old information still in the boxes.)
Click on the arrow to the right of this list to move it into the box in
the middle of the window. Type RATIO1 in the Name box under Output
Variable and type "Recoded Ratio" in the Label box. Then click on
"Change".
Click on "Old and New Values" to open the Recode Into
Different Variables: Old and New Values box. Click on the fifth bubble
from the top under Old Value and then type "0.89" in the box to indicate
that you want to recode the lowest value through 0.89. Click on the Value
box under New Value and type "1" in that box, and then click on "Add".
Click on the fourth bubble from the top under Old Value and type "0.90"
in the box to the left of through and "1.10" in the box to the right. Then
type "2" in the Value box under New Value and click on "Add". Finally,
click on the sixth bubble from the top under Old Value and type "1.11"
in the box to the left of through. Type "3" in the Value box under New
Value and click on "Add". Your screen should look like Figure 3-17. Click
on "Continue" and then on "OK" in the Recode Into
Different Variables box.
Figure 3-17
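The three rules entered above can be written out as a sketch. This is Python rather than SPSS, the function name recode_ratio is hypothetical, and None stands in for a system-missing value.

```python
def recode_ratio(ratio):
    """Apply the three rules entered in the Old and New Values box."""
    if ratio is None:
        return None
    if ratio <= 0.89:          # lowest through 0.89
        return 1
    if 0.90 <= ratio <= 1.10:  # 0.90 through 1.10
        return 2
    if ratio >= 1.11:          # 1.11 through highest
        return 3
    return None                # no rule matched


print(recode_ratio(0.5))      # 1
print(recode_ratio(1.0))      # 2
print(recode_ratio(1.5))      # 3
print(recode_ratio(17 / 19))  # None, because 0.8947... is between 0.89 and 0.90
```

Writing the rules this way shows something easy to miss in the dialog box: a ratio that falls between 0.89 and 0.90, or between 1.10 and 1.11, matches none of the three ranges. Given what was said earlier about values that are not recoded, such cases would be expected to come out as system-missing in RATIO1, so it is worth checking the frequency distribution for unexpected missing cases.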
Let’s add value labels to the new values. Find the
variable RATIO1 in the data matrix and double click on the variable name,
RATIO1. This will open the "Variable View" tab in the Data Editor. Click
on the small gray box in the Values box and enter the labels. Type "1"
in the Value box and "under 0.90" in the Value Label box and then click
on "Add".
Do this twice more to add the label "0.90 through 1.10" to the value 2
and "over 1.10" to the value 3. Your screen should look like Figure 3-18.
(This should seem familiar to you now.)
Figure 3-18
Click on "OK" in the Define Variable box. Run
a frequencies distribution on the new variable to double-check your work. Your
screen should look like Figure 3-19.
Figure 3-19
The first category (under 0.90) means that father's
education was less than 90% of mother's education. The second category
(0.90 through 1.10) means that father's and mother's education were about
the same, while the third category (over 1.10) means that father's education
was more than 110% of mother's education. You can see that about 44% of
the respondents have fathers and mothers with similar education, while
about 29% have fathers with substantially less education than the mother
and another 27% have fathers with substantially more education than the
mother.
You have already seen that SPSS uses + for addition
and / for division. It also uses - for subtraction, * for multiplication,
and ** for exponentiation. There are other arithmetic operators and a large
number of functions (e.g., square root) that can be used in compute statements.
Creating New Variables Using IF
The "if" command is another way to create new variables
out of old variables. Perhaps
we want to compare the level of education of each respondent's father to
that of his or her mother. Now, however, we're not interested in the precise
ratio, but just want to know if the father had more education than the
mother, the same amount, or less. We'll create a new variable that will
have the value 1 when the father has more education than the mother, 2
when both have the same amount of education and 3 when the mother has more
education.
Click on "Transform" and then click on "Compute". (You
may need to click on "Reset" to get rid of the instructions for creating
RATIO.) Type the name of the new variable, COMPEDUC, in the Target Variable
box. Then click on the Numeric Expression box and enter "1". So far, this
is just like what you did in the previous section. Your screen should look
like Figure 3-20. This time, however, click on "If".
Figure 3-20
Click on: "Include if case satisfies condition:".
Find PAEDUC in the list of variables on the left and click on it to highlight
it. Then click on the arrow to the right of this list. This will move PAEDUC
into the box to the right of the arrow. Now click on ">" (greater than).
Find MAEDUC in the list of variables on the left, click on it, and click
on the arrow to add MAEDUC to the formula. (Alternatively, you could click
on the box to the right of the arrow and directly enter the formula, PAEDUC
> MAEDUC.) Your screen should look like Figure 3-21. Now click on "Continue".
Figure 3-21
Click on "OK". Now repeat
the same procedures as above, but this time setting the value of COMPEDUC
to "2" (instead of 1) and the formula to PAEDUC = MAEDUC. When you are
asked if you want to Change existing variable, click on "OK". Now repeat
the procedures a third time, but change the value of COMPEDUC to "3" and
the formula to PAEDUC < MAEDUC.
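The three passes through the Compute box add up to one decision rule. A sketch of that rule in Python (not SPSS) is shown here; the function name compeduc is made up, and None stands in for a case where either parent's education is missing.

```python
def compeduc(paeduc, maeduc):
    """1 = father has more education, 2 = same, 3 = mother has more."""
    if paeduc is None or maeduc is None:
        return None  # no condition can be evaluated, so no value is assigned
    if paeduc > maeduc:
        return 1
    if paeduc == maeduc:
        return 2
    return 3  # the only case left is paeduc < maeduc


print(compeduc(16, 12))    # 1
print(compeduc(12, 12))    # 2
print(compeduc(10, 14))    # 3
print(compeduc(None, 14))  # None
```

Because the three conditions cannot be true at the same time, the order in which you run the three steps in SPSS does not change the result.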
You can add variable and value labels to this variable,
just as you did earlier in this chapter and in Chapter 2.
Click on the box next to Value and type: "1". Click
on the box next to Value label (or press the Tab key) and type: "Dad More".
Now click on "Add". Repeat this procedure for values 2 and 3, labeling
them "Same" and "Mom More" respectively. Click on "Continue", then on "OK".
Now run frequencies on your new variable to double-check your work.
Using Select Cases
SPSS can also select subsets of cases for further
analysis. One of the variables in the data set is the respondent's religious
preference (RELIG). The categories include Protestant (value 1), Catholic
(2), Jewish (3), none (4), as well as other categories. The missing values
are 98 (don't know) and 99 (no answer). We might want to select only those
respondents who have a religious preference for analysis. We can do this
by using the Select Cases option in SPSS.
Click on "Data" and then on "Select Cases". This will open the Select
Cases box. Your screen should look like Figure 3-22. Notice that All
Cases is currently selected. (The circle to the left of All Cases is
filled in to indicate that it is selected.) We want to select a subset of
these cases so click on the circle to the left of "If condition is
satisfied" to select it. At the bottom of the window it says that
unselected cases are filtered. This means that the cases you do not
select can be used later if you click on "All Cases". If you had selected
"Deleted", these unselected cases could not be used later. You should be
very careful about saving a file after you have deleted cases because
they are gone forever in that file. (You could, of course, get another
copy of the data file by clicking on "File" and on "Open".)
Figure 3-22
Click on "If" (below "If condition is satisfied")
and this will open the Select Cases: If box. Scroll down the list of variables
on the left until you come to RELIG and then click on it to highlight it.
Click on the arrow to the right of this list to move RELIG into the box
in the middle of the window. We want to select all cases that are not equal
to 4 so click on the "~=" sign. This symbol means "not equal to." Now click
on "4" and the expression in the box will read RELIG ~= 4 which means that
the variable RELIG does not equal 4 (the code for no religious preference).
Your screen should look like Figure 3-23. Click
on "Continue" and then on "OK" in the Select Cases box. Run
a frequency distribution for RELIG and check that it gives you a
reasonable range of values. There
are no respondents without a religious preference (value 4) in this table
because you selected only those cases with values not equal to four.
Figure 3-23
What if we wanted to analyze only Protestants and Catholics? First, you
have to go back to the Data Editor window. Now click on "Data" and then
on "Select Cases". Click on "Reset" to eliminate what you had entered
previously. Click on "If condition is satisfied" and then on "If". Scroll
down the list of variables and click on RELIG and then click on the arrow
to the right of the list to move it into the box. Click on "=" and then
on "1" so the expression in the box reads "relig = 1". SPSS uses the
symbol & for AND and the symbol | for OR. We want all cases for which
RELIG is 1 or 2. Now click on "|". Click on RELIG in the list of
variables again and on the arrow to move it into the box. Then click on
"=" and then on "2" so the expression in the box reads "relig = 1 | relig
= 2" which means that RELIG will equal 1 or 2. Your screen should look
like Figure 3-24. Click on "Continue" and on "OK" in the Select Cases
box.
Figure 3-24
Run a frequency distribution for RELIG
to see what it looks like. Your screen
should look like Figure 3-25. You will only have Protestants (1) and Catholics
(2) in your table because you selected only those cases with values one
and two on RELIG.
Figure 3-25
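The two selections made in this section can be sketched as filters over a list of cases. This is Python, not SPSS: the helper select_cases and the five sample cases are invented for illustration, and the sample deliberately contains no missing-value codes.

```python
# Made-up cases; RELIG codes follow the text:
# 1 Protestant, 2 Catholic, 3 Jewish, 4 none.
cases = [
    {"id": 1, "RELIG": 1},
    {"id": 2, "RELIG": 2},
    {"id": 3, "RELIG": 3},
    {"id": 4, "RELIG": 4},
    {"id": 5, "RELIG": 1},
]


def select_cases(all_cases, condition):
    """Return the cases that satisfy condition; all_cases is left unchanged."""
    return [case for case in all_cases if condition(case)]


# Same idea as RELIG ~= 4
has_preference = select_cases(cases, lambda c: c["RELIG"] != 4)

# Same idea as relig = 1 | relig = 2
protestant_or_catholic = select_cases(
    cases, lambda c: c["RELIG"] == 1 or c["RELIG"] == 2
)

print([c["id"] for c in has_preference])          # [1, 2, 3, 5]
print([c["id"] for c in protestant_or_catholic])  # [1, 2, 5]
print(len(cases))                                 # 5, nothing was deleted
```

The full list of cases is still available after each selection, which corresponds to leaving unselected cases filtered rather than deleted.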
After you have selected cases for analysis, you
will probably want to continue your analysis with all the cases. To do
this, click on "Data", then on "Select Cases", and then click on the circle
to the left of "All cases". Click on "OK" and SPSS will select all the
cases in the data file. This is very important. If
you don't do this, you will continue to work with just the cases you have
selected. This will work only if you selected Filtered in the Select
Cases box when you began using select cases. If you selected Deleted, then
you will have to get another copy of the data file by clicking on "File"
and then on "Open".
Weighting Cases
Sometimes you may want to weight some cases in your
data more heavily than others. Each household represented in the General
Social Survey (i.e., the data set you have been using in Chapter Three)
had an equal probability of selection. If there was more than one person
eligible in the household (18 years of age or older), then one of these
individuals was randomly selected. If there was one eligible person in
the household, then that person had a 1 out of 1 chance of being selected.
If there were two eligible people, then each person had a 1 out of 2 chance. If
there were three eligible, then each person had a 1 out of 3 chance and
so on.
In other words, the more eligible people in the household,
the smaller the chance of selection for any one of them. We can correct
for this by weighting each case by the number of eligible people in their
household. There is a variable called ADULTS which is the number of people
18 years of age or older in the household and this is, of course, also
the number of eligible people in the household.
The number of adults in the household varied from
one to six. The following table shows what this distribution looks like.
Weighting Cases by Number of Eligible Adults in Household (ADULTS)
| Number of Eligible Adults | Number of Cases | Weighted Number of Cases |
| 1 | 959 | 959 |
| 2 | 1511 | 3022 |
| 3 | 253 | 759 |
| 4 | 77 | 308 |
| 5 | 13 | 65 |
| 6 | 2 | 12 |
| Total | 2815 | 5125 |
The weighted number of cases is just the number of
eligible adults multiplied by the number of cases. This means that each
case with two eligible adults has a weight twice that of each case with
one eligible adult, each case with three eligible adults has a weight three
times that of each case with one eligible, and so on.
The problem with this is that we started with 2,815
cases and ended up with 5,125 cases. This artificially inflates the size
of the sample, which we do not want. There is an easy way to
fix this. If we divide 5,125 (the weighted number of cases) by 2,815 (the
actual number of cases) we get approximately 1.821. We can divide each
weight by 1.821 to get an adjusted weight. This would produce the
following weighted data.
Weighting Cases Using Adjusted Weights

| Number of Eligible Adults | Adjusted Weight | Number of Cases | Weighted Number of Cases |
| 1                         | 1/1.821=0.549   | 959             | 526.49                   |
| 2                         | 2/1.821=1.098   | 1511            | 1659.08                  |
| 3                         | 3/1.821=1.647   | 253             | 416.69                   |
| 4                         | 4/1.821=2.197   | 77              | 169.17                   |
| 5                         | 5/1.821=2.746   | 13              | 35.70                    |
| 6                         | 6/1.821=3.295   | 2               | 6.59                     |
| Total                     |                 | 2815            | 2813.72                  |
Notice that when using the adjusted weights, the weighted
number of cases equals the number of cases (except for a small amount of
rounding error). Let's use compute to create our new adjusted weight variable.
We'll call this variable WADULTS for weighted adults. Click
on "Transform" and then on "Compute". Click on "Reset" to get rid
of what you entered previously. Type WADULTS in
the Target Variable box. Find the variable ADULTS in the list of
variables on the left and click on it to highlight it. Then click on the
arrow to the right of this list to move it into the Numerical Expression
box. Now click on "/" (for division) and then enter the value 1.821 by
clicking on the "one", then the "decimal", then "eight", then "two", and
finally on "one". The formula in the box should read ADULTS/1.821 and your
screen should look like Figure 3-26. Click on "OK" and SPSS will create
the new variable called WADULTS.
Figure 3-26
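The same computation can be sketched in Python to show where 1.821 comes from and what the new variable holds. This is an illustration outside of SPSS, not what SPSS runs internally; the function name wadults simply mirrors the variable name used here, and the counts are those used in this section (with 2 cases for six-adult households).

```python
# Number of cases for each value of ADULTS.
cases_by_adults = {1: 959, 2: 1511, 3: 253, 4: 77, 5: 13, 6: 2}

total_cases = sum(cases_by_adults.values())
total_weighted = sum(a * n for a, n in cases_by_adults.items())

# Adjustment factor: weighted number of cases divided by actual number of cases.
factor = round(total_weighted / total_cases, 3)

def wadults(adults, factor=factor):
    """Adjusted weight for a case, the equivalent of ADULTS/1.821."""
    return adults / factor

# Total of the adjusted weights over all cases.
adjusted_total = sum(wadults(a) * n for a, n in cases_by_adults.items())

print(factor)                    # 1.821
print(round(adjusted_total, 2))  # close to the 2,815 actual cases
```

The total here differs slightly from the 2813.72 in the table because the table rounds each adjusted weight to three decimal places before multiplying.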
Now we want to weight the data using this variable
we just created. Click on "Data" and then on "Weight
Cases". Click on the circle to the left of Weight cases by. Notice
that this fills the circle in to indicate that it has been selected. Scroll
down the list of variables on the left and find the variable WADULTS. Click
on it to highlight it and then click on the arrow to the right of the list
to move this variable into the Frequency Variable box. Your screen should
look like Figure 3-27. Click on "OK"
and SPSS will weight the data appropriately.
Figure 3-27
Get a frequency distribution
for the variable ADULTS using the weighted data. Click on "Analyze", then
point your mouse at "Descriptive Statistics", and then click on "Frequencies".
Move the variable ADULTS into the Variables box and click on "OK".
Notice in Figure 3-28 that the frequencies are very
close to the weighted number of cases produced by using the adjusted weights
we computed above. (Any differences are due to rounding error.)
Figure 3-28
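What a weighted frequency distribution does can be sketched in Python: instead of counting each case once, it adds up the weight of every case in a category. This is an illustration of the idea only and makes no claim about how SPSS implements it; the helper weighted_frequencies is hypothetical, and the case list is rebuilt from the counts used in this section.

```python
from collections import defaultdict

def weighted_frequencies(values, weights):
    """Sum the weight of every case within each distinct value."""
    totals = defaultdict(float)
    for value, weight in zip(values, weights):
        totals[value] += weight
    return dict(totals)

# Rebuild one ADULTS value per case from the counts used in this section.
cases_by_adults = {1: 959, 2: 1511, 3: 253, 4: 77, 5: 13, 6: 2}
adults = [a for a, n in cases_by_adults.items() for _ in range(n)]

# WADULTS for each case, computed as ADULTS/1.821.
wadults = [a / 1.821 for a in adults]

freq = weighted_frequencies(adults, wadults)
for value in sorted(freq):
    print(value, round(freq[value], 2))
```

Each printed frequency should be close to the corresponding entry in the table of adjusted weights, with small differences due to rounding.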
If you want to go back to the unweighted data,
click on "Data" and then on "Weight Cases". Click
on the circle to the left of "Do not weight cases" and then on "OK".
Now you are using the unweighted cases again.
Summary
In this part of the book you have learned how to
recode, create new variables using compute and "if", select particular
cases for analysis, and weight cases. You can do more complicated things
with these commands than we have shown you, but these are the basics. You
can use the SPSS 11.0 Syntax Reference Guide (Prentice Hall, 2002)
to learn what else you can do with these commands. In the rest of this
book, we will focus on some of the statistical procedures that SPSS can
do for you.
Chapter Three Exercises
Use the GSS00A data set for all these exercises.
RECODE Exercises
1. There are two variables that refer to the highest year of school
completed by
the respondent's mother and father (MAEDUC and PAEDUC). Do a frequency
distribution for each of these variables. Now recode each of them (into
a different variable) into three categories: under 12 years of school,
12 years, and over 12 years. Create new value labels for the recoded categories.
Do a frequency distribution again to make sure that you recoded correctly.
2. INCOME98 is the total family income. Do a frequency distribution to see
what the variable looks like before recoding. Recode (into a different variable)
into eight categories: under $10,000, $10,000 to $19,999, $20,000 to $29,999,
$30,000 to $39,999, $40,000 to $49,999, $50,000 to $59,999, $60,000 to
$74,999, and $75,000 and over. Be
very careful that you recode the values, not the labels associated with
the values. Call this new variable INCOME1. Create new value labels for
the recoded categories. Do another frequency distribution to make sure
you recoded correctly.
Now recode INCOME98 again (into a different variable). This time use only four
categories: under $20,000, $20,000 to $39,999, $40,000 to $59,999, and
$60,000 and over. Call this new variable INCOME2. Create new value labels
for the recoded categories. Do another frequency distribution to make sure
you recoded correctly.
COMPUTE Exercises
3. In this chapter we created a new variable called ABORTION which was the sum
of the seven abortion variables in the data set. Create a new variable
called AB1, which is the sum of ABDEFECT, ABHLTH, and ABRAPE. Do a frequency
distribution for this new variable to see what it looks like. How is this
distribution different from the distribution for the ABORTION variable
based on all seven variables?
4. There are five variables that measure tolerance for letting
someone speak in your community who may have very different views
than your own (SPKATH, SPKCOM, SPKHOMO, SPKMIL, and SPKRAC). For
each of these variables, 1 means that the respondent would allow such a
person to speak and 2 means that the respondent would not allow
it. Create a new variable (call it SPEAK)
which is the sum of these five variables. This new variable would have
a range from 5 (would allow a person to speak in each of the five scenarios)
to 10 (would not allow a person to speak in any of the five scenarios). Do
a frequency distribution for this new variable to see what it looks like.
IF Exercises
5. There are two variables that describe the highest educational degree of
the respondent's
father and mother (PADEG and MADEG). Create a new variable (call it MAPAEDUC)
that indicates if the father and mother have a college education. This
variable should equal 1 if both parents have a college education, 2 if
only the father has a college education, 3 if only the mother has a college education,
and 4 if neither parent has a college education. Create new value labels
for the recoded categories. Do a frequency distribution for this new variable
to see what it looks like.
6. One variable indicates how often the respondent prays (PRAY) and another
variable
indicates if the respondent approves or disapproves of the Supreme Court's
decision regarding prayer in the public schools (PRAYER). Create a new
variable (call it PRY) that is a combination
of these two variables. This variable should equal 1 if the respondent
prays a lot (once a day or several times a day) and approves of the Supreme
Court's decision, 2 if the respondent prays a lot (once a day or several
times a day) and disapproves of the Supreme Court's decision, 3 if the
respondent doesn't pray a lot and approves of the Supreme Court's decision,
and 4 if the respondent doesn't pray a lot and disapproves of the Supreme
Court's decision. Do a frequency distribution for this new variable to
see what it looks like.
SELECT IF Exercises
7. Select all males (1 on the variable SEX) and do a frequency distribution
for the
variable FEAR (afraid to walk alone at night in the neighborhood). Then
select all females (2 on the variable SEX) and do a frequency distribution
on FEAR. Are males or females more fearful of walking alone at night?
8. Select all whites (1 on the variable RACE) and do a frequency distribution
for
the variable PRES96. Did they vote for Clinton, Dole, or Perot in 1996?
Then select all blacks (2 on the variable RACE) and do a frequency distribution
on PRES96. Were whites or blacks more likely to vote for Clinton?