Chapter Two: Creating a Data File

 

This chapter explains how to set up a file with new data. After finishing this chapter, you should be able to create an SPSS data file that includes the data and some labeling that gives more detail about the data. To illustrate this process, we will use a shortened version of the questionnaire used by the General Social Survey conducted by the National Opinion Research Center. For this example, our students wanted to see if their opinions on social issues were similar to those of the national sample.

The students knew they were not a representative sample, even of college students, but this questionnaire is an interesting way to learn how to create a new data file. They decided to use the following questions[1]:

·        What is your age?

·        Are you male or female?

·        What is your religious preference?

·        Generally speaking, in politics do you consider yourself as conservative, liberal, middle of the road?

·        What kind of marriage do you think is the more satisfying way of life: one where the husband provides for the family and the wife takes care of the house and children or one where both the husband and wife have jobs and both take care of the house and children?

·        Do you think it should be possible for a pregnant woman to obtain a legal abortion:

If there is a strong chance of a serious defect in the baby? [ABDEFECT[2]]

If she is married and does not want any more children? [ABNOMORE]

If the woman’s own health is seriously endangered by pregnancy? [ABHLTH]

If the family has a very low income and cannot afford any more children? [ABPOOR]

If she became pregnant as a result of rape? [ABRAPE]

If she is not married and does not want to marry the man? [ABSINGLE]

If the woman wants it for any reason [ABANY]

 

Basic Steps in Creating a Data File

 

It is best to start a data file with some careful planning.

1.   First we will assign each respondent an identification number. This is not so we can identify individuals, but so we can keep track of each case when we go back to check the accuracy of the data entry. Each question is a variable in our data set. It needs a variable name that is simple but expresses something about the data. (SPSS limits variable names to 64 characters or fewer. They can use numbers or letters but not spaces and very few special characters, so don’t use any odd symbols.) AGE and SEX would be good variable names for the first two questions.[3] For the questions on abortion, we decided to use the first three characters of the variable names used by the General Social Survey. We used MG for the preferred type of marriage and called political orientation CONLIB. Each variable name can be given an extended variable label that gives more detail. (Extended variable labels can use spaces or special characters.) For example, CONLIB could have the variable label Conservative-Liberal.

2.   After we have given each variable a name and label, we give each possible response to the question a numeric code, called a value, that is often the number corresponding to the order of the answers. (We could use another system, but this is the easiest because SPSS works best with numeric codes to represent the data.) For example, SEX could use 1 for male and 2 for female; CONLIB could use 1 for conservative, 2 for liberal, and 3 for middle of the road. Each value would then be given a value label such as Male, Female, Conservative, Liberal, or Middle of the Road.

3.   Sometimes respondents do not answer a question, give more than one answer, or do something else that makes their answers unusable. In our example, respondent #2 marked both yes and no on the last question, respondent #3 wrote in none on question 4, and respondent #13 didn’t answer the marriage question. We can give these answers missing-value codes so they don’t distort the analysis. Often 9 is used to indicate missing data, or 99 if it is a two-digit value.
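The plan in steps 1 to 3 can also be written down as a small lookup structure before anything is typed into SPSS. The following Python sketch is only an illustration and is not part of SPSS; the dictionary layout and the function name describe are assumptions made for this example, while the variable names, codes, and labels are the ones planned above.

```python
# Hypothetical codebook layout (an assumption for illustration, not an SPSS format).
# Variable names, labels, and codes follow the plan described in steps 1 to 3.
CODEBOOK = {
    "SEX": {
        "label": "Sex",
        "values": {1: "Male", 2: "Female"},
        "missing": {9},
    },
    "CONLIB": {
        "label": "Conservative-Liberal",
        "values": {1: "Conservative", 2: "Liberal", 3: "Middle of the Road"},
        "missing": {9},
    },
}


def describe(variable, code):
    """Return the value label for a code, or 'Missing' for a missing-value code."""
    entry = CODEBOOK[variable]
    if code in entry["missing"]:
        return "Missing"
    return entry["values"][code]


print(describe("SEX", 2))     # value label for code 2 of SEX
print(describe("CONLIB", 9))  # 9 is the planned missing-value code
```

Writing the plan this way makes it easy to see that every code has exactly one label and that the missing-value code does not collide with a real answer.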

Everything must be planned carefully before entering the data into SPSS. It is useful to put the data in a matrix like Table 2-1 before entering it into the SPSS Data Editor. For this exercise, we will use only the first four questions and the first five respondents. (The complete matrix is Appendix 2-B at the end of this chapter.)

Table 2-1. Matrix for Data-entry Exercise

ID      AGE     SEX     REL     CONLIB

01      20      1       4       2

02      24      2       5       2

03      21      2       2       9

04      24      2       5       3

05      26      2       4       2

 

 

Getting Started in SPSS

 

To create the data file in SPSS, open SPSS (probably by clicking on the SPSS icon on the desktop). When it asks What would you like to do?, choose Type in data and click OK. (See Figure 2-1.)

This opens a matrix similar to a spreadsheet such as Excel or the matrix we just worked on. The rows will be the cases (the respondents) and the columns will be the variables (answers to the questions). So, the upper-left cell will contain the identification number for the first case and the cells to the right will be data about that case. The SPSS Data Editor has tabs in the lower-left that let you work with your data in two ways. Variable View is used to set up the data (names, variable labels, value labels, etc.). The other tab, Data View, is used to actually enter the data. SPSS probably opened in the Data View mode. If not, click the Data View tab at the bottom left of the SPSS screen now. (See Figure 2-2.)

 

Entering Variable and Value Names and Labels

 

1. In Data View, we will use the first column for the respondents’ ID numbers, so type 001 into the first cell. (See Figure 2-3.) [4]

 

2. We will use the Variable View tab to assign variable names and longer variable labels plus value names and labels that will make it easier to use the data in tables and charts. Click Variable View now and click the VAR00001 in the top left column. Type in ID. (We use all capital letters for variable names to differentiate them from other terms.) Press Enter and VAR00001 changes to our variable name, ID. Go back to Data View and notice that the first column is now titled ID. (See Figure 2-4.)

 

3. The second variable will be the student’s age, so change back to Variable View and type AGE under name in the second row. SPSS makes some assumptions about data that might not be appropriate. Notice that it uses two decimal places even if the values are integers. To hide these inappropriate decimals, select the decimals column by clicking the heading and type 0 instead of 2. Remember to do this whenever the numbers are whole numbers or codes rather than values with decimals. Since the short variable name usually doesn’t give enough information about the variable, we want a longer or clearer variable label for our analysis. This one is simple. To add a variable label to AGE, just tab over to the label column and type in Age. (See Figure 2-5.) It may not seem necessary to have a variable label for age, but for most variables a longer variable label is very useful.

 

4. Sometimes respondents don’t answer a question, give two answers, or do something else that makes the data unusable in the analysis. To have accurate results, missing or invalid data need to be indicated. Still in Variable View, tab over to missing and click the gray box. This dialog box lets you specify up to three distinct missing values. For our data, click discrete, type 99 in the first text box, and leave the other two empty. Then click OK to save this. Now, if someone doesn’t answer the question, entering 99 will mark the answer as missing. (See Figure 2-6.)

 

5. The third variable will be the sex of the respondent, so type SEX in the third row under name and Sex as the variable label. Since we’re going to use the code 1 for males and 2 for females, we’re going to need value names in words for each category. Tab over to the cell under values and click the little gray box to get the Value Labels menu. Type a 1 in the value box and then Male in the value label box and click Add. Then, click the gray box again, type a 2 in the value space, and type Female in the value label space. Click Add and then click OK to save these. Now, SPSS knows that 1 and 2 in SEX are really male and female respectively. (See Figure 2-7.)

 

6. For this exercise, we are also using religion and conservative-liberal as variables. Add those variables in rows 4 and 5. Give each a variable label: REL gets Religion and CONLIB gets something like Conservative-Liberal. Then add value names and labels. Notice that REL has five possibilities: Protestant, Catholic, Jewish, other, and no religion. Go ahead and work out the variable labels, as well as value names and value labels. Make arrangements for missing values just as you did in #4 above, using 9 instead of 99 because these codes have only one digit. (You can refer to the Appendix 2-A Codebook for Student Questionnaire at the end of this chapter.) Remember to type variable labels, value names, and value labels exactly the way you would want them in a table when you do the analysis; often this is with the first letter of each important word capitalized. (Your data file might look like Figure 2-8.)
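To see why declaring a missing-value code (as in step 4) matters, consider what happens to an average when the code is treated as a real answer. The short Python sketch below is an illustration only, not something SPSS requires. The five ages come from Table 2-1, and the sixth respondent, who skipped the age question, is hypothetical.

```python
from statistics import mean

MISSING_AGE = 99  # the missing-value code declared for AGE in step 4

# Ages from Table 2-1, plus one hypothetical respondent who left age blank.
ages = [20, 24, 21, 24, 26, MISSING_AGE]

# Keep only the valid answers, which is in effect what declaring 99 as missing does.
valid_ages = [age for age in ages if age != MISSING_AGE]

naive_mean = mean(ages)        # wrongly treats 99 as a real age
valid_mean = mean(valid_ages)  # leaves the missing code out

print(naive_mean)
print(valid_mean)
```

With the 99 counted as an age, the average is pulled well above the age of every real respondent in the list. Once 99 is excluded, the average reflects only the answers that were actually given.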

 

Entering the Data

 

7. Enter the codes for each variable using Data View[5]. Then check the accuracy of your data entry by scanning down each column looking for codes that would be impossible. For example, sex can have only three possible codes since male is 1, female is 2, and missing information is 9, so a 5 or 6 would be a mistake. Then check everything carefully. The best check is to have one person read the codes aloud while another checks the entries in Data View.
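The scan for impossible codes described in step 7 can also be sketched in a few lines of Python. This is an illustration under stated assumptions, not an SPSS feature: the function name find_impossible_codes and the row layout are made up for this example, the allowed codes follow the codebook in this chapter, and the rows are those of Table 2-1 with one deliberate typing mistake added (SEX entered as 5 for respondent 4).

```python
# Allowed codes for each variable, following the chapter's codebook (9 = missing).
ALLOWED = {
    "SEX": {1, 2, 9},
    "REL": {1, 2, 3, 4, 5, 9},
    "CONLIB": {1, 2, 3, 9},
}


def find_impossible_codes(rows):
    """Return (ID, variable, code) for every entry that is not an allowed code."""
    problems = []
    for row in rows:
        for variable, allowed in ALLOWED.items():
            if row[variable] not in allowed:
                problems.append((row["ID"], variable, row[variable]))
    return problems


# Rows from Table 2-1; SEX for respondent 4 is deliberately mistyped as 5.
rows = [
    {"ID": 1, "AGE": 20, "SEX": 1, "REL": 4, "CONLIB": 2},
    {"ID": 2, "AGE": 24, "SEX": 2, "REL": 5, "CONLIB": 2},
    {"ID": 3, "AGE": 21, "SEX": 2, "REL": 2, "CONLIB": 9},
    {"ID": 4, "AGE": 24, "SEX": 5, "REL": 5, "CONLIB": 3},
    {"ID": 5, "AGE": 26, "SEX": 2, "REL": 4, "CONLIB": 2},
]

print(find_impossible_codes(rows))
```

A check like this catches only codes that cannot occur. A wrong but possible code, such as a 1 typed instead of a 2, still has to be found by comparing the entries with the questionnaires.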


Student Survey Questionnaire

(1) What is your age? ________

(2) Are you ____ male or ___ female?

(3) What is your religious preference?

___ Protestant ___Catholic ___ Jewish ___ Some other religion ___No religion

(4) Generally speaking, in politics, do you consider yourself as

___conservative, ___ liberal, __ middle of the road, or

(5) What kind of marriage do you think is the more satisfying way of life?

___ One where the husband provides for the family and the wife takes care of the house and children

___ One where both the husband and wife have jobs and both take care of the house and children

Do you think it should be possible for a pregnant woman to obtain a legal abortion?

(6) If there is a strong chance of serious defect in the baby? __Yes __ No ___Don’t Know

(7) If she is married and does not want any more children? __Yes __ No ___Don’t Know

(8) If the woman's own health is seriously endangered by pregnancy?

__Yes __ No ___Don’t Know

(9) If the family has a very low income and cannot afford any more children?

__Yes __ No ___Don't Know

(10) If she became pregnant as a result of rape? __Yes __ No ___Don’t Know

(11) If she is not married and does not want to marry the man? __Yes __No __ Don’t Know

(12) If the woman wants it for any reason __Yes __ No ___Don’t Know

Codebook for Student Questionnaire

Missing Values                  9 or 99

Age                             Age at last birthday

Sex                             1 = male, 2 = female

Religious Preference            1 = Protestant, 2 = Catholic, 3 = Jewish, 4 = Other, 5 = None

Political Orientation           1 = Conservative, 2 = Liberal, 3 = Middle of the road

Preferred Marriage              1 = Traditional, 2 = Shared

Abortion if Birth Defect        1 = Yes, 2 = No, 3 = Don't Know

Abortion if No More Children    1 = Yes, 2 = No, 3 = Don't Know

Abortion if Health Risk         1 = Yes, 2 = No, 3 = Don't Know

Abortion if Poor                1 = Yes, 2 = No, 3 = Don't Know

Abortion if Rape                1 = Yes, 2 = No, 3 = Don't Know

Abortion if Not Married         1 = Yes, 2 = No, 3 = Don't Know

Abortion For Any Reason         1 = Yes, 2 = No, 3 = Don't Know

 

Planning Matrix for Data-entry Exercise


ID    AGE   SEX   REL   C-L   MG    ABD   ABN   ABH   ABP   ABR   ABS   ABA

01    20    1     4     2     2     2     2     1     3     1     2     2

02    24    2     5     2     2     1     1     1     1     1     1     9

03    21    2     2     9     2     2     2     2     2     2     2     2

04    24    2     5     3     2     1     1     1     1     1     1     1

05    26    2     4     2     2     1     1     1     1     1     1     1

06    28    2     2     2     2     2     2     1     2     1     2     2

07    23    1     1     2     2     1     2     1     1     1     2     2

08    22    2     4     3     1     1     1     1     1     1     1     1

09    22    1     5     2     2     1     1     1     1     1     1     1

10    22    2     4     4     2     1     1     1     1     1     1     1

11    23    1     2     2     1     2     2     1     2     1     2     3

12    24    2     2     3     2     1     1     1     1     1     1     2

13    51    2     1     2     9     1     1     1     1     1     1     1

14    22    2     2     3     2     1     1     1     1     1     1     1

15    21    2     4     3     2     1     1     1     1     1     1     1

16    37    1     1     3     2     1     2     1     2     1     2     2

17    22    2     4     2     2     1     1     1     1     1     2     2

18    22    2     3     3     2     1     2     1     2     1     2     2

19    22    2     4     3     2     3     2     1     2     1     1     1

20    30    2     5     2     2     1     1     1     1     1     1     1

21    25    2     5     2     2     1     1     1     1     1     1     1

22    23    1     2     2     2     1     1     1     1     1     1     1

23    21    1     1     2     1     1     1     2     1     2     1     1

 


Chapter Two Exercises

Exercise 2-1. Clients of Friendly Visitor Service.

At California State University, Fresno, the Friendly Visitors Service hires college students to do in-home care for elderly people so they can remain independent and stay in their homes as long as possible.  The students do cleaning, yard work, shopping, etc.  The staff begins by interviewing clients in their homes and assessing their need for services. The following information is used to match the seniors with the students who want employment:

·        Age:  Age at last birthday

·        Sex:   Male or Female

·        Lives alone:  Yes or No

·        Low income:  Yes = Eligible for Supplemental Security Income (SSI)

·        Need for assistance with the activities of daily living (ADL): Bathing, Dressing, Toileting, Transferring in/out of bed, Eating

·        Total number of ADLs needing help:

·        Need for assistance with the instrumental activities of daily living (IADL):  Using telephone, Shopping, Preparing food, Light housework, Heavy housework, Finances

·        Total Number IADLs needing help:

To keep track of the needs of potential clients, the program could create a data file and use it in SPSS. (Data from one month’s new applications are provided below. For this example, we’ll just use the count of the number of activities for which the seniors need help, but note that they could include the yes/no responses for each of the activities of daily living.)

Sample Data Set: Friendly Visitor Service Clients

ID                  AGE            SEX            ALONE       INCOME      #ADL          #IADL

001               74                M                 N                  N                   0                  4

002               66                M                 N                  N                   4                  6

003               81                M                 N                  N                   2                  5

004               76                F                  N                  N                   0                  4

005               74                M                 N                  N                   1                  5

006               69                F                  N                  Y                   0                  4

007               79                F                  Y                   N                   0                  4

008               80                M                 N                  Y                   3                  6

009               89                M                 N                  N                   3                  5

010               60                F                  Y                   N                   2                  6

011               88                F                  Y                   N                   0                  3

012               82                F                  Y                   N                   2                  4

013               79                F                  Y                   N                   1                  4

014               77                M                 N                  N                   3                  6

015               62                M                 Y                   N                   1                  4

016               83                M                 N                  N                   4                  6

017               80                F                  Y                   N                   0                  2

018               85                F                  N                  N                   1                  4

019               66                F                  Y                   N                   1                  3

020               84                M                 N                  N                   4                  6

021               74                F                  N                  N                   4                  4

022               74                M                 N                  N                   0                  2

023               74                F                  Y                   N                   0                  5

024               92                M                 N                  N                   3                  6

025               66                F                  N                  N                   2                  6

 

Exercise 2-2. Age at Death from Newspaper Obituaries

 

An interesting source of data for student practice with data analysis using SPSS is the death notices in local newspapers. Although big city newspapers publish obituaries only on the rich and famous, many local newspapers provide information on almost everyone who dies in the community. (See, for example, The Fresno Bee--www.fresnobee.com/obituaries--which publishes information provided by funeral homes for most deaths in the community as well as more detailed obituaries provided by some families.) From these death notices, you could set up a data file with the age and sex of each person who died during a particular period (for example, the first month in the term). The age or birthday is usually given and you can infer sex from names or pronouns. This could be used for analysis with SPSS, for example, frequency and percent distributions, various charts, and descriptive statistics in Chapter 4; cross tabulations in Chapter 5; and/or comparison of means in Chapter 6.


 

 



[1] A copy of their questionnaire is included as Appendix 2-A at the end of this chapter.

[2] This, ABDEFECT, is the variable name of this question in the General Social Survey.

[3] For this book, we use all caps for variable names.

[4] It is wise to save your computer work early and often. You might want to save this file now. Choose Save under File and call it something like Data Entry Exercise 1. Notice that SPSS saves it in the SPSS folder as a .sav file. This means it contains the data in the format for SPSS analysis.

[5] Some people, especially those who are used to working with spreadsheets, like to enter all the data in Data View before they set up the variable names, etc. In this example, we’ll set up the variable names, etc. before we enter any data. (You’ll have to figure out what works best for you.) You can also enter data from a spreadsheet like Excel.