Social Sciences Research and Instructional Council
Teaching Resources Depository

SPSS for Windows, Version 7.5: A Brief Tutorial
(Hypertext Version)

Chapter Two: Creating a Data File

© The Authors, 1998; Last modified 15 August 1998
This section explains how to set up a file with new data. For this example, our class wanted to see whether their opinions on social issues were similar to those of the national sample polled by the National Opinion Research Center for the General Social Survey. (For more detail on the General Social Survey, see Davis and Smith 1996.) They knew they were not a representative sample, even of college students, but the comparison was an interesting way to learn how to create a new data file. The questions they decided to use appear in the questionnaire and codebook in Appendix A.

To start the data file, we assign each respondent an identification number so that we can keep track of each case when we check the accuracy of the data entry. For each question (variable), we need a variable name that is simple but expresses the main idea of the variable in some way. Variable names must be eight characters or fewer and must begin with a letter. They may contain letters and digits, but no spaces and only a few special characters. AGE and SEX are easy variable names for the first two questions.
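SPSS checks these rules itself when you type a name, so no code is needed here. Still, a few lines of Python make the rules concrete. This is a minimal sketch under one assumption: the text does not list which special characters are allowed, so the pattern below permits only the underscore. The function name `is_valid_name` and the three rejected example names are made up for illustration.

```python
import re

# Assumed rules, paraphrased from the text: 1-8 characters, a letter first,
# then letters or digits. The text allows "only a few special characters"
# without listing them, so this sketch permits just the underscore.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,7}")


def is_valid_name(name: str) -> bool:
    """Return True if `name` satisfies the assumed SPSS 7.5 naming rules."""
    return _NAME_PATTERN.fullmatch(name) is not None


# Names from the chapter pass; the last three are made-up bad examples
# (starts with a digit, contains a space, nine characters long).
for candidate in ["AGE", "SEX", "ABD", "2NDQ", "POL VIEW", "RELIGIOUS"]:
    print(f"{candidate!r}: {is_valid_name(candidate)}")
```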

For the questions on abortion, we decided to use the first three characters of the variable names used by the General Social Survey. We called preferred type of marriage MG and political orientation CL. Each variable can also be given a longer variable label that gives more detail and may include spaces and special characters. For example, CL could have the variable label Conservative-Liberal.
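The truncation rule can be illustrated with the seven General Social Survey abortion items. The full names below are the standard GSS mnemonics, listed in the same order as the abortion columns of Table 2.1. The helper `short_names` is hypothetical. Its uniqueness check matters because truncation can turn two different long names into the same short one, and SPSS does not allow duplicate variable names.

```python
# Standard GSS names for the seven abortion items, in the order of the
# abortion columns in Table 2.1.
GSS_ABORTION_ITEMS = ["ABDEFECT", "ABNOMORE", "ABHLTH", "ABPOOR",
                      "ABRAPE", "ABSINGLE", "ABANY"]


def short_names(gss_names, length=3):
    """Keep the first `length` characters of each GSS variable name."""
    shortened = [name[:length] for name in gss_names]
    # Truncation can create duplicates, which SPSS would not accept.
    if len(set(shortened)) != len(shortened):
        raise ValueError("truncated names are not unique")
    return shortened


print(short_names(GSS_ABORTION_ITEMS))
# ['ABD', 'ABN', 'ABH', 'ABP', 'ABR', 'ABS', 'ABA']
```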

After we have given each variable a name, we give each possible response a code, or value, which is often the number corresponding to the order of the answers. (Codes could be letters, but using only numbers avoids some possible problems in statistical analysis.) For example, SEX could be coded 1 for male and 2 for female (or M for male and F for female), and political orientation could be coded 1 for conservative, 2 for liberal, and 3 for middle of the road. Each value can also be given a descriptive value label, such as Male, Female, Conservative, Liberal, or Middle of the Road.
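The relationship between a variable name, its variable label, its numeric values, and their value labels resembles a small lookup table. The Python sketch below models it as a dictionary. The CL codes follow the example above, and the label "Sex of respondent" and the function `describe` are hypothetical. The real codes are in the codebook in Appendix A.

```python
# Hypothetical codebook: each variable name maps to a variable label and
# to a dictionary from numeric codes (values) to value labels.
CODEBOOK = {
    "SEX": {
        "label": "Sex of respondent",
        "values": {1: "Male", 2: "Female"},
    },
    "CL": {
        "label": "Conservative-Liberal",
        "values": {1: "Conservative", 2: "Liberal", 3: "Middle of the Road"},
    },
}


def describe(variable: str, code: int) -> str:
    """Translate a stored code into readable text using the codebook."""
    entry = CODEBOOK[variable]
    value_label = entry["values"].get(code, f"unlabeled code {code}")
    return f"{entry['label']}: {value_label}"


print(describe("CL", 3))   # Conservative-Liberal: Middle of the Road
print(describe("SEX", 2))  # Sex of respondent: Female
```

Storing small integers and keeping the words in the codebook is the same design SPSS uses. The data stay compact and numeric, and the labels appear only when output is displayed.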

Sometimes respondents don’t answer a question, give more than one answer, or do something else that makes their answers unusable. For example, respondent 02 marked both yes and no on the last question, respondent 03 wrote "none" for question 4 on political orientation, and respondent 13 did not answer the marriage question. We often code such "missing data" as 9, or as 99 for a two-digit variable. This causes problems if 9 or 99 is a real code: for example, if 9 is an actual response to a question, or if the sample includes someone whose age at last birthday is ninety-nine.
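The effect of declaring a code "missing" is that the code drops out of the analysis. The sketch below shows this in Python using `None`. It also shows why a missing-data code must never coincide with a real answer. The function `clean` is hypothetical.

```python
from typing import Optional, Set


def clean(code: int, missing_codes: Set[int]) -> Optional[int]:
    """Return None when `code` is declared missing, else the code itself."""
    return None if code in missing_codes else code


# Respondent 02's answer to the last question (ABA) was coded 9.
print(clean(9, {9}))    # None
print(clean(2, {9}))    # 2

# The pitfall: declaring 99 missing for AGE would also throw away
# a genuine answer from a ninety-nine-year-old.
print(clean(99, {99}))  # None
```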

It is often a good idea to plan all of this in advance and to lay out the data in a matrix like Table 2.1 before entering them into the computer.

Table 2.1. Sample Data Set: Questionnaire Responses

ID   AGE  SEX  REL  CL   MG   ABD  ABN  ABH  ABP  ABR  ABS  ABA
01   20   1    4    2    2    2    2    1    3    1    2    2
02   24   2    5    2    2    1    1    1    1    1    1    9
03   21   2    2    9    2    2    2    2    2    2    2    2
04   24   2    5    3    2    1    1    1    1    1    1    1
05   26   2    4    2    2    1    1    1    1    1    1    1
06   28   2    2    2    2    2    2    1    2    1    2    2
07   23   1    1    2    2    1    2    1    1    1    2    2
08   22   2    4    3    1    1    1    1    1    1    1    1
09   22   1    5    2    2    1    1    1    1    1    1    1
10   22   2    4    4    2    1    1    1    1    1    1    1
11   23   1    2    2    1    2    2    1    2    1    2    3
12   24   2    2    3    2    1    1    1    1    1    1    2
13   51   2    1    2    9    1    1    1    1    1    1    1
14   22   2    2    3    2    1    1    1    1    1    1    1
15   21   2    4    3    2    1    1    1    1    1    1    1
16   37   1    1    3    2    1    2    1    2    1    2    2
17   22   2    4    2    2    1    1    1    1    1    2    2
18   22   2    3    3    2    1    2    1    2    1    2    2
19   22   2    4    3    2    3    2    1    2    1    1    1
20   30   2    5    2    2    1    1    1    1    1    1    1
21   25   2    5    2    2    1    1    1    1    1    1    1
22   23   1    2    2    2    1    1    1    1    1    1    1
23   1    4    1    2    1    1    1    2    1    2    1    1
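A matrix like Table 2.1 maps directly onto the Data Editor's layout, with one row per case and one column per variable. The Python sketch below copies a few rows of the table into a string and splits them into that shape. It only illustrates the structure and is not how SPSS reads files. The function name `parse_matrix` is hypothetical.

```python
# A few rows copied from Table 2.1; the full table parses the same way.
TABLE_2_1 = """\
ID AGE SEX REL CL MG ABD ABN ABH ABP ABR ABS ABA
01 20 1 4 2 2 2 2 1 3 1 2 2
02 24 2 5 2 2 1 1 1 1 1 1 9
03 21 2 2 9 2 2 2 2 2 2 2 2
13 51 2 1 2 9 1 1 1 1 1 1 1
"""


def parse_matrix(text):
    """Split a whitespace-separated table into one dict of ints per case.

    Each row is one case (respondent); each column is one variable.
    """
    lines = text.strip().splitlines()
    header = lines[0].split()
    cases = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            raise ValueError(f"wrong number of fields: {line!r}")
        cases.append(dict(zip(header, (int(f) for f in fields))))
    return cases


cases = parse_matrix(TABLE_2_1)
print(len(cases))      # 4
print(cases[3]["MG"])  # 9: respondent 13 skipped the marriage question
```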

To start creating the new data file, open SPSS by clicking the "SPSS" icon. [Find out where this icon is on your computer before you continue.] This opens the Data Editor, which is laid out like a spreadsheet with the upper-left cell outlined. See Figure 2-1.

Figure 2-1

The rows are the cases (e.g., the respondents or questionnaires), and the columns are the variables (e.g., the questions). The upper-left cell usually holds the identification number of the first case, and the cells across that row hold the data for that case. To replace the default variable definition with your own, or to edit it later, double-click the "var" heading at the top of the column to open the Define Variables dialog box. See Figure 2-2.

Figure 2-2

The first variable will be the identification number. Type in a variable name of eight characters or fewer, e.g., id. Variable names are not case sensitive, so ID, Id, and id are the same name. (SPSS also reserves a few words that cannot be used as variable names, e.g., ALL and AND; see SPSS Inc., 1997.) Click "Labels" to add a descriptive label to the variable. See Figure 2-3. Variable labels can be up to 120 characters long but are usually much shorter. They are case sensitive, so type them exactly as you want them to appear. Use brief but descriptive phrases that will be easy to recognize later.
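The checks SPSS applies when you name and label a variable can be summarized in a short Python sketch. It covers reserved words, case-insensitive duplicate names, and the 120-character label limit. The text names only ALL and AND as reserved words. The rest of the `RESERVED` set is the keyword list from later SPSS documentation and is assumed to apply to version 7.5. The function `check_variable` is hypothetical.

```python
# SPSS reserved keywords. The text names ALL and AND; the others come from
# later SPSS documentation and are assumed to apply to version 7.5 too.
RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT",
            "NE", "NOT", "OR", "TO", "WITH"}
MAX_LABEL_LENGTH = 120


def check_variable(name, label, existing_names):
    """Return a list of problems with a proposed name and variable label."""
    problems = []
    key = name.upper()  # names are not case sensitive: id == ID == Id
    if key in RESERVED:
        problems.append(f"{name} is a reserved word")
    if key in {n.upper() for n in existing_names}:
        problems.append(f"{name} duplicates an existing variable")
    if len(label) > MAX_LABEL_LENGTH:
        problems.append("label is longer than 120 characters")
    return problems


print(check_variable("Id", "Respondent identification number", ["ID"]))
# ['Id duplicates an existing variable']
print(check_variable("and", "", []))
# ['and is a reserved word']
```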

After naming and labeling the variable, enter each possible value (code) and give it a value label that will be useful when you read the output. See Figure 2-3.

Figure 2-3

For example, for SEX, type 1 in the Value box and Male in the Value Label box, and then click "Add"; then type 2 and Female and click "Add" again. (To modify a value label, click "Change"; to delete one, click "Remove".) For variables with missing data, open the Define Missing Values dialog box by clicking "Missing Values" in the Define Variables dialog box. See Figure 2-4.

Figure 2-4

The default is no missing values. You can enter up to three missing values for a variable; here, type 9 and click "Continue" to return to the Define Variables dialog box. Since our data are whole numbers, click "Type", change "Decimal Places" to 0, and click "Continue".
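The settings made in these dialog boxes can be modeled as one record per variable. The record holds its value labels (with Add, Change, and Remove), its missing values (at most three), and its decimal places. The Python class below is a hypothetical sketch of that record, not an SPSS interface. The default of two decimal places is an assumption, consistent with the text's instruction to change it to 0.

```python
from dataclasses import dataclass, field


@dataclass
class VariableDefinition:
    """Hypothetical model of the settings made in the Define Variables dialogs."""
    name: str
    value_labels: dict = field(default_factory=dict)
    missing_values: list = field(default_factory=list)
    decimal_places: int = 2  # assumed default; set to 0 for whole numbers

    def add_label(self, value, label):
        # Mirrors "Add"; refusing to overwrite is a choice of this sketch.
        if value in self.value_labels:
            raise KeyError(f"{value} already labeled; use change_label")
        self.value_labels[value] = label

    def change_label(self, value, label):
        # Mirrors "Change".
        if value not in self.value_labels:
            raise KeyError(f"{value} has no label to change")
        self.value_labels[value] = label

    def remove_label(self, value):
        # Mirrors "Remove".
        del self.value_labels[value]

    def add_missing(self, value):
        # The dialog accepts at most three missing values per variable.
        if len(self.missing_values) >= 3:
            raise ValueError("at most three missing values")
        self.missing_values.append(value)


sex = VariableDefinition("SEX")
sex.add_label(1, "Male")
sex.add_label(2, "Female")
sex.add_missing(9)
sex.decimal_places = 0
print(sex.value_labels, sex.missing_values)
```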

Once the variable names and labels and the values and value labels are set up, you can enter the data into the matrix on the screen. Give the new data file a name and save it before you exit or go on to use the file. The first time you save, click "File" and then "Save As" to enter a file name. After that, you can save changes with "File" and then "Save", or by clicking the small disk icon near the upper-left corner of the screen. Save your data often.

Check the accuracy of your data entry by skimming down each column for codes that are impossible given your value labels. For example, SEX has only three valid codes: 1 for male, 2 for female, and 9 for missing information. You can do this on the screen or with a frequency distribution from SPSS (see Chapter 4). Next, check the accuracy of the coding by having one person read the codes while another checks the entries in SPSS. These instructions cover only the basics; for more detail, see SPSS Inc. (1997).
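Both checks described above can be sketched in a few lines of Python: a range check that lists cases with impossible codes, and a plain-text frequency count comparable to an SPSS frequency distribution. The (ID, SEX) pairs are copied from Table 2.1 as printed. In that table, respondent 23's SEX column reads 4, so a range check of this kind flags that row for a second look. The names `VALID_SEX_CODES`, `bad`, and `freq` are hypothetical.

```python
from collections import Counter

VALID_SEX_CODES = {1, 2, 9}  # 1 = male, 2 = female, 9 = missing

# (ID, SEX) pairs for respondents 01-05 and 23, copied from Table 2.1.
entries = [(1, 1), (2, 2), (3, 2), (4, 2), (5, 2), (23, 4)]

# Range check: list every case whose code is impossible for SEX.
bad = [case_id for case_id, code in entries if code not in VALID_SEX_CODES]
print("Impossible SEX codes for IDs:", bad)

# A plain-text frequency distribution of the SEX codes.
freq = Counter(code for _, code in entries)
for code, count in sorted(freq.items()):
    print(code, count)
```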

Templates: Using the Same Value Labels Over and Over

What if you have twenty variables that use the same value labels? With templates, you can enter the value labels once and then use them for as many variables as you wish. The template function lets you create a "master" variable and copy its characteristics to other new or existing variables. First, click "Data" on the menu bar at the top of the screen, and then click "Templates". You should see something like Figure 2-5.

Figure 2-5

Now click on the "Define" button, then highlight the "Name" section under "Template Description." It should look something like Figure 2-6.

Figure 2-6

I am going to make the template for a modified Likert scale, so I will call this template "AGREE", which I type into the Name field.

Now we define the characteristics of the AGREE template. We only need to set "Type" and "Value Labels", because the other two items we could define (missing values and column format) are fine as they are. So click "Type", make sure "numeric" is selected, and then click "Continue" (see Figure 2-7).

Figure 2-7

Next, click "Value Labels". You should see a screen like Figure 2-8, which is the same dialog box you used earlier in this chapter to label values.

Figure 2-8

Here I type "1" in the "Value" box, tab down to the "Value Label" box, and type "Strongly Agree". Then I press "Add". (Be sure to do this; it is an easy step to forget. Pressing the Enter key does not add the label; it does something else. If you press Enter by mistake, press "Cancel".) Figure 2-9 shows a screen with the labels being typed in.

Figure 2-9

Now do this again, but type the value "2" and the label "Agree". Again, be sure to press the "Add" button. Continue pairing values with value labels until you have entered the whole scale, and then click "Continue". Next, choose which characteristics the AGREE template will apply to the variables. We checked all four items under the heading "Apply" (type, value labels, missing values, column format), even though we changed only the first two (the defaults for the others are fine). Now press "Add" and then "OK".
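The "Apply" check boxes decide which parts of the master definition are copied to each variable. The Python sketch below shows the idea. `AGREE` is a hypothetical stand-in for the template: it holds only the two value labels given in the text and leaves out the rest of the scale. The function `apply_template` is also hypothetical.

```python
import copy

# Hypothetical stand-in for the AGREE template. The text gives only the
# first two value labels; the rest of the scale is left out here.
AGREE = {
    "type": "numeric",
    "value_labels": {1: "Strongly Agree", 2: "Agree"},
    "missing_values": [],    # left at the default (none)
    "column_format": None,   # left at the default
}


def apply_template(template, variable, apply):
    """Copy each attribute listed in `apply` from template to variable.

    deepcopy gives every variable its own copy, so editing one variable's
    labels later cannot change the template or the other variables.
    """
    for attribute in apply:
        variable[attribute] = copy.deepcopy(template[attribute])
    return variable


ALL_FOUR = ("type", "value_labels", "missing_values", "column_format")
q1 = apply_template(AGREE, {"name": "Q1"}, ALL_FOUR)
print(q1["value_labels"])  # {1: 'Strongly Agree', 2: 'Agree'}
```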

Now we need to apply this template to our variables. First, highlight the columns where you want the new variables. See Figure 2-10 for an example of what your screen should look like now.

Figure 2-10

Next, click "Data" on the menu bar, and then "Templates". Make sure the template name is "AGREE", and then click "OK". This transfers the information saved in the AGREE template to the highlighted variables, and columns that had no names are given the default variable names VAR00001, VAR00002, and so on. You can now click on the columns and rename the variables as shown earlier.

One last thing: this works for variables that have not been named as well as those that have already been named.
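Applying a template to several highlighted columns, some unnamed and some already named, can be sketched as follows. Unnamed columns receive SPSS-style default names of the form VAR00001. Numbering them 1, 2, and so on in selection order is a simplification of this sketch, and SPSS's actual numbering may differ. The function `apply_to_columns` and the name Q3 are hypothetical.

```python
def apply_to_columns(value_labels, columns):
    """Attach copies of `value_labels` to every selected column.

    `columns` holds each selected column's existing name, or None for an
    unnamed column. Unnamed columns get default names of the form VAR00001,
    numbered in selection order (a simplification made for this sketch).
    """
    defined = []
    next_default = 1
    for name in columns:
        if name is None:
            name = f"VAR{next_default:05d}"
            next_default += 1
        defined.append({"name": name, "value_labels": dict(value_labels)})
    return defined


agree_labels = {1: "Strongly Agree", 2: "Agree"}
variables = apply_to_columns(agree_labels, [None, None, "Q3"])
print([v["name"] for v in variables])  # ['VAR00001', 'VAR00002', 'Q3']

# Renaming afterwards, as in the Data Editor, changes only the name.
variables[0]["name"] = "Q1"
```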

Chapter Two Exercises

At California State University, Fresno, the Friendly Visitor Service hires college students to provide in-home care for elderly people so they can remain independent and stay in their homes as long as possible. The students do cleaning, yard work, shopping, and similar tasks. The staff begins by interviewing clients in their homes and assessing their need for services. The information gathered about each client, such as that shown in Table 2.2, is then used to match seniors with students who want employment.

To keep track of the needs of potential clients, the program could create an SPSS data file, beginning with the information from applications received in early fall 1995, shown in Table 2.2. For this example, we used only the number of activities for which each senior needs help (# ADL and # IADL count the activities of daily living and the instrumental activities of daily living), but we could have included the yes/no responses for each activity. (Code the values numerically, e.g., 1 for male and 2 for female for SEX, and similarly for the other variables.)

Table 2.2. Sample Data Set: Friendly Visitor Service Clients

ID   BIRTH  SEX  ALONE  LOW INC  # ADL  # IADL
001  21     M    N      N        0      4
002  29     M    N      N        4      6
003  14     M    N      N        2      5
004  19     F    N      N        0      4
005  21     M    N      N        1      5
006  26     F    N      Y        0      4
007  16     F    Y      N        0      4
008  15     M    N      Y        3      6
009  06     M    N      N        3      5
010  03     F    Y      N        2      6
011  07     F    Y      N        0      3
012  13     F    Y      N        2      4
013  16     F    Y      N        1      4
014  18     M    N      N        3      6
015  33     M    Y      N        1      4
016  12     M    N      N        4      6
017  15     F    Y      N        0      2
018  10     F    N      N        1      4
019  29     F    Y      N        1      3
020  11     M    N      N        4      6
021  21     F    N      N        4      4
022  21     M    N      N        0      2
023  21     F    Y      N        0      5
024  03     M    N      N        3      6
025  29     F    N      N        2      6
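Recoding the letters in Table 2.2 into numbers is a simple lookup. The Python sketch below recodes three rows of the table as printed. The 1/2 coding for SEX comes from the exercise. The 1 = yes, 2 = no coding for ALONE and LOW INC is an assumption that follows the same pattern. The output names (LOWINC, ADL, IADL) are hypothetical, chosen because "LOW INC" contains a space and so cannot be an SPSS variable name.

```python
# Three rows of Table 2.2 as printed, in column order.
ROWS = [
    ("001", "21", "M", "N", "N", "0", "4"),
    ("006", "26", "F", "N", "Y", "0", "4"),
    ("007", "16", "F", "Y", "N", "0", "4"),
]

SEX_CODES = {"M": 1, "F": 2}     # as the exercise instructs
YES_NO_CODES = {"Y": 1, "N": 2}  # assumed: same 1/2 pattern for yes/no


def recode(row):
    """Turn one printed row into numeric codes under hypothetical names.

    A mistyped letter (say "X" for SEX) raises KeyError, which flags
    the entry for checking instead of silently storing a bad code.
    """
    id_, birth, sex, alone, low_inc, adl, iadl = row
    return {
        "ID": int(id_),
        "BIRTH": int(birth),  # kept as the two-digit number printed
        "SEX": SEX_CODES[sex],
        "ALONE": YES_NO_CODES[alone],
        "LOWINC": YES_NO_CODES[low_inc],
        "ADL": int(adl),
        "IADL": int(iadl),
    }


for case in map(recode, ROWS):
    print(case)
```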