STF1 contains 36 tables of population information and 45 tables of housing information. Each of these 81 tables is repeated for each geographic unit within a file. STF2 is much like STF1 except that it contains more table categories. Furthermore, it contains "a" and "b" record types for both person and housing data. The a record type is a tabulation for the entire population, while the b record type is a tabulation for a particular ethnic group. One form of STF2 contains tabulations for 9 ethnic groups while another contains tabulations for up to 28 ethnic groups. STF2 contains 13 a-type person tables, 28 a-type housing tables, 27 b-type person tables, and 27 b-type housing tables.
STF3 contains 170 person tables and 92 housing tables. Like STF2, STF4 contains "a" and "b" record types, which make it by far the largest and most detailed of the Summary Tape Files. It contains 122 a-type person tables, 76 a-type housing tables, 161 b-type person tables, and 74 b-type housing tables. One form of STF4 contains a and b tabulations for 9 ethnic groups while another contains tabulations for up to 48 groups. The table below indicates the relative sizes of the four Summary Tape Files by the number of cells of data for each geographic unit.
The table below indicates the smallest level of geography contained in each of the data files.
File     Smallest Level of Geography
STF1a    Block Group
STF1c    County and Place > 10,000
STF1d    Congressional District
STF2b    County Subdivision and Place > 1,000
STF2c    County and Place > 10,000
STF3a    Block Group
STF3b    ZIP Code
STF3c    County Subdivision and Place > 10,000
STF3d    Congressional District
STF4b    County Subdivision and Place > 2,500
STF4c    County Subdivision > 10,000
Appendix D lists the Summary Level Codes and the Geographic Component Codes for the STF3 data files. Note, for example, that the State summary level potentially has five records. If only the state total record is desired, the Geographic Component Code must be set to 00.
Appendix D indicates the geographical hierarchy followed to reach the final geographic unit. Of particular importance is the hierarchy used to reach census tracts and block groups. Many tracts are split by incorporated place boundaries. This means that selecting Summary Level Code 080 or 090 will result in many more geographic units than selecting Summary Level Code 140 or 150. The latter codes are for complete tracts, while the former might be useful when studying tracts only within a specific city. Accidentally selecting the split tracts will cause considerable difficulty with most mapping programs, since they typically use only complete tract boundary files. Statistical computations may also be affected because of the numerous zero values in parts of the split tracts.
Also there are differences in some of the Summary Level Codes when accessing STF4. Complete tracts in STF4a have the Summary Level Code of 141, places over 2500 persons in STF4b have the Summary Level Code of 163, and places over 10,000 in STF4c have the Summary Level Code of 161. Summary Level Codes are found on page 6-1 of U.S. Census Summary Tape File Codebooks. A discussion of census geography can be found in Appendix A of the same documentation.
Coding Geographic Units - FIPS Codes
For various types of features within a state or county, FIPS code numbers increase with the alphabetical order of the location names. For example, Alameda County in California is 001 and Amador County is 003. A census tract FIPS code consists of six digits. The first four identify a tract and the last two serve as a suffix. Tracts that are split in a later census because of increased population receive suffixes such as .01 and .02, for example 1101.01 and 1101.02. In some cases split tracts have been split a second time. The suffix may also have values from .80 to .98, indicating that the tract was created by modifying an existing boundary. A value of .99 indicates persons aboard a ship at the time the census was taken.
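The tract numbering rules above can be sketched in Python. This is only an illustration under stated assumptions, not part of the Census documentation: the function names format_tract and describe_suffix are hypothetical, the input is assumed to be the six-digit tract field stored as text, and treating a suffix of 00 as "no suffix" is an assumption.

```python
def format_tract(code: str) -> str:
    """Turn a six-digit tract field such as '110102' into '1101.02'."""
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"expected six digits, got {code!r}")
    return f"{code[:4]}.{code[4:]}"


def describe_suffix(code: str) -> str:
    """Classify the two-digit suffix using the ranges given in the text."""
    suffix = int(format_tract(code)[-2:])
    if suffix == 99:
        return "persons aboard ships"
    if 80 <= suffix <= 98:
        return "created by modifying an existing boundary"
    if suffix == 0:
        return "no suffix"  # assumption: '00' means the tract was never split
    return "ordinary suffix, for example a split tract"


print(format_tract("110102"), "-", describe_suffix("110102"))
print(format_tract("110199"), "-", describe_suffix("110199"))
```

Keeping the code as text rather than as a number preserves leading zeros, which matters when the code is later matched against other files.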
In most cases a FIPS code must be specified to subset a desired geographic unit or units from a census file. Occasionally more than one FIPS code must be used to create a desired data set. For example, to get the tracts of Los Angeles County, one would request a Summary Level Code of 140 and a county FIPS Code of 037. To get the data for the state of California one might specify a Summary Level Code of 040, a Geographic Component Code of 00, and a state FIPS Code of 06.
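The two selections described above can be sketched as a simple filter. The records, the place names attached to them, and the field names (sumlev, geocomp, state, county) are all hypothetical and are not the variable names used in the SSDBA files, which should be checked against a dictionary listing.

```python
# Hypothetical records; every code is stored as text to keep leading zeros.
records = [
    {"sumlev": "040", "geocomp": "00", "state": "06", "county": "", "name": "California"},
    {"sumlev": "040", "geocomp": "01", "state": "06", "county": "", "name": "California (component record)"},
    {"sumlev": "140", "geocomp": "00", "state": "06", "county": "037", "name": "Tract 1101.02"},
    {"sumlev": "140", "geocomp": "00", "state": "06", "county": "059", "name": "Tract 0011.01"},
]


def subset(rows, **criteria):
    """Keep the rows whose fields equal every requested code."""
    return [r for r in rows if all(r[k] == v for k, v in criteria.items())]


# Tracts of Los Angeles County: Summary Level 140, county FIPS 037.
la_tracts = subset(records, sumlev="140", county="037")

# State total for California: Summary Level 040, Geographic Component 00, state FIPS 06.
state_total = subset(records, sumlev="040", geocomp="00", state="06")

print([r["name"] for r in la_tracts])
print([r["name"] for r in state_total])
```

County FIPS codes are unique only within a state, so in a file covering more than one state the county selection would also need the state code.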
For most mapping programs FIPS codes from the census must be joined to create a matching value for the boundary codes in the program's mapping data. Thus, if one wanted to map tracts in Los Angeles County, one would have to join the state, county, and tract FIPS codes to create an identifying label. For example, 060371101.02 would specify tract 1101.02 in county 037 in state 06. The number for the census file must match the number in the mapping file exactly or the census data will not load into the mapping program. One convenient approach is to export a list of geographic unit labels from the mapping program to check on the format needed for the census data. Often this list can be pasted directly into the census data table although some record checking is usually required to account for unlabeled areas.
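A minimal sketch of building the joined label and checking it against the mapping program's list follows. The function name tract_key is hypothetical, the map labels are made up for illustration, and the label format (state, county, then tract with its decimal point) simply follows the example in the text. A particular mapping program may expect a different format.

```python
def tract_key(state: str, county: str, tract: str) -> str:
    """Join state, county, and tract codes into one label, e.g. '060371101.02'."""
    # zfill restores leading zeros that are lost if the codes were read as numbers.
    return f"{state.zfill(2)}{county.zfill(3)}{tract}"


census_keys = {tract_key("06", "037", "1101.02"), tract_key("6", "37", "1101.01")}

# Hypothetical labels exported from a mapping program.
map_keys = {"060371101.02", "060371101.01", "060371102.00"}

# Labels present on one side only show which records need checking.
missing_from_census = sorted(map_keys - census_keys)
missing_from_map = sorted(census_keys - map_keys)

print(missing_from_census)
print(missing_from_map)
```

Comparing the two sets in both directions shows which map areas have no census record and which census records have no boundary to attach to.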
The Census Bureau publishes a large list of FIPS codes in its Geographic Identification Coding Scheme publication. Each census data record contains an area name (ANPSADPI) so that names of locations can be directly located in the census files.
Care is needed in comparing data from the 1980 and 1990 census at the tract level, not only because of split or aggregated tracts, but because some boundaries have been shifted contrary to policy. The California Department of Finance maintains Tract Equivalency Files that can be used to locate where changes have occurred.
Many spreadsheet programs cannot readily deal with the structure of PUMS. For this reason a modified structure has been created for the files stored at the Social Sciences Database Archive. In the SSDBA files, the household record has been appended to each person record. This greatly enlarges the database but makes it easier to work with. It also means that a user must restrict tabulations to head-of-household records when tabulating housing data. Otherwise calculations will be based on the duplicate housing records appended to each person in the household.
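The effect of the duplicated housing records can be illustrated with a small Python sketch. The records are hypothetical. RELAT1 equal to 0 is taken to mark the householder, as in the SPSS program later in this guide; the serial field and the TENURE values are illustrative assumptions.

```python
from collections import Counter

# Hypothetical person records in the SSDBA layout: each person carries a
# copy of the household's housing fields (here only TENURE).
persons = [
    {"serial": 1, "RELAT1": 0, "TENURE": 1},  # householder, household 1
    {"serial": 1, "RELAT1": 1, "TENURE": 1},
    {"serial": 1, "RELAT1": 2, "TENURE": 1},
    {"serial": 2, "RELAT1": 0, "TENURE": 3},  # householder, household 2
]

# Counting every person record counts household 1 three times.
naive = Counter(p["TENURE"] for p in persons)

# Restricting to householders counts each housing unit once.
by_household = Counter(p["TENURE"] for p in persons if p["RELAT1"] == 0)

print(dict(naive))
print(dict(by_household))
```

In this example the unrestricted count reports three units for the first tenure value when there is only one, which is the distortion the restriction avoids.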
One also needs to be particularly conscious of the population serving as the universe when trying to replicate aggregations used by the U.S. Census. For example, employment data should be tabulated only for persons over age 15 who are employed as civilians. One could also limit the tabulations to those employed full-time.
populations can be created from the PUMS files, some care is needed in dealing with the significance of very small counts, especially when PUMAs are being used. Chapter 3 of the PUMS Codebook suggests procedures for dealing with this.
The Social Sciences Database Archive contains a number of digital files from the U.S. Census Bureau. These include the PUMS and Summary Tape Files for 1980 and 1990 as well as County-City Databooks and Current Population Estimates. Not all STF and PUMS files are available through the SSDBA for all states. Data files are in SPSS format, and in most cases additional files provide the dictionaries and codebooks necessary to extract information from the files. A few of the files have SPSS programs for reading the data, and these may be expanded to carry out various procedures. See Appendix B for a description of the available data sets. The following table describes the current location and availability of the 1990 STF and PUMS resources in the SSDBA.
1990 STF and PUMS Resources in the SSDBA
This table contains the descriptions of the locations of various census resources within the SSDBA. There are four types of files which are indicated with the following codes:
Data: the file containing the database. All are SPSS system files. "By request" means that the data are not directly accessible.
Cbk: the codebook describing the database
Dic: a data dictionary for the database
Prog: an SSDBA program that will describe the contents of the database. SPSS statistics commands can be appended to it.
STF1a CA Data:
STF1b CA Data:
STF1c U.S. Data:
STF2a U.S. Data: By request
STF3a Los Angeles and Orange Counties only
Other CA Counties Data: /usr/ssdba/ssdba38/c90stf3a-ca2.sys
STF3b ZIP Codes beginning with 8 or 9
STF3c NE U.S.
Rest of U.S. Data: /usr/ssdba/ssdba47/c90stf3c-b.sys
B recs, All persons Data: /usr/ssdba/ssdba50/c90stf4a-t1.sys
B recs, White Data: /usr/ssdba/ssdba48/c90stf4a-t2.sys
B recs, Black Data: /usr/ssdba/ssdba65/c90stf4a-t3.sys
B recs, Am Inds Data: /usr/ssdba/ssdba61/c90stf4a-t4.sys
B recs, Asian Data: /usr/ssdba/ssdba49/c90stf4a-t5.sys
B recs, Other Race Data: /usr/ssdba/ssdba51/c90stf4a-t6.sys
B recs, Hispanic Data: /usr/ssdba/ssdba61/c90stf4a-t7.sys
B recs, NonHisp Wh Data: /usr/ssdba/ssdba51/c90stf4a-t8.sys
B recs, NonHisp Bl Data: /usr/ssdba/ssdba53/c90stf4a-t9.sys
B recs, NonHisp Oth Data: /usr/ssdba/ssdba49/c90stf4a-t10.sys
A recs, All persons Data: /usr/ssdba/ssdba53/c90stf4a-t11.sys
Cbk: None: See Census Docs.
Person & Housing Recs Data: /c/census/c90pums-p5.sys
Housing Recs only Data: /c/census/c90pums-hr
STF and PUMS SPSS Programs in the SSDBA
The programs below are located in the following directory:
To extract variables from one of the census files you need to know the variable names. One easy way to get these is to execute a DISPLAY DICTIONARY command, which lists the contents and formats of a database. Such a listing is necessary because the variable names have slightly different formats in the various Summary Tape Files in the SSDBA.
The following program will create a dictionary of an STF4b file on the UNIX version of SPSS at the SSDBA.
The following two programs were used to read the SSDBA PUMS file and create some crosstabulations. Note that the resulting table is for all persons who were employed, not just the civilian employed. A non-Hispanic white category was created by using the Hispanic and Race variables. This program could be copied and pasted into the PC version of SPSS.
The following program reads the 5% PUMS file, computes several basic summary variables for the state of California, and produces frequencies of the values for selected variables.
get file '/c/census/c90pums-p5.sys'
/keep=PUMA AGE RACE HISPANIC ANCSTRY1 OCCUP CLASS ENGLISH POB YEARSCH CITIZEN
RELAT1 PERSONS TENURE/.
* Select PUMAs in the five-county Southern California area.
select if ((PUMA ge 4200 and PUMA le 4808) or (PUMA ge 5200 and PUMA le 7207)).
recode RACE (2=2) (4,5,301 thru 327=3) (6=4) (7=5) (8=6) (9=7) (10=8) (11=9) (12=10) (13=11) (15=12) (16=13) (19=14) (22=15) (25=16) (26=17) into ETHNIC.
* Flag non-Hispanic persons, then assign non-Hispanic whites (RACE 1) to ETHNIC category 1.
compute NH = 0.
if (HISPANIC eq 0 or HISPANIC eq 199) NH = 1.
if (RACE eq 1 and NH eq 1) ETHNIC = 1.
recode HISPANIC (1,210 thru 220=1) (2,261=2) (3,271=3) (221,225,227,228,229=4) (222=5) (223=6) (224=7) (226=8) (231 thru 249=9) into HISP.
recode ANCSTRY1 (15,22=1) (148 thru 150=2) (302=3) (308=4) (360=5) (416=6) (419=7) (431=8) (434=9) (400 thru 415,417,418,421 thru 430,435 thru 481,490 thru 499=10) (522,523=11) (553 thru 558=12) (800 thru 802=13) into ANC.
value labels ANC 1 'Eng' 2 'Rus' 3 'Belz' 4 'Jam' 5 'Brz' 6 'Ira' 7 'Isr' 8 'Arm' 9 'Tur' 10 'Arab' 11 'Eth' 12 'Nig' 13 'Aus'.
value labels ETHNIC 1 'NhW' 2 'Blk' 3 'InAlEs' 4 'Chi' 5 'Taiw' 6 'Fil' 7 'Jap' 8 'AsInd' 9 'Kor' 10 'Vie' 11 'Cam' 12 'Lao' 13 'Tha' 14 'Indo' 15 'Pak' 16 'Haw' 17 'Sam'.
value labels HISP 1 'Mex' 2 'PR' 3 'Cub' 4 'CenAm' 5 'Gua' 6 'Hon' 7 'Nic' 8 'Sal' 9 'SoAm'.
* Tabulate various stats for 5-co area by ethnic groups.
if (AGE ge 18) VAR01 = 1.
if (AGE ge 25) VAR02 = 1.
if ((ENGLISH eq 0 or ENGLISH eq 1) and AGE ge 18) VAR03 = 1.
if ((YEARSCH ge 14 and YEARSCH le 17) and AGE ge 25) VAR04 = 1.
if (CITIZEN eq 3 or CITIZEN eq 4) FB = 1.
if (FB eq 1 and AGE ge 25) VAR05 = 1.
if (OCCUP ge 3 and OCCUP le 199) OC = 1.
if (CLASS ge 1 and CLASS le 8) CL = 1.
if (OC eq 1 and CL eq 1 and AGE ge 25) VAR07 = 1.
if (PERSONS ge 1 and RELAT1 eq 0) VAR08 = 1.
if ((TENURE eq 1 or TENURE eq 2) and RELAT1 eq 0) VAR09 = 1.
if ((TENURE ge 1 and TENURE le 4) and RELAT1 eq 0) VAR10 = 1.
variable labels VAR01 'Pers 18+' VAR02 'Pers 25+' VAR03 'SpkEngO/VW/18' VAR04 'ColEd/25' VAR05 'ForB/25' VAR07 'AdmExProf/25' VAR08 'Hsehldrs' VAR09 'OwnOccHU' VAR10 'OccHU' FB 'ForBorn'.
value labels VAR01 1 'Pers 18+'/ VAR02 1 'Pers 25+'/ VAR03 1 'SpkEng'/ VAR04 1 'ColEduc'/ VAR05 1 'ForBor25'/ VAR07 1 'AdmPrOcc'/ VAR08 1 'HsHlds'/ VAR09 1 'OwnOcH'/ VAR10 1 'OccHU'/ FB 1 'ForBorn'/ OC 1 'ProfOcc'/ CL 1 'Worker'.
frequencies variables=ETHNIC (1,17) HISP (1,9) ANC (1,13) CLASS (0,9) CITIZEN (0,4) FB (1,1) VAR01 to VAR10 (1,1)/ barchart/ format=condense.
Because the PUMS file is quite large and because some time is required to access STF databases, a program may well take 30 to 45 minutes to finish execution. An alternative to waiting at the terminal is to submit your SPSS file as a batch job. This can be done by entering the following statement at the UNIX prompt.
The SPSS program on Venus serves as a good editor for entering and updating programs. A number of basic edit functions are available through combinations of the Escape key and a number key. Esc-1 displays directory contents. Esc-2 switches between the output and input screens. Esc-3 reads in an existing file, Esc-9 saves an edited file, and Esc-0 runs a program from the editor and quits the editor. Pressing Escape twice terminates a command selection.