SSRIC Teaching Resources Depository
Exploring the US Census
Eugene Turner, California State University, Northridge

Exercise 1 -- The Public-Use Microdata Sample (PUMS)

© The Author, 1998; Last Modified 17 August 1998
These exercises accompany the Exploring the Census module and give the user an opportunity to work with digital census data to answer basic demographic questions. Six data sets have been extracted from various census data files. They cover different geographic units and contain somewhat different variables. Because codebooks are needed to interpret the variable names and codes, a number of appendices have been included. These appendices may prove particularly useful when it becomes necessary to extract data from the actual Summary Tape Files or the PUMS files. In addition, several codebooks, dictionaries, and sample SPSS programs in digital form are provided in the directory for this module. Users may want to copy and modify them when working with the actual census data files.

A great deal of census information is now becoming available over the Internet and this will prove an increasingly convenient source. The SSDBA is gradually moving its access to web-based software and soon its census holdings may be available through a web browser. Currently data may be found at:

Names of variables are generally the same as provided by the U.S. Census Bureau on CD-ROM.

In the following exercises occasional references are made to specific SPSS procedures in bold type.

Ex. 1 Ranking and Mapping Places

Database:  USCIsp.por

In this exercise you will look at a few of the basic characteristics of the population of American cities with over 10,000 persons. The data are extracts from the STF1c file. Note that the module provides values on California and the United States which may be compared against values you extract from the data files.

For this exercise you are to:

A. Compare some population values of a California city of your choosing to the State of California.

B. Examine all cities of the United States to see which cities have very high and low values for a given characteristic.

C. If you have software, produce a map of the entire United States to look at the distribution of cities with the highest values of a given characteristic.

A. Describing the General Population of a City.

Examine the FIPS codes of US cities (Appendix F) to locate the code of a California city of interest to you. You may also load the city file and scan down the list of names. Note that to extract the data for states you must use a summary level code of 040, and for cities you must use a code of 161. You will also need the codebook of variables in the USCI file. Most of these exercises will require only the SPSS functions of subsetting the file, summary statistics, sorting, and listing. Note the references to SPSS below:

1. From the USCI file use SPSS to extract the following variables. Note that three variables will need to be added together to obtain figures for American Indians, Aleuts, and Eskimos, and that nineteen variables (P0070006 through P0070024) will have to be added together to obtain figures for all Asians.

 AREALAND                    Land area
 P0010001                    Total population
 P0050001                    Males
 P0050002                    Females
 P0100001                    Non-Hispanic white population
 P0070002                    Black population
 P0070003 thru P0070005      American Indian-Eskimo-Aleut population
 P0070006 thru P0070024      Asian and Pacific Islander population
 P0080001                    Hispanic population
 P17A0001                    Persons per family

2. Compute the percentage of total population that is male, non-Hispanic white, black, American Indian-Aleut-Eskimo, Asian and Pacific Islander, and Hispanic. To calculate the percentage, multiply the population of each group by 100 and divide the result by the total population.

3. Compute the population density in persons per square mile. Do this by dividing the city's total population by its land area in square miles. Note that the census land areas are in square kilometers with three implied decimal places, so a tabled figure of 31652 should be divided by 1000 to get 31.652 square kilometers. To convert the tabled figure directly to square miles, divide by 2590 (1000 times the roughly 2.59 square kilometers in a square mile).

The new variables will be appended to the end of the SPSS spreadsheet.

4. Print out the absolute numbers and percentage values for your city, California and the United States. Construct a table with carefully chosen title and row and column headings to show clearly the absolute and percentage values for these areas. Briefly discuss how your city compares to the values for California and the United States.
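For readers who want to check their SPSS results by hand, the arithmetic in steps 1 through 3 can be sketched in Python. This is only an illustration, not the module's SPSS procedure: the helper names (sum_range, percent, land_area_sq_miles) are hypothetical, the counts for the sample city are invented, and the starting variable names P0070003 and P0070006 are inferred from the variable counts given in step 1.

```python
def sum_range(record, prefix, first, last):
    """Add the variables prefix+first .. prefix+last, e.g. P0070003..P0070005."""
    return sum(record[f"{prefix}{n:04d}"] for n in range(first, last + 1))


def percent(part, total):
    """Step 2: multiply the group count by 100 and divide by total population."""
    return 100.0 * part / total


def land_area_sq_miles(arealand):
    """Step 3: AREALAND is in thousandths of a square kilometer.

    Dividing by 1000 gives square kilometers; dividing by about 2.59
    converts to square miles, so the combined divisor is 2590.
    """
    return arealand / 2590.0


# Invented values for one hypothetical city (not real census figures).
city = {"AREALAND": 31652, "P0010001": 50000, "P0050001": 24500}
city.update({"P0070003": 300, "P0070004": 40, "P0070005": 10})
city.update({f"P007{n:04d}": 100 for n in range(6, 25)})  # P0070006..P0070024

amerind = sum_range(city, "P007", 3, 5)    # three variables
asian = sum_range(city, "P007", 6, 24)     # nineteen variables
pct_male = percent(city["P0050001"], city["P0010001"])
area = land_area_sq_miles(city["AREALAND"])
density = city["P0010001"] / area          # persons per square mile

print(amerind, asian, pct_male, round(area, 2), round(density, 1))
```

The same three operations correspond to new computed variables in SPSS; the sketch is only meant to make the divisor of 2590 and the variable ranges easy to verify.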

B. Comparing All Cities on a Given Characteristic

In this part of the exercise you will compute a few values for all cities in the U.S., rank them on that value, note which cities tend to rank very high or very low, and make a hypothesis about what types of cities tend to rank very high or very low.

1. Select one of the percentage or ratio variables created in A. above or create a new percentage from the data in the table with the Compute option.

2. Note the values for a few cities that lie at the ends of the ranked list. If the values are percentages, also note the size of the absolute values contributing to the calculation of the percentage. Try to suggest any possible reasons for cities to appear at the extreme ends of this ranking.
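The ranking described above can also be sketched outside SPSS. The following Python fragment is a minimal illustration under stated assumptions: the function rank_extremes, the field names, and the four cities are all hypothetical, and the figures are invented. It sorts the cities on a percentage and keeps the absolute counts alongside, so that a high percentage based on a very small population is easy to spot.

```python
def rank_extremes(rows, key, n=2):
    """Sort rows on `key`, highest first, and return (top n, bottom n)."""
    ranked = sorted(rows, key=lambda row: row[key], reverse=True)
    return ranked[:n], ranked[-n:]


# Invented sample data; real values would come from the USCI extract.
cities = [
    {"name": "City A", "group": 2500, "total": 20000},
    {"name": "City B", "group": 90, "total": 12000},
    {"name": "City C", "group": 45000, "total": 60000},
    {"name": "City D", "group": 7000, "total": 35000},
]
for row in cities:
    row["pct"] = 100.0 * row["group"] / row["total"]

top, bottom = rank_extremes(cities, "pct")
for row in top + bottom:
    # Show the absolute count next to the percentage it produced.
    print(f'{row["name"]:8s} {row["pct"]:6.2f}%  ({row["group"]} of {row["total"]})')
```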

C. Mapping a Distribution

Database:  USCOsp.por

See Appendix M for a review of the operation of the ArcView program.

In this exercise you will produce a map of infant mortality in the United States with counties as areal units. You will then examine the map to see what spatial patterns are evident. Before you make the map, think to yourself about what areas you would expect to have unusually high or low infant mortality.

1. For the U.S. County data set compute the percent non-white population by subtracting the non-Hispanic white population (P0100001) from the total population (Item5), dividing the result by the total population, and multiplying by 100.

2. From the U.S. County data set select the FIPS code, county name, infant mortality (Item52), percent of persons with a bachelor's degree (Item71), median household income (Item79), percent of families with income below poverty level (Item83), and the percent non-white population. Export these values to a dbf or ascii file for use by a mapping program.

Note that your file and the mapping file of county boundaries must each have a linking item that matches exactly. Usually this is the FIPS code. However, sometimes the leading zero on some of the codes gets dropped or the variable is changed from character to numeric. If this happens, your data file will not link to the mapping file. It is possible to export the list of FIPS codes from a mapping program and then paste it next to the codes in your data sheet to get a matching variable.

3. If using ArcView, you can find two needed themes in the sample data included in the ESRIdata directory. Within that directory is a USA subdirectory which contains shape files for state outlines, county outlines, and city centroids.

4. Start the ArcView program and add the County theme from the ESRIdata directory. Add your data as a table and then join it to the city theme attributes using the STPLCODE variable in your table.
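The linking problem noted under step 2 (dropped leading zeros, or a code stored as a number instead of as text) can be checked before the join is attempted. The Python sketch below is one possible approach, not part of the module: the function names normalize_fips and unmatched are hypothetical, and it assumes five-digit state-plus-county FIPS codes supplied as integers or strings. A different width can be passed for other linking items.

```python
def normalize_fips(code, width=5):
    """Return the code as zero-padded text, e.g. 6037 -> '06037'.

    Assumes `code` is an int or a string of digits (an assumption of
    this sketch); values such as 6037.0 would need separate handling.
    """
    return str(code).strip().zfill(width)


def unmatched(data_codes, map_codes, width=5):
    """List codes in the data file with no partner in the boundary file."""
    map_set = {normalize_fips(c, width) for c in map_codes}
    cleaned = (normalize_fips(c, width) for c in data_codes)
    return sorted(c for c in cleaned if c not in map_set)


# Codes as they might arrive after export: some numeric, some text.
data_file = [6037, 1001, "48201"]
boundary_file = ["06037", "01001"]

print([normalize_fips(c) for c in data_file])
print(unmatched(data_file, boundary_file))
```

An empty list from unmatched suggests that every record in the data file has a matching code in the boundary file once both sides are written the same way.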
