People often want very basic information about housing and population in specific areas like cities or counties. They want to know the number of children within a community, the level of poverty, the kinds of employment that people are engaged in, or the size and age of housing. Political representation and revenue sharing are allocated based on numbers of persons, and the amount of government spending is often based on the numbers of persons with a given characteristic.
Just acquiring the desired information is often not sufficient. To understand the meaning of the data, the values should be compared to a place of similar size or to a larger summary area such as an entire city, county, state, region, or the United States. This comparison helps one understand whether the acquired data values are greatly different from those of a much larger population. For example, data for the city of San Francisco could be compared to corresponding values for other cities in California or for the State as a whole, while values for California could be compared to those of other states or to national averages.
Furthermore, demographers frequently extract the same information for earlier censuses. In this way they get a sense of whether the current values represent increases or decreases from previous decades.
A. Some Basic Population Data Describing a City
As an example for this exercise we will arbitrarily pick the city of Glendale, California. It has a census place code of 30000.
Table 1. Ethnic Populations in Glendale, Los Angeles, and California, 1990

                                Glendale  Los Angeles City   California
Area in Square Miles               30.61             469.3      155,973
Total Population                 180,038         3,485,398   29,760,021
Males                             86,606         1,750,055   14,897,627
Females                           93,432         1,735,343   14,862,394
Non-Hispanic Whites              114,765         1,299,604   17,029,126
Blacks                             2,334           487,674    2,208,801
American Indians                     629            15,641      242,164
Hispanics                         37,731         1,391,411    7,687,938
Asians and Pacific Islanders      25,453           341,807    2,845,659
Persons / Family                    3.22              3.48         3.32
Pop Density per Square Mile       5881.5            7426.2        190.8
% Male                              48.1              50.2         50.1
% Non-Hispanic White                63.7              37.3         57.2
% Black                              1.3              14.0          7.4
% American Indian                    0.3               0.4          0.8
% Hispanic                          21.0              39.9         25.8
% Asian and Pacific Islander        14.1               9.8          9.6
Glendale is a city of about 30 square miles located just northeast of downtown Los Angeles. Its 1990 population was just over 180,000.
Density - The population density of the city seems high compared to California as a whole, but the state contains large, unsettled areas while most cities do not. Glendale does contain some large areas of mountainous open space, so its population density is less than that of neighboring Los Angeles. Density values exceeding 20,000 persons per square mile are found in some neighborhoods of large cities. Density computed this way assumes the population is spread evenly over the statistical area, but this is rarely the case.
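As a rough sketch of the calculation, density is simply total population divided by area. The Python snippet below uses the 1990 figures from Table 1; the function name population_density is an illustrative choice, not part of any census software. The results differ slightly from the published densities, presumably because the areas shown in the table are rounded.

```python
# Total population and area in square miles, taken from Table 1 (1990).
places = {
    "Glendale": (180_038, 30.61),
    "Los Angeles City": (3_485_398, 469.3),
    "California": (29_760_021, 155_973),
}


def population_density(population, area_sq_mi):
    """Persons per square mile, assuming an even spread over the area."""
    return population / area_sq_mi


densities = {
    name: population_density(population, area)
    for name, (population, area) in places.items()
}

for name, value in densities.items():
    print(f"{name}: {value:,.1f} persons per square mile")
```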
Family Size - An important indicator of the number of people in a household is the average number of people per household, but the number of people in an average family is also sometimes used. In Glendale the number of persons per family is slightly lower than the state. This may be a result of an older population, more singles, or the larger white population, a group that tends to have smaller families.
Sex - There are fewer males than females in Glendale and the proportion is lower than for all California. This may be another indicator of an older population in the city since the number of females tends to exceed the number of males in older age groups. Age data could be extracted to confirm this.
Ethnicity - Non-Hispanic whites are the largest group within the city. Expressed as a percentage, non-Hispanic whites constituted about 64% of the population while Hispanics and Asians accounted for 21% and 14% respectively. Compared to the State, Glendale has higher percentages of both whites and Asians and a substantially lower percentage of blacks. If more detailed race data had been used, the relatively larger Korean and Filipino communities within Glendale would have been evident within the Asian and Pacific Islander category.
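The percentages quoted above can be reproduced from the counts in Table 1. The sketch below does so for Glendale; the variable and function names are illustrative only.

```python
# Glendale counts from Table 1 (1990).
total_population = 180_038
group_counts = {
    "Non-Hispanic White": 114_765,
    "Black": 2_334,
    "American Indian": 629,
    "Hispanic": 37_731,
    "Asian and Pacific Islander": 25_453,
}


def percent_of_total(count, total):
    """Share of the total population, expressed as a percentage."""
    return 100.0 * count / total


shares = {
    group: round(percent_of_total(count, total_population), 1)
    for group, count in group_counts.items()
}

for group, share in shares.items():
    print(f"{group}: {share}%")
```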
B. Examining a Characteristic in All Cities - Ranking Places
Often one wants to see how cities rank according to a given characteristic. Once the ranking is done, those cities that have very high or very low values can be examined in more detail to see if reasons can be determined for their position in the ranking.
Densely Populated Places
In the example below, cities have been ranked by population density. State names, area, and total population have been included.
Table 2. Highest and Lowest Density Cities in the U.S., 1990

State  Area (sq. mi.)  City                Total Population  Density (pop./sq. mi.)
NJ               1.27  Union City                    58,012                  45,822
NJ               1.02  West New York                 38,125                  37,502
NJ               1.27  Hoboken                       33,397                  26,243
CA               1.17  Maywood                       27,850                  23,900
NY             308.95  New York                   7,322,564                  23,702
NJ               0.96  Cliffside Park                20,393                  21,153
CA               1.10  Cudahy                        22,817                  20,728
NJ               2.88  Irvington                     59,774                  20,725
CA               0.74  Walnut Park                   14,722                  20,026
CA               1.15  Lennox                        22,757                  19,785
CA               1.88  West Hollywood                36,118                  19,228
NJ               3.92  East Orange                   73,552                  18,743
NJ               3.10  Passaic                       58,041                  18,707
CA              54.10  Twentynine Palms              11,821                     218
OK              79.64  El Reno                       15,414                     194
NH              61.73  Berlin                        11,824                     192
MO              97.18  Fort Leonard Wood             15,863                     163
FL              74.78  North Port                    11,973                     160
ME              75.75  Presque Isle                  10,550                     139
AK            1697.65  Anchorage                    226,338                     133
VA             400.08  Suffolk                       52,141                     130
MN             181.68  Hibbing                       18,046                      99
MT             716.18  Butte-Silver Bow              33,336                      47
MT             736.94  Anaconda                      10,278                      14
AK            2593.56  Juneau                        26,751                      10
Cities with the highest densities are usually found in California or New Jersey. With the exception of New York City, these cities tend to be small in area with fairly small populations. All are a part of major urban areas. Probably they contain many apartment units for people who work in nearby Los Angeles and New York.
Examination of the low density cities indicates a problem with calculating density for cities. There is no guarantee that the corporate limit of a city encompasses populated areas. For political reasons the boundary may have been extended far beyond the settled portion of the city. The areas surrounding Juneau and Anchorage, Alaska are such cases.
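A ranking of this kind amounts to a sort on the density column followed by a look at both ends of the list. The following minimal sketch uses a handful of rows and the published densities from Table 2; the function name rank_by_density is hypothetical.

```python
# A few rows from Table 2: (state, city, density in persons per square mile).
cities = [
    ("AK", "Juneau", 10),
    ("NJ", "Union City", 45_822),
    ("NY", "New York", 23_702),
    ("MN", "Hibbing", 99),
    ("CA", "Maywood", 23_900),
    ("AK", "Anchorage", 133),
]


def rank_by_density(rows):
    """Return the rows sorted from highest to lowest density."""
    return sorted(rows, key=lambda row: row[2], reverse=True)


ranked = rank_by_density(cities)
highest = ranked[:3]   # candidates for closer study at the top of the ranking
lowest = ranked[-3:]   # and at the bottom

for state, city, density in ranked:
    print(f"{city}, {state}: {density:,}")
```

Places at either extreme, such as Juneau here, are the ones worth checking for boundary effects before drawing conclusions.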
Ethnic Composition - An Example
The question on race in the U.S. Census is separate from the question on Hispanic origin. People can indicate a particular race such as white, black, American Indian, any of several Asian groups, or other. Then they may indicate if they are or are not of Spanish/Hispanic origin, such as Mexican, Cuban, or Puerto Rican. Many Hispanics indicate their race as white, yet whites are commonly seen as distinct from Hispanics. Thus, tabulations based on the total reported white race are complicated by two distinctly different groups. To compensate for this it is usually better to use the non-Hispanic white category when tabulating data for "whites." This removes those persons of white race who indicated that they also were of Hispanic origin.
In the following table, the percent of the total white population that is also Hispanic has been calculated and ranked. For the entire United States, 5.8% of the white population is white Hispanic. Table 3a below shows the 10 cities with the highest percentages of white race population that are also Hispanic. Most of these cities are on the U.S.-Mexico border, but there are strongly Hispanic places within the Los Angeles and Miami metropolitan areas.
Table 3a. Highest Ten United States Cities with the Hispanic White Population as a Percent of Total White Population
State  City                  White Race  Hispanic Persons   % of White Race
                             Population  Indicating White  Persons That Are
                                                     Race    Hispanic White
United States               199,686,070        11,557,774               5.8
CA     Calexico                  12,628            12,212              96.7
CA     Florence-Graham           11,676            11,219              96.1
TX     Socorro                   18,071            17,123              94.8
CA     East Los Angeles          53,330            49,833              93.4
TX     Eagle Pass                11,696            10,855              92.8
TX     Laredo                    87,048            80,224              92.2
FL     Sweetwater                10,857             9,967              91.8
CA     Coachella                  5,329             4,783              89.8
AZ     Nogales                   13,642             1,232              89.7
TX     Mercedes                  10,208             9,102              89.2
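The percentage column of Table 3a, and the non-Hispanic white count recommended above for tabulating "whites," both follow directly from the two population columns. The sketch below shows the arithmetic for a few rows of the table; the function names are illustrative.

```python
# (place, white race population, Hispanic persons indicating white race),
# taken from Table 3a.
rows = [
    ("United States", 199_686_070, 11_557_774),
    ("Calexico, CA", 12_628, 12_212),
    ("East Los Angeles, CA", 53_330, 49_833),
    ("Laredo, TX", 87_048, 80_224),
]


def hispanic_share_of_white(white_total, hispanic_white):
    """Percent of the white race population that is also Hispanic."""
    return 100.0 * hispanic_white / white_total


def non_hispanic_white(white_total, hispanic_white):
    """White race persons who did not indicate Hispanic origin."""
    return white_total - hispanic_white


results = {
    place: (
        round(hispanic_share_of_white(white, hispanic), 1),
        non_hispanic_white(white, hispanic),
    )
    for place, white, hispanic in rows
}

for place, (share, remainder) in results.items():
    print(f"{place}: {share}% Hispanic white, {remainder:,} non-Hispanic white")
```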
Table 3b provides this data for cities of 500,000 or more persons. Large cities exhibit a great range in percentage of Hispanics within the entire white population. Obviously the error of using the entire white race population as an indicator for non-Hispanic whites is much more serious in the first half of the list, but only in a few places like Baltimore and Indianapolis are Hispanics so few in number that the entire white race population is a good indicator of the white population as usually understood.
Table 3b. Hispanic White Population as a Percent of White Population
United States Cities Over 500,000 Persons, 1990
State  City                  White Race  Hispanic Persons   % of White Race
                             Population  Indicating White  Persons That Are
                                                     Race    Hispanic White
TX     El Paso                  396,122           260,120              65.7
TX     San Antonio              676,082           336,967              49.8
CA     Los Angeles            1,841,182           541,578              29.4
TX     Houston                  859,069           196,427              22.9
CA     San Jose                 491,280           103,533              21.1
NY     New York               3,827,088           663,963              17.3
IL     Chicago                1,263,524           207,476              16.4
TX     Dallas                   556,760            76,780              13.8
CA     San Francisco            387,783            50,665              13.1
CA     San Diego                745,406            93,671              12.6
AZ     Phoenix                  803,332            97,640              12.2
DC     Washington               179,667            13,536               7.5
MA     Boston                   360,875            22,141               6.1
LA     New Orleans              173,554             9,028               5.2
MI     Detroit                  222,316            10,038               4.5
WI     Milwaukee                398,033            16,316               4.1
OH     Cleveland                250,234             8,682               3.5
PA     Philadelphia             848,586            22,747               2.7
FL     Jacksonville             456,529            10,007               2.2
WA     Seattle                  388,858             8,435               2.2
MD     Baltimore                287,753             3,566               1.2
IN     Indianapolis             554,423             4,374               0.8
TN     Memphis                  268,600             2,110               0.8
OH     Columbus                 471,025             3,673               0.8
Table 3c lists the cities in which Hispanics of white race make up the highest percentages of all Hispanics. The last column represents an intriguing phenomenon. Although in the entire United States 51.7% of Hispanics indicated their race as white, in these places Hispanics identified their race as white at much higher percentages. Because the Hispanic populations of many of these places are too small to form local Hispanic communities, it seems likely that these Hispanics are highly acculturated and assimilated into the general white population.
Table 3c. Hispanic White Population as a Percent of Hispanic Population
United States Cities Over 10,000 Persons, 1990
State  City                        Hispanic  Hispanic Persons  % of Hispanic Origin
                                 Population  Indicating White     That Are Hispanic
                                                         Race                 White
United States                    22,354,059        11,557,774                  51.7
TX     Pecos                          8,769             8,689                  99.1
WV     Moundsville                      121               117                  96.7
FL     Kings Point                       60                58                  96.7
MI     Grosse Pointe Farms               72                69                  95.8
OH     Ironton                           23                22                  95.7
FL     Aventura                       1,067             1,017                  95.3
FL     Coral Gables                  16,778            15,989                  95.3
NJ     Holiday City-Berkeley             99                94                  94.9
AL     Albertville                       77                73                  94.8
OH     Tallmadge                         77                73                  94.8
NY     Hamburg                           76                72                  94.7
OH     Norton                            52                49                  94.2
NJ     Hanover Twp.                     266               250                  94.0
AZ     Sun City West                     33                31                  93.9
NY     Massapequa Park                  336               315                  93.8
AL     Alabaster                         80                75                  93.8
FL     Hamptons at Boca Raton           319               299                  93.7
OH     North Canton                      95                89                  93.7
FL     Olympia Heights               29,922            27,984                  93.5
FL     Westchester                   24,554            22,924                  93.4
C. Mapping a Distribution
Whenever one analyzes a large number of census observations, it is often very helpful to also produce a map of the data to see if there are any spatial patterns that may not be apparent in a table. Maps reveal spatial qualities that are rarely evident in statistical tabulations. A researcher may notice that certain places seem to occur near one another when values are sorted in a table, but maps provide this information in detail and at a glance. For example, one can see from Table 2 that many of the densely populated cities are in California and New Jersey, but the table doesn't indicate if these cities are clustered together or linked to certain geographical features such as industrial areas, central cities, or agricultural areas.
To produce maps one needs either a file of the boundaries of the geographic units or a single point for the centroid (spatial center) of the unit. Fortunately, the Census Bureau provides a centroid value in latitude and longitude terms for each of its geographic units. It also publishes the area of these units that can be used to calculate the density of a variable within the unit. The actual boundaries can be obtained in several ways: by using software that will generate them from the street segments in a census TIGER file, by purchasing them from one of several data vendors, or by downloading them (often for free) over the Internet. Usually boundary files provided by data vendors are better in quality than those from other sources. In addition, many geographic information systems (GIS) software packages include boundaries in their sample data for nations, states, counties and ZIP codes.
The size of a statistical area used for analysis can be significant. It is important to realize that the results of analyses are applicable to only the selected units - not to individual people or to units of different sizes. Larger areas mask some of the variability found between smaller areas. Within the state, for example, county rates might vary greatly from that reported for the state and possibly significant differences could be masked if only the state averages are used.
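The masking effect of larger units can be illustrated with a small calculation. The county names and figures below are entirely hypothetical and are chosen only to show how a single state rate can hide wide differences among counties.

```python
# Hypothetical data: (county, persons in poverty, total population).
counties = [
    ("County A", 2_000, 10_000),
    ("County B", 30_000, 600_000),
    ("County C", 9_000, 30_000),
]


def rate(part, total):
    """Percentage of the total population with the characteristic."""
    return 100.0 * part / total


county_rates = {name: rate(poor, total) for name, poor, total in counties}

# The state rate pools all counties, so the most populous county dominates.
state_rate = rate(
    sum(poor for _, poor, _ in counties),
    sum(total for _, _, total in counties),
)

for name, value in county_rates.items():
    print(f"{name}: {value:.1f}%")
print(f"State: {state_rate:.1f}%")
```

In this made-up case the county rates run from 5% to 30%, yet the state figure sits close to the rate of the largest county.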
The Census Bureau reports data for blocks, block groups, tracts, counties, states, divisions, and regions. In addition, tabulations are made for places, Congressional districts, metropolitan and rural areas, and various administrative units such as Indian reservations. A block contains about 100 persons, a block group about 1000, and a census tract about 4000. However, there may be a considerable range in these values. In Los Angeles County, census tracts range in population from 0 to over 35,000 persons. The average size is around 5500 persons.
For local area analysis, tracts have long been the preferred areal unit, while at the regional or national level, counties have been used. Within a local area, block-level statistics are occasionally used to compare neighborhoods, but tabulations of data from the sample questions are unavailable for blocks, and so analysis possibilities are more limited.
Mapping Counts and Percents
Examining patterns of counts of population on maps reveals only part of a picture. Such maps indicate where there are more or fewer people, but they may not indicate differences in the relative concentration of one ethnic group compared to another. For example, mapping the number of Hispanics indicates where the numbers are, but one also would expect to find more Hispanics where there are more people. Thus, mapping counts of population components yields maps that are often very much alike. It is usually more valuable to additionally map the percentage of the total population that is Hispanic to reveal where the group is proportionately more concentrated.
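To see why counts and percentages can tell different stories, consider the following sketch. The three tracts and their figures are hypothetical; the point is only that ranking by count and ranking by percentage need not agree.

```python
# Hypothetical tracts: (name, total population, Hispanic population).
tracts = [
    ("Tract 1", 8_000, 2_000),
    ("Tract 2", 2_000, 1_500),
    ("Tract 3", 5_000, 500),
]


def percent_hispanic(total, hispanic):
    """Hispanic persons as a percentage of the tract population."""
    return 100.0 * hispanic / total


by_count = [name for name, _, _ in sorted(tracts, key=lambda t: t[2], reverse=True)]
by_percent = [
    name
    for name, _, _ in sorted(
        tracts, key=lambda t: percent_hispanic(t[1], t[2]), reverse=True
    )
]

print("Ranked by count:  ", by_count)
print("Ranked by percent:", by_percent)
```

Tract 1 has the most Hispanic residents simply because it has the most residents, while Tract 2 has the highest concentration.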
A potential problem with mapping population counts is that larger statistical areas generally contain larger numbers of a population. Mapping a group by density (i.e., dividing the count by the area of the statistical unit) may therefore also be helpful, since it adjusts the count for the varying areas of the statistical units.
Although a very large number of mapping styles are possible for portraying statistical information, in practice only a few are used. This is especially true when using computer software, which typically presents few mapping options.
The most common census mapping product is probably the choropleth map. Here the statistical areas are shaded in relation to the data values. The technique is very common with census data because values are reported for statistical units. The values for the areal units are sorted and divided into four to eight classes. Each class is assigned a progressively darker or brighter tone such that a visual order is apparent that approximates increasing magnitude of the values. This would seem a straightforward relationship, but many people assign colors to categories in an almost random way. An alternative approach is to use a diverging color scheme, in which two hues progressively darken as values depart from an average or selected base value. For example, counties with employment below the state average might be shown in progressively darker shades of brown, while counties above the state average might be shown in progressively darker shades of green.
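The two-hue scheme described above can be sketched as a function that assigns each value a signed class according to how far it lies from the base value. Everything here is an assumption made for illustration: the function names, the fixed class width, the three classes per side, and the shade labels.

```python
# Hypothetical shade labels; negative classes fall below the base value.
SHADES = {
    -3: "dark brown",
    -2: "medium brown",
    -1: "light brown",
    0: "neutral",
    1: "light green",
    2: "medium green",
    3: "dark green",
}


def diverging_class(value, base, step, classes_per_side=3):
    """Signed class index: the sign gives the side, the size gives the darkness."""
    if value == base:
        return 0
    offset = abs(value - base)
    magnitude = min(classes_per_side, int(offset // step) + 1)
    return magnitude if value > base else -magnitude


def shade_for(value, base, step):
    """Look up the shade label for a value relative to the base."""
    return SHADES[diverging_class(value, base, step)]


# Hypothetical county employment rates compared with a state average of 60.
for county_rate in (48, 58, 60, 62, 71):
    print(county_rate, shade_for(county_rate, base=60, step=5))
```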
A real challenge in choropleth mapping is to decide on an appropriate number of classes and on a method for selecting class breaks. There is no simple answer to this problem. As a rule of thumb, the method proposed by George Jenks (the default method in ArcView, where it is currently misnamed "natural breaks") would be preferable to others. This method seeks to minimize the variation among values within each class. In many situations, especially when a number of maps are to be compared, quantile breaks are appropriate. An alternative method occasionally used is to compute the mean of the distribution and to create class breaks based on standard deviation values about the mean.
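The three approaches to class breaks can be compared on a small data set. The sketch below is not Jenks's published procedure or ArcView's implementation: the function optimal_breaks is a hypothetical exhaustive search that minimizes the summed squared deviations from class means, which is assumed here to be the objective, and it is practical only for short lists. The data values are invented.

```python
from itertools import combinations
from statistics import mean, pstdev, quantiles


def quantile_breaks(values, n_classes):
    """Cut points that place roughly equal numbers of values in each class."""
    return quantiles(values, n=n_classes, method="inclusive")


def std_dev_breaks(values, multiples=(-1.0, 0.0, 1.0)):
    """Cut points at the mean plus the given multiples of the standard deviation."""
    center, spread = mean(values), pstdev(values)
    return [center + m * spread for m in multiples]


def within_class_ss(groups):
    """Total squared deviation of each value from its own class mean."""
    total = 0.0
    for group in groups:
        center = mean(group)
        total += sum((v - center) ** 2 for v in group)
    return total


def optimal_breaks(values, n_classes):
    """Try every grouping of the sorted values; keep the least within-class variation."""
    data = sorted(values)
    best_score, best_groups = None, None
    for cuts in combinations(range(1, len(data)), n_classes - 1):
        edges = (0, *cuts, len(data))
        groups = [data[a:b] for a, b in zip(edges, edges[1:])]
        score = within_class_ss(groups)
        if best_score is None or score < best_score:
            best_score, best_groups = score, groups
    return best_groups


sample = [1, 2, 3, 10, 11, 12, 30, 31]   # invented values
print(quantile_breaks(sample, 4))
print(std_dev_breaks(sample))
print(optimal_breaks(sample, 3))
```

On data with obvious gaps, as here, the search recovers the visible clusters, whereas quantile breaks may split a cluster in order to keep class sizes equal.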
On choropleth maps data should be expressed as a ratio, index, percentage, or density. Such maps are not appropriate for showing counts of people. This is because large areas tend to appear in higher classes not because of any characteristic, but because larger areas encompass a greater portion of a population distribution. Texas, for example, would be expected to have more people than Oklahoma simply because it covers more area.
Another concern with the difference in size of areal units on choropleth maps is that larger areas will visually dominate on the map, and many of these are in rural areas with small populations. Such large areas call undue attention to themselves on the map. On the other hand, large populations occur in very small areas such as the boroughs of New York or Washington, D.C., and might not be noticed by map readers. An inset map can be helpful in drawing attention to some of these smaller areas if they are not discernible on a map of a large area such as the entire United States.
Graduated Symbol Maps
A second method often found in census mapping is graduated symbols. With this approach the area of a circle or square is made proportional to the value of an attribute. Graduated symbols may be used for point features such as cities and may represent counts of things. A frequent problem with this technique is that the range of values far exceeds the range that can be effectively presented on the map. Thus, it may be necessary to set a lower limit to be displayed. Values below the threshold are either not shown or are assigned a standard symbol. An alternative strategy available in some programs is to define a set of groups and then assign a single symbol size to all values falling within the range of a given group. This method, referred to as "range-graded symbols" is somewhat like the classification scheme used for the choropleth map.
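Because it is the area of the symbol, not its radius, that is made proportional to the value, the radius must grow with the square root of the value. The sketch below illustrates both that scaling and the range-graded alternative; the function names, the class limits, and the radii are assumptions chosen for illustration.

```python
import math


def circle_radius(value, max_value, max_radius):
    """Radius that makes circle area proportional to the value."""
    if value <= 0:
        return 0.0
    return max_radius * math.sqrt(value / max_value)


def range_graded_radius(value, class_upper_limits, radii):
    """One fixed radius for every value falling within a class."""
    for upper, radius in zip(class_upper_limits, radii):
        if value <= upper:
            return radius
    return radii[-1]   # values above the last limit get the largest symbol


# A value one quarter as large gets a circle with half the radius.
print(circle_radius(25, max_value=100, max_radius=20))

# Hypothetical classes: up to 10,000; up to 100,000; up to 1,000,000 persons.
limits = [10_000, 100_000, 1_000_000]
sizes = [2, 5, 10]
print(range_graded_radius(15_000, limits, sizes))
```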
A third method is the dot map, a technique that requires the assignment of a given number of individuals to a dot. The dot is then located to represent the approximate location of a group of individuals. When done manually, additional maps and aerial photographs may be used to help determine the appropriate dot placement. The dot map also permits the overlay of multiple distributions on the same map by using dots of different shapes or colors.
Unfortunately, computer programs typically locate the dots randomly within a statistical area. The patterns only begin to become meaningful when the statistical areas shown on the map are very small. In other cases, the look of the distribution can be improved by moving the map to a graphic arts program where dots can be moved individually away from unpopulated areas within the statistical units.
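Random placement of this kind can be sketched as follows: candidate points are drawn within the bounding rectangle of the statistical area and kept only if they fall inside its outline. This is one plausible approach, not a description of how any particular mapping package works; the function names, the tract outline, and the population figure are all hypothetical.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices in order."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):   # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_dots(polygon, population, persons_per_dot, rng=None):
    """Scatter one dot per group of persons at random inside the polygon."""
    rng = rng or random.Random()
    xs = [px for px, _ in polygon]
    ys = [py for _, py in polygon]
    n_dots = round(population / persons_per_dot)
    dots = []
    while len(dots) < n_dots:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):   # reject points outside the outline
            dots.append((x, y))
    return dots


# Hypothetical L-shaped tract with 4,000 persons, one dot per 500 persons.
tract_outline = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
dots = random_dots(tract_outline, population=4_000, persons_per_dot=500,
                   rng=random.Random(0))
print(len(dots), "dots placed")
```

Nothing in this procedure knows where people actually live within the tract, which is why the resulting pattern is only meaningful when the areas are small.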
Mapping with ArcView
The California State University currently has a site license for ESRI software that includes a mapping/GIS package called ArcView. This package, or any other, can be used to produce choropleth, graduated symbol, and dot maps from census data. Appendix M gives an explanation of the use of ArcView.
Mapping Cities with High Population Density
In the example below, population densities used for Table 2 have been mapped. To focus on the most densely populated cities, the frequency distribution of cities was divided equally into 20 classes. The top class was split in two. Thus, only the top 5 percent of the cities are presented. Note that most of the cities cluster around New York and Los Angeles. To better examine these areas, look at the two large-scale inset maps. (Two new Views were produced at large scale and added to the Layout Window.)
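The selection described above (20 equal-count classes, with the top class split in two) can be sketched outside ArcView as well. The function name and the density values below are hypothetical and stand in for the full list of cities.

```python
def top_class_split(values, n_classes=20):
    """Keep the highest of n equal-count classes and split it into two halves."""
    ranked = sorted(values, reverse=True)
    class_size = max(1, len(ranked) // n_classes)
    top = ranked[:class_size]          # top 5 percent when n_classes is 20
    half = (len(top) + 1) // 2
    return top[:half], top[half:]


# Invented densities for 200 cities: 50, 100, ..., 10,000 persons per square mile.
densities = [d * 50 for d in range(1, 201)]
upper_half, lower_half = top_class_split(densities)

print("Highest 2.5 percent:", upper_half)
print("Next 2.5 percent:   ", lower_half)
```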