Regression, like correlation, does not determine causation. Its strength is that, unlike correlation, it measures the parameters of the association. That is, correlation can show that disposable income and household consumption move together, but regression measures the amount by which consumption will increase when disposable income rises by a dollar. Because regression goes beyond correlation to a measurement of the size of the effect of one variable on another, it is the favorite statistical technique in empirical economics.
In the example of regression just cited there is an implicit assumption about causation, even though neither regression nor correlation can prove it. Economists assume that changes in disposable income cause changes in consumption (although we allow that the reverse is true as well, at least in the aggregate). In the regression procedure, it is assumed that one variable is dependent (consumption) and the other is independent (disposable income). This assumption tells us about the researcher's intuition, or the theory in use, but it cannot be validated or invalidated with the regression procedure. To repeat, whatever we think we know about causation must come from theories and histories and other pieces of information besides regression.
In general functional notation, the consumption function is
C = f(Yd),
where C is consumption and Yd is disposable income. The plot of c and dp1 in Chart 10 revealed that the relationship was linear, so we can convert the general functional notation into a specific, linear functional form:
C = c0 + c1 Yd.
In this form, the consumption function is a straight line, with intercept c0 and slope c1. In economics, the intercept, c0, is called autonomous consumption since it is independent of (autonomous from) disposable income. The slope, c1, measures the rate of change in consumption given a change in Yd. For example, if Yd increases by $1, then C changes by (c1)*($1) = $c1.
Let Δ stand for the change in a variable, so ΔYd is read as "the change in disposable income." Then, if Yd changes by ΔYd, the change in C (ΔC) is c1 ΔYd:
C = c0 + c1 Yd, and ΔC = c1 ΔYd,
so that c1 = ΔC / ΔYd = the marginal propensity to consume = MPC.
Autonomous consumption and the marginal propensity to consume are the parameters of the linear consumption function. Mathematically, they are the intercept and slope of a line that describes the relationship between disposable income and consumption. Regression analysis is an exercise in estimating their values, but before we do regression, we have to take into account one more element of every regression model.
The linear consumption function, C = c0 + c1 Yd, is a deterministic model. It allows no room for variation away from the relationship. Once Yd is known, C is completely determined (given the parameters c0 and c1). In fact, the consumption function describes a tendency, not a mathematically fixed relationship. The relationship between consumption and disposable income is probabilistic, or to say the same thing with a 5 dollar word, it is a stochastic relationship. Stochastic relationships are not fixed like deterministic relationships; there is always a margin for variation away from the general tendency. Therefore, if the consumption function describes the deterministic relationship, we need to add a term to let the actual behavior of consumption in the actual economy deviate from the value predicted by disposable income:
C = c0 + c1 Yd + e,
where e is a random error term. On average, e is zero, so on average C = c0 + c1 Yd, but in any given year, C could be more than predicted (e > 0), or less than predicted (e < 0). Graphically, the inclusion of a random error term allows for the possibility that the scatter points of the consumption function do not fall on a straight line.
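The difference between the deterministic line and the stochastic relationship can be seen in a small simulation. This is only a sketch: the parameter values, the spread of the error term, and the income figures are hypothetical and are not taken from the dataset.

```python
import random

# Hypothetical parameter values, chosen only for illustration.
C0 = 100.0   # autonomous consumption (intercept)
C1 = 0.9     # marginal propensity to consume (slope)

def deterministic_c(yd):
    """Consumption given exactly by the line C = c0 + c1*Yd."""
    return C0 + C1 * yd

def stochastic_c(yd, rng, sigma=25.0):
    """The same line plus a random error e with mean zero (sigma is assumed)."""
    e = rng.gauss(0.0, sigma)
    return deterministic_c(yd) + e

rng = random.Random(0)  # fixed seed so the run is repeatable
incomes = [1000.0 + 100.0 * i for i in range(2000)]  # made-up Yd values
errors = [stochastic_c(yd, rng) - deterministic_c(yd) for yd in incomes]

mean_error = sum(errors) / len(errors)
share_positive = sum(1 for e in errors if e > 0) / len(errors)
print(f"average error: {mean_error:.2f}")
print(f"share of observations above the line: {share_positive:.2f}")
```

The average error should come out close to zero, with roughly half of the simulated observations above the line and half below it, which is what "on average, e is zero" means.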
With the regression procedure in SPSS we can compute the values of the parameters c0 and c1:
- Select Statistics from the menu bar, choose Regression, then select Linear . . .;
- Highlight c in the variable list, and click the arrow to put it into the Dependent box;
- Highlight dp1 in the variable list, and click the arrow to put it into the Independent(s) box;
- Click OK.
The results are in Table 6, where I have divided them into 3 parts. In each part, the most important numbers are in bold. Part 1 has 1 number (Adj. R Square = .99964), Part 2 has none, and Part 3 has several. One of the keys to using SPSS, or any statistical package, is to not become overwhelmed by the amount of output it generates; the trick is to know what you can ignore, at least initially. As you become more skilled, you will find uses for the things we are going to ignore for now.
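As a rough cross-check outside SPSS, the arithmetic behind a one-variable linear regression can be sketched in Python with the textbook least-squares formulas. The function name ols_fit and the five data points are hypothetical stand-ins for the dataset's dp1 and c columns; this is not how SPSS itself is implemented.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = co-movement of x and y divided by the variation in x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical values standing in for disposable income (dp1) and consumption (c).
dp1 = [1000.0, 1500.0, 2000.0, 2500.0, 3000.0]
c = [910.0, 1365.0, 1830.0, 2280.0, 2745.0]

c0_hat, c1_hat = ols_fit(dp1, c)
print(f"estimated c0 = {c0_hat:.3f}, estimated c1 = {c1_hat:.4f}")
```

With real data, the two numbers returned correspond to the (Constant) and DP1 entries in the B column of the SPSS output.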
Elements of Regression Output
Part 1 provides 4 measures of goodness of fit. These are statistics that tell how well the model fits the data. R Square and its adjustment, Adj. R Square, can be interpreted as the share of the variation in the dependent variable that is explained by the independent variable. In our model, C is the dependent variable and Yd is independent, so movements in Yd explain nearly all (>99%) of the movement in C. There is no threshold for the R squared or adjusted R squared where they go from bad to good, but by any criterion, our model explains nearly all the variation in C.
Table 6

Part 1
Multiple R        0.99982
R Square          0.99965
Adj. R Square     0.99964
Standard Error    27.4352

Part 2  Analysis of Variance
              DF    Sum of Squares    Mean Square
Regression     1    141122061.6       141122061.6
Residual      66    49677.6           752.7
F = 187489.9    Signif F = 0.000

Part 3
Variable      B            SE B        Beta       T          Sig T
DP1           0.918815     0.002122    0.99984    433.001    0.000
(Constant)    -12.33177    4.243069               -2.906     0.0050
Adjusted R square is an adjustment to R square (duh!) that takes into account the number of independent variables. Since we only have one, Yd, the two are close in value. The adjusted R squared of 0.99964 looks too good to be true and it probably is; for various technical reasons, some statistical, some economic, it makes the model look better than it is. (Two reasons: autocorrelation, and nominal data.) It pays to be skeptical, even (especially) when things look great.
Part 2 provides a number of statistics that are grouped together under the subject of analysis of variance. Basically, Part 2 provides measures that break down the variation in C and attribute the different parts to the deterministic part of the model (c0 + c1 Yd ) and the stochastic part (e). These are useful measures in more advanced routines, but they are unnecessary at this point.
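One use of Part 2 that is easy to check by hand is that the goodness of fit numbers in Part 1 can be rebuilt from it. The sketch below uses the standard formulas for R square, adjusted R square, the standard error of the regression, and F; the variable names are my own, and the inputs are the rounded figures from the printout, so the last digits may differ slightly from what SPSS reports.

```python
import math

# Figures copied from Part 2 of the regression output.
ss_regression = 141122061.6   # variation explained by c0 + c1*Yd
ss_residual = 49677.6         # variation left in the error term e
df_regression = 1             # number of independent variables
df_residual = 66              # observations minus estimated parameters

ss_total = ss_regression + ss_residual
n_obs = df_regression + df_residual + 1   # 68 observations

r_square = ss_regression / ss_total
# The adjustment penalizes R square for each independent variable used.
adj_r_square = 1 - (1 - r_square) * (n_obs - 1) / df_residual
std_error = math.sqrt(ss_residual / df_residual)
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)

print(f"R Square       = {r_square:.5f}")
print(f"Adj. R Square  = {adj_r_square:.5f}")
print(f"Standard Error = {std_error:.4f}")
print(f"F              = {f_stat:.1f}")
```

With only one independent variable and 68 observations, the adjustment is tiny, which is why the two R square figures are so close.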
Part 3 is the core of the output. Part 3 has the estimated values of c0 (-12.33) and c1 (0.9188). These are in the column labeled B. The next column, SE B, has the standard errors of the estimates (0.002122 for c1 and 4.243069 for c0). These are measures of the precision of our estimates of c0 and c1. The smaller the standard errors, the more precise our estimates are. The column labeled Beta can be ignored, but the following column, T, has important information. T is the value of the t-statistic that is constructed to test the hypothesis that the "true" values of c0 and c1 are zero. Let the unobserved true values be symbolized with Greek letters, β0 and β1. We want to test the hypotheses:
H0: β0 = 0 versus H1: β0 ≠ 0, and
H0: β1 = 0 versus H1: β1 ≠ 0.
If we fail to reject the second null, then we have no evidence that disposable income has an effect on consumption. Since this is one of the primary reasons for doing regression (i.e., to see if disposable income affects consumption, and if so, how much), every statistical package automatically turns out a t-statistic to test this hypothesis. The formula for the t-statistic is:
t-value = (c1 - value in null hypothesis)/(standard error of the estimate) =
(B - 0)/(SE B) = (0.9188 - 0)/(0.002122) = 433.
The last column of the SPSS printout in Part 3 is labeled Sig T. It is the probability of getting a t-statistic at least this far from zero when the null hypothesis is true (H0: β1 = 0). Since the probability (to four decimal places) of getting a sample value, c1, of 0.9188 with a standard error of 0.002122 is 0, we should reject the null hypothesis.
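The T and Sig T columns can be reproduced approximately from B and SE B. The sketch below divides each estimate by its standard error and then integrates the tail of Student's t distribution numerically with Simpson's rule, using 66 degrees of freedom (the Residual DF in Part 2). The function names, the integration width, and the step count are my own choices, and the cutoff is only adequate for fairly large degrees of freedom; SPSS uses its own routines.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, width=60.0, steps=6000):
    """Approximate P(|T| >= |t|): Simpson's rule over [|t|, |t| + width].

    The area beyond |t| + width is ignored, which is negligible for
    moderately large df. steps must be even.
    """
    a = abs(t)
    h = width / steps
    total = t_pdf(a, df) + t_pdf(a + width, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(a + i * h, df)
    one_tail = total * h / 3
    return 2 * one_tail

DF_RESIDUAL = 66

# B and SE B copied from Part 3 of the output.
t_slope = (0.918815 - 0) / 0.002122
t_const = (-12.33177 - 0) / 4.243069

p_slope = two_sided_p(t_slope, DF_RESIDUAL)
p_const = two_sided_p(t_const, DF_RESIDUAL)

print(f"slope:     t = {t_slope:.2f}, Sig T = {p_slope:.4f}")
print(f"intercept: t = {t_const:.3f}, Sig T = {p_const:.4f}")
```

The slope's t-value differs from the printed 433.001 in the last digits because the printout rounds B and SE B. The intercept's probability should land near the 0.0050 that SPSS reports, so that null hypothesis is rejected as well.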
The estimated equation is
Δur = 1.274321 - 0.361428(Percent change in real GDP).
Here Δur is the change in the unemployment rate (variable ur), measured in percentage points. The interpretation of this relationship is that, on average, each 1 percentage point increase in the rate of growth of real GDP lowers the change in the unemployment rate by 0.36 percentage points. You should check the goodness of fit statistics, R square and adjusted R square, and the t-statistics for the slope and the intercept. Follow the procedure outlined for the consumption function.
The implications of Okun's Law are that output must grow by about 3.5% per year (1.27/0.36) just to keep unemployment from rising. Why? The answer is that the labor force grows about 1% a year (check this), so output has to grow at about the same speed to provide enough new jobs. Second, labor productivity (output per hour worked, prod1 in the dataset) grows at about 2.3 percent a year (check this) so even if no new jobs are created, output goes up 2.3 percent. Put these two forces together, and real GDP has to grow over 3 percent a year on average just to keep the unemployment rate from going up. Because of this relationship, many economists view "normal" economic growth as approximately 3-3.5%.
Okun's Law has also been used to try to measure the costs of unemployment to the national economy. When unemployment holds constant (Δur = 0), real GDP grows about 3.5%. Now solve for the percent change in real GDP if unemployment rises by 1 percentage point (Δur = 1):
1 = 1.2743 - 0.36142(Percent change in real GDP)
⇒ Percent change in real GDP ≈ 0.76.
When GDP growth falls from 3.5% to about 0.76%, we lose about 2.75 percent of potential GDP. Given that our GDP is roughly $8,000 billion in nominal terms, a loss of 2.75 percent represents a loss of about $220 billion (0.0275*8,000). In other words, each 1 percentage point increase in unemployment costs the US economy around $220 billion in lost output.
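The arithmetic in this calculation can be checked with a few lines of Python. The sketch below inverts the estimated equation to find the GDP growth rate associated with a given change in the unemployment rate; the function and variable names are my own, and the $8,000 billion GDP figure is the rough value used in the text.

```python
# Estimated coefficients from the Okun's Law regression.
INTERCEPT = 1.274321
SLOPE = -0.361428

def growth_for_unemployment_change(delta_ur):
    """GDP growth (percent) implied by a given change in the unemployment rate."""
    # Solve delta_ur = INTERCEPT + SLOPE * growth for growth.
    return (delta_ur - INTERCEPT) / SLOPE

break_even_growth = growth_for_unemployment_change(0.0)    # unemployment constant
growth_if_ur_up_1 = growth_for_unemployment_change(1.0)    # unemployment up 1 point
lost_growth = break_even_growth - growth_if_ur_up_1        # percent of GDP forgone

gdp_billions = 8000.0   # rough nominal GDP, as in the text
lost_output = lost_growth / 100.0 * gdp_billions

print(f"growth needed to hold unemployment constant: {break_even_growth:.2f}%")
print(f"growth when unemployment rises 1 point:      {growth_if_ur_up_1:.2f}%")
print(f"lost output: about ${lost_output:.0f} billion")
```

Carrying all the decimal places gives a loss slightly above the rounded $220 billion, which is a reminder that the figure is an order of magnitude rather than a precise estimate.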
To see the Phillips relationship that economists in the 1950s and 1960s worked with, we should omit the data from the 1930s and World War II. In addition, since the relationship broke down in the 1970s, we will work with data limited to 1948-1969. Algebraically, the relationship can be expressed as
pt = b0 + b1ut + et,
where pt is inflation in year t, b0 is the intercept of the regression line, b1 is the slope parameter, which is expected to be negative, ut is the unemployment rate in year t, and et is the random error term that measures deviations from the average relationship.
In the data set, the unemployment rate is variable ur, and the inflation rate is a computed variable that is the percentage change in the CPI. We calculated this in several earlier exercises.
First restrict the sample to 1948-1969:
- Select Data from the menu bar, then Select Cases. . .;
- Click the button for Based on time or case range, then click Range;
- In the boxes type 1948 and 1969;
- Click OK.
Now run the regression using your inflation variable as the dependent variable and ur for the independent variable. You should get
pt = 6.917 - 0.987ut + et.
This is the relationship that broke down during the 1970s. To see this, change Select Cases to the years 1970 to 1996 and re-run the regression. Look at the R squared. Does ut explain anything about inflation? Is the sign on ut what you expected (i.e., is your estimate of b1 negative)? Is it significantly different from zero? That is, do you accept or reject the null hypothesis H0: b1 = 0?
Needless to say, most economists were puzzled by this. As early as the mid-1970s it was apparent that the Phillips relation no longer worked. What could have gone wrong? The answer was waiting in the wings in the form of an earlier prediction made by Milton Friedman. Friedman had argued that as soon as people changed their expectations about inflation, the Phillips curve would break down. Friedman's point was that inflation partly depended on what people expected it to be. If everyone thought it was going to be high, then workers would demand wage increases, and businesses would expect higher costs, so they would raise their prices. The net result would be inflation--in part because everyone expected it and acted to protect themselves by raising their wage demands and their prices.
Until the late 1960s, prices seemed to have no trend; they were about as likely to fall as they were to rise. Consequently, it made sense to expect zero inflation since that was close to the long term average. In the late 1960s and early 1970s, this changed. Inflation was ratcheted up by a combination of events--the Vietnam War, domestic spending for the War on Poverty, bad harvests in the early 1970s, and, in 1973, the first oil crisis. Households and businesses began to expect that inflation would not be zero, facts bore out the correctness of this view, and the inflation rate rose. Friedman's arguments led economists to the "expectations augmented" Phillips curve, which is just the old Phillips curve with another variable, expected inflation, on the right hand side:
pt = b0 + pe + b1ut + et,
where pe is the expected rate of inflation. The old Phillips curve is a variety of this one in which pe is zero. Here, pe is expected to be positive, so that for a given unemployment rate, inflation is higher by that amount.
The obvious question is whether or not this can be measured. That is, how do we know (measure) the expected rate of inflation? Friedman's answer was to point out that most of us use the recent past to form our expectations about the future. For example, will it be hot or cold today? When my kids ask me that in the morning, I always tell them that it will be just like yesterday. (Of course, I could look it up in the weather section of the morning paper, and sometimes I do if there is reason to believe the weather might be changing. Looking it up--seeking additional information--is the rational thing to do and conforms to the economic idea of rational expectations. It is forward looking and incorporates all readily available information that is not too costly to obtain. Friedman's idea--today is like yesterday--is called adaptive expectations.)
For the sake of simplicity, we assume that our expectation of inflation today is that it will be like last period's rate. Algebraically,
pe = pt-1,
where pt-1 is the inflation rate in year t-1 (i.e., last year if this is year t). Using this notation, we can re-write the expectations augmented Phillips curve as
pt = b0 + pt-1 + b1ut + et,
or, moving the expected inflation term to the left:
pt - pt-1 = b0 + b1ut + et,
which we can easily estimate for 1970 to 1996. After selecting the years 1970 to 1996, and computing a new variable pt - pt-1, re-run the regression. You should get
pt - pt-1 = 7.078 - 1.085ut + et.
Notice the similarity to the regression for 1948 to 1969.
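The new dependent variable, this year's inflation minus last year's, takes two steps to build from the CPI: a percentage change, then a first difference. The sketch below shows the two steps in Python on a short, made-up CPI series; the function names and the numbers are hypothetical and are not from the dataset.

```python
def pct_change(levels):
    """Percent change from each observation to the next."""
    return [100.0 * (cur - prev) / prev
            for prev, cur in zip(levels, levels[1:])]

def first_difference(series):
    """Change from each observation to the next."""
    return [cur - prev for prev, cur in zip(series, series[1:])]

# Made-up CPI levels for four consecutive years (illustration only).
cpi = [100.0, 105.0, 112.35, 117.9675]

inflation = pct_change(cpi)                    # inflation in years 2, 3, 4
inflation_change = first_difference(inflation) # change in inflation, years 3, 4

print("inflation:          ", [round(p, 2) for p in inflation])
print("change in inflation:", [round(d, 2) for d in inflation_change])
```

Each step uses up one observation, so four years of CPI data yield three inflation rates and only two changes in inflation. The same thing happens in SPSS, where the first cases of the computed variables are missing.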
This regression has many uses in policy making. For example, it implies that if unemployment is too low, the left hand side will be positive and inflation will be accelerating (pt > pt-1). Economists have a special fondness for the rate of unemployment that keeps inflation from rising. Note that this is not the same thing as zero inflation. The unemployment rate that prevails when pt - pt-1 equals 0 is known as (get ready!) the non-accelerating inflation rate of unemployment, or the NAIRU. A prettier but misleading name for it is the natural rate of unemployment.
What is the natural rate? Set the above equation equal to zero, and solve:
0 = 7.078 - 1.085ut,
or ut = 6.5. Anything less, and inflation is supposed to increase; anything more and it decreases. Unemployment is currently less than 5%, so you can guess why the Federal Reserve and inflation hawks are nervous. Inflation should be ratcheting up, but it is not. We don't know why, and the debate rages on among economists. It is clear, however, that the natural rate of unemployment, or the NAIRU, changes over time. It seems to have fallen in the 1990s, but no one can say how low. The data is not loud and clear enough for us to be certain.
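The NAIRU calculation, and what the estimated equation predicts at other unemployment rates, can be checked with a short Python sketch. The coefficients are the 1970 to 1996 estimates reported above; the function name and the 5% example rate are my own choices for illustration.

```python
# Estimated coefficients from the 1970-1996 regression.
B0 = 7.078    # intercept
B1 = -1.085   # slope on the unemployment rate

def predicted_inflation_change(unemployment_rate):
    """Predicted change in inflation (percentage points) at a given rate."""
    return B0 + B1 * unemployment_rate

# The NAIRU is the unemployment rate at which the predicted change is zero.
nairu = -B0 / B1

print(f"NAIRU: {nairu:.2f}%")
print(f"predicted change in inflation at 5% unemployment: "
      f"{predicted_inflation_change(5.0):+.2f} points")
```

At an unemployment rate of 5% the equation predicts inflation rising by more than a point and a half per year, which is the prediction that was not being borne out.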