EXERCISE 5 - ANSWER

Graph total monthly food value against total monthly income. Now logarithmically transform both variables and graph the transformed values. Which pair do you think will produce a stronger regression model? Run two regressions: total monthly food value on total monthly income, and the logarithm of total monthly food value on the logarithm of total monthly income. Were you right? Why or why not?

For these regressions, we use total monthly food value (tmxcon) and total monthly income (totminc).

graph tmxcon totminc

gen logcon=ln(tmxcon)
gen loginc=ln(totminc)
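
Because ln() returns a missing value for zero or negative arguments, some observations may drop out of logcon or loginc. A quick check, sketched here on the assumption that the variables are named as above, is to count such observations before going further:

```stata
* Count observations that have both variables recorded but a zero or
* negative value in at least one of them; ln() is missing for these.
count if (tmxcon <= 0 | totminc <= 0) & tmxcon < . & totminc < .

* Equivalent check on the generated variables: logs that are missing
* even though the original values are present.
count if (logcon >= . | loginc >= .) & tmxcon < . & totminc < .
```

Any observations counted here are excluded from a regression on the logarithms.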

graph logcon loginc

In the graph of the logarithms, an upward trend is visible, whereas in the graph of the untransformed values the great majority of the observations are clustered at low values of total monthly income. We would therefore expect the regression on the logarithms to fit better than the regression on the original variables.
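
The graph commands above follow the syntax of older Stata releases. As an assumption about the reader's setup rather than part of the original exercise: in Stata 8 and later, the same scatterplots are usually drawn with scatter, and a fitted line can be overlaid to make the trend easier to judge.

```stata
* Scatterplots of the raw and the log-transformed variables
* (first variable on the vertical axis, second on the horizontal axis).
scatter tmxcon totminc
scatter logcon loginc

* Optional: overlay the least-squares line on the log-log scatterplot.
twoway (scatter logcon loginc) (lfit logcon loginc)
```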

regress tmxcon totminc

  Source |       SS       df       MS                  Number of obs =     942
---------+------------------------------               F(  1,   940) =   54.50
   Model |  7594013.55     1  7594013.55               Prob > F      =  0.0000
Residual |   130977470   940  139337.734               R-squared     =  0.0548
---------+------------------------------               Adj R-squared =  0.0538
   Total |   138571484   941  147259.813               Root MSE      =  373.28
------------------------------------------------------------------------------
  tmxcon |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 totminc |   .0188845    .002558      7.382   0.000       .0138644    .0239046
   _cons |   481.9917   13.07734     36.857   0.000       456.3276    507.6559
------------------------------------------------------------------------------

regress logcon loginc

  Source |       SS       df       MS                  Number of obs =     922
---------+------------------------------               F(  1,   920) =  301.65
   Model |  120.032947     1  120.032947               Prob > F      =  0.0000
Residual |  366.082745   920  .397916027               R-squared     =  0.2469
---------+------------------------------               Adj R-squared =  0.2461
   Total |  486.115692   921  .527812912               Root MSE      =  .63081
------------------------------------------------------------------------------
  logcon |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  loginc |   .2916718   .0167934     17.368   0.000       .2587139    .3246297
   _cons |   4.032556   .1161886     34.707   0.000       3.804531    4.260582
------------------------------------------------------------------------------

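Our suspicions have been confirmed. The regression using the logarithms of the variables explains about four and a half times as large a share of the variation as the non-logarithmic model (an R-squared of 0.2469 vs. 0.0548). The F statistic, and hence the t statistic for the slope (with a single regressor, F is the square of t), is also larger in the logarithmic model. Two cautions apply. The logarithmic regression uses 922 observations rather than 942, because ln() is missing for zero or negative values. The two R-squared values also describe different dependent variables (logcon vs. tmxcon), so the comparison is a rough guide rather than a formal test.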
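
The figures quoted above can be checked by hand from the two output tables with display. This is only an arithmetic check on the printed numbers, not an additional analysis:

```stata
* R-squared = Model SS / Total SS
* Untransformed model: about .0548
display 7594013.55/138571484
* Logarithmic model: about .2469
display 120.032947/486.115692

* Ratio of the two R-squared values: about 4.5
display .2469/.0548

* With one regressor, F equals the square of the slope's t statistic
* (small differences from the printed F values are due to rounding of t).
* Untransformed model: about 54.49, against a printed F of 54.50
display 7.382^2
* Logarithmic model: about 301.65, against a printed F of 301.65
display 17.368^2
```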

 
