For this regression, we use total monthly food value (tmxcon) and total monthly income (totminc). We first plot the raw variables against each other, then their natural logarithms.
scatter tmxcon totminc
gen logcon=ln(tmxcon)
gen loginc=ln(totminc)
scatter logcon loginc
After the logarithmic transformation, the scatter plot shows a clear upward trend, whereas in the untransformed plot the great majority of observations are bunched at low values of total monthly income. This suggests that a regression on the logarithms of the variables will fit better than one on the variables themselves.
regress tmxcon totminc
  Source |       SS       df       MS                  Number of obs =     942
---------+------------------------------               F(  1,   940) =   54.50
   Model |  7594013.55     1  7594013.55               Prob > F      =  0.0000
Residual |   130977470   940  139337.734               R-squared     =  0.0548
---------+------------------------------               Adj R-squared =  0.0538
   Total |   138571484   941  147259.813               Root MSE      =  373.28

------------------------------------------------------------------------------
  tmxcon |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 totminc |   .0188845    .002558      7.382   0.000       .0138644    .0239046
   _cons |   481.9917   13.07734     36.857   0.000       456.3276    507.6559
------------------------------------------------------------------------------
regress logcon loginc
  Source |       SS       df       MS                  Number of obs =     922
---------+------------------------------               F(  1,   920) =  301.65
   Model |  120.032947     1  120.032947               Prob > F      =  0.0000
Residual |  366.082745   920  .397916027               R-squared     =  0.2469
---------+------------------------------               Adj R-squared =  0.2461
   Total |  486.115692   921  .527812912               Root MSE      =  .63081

------------------------------------------------------------------------------
  logcon |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  loginc |   .2916718   .0167934     17.368   0.000       .2587139    .3246297
   _cons |   4.032556   .1161886     34.707   0.000       3.804531    4.260582
------------------------------------------------------------------------------
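Because both variables enter in logs, the slope of this second regression can be read as an elasticity. Writing out the fitted equation with the coefficients from the output gives the following (the percentage reading is the usual small-change approximation):

```latex
\widehat{\ln(\text{tmxcon})} = 4.032556 + 0.2916718\,\ln(\text{totminc}),
\qquad
\frac{\partial \ln(\text{tmxcon})}{\partial \ln(\text{totminc})} \approx 0.29
```

In words, a 1% increase in total monthly income is associated with roughly a 0.29% increase in total monthly food value, with a 95% confidence interval of about 0.26% to 0.32%.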
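Our suspicion is confirmed. The regression on the logarithms explains about four and a half times as much of the variation as the regression on levels (an R-squared of 0.2469 vs. 0.0548). The F statistic is also much larger in the logarithmic model (301.65 vs. 54.50), and so is the t statistic on the slope (17.368 vs. 7.382); with a single regressor these carry the same information, since F = t². One caveat: the two R-squared values are not strictly comparable, because the dependent variables differ (logcon vs. tmxcon) and the log model uses 922 rather than 942 observations, since ln() returns missing for the 20 observations where one of the variables is zero or negative.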
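The statistics compared here can be recomputed by hand from the two tables, using only the reported sums of squares and t statistics:

```latex
\begin{aligned}
R^2 &= \frac{SS_{\text{Model}}}{SS_{\text{Total}}}:
&\quad \frac{7594013.55}{138571484} &\approx 0.0548,
&\quad \frac{120.032947}{486.115692} &\approx 0.2469,
&\quad \frac{0.2469}{0.0548} &\approx 4.5 \\
F_{1,\,n-2} &= t^2:
&\quad 7.382^2 &\approx 54.49 \ (\text{reported } 54.50),
&\quad 17.368^2 &\approx 301.65
\end{aligned}
```

The small gap between 54.49 and 54.50 comes from the t statistic being rounded to three decimals in the output.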