The variables of interest here are stxfood (the monetary value of food eaten out in a month) and totminc (total monthly income at the household level). Because both of these variables are measured at the household level, keep only the observations for the survey respondent and drop everyone else.
keep if pers_res==1
Then run the regression to generate the output table:
regress stxfood totminc
  Source |       SS       df       MS                  Number of obs =     958
---------+------------------------------               F(  1,   956) =  122.26
   Model |  497515.095     1  497515.095               Prob > F      =  0.0000
Residual |  3890284.87   956  4069.33564               R-squared     =  0.1134
---------+------------------------------               Adj R-squared =  0.1125
   Total |  4387799.96   957  4584.95294               Root MSE      =  63.791

------------------------------------------------------------------------------
 stxfood |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 totminc |   .0048313   .0004369     11.057   0.000       .0039738    .0056887
   _cons |   13.63799   2.215559      6.156   0.000       9.290068    17.98591
------------------------------------------------------------------------------
The t-statistic of 11.057 is well above the critical value of 1.96, so income
has a statistically significant effect (at the 5% level) on the amount of money
spent eating out.
The coefficient of .0048, however, suggests that the effect is not very large.
An income increase of one Rand is associated with a household spending .0048
Rand more on eating out.
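The two numbers discussed above can be re-derived from the results Stata stores after the regression. This is only a sketch, and it assumes it is typed immediately after the regress command above. The 1000 Rand increase is an illustrative figure, not something taken from the survey.

```stata
* t-statistic: coefficient divided by its standard error
display _b[totminc]/_se[totminc]
* predicted change in stxfood for an illustrative 1000 Rand rise in income
display _b[totminc]*1000
```

The first line should reproduce the t-statistic in the table. The second works out to roughly 4.8 Rand (.0048313 x 1000), which puts the small per-Rand coefficient on a more readable scale.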
Next, predict the regression values:
predict outhat
Now graph the results with the following command:
graph stxfood outhat totminc, connect(.s) symbol(oi) ylabel xlabel
As the graph shows, nearly all of the data are squeezed into the left side of the picture, which suggests that outliers with very high incomes are a major problem. To deal with this, drop any observation whose income is more than 50,000 Rand per month and see whether the line fits better.
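Before dropping anything, it may be worth checking which observations lie above the cutoff. The commands below are a sketch of one way to do that. The explicit test for missing values is a precaution, because Stata treats a missing value as larger than any number, so totminc>50000 is also true when totminc is missing.

```stata
* distribution of income, including the largest values
summarize totminc, detail
* how many non-missing incomes exceed the cutoff, and what they look like
count if totminc>50000 & totminc!=.
list totminc stxfood if totminc>50000 & totminc!=.
```

Once the high-income observations have been inspected, the drop itself is: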
drop if totminc>50000
regress stxfood totminc
  Source |       SS       df       MS                  Number of obs =     957
---------+------------------------------               F(  1,   955) =  205.92
   Model |  764623.026     1  764623.026               Prob > F      =  0.0000
Residual |  3546160.96   955  3713.25755               R-squared     =  0.1774
---------+------------------------------               Adj R-squared =  0.1765
   Total |  4310783.98   956  4509.18827               Root MSE      =  60.937

------------------------------------------------------------------------------
 stxfood |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 totminc |   .0111038   .0007738     14.350   0.000       .0095853    .0126223
   _cons |   3.102669   2.382612      1.302   0.193      -1.573091    7.778429
------------------------------------------------------------------------------
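If you compare this output table to the previous one, you will see a higher t-statistic (14.350 versus 11.057) and a higher R-squared (0.1774 versus 0.1134). Both indicate that the regression line now fits better. Note that the regression sample shrank by only one observation (957 instead of 958), yet the coefficient on totminc more than doubled, from .0048 to .0111. Re-predict the values and re-graph to see this difference.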
predict out1hat
graph stxfood out1hat totminc, connect(.s) symbol(oi) ylabel xlabel
The data are definitely more spread out across this picture. Removing the outlier was a productive thing to do.
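If you would rather not delete observations from the data in memory, the same restriction can be applied with an if qualifier. This is a sketch of an alternative, not the approach used above. It assumes the drop command has not been run, and out2hat is a hypothetical variable name chosen for illustration.

```stata
* same regression on incomes up to 50,000 Rand, leaving the data intact
regress stxfood totminc if totminc<=50000
* fitted values only for the observations used (out2hat is a made-up name)
predict out2hat if totminc<=50000
```

Because the estimation sample is the same, this should give the same coefficients as the regression run after the drop, while keeping the high-income household available for later analysis.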