QUESTION 2 - ANSWER

Among people who reported working, to what extent does the number of hours worked in a week predict a person's monthly income? Without running any further regressions, what other variables might also help predict a person's monthly income?

To investigate this issue, we first need to exclude the invalid (negative) values of hours_wo. Remember that it is usually a good idea to tabulate or summarize variables before using them, to check for negative values and other coding problems that need to be resolved before you draw any definitive conclusions about relationships. In our case, hours_wo has negative values that should be excluded before the variable is used in a regression. There are several ways of handling this issue; here we choose to create a new variable that contains only positive values, with all other values set to missing. To accomplish this, we use the following commands:

gen hours = hours_wo
replace hours = . if hours <= 0
[Note: we are interested only in people who work, so we also set values of 0 to missing.]
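As a hedged illustration of the check-then-recode workflow, the sketch below builds a tiny toy dataset (the five values are invented, not taken from the survey) and runs the same two recoding commands on it. Because it starts with `clear`, run it in a separate session so it does not discard the real data in memory. One Stata detail matters here: missing values compare as larger than any number, so `hours > 0` is true for a missing value, and the final `assert` therefore checks only non-missing observations.

```stata
* Toy data with invented values standing in for hours_wo
clear
input hours_wo
-8
-1
0
20
45
end

* Inspect every code, including negative ones, before recoding
tabulate hours_wo, missing

* Copy the variable, then set zero and negative values to missing
gen hours = hours_wo
replace hours = . if hours <= 0

* Confirm that every remaining non-missing value is positive
assert hours > 0 if !missing(hours)

* Number of observations left for the regression (2 in this toy data)
count if !missing(hours)
```

On the real data, the same `tabulate hours_wo, missing` step is what reveals the negative codes in the first place.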

Now we can run the regression:

reg incmon hours

Which yields the following results:

  Source |       SS       df       MS                  Number of obs =     621
---------+------------------------------               F(  1,   619) =    0.49
   Model |  2203133.62     1  2203133.62               Prob > F      =  0.4832
Residual |  2.7704e+09   619  4475575.66               R-squared     =  0.0008
---------+------------------------------               Adj R-squared = -0.0008
   Total |  2.7726e+09   620  4471910.43               Root MSE      =  2115.6

------------------------------------------------------------------------------
  incmon |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   hours |  -4.578381   6.525533     -0.702   0.483      -17.39325    8.236484
   _cons |   1785.673   303.2978      5.888   0.000       1190.055     2381.29
------------------------------------------------------------------------------

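It seems that the number of hours a person works does not effectively predict a person's monthly income. How do we know that? The R-squared tells us what share of the variation in the dependent variable (incmon) is explained by the independent variable. From the table above, the R-squared is only 0.0008, so hours explains less than 0.1% of the variation in monthly income. In addition, the coefficient on hours is not statistically significant (P>|t| = 0.483), and its 95% confidence interval (-17.39 to 8.24) includes zero.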
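The headline numbers in the output are tied together by standard OLS identities. As a worked check using only the printed values (so results agree up to rounding), R-squared is the model sum of squares divided by the total sum of squares, the t statistic is the coefficient divided by its standard error, and with a single regressor the overall F statistic equals that t statistic squared. The critical value of about 1.9638 for 619 residual degrees of freedom is taken from the t distribution, not from the output.

```latex
\begin{aligned}
R^2 &= \frac{SS_{\text{Model}}}{SS_{\text{Total}}} = \frac{2{,}203{,}133.62}{2.7726 \times 10^{9}} \approx 0.0008 \\
t_{\text{hours}} &= \frac{\hat\beta_{\text{hours}}}{\widehat{\mathrm{SE}}(\hat\beta_{\text{hours}})} = \frac{-4.578381}{6.525533} \approx -0.702 \\
F &= \frac{MS_{\text{Model}}}{MS_{\text{Residual}}} = \frac{2{,}203{,}133.62}{4{,}475{,}575.66} \approx 0.49 \approx t_{\text{hours}}^{2} \\
\text{95\% CI} &= \hat\beta_{\text{hours}} \pm t_{0.975,\,619}\,\widehat{\mathrm{SE}}(\hat\beta_{\text{hours}}) \approx -4.578381 \pm 1.9638 \times 6.525533 \approx [-17.39,\ 8.24]
\end{aligned}
```

The F test and the t test on hours therefore carry the same information here (both p-values are about 0.483); they diverge only once more regressors are added.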

Without running any further regressions, what other variables might be important in predicting income? Someone living in South Africa might guess that province, urban versus rural residence, and race all play a role in predicting income. The following sections show how to run regressions that include these kinds of additional variables.
