The first part of the answer is that, so far in modules 6 and 7, we have only examined household-level variables, which is why we included the condition "keep if rel_head==1". Net pay (p_netpay) is an individual-level variable: it can take a different value for every person in the data set. We can see this by sorting by hhid and listing the two variables side by side.
sort hhid
list hhid p_netpay
         hhid   p_netpay
  1.     1006       1036
  2.     1008          .
  3.     1012        330
  4.     2001          .
  5.     2001          .
  6.     2001          .
  7.     2001          .
  8.     2001          .
  9.     2001          .
 10.     2001          .
 11.     2001        450
 12.     2008          .
 13.     2008          .
 14.     2008          .
 15.     2008          .
 16.     2012          .
 17.     2012        192
 18.     2014          .
 19.     2014        350
 20.     2014          .
 21.     2014        800
 22.     2014          .
 23.     2025          .
 24.     2025          .
 25.     2025          .
--more--
This is only a small excerpt of the full listing, but you can clearly see different net pay amounts for different people within a household (household 2014, for example, has members earning 350 and 800). So, if you have been working with the data restricted to heads of household, it is now time to clear the data and reopen the data set so that we can examine all the individuals in it.
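Reopening the data might look like the sketch below. The file name is hypothetical (it is not given in this module), so substitute the path to your own copy. The tabulation of rel_head is only a quick check that household members other than the head are back in memory.

```stata
* Hypothetical file name: substitute the path to your copy of the data set.
* The clear option discards the head-of-household subset currently in memory.
use "survey_data.dta", clear

* Confirm that all household members are present, not only the heads.
tab rel_head
count
```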
Two data cleaning procedures must be carried out before running a regression for this question. The first is recoding the education variable so that it behaves more like a continuous variable.
tab educ_c
6 :education code |      Freq.     Percent        Cum.
------------------+-----------------------------------
               -4 |         26        0.51        0.51
               -3 |          2        0.04        0.54
          00-none |       1300       25.27       25.82
         01-sub a |        663       12.89       38.71
         02-std 2 |        293        5.70       44.40
         03-std 3 |        315        6.12       50.52
         04-std 4 |        331        6.43       56.96
         05-std 5 |        374        7.27       64.23
         06-std 6 |        435        8.46       72.69
         07-std 7 |        252        4.90       77.59
         08-std 8 |        326        6.34       83.92
         09-std 9 |        224        4.35       88.28
         10-std 1 |        344        6.69       94.97
         11-std 7 |         10        0.19       95.16
         12-std 1 |         29        0.56       95.72
         13-std 1 |          6        0.12       95.84
         14-std 1 |         54        1.05       96.89
         15-std 1 |         19        0.37       97.26
         16-compl |         35        0.68       97.94
         17-crech |         68        1.32       99.26
         18-pre-p |         35        0.68       99.94
         19-other |          3        0.06      100.00
------------------+-----------------------------------
            Total |       5144      100.00
As discussed earlier under the "number of observations" section, this data set uses codes for validly missing responses (the negative codes in the table above). In addition, the three codes past the completion of college (17, 18, and 19) do not fit the ordering of the scale, so the variable is not continuous as coded. To recode education correctly, click here. Once the previous two steps have been completed, we need to generate two dummy variables representing male and female.
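For readers without access to the linked recoding steps, one plausible version is sketched here. It is an assumption, not necessarily the module's own recode: it treats -4 and -3 as the missing-value codes, sets codes 17 to 19 to missing because they fall outside the ordered schooling scale, and stores the result in educ_ne, the name used in the regressions that follow.

```stata
* Assumption: -4 and -3 are the missing-value codes, and codes 17-19
* fall outside the ordered schooling scale, so all are set to missing.
* The original variable educ_c is left unchanged.
recode educ_c (-4 -3 = .) (17/19 = .), gen(educ_ne)

* Check the recoded variable, showing missing values as well.
tab educ_ne, missing
```

With a recoded education variable in place, the gender dummies are generated as follows.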
tab gender_n, gen(gender)
reg p_netpay educ_ne gender2
  Source |       SS       df       MS                  Number of obs =     609
---------+------------------------------               F(  2,   606) =  101.19
   Model |   244571399     2   122285700               Prob > F      =  0.0000
Residual |   732355621   606  1208507.63               R-squared     =  0.2503
---------+------------------------------               Adj R-squared =  0.2479
   Total |   976927021   608  1606787.86               Root MSE      =  1099.3

------------------------------------------------------------------------------
p_netpay |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 educ_ne |   153.9233    11.2919     13.631   0.000       131.7473    176.0993
 gender2 |   368.4642   90.34464      4.078   0.000       191.0376    545.8908
   _cons |  -323.6353   99.05085     -3.267   0.001      -518.1599   -129.1108
------------------------------------------------------------------------------
As you can see, being male in South Africa has an additive effect on net pay, as does education. Controlling for education, a man earns about 368 Rand more per month in net pay than his female counterpart. In other words, if a woman and a man had the same amount of education, the woman would need more than two additional years of education to earn at least as much as the man, since each year of education adds about 154 Rand.
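The "more than two years" figure follows directly from the two coefficients in the output above. As a worked check, dividing the gender2 coefficient by the educ_ne coefficient gives the number of extra years of education needed to offset the gap:

```latex
\[
\frac{\hat{\beta}_{\text{gender2}}}{\hat{\beta}_{\text{educ\_ne}}}
  = \frac{368.4642}{153.9233} \approx 2.39 \text{ years}
\]
```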
Now, if we believe that there is an interaction effect, we must create a variable equal to education multiplied by the male dummy. To do this, type:
gen maleint=gender2*educ_ne
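As an aside, readers on Stata 11 or later can fit the same model without building the product variable by hand, using factor-variable notation. This is an alternative sketch, not the approach this module takes, and it assumes gender2 is the 0/1 male dummy created above. The coefficient labels in the output will differ, but the fitted model is the same.

```stata
* Requires Stata 11 or later (factor-variable notation).
* c. marks educ_ne as continuous, i. marks gender2 as categorical,
* and ## includes both main effects plus their interaction.
regress p_netpay c.educ_ne##i.gender2
```

The walkthrough continues with the hand-built maleint variable.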
Then compute the regression.
reg p_netpay educ_ne gender2 maleint
  Source |       SS       df       MS                  Number of obs =     609
---------+------------------------------               F(  3,   605) =   75.22
   Model |   265397786     3  88465928.6               Prob > F      =  0.0000
Residual |   711529235   605  1176081.38               R-squared     =  0.2717
---------+------------------------------               Adj R-squared =  0.2681
   Total |   976927021   608  1606787.86               Root MSE      =  1084.5

------------------------------------------------------------------------------
p_netpay |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 educ_ne |   94.64908   17.95806      5.271   0.000       59.38138    129.9168
 gender2 |  -237.8644   169.4218     -1.404   0.161      -570.5905    94.86179
 maleint |   96.34509   22.89504      4.208   0.000        51.3817    141.3085
   _cons |   49.51226   131.9498      0.375   0.708      -209.6231    308.6476
------------------------------------------------------------------------------
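So we can see here that once the interaction term is included, both the coefficient on being male and the constant term become insignificant. The interaction term itself, however, is significant (coefficient 96.35, t = 4.208, P>|t| = 0.000). In this model the gender2 coefficient measures the male-female gap only at zero years of education, so its insignificance does not mean that gender no longer matters. Instead, each year of education is associated with about 95 Rand more net pay for women and about 191 Rand more (94.65 + 96.35) for men. On this output, the effect of being male on net pay in South Africa is interactive, growing with education, rather than a fixed additive amount.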
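To make the coefficients of the interaction regression concrete, the fitted equation can be written out separately for women (gender2 = 0) and men (gender2 = 1). This is only arithmetic on the point estimates in the output above and ignores their standard errors, so the crossover point in particular should be read as a rough illustration.

```latex
\begin{align*}
\widehat{\text{netpay}}_{\text{women}} &= 49.51 + 94.65\,\text{educ} \\
\widehat{\text{netpay}}_{\text{men}}   &= (49.51 - 237.86) + (94.65 + 96.35)\,\text{educ}
                                        = -188.35 + 190.99\,\text{educ} \\
\text{lines cross where}\quad 237.86 &= 96.35\,\text{educ}
  \;\Longrightarrow\; \text{educ} \approx 2.47 \text{ years}
\end{align*}
```

Above roughly two and a half years of education, the point estimates predict higher net pay for men, and the predicted gap widens by about 96 Rand with each additional year.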