QUESTION 6 - ANSWER

In figuring out what predicts someone's net pay, is there an interaction effect between education and gender? Compute a regression where education and gender are the independent variables explaining someone's net pay.

The first part of this answer is a matter of scope. So far in modules 6 and 7 we have examined only household-level variables, which is why we included the command "keep if rel_head==1". Net pay (p_netpay) is an individual-level variable: it is recorded separately for every person in the data set. We can see this by sorting by hhid and listing the two variables together.

sort hhid
list hhid p_netpay

           hhid   p_netpay
   1.      1006       1036
   2.      1008          .
   3.      1012        330
   4.      2001          .
   5.      2001          .
   6.      2001          .
   7.      2001          .
   8.      2001          .
   9.      2001          .
  10.      2001          .
  11.      2001        450
  12.      2008          .
  13.      2008          .
  14.      2008          .
  15.      2008          .
  16.      2012          .
  17.      2012        192
  18.      2014          .
  19.      2014        350
  20.      2014          .
  21.      2014        800
  22.      2014          .
  23.      2025          .
  24.      2025          .
  25.      2025          .
--more--

This is only a small excerpt of the full list, but you can clearly see different net pay amounts for different people within a household (for example, household 2014 has values of 350 and 800). So, if you have been working with this data set restricted to the heads of households, it is now time to clear the data and reopen the data set so that we can examine every individual in it.
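Reloading the full data set could look like the sketch below. The file name survey.dta is hypothetical, so substitute the data set used in this course; the two checks afterwards are only suggestions for confirming that the household-head restriction is gone.

```stata
* Hypothetical file name; substitute the data set used in this course.
clear
use "survey.dta"

* rel_head should now take more than one value, not just 1.
tab rel_head

* Count how many individuals report a net pay value.
count if !missing(p_netpay)
```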

There are two data cleaning procedures that must occur before running a regression for this question. The first is recoding the education variable so that it behaves more like a continuous variable.

tab educ_c

6:education |
       code |      Freq.     Percent        Cum.
------------+-----------------------------------
         -4 |         26        0.51        0.51
         -3 |          2        0.04        0.54
    00-none |       1300       25.27       25.82
   01-sub a |        663       12.89       38.71
   02-std 2 |        293        5.70       44.40
   03-std 3 |        315        6.12       50.52
   04-std 4 |        331        6.43       56.96
   05-std 5 |        374        7.27       64.23
   06-std 6 |        435        8.46       72.69
   07-std 7 |        252        4.90       77.59
   08-std 8 |        326        6.34       83.92
   09-std 9 |        224        4.35       88.28
   10-std 1 |        344        6.69       94.97
   11-std 7 |         10        0.19       95.16
   12-std 1 |         29        0.56       95.72
   13-std 1 |          6        0.12       95.84
   14-std 1 |         54        1.05       96.89
   15-std 1 |         19        0.37       97.26
   16-compl |         35        0.68       97.94
   17-crech |         68        1.32       99.26
   18-pre-p |         35        0.68       99.94
   19-other |          3        0.06      100.00
------------+-----------------------------------
      Total |       5144      100.00

As discussed earlier under the "number of observations" section, this data set uses special codes for validly missing responses (here, apparently the negative codes -4 and -3). In addition, the three codes that come after completing college (17-crech, 18-pre-p, and 19-other) do not fit the ordering of the other values, so the variable cannot be treated as continuous as it stands. The recoding procedure is described on a separate linked page; the regressions below use the recoded variable, educ_ne. Once education has been recoded, we need to generate two dummy variables representing male and female.

tab gender_n, gen(gender)
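
Before running the regression, it may be worth confirming what the new variables contain. The sketch below assumes that the tab command above created gender1 and gender2 and that educ_ne already exists from the recoding step. The cross-tabulation shows which category of gender_n is flagged by gender2, which this answer treats as the male dummy.

```stata
* Show which category of gender_n corresponds to gender2 == 1.
tab gender_n gender2

* Inspect the recoded education variable, including any missing values.
tab educ_ne, missing
```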

reg p_netpay educ_ne gender2

  Source |       SS       df       MS                  Number of obs =     609
---------+------------------------------               F(  2,   606) =  101.19
   Model |   244571399     2   122285700               Prob > F      =  0.0000
Residual |   732355621   606  1208507.63               R-squared     =  0.2503
---------+------------------------------               Adj R-squared =  0.2479
   Total |   976927021   608  1606787.86               Root MSE      =  1099.3
------------------------------------------------------------------------------
p_netpay |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 educ_ne |   153.9233    11.2919     13.631   0.000       131.7473    176.0993
 gender2 |   368.4642   90.34464      4.078   0.000       191.0376    545.8908
   _cons |  -323.6353   99.05085     -3.267   0.001      -518.1599   -129.1108
------------------------------------------------------------------------------

As you can see, both being male and education have an additive, positive effect on net pay in South Africa. Controlling for education, a man earns about 368 Rand more per month in net pay than his female counterpart. In other words, given a man and a woman with the same amount of education, the woman would need roughly 2.4 additional years of education (368.46 / 153.92) to earn the same predicted amount as the man.
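
The arithmetic behind this comparison can be checked with Stata's display command. This is a minimal sketch using coefficients copied from the output above; the choice of 10 years of education is only an illustrative value, not one taken from the original answer.

```stata
* Coefficients copied from the additive regression output above.
* Years of education needed to offset the male coefficient.
display 368.4642 / 153.9233

* Predicted monthly net pay at 10 years of education, for a woman.
display -323.6353 + 153.9233*10

* The same prediction for a man adds the gender2 coefficient.
display -323.6353 + 153.9233*10 + 368.4642
```

The first line gives about 2.39 years, and the two predictions come to about 1,216 and 1,584 Rand, a gap equal to the gender2 coefficient.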

Now, if we believe that there is an interaction effect, we must create a variable equal to education multiplied by the male dummy. To do this, type:

gen maleint=gender2*educ_ne

Then compute the regression.

reg p_netpay educ_ne gender2 maleint

  Source |       SS       df       MS                  Number of obs =     609
---------+------------------------------               F(  3,   605) =   75.22
   Model |   265397786     3  88465928.6               Prob > F      =  0.0000
Residual |   711529235   605  1176081.38               R-squared     =  0.2717
---------+------------------------------               Adj R-squared =  0.2681
   Total |   976927021   608  1606787.86               Root MSE      =  1084.5
------------------------------------------------------------------------------
p_netpay |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 educ_ne |   94.64908   17.95806      5.271   0.000       59.38138    129.9168
 gender2 |  -237.8644   169.4218     -1.404   0.161      -570.5905    94.86179
 maleint |   96.34509   22.89504      4.208   0.000        51.3817    141.3085
   _cons |   49.51226   131.9498      0.375   0.708      -209.6231    308.6476
------------------------------------------------------------------------------

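Here we can see that once the interaction term is included, the main effect of being male and the constant term both become insignificant, while the interaction term itself is significant (coefficient 96.35, t = 4.208, P>|t| = 0.000). The insignificant gender2 coefficient now describes only the male-female gap at zero years of education; it does not mean that gender has stopped mattering. The significant maleint coefficient indicates that the effect of being male on net pay is interactive rather than purely additive: each additional year of education is associated with about 95 Rand more net pay for women and about 191 Rand more (94.65 + 96.35) for men.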
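
The interaction model can also be read as two separate lines, one for each gender. The sketch below uses coefficients copied from the output above. The lincom line assumes it is run directly after the interaction regression, and the final line assumes Stata 11 or later, where factor-variable notation fits the same model without creating maleint by hand.

```stata
* Education slope for women (gender2 == 0) is the educ_ne coefficient, 94.64908.
* Education slope for men (gender2 == 1) adds the interaction coefficient.
display 94.64908 + 96.34509

* Education level at which the fitted male-female gap is zero.
display 237.8644 / 96.34509

* Estimate and standard error of the male slope (run after the regression).
lincom educ_ne + maleint

* Equivalent model using factor-variable notation (Stata 11 or later).
regress p_netpay c.educ_ne##i.gender2
```

The first calculation gives a male slope of about 191 Rand per year of education, and the second shows the fitted gap changing sign at about 2.5 years of education.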
