To answer this question, we must first clean up our data (just as we did in Exercise 1) and then create four dummy variables, one for each value of the variable race. To create the dummy variables, enter the following in the Stata Command window:
tab race, gen(race)
This will generate four dummy variables for race: race1, race2, race3, and race4.
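As a quick check that the dummies were created as expected, something like the following could be run. This is only a sketch: it assumes tab race, gen(race) has already been executed, and which category each dummy corresponds to depends on how race is coded in your data.

```stata
* List the newly created dummy variables and their labels
describe race1-race4

* Cross-tabulate race against one dummy: race2 should equal 1
* only for the category it represents and 0 for all others
tab race race2

* The mean of each dummy is the share of observations in that category
summarize race1-race4
```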
To answer our question, we regress household size on three of the four dummies. We leave out race2 because it represents the value "Coloured", the reference category against which we are comparing the household size of the Indian population group. Enter the following in the Stata Command window:
reg hhsizem race1 race3 race4
This results in the following table:
  Source |       SS       df       MS                  Number of obs =    4196
---------+------------------------------               F(  3,  4192) =   82.43
   Model |  2147.71946     3  715.906488               Prob > F      =  0.0000
Residual |  36406.2624  4192  8.68470001               R-squared     =  0.0557
---------+------------------------------               Adj R-squared =  0.0550
   Total |  38553.9819  4195  9.19046052               Root MSE      =   2.947

------------------------------------------------------------------------------
 hhsizem |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   race1 |    .107201   .1682205      0.637   0.524      -.2226003    .4370024
   race3 |  -.3357978   .2943267     -1.141   0.254       -.912834    .2412385
   race4 |  -1.892201   .1973674     -9.587   0.000      -2.279146   -1.505257
   _cons |   4.800587    .159588     30.081   0.000       4.487709    5.113464
------------------------------------------------------------------------------
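We can see from the table that the estimated effect on household size of being Indian relative to being Coloured is a decrease of about .34 persons (the coefficient on race3 is -.3357978). Note, however, that this difference is not statistically significant at the 5% level: the p-value is 0.254 and the 95% confidence interval (-.912834 to .2412385) includes zero.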
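The group means implied by the regression can be recovered from the stored coefficients. The following is a minimal sketch, assuming the regression above is the most recent estimation command in the session:

```stata
* Predicted mean household size for the omitted (Coloured) group
display _b[_cons]

* Predicted mean for the Indian group: constant plus the race3 coefficient
display _b[_cons] + _b[race3]

* The same sum, reported with a standard error and confidence interval
lincom _cons + race3
```

Using the values in the table, the second command works out to 4.800587 - .3357978 = 4.4647892, or roughly 4.46 persons, compared with about 4.80 for the Coloured reference group.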