Before running a regression, we want to see how STATA will code the languages.
tab lang_cod
         21 |
  :language |
       code |      Freq.     Percent        Cum.
------------+-----------------------------------
   01-engli |        105       10.61       10.61
   02-afrik |        132       13.33       23.94
   03-xhosa |        178       17.98       41.92
    04-zulu |        203       20.51       62.42
   05-tswan |        110       11.11       73.54
   06-north |         99       10.00       83.54
   07-south |         64        6.46       90.00
   08-venda |         14        1.41       91.41
   09-shang |         37        3.74       95.15
   10-swazi |         26        2.63       97.78
   11-ndebe |         15        1.52       99.29
   12-other |          7        0.71      100.00
------------+-----------------------------------
      Total |        990      100.00
We now know that STATA will code English as 1, Afrikaans as 2, Xhosa as 3, etc. Now we can run the regression with totmexp as the dependent variable and totminc and lang_cod as the independent variables. With a dozen categories in lang_cod, it makes particular sense to use the xi prefix when running this regression, so that STATA creates the dummy variables for us.
xi: regress totmexp totminc i.lang_cod
i.lang_cod        Ilang_1-12   (naturally coded; Ilang_1 omitted)

  Source |       SS       df       MS              Number of obs =     956
---------+------------------------------           F( 12,   943) =   74.03
   Model |  1.3469e+09    12   112241344           Prob > F      =  0.0000
Residual |  1.4298e+09   943  1516219.62           R-squared     =  0.4851
---------+------------------------------           Adj R-squared =  0.4785
   Total |  2.7767e+09   955  2907530.09           Root MSE      =  1231.3

------------------------------------------------------------------------------
 totmexp |      Coef.  Std. Err.          t   P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 totminc |   .1274279   .0090614     14.063   0.000       .1096451    .1452107
 Ilang_2 |  -939.8534   170.0297     -5.528   0.000      -1273.534   -606.1729
 Ilang_3 |  -2666.121   164.0949    -16.247   0.000      -2988.155   -2344.088
 Ilang_4 |  -2257.174    159.787    -14.126   0.000      -2570.753   -1943.595
 Ilang_5 |  -2345.244   178.4947    -13.139   0.000      -2695.537   -1994.951
 Ilang_6 |  -2502.428   182.6328    -13.702   0.000      -2860.842   -2144.015
 Ilang_7 |  -2556.602   205.0399    -12.469   0.000      -2958.989   -2154.215
 Ilang_8 |  -2465.706   355.1803     -6.942   0.000      -3162.741   -1768.671
 Ilang_9 |  -2509.393   243.0882    -10.323   0.000       -2986.45   -2032.337
Ilang_10 |  -2208.885    280.332     -7.880   0.000      -2759.032   -1658.738
Ilang_11 |  -2395.913   344.7786     -6.949   0.000      -3072.535   -1719.291
Ilang_12 |  -823.2165    519.294     -1.585   0.113      -1842.322    195.8891
   _cons |   3307.052   136.4961     24.228   0.000       3039.181    3574.923
------------------------------------------------------------------------------
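To see what the xi prefix did behind the scenes, the dummy variables can also be built by hand. The following is only a rough sketch of that idea: the stub name langd is a hypothetical choice made for illustration, and it assumes the newly created variables sit next to each other in the dataset so that the range langd2-langd12 picks up all eleven of them.

```stata
* Sketch only: create one 0/1 indicator per language category.
* "langd" is a hypothetical stub name; this creates langd1 ... langd12.
tabulate lang_cod, generate(langd)

* Leave out langd1 (English) so that it serves as the comparison category.
regress totmexp totminc langd2-langd12
```

Under these assumptions the coefficients on langd2 through langd12 should match those reported for Ilang_2 through Ilang_12, which is why xi is the more convenient route when a variable has many categories.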
Notice that English (Ilang_1) was dropped from the regression; it is therefore the reference category to which households speaking each of the other languages are compared. Because all of the coefficients for the other languages are negative, English-speaking households have the highest predicted total monthly expenditure at any given level of income (although the coefficient for "other", Ilang_12, is not statistically different from zero at the 5% level, with p = 0.113). Households speaking the language with the largest negative coefficient have the lowest total monthly expenditure on average, again holding income constant. This is Xhosa, with a coefficient of -2666.12. The coefficient for Xhosa represents the average expenditure difference between Xhosa-speaking and English-speaking households at the same level of income.
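As a worked illustration of this interpretation, the fitted equation can be written out and evaluated for two households with the same income. The income value of 2000 is a hypothetical figure chosen only for the arithmetic; the coefficients are taken from the output above, and the align* environment assumes the amsmath package.

```latex
\begin{align*}
\widehat{\text{totmexp}} &= 3307.052 + 0.1274279\,\text{totminc}
    + \sum_{k=2}^{12} \hat{\beta}_k\,\text{Ilang}_k \\[4pt]
\text{English, totminc} = 2000:\quad
  & 3307.052 + 0.1274279 \times 2000 \approx 3561.91 \\
\text{Xhosa, totminc} = 2000:\quad
  & 3307.052 + 0.1274279 \times 2000 - 2666.121 \approx 895.79 \\
\text{Difference (Xhosa minus English)}:\quad
  & 895.79 - 3561.91 = -2666.12
\end{align*}
```

The difference equals the Xhosa coefficient whatever income value is plugged in, because the income term cancels when the two predictions are subtracted.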