The variables we need to answer this question have already been used in this module, so there is probably no need to look them up: we will use incmon and satisfie. We answer the question in steps. The first step is to find the median level of household income. To do this you could type:
sum incmon, detail
                      monthly gross pay
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            8              0
10%          120              0       Obs                 636
25%          320              0       Sum of Wgt.         636

50%        879.5                      Mean           1585.805
                        Largest       Std. Dev.      2106.602
75%         1950          12000
90%         3939          15000       Variance        4437771
95%         6000          15000       Skewness       2.976268
99%        10000          16400       Kurtosis       14.93181
to learn that the median level of income is 879.5 Rand.
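Rather than copying the median from the output by hand, it can also be read from the results that summarize leaves behind. The following is only a sketch: the scalar name med is a hypothetical choice, not something used elsewhere in this module.

```stata
* Sketch: keep the median of incmon in a scalar instead of retyping 879.5.
* The scalar name med is hypothetical.
quietly summarize incmon, detail
scalar med = r(p50)

* Show the stored value; it should match the 50% row of the detail output.
display "Median of incmon: " scalar(med)
```

The stored value can then stand in for the typed number in later if conditions, for example incmon >= scalar(med).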
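Then we can investigate the mean of satisfie. However, we have to be careful,
because satisfie is a household-level variable: its value is repeated for every
member of the household, so larger households would count more often. Thus we
need to create a new variable that holds only one value per household, which
eliminates the household-size bias. Provided the data are sorted by hhid, we can
create the new variable using [_n]: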
gen satisfi2=satisfie if hhid[_n] ~= hhid[_n-1]
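The comparison of hhid[_n] with hhid[_n-1] only picks out one row per household when all members of a household sit next to each other in the data. A minimal sketch of how one might make sure of that and check the result is shown below; the variable name hhfirst is hypothetical.

```stata
* Sketch: sort so that members of the same household are adjacent.
sort hhid

* Flag the first observation within each household (hhfirst is a hypothetical name).
by hhid: gen byte hhfirst = (_n == 1)

* After sorting, these two counts should agree: both give the number of households.
count if hhfirst == 1
count if hhid[_n] ~= hhid[_n-1]
```

If the two counts differ before sorting but agree afterwards, the data were not in household order and the gen command above should be rerun after the sort.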
Now, let's find out what the means are.
means satisfi2 if incmon >= 879.5 & incmon ~= .
Variable |    Type        Obs       Mean     [95% Conf. Interval]
---------+----------------------------------------------------------
satisfi2 | Arithmetic     266   2.879699     2.725533   3.033866
         |  Geometric     266   2.586001     2.440213   2.740498
         |   Harmonic     266   2.294422     2.157769   2.449554
---------+----------------------------------------------------------
means satisfi2 if incmon < 879.5 & incmon ~= .
Variable |    Type        Obs       Mean     [95% Conf. Interval]
---------+----------------------------------------------------------
satisfi2 | Arithmetic     234   3.222222     3.037661   3.406783
         |  Geometric     228   3.035702      2.85658   3.226055
         |   Harmonic     228   2.686567     2.499823   2.903465
---------+----------------------------------------------------------
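We see that the average level of satisfaction among the top half of the group is 2.88, while that of the bottom half is 3.22. What do these numbers imply? To interpret them we need to know how satisfie is coded; for example, tabulate satisfie will show its values. Since lower values indicate greater satisfaction, it seems that the richer half are more satisfied.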
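The same comparison can be obtained in a single table by first building an indicator for the richer half. This is a sketch only: the variable name richhalf is hypothetical, and 879.5 is the median found above.

```stata
* Sketch: 1 = at or above the median, 0 = below; left missing when incmon is missing.
* richhalf is a hypothetical variable name.
gen byte richhalf = incmon >= 879.5 if incmon ~= .

* Mean, standard deviation and frequency of satisfi2 within each half.
tabulate richhalf, summarize(satisfi2)

* The values that satisfie takes, to help interpret the means.
tabulate satisfie
```

The arithmetic means reported for richhalf equal to 1 and 0 should match those from the two means commands.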
Lastly, note that it might seem natural to simply type:
means satisfie if incmon >= 879.5
means satisfie if incmon < 879.5
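If you did this, Stata would count observations with missing incmon as being above the median, because it treats a missing value as larger than any number. In addition, because satisfie is repeated for every household member, the result would be biased by household size. It is always a good idea to use list to check whether a variable we are using is household level or individual level.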
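To see how many observations the unguarded condition would sweep in, one could compare a few counts. This is a small sketch using only the variable and cutoff from above:

```stata
* Sketch: how many observations does the missing-value problem affect?
* All observations that satisfy the unguarded condition (missing incmon included).
count if incmon >= 879.5

* The same condition with the missing-value guard.
count if incmon >= 879.5 & incmon ~= .

* Observations with missing incmon.
count if incmon == .
```

The first count equals the sum of the other two, so the third count is the number of observations that would wrongly be treated as above the median.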