BINARY DEPENDENT VARIABLE ANALYSIS

 

TABLE OF CONTENTS

Introduction
Perceived quality of life, expectations of new government
Recoding: preparation of data for analysis
Exploring and describing the data
Statistical Analysis
Interpretation of Results
Conclusion
Further sources of information

INTRODUCTION

In Module 7 we learned how to compute and interpret multiple (linear) regression, a technique that relies on the method of ordinary least squares (OLS). (Unless you are particularly interested, you need not concern yourself with how OLS is actually computed, but you should know that it is the computational technique underlying multiple regression.) Recall that for linear regression we used the “reg” command, followed by the dependent variable and then the independent variables, to estimate models of the form:

Y = a + b1X1 + b2X2 + ... + bkXk
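
As a reminder of that syntax, here is a minimal sketch. It assumes the example dataset auto that ships with Stata rather than the SALDRU data, so the variables price, mpg and weight are stand-ins only.

```stata
* Minimal sketch on Stata's shipped example data (not the SALDRU data).
* Note: sysuse ..., clear replaces whatever data are in memory.
sysuse auto, clear

* Dependent variable first, then the independent variables:
*   price = a + b1*mpg + b2*weight
regress price mpg weight
```

“reg” is simply the abbreviated form of “regress”.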

Linear regression is the appropriate model for many situations. One situation in which it is not appropriate is when the dependent variable is categorical rather than continuous (see Module 3: Understanding Distributions). In such cases we can turn to a family of models that are estimated by maximum likelihood, such as logit, probit, ordered logit, multinomial logit and nested logit. Recall that one type of categorical variable is a dichotomous, or “dummy,” variable, which takes on two values, usually (but arbitrarily) labelled 0 and 1. For example, “victim of crime” might be coded 1 if the person has been a victim of crime, and 0 if not. Sometimes a variable has more than two categories; for example, the metro variable takes on three values, where rural=1, metro=2, and urban=3. The values assigned to these categories are arbitrary; they simply indicate that rural, metro, and urban are different types of area. Another example of a categorical variable is one indicating how satisfied a person is with their quality of life. This one goes from 1 to 5, with 1 being most satisfied and 5 being most dissatisfied. Because its categories have a natural order, this is an example of an ordered categorical variable.
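
To make the coding of categorical variables concrete, the sketch below builds a tiny made-up dataset with a three-category metro variable and turns it into 0/1 dummy variables. The five observations, the label name metro_lbl and the stub metro_ are invented for illustration.

```stata
* Toy data only: five made-up observations of a three-category variable.
clear
input byte metro
1
2
3
2
1
end

* Attach labels so the arbitrary codes are readable.
label define metro_lbl 1 "rural" 2 "metro" 3 "urban"
label values metro metro_lbl

* tabulate with generate() creates one 0/1 dummy per category:
* metro_1 (rural), metro_2 (metro), metro_3 (urban).
tabulate metro, generate(metro_)
list metro metro_1 metro_2 metro_3
```

The same tabulate-with-generate pattern is used later in this module to create dummy variables for race and province.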

In this module, we present a simple theoretical question and use SALDRU data and ordered logit to explore the issue of perceived quality of life. This module is not intended to teach the mathematics behind maximum likelihood estimation, but rather to highlight further means of statistical analysis that are available in STATA. At the end of the module we point you towards several sources of further information, quite a few of which are available via the internet.
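
For orientation, the ordered logit command has the same shape as “reg”: the dependent variable comes first, followed by the independent variables. The sketch below assumes Stata's shipped auto dataset, in which rep78 is a repair record coded 1 to 5; it is a stand-in for an ordered outcome such as satisfaction with life, not part of the SALDRU analysis.

```stata
* Minimal sketch on Stata's shipped example data (not the SALDRU data).
* Note: sysuse ..., clear replaces whatever data are in memory.
sysuse auto, clear

* rep78 is an ordered categorical outcome (1 = worst ... 5 = best).
tabulate rep78

* Ordered logit: dependent variable first, then the independent variables.
ologit rep78 mpg weight
```

Observations with a missing value on any variable in the model are dropped automatically, which is why the number of observations reported can be smaller than the size of the dataset.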

PERCEIVED QUALITY OF LIFE

Of particular interest are the questions on satisfaction with life and on expectations of a new government in 1994. Found in Section 9, these questions contribute to our understanding of a household’s assessment of its quality of life.

The first of these two questions is as follows:
"Taking everything into account, how satisfied is this household with the way it lives these days?"

There are five possible responses:

The second question is:
"Suppose we get a new government. Do you think the situation for your household will get better, stay the same, or get worse?"

In trying to explore household quality of life, we have to think about what might influence how satisfied a household is with life. What about its expectations of a new government? This is an important step in analyzing household quality of life because we have to think theoretically about all the possible factors which could affect quality of life and then decide which are the most important to consider in this particular analysis. In general, the factors we thought could best explain satisfaction with life we also thought could provide insight into expectations of a new government.

The explanatory variables we look at include:

-- education
-- income
-- employment status
-- occupation
-- population group (included as a control variable)
-- province (included as a control variable)

In the table below the expected direction of the relationship (positive effect: +, negative effect: -, unclear: + or -) is laid out for each variable listed above. In a more formal presentation these might be presented as hypotheses in addition to, or instead of, a table. At the same time it would be necessary to justify the expected direction of the relationship. Do you think these expected directions are correct? Which ones would you change? Why?

                                  Expected Direction of the Relationship
Variable                          Satisfaction with Life   Expectations of New Government
--------------------------------  -----------------------  ------------------------------
Education                         + or -                   + or -
Income                            +                        + or -
Employment Status (1=employed)    +                        + or -
Occupation_a (1=professional)     +                        -
Occupation_b (1=labourer)         -                        +
Population Group
  Black African                   -                        +
  Coloured                        -                        -
  Indian                          -                        -
  White African                   +                        -
Province                          (see note)               (see note)

Note on province: without prior knowledge about each province it is difficult to form an expectation about the direction of this relationship. If you have more knowledge, you might want to keep track of your expectations and see whether they are confirmed by the statistical analysis.

One final variable to consider is whether or not a household selected either “political settlement” or “peace - cessation of violence” as something the government could do to help improve the household’s living conditions. This is included on the possibility that households that would like to see either of these events occur are more likely to think things will get better with a new government.

At this point it would be great to go right into the statistical analysis. However, before we can do so, we must first prepare the data for the kind of analysis we will be doing here. Since the two dependent variables are household level, we have to make sure that all of our independent variables are household level or are manipulated in some way to be household level.

RECODING: PREPARATION OF DATA FOR ANALYSIS

In the following, all the variables used in the analysis are recoded so that there is one observation per household (using _n); variables that are not already household level are first made household level. In the case of individual-level variables (e.g. education), the value for the head of household is taken as the value for the household. A few other variables have been recoded to better mirror the theoretical construct laid out above (e.g. occupation).

Dependent Variables

New Government (new_govt)

Steps:
1. Look for negative values
2. Create variable without negative values
3. Create dummy variables
4. Generate single observation per household

Code:
    codebook new_govt
    * there are no negative values
    gen new_gov2 = new_govt if new_govt > 0
    tab new_gov2, gen(gov_opinion)
    sort hhid
    gen new_gov3 = new_gov2 if hhid~=hhid[_n-1]

Satisfaction (satisfie)

Steps:
1. Look for negative values
2. Create variable without negative values
3. Create dummy variables
4. Generate single observation per household

Code:
    codebook satisfie
    gen satisfie2 = satisfie if satisfie >= 0
    tab satisfie2, gen(sat_opinion)
    sort hhid
    gen satisfie3 = satisfie2 if hhid~=hhid[_n-1]

Independent Variables

Education (educ_c)

Steps:
1. Look for negative values
2. Create variable without negative values and create continuous variable
3. Create household level variable using head of household’s education
4. Generate single observation per household

Code:
    codebook educ_c
    generate educ_new = educ_c
    * codes 17 and 18 are set to 0
    replace educ_new = 0 if inlist(educ_c, 17, 18)
    * negative codes and code 19 are set to missing
    replace educ_new = . if inlist(educ_c, -1, -3, -4, 19)
    replace educ_new = 9 if educ_c == 11
    * codes 12 to 15 are all set to 12
    replace educ_new = 12 if inlist(educ_c, 12, 13, 14, 15)
    sort educ_new
    * keep the head of household's value and spread it to the household
    gen educ1 = educ_new if rel_head == 1
    egen hheduc = max(educ1), by(hhid)
    sort hhid
    gen hheduc2 = hheduc if hhid~=hhid[_n-1]

Income (totminc)

Steps:
1. Look for negative values
2. Generate a categorical variable containing 10% of the observations in each category; generate single observation per household; create variable without negative values

Code:
    codebook totminc
    sort hhid
    xtile totminc2 = totminc if hhid~=hhid[_n-1] & totminc > 0, nq(10)

Race (race)

Steps:
1. Look for negative values
2. Create a household level variable
3. Generate dummy variables for each category

Code:
    codebook race
    sort hhid
    egen race2 = max(race), by(hhid)
    gen hhrace = race2 if hhid~=hhid[_n-1]
    tab hhrace, gen(hhrace)

Occupation (k_occ_c)

Steps:
1. Look for negative and strange values
2. Create variable without negative and strange values
3. Create household level variable using head of household’s occupation
4. Create a dummy variable for professionals
5. Create a dummy variable for labourers

Code:
    codebook k_occ_c
    tab k_occ_c
    gen occup2 = k_occ_c if k_occ_c > 0 & k_occ_c < 17282
    sort occup2
    gen occup3 = occup2 if rel_head == 1
    egen hhoccup = max(occup3), by(hhid)
    sort hhid
    gen hhoccup2 = hhoccup if hhid~=hhid[_n-1]
    tab hhoccup2, gen(hhoccup2)
    * professional dummy: 1 for categories 1 and 2, 0 for categories 3 to 11
    gen hhoccup3 = 1 if hhoccup2 == 1 | hhoccup2 == 2
    replace hhoccup3 = 0 if inlist(hhoccup2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
    * labourer dummy: 1 for category 10, 0 for all other categories 1 to 11
    gen hhoccup3_2 = 1 if hhoccup2 == 10
    replace hhoccup3_2 = 0 if inlist(hhoccup2, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11)

Employment (unempl_q)

Steps:
1. Look for negative values
2. Create household level variable using head of household’s employment status; create variable without negative values

Code:
    codebook unempl_q
    gen unempl_new = unempl_q if rel_head == 1 & unempl_q > 0
    sort hhid
    egen hhunempl = max(unempl_new), by(hhid)
    gen hhunempl2 = hhunempl if hhid~=hhid[_n-1]

Crime (crime_q)

Steps:
1. Look for negative values
2. Create a unique value for each household; create variable without negative values

Code:
    codebook crime_q
    sort hhid
    gen crime_new = crime_q if crime_q > 0 & hhid~=hhid[_n-1]

Choice (choice1_ - choice3_)

Steps:
1. Look for negative values
2. Create a variable without negative values; create a dummy variable for choice of political settlement
3. Create a variable without negative values; create a dummy variable for choice of peace
4. Create one unique value per household for political settlement
5. Create one unique value per household for peace

Code:
    codebook choice1_
    codebook choice2_
    codebook choice3_
    gen new_political = 0
    replace new_political = 1 if choice1_==18 | choice2_==18 | choice3_==18
    gen new_peace = 0
    replace new_peace = 1 if choice1_==17 | choice2_==17 | choice3_==17
    sort hhid
    gen new_political2 = new_political if hhid~=hhid[_n-1]
    gen new_peace2 = new_peace if hhid~=hhid[_n-1]

Province (province)

Steps:
1. Look for negative values
2. Create a household level variable
3. Generate dummy variables for each category

Code:
    codebook province
    sort hhid
    egen province2 = max(province), by(hhid)
    gen hhprovince = province2 if hhid~=hhid[_n-1]
    tab hhprovince, gen(hhprovince)

EXPLORING AND DESCRIBING THE DATA

Having recoded the data to fit our statistical analysis needs, we can begin with some simple exploration and description of the data using graphs, cross-tabs, and other descriptive techniques. At any point before and during your analysis, you may obtain simple descriptive statistics for the distribution and central tendency of the recoded variables. (See Module 5 for review of these techniques.)

For example, you might be interested in knowing about the occupation variable. How many households are in the new professional category you created? In the labourer category? How about by race? The following command tabulates the professional dummy (hhoccup3) by race. It shows that most black African, coloured, and Indian households fall outside the professional category, with the highest share among black Africans. For example, there are 1,599 households that are both black African and not in the professional category (hhoccup3 = 0). You will see in the key that the second number in each cell is a column percentage. This means, for example, that about 91% of black African households are not in the professional category.

. tab hhoccup3 race, col

+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+

           |            19 :population group
  hhoccup3 |  01-afric   02-colou   03-india   04-white |     Total
-----------+--------------------------------------------+----------
         0 |     1,599        317         94        285 |     2,295 
           |     90.85      88.80      72.87      48.47 |     80.98 
-----------+--------------------------------------------+----------
         1 |       161         40         35        303 |       539 
           |      9.15      11.20      27.13      51.53 |     19.02 
-----------+--------------------------------------------+----------
     Total |     1,760        357        129        588 |     2,834 
           |    100.00     100.00     100.00     100.00 |    100.00 
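
To see where the column percentages come from, the table can be rebuilt from its cell counts alone. This is a minimal sketch using Stata's immediate tabulate command, tabi, with the frequencies copied from the table above; it does not need the survey data.

```stata
* Cell counts copied from the table above.
* First row: hhoccup3 = 0, second row: hhoccup3 = 1;
* columns are the four population groups in the same order.
tabi 1599 317 94 285 \ 161 40 35 303, column

* Any single column percentage can be checked by hand, e.g. the
* first cell is 1,599 out of the 1,760 households in that column:
display %5.2f 100 * 1599 / 1760
```

The display line returns 90.85, matching the first column percentage in the table.
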
As we would expect, whites and Indians are disproportionately employed as professionals: about 52% of white and 27% of Indian households are in the professional category, compared with about 11% of coloured and 9% of black African households. The relationship is not perfect, however, given some limitations with the coding of the occupation categories. That is, some response categories could include both professionals and labourers, so the variable cannot be coded perfectly. The labourer dummy (hhoccup3_2) shows the reverse pattern. Notice, for example, that 99.8% of whites, 99.2% of Indians, 72.3% of coloureds, and 68.2% of black Africans are not in the labourer category (hhoccup3_2 = 0).

. tab hhoccup3_2 race, col
+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+

           |            19 :population group
hhoccup3_2 |  01-afric   02-colou   03-india   04-white |     Total
-----------+--------------------------------------------+----------
         0 |     1,201        258        128        587 |     2,174 
           |     68.24      72.27      99.22      99.83 |     76.71 
-----------+--------------------------------------------+----------
         1 |       559         99          1          1 |       660 
           |     31.76      27.73       0.78       0.17 |     23.29 
-----------+--------------------------------------------+----------
     Total |     1,760        357        129        588 |     2,834 
           |    100.00     100.00     100.00     100.00 |    100.00 
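
The tables above count households rather than individuals because of the one-observation-per-household recoding described in the recoding section. That pattern can be sketched on toy data as follows; the seven rows are made up, and hhid, rel_head and educ are simplified stand-ins for the survey variables.

```stata
* Toy data: three households, seven people. Values are invented.
clear
input hhid rel_head educ
1 1 10
1 2 7
1 3 .
2 1 4
2 2 12
3 2 9
3 1 0
end

* Step 1: keep the value only on the head's row (rel_head == 1).
generate educ1 = educ if rel_head == 1
* Step 2: spread the head's value to every member of the household.
egen hheduc = max(educ1), by(hhid)
* Step 3: keep it on the first row of each household only, so that
* each household contributes exactly one observation.
sort hhid
generate hheduc2 = hheduc if hhid != hhid[_n-1]

list, sepby(hhid)
count if !missing(hheduc2)
```

The final count is 3, one per household, regardless of household size.
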
Now, use simple descriptive techniques to explore the relationship between three key independent variables (race, income, and education) and the dependent variables: new government and level of satisfaction. First, the simple sum command shows that the new government variable has a mean of approximately 1.7. Because the variable is coded 1 (things will get better), 2 (stay the same) and 3 (get worse), a mean below 2 indicates that households lean towards expecting things to get better rather than worse. Keep in mind that the mean of an ordered categorical variable is only a rough summary; the cross-tabulations that follow show the full distribution.

. sum new_gov3
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    new_gov3 |      8117    1.679808    .8493482          1          3
Knowing this is not very helpful, however, unless we understand whether expectations of conditions under a new government vary by race, income, and education (and potentially by other variables of interest to you). First, we would expect to find a relationship between expectations under the new government and race.

. tab new_gov3 race, col chi2
+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+

           |            19 :population group
  new_gov3 |  01-afric   02-colou   03-india   04-white |     Total
-----------+--------------------------------------------+----------
         1 |     4,024        165         71         78 |     4,338 
           |     69.70      39.01      34.30       7.50 |     58.28 
-----------+--------------------------------------------+----------
         2 |       832        101         36        313 |     1,282 
           |     14.41      23.88      17.39      30.10 |     17.22 
-----------+--------------------------------------------+----------
         3 |       917        157        100        649 |     1,823 
           |     15.88      37.12      48.31      62.40 |     24.49 
-----------+--------------------------------------------+----------
     Total |     5,773        423        207      1,040 |     7,443 
           |    100.00     100.00     100.00     100.00 |    100.00 

          Pearson chi2(6) =  1.6e+03   Pr = 0.000
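
As a brief worked note on the test statistic just reported: the 6 in chi2(6) is the degrees of freedom, which follows from the table having 3 rows and 4 columns. The expected count shown for the first cell is computed here from the row, column and grand totals in the table above and is rounded.

```latex
\chi^2 = \sum_{i=1}^{3}\sum_{j=1}^{4} \frac{(O_{ij}-E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{N},
\qquad
\text{df} = (3-1)(4-1) = 6

% Example, first cell (new_gov3 = 1, 01-afric):
E_{11} = \frac{4338 \times 5773}{7443} \approx 3364.7
\quad \text{compared with the observed count } O_{11} = 4024
```

Here O is the observed frequency in a cell and E is the frequency we would expect if expectations did not differ across population groups.
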
The Pearson chi-squared test tells us that the distribution of expectations under the new government differs significantly across the population groups. Most black Africans (about 70%) believe that conditions will get better, whereas most whites (about 62%) believe things will get worse. Of the four groups, whites are the most likely (about 30%) to say that things will stay the same. Now consider the relationship between (a) education level and (b) total monthly income and expectations under the new government by race.

. tab new_gov3 race, sum(hheduc2)
           Means, Standard Deviations and Frequencies of hheduc2

           |          19 :population group
  new_gov3 |  01-afric   02-colou   03-india   04-white |     Total
-----------+--------------------------------------------+----------
         1 | 4.0898204  5.5151515  9.2631579  10.513514 | 4.3942766
           | 3.4342614  3.1036391   3.885044  3.8486767 | 3.6639288
           |      1169         33         19         37 |      1258
-----------+--------------------------------------------+----------
         2 | 3.2932862    6.53125        5.6  10.614286 | 5.7698925
           | 3.1310939  3.3695446  3.3399933  3.3876324 | 4.6079903
           |       283         32         10        140 |       465
-----------+--------------------------------------------+----------
         3 | 4.2846442  6.6428571      8.375  9.9930314 | 7.2451613
           | 3.6399863  3.1064169  2.5162515  3.2061927 | 4.3212544
           |       267         42         24        287 |       620
-----------+--------------------------------------------+----------
     Total | 3.9889471  6.2616822  8.1698113  10.221983 | 5.4216816
           | 3.4320945  3.1959185  3.4179157  3.3215494 | 4.2212194
           |      1719        107         53        464 |      2343
Within each population group, there does not seem to be a strong relationship between education level (above) or total monthly income (below) and expectations under the new government. However, the differences in income level and education level between the population groups are evident.

. tab new_gov3 race, sum(totminc2)
  Means, Standard Deviations and Frequencies of 10 quantiles of totminc 

           |          19 :population group
  new_gov3 |  01-afric   02-colou   03-india   04-white |     Total
-----------+--------------------------------------------+----------
         1 | 4.2902967  5.7777778          8  8.5789474 | 4.5018671
           | 2.2419667  2.1792674  2.4009802  2.0352512 | 2.3906184
           |      1247         36         18         38 |      1339
-----------+--------------------------------------------+----------
         2 | 3.9411765  6.7586207        6.9   8.610687 | 5.5163399
           | 2.2653795  2.5022158  1.7919573  1.7032662 | 2.9900225
           |       289         29         10        131 |       459
-----------+--------------------------------------------+----------
         3 | 4.3992933  6.3658537  7.6363636  8.5475285 | 6.4400657
           |  2.540369  2.6434456  2.0597146  1.9547904 | 3.0262183
           |       283         41         22        263 |       609
-----------+--------------------------------------------+----------
     Total | 4.2517867  6.2735849       7.62  8.5694444 | 5.1857084
           | 2.2975428  2.4631881  2.1370397  1.8852933 | 2.8060015
           |      1819        106         50        432 |      2407
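
The values 1 to 10 summarized above are the decile groups created by xtile in the recoding section. As a minimal sketch of what that command does, the example below assumes Stata's shipped auto dataset; price stands in for total monthly income and price10 is a made-up name.

```stata
* Minimal sketch on Stata's shipped example data (not the SALDRU data).
* Note: sysuse ..., clear replaces whatever data are in memory.
sysuse auto, clear

* nq(10) splits the observations into 10 groups of roughly equal size,
* numbered 1 (lowest prices) to 10 (highest prices).
xtile price10 = price, nq(10)

tabulate price10
* The range of price inside each group shows where the cut-offs fall.
tabstat price, by(price10) statistics(min max n)
```

A mean of, say, 8.5 on such a variable therefore means "on average between the 8th and 9th decile group", not 8.5 units of income.
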
Now let us consider the same analysis for satisfaction instead of new government. Now it is your turn to interpret these descriptive statistics.

. sum satisfie3
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
   satisfie3 |      8763    3.384229    1.299803          1          5
. tab satisfie3 race, col chi2
+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+

           |            19 :population group
 satisfie3 |  01-afric   02-colou   03-india   04-white |     Total
-----------+--------------------------------------------+----------
         1 |       264         44         30        268 |       606 
           |      4.40       6.74      12.24      22.96 |      7.51 
-----------+--------------------------------------------+----------
         2 |     1,104        250        122        615 |     2,091 
           |     18.39      38.28      49.80      52.70 |     25.92 
-----------+--------------------------------------------+----------
         3 |       530         83         19        109 |       741 
           |      8.83      12.71       7.76       9.34 |      9.19 
-----------+--------------------------------------------+----------
         4 |     2,400        160         58        119 |     2,737 
           |     39.99      24.50      23.67      10.20 |     33.93 
-----------+--------------------------------------------+----------
         5 |     1,704        116         16         56 |     1,892 
           |     28.39      17.76       6.53       4.80 |     23.45 
-----------+--------------------------------------------+----------
     Total |     6,002        653        245      1,167 |     8,067 
           |    100.00     100.00     100.00     100.00 |    100.00 

         Pearson chi2(12) =  1.6e+03   Pr = 0.000
. tab satisfie3 race, sum(hheduc2)
           Means, Standard Deviations and Frequencies of hheduc2

           |          19 :population group
 satisfie3 |  01-afric   02-colou   03-india   04-white |     Total
-----------+--------------------------------------------+----------
         1 | 3.7007874          8     11.125  10.870229 | 7.4802867
           | 3.2325599  4.1231056  3.4408263  3.6698291 | 4.9397964
           |       127         13          8        131 |       279
-----------+--------------------------------------------+----------
         2 | 4.2582418  5.7846154  8.1666667  10.071161 | 6.6942149
           | 3.6400456     3.1893  3.5241368  3.3423957 | 4.4125395
           |       364         65         30        267 |       726
-----------+--------------------------------------------+----------
         3 | 4.0903955      5.625          9  9.7631579 | 5.2098765
           | 3.5822746  2.8255434  2.9439203  3.4596845 | 4.0565712
           |       177         24          4         38 |       243
-----------+--------------------------------------------+----------
         4 | 3.9778157  6.1904762  6.8571429  9.5185185 | 4.6216216
           | 3.4553839  2.9403675  3.2293299  2.3208712 | 3.6940222
           |       586         42         21         54 |       703
-----------+--------------------------------------------+----------
         5 | 3.8342967  4.8695652          6  9.0322581 | 4.1634783
           | 3.2178764  2.7847694          0  2.7139623 | 3.3818106
           |       519         23          2         31 |       575
-----------+--------------------------------------------+----------
     Total | 3.9847716  5.9101796  8.0923077  10.130518 | 5.4853523
           | 3.4248287  3.1543169  3.5254569  3.3417881 | 4.2096051
           |      1773        167         65        521 |      2526
. tab satisfie3 race, sum(totminc2)
  Means, Standard Deviations and Frequencies of 10 quantiles of totminc 

           |          19 :population group
 satisfie3 |  01-afric   02-colou   03-india   04-white |     Total
-----------+--------------------------------------------+----------
         1 | 4.5877863  7.5454545      7.875  9.0172414 | 6.7406015
           | 2.0451452  2.0670576  1.5526475  1.6150529 | 2.8344921
           |       131         11          8        116 |       266
-----------+--------------------------------------------+----------
         2 | 4.4607595  6.9508197  7.9655172  8.4409449 | 6.1718539
           | 2.2977266  2.2017132  2.2277913  1.9526147 | 2.8677193
           |       395         61         29        254 |       739
-----------+--------------------------------------------+----------
         3 | 4.3645833  6.1304348          8  8.8285714 | 5.1968504
           | 2.3339955  2.5814786  2.7080128  1.9171933 | 2.7976257
           |       192         23          4         35 |       254
-----------+--------------------------------------------+----------
         4 | 4.1203852  5.7804878          7  8.4791667 | 4.5745554
           | 2.3386763  2.6973338  2.4037009  1.5435464 | 2.6015503
           |       623         41         19         48 |       731
-----------+--------------------------------------------+----------
         5 | 4.1524164  5.7272727          8        7.5 | 4.3830508
           | 2.3236705  2.0972049  2.8284271  2.2852182 | 2.4402644
           |       538         22          2         28 |       590
-----------+--------------------------------------------+----------
     Total | 4.2586482  6.3987342  7.6612903  8.5571726 | 5.2728682
           | 2.3094353  2.4258162  2.2246278   1.884519 | 2.8277348
           |      1879        158         62        481 |      2580
In addition to descriptive statistics, graphs are also a very nice way to assess the distribution of your data. Note that all of the graphs use variables that have already been recoded above to one observation per household; if this were not the case, they would have to be recoded appropriately to avoid bias from household size (see Module 3). As you look through the graphs, make sure to ask yourself whether the distribution seems realistic and why, or if not realistic, why not.

The first two graphs represent the distribution of the “new government” and “satisfaction with life” variables across the different population groups. From the first graph above we can see that most Black African households expect the situation for the household to get better with a new government, whilst most White African households expect it to get worse. Coloured households appear to be split almost evenly between thinking it will get better and thinking it will get worse, with the smallest percentage thinking things will stay the same. Similar to White African households, Indian households for the most part think the situation will get worse. However, unlike White Africans, a substantial portion of Indians also think the situation will get better.

In the second graph we are looking at “satisfaction with life”, and it appears that in general most Indian households and White African households are satisfied with life, while the majority of Black African households are not satisfied. Amongst Coloured households there seems to be a split between those who are satisfied and those who are dissatisfied.

The third and fourth graphs portray the main variables of interest by population group. These graphs are similar to those above, but are more concise. In the third graph above we can see the stark trend noted above occurring as the population group changes from Black African to White African.
Almost an equal percentage of Indian households and White African households think the situation will get worse with a new government, whereas the majority of Black African households expect the situation to get better. It might be expected that a very small percentage of White African households think the situation will get better, but what might explain the split among Coloured and Indian households over whether a new government will make things better?

As already seen in the second graph above, a convincing majority of White African households were satisfied with life (about 63%), followed by Indian households (about 62%), and then finally Coloured households (about 52%). Only 22% of Black African households were satisfied with life. Some people might be surprised to find even a number this high! Can you think of a way to explain this 22%?

The fifth and sixth graphs pictured below plot our main dependent variables of interest across total monthly income level, as recoded in the section above. There is perhaps nothing too surprising about these graphs, but it is still instructive to plot and look over them. The first of these graphs shows that the majority of households expect a new government to be a positive step for the household, while the wealthiest households overwhelmingly expect a new government to make things worse for the household. Does it seem logical to you that the wealthier households would expect this? Is there an underlying systematic factor that could explain this?

The second graph confirms what we expected: those with less income tend to be less satisfied, and as income increases more households are satisfied. The majority of households, except for those in the top 20% of the total monthly income distribution, are not satisfied; in fact, a very substantial percentage are very dissatisfied! For those households in the top income decile, what kinds of households might make up the approximately 26% that are dissatisfied?
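
Graphs of this kind are bar charts of category shares within groups. One way to draw them, sketched here under the assumption that Stata's shipped auto dataset is used in place of the survey data, is to create a 0/1 dummy for each category and plot the dummy means, since the mean of a 0/1 variable is the share of observations in that category. In this sketch rep78 stands in for an outcome such as new_gov3, and foreign stands in for a grouping variable such as race.

```stata
* Minimal sketch on Stata's shipped example data (not the SALDRU data).
* Note: sysuse ..., clear replaces whatever data are in memory.
sysuse auto, clear

* One 0/1 dummy per category of the ordered outcome: rep1 ... rep5.
tabulate rep78, generate(rep)

* The mean of each dummy is the share of cars in that category,
* shown separately for each group of the over() variable.
graph bar (mean) rep1 rep2 rep3 rep4 rep5, over(foreign)
```

With the survey data, the same approach would use the dummies created by tab with the gen() option in the recoding section.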
The two final graphs below show the distribution of “new government” and “satisfaction with life” across household education level. The expected direction of the relationship between education and the two main variables of interest was uncertain. In the first of these graphs, it is clear that those with the highest education level (i.e. Standard 10 and above, completed university degree) expect a new government to mean things will get worse for the household. Is there a systematic underlying factor, for example population group, which might correlate with education level? Might those with a high level of education expect things to get worse, yet mean something subjectively different from those with high income or from White African, Coloured and Indian households?

In the second of these graphs, the same trend shows up. As education level increases, so does level of satisfaction. In particular, it is interesting to note that from Standard 7 until Standard 10 an almost equal percentage of households are satisfied and dissatisfied, but after Standard 10 by far most households are satisfied or very satisfied.

Since some similar trends were noted across the different graphs above, it is instructive to look at how closely race (or population group) is related to income level and to education level. These two graphs suggest that the strongest underlying factor for income and education is race. This indicates that leaving race/population group out of any multi-variable statistical analysis would be a clear example of omitted variable bias: the impact of variables such as education and income would appear to be quite large, when in fact this effect is partially due to the underlying correlation between race and those variables.

Having explored and described the data, we are now ready to run ordered logit on the models presented above.
Thinking back on the expected direction of the relationship between the main dependent variables and the explanatory factors, what do you expect to be statistically significant based on the descriptive statistics and graphs? Keep this in mind as you look over the ordered logit results.

LOGIT ESTIMATION/PREDICTING SATISFIE AND NEW_GOVT

If the dependent variables we were interested in had only two outcomes, for example being employed versus not being employed, we would probably be able to use Stata’s logit or probit commands. However, since our dependent variables have more than two outcomes, we cannot use simple logit or probit. The ologit and oprobit commands fit ordered logit and ordered probit models by maximum likelihood. These types of models are used when the outcome variable has a natural ordering. In the case of the two dependent variables considered here, satisfaction with life and expectation of new government, both variables have more than two outcomes and are ordered, ranging from very satisfied to very dissatisfied on the one hand, and from thinking things will get better to thinking things will get worse on the other.

Recalling the factors thought to be of theoretical importance in explaining the level of satisfaction with life and the level of expectation of new government, including population group, education, and total monthly income, below are some basic results showing which of the factors are statistically significant. Before getting to these results, though, it is important to note that a variable not being statistically significant does not by itself justify dropping it from a theory or from the analysis. For example, looking at the results below, it appears that total monthly income is not statistically significant.
Theoretically speaking, unless you suddenly found there to be no merit in considering total monthly income as an explanatory factor, you would present the results as is and perhaps even say something about why this variable did not turn out to be statistically significant, what further analyses you could look into, etc. ologit satisfie3 hheduc2 totminc2 hhrace2-hhrace4 crime_new new_political2 new_peace2 hhprovince2-hhprovince14 hhunempl2;
Iteration 0:   log likelihood = -2202.3011
Iteration 1:   log likelihood = -2062.9544
Iteration 2:   log likelihood = -2061.1272
Iteration 3:   log likelihood = -2061.1226

Ordered logit estimates                           Number of obs   =       1431
                                                  LR chi2(22)     =     282.36
                                                  Prob > chi2     =     0.0000
Log likelihood = -2061.1226                       Pseudo R2       =     0.0641

------------------------------------------------------------------------------
   satisfie3 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     hheduc2 |  -.0287199   .0151276    -1.90   0.058    -.0583694    .0009296
    totminc2 |  -.0335756   .0256453    -1.31   0.190    -.0838395    .0166883
     hhrace2 |  -.6126639   .2566002    -2.39   0.017    -1.115591   -.1097368
     hhrace3 |  -1.543302   .4274846    -3.61   0.000    -2.381156   -.7054473
     hhrace4 |  -1.491355   .1848257    -8.07   0.000    -1.853607   -1.129104
   crime_new |  -.3293625   .1746643    -1.89   0.059    -.6716982    .0129731
new_politi~2 |    .940123    .134101     7.01   0.000     .6772899    1.202956
  new_peace2 |  -.7171168   .1118343    -6.41   0.000    -.9363079   -.4979257
 hhprovince2 |    .041645   .2515057     0.17   0.868    -.4512972    .5345872
 hhprovince3 |  -.1904667   .1647559    -1.16   0.248    -.5133823    .1324489
 hhprovince4 |   .0173792   .2366291     0.07   0.941    -.4464052    .4811637
 hhprovince5 |  -.5310072   .2842382    -1.87   0.062    -1.088104    .0260895
 hhprovince6 |  -.2075358   .6766626    -0.31   0.759     -1.53377    1.118699
 hhprovince7 |  -.6444195   .6341802    -1.02   0.310     -1.88739    .5985508
 hhprovince8 |  -.6436745    .541271    -1.19   0.234    -1.704546    .4171972
 hhprovince9 |  -.2540455   .3205194    -0.79   0.428     -.882252    .3741611
hhprovince10 |  -2.732796   1.269794    -2.15   0.031    -5.221547   -.2440443
hhprovince11 |   -.056086   .2882927    -0.19   0.846    -.6211294    .5089573
hhprovince12 |   .0099098   .2423798     0.04   0.967    -.4651458    .4849655
hhprovince13 |  -.5975041    .610728    -0.98   0.328    -1.794509    .5995006
hhprovince14 |    -.31769   .3758058    -0.85   0.398    -1.054256    .4188758
   hhunempl2 |   .3425182    .132026     2.59   0.009      .083752    .6012845
-------------+----------------------------------------------------------------
       _cut1 |  -3.312912   .4604477          (Ancillary parameters)
       _cut2 |   -1.46462   .4523667 
       _cut3 |  -.9656568   .4512423 
       _cut4 |   .3094569   .4508268 
------------------------------------------------------------------------------
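The summary figures in the header of the output can be checked by hand against the iteration log. The sketch below assumes that Iteration 0 reports the log likelihood of a model with no independent variables and that the pseudo R2 is McFadden's measure; the scalar names ll_0 and ll_m are hypothetical and exist only for this illustration.

```stata
* Log likelihoods copied from the iteration log above
* ll_0: Iteration 0 (assumed to be the model with no independent variables)
scalar ll_0 = -2202.3011
* ll_m: final iteration (the full model)
scalar ll_m = -2061.1226

* Likelihood-ratio chi-squared: twice the gain in log likelihood
display "LR chi2   = " %8.2f 2*(ll_m - ll_0)

* Pseudo R2: proportional improvement over the Iteration 0 log likelihood
display "Pseudo R2 = " %6.4f 1 - ll_m/ll_0
```

The two numbers should agree with the 282.36 and 0.0641 printed in the header. A pseudo R2 of this kind is not a share of variance explained in the OLS sense, so it is best used to compare models fitted to the same data.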
On the whole, the results confirm most of our hypothesized expectations. Interestingly, the variables indicating whether a household selected political settlement or cessation of violence are highly significant, a non-intuitive link. On the one hand, if a household selected political settlement, this has a positive and significant effect on level of satisfaction, whereas if the household selected cessation of violence we see a negative and significant effect. How could this be explained? Might the people who would like cessation of violence share an underlying systematic factor? Might the people who would like political settlement share an underlying factor? For example, perhaps mostly those who would like cessation of violence are Black African and, as we know from the descriptive statistics and graphs above, Black Africans are more dissatisfied in general than any other population group.

Another result worth pointing out is the negative relationship with whether someone in the household has been a victim of crime (understandably), although with p = 0.059 it is significant only at the 10% level. Employment status (in this case the higher value indicates not being employed) is also a positive and significant explanatory factor for satisfaction with life.

Implicit in using ordered logit is the assumption that the dependent variable has ordered categories and that the underlying function is probabilistic. Thus, we cannot just talk in terms of a “one unit increase in one independent variable, leading to an increase or decrease in the dependent variable by the magnitude of that independent variable’s coefficient.” We cannot simply say, as we would with OLS, that a one unit increase in household education leads to a drop of .0287 in satisfaction with life. Such a statement would not mean anything: satisfaction with life has five categories, so what would it mean to drop by .0287? There are quite a few ways to interpret ordered logit results.
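To make the link between the coefficient table and a probability concrete, the ordered logit model can be sketched as below. The notation is an assumption introduced here, not taken from this module: $\kappa_j$ stands for the cut points reported as _cut1 to _cut4, $x\beta$ for the sum of each independent variable multiplied by its coefficient, and $j$ for an outcome category of satisfie3.

```latex
\begin{align*}
\Pr(y \le j \mid x) &= \frac{1}{1 + \exp\left(-(\kappa_j - x\beta)\right)}, \qquad j = 1, \dots, 4 \\
\Pr(y = j \mid x)   &= \Pr(y \le j \mid x) - \Pr(y \le j - 1 \mid x),
                       \qquad \Pr(y \le 0 \mid x) = 0, \quad \Pr(y \le 5 \mid x) = 1 \\
\frac{\Pr(y > j \mid x)}{\Pr(y \le j \mid x)} &= \exp\left(x\beta - \kappa_j\right)
\end{align*}
```

Under this form, a one unit increase in hheduc2 multiplies the odds of falling in a higher rather than a lower category by exp(-.0287199), or roughly 0.972, at every cut point. How much the probability of any single category changes depends on the values of all the other variables, which is why predicted probabilities have to be computed at chosen values rather than read off the coefficient.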
We have listed a few references below that can be of help in this regard. Here we present just a few ways of interpreting the results using an ado file that can be downloaded from the internet from within Stata. Without using an ado file, there are several built-in ways in Stata to do interpretation and you should become familiar with those too. We prefer the ado file because it streamlines the process and presents the results in a nice-looking format. The reference section below has all of the necessary information.

Using the ado file in Stata, we present results from using “prtab” and look at how the probability of being in any one of the five outcome categories changes as level of education changes. Notice that we have set some of the independent variables to particular values. This is a good thing to do when you have non-continuous variables, as otherwise Stata simply takes the mean for this computation. To save space, we have not presented all of the output.

. prtab hheduc2, x(hhrace2=0 hhrace3=0 hhrace4=0 new_political2=1 new_peace2=1 crime_new=0 hhunempl2=1)
ologit: Predicted probabilities for satisfie3

Predicted probability of outcome 1

----------------------
  hheduc2 | Prediction
----------+-----------
        0 |     0.0313
        1 |     0.0317
        2 |     0.0322
        3 |     0.0327
        4 |     0.0331
        5 |     0.0336
        6 |     0.0341
        7 |     0.0346
        8 |     0.0351
        9 |     0.0356
       10 |     0.0361
       12 |     0.0372
       16 |     0.0394
----------------------

Predicted probability of outcome 2

----------------------
  hheduc2 | Prediction
----------+-----------
        0 |     0.1471
        1 |     0.1488
        2 |     0.1506
        3 |     0.1524
        4 |     0.1542
        5 |     0.1560
        6 |     0.1578
        7 |     0.1596
        8 |     0.1615
        9 |     0.1633
       10 |     0.1652
       12 |     0.1690
       16 |     0.1768
----------------------

Predicted probability of outcome 3

----------------------
  hheduc2 | Prediction
----------+-----------
        0 |     0.0807
        1 |     0.0813
        2 |     0.0820
        3 |     0.0827
        4 |     0.0834
        5 |     0.0841
        6 |     0.0847
        7 |     0.0854
        8 |     0.0861
        9 |     0.0867
       10 |     0.0874
       12 |     0.0887
       16 |     0.0914
----------------------

Predicted probability of outcome 4

----------------------
  hheduc2 | Prediction
----------+-----------
        0 |     0.3071
        1 |     0.3079
        2 |     0.3087
        3 |     0.3094
        4 |     0.3101
        5 |     0.3108
        6 |     0.3114
        7 |     0.3121
        8 |     0.3127
        9 |     0.3132
       10 |     0.3137
       12 |     0.3147
       16 |     0.3162
----------------------

Predicted probability of outcome 5

----------------------
  hheduc2 | Prediction
----------+-----------
        0 |     0.4338
        1 |     0.4301
        2 |     0.4265
        3 |     0.4228
        4 |     0.4192
        5 |     0.4155
        6 |     0.4119
        7 |     0.4083
        8 |     0.4047
        9 |     0.4011
       10 |     0.3975
       12 |     0.3904
       16 |     0.3762
----------------------
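A quick arithmetic pass over the five tables above helps in reading them. The sketch below uses only numbers copied from the output: it checks that the five outcome probabilities in a given row sum to one, and it computes how far the first and last outcomes move between the lowest and highest education values shown.

```stata
* Values copied from the five prtab tables above

* Row for hheduc2 = 0: the five outcome probabilities should sum to 1
display .0313 + .1471 + .0807 + .3071 + .4338

* Row for hheduc2 = 16
display .0394 + .1768 + .0914 + .3162 + .3762

* Change in probability from hheduc2 = 0 to hheduc2 = 16
display "outcome 1: " %7.4f .0394 - .0313
display "outcome 5: " %7.4f .3762 - .4338
```

Because the probabilities in each row must sum to one, a fall of about .06 in outcome 5 has to be absorbed by the other four outcomes; outcome 1 picks up less than .01 of it.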
Taking education for example, we interpret these results as follows: given a Black African household that chose both cessation of violence and political settlement, has not been a victim of crime and is unemployed, the probability of being very dissatisfied goes from .43 to .37 as the level of education goes up from 0 to 16. This is still very high, and it suggests that no matter what the level of education, Black African households fitting the criteria described will most likely be very dissatisfied. If we look at the probability of being very satisfied, even with more education the probability does not change much from .03. You could then go through each variable of interest and use prtab to see how the predicted probability changes as the values change.

Looking at expectations of a new government, our hypotheses are not as well confirmed. Unemployment, total monthly income, and education are not statistically significant (though of course this does not mean that we drop them from the theory or from future statistical analyses). The most important factors appear to be population group, province and, more marginally (p = 0.054), whether someone in the household has been a victim of crime. We will not go through the same process as above with predicted probabilities and marginal effects, but this is something you should definitely do.

ologit new_gov3 hheduc2 totminc2 hhrace2-hhrace4 crime_new new_political2 new_peace2 hhprovince2-hhprovince14 hhunempl2;
Ordered logit estimates                           Number of obs   =       1355
                                                  LR chi2(22)     =     401.58
                                                  Prob > chi2     =     0.0000
Log likelihood = -1154.1347                       Pseudo R2       =     0.1482

------------------------------------------------------------------------------
    new_gov3 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     hheduc2 |  -.0132061   .0183919    -0.72   0.473    -.0492536    .0228413
    totminc2 |   -.030091   .0310992    -0.97   0.333    -.0910442    .0308623
     hhrace2 |   2.171511   .3347623     6.49   0.000     1.515389    2.827633
     hhrace3 |   1.170792   .5453856     2.15   0.032     .1018555    2.239728
     hhrace4 |   2.552456   .2220364    11.50   0.000     2.117273    2.987639
   crime_new |  -.3658462   .1899203    -1.93   0.054    -.7380832    .0063908
new_politi~2 |  -.0913094   .1547693    -0.59   0.555    -.3946516    .2120328
  new_peace2 |   .0107371   .1281773     0.08   0.933    -.2404859      .26196
 hhprovince2 |   1.709099   .3239736     5.28   0.000     1.074123    2.344076
 hhprovince3 |   .7182976   .2293962     3.13   0.002     .2686894    1.167906
 hhprovince4 |   .5881541   .3107656     1.89   0.058    -.0209352    1.197243
 hhprovince5 |   2.021277   .3569827     5.66   0.000     1.321604     2.72095
 hhprovince6 |   1.061478   .7644426     1.39   0.165    -.4368022    2.559758
 hhprovince7 |   .9768853   .7725143     1.26   0.206    -.5372149    2.490985
 hhprovince8 |  -.5524372    .791652    -0.70   0.485    -2.104047    .9991722
 hhprovince9 |   .1478828   .4450379     0.33   0.740    -.7243755    1.020141
hhprovince10 |   1.004283   1.297549     0.77   0.439    -1.538866    3.547432
hhprovince11 |    .849354   .3950246     2.15   0.032       .07512    1.623588
hhprovince12 |  -.0505895   .3507427    -0.14   0.885    -.7380325    .6368535
hhprovince13 |   .7803404   .7487557     1.04   0.297    -.6871939    2.247875
hhprovince14 |   .7144717   .4899447     1.46   0.145    -.2458023    1.674746
   hhunempl2 |   .1634522   .1617751     1.01   0.312    -.1536212    .4805256
-------------+----------------------------------------------------------------
       _cut1 |   .7079463   .5213763          (Ancillary parameters)
       _cut2 |   1.919961   .5244263 
------------------------------------------------------------------------------
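For readers who want to repeat the predicted-probability exercise for this second model, the following is a minimal sketch rather than a prescribed procedure. It assumes the ologit model for new_gov3 shown above is the most recent estimation and that the ado files supplying prtab are installed. The choice of crime_new as the row variable and the variable names pg1 to pg3 are illustrative assumptions.

```stata
* Predicted probability of each new_gov3 outcome by crime victimisation,
* holding the other listed variables at the values used for satisfie3
prtab crime_new, x(hhrace2=0 hhrace3=0 hhrace4=0 new_political2=1 new_peace2=1 hhunempl2=1)

* Built-in alternative: one predicted probability per outcome for each
* household in the estimation sample (new_gov3 has three outcomes)
predict pg1 pg2 pg3 if e(sample)
summarize pg1 pg2 pg3
```

The two approaches answer slightly different questions: prtab reports probabilities for a hypothetical household with the stated characteristics, while predict gives each observed household its own fitted probabilities.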

You’ll note that we do not include either one of the occupation dummy variables in the statistical analysis. Check for yourself that including hhoccup3 and hhoccup3_2 does not change the results for new government and that neither variable is statistically significant. However, you will note that when these are included in the satisfaction analyses, Stata cannot produce standard errors and thus we cannot say anything about the statistical significance of these variables. You might find it instructive to go back and look at the distribution of professionals and managers to see why Stata might have had a problem with giving meaningful results!

You might also want to check for yourself how different the results would be if you had forgotten to use ordered logit and had instead used OLS! How different is it to fit ordered and categorical data to the OLS model (based on a normal curve) than to use the appropriate model based on a probability density function?

CONCLUSION

This is a very simple introduction to ordered logit and is meant to highlight just one of the different ways available to you to do statistical analysis. OLS is a very powerful means of estimation, and in fact there are some cases when, even with an ordered categorical variable, using OLS is more efficient and thus better. What we intended to show with this module is an introduction to ordered logit and how all of the tools learned thus far in Stata can be drawn upon to explore perceived quality of life.

FURTHER SOURCES OF INFORMATION

First of all, as a rule NEVER UNDERESTIMATE what Stata itself can help you with, either via the “help” or “search” command. If you have access to the internet, typing in “search” plus the issue you need help with will bring up a whole host of helpful resources, including Frequently Asked Questions from Stata’s website and modules/examples, all available on-line. The ado files we used above to produce predicted probabilities were found in this way.
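As an illustration of the lookups described here, the commands below could be typed at the Stata prompt. The search keywords are only an example, and the last command needs an internet connection.

```stata
* Built-in help for the estimation command used in this module
help ologit

* Keyword search of Stata's own resources (FAQs, manual entries, examples)
search ordered logit

* Search the internet for user-written packages; this is the command the
* reference list gives for locating the ado files used above
net search prchange
```

The results of net search include links that can be followed from within Stata to install a package.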

Aldrich, John H. and Forrest D. Nelson. 1984. Linear Probability, Logit, and Probit Models. Delhi: Sage Publications.

Liao, Tim Futing. 1994. Interpreting Probability Models: Logit, Probit, and Other Generalized Linear Models. New Delhi: Sage Publications.

Long, J. Scott. Website. “SPost: Post-Estimation with Stata.” Retrieved December 6, 2003 from http://www.indiana.edu/~jslsoc/spost.htm. He is also the author of the ado file used above, which you can find by typing “net search prchange” in Stata. It is available for Stata 6, 7 and 8. If you do not want to use the ado file, you can also download from this website Excel spreadsheets that do the same thing using your output.

Stata. “help for fitstat.” Retrieved December 6, 2003 from http://www.indiana.edu/~jslsoc/stata/spostado/fitstat.hlp.

Stata Textbook Examples Applied Logistic Regression, 2nd Edition. “Chapter 8: Special Topics.” Retrieved December 6, 2003 from http://www.ats.ucla.edu/stat/stata/examples/alr2/alr2stata8.htm.