#delimit ;
* Code for exercise 8: multinomial logit specification used
to analyze the choice among four cracker brands;
* First, we import the data from an Excel spreadsheet (xlsx format). The firstrow option indicates that the first row of the spreadsheet contains the variable names. Without it, Stata would treat that row as the values of the first observation.;
import excel using "C:\Users\AutoLogon\Desktop\chapter5_multinomial.xlsx", firstrow clear;
log using "C:\Users\AutoLogon\Desktop\exercise8.log", replace;
* Construction of the dependent variable - brand choice coding;
generate choice = 0;
replace choice = 1 if nabisco==1;
replace choice = 2 if private==1;
replace choice = 3 if keebler==1;
replace choice = 4 if sunshine==1;
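* Optional check, a sketch that assumes each purchase record flags exactly one of the four brand dummies: count the observations that violate this. A nonzero count would mean that some observations keep choice = 0 or were overwritten by a later replace;
count if nabisco + private + keebler + sunshine != 1;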
* Descriptive statistics for the dependent variable: frequency of each brand choice (sample market share);
tabulate choice;
* We observe that Nabisco has the largest share in our sample (54.4% of all purchases);
* Construction of relative prices (normalized by Nabisco);
generate relpr_private = priceprivate/pricenabisco;
generate relpr_keebler = pricekeebler/pricenabisco;
generate relpr_sunshine = pricesunshine/pricenabisco;
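* Optional check, added here as a suggestion: inspect the range of the relative prices. The later discussion refers to a one-unit increase in a relative price, and the summary shows how large such a change is compared with the variation observed in the sample;
summarize relpr_private relpr_keebler relpr_sunshine;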
* Estimation of the multinomial logit model;
mlogit choice relpr_private relpr_keebler relpr_sunshine displprivate displsunshine displkeebler displnabisco featprivate featsunshine featkeebler featnabisco featdisplprivate featdisplsunshin featdisplkeebler featdisplnabisco;
* Observation 1: by default, Stata uses the alternative with the highest sample frequency (in our case, Nabisco) as the base outcome. To change the reference outcome, we specify its code in the baseoutcome() option. Illustration: changing the reference to
brand Keebler (choice = 3);
mlogit choice relpr_private relpr_keebler relpr_sunshine displprivate displsunshine displkeebler displnabisco featprivate featsunshine featkeebler featnabisco featdisplprivate featdisplsunshin featdisplkeebler featdisplnabisco,
baseoutcome(3);
* Observation 2: the estimated coefficients say nothing directly about marginal effects on the choice probabilities. In binomial models the sign of a coefficient gives the sign of the marginal effect. In the multinomial case this no longer holds, because the probability of an alternative depends on the coefficients of all the alternatives.
However, the exponential of a coefficient can be interpreted as the multiplicative effect of a one-unit change in the variable on the probability ratio relative to the base outcome (the relative-risk ratio, rrr);
mlogit choice relpr_private relpr_keebler relpr_sunshine displprivate displsunshine displkeebler displnabisco featprivate featsunshine featkeebler featnabisco featdisplprivate featdisplsunshin featdisplkeebler featdisplnabisco,
rrr;
* The values reported in the RRR table can be interpreted as the
effect of a one-unit change in the associated variable on the
probability ratio of choosing alternative j relative to the reference outcome (in this case, Nabisco). For example, if the relative price of Private with respect to Nabisco (variable relpr_private) increases by one unit, the probability ratio (i.e., the probability of buying Private divided by the probability of buying Nabisco) is multiplied by 0.092. In other words, the probability ratio of buying Private rather than Nabisco decreases considerably: it falls to only 9.2% of its original value.;
* Observation 3: rrr > 1 means that an increase in the variable raises the probability ratio (that is, it raises the probability of choosing alternative j relative to the probability of choosing the base
outcome), while rrr < 1 means that it lowers the probability ratio.;
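* Illustration, a sketch that assumes the outcomes carry no value labels, so that the Private equation is named 2: the rrr reported for relpr_private is the exponential of the corresponding coefficient from the last mlogit fit;
display exp([2]_b[relpr_private]);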
* Suppose now we are interested in evaluating the rrr with respect to the choice of Keebler. In this case, we have to change the base outcome in order to interpret rrr as the probability ratio regarding the choice of Keebler. In addition to that, we should adjust our relative prices to facilitate the interpretation of our results.;
generate relpr_nabisco2 = pricenabisco/pricekeebler;
generate relpr_private2 = priceprivate/pricekeebler;
generate relpr_sunshine2 = pricesunshine/pricekeebler;
mlogit choice relpr_nabisco2 relpr_private2 relpr_sunshine2 displprivate displsunshine displkeebler displnabisco featprivate featsunshine featkeebler featnabisco featdisplprivate featdisplsunshin featdisplkeebler featdisplnabisco,
rrr baseoutcome(3);
* We observe that a marginal increase in the relative price of Nabisco with respect to the price of Keebler will reduce the probability ratio of buying Nabisco with respect to Keebler. Specifically, the original probability ratio will be multiplied by 0.02 (that is to say it will correspond to only 2% of the original probability ratio);
* We can also estimate the absolute marginal effect of a change in a given variable on the choice probabilities. For example, suppose we are interested in the marginal effects on the probability of choosing Nabisco
(choice = 1);
mlogit choice relpr_private relpr_keebler relpr_sunshine displprivate displsunshine displkeebler displnabisco featprivate featsunshine featkeebler featnabisco featdisplprivate featdisplsunshin featdisplkeebler featdisplnabisco;
margins, dydx(*) predict(outcome(1));
* We observe that the average marginal effect of the relative price of Private (with respect to Nabisco) on the probability of buying Nabisco is 0.394, that is, 39.4 percentage points per unit increase in the relative price. For the relative price of Keebler the effect is 0.468, or 46.8 percentage points per unit (similar interpretation for the other variables). Because dydx() reports a derivative, these figures describe the slope for small changes, not the exact effect of a full one-unit change;
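* Illustration, a sketch: because the four choice probabilities add up to one, the marginal effects of a given variable across the four outcomes should add up to zero. The loop below reports the effect of relpr_private on each outcome in turn;
forvalues j = 1/4 {;
    margins, dydx(relpr_private) predict(outcome(`j'));
};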
* Finally, we analyze the fit of our model, that is, how well it predicts consumer choices. We adopt a three-step procedure.
Step 1: compute the predicted probability of choosing each alternative;
predict pchoice1 pchoice2 pchoice3 pchoice4, pr;
* First assessment of the fit of the model;
summarize pchoice* nabisco private keebler sunshine, separator(4);
* (The symbol * is a shortcut that includes all variables whose names start with
pchoice. The separator option draws a separation line after every given number of variables in the table, here every four);
* We observe that the average estimated probabilities are very close to the observed sample frequencies. But let us have a closer look at how well the model matches the data.;
* Step 2: we define the model prediction as the brand with the highest estimated probability.;
generate choicehat = 0;
replace choicehat=1 if pchoice1==max(pchoice1, pchoice2, pchoice3, pchoice4);
replace choicehat=2 if pchoice2==max(pchoice1, pchoice2, pchoice3, pchoice4);
replace choicehat=3 if pchoice3==max(pchoice1, pchoice2, pchoice3, pchoice4);
replace choicehat=4 if pchoice4==max(pchoice1, pchoice2, pchoice3, pchoice4);
* Step 3: finally, we construct a table to compare the observed consumer choices with the predictions of the model;
tabulate choicehat choice;
* We observe that the model correctly predicts 1,882 of the 3,292 consumer choices (1,587 correct predictions for Nabisco, 270 for Private and 25 for Sunshine).
We also observe that the model never predicts Keebler (choicehat = 3). That is why row 3 of the table is omitted.;
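* Illustration, a sketch: the overall share of correct predictions (hit rate) can be computed directly. Given the counts above it should equal 1,882/3,292, roughly 57%, which can be compared with the 54.4% obtained by always predicting Nabisco;
quietly count if choicehat == choice;
display "Share of correct predictions: " %5.3f r(N)/_N;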
log close ;