# delimit ;
* Exercise 9: assessing households' choice regarding heating technology;
* Opening the database. Use the option "firstrow" to indicate that the first row will provide the variable names. The option "sheet" is important when you have several sheets in your Excel file. In our case it is redundant, since we have just one sheet.;
import excel using "C:\Users\AutoLogon\Desktop\heating.xlsx", firstrow sheet("Plan1") clear;
log using "C:\Users\AutoLogon\Desktop\exercise9.log", replace;
* First, we use the command tabulate to have a first look at the sample
distribution of the technology;
tabulate heatingchoice;
* We notice that centralized gas is the prevailing heating choice (63.67% of the households), followed by decentralized gas (14.33%).;
* For a first look at the characteristics that affect heating choice, let's compute the descriptive statistics of the variables;
summarize ic_gascentral ic_gasroom ic_electcentral ic_electroom ic_heatpump oc_gascentral oc_gasroom oc_electcentral oc_electroom oc_heatpump income age rooms ncoast scoast mountn valley;
* We notice that centralized technologies have lower installation costs
when compared to decentralized ones. We also see that gas technologies have a clear advantage in terms of operation costs. These features are consistent with the prevalence of centralized gas choices.;
* Estimating the multinomial logit model for the heating technology choice;
mlogit heatingchoice ic_gascentral ic_gasroom ic_electcentral ic_electroom ic_heatpump oc_gascentral oc_gasroom oc_electcentral oc_electroom oc_heatpump income age rooms ncoast scoast mountn valley;
* Remember that the coefficients of a multinomial logit model cannot be interpreted directly, in terms of either magnitude or sign, as effects on the choice probabilities. With the option "rrr", Stata reports the exponentiated coefficients. These exponentiated terms can be interpreted as the multiplicative effect of a one-unit increase in a variable on the probability ratio relative to the base outcome (i.e., as relative risk ratios).;
mlogit heatingchoice ic_gascentral ic_gasroom ic_electcentral ic_electroom ic_heatpump oc_gascentral oc_gasroom oc_electcentral oc_electroom oc_heatpump income age rooms ncoast scoast mountn valley, rrr;
* Analyzing the relative risk ratios for technology 4 (decentralized electricity), we observe that a marginal increase in the installation cost of decentralized electricity reduces the probability ratio of choosing decentralized electricity relative to centralized gas (our base outcome), since the original probability ratio is multiplied by 0.996. Similarly, an increase in the operation costs of the decentralized electricity system decreases the same ratio (the new probability ratio would be 0.994 times the original one).;
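* As a rough worked example of the multiplicative interpretation (a sketch that simply reuses the ratio of 0.996 reported above): a USD 100 increase in that installation cost would multiply the probability ratio by 0.996 raised to the power 100, which is about 0.67.;
display 0.996^100;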
* Remember that the denominator of this ratio is always the base outcome. So, if you want to change the reference outcome, you should declare it with the option "baseoutcome". For example, suppose we want to assess the ratios with respect to the heat pump (heatingchoice=5).;
mlogit heatingchoice ic_gascentral ic_gasroom ic_electcentral ic_electroom ic_heatpump oc_gascentral oc_gasroom oc_electcentral oc_electroom oc_heatpump income age rooms ncoast scoast mountn valley, rrr baseoutcome(5);
* In order to assess the marginal impact of the variables on the probability of choosing a certain technology, we use the margins command. Remember that we have to declare which outcome we want to analyze.;
margins, dydx(*) predict(outcome(1));
* We notice that only two marginal effects are statistically significant.
A one dollar increase in the installation cost of centralized gas systems will reduce the probability of adopting this technology by 0.05 percentage points. In addition, people who live on the North Coast of California have a higher probability (11.6 percentage points) of adopting a centralized gas system than people living in the Central Valley.;
* If we want to assess a different technology alternative, we have to declare it in the outcome() suboption of predict(). For example, suppose we are interested in the marginal effects on the probability of adopting a centralized electricity
technology (outcome 3);
margins, dydx(*) predict(outcome(3));
* We notice that if we increase the operation costs of centralized electricity by USD 1, the probability of adopting this technology decreases by 0.03 percentage points.;
* Computing the estimated probabilities of adopting each technology.;
predict ptech1 ptech2 ptech3 ptech4 ptech5, pr;
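* As a quick sanity check (a sketch in which psum is a hypothetical variable name not used elsewhere in this exercise), the five estimated probabilities should add up to one for every household, apart from rounding error.;
generate double psum = ptech1 + ptech2 + ptech3 + ptech4 + ptech5;
summarize psum;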
* In order to assess the model fit, we will adopt the standard classification criterion: the model selects the technology associated with the
highest estimated probability.;
generate estim_choice=0;
replace estim_choice=1 if ptech1==max(ptech1, ptech2, ptech3, ptech4, ptech5);
replace estim_choice=2 if ptech2==max(ptech1, ptech2, ptech3, ptech4, ptech5);
replace estim_choice=3 if ptech3==max(ptech1, ptech2, ptech3, ptech4, ptech5);
replace estim_choice=4 if ptech4==max(ptech1, ptech2, ptech3, ptech4, ptech5);
replace estim_choice=5 if ptech5==max(ptech1, ptech2, ptech3, ptech4, ptech5);
tabulate estim_choice;
* We notice that the model has concentrated the predictions on two technologies: centralized gas (99.67% of total predictions) and decentralized electricity (0.33% of total predictions);
* Finally, we compare the predictions of the model with the observed choices in order to evaluate the fit of our statistical model.;
tabulate estim_choice heatingchoice;
* We observe that the overall prediction of the model is quite reasonable: the model correctly predicts 573 out of the 900 technology choices. On the other hand, we can also observe that the good performance is concentrated on the technology with the highest sample frequency (in our case, centralized gas). The model does not perform well for the other four alternatives. This is a numerical result similar to the one observed in binomial models: models tend to perform well for high-frequency alternatives.;
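* The overall hit rate can also be computed directly. This is a sketch that assumes heatingchoice is coded numerically from 1 to 5, in the same way as estim_choice. The variable name hit is hypothetical.;
generate byte hit = (estim_choice == heatingchoice);
summarize hit;
* The mean of hit is the share of correctly classified households. With 573 correct predictions out of 900, it should be about 0.637.;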
log close;