# delimit ;
* Code for exercise 1: analysis of endogeneity of the price variable in a demand equation.;
* Opening the database;
use "C:\Users\AutoLogon\Desktop\orange2.dta", clear;
* Creation of the log file to save our results;
log using "C:\Users\AutoLogon\Desktop\exercise1.log", replace;
* Construction of the variables to be used in the regression specification;
generate p = rev/qty;
generate lnp = ln(p);
generate lnq = ln(qty);
generate lninc = ln(inc);
generate lna = ln(curadv);
generate lntemp = ln(temp);
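* A quick sanity check on the constructed variables, offered as a
sketch rather than as part of the original exercise: ln() returns
missing for zero or negative arguments, so any such observations
would silently drop out of the regressions below. The summary
statistics and the count of missing values make this visible;
summarize p lnp lnq lninc lna lntemp;
count if missing(lnp, lnq, lninc, lna, lntemp);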
* Item 1: applying the regression-based Hausman test to assess the
endogeneity of the price variable.
We adopt temperature as our instrument (i.e., the excluded
exogenous variable). Intuition for the choice of instrument:
temperature (weather conditions) is correlated with market prices
because it influences supply conditions. Additionally, temperature
should not be correlated with the error term of the demand
regression.
First step: estimate an OLS regression of the potentially
endogenous variable on all exogenous variables, including the
instrument (lntemp), and obtain the residuals;
regress lnp lninc lna lntemp if year> 1919;
predict e, resid;
* Second step: estimate the original equation by OLS, adding the residuals computed in the first step as a regressor;
regress lnq lnp lninc lna e if year>1919;
* Decision rule
- if the coefficient on the residual is statistically
significant, we have evidence that the variable lnp is
endogenous and we should use an instrumental variable estimator.
In the presence of endogeneity, the OLS estimator is biased and
inconsistent.
- if the coefficient on the residual is not significant, there is
no evidence of endogeneity and we may proceed with OLS
estimation (OLS is unbiased and more efficient than instrumental
variables when the regressors are exogenous).
In our case, since the coefficient on the residual
is significant at the 5% level (p-value below 0.05), we have
evidence of the endogeneity of prices and we should adopt an
instrumental variable estimator.
Two-stage least squares regression;
ivregress 2sls lnq lninc lna (lnp=lntemp) if year>1919, vce(robust) ;
* Interpretation of regression coefficients:
- we obtain a price elasticity of -0.65: if prices increase by 1%, the demand for oranges decreases by 0.65%.
- a 1% increase in income is associated with a 1% increase in orange demand (unitary income elasticity of demand).
- there is no evidence that advertising affects orange sales, since
the coefficient of lna is not statistically significant.;
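* A possible cross-check of the manual two-step endogeneity test
above, offered as a sketch: after ivregress 2sls, the postestimation
command estat endogenous tests the null hypothesis that lnp is
exogenous. Because vce(robust) was specified, Stata reports robust
score and regression-based statistics, so the numbers need not match
the t statistic on the residual exactly. This assumes the ivregress
results above are still the most recent estimation results;
estat endogenous;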
* Sometimes we are interested in assessing the results of the first
stage. This is particularly important if our model is just
identified and we suspect that our instrument is weak (in our case,
the correlation between lntemp and lnp is close to zero). If the
number of (excluded) instruments is equal to the number of
endogenous variables, we do not have a formal statistical test to
check the validity of the instruments. But we can check the
statistical significance of the instrument in the first-stage
regression.;
ivregress 2sls lnq lninc lna (lnp=lntemp) if year>1919, vce(robust) first;
* Since the coefficient on lntemp in the first-stage regression is not significant, we have evidence that our instrument is weak;
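* A minimal sketch of two further checks on instrument strength,
assuming the ivregress results from the previous command are still
the most recent estimation results: estat firststage reports the
first-stage R-squared and the F statistic for the excluded
instrument, and correlate shows the raw correlation between lnp and
lntemp on the same sample restriction;
estat firststage;
correlate lnp lntemp if year>1919;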
* A more formal test for the validity of our instruments is the
Sargan overidentification test. In order to implement it, we need
an overidentified model (number of instruments > number of
endogenous regressors). We use temp and its square (temp2) as
instruments, with the model specified in levels rather than logs,
to create an overidentified model.;
generate temp2 =temp^2;
ivregress 2sls qty inc curadv (p=temp temp2) if year>1919, vce(robust);
estat overid;
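* Since we do not reject the null hypothesis that the instruments
are valid (p-value close to 0.14), the overidentification test
provides no evidence against our choice of instruments. Note that
with vce(robust), estat overid reports the Wooldridge robust score
test rather than the Sargan statistic.;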
* A cautionary note on estimating 2SLS by running OLS recursively.
First stage regression;
regress lnp lninc lna lntemp, vce(robust);
predict lnp_hat, xb;
* Second stage regression: we estimate the original equation by
replacing the endogenous variable lnp by its predicted value
from the first stage;
regress lnq lnp_hat lninc lna, vce(robust);
* Comparing with the ivregress command;
ivregress 2sls lnq (lnp = lntemp) lninc lna, vce(robust);
* The point estimates coincide when both procedures use the same
sample. The difference between the ivregress command and estimating
OLS recursively lies in the computed standard errors. The recursive
OLS procedure does not take into account that lnp_hat is itself an
estimate from the first stage, so the standard errors it reports are
incorrect. Always use the ivregress command.;
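* One possible way to see the comparison side by side, offered as a
sketch: store both sets of results and tabulate the coefficients
with their standard errors. The names manual and iv are illustrative
labels, not part of the original exercise. The price coefficient
appears in the row lnp_hat for the manual column and in the row lnp
for the iv column;
quietly regress lnq lnp_hat lninc lna, vce(robust);
estimates store manual;
quietly ivregress 2sls lnq (lnp = lntemp) lninc lna, vce(robust);
estimates store iv;
estimates table manual iv, se;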
log close;