* Stata tutorial: basic commands for descriptive statistics and linear regression
# delimit ;
* The command use opens our dataset ;
use "C:\Users\AutoLogon\Desktop\orange.dta",clear ;
* Pre-formatted descriptive statistics tables;
summarize;
codebook;
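* A possible complement, offered as a sketch rather than as part of the
original tutorial: describe lists variable names, storage types and labels,
and the detail option of summarize adds percentiles, skewness and kurtosis.
The variable qty is taken from the commands used elsewhere in this file;
describe;
summarize qty, detail;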
* In order to customize your table, use the command tabstat ;
tabstat rev inc qty, statistics(mean median variance);
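* tabstat accepts other statistics and layouts as well. A sketch with an
illustrative choice of statistics: n, mean, sd, min and max, arranged with
one column per statistic through the columns(statistics) option;
tabstat rev inc qty, statistics(n mean sd min max) columns(statistics);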
* Construction of graphs: line graphs, histograms and scatter diagrams;
* To construct line graphs, we first need to declare the variable
that indexes time (the "time variable");
tsset year;
graph twoway tsline qty;
graph twoway scatter inc qty;
graph twoway histogram inc, freq;
graph twoway histogram inc, freq bin(4);
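* A sketch of one possible refinement, not part of the original tutorial:
overlaying a fitted regression line on the scatter plot makes the direction
of the relationship easier to see. The twoway plot type lfit draws the
linear prediction of inc on qty;
graph twoway (scatter inc qty) (lfit inc qty);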
* Correlation matrix;
correlate;
correlate inc rev;
* Covariance matrix;
correlate, cov;
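* As a hedged complement to correlate, pwcorr reports pairwise correlations
and can print the significance level and the number of observations behind
each coefficient. The choice of variables here is only illustrative;
pwcorr inc rev qty, sig obs;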
* We can construct new variables by using the command generate;
generate p = rev/qty;
generate lnq = ln(qty);
generate lnp = ln(p);
generate lninc = ln(inc);
generate lncuradv = ln(curadv);
generate lnaveadv = ln(aveadv);
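* A quick check that may be worth running after the transformations above:
ln() returns a missing value for zero or negative arguments, and regress
silently drops observations with missing values. This sketch counts the
affected observations and summarizes the new variables;
count if missing(lnq, lnp, lninc, lncuradv, lnaveadv);
summarize p lnq lnp lninc lncuradv lnaveadv;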
* Estimation of the demand equation by ordinary least squares;
regress lnq lnp lninc lncuradv lnaveadv;
* The p-value for the overall F-test allows us to reject the joint null
hypothesis that all slope coefficients are equal to zero. The coefficient
of determination (R2) is extremely high: approximately 96% of the
variation in (log) orange sales is explained by the variation of the
explanatory variables included in the model.;
* From the table of regression coefficients, we see that all variables
except the log of advertising in previous years are significant at the 1%
significance level (since their p-values are lower than 0.01). The
regression specification is in log-log form, so we can interpret the
estimates as elasticities: a 1% increase in price is associated with a 0.3%
decrease in quantity sold, a 1% increase in income is associated with a
0.78% increase in orange sales, and a 1% increase in advertising in the
current year is associated with a 0.36% increase in sales. On the other
hand, there is no statistically significant association between
advertising expenditures in previous years and current sales.;
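* The figures quoted above can be read back from the stored results of the
regression. A minimal sketch, assuming the regress command above is the
most recent estimation: _b[] holds the coefficients, and e(r2), e(r2_a) and
e(N) hold the R-squared, the adjusted R-squared and the sample size;
display "Price elasticity:   " _b[lnp];
display "Income elasticity:  " _b[lninc];
display "R-squared:          " e(r2);
display "Adjusted R-squared: " e(r2_a);
display "Observations:       " e(N);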
* Stata OLS estimation options;
* Subsample estimation.
Suppose we want to exclude the observations from the First World War
period (1914-1918).;
regress lnq lnp lninc lncuradv lnaveadv if year <=1913 | year >=1919;
* Regression for the period 1920 - 1940;
regress lnq lnp lninc lncuradv lnaveadv if year>=1920 & year <=1940;
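* The same sample restrictions can be written with inrange(), which some
readers find easier to check than chained inequalities. This is an
equivalent sketch of the two regressions above, not an additional model:
inrange(year, a, b) is true when a <= year <= b;
regress lnq lnp lninc lncuradv lnaveadv if !inrange(year, 1914, 1918);
regress lnq lnp lninc lncuradv lnaveadv if inrange(year, 1920, 1940);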
* Regression without constant;
regress lnq lnp lninc lncuradv lnaveadv, noconstant;
* OLS regression with robust standard errors (i.e., standard errors that are
robust to heteroskedasticity);
regress lnq lnp lninc lncuradv lnaveadv, vce(robust);
* Regression diagnostics for outlier detection: "residual versus fitted value
graph";
rvfplot;
* We observe one observation with a residual larger than 0.3 in absolute
value. This indicates a potential outlier in our dataset.;
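* To identify the observation flagged in the plot, one option is to store
the residuals and list the large ones. A sketch under stated assumptions:
resid_ols is a hypothetical variable name, and 0.3 is the threshold read off
the graph above. The !missing() condition is needed because Stata treats
missing values as larger than any number;
predict double resid_ols, residuals;
list year qty resid_ols if abs(resid_ols) > 0.3 & !missing(resid_ols);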
* Hypothesis test for a price elasticity of demand equal to -1;
test lnp = -1;
* We observe that the p-value for the test is much lower than 0.01. We
strongly reject the null hypothesis of unitary price elasticity.
(Note: for a single restriction such as this one, the absolute value of the
t-statistic is the square root of the F-value);
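* The t-statistic mentioned in the note can also be computed directly from
the stored results. A minimal sketch, assuming the most recent estimation is
the regression above and that the null hypothesis is an elasticity of -1.
t_lnp and p_lnp are hypothetical scalar names. The square of t_lnp equals
the F-value of the corresponding test command;
scalar t_lnp = (_b[lnp] - (-1)) / _se[lnp];
scalar p_lnp = 2 * ttail(e(df_r), abs(t_lnp));
display "t = " t_lnp "   two-sided p-value = " p_lnp;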