Hand Coding Instumental Variables

Hand Coding Instumental Variables

Hand Coding Instumental Variables

In a previous post we discussed the linear model and how to write a function that performs a linear regression. In this post we will use that linear model function to perform a [Two-Stage Least Squares estimation]. This estimation allows us to […]

Recall that we built the follow linear model function.

{% highlight r %} ols <- function (y, X, intercept=TRUE) { if (intercept) X <- cbind(1, X) solve(t(X)%%X) %% t(X)%*%y # solve for beta } {% endhighlight %}

We used the following data.

{% highlight r %} data(iris) x1 <- iris$Petal.Width x2 <- iris$Sepal.Length y <- iris$Petal.Length {% endhighlight %}

This allowed us to estimate a linear model.

{% highlight r %} X <- cbind(x1, x2) ols(y = y, X = X) {% endhighlight %}

{% highlight text %}

[,1]

-1.5071384

x1 1.7481029

x2 0.5422556

{% endhighlight %}

Which includes the intercept, since the default value it TRUE (see function definition above), we could estimate it without an intercept using.

{% highlight r %} ols(y = y, X = X, intercept = FALSE) {% endhighlight %}

{% highlight text %}

[,1]

x1 1.9891655

x2 0.2373159

{% endhighlight %}

Having revisited the above, we can continue with instumental variables. We will replicate an example from the AER (Applied Econometric Regressions) package.

{% highlight r %} library(AER) data(“CigarettesSW”) rprice <- with(CigarettesSW, price/cpi) tdiff <- with(CigarettesSW, (taxs - tax)/cpi) packs <- CigarettesSW$packs {% endhighlight %}

We first need to obtain our first stage estimate (putting the whole function between parentheses allows us to both write it to the object s1 and print it).

{% highlight r %} ( s1 <- ols(y = rprice, X = tdiff) ) {% endhighlight %}

{% highlight text %}

[,1]

91.103739

X 4.163002

{% endhighlight %}

We can now obtain the predicted (fitted) values

{% highlight r %} Xhat <- s1[1]*tdiff + s1[2] {% endhighlight %}

Using these fitted values, we can finally estimate our second stage.

{% highlight r %} ols(y = packs, X = Xhat) {% endhighlight %}

{% highlight text %}

[,1]

126.89145456

X -0.04658554

{% endhighlight %}

Now compare this to the results using ivreg() fucntion from the AER package.

{% highlight r %} ivreg(packs ~ rprice | tdiff) {% endhighlight %}

{% highlight text %}

Call:

ivreg(formula = packs ~ rprice | tdiff)

Coefficients:

(Intercept) rprice

219.576 -1.019

{% endhighlight %}

When we compare this estimate to the estimate from the linear model, we find that the effect of the price is significantly overestimated there.

{% highlight r %} lm(packs ~ rprice) {% endhighlight %}

{% highlight text %}

Call:

lm(formula = packs ~ rprice)

Coefficients:

(Intercept) rprice

222.209 -1.044

{% endhighlight %}

We can also do this using R’s built in lm function.

{% highlight r %}

first stage

s1 <- lm(rprice ~ tdiff)

estimate second stage using fitted values (Xhat)

lm(packs ~ s1$fitted.values) {% endhighlight %}

{% highlight text %}

Call:

lm(formula = packs ~ s1$fitted.values)

Coefficients:

(Intercept) s1$fitted.values

219.576 -1.019

{% endhighlight %}

As an intermediate form, we can manually calculate Xhat (fitted.values) and estimate using that.

{% highlight r %}

manually obtain fitted values

Xhat <- s1$coefficients[2]*tdiff + s1$coefficients[1]

estimate second stage using Xhat

lm(packs ~ Xhat) {% endhighlight %}

{% highlight text %}

Call:

lm(formula = packs ~ Xhat)

Coefficients:

(Intercept) Xhat

219.576 -1.019

{% endhighlight %}

Note that if you estimate a TSLS using the lm function, that you can only use the coefficients, the error terms will be wrong.

This post is licensed under CC BY 4.0 by the author.