INSTRUMENTAL VARIABLES WITHOUT ENDOGENEITY OR INFINITE SAMPLES
I just realized that instrumental variables could be useful in a context
without endogeneity or large samples: to add datapoints.
Suppose
Y_t = beta X_t + U_t,  t = 1,…,T.
We use IV if X and U are correlated. Then the problem is that the X's we have
are contaminated and need to be replaced by Z, which is correlated with X but
not with U. If we use Z instead, we get a consistent estimator, which is good in
infinite samples.
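
To make the standard case concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the function name simulate_iv, the coefficients 0.8 and 0.6, the normal draws, and the sample size. It compares OLS with the simple IV estimator (Z'Y)/(Z'X) when X is built to be correlated with U.

```python
import numpy as np

def simulate_iv(T=20000, beta=2.0, seed=0):
    """Simulate Y = beta*X + U with X correlated with U; return (OLS, IV) estimates."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=T)                       # instrument: independent of u
    u = rng.normal(size=T)                       # regression error
    x = 0.8 * z + 0.6 * u + rng.normal(size=T)   # x is contaminated by u (assumed design)
    y = beta * x + u
    ols = (x @ y) / (x @ x)   # no-intercept OLS; inconsistent here
    iv = (z @ y) / (z @ x)    # simple IV estimator; consistent
    return ols, iv

ols, iv = simulate_iv()
print(f"OLS: {ols:.3f}  IV: {iv:.3f}")
```

Under this assumed design, cov(X,U)/var(X) = 0.6/2 = 0.3, so OLS should sit about 0.3 above the true beta in large samples, while IV should sit close to it.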
Another problem is if we don’t have many observations on X, or if X does not
vary much in the observations we have. If we had infinite observations, this
would not be a problem.
We can’t solve that problem by just replacing the X’s we have with Z’s. Our
current data is not contaminated. It just doesn’t have enough info. Z would be
even worse info.
Suppose, though, that we have another set of data, n = 1,…N with observations
on Z and X but not on Y. Maybe we can use Z’s relationship with X to make use of
the N data. This would help in finite samples, I think.
That is the same idea as imputing for missing data. It is just that I use a
variable completely outside the regression equation to do the imputing. And I do
have a theoretical rationale for the variable, rather than just picking all that
are included, for example.
A problem is that if X doesn’t vary much in our original data, it won’t give us
much variation to estimate its correlation with Z in the supplemental data. But
maybe X and Z have a very strong correlation and little error, unlike X and Y.
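
One way to check how tight the X-Z relationship is in the supplemental data is to regress X on Z there and look at the R-squared. The sketch below is only an illustration on made-up data: the helper name first_stage, the slope 0.9, the noise scale 0.1, and N = 2000 are all assumptions, not anything estimated from real data.

```python
import numpy as np

def first_stage(z, x):
    """Regress x on z with an intercept; return (slope, R-squared)."""
    Z = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(Z, x, rcond=None)
    resid = x - Z @ coef
    r2 = 1.0 - resid.var() / x.var()
    return coef[1], r2

# Hypothetical supplemental sample: observations on Z and X, but not Y.
rng = np.random.default_rng(1)
N = 2000
z_sup = rng.normal(size=N)
x_sup = 0.9 * z_sup + 0.1 * rng.normal(size=N)   # strong link, little error (assumed)

slope, r2 = first_stage(z_sup, x_sup)
print(f"slope: {slope:.3f}  R^2: {r2:.3f}")
```

A high R-squared here would be the "very strong correlation and little error" case. Whether that actually helps the estimate of beta in finite samples is a separate question that this sketch does not answer.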