Weighted Least Squares and Why More Data is Better

In doing statistics, when should we weight different observations differently?

Suppose I have 10 independent observations of $x$ and I want to estimate the population mean, $\mu$. Why should I use the unweighted sample mean rather than giving the first observation a weight of 0.91 and each of the other nine a weight of 0.01?

Either way I get an unbiased estimate, because the weights sum to one, but the unweighted mean has the lower variance. If I use just observation 1 (a weight of 100% on it), then my estimator has the full variance of the disturbance. If I use two observations, a big positive disturbance on observation 1 might be cancelled out by a big negative one on observation 2. And if observation 2 happens to have an equally big positive disturbance, I am no worse off for having it. I do not want to overweight any one observation, because I want mistakes to cancel out as evenly as possible.
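
To put rough numbers on this, suppose (an assumption added here for illustration) that every disturbance has the same variance, call it $\sigma^2$, and write $w_i$ for the weight on observation $i$. Because the observations are independent, the variance of a weighted average is

$$\operatorname{Var}\Big(\sum_{i=1}^{10} w_i x_i\Big) = \sigma^2 \sum_{i=1}^{10} w_i^2 .$$

With equal weights of 0.1, the sum of squared weights is $10 \times 0.1^2 = 0.1$. With 0.91 on the first observation and 0.01 on each of the rest, it is $0.91^2 + 9 \times 0.01^2 = 0.8281 + 0.0009 = 0.829$. So the lopsided estimator has more than eight times the variance of the plain mean, and is only slightly better than using observation 1 alone.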

None of this depends on the distribution of the disturbance term, beyond its having a finite variance. In particular, it doesn’t rely on the Central Limit Theorem, which says that as $n$ increases the distribution of the estimator approaches the normal distribution (provided I don’t put too much weight on any one observation, at least!).
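
One way to check that claim is a small simulation with disturbances that are clearly not normal. The sketch below is only illustrative: the function names, the choice of a shifted Exponential(1) disturbance (mean zero, variance 1), the number of trials, and the seed are all assumptions made here, not anything from the argument above. It compares the simulated variance of each weighting scheme with the value implied by the sum of squared weights.

```python
import random
import statistics


def theoretical_variance(weights, sigma2=1.0):
    """Variance of sum(w_i * x_i) for independent x_i with common variance sigma2."""
    return sigma2 * sum(w * w for w in weights)


def simulated_variance(weights, trials=20000, seed=0):
    """Monte Carlo variance of the weighted estimator under skewed disturbances.

    Disturbances are Exponential(1) draws shifted to have mean zero, so they
    are far from normal but still have variance 1 (an illustrative choice).
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        draws = [rng.expovariate(1.0) - 1.0 for _ in weights]
        estimates.append(sum(w * d for w, d in zip(weights, draws)))
    return statistics.pvariance(estimates)


equal = [0.1] * 10                # the unweighted sample mean
lopsided = [0.91] + [0.01] * 9    # 0.91 on observation 1, 0.01 on the rest

if __name__ == "__main__":
    for name, w in [("equal", equal), ("lopsided", lopsided)]:
        print(name, theoretical_variance(w), round(simulated_variance(w), 3))
```

The simulated figures should land close to the theoretical ones even though the disturbances are heavily skewed, which is the point: the variance ranking comes from the weights, not from normality.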

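If I knew that observation 1 had a disturbance with a smaller variance than the others, then I *would* want to weight it more heavily. That’s heteroskedasticity: the variance of the disturbance differs across observations, and it is the case weighted least squares is meant for.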
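
As a rough sketch of what "weight it more heavily" looks like in practice: for independent observations with known variances, the unbiased linear estimator with the smallest variance uses weights proportional to the inverse of each variance. The numbers below (variance 0.25 for observation 1, 1.0 for the other nine) and the function names are hypothetical, chosen only to illustrate the idea.

```python
def inverse_variance_weights(variances):
    """Weights proportional to 1 / variance, normalised to sum to one."""
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    return [p / total for p in precisions]


def estimator_variance(weights, variances):
    """Variance of sum(w_i * x_i) for independent x_i with the given variances."""
    return sum(w * w * v for w, v in zip(weights, variances))


# Hypothetical numbers: observation 1 is less noisy than the other nine.
variances = [0.25] + [1.0] * 9
equal = [0.1] * 10
optimal = inverse_variance_weights(variances)

if __name__ == "__main__":
    print("equal weights:   ", estimator_variance(equal, variances))
    print("inverse-variance:", estimator_variance(optimal, variances))
```

With these made-up variances, observation 1 gets a weight of 4/13 and each of the others 1/13, and the variance of the estimator falls from 0.0925 under equal weights to 1/13, about 0.077.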

2 Responses to “Weighted Least Squares and Why More Data is Better”

  1. Max Robinson Says:

    Heteroskedasticity — an eight syllable word for varying variance. Wow, that must be the 100 million dollar word of the day!

  2. admin Says:

    Yes, it’s a favorite of first-year econ PhD students. The adjectival form sounds good too, in a different way: it’s heteroskedastic.
